FLUIDCLOUD
SOLUTIONS
CLOUD RESILIENCY
READING TIME 8 min · SECTIONS 01—10
01 · SOLUTIONS / CLOUD RESILIENCY
Backups protect your data. Failover redirects your workload. FluidCloud captures your live cloud topology — every VPC, IAM role, route table, and dependency — as production-grade IaC, so you can recover regions, rebuild accounts, and move clouds in minutes, not months. We call that Cloud Cloning™.
ABSTRACT
For two decades, resilience programs have been built around workloads: replicate the database, snapshot the volume, fail the application over. The cluster of outages in 2025 (AWS US-EAST-1 in October, Google Cloud in June, cascading account-level incidents in between) made the structural problem painfully clear. These were not workload failures. What failed was the infrastructure layer the applications depended on: DNS, identity, routing, control planes, and account boundaries. This page describes the FluidCloud thesis, the Cloud Cloning™ mechanism, and the three failure domains it covers, to help teams achieve true and continuous resilience.
02 · The 2025 record
Last year, the cloud went down. A lot.
On October 20, 2025, an internal subsystem monitoring AWS's network load balancers failed. DNS resolution broke for DynamoDB. Within an hour, more than a thousand services were degraded, among them Snapchat, Fortnite, Coinbase, Robinhood, McDonald's mobile ordering, and the UK's HMRC tax website. For fifteen hours, the internet had a noticeably smaller surface area.
Four months earlier, Google Cloud had a similar event. Earlier in the year, Cloudflare did too. Azure has had several. On November 18, the European Supervisory Authorities designated AWS, Azure, and GCP as critical under DORA — formally moving cloud-provider risk onto the regulated firm's balance sheet.
The thing worth noticing isn't that any single provider failed. It's that every one of these failures was an infrastructure failure — control planes, DNS, identity. The workloads were fine. The places they ran weren't.
AWS Post-Incident Analysis · October 2025
AWS authentication, DNS, and control-plane services depend on US-EAST-1 endpoints even for workloads nominally deployed elsewhere.
Infrastructure & Control Plane: DNS · Identity · Network · Control planes
Different clouds. Same dependency layer. Six major events across four providers. Each caused by failures in the infrastructure that everything else depends on.
Different incidents. Same layer.
03 · The infrastructure argument
Cloud resilience is an infrastructure problem.
The industry is solving the wrong one.
Thesis
Infrastructure topology, not workloads, determines recovery.
The industry has treated resilience as a workload property: replicate the database, snapshot the disk, fail over the application, document the runbook, and test the procedure once a year. That model made sense when the infrastructure underneath the workload was simple, slow-changing, and assumed to be there when recovery began.
A modern cloud estate is a living topology of VPCs, subnets, IAM roles, route tables, security groups, KMS keys, load balancers, peering links, transit gateways, service endpoints, secrets, parameter stores, policies, and configuration glue. When something fails, the workload is rarely the hardest object to recover. The difficult part is reconstructing the topology underneath it.
Backups restore data. Snapshots restore disks. Almost nothing restores topology. FluidCloud's position is that topology itself is the primary asset of cloud resilience. If it can be scanned, versioned, regenerated, and redeployed on demand, recovery becomes an operational capability rather than a theoretical plan.
Resilience is not a project. It is a continuous reconciliation problem. The estate changes every day, while most recovery plans remain frozen at the moment they were written. The gap between the plan and the live environment is where recovery risk accumulates.
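The gap between a written plan and the live estate can be made concrete with a small comparison. The following is a minimal sketch, not FluidCloud's implementation: it assumes each topology is available as a plain dictionary keyed by resource ID, and the function name `diff_topology` and the sample resources are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical representation: resource ID -> attribute dictionary.
Snapshot = Dict[str, Dict[str, Any]]


def diff_topology(planned: Snapshot, live: Snapshot) -> Dict[str, List[str]]:
    """Report which resources were added, removed, or changed since the plan."""
    planned_ids, live_ids = set(planned), set(live)
    return {
        "added": sorted(live_ids - planned_ids),
        "removed": sorted(planned_ids - live_ids),
        "changed": sorted(
            rid for rid in planned_ids & live_ids if planned[rid] != live[rid]
        ),
    }


# Topology as the recovery plan recorded it (illustrative values only).
planned = {
    "sg-web": {"type": "security_group", "ingress": [443]},
    "role-app": {"type": "iam_role", "policies": ["read-s3"]},
}

# Topology as it exists today (illustrative values only).
live = {
    "sg-web": {"type": "security_group", "ingress": [443, 8443]},
    "role-app": {"type": "iam_role", "policies": ["read-s3"]},
    "subnet-b": {"type": "subnet", "cidr": "10.0.2.0/24"},
}

drift = diff_topology(planned, live)
print(drift)
```

Any non-empty entry in the result is a place where a recovery built from the plan would diverge from the environment it is meant to restore.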
Fact 01
Most DR plans drift out of sync with the live environment within weeks of being written. Security groups, IAM roles, subnets, and routes change faster than runbooks track them.
Fact 02
Drift is the gap that breaks recovery. The longer a plan goes un-reconciled against the live estate, the more recovery becomes guesswork.
Fact 03
Topology failures aren't workload failures. Cell-based architectures contain a service's blast radius, but they do not contain failures in IAM, DNS, routing, or shared control planes.
Fact 04
Active-active pays for idle topology. Regenerate the standby on demand and you stop paying to keep every replica warm.
FluidCloud thesis
Backups protect your data. Nothing protects your topology.
The backup industry covers the top band. The topology underneath it — every VPC, IAM role, and routing rule your workloads depend on — has no restore path today.
04 · How it works
Discover. Graph. Operate.
Three steps, one portable graph. Same mechanism whether you're cloning for resilience, rebuilding after a compromise, or moving across clouds.
STEP 01 · Discover.
Point FluidCloud at any cloud account — agentless, no software to install. A 50,000-resource environment is fully mapped in ~15 seconds. Every VPC, IAM role, security group, secret, and dependency is visible from the outside, using only read-only credentials.
Coverage: AWS · Azure · GCP · OCI · OVH · VMware · Nutanix · Vultr
Throughput: ~50k resources · 15s
STEP 02 · Graph.
Every resource becomes a node, every dependency a typed edge, enriched with cost, tags, usage, and policy. The LIM (Large Infrastructure Model) reads this graph and emits 99%+ accurate, production-grade Terraform — reviewable, version-controllable, and portable. IaC is an output of the graph, not the mechanism.
Output: Terraform · OpenTofu · stateful
Fidelity: 99%+ LIM accuracy · reviewable
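To illustrate why a dependency graph is a useful source for generated IaC, here is a minimal sketch of resources as nodes and dependencies as typed edges, with a creation order derived from the graph. It is an assumption-laden illustration, not the LIM or FluidCloud's data model: the `Edge` class, the edge kinds, and the resource names are all hypothetical. Only Python's standard `graphlib.TopologicalSorter` is used.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import List


@dataclass(frozen=True)
class Edge:
    source: str  # the resource that depends on `target`
    target: str  # the resource that must exist first
    kind: str    # hypothetical edge type, e.g. "contained_in"


# Illustrative topology: one instance and everything it depends on.
edges = [
    Edge("subnet-a", "vpc-main", "contained_in"),
    Edge("sg-web", "vpc-main", "contained_in"),
    Edge("instance-1", "subnet-a", "placed_in"),
    Edge("instance-1", "sg-web", "protected_by"),
    Edge("instance-1", "role-app", "assumes"),
]


def creation_order(graph_edges: List[Edge]) -> List[str]:
    """Return an order in which every resource follows its dependencies."""
    sorter = TopologicalSorter()
    for edge in graph_edges:
        # add(node, *predecessors): the target must be created before the source.
        sorter.add(edge.source, edge.target)
    return list(sorter.static_order())


order = creation_order(edges)
print(order)
```

The exact order among independent resources may vary, but dependencies always come first, which is the property a redeploy into a new region or account relies on.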
STEP 03 · Operate.
Apply the graph anywhere — a different region, a different account, a different cloud, or back into the same place with corrections. Any prior state is regenerable on demand. Time Machine for Infrastructure runs against any versioned snapshot of the topology.
Targets: any region / account / cloud
RTO: minutes · full or partial environments
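The idea of regenerating any prior state from versioned snapshots can be sketched as a point-in-time lookup. This is a hypothetical illustration of the concept, not the Time Machine for Infrastructure feature itself: the `TopologyHistory` class and its methods are assumptions, and snapshots are reduced to small dictionaries.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List


class TopologyHistory:
    """Hypothetical store of topology snapshots, ordered by capture time."""

    def __init__(self) -> None:
        self._times: List[datetime] = []
        self._snapshots: List[Dict] = []

    def record(self, taken_at: datetime, snapshot: Dict) -> None:
        # Keep the list sorted by requiring chronological inserts.
        if self._times and taken_at < self._times[-1]:
            raise ValueError("snapshots must be recorded in chronological order")
        self._times.append(taken_at)
        self._snapshots.append(snapshot)

    def state_at(self, when: datetime) -> Dict:
        """Return the most recent snapshot taken at or before `when`."""
        index = bisect_right(self._times, when)
        if index == 0:
            raise LookupError("no snapshot at or before the requested time")
        return self._snapshots[index - 1]


history = TopologyHistory()
history.record(datetime(2025, 10, 1), {"version": 1, "resources": 120})
history.record(datetime(2025, 10, 15), {"version": 2, "resources": 134})
history.record(datetime(2025, 10, 21), {"version": 3, "resources": 98})

# Look up the last state captured before a hypothetical incident on October 20.
known_good = history.state_at(datetime(2025, 10, 20))
print(known_good)
```

The lookup returns version 2 here, the last snapshot captured before the chosen date; a restore would then regenerate the environment from that version rather than from the current, damaged one.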
The scan is read-only. A read-only IAM credential is the only access needed. Nothing leaves your account during the scan.
See a real scan in 15 seconds
05 · Cloud Cloning™
One mechanism. Three resilience tiers.
Same source topology, three target environments. One primitive answers three classes of disaster.
The fastest BC/DR pattern, with topology provably identical to primary.
Resilience without paying to keep a second copy warm.
The old way is active-active — a second environment running 24/7, billed 24/7, to cover an event that may never come. Because the console, API, MCP, and CLI run at 100% feature parity, the scan is something you can trigger, not just click. Capture continuously, run nothing idle, and regenerate the topology the moment a signal fires.
- 01 · SIGNAL · Telemetry alert, threshold, or schedule (Datadog · CloudWatch · cron)
- 02 · TRIGGER · API · MCP · CLI (100% parity with the console)
- 03 · CAPTURE · On-demand scan + topology capture (~15s · read-only · versioned graph)
- 04 · DEPLOY · Regenerate the topology (if & when you need it · not before)
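The four stages above can be sketched as a small control flow: a signal is evaluated, and only when it fires are capture and deploy invoked. This is a minimal sketch under stated assumptions. The `Signal` class, `on_signal`, and the stand-in `fake_capture` and `fake_deploy` functions are hypothetical and do not represent the FluidCloud API, CLI, or MCP interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Signal:
    source: str       # e.g. a monitoring tool or a scheduler (illustrative)
    metric: str
    value: float
    threshold: float

    def fired(self) -> bool:
        return self.value >= self.threshold


def on_signal(
    signal: Signal,
    capture: Callable[[], Dict],
    deploy: Callable[[Dict], str],
) -> Optional[str]:
    """Capture and deploy only when the signal fires; otherwise do nothing."""
    if not signal.fired():
        return None          # nothing runs idle
    graph = capture()        # on-demand, read-only capture
    return deploy(graph)     # regenerate the topology from the captured graph


calls: List[str] = []


def fake_capture() -> Dict:
    # Stand-in for a real scan; returns a tiny hypothetical graph.
    calls.append("capture")
    return {"version": 1, "resources": ["vpc-main", "subnet-a"]}


def fake_deploy(graph: Dict) -> str:
    # Stand-in for a real deployment step.
    calls.append("deploy")
    return f"deployed {len(graph['resources'])} resources from v{graph['version']}"


quiet = on_signal(Signal("cron", "error_rate", 0.01, 0.05), fake_capture, fake_deploy)
alarm = on_signal(Signal("cloudwatch", "error_rate", 0.20, 0.05), fake_capture, fake_deploy)
print(quiet, alarm, calls)
```

The design point is that the standby exists only as a capturable graph until a signal crosses its threshold, so the quiet path invokes neither capture nor deploy.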
07 · Risk, mapped
The risks you plan for.
Answered one way.
Every resilience program is a list of scenarios it's meant to survive. Here's that list — and the single mechanism that answers each one.
Clone the topology into another region or cloud and redeploy in minutes — not a runbook.
Reconstitute the full topology in a clean account — IAM, segmentation, and policy intact.
Time Machine regenerates any prior known-good state of the topology, on demand.
A cross-cloud clone is the exit — signed, repeatable, and audit-ready on request.
Scan it on day one; clone into your sanctioned account structure by week two.
Clone into a sovereign region or local cloud with the topology preserved.
08 · Self-assessment
How regenerable is your infrastructure today?
Five questions. Under two minutes. No scoring theater — a posture read on where you stand against the failure modes this page has described.
It measures capability, not intent — the gap between what your DR plan says will happen and what actually would if a production account went dark tonight.
09 · The companies already running it
Customer voice. Industry validation.
Infrastructure and platform teams across fintech, regulated industry, and high-scale cloud operations are running Cloud Cloning™ in production.
Customer voice · evaluation
"This isn't just a migration tool — the platform naturally supports rollback, inventory, and multi-cloud visibility use cases."
Lead Architect · enterprise security company
Customer voice · platform
"Our last cross-cloud migration took eleven months. The next one took a Tuesday afternoon."
VP Platform Engineering · regulated healthcare
Customer outcomes
90% reduction in migration effort (Vultr · production deployment)
10,000 VMs moved on deadline (enterprise cloud migration)
~3 mo to migrate a 7-year AWS estate (financial services customer)
Forrester · 2026
Forrester's 2026 resilience research distinguishes infrastructure resilience from workload resilience, validating the structural premise here.
DORA · 11/18/2025
The ESAs designated AWS, Azure, and GCP as critical ICT providers, moving cross-cloud exit strategy from preference to regulatory requirement under Articles 28–31.
DRJ · 2026 trends
The Disaster Recovery Journal's 2026 trends research identifies multi-cloud fragmentation and demonstrated resilience as the two structural shifts reshaping DR.
10 · Go deeper
Four ways to read further.
For the reader who's not ready to talk yet. Long-form essay, structural postmortem, concept video, regulatory briefing.
ESSAY · 12 MIN · The thesis, in long form.
POSTMORTEM · 9 MIN · October 2025, structurally.
CONCEPT VIDEO · 5 MIN · Cloud Cloning™ in five minutes.
BRIEFING · 14 MIN · DORA and the exit-strategy mandate.
11 · Your move
Point us at one account.
The fastest way to understand what we do is to let us scan a single cloud account and see what comes back. The scan takes about fifteen seconds, runs read-only, and produces a topology graph you can keep — whether or not you ever talk to us again.
