FLUIDCLOUD

READING TIME 8 min ·  SECTIONS 01—10

01 · SOLUTIONS / CLOUD RESILIENCY


Cloud resilience is an infrastructure problem.

Backups protect your data. Failover redirects your workload. FluidCloud captures your live cloud topology — every VPC, IAM role, route table, and dependency — as production-grade IaC, so you can recover regions, rebuild accounts, and move clouds in minutes, not months. We call that Cloud Cloning™.

ABSTRACT

For two decades, resilience programs have been built around workloads: replicate the database, snapshot the volume, fail the application over. The cluster of outages in 2025 (AWS US-EAST-1 in October, Google Cloud in June, cascading account-level incidents in between) exposed the structural problem. These were not workload failures. What failed was the infrastructure layer the applications depended on: DNS, identity, routing, control planes, and account boundaries. This page describes the FluidCloud thesis, the Cloud Cloning™ mechanism, and the three failure domains it covers (region, account, and cloud), so that teams can keep recovery continuously in step with the live environment.

01 Data: Backup · replication · snapshots
02 Workloads: Failover · runbooks · active-active
03 Infrastructure: Topology · IAM · network · policy

Topology made resilient by FluidCloud.

50,000 resources mapped in 15 seconds
99%+ LIM Terraform accuracy
8 cloud providers supported
70–90% faster migration timelines
<5 min full environment clone

02 · The 2025 record

Last year, the cloud went down. A lot.

On October 20, 2025, an internal subsystem monitoring AWS's network load balancers failed. DNS resolution broke for DynamoDB. Within an hour, more than a thousand services were degraded, among them Snapchat, Fortnite, Coinbase, Robinhood, McDonald's mobile ordering, and the UK's HMRC tax website. For fifteen hours, the internet had a noticeably smaller surface area.

Four months earlier, Google Cloud had a similar event. Earlier in the year, Cloudflare did too. Azure has had several. On November 18, the European Supervisory Authorities designated AWS, Azure, and GCP as critical under DORA — formally moving cloud-provider risk onto the regulated firm's balance sheet.

The thing worth noticing isn't that any single provider failed. It's that every one of these failures was an infrastructure failure — control planes, DNS, identity. The workloads were fine. The places they ran weren't.

AWS Post-Incident Analysis · October 2025

AWS authentication, DNS, and control-plane services depend on US-EAST-1 endpoints even for workloads nominally deployed elsewhere.

MAY 2025       Azure AD        Identity · IAM
JUN 2025       Google Cloud    Auth · IAM
JUL 2025       Cloudflare      DNS · CDN edge
SEP 2025       Azure Storage   Region unavailable
OCT 20, 2025   AWS US-EAST-1   DNS · 15+ hrs
NOV 18, 2025   DORA            AWS, Azure, GCP
The shared layer

Infrastructure & Control Plane

DNS · IDENTITY · NETWORK · CONTROL PLANES

Different clouds. Same dependency layer. Six major events across four providers. Each caused by failures in the infrastructure that everything else depends on.

Different incidents. Same layer.

03 · The infrastructure argument

Cloud resilience is an infrastructure problem.

The industry is solving the wrong one.

Thesis

Infrastructure topology, not workloads, determines recovery.

The industry has treated resilience as a workload property: replicate the database, snapshot the disk, fail over the application, document the runbook, and test the procedure once a year. That model made sense when the infrastructure underneath the workload was simple, slow-changing, and assumed to be there when recovery began.

A modern cloud estate is a living topology of VPCs, subnets, IAM roles, route tables, security groups, KMS keys, load balancers, peering links, transit gateways, service endpoints, secrets, parameter stores, policies, and configuration glue. When something fails, the workload is rarely the hardest object to recover. The difficult part is reconstructing the topology underneath it.

Backups restore data. Snapshots restore disks. Almost nothing restores topology. FluidCloud's position is that topology itself is the primary asset of cloud resilience. If it can be scanned, versioned, regenerated, and redeployed on demand, recovery becomes an operational capability rather than a theoretical plan.

Resilience is not a project. It is a continuous reconciliation problem. The estate changes every day, while most recovery plans remain frozen at the moment they were written. The gap between the plan and the live environment is where recovery risk accumulates.
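The reconciliation idea above can be made concrete with a minimal sketch. It assumes the recovery plan and the live estate can each be reduced to a mapping of resource id to configuration; the function name `diff_topology`, the `Drift` type, and all resource ids are hypothetical and are not FluidCloud's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:
    added: frozenset      # in the live estate, missing from the plan
    removed: frozenset    # in the plan, gone from the live estate
    changed: frozenset    # present in both, but configured differently


def diff_topology(plan: dict, live: dict) -> Drift:
    """Compare two {resource_id: config} mappings (hypothetical format)."""
    plan_ids, live_ids = set(plan), set(live)
    shared = plan_ids & live_ids
    return Drift(
        added=frozenset(live_ids - plan_ids),
        removed=frozenset(plan_ids - live_ids),
        changed=frozenset(r for r in shared if plan[r] != live[r]),
    )


# Hypothetical data: the plan as written, and the estate some weeks later.
plan = {
    "sg-web": {"ingress": [443]},
    "role-app": {"policies": ["read-s3"]},
    "rtb-main": {"routes": ["10.0.0.0/16"]},
}
live = {
    "sg-web": {"ingress": [443, 8080]},       # a port was opened by hand
    "role-app": {"policies": ["read-s3"]},
    "subnet-new": {"cidr": "10.0.9.0/24"},    # created after the plan was written
}

drift = diff_topology(plan, live)
print(sorted(drift.added), sorted(drift.removed), sorted(drift.changed))
```

Run on a schedule, a comparison like this turns "the plan is probably stale" into a concrete list of resources a recovery would get wrong.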

Fact 01

Most DR plans drift out of sync with the live environment within weeks of being written. Security groups, IAM roles, subnets, and routes change faster than runbooks track them.

Fact 02

Drift is the gap that breaks recovery. The longer a plan goes unreconciled against the live estate, the more recovery becomes guesswork.

Fact 03

Topology failures aren't workload failures. Cell-based architectures contain the blast radius of a service failure, but not failures in IAM, DNS, routing, or shared control planes.

Fact 04

Active-active pays for idle topology. Regenerate the standby on demand and you stop paying to keep every replica warm.

FluidCloud thesis

Backups protect your data. Nothing protects your topology.

A · WHAT GETS PROTECTED · RESTORES

The bytes. A $30B industry of backup, replication, and continuous data protection covers the data that workloads produce.

DATABASE · VOLUMES · OBJECT STORE · SNAPSHOTS
COVERED BY
Veeam · Commvault · Rubrik · Cohesity · Veritas · Druva
B · WHAT DOESN'T · ⚠ BY HAND

The place the bytes land: the connective tissue under every workload. Backups restore bytes; almost nothing restores topology. This is FluidCloud.

VPC · SUBNETS · ROUTE TABLES · NAT GATEWAYS · PEERING · TRANSIT GW · IAM ROLES · TRUST POLICIES · STS · FEDERATION · ORG STRUCTURE · SCPs · SECURITY GROUPS · NACLS · KMS KEYS · SECRETS · PARAM STORE · DNS / R53 · VPC ENDPOINTS · LOAD BALANCERS · WAF RULES · LOG ROUTES · TAGS · QUOTAS
COVERED BY
recover.sh · rebuild-vpc.sh · RUNBOOK.md · tribal knowledge

The backup industry covers the top band. The topology underneath it — every VPC, IAM role, and routing rule your workloads depend on — has no restore path today.

04 · How it works

Discover. Graph. Operate.

Three steps, one portable graph. Same mechanism whether you're cloning for resilience, rebuilding after a compromise, or moving across clouds.

STEP 01 · Discover.

Point FluidCloud at any cloud account — agentless, no software to install. A 50,000-resource environment is fully mapped in ~15 seconds. Every VPC, IAM role, security group, secret, and dependency is visible from the outside, using only read-only credentials.

Coverage

AWS · Azure · GCP · OCI · OVH · VMware · Nutanix · Vultr

Throughput

~50k resources · 15s

STEP 02 · Graph.

Every resource becomes a node, every dependency a typed edge, enriched with cost, tags, usage, and policy. The LIM (Large Infrastructure Model) reads this graph and emits 99%+ accurate, production-grade Terraform — reviewable, version-controllable, and portable. IaC is an output of the graph, not the mechanism.

Output

Terraform · OpenTofu · stateful

Fidelity

99%+ LIM accuracy · reviewable
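To illustrate the graph step, here is a minimal sketch of resources as nodes and dependencies as typed edges, with a dependency-respecting creation order derived from it. This is not the LIM or FluidCloud's data model; the `Edge` type, the edge kinds, and the resource ids are assumptions made for the example. It uses only Python's standard-library `graphlib`.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Edge:
    source: str   # the dependent resource
    target: str   # the resource it depends on
    kind: str     # edge type, e.g. "in_vpc" or "assumes_role" (hypothetical)


# Hypothetical resources and typed dependencies.
nodes = {
    "vpc-main": {"type": "vpc"},
    "subnet-a": {"type": "subnet"},
    "role-app": {"type": "iam_role"},
    "lb-web": {"type": "load_balancer"},
}
edges = [
    Edge("subnet-a", "vpc-main", "in_vpc"),
    Edge("lb-web", "subnet-a", "in_subnet"),
    Edge("lb-web", "role-app", "assumes_role"),
]


def creation_order(nodes: dict, edges: list) -> list:
    """Return resource ids so that every dependency precedes its dependents."""
    sorter = TopologicalSorter({node_id: set() for node_id in nodes})
    for edge in edges:
        sorter.add(edge.source, edge.target)  # source depends on target
    return list(sorter.static_order())


order = creation_order(nodes, edges)
print(order)
```

Any emitted IaC has to respect an ordering like this one, which is one way to read the claim that IaC is an output of the graph rather than the mechanism.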

STEP 03 · Operate.

Apply the graph anywhere — a different region, a different account, a different cloud, or back into the same place with corrections. Any prior state is regenerable on demand. Time Machine for Infrastructure runs against any versioned snapshot of the topology.

Targets

any region / account / cloud

RTO

minutes · full or partial environments
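The idea of regenerating any prior state can be sketched as an append-only history of topology snapshots. This is a simplified illustration, not how Time Machine for Infrastructure is implemented; the `TopologyHistory` class and the snapshot format are hypothetical.

```python
import copy
import hashlib
import json


class TopologyHistory:
    """Append-only list of topology snapshots, addressed by content hash."""

    def __init__(self):
        self._versions = []  # (version_id, snapshot) pairs in capture order

    def capture(self, topology: dict) -> str:
        payload = json.dumps(topology, sort_keys=True).encode()
        version_id = hashlib.sha256(payload).hexdigest()[:12]
        # Skip the write when nothing changed since the last capture.
        if not self._versions or self._versions[-1][0] != version_id:
            self._versions.append((version_id, copy.deepcopy(topology)))
        return version_id

    def at(self, version_id: str) -> dict:
        for vid, snapshot in self._versions:
            if vid == version_id:
                return copy.deepcopy(snapshot)
        raise KeyError(version_id)

    def __len__(self):
        return len(self._versions)


history = TopologyHistory()
topology = {"sg-web": {"ingress": [443]}}
good = history.capture(topology)

topology["sg-web"]["ingress"].append(8080)   # a faulty change lands
bad = history.capture(topology)

restored = history.at(good)                  # recover the known-good state
print(restored)
```

Storing deep copies keeps each version immutable, so a later change to the live structure cannot silently rewrite history.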

The scan is read-only. Read-only IAM credentials are all it needs. Nothing leaves your account during the scan.

See a real scan in 15 seconds

05 · Cloud Cloning™

One mechanism. Three resilience tiers.

Same source topology, three target environments. One primitive answers three classes of disaster.

The fastest BC/DR pattern, with topology provably identical to primary.

USE CASES: Active-active · active-passive · regional failover

[Diagram: CLOUD · AWS (shared); ACCOUNT · 411-prod (shared); REGION · us-east-1 (primary) cloned to REGION · us-west-2 (standby). Crosses region boundary · cloud + account intact.]

Failover RTO: ~4m 12s
Topology drift: continuously reconciled
Cost vs warm: −68% standby idle

PLUS · TIME MACHINE FOR INFRASTRUCTURE
Any of the three tiers, against any prior state of the topology, not just the current one. Forensics, audit holds, drift forensics, rollback.
A fourth pattern · event-driven

Resilience without paying to keep a second copy warm.

The old way is active-active — a second environment running 24/7, billed 24/7, to cover an event that may never come. Because the console, API, MCP, and CLI run at 100% feature parity, the scan is something you can trigger, not just click. Capture continuously, run nothing idle, and regenerate the topology the moment a signal fires.

  1. SIGNAL: Telemetry alert, threshold, or schedule (Datadog · CloudWatch · cron)
  2. TRIGGER: API · MCP · CLI (100% parity with the console)
  3. CAPTURE: On-demand scan + topology capture (~15s · read-only · versioned graph)
  4. DEPLOY: Regenerate the topology (if and when you need it, not before)
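The four stages above can be sketched as a small handler. This is an illustration under stated assumptions, not FluidCloud's API or CLI: the `Pipeline` class, the `scan` and `deploy` callables, and the error-rate threshold are all hypothetical stand-ins for a real alert source and a real scan and deploy call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Pipeline:
    scan: Callable[[], dict]            # CAPTURE: returns a topology snapshot
    deploy: Callable[[dict], None]      # DEPLOY: regenerates from a snapshot
    threshold: float = 0.05             # hypothetical error-rate trigger level
    log: list = field(default_factory=list)

    def on_signal(self, error_rate: float) -> bool:
        """SIGNAL -> TRIGGER: act only when the metric crosses the threshold."""
        if error_rate < self.threshold:
            self.log.append("ignored")
            return False
        snapshot = self.scan()
        self.log.append("captured")
        self.deploy(snapshot)
        self.log.append("deployed")
        return True


# Stub scan and deploy functions stand in for real calls.
deployed = []
pipeline = Pipeline(
    scan=lambda: {"vpc-main": {"cidr": "10.0.0.0/16"}},
    deploy=deployed.append,
)
pipeline.on_signal(0.01)   # below the threshold: nothing runs
pipeline.on_signal(0.20)   # above the threshold: capture, then deploy
print(pipeline.log)
```

Nothing is deployed until a signal crosses the threshold, which is the cost argument of the pattern: no second environment runs while the primary is healthy.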

07 · Risk, mapped

The risks you plan for.
Answered one way.

Every resilience program is a list of scenarios it's meant to survive. Here's that list — and the single mechanism that answers each one.

Risk scenario: Provider or region outage (US-EAST-1 · GCP control plane)
What actually breaks: Control planes, DNS, and identity go dark; workloads deployed “elsewhere” fail anyway.
The FluidCloud response: Clone the topology into another region or cloud and redeploy in minutes, not from a runbook.

Risk scenario: Account compromise or ransomware (credential theft · lockout)
What actually breaks: The account is locked or encrypted; IAM and network posture can no longer be trusted.
The FluidCloud response: Reconstitute the full topology in a clean account, with IAM, segmentation, and policy intact.

Risk scenario: Configuration drift & bad deploys (silent divergence · faulty change)
What actually breaks: Standby quietly diverges from prod; a bad change breaks the live topology with no clean state to fall back to.
The FluidCloud response: Time Machine regenerates any prior known-good state of the topology, on demand.

Risk scenario: Regulatory exit mandate (DORA Art. 28–31)
What actually breaks: You must prove a tested exit from a critical provider: a capability, not a binder on a shelf.
The FluidCloud response: A cross-cloud clone is the exit: signed, repeatable, and audit-ready on request.

Risk scenario: Inherited or unverified estate (M&A · handover)
What actually breaks: You own a topology you've never seen and a security posture you can't independently verify.
The FluidCloud response: Scan it on day one; clone into your sanctioned account structure by week two.

Risk scenario: Sovereignty & data residency (in-region · in-country)
What actually breaks: A mandate lands: this workload has to run in a specific region or country, soon.
The FluidCloud response: Clone into a sovereign region or local cloud with the topology preserved.

08 · Self-assessment

How regenerable is your infrastructure today?

Five questions. Under two minutes. No scoring theater — a posture read on where you stand against the failure modes this page has described.

The posture read

It measures capability, not intent — the gap between what your DR plan says will happen and what actually would if a production account went dark tonight.

5 questions · ~2 min · 4 postures
Where will you land? Capability →
01 Hopeful
02 Partial
03 Operational
04 Regulated

09 · The companies already running it

Customer voice. Industry validation.

Infrastructure and platform teams across fintech, regulated industry, and high-scale cloud operations are running Cloud Cloning™ in production.

Customer voice · evaluation

"This isn't just a migration tool — the platform naturally supports rollback, inventory, and multi-cloud visibility use cases."

Lead Architect

enterprise security company

Customer voice · platform

"Our last cross-cloud migration took eleven months. The next one took a Tuesday afternoon."

VP Platform Engineering

regulated healthcare

Customer outcomes

90% reduction in migration effort (Vultr · production deployment)
10,000 VMs moved on deadline (enterprise cloud migration)
~3 mo, 7-year AWS estate migrated (financial services customer)

Forrester · 2026

Forrester's 2026 resilience research distinguishes infrastructure resilience from workload resilience, validating the structural premise here.

DORA · Nov 18, 2025

The ESAs designated AWS, Azure, and GCP as critical ICT providers, moving cross-cloud exit strategy from preference to regulatory requirement under Articles 28–31.

DRJ · 2026 trends

The Disaster Recovery Journal's 2026 trends research identifies multi-cloud fragmentation and demonstrated resilience as the two structural shifts reshaping DR.

10 · Go deeper

Four ways to
read further.

For the reader who's not ready to talk yet. Long-form essay, structural postmortem, concept video, regulatory briefing.

11 · Your move

Point us at one account.

The fastest way to understand what we do is to let us scan a single cloud account and see what comes back. The scan takes about fifteen seconds, runs read-only, and produces a topology graph you can keep — whether or not you ever talk to us again.

4900 Hopyard Rd, Suite 240
Pleasanton, CA 94588

PRODUCT

Cloud Cloning™

Shift-Left DR

Control Plane

Native Translation

Large Infrastructure Model

USE CASES

Cloud to Cloud

Region to Region

Account to Account

RESOURCES

Docs

Blog

Changelog

Newsroom

Podcasts

COMPANY

About

Customers

Careers

Security

Contact

©2026 FluidCloud Inc.

Terms

Privacy

Cookies

Status · Operational
