Yesterday, Cloudflare experienced a network outage that affected large parts of its service. The incident began in the morning, when users trying to access customer sites started seeing error pages. Ironically, even DownDetector, the site many turn to for outage updates, was down. Cloudflare said the problem was caused not by an attack but by a change to its database system’s permissions. This change caused a file used by its Bot Management system to double in size, exceeding the software’s limits and triggering errors.
The file in question is updated every few minutes to help the system detect automated traffic. A query on Cloudflare’s ClickHouse database cluster generated duplicate data, gradually affecting all parts of the network. The error caused HTTP 5xx status codes to appear for traffic passing through the affected modules.
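To make the failure mode concrete, here is a minimal Python sketch. It is not Cloudflare's code: the loader, the feature names, and the cap `MAX_FEATURES` (set to 200 purely for illustration) are all assumptions. It shows how a query that returns every row twice can push an otherwise valid file past a hard limit.

```python
MAX_FEATURES = 200  # hypothetical hard limit; illustrative value only


class FeatureLimitExceeded(Exception):
    """Raised when a feature file has more entries than the loader allows."""


def load_features(rows):
    """Load feature names, refusing files that exceed the cap."""
    if len(rows) > MAX_FEATURES:
        raise FeatureLimitExceeded(
            f"{len(rows)} features exceeds limit of {MAX_FEATURES}"
        )
    return list(rows)


# A healthy file: 120 distinct (made-up) feature names, well under the cap.
healthy = [f"feature_{i}" for i in range(120)]

# The same query returning every row twice doubles the file's size.
duplicated = healthy + healthy

loaded = load_features(healthy)  # succeeds

try:
    load_features(duplicated)  # 240 entries is over the cap, so this raises
    rejected = False
except FeatureLimitExceeded:
    rejected = True
```

In this sketch, deduplicating the rows before the size check (for example with `list(dict.fromkeys(rows))`) would bring the file back under the cap.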
Initial investigations misidentified the cause as a large-scale DDoS attack. Teams quickly corrected this diagnosis and replaced the faulty file with a previous version. Core traffic largely returned to normal by 14:30, and all systems were fully operational by 17:06 UTC.
Christina Kosmowski, CEO of LogicMonitor, says: “The Cloudflare outage was a gut punch. One misstep and suddenly the digital world stalls. Apps freeze. Services go dark. It’s a wake-up call. Not just for IT, but for every leader betting their business on the cloud.
“We’ve built modern economies on invisible infrastructure. Layers of APIs, services, and platforms that work beautifully… until they don’t.
“Here’s the hard truth: Every outage is a visibility issue. If you can’t see what’s happening across vendors, clouds, and systems in real time, you’re flying blind. And when things break — and they will — you’re not recovering. You’re reacting.
“Resilience isn’t about who can reboot the fastest. It’s about who sees the signal before the system flatlines. The companies that lead through failure? They don’t guess. They know, instantly, where the issue is and what to do next.
“In a hybrid, AI-powered world, visibility isn’t a nice-to-have. It’s the whole ballgame.
“You can’t control every outage.
“But you can control how clearly you see it and how confidently you respond.”
Which Services Were Affected?
The outage affected multiple services, including Cloudflare’s core CDN, Workers KV, Access, Turnstile, and parts of its Dashboard. Users attempting to log in to the Dashboard often faced errors because Turnstile, Cloudflare’s CAPTCHA-alternative challenge service, failed.
Workers KV returned elevated levels of 5xx errors as the core proxy system struggled. Cloudflare Access saw widespread authentication failures, and some configuration updates propagated slowly. Email Security faced a temporary reduction in spam-detection accuracy, but there was no critical impact on customers.
Services on the newer FL2 proxy engine experienced HTTP 5xx errors, while those on the older FL proxy engine did not see errors but received incorrect bot scores. Customers using rules to block bots saw false positives during the incident.
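A short Python sketch shows why incorrect scores translate into blocked visitors. It rests on two assumptions that do not come from this article: that a failed scoring step falls back to a score of 0, and that the customer's rule blocks anything below a threshold of 30 (lower scores meaning more bot-like traffic). The function and variable names are hypothetical.

```python
BLOCK_BELOW = 30  # hypothetical customer threshold: lower score = more bot-like


def should_block(bot_score):
    """Return True when a request would be blocked by the bot rule."""
    return bot_score < BLOCK_BELOW


# Made-up scores for five human visitors during normal operation.
normal_scores = [85, 92, 78, 96, 88]

# Assumed failure mode: scoring breaks and every request gets a default of 0.
broken_scores = [0 for _ in normal_scores]

blocked_normally = sum(should_block(s) for s in normal_scores)
blocked_during_incident = sum(should_block(s) for s in broken_scores)
```

Under these assumptions, none of the five visitors is blocked in normal operation, while all five are blocked once the scores collapse to the default.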
Fadl Mantash, Chief Information Security Officer at Tribe Payments, said: “Today’s Cloudflare outage shows how vulnerable the digital economy has become. When a single upstream provider experiences issues, the impact doesn’t stay contained; it cascades across industries, touching everything from social media platforms to e-commerce checkouts and backend payment services.
“Payments are particularly exposed. The infrastructure behind a single transaction relies on a chain of cloud platforms, processors, third-party APIs, authentication tools, and card schemes. When any link in that chain fails, the entire journey can break. It’s the same pattern we saw during last year’s CrowdStrike incident: the initial issue wasn’t in payments, yet payments were among the most visible casualties.
“This is exactly why resilience can’t start at the moment of crisis. The payments industry needs to adopt the ‘prepper’ mindset – building modular systems that isolate faults, rehearsing failure scenarios, and ensuring teams know precisely how to respond when something goes down. This also reflects the importance of adhering to robust frameworks in our day-to-day activities. As a highly regulated industry, the many compliance frameworks provide critical guarantees that cover not just security, but also resilience against incidents.
“Resilience is one side of the foundational information security triad: confidentiality, integrity, and availability. Companies need to make all three principles their ‘bread and butter’. By ensuring the confidentiality of sensitive data, the integrity of transactions, and the availability of services even during disruptions, we can build a more secure and trustworthy financial ecosystem.”
How Was The Problem Resolved?
At 13:05 UTC, Cloudflare reduced the impact by implementing a bypass for Workers KV and Access. Teams then focused on restoring a known good version of the Bot Management configuration file. By 14:30 UTC, the file was deployed globally, and most services returned to normal.
The remaining errors were addressed as services restarted and traffic flows stabilised. Cloudflare confirmed that all systems were functioning normally by 17:06 UTC.
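The recovery step described above, falling back to a known good file, can be sketched in Python. This is an illustration under stated assumptions, not Cloudflare's implementation: `ConfigStore`, `within_limit`, and the cap of 200 are hypothetical names and values, and the fallback here happens automatically at deploy time.

```python
class ConfigStore:
    """Keeps the active config and the last version that passed validation."""

    def __init__(self, validate):
        self._validate = validate
        self.active = None
        self.last_known_good = None

    def deploy(self, candidate):
        """Activate candidate if valid; otherwise keep the last good one."""
        if self._validate(candidate):
            self.active = candidate
            self.last_known_good = candidate
            return True
        # Validation failed: fall back rather than serving a bad file.
        self.active = self.last_known_good
        return False


def within_limit(features, limit=200):
    """Hypothetical check: non-empty and no larger than the cap."""
    return 0 < len(features) <= limit


store = ConfigStore(within_limit)
good = [f"feature_{i}" for i in range(120)]
bad = good + good  # duplicated rows, 240 entries

first = store.deploy(good)   # accepted and remembered as known good
second = store.deploy(bad)   # rejected; the store keeps serving `good`
```

The design choice worth noting is that a rejected file never becomes active, so a bad update degrades to stale data instead of errors.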
What Lessons Is Cloudflare Taking?
This was Cloudflare’s worst outage since 2019, affecting the majority of its core traffic. Teams are now working on ways to harden systems against future failures. Measures include better handling of configuration files, adding global kill switches for features, and reviewing error conditions across core proxy modules.
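As a rough idea of what a global kill switch for a feature could look like, here is a minimal Python sketch. Everything in it is hypothetical (the `KillSwitches` registry, the `bot_management` feature name, and the request handler); it only illustrates the principle that a faulty feature can be switched off so traffic keeps flowing without it.

```python
class KillSwitches:
    """In-memory registry of feature kill switches (illustrative only)."""

    def __init__(self):
        self._disabled = set()

    def kill(self, feature):
        self._disabled.add(feature)

    def restore(self, feature):
        self._disabled.discard(feature)

    def is_enabled(self, feature):
        return feature not in self._disabled


def handle_request(path, switches, score_fn):
    """Serve a request, skipping bot scoring when its switch is off."""
    if switches.is_enabled("bot_management"):
        return {"path": path, "status": 200, "bot_score": score_fn(path)}
    # Feature switched off globally: serve traffic without a score.
    return {"path": path, "status": 200, "bot_score": None}


def failing_scorer(path):
    """Stand-in for a scoring module whose feature file cannot be loaded."""
    raise RuntimeError("feature file could not be loaded")


switches = KillSwitches()
switches.kill("bot_management")
response = handle_request("/checkout", switches, failing_scorer)
```

With the switch off, the broken scorer is never called and the request is served without a bot score.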
The company apologised to customers and the wider Internet community for the disruption caused by the outage.
Rob Demain, CEO at e2e-assure, spoke on the outage: “Cloudflare provides a number of critical website availability and cyber security services as part of its shield that organisations rely on and can also act as an alternative to VPNs, so many organisations use it for ‘secure remote access’ and zero trust as well as protecting their websites. When it goes down, the impact is immediate and widespread.
“It’s technically very difficult to add a ‘circuit breaker’ due to the way these services work, e.g. a bypass would drop the security that they rely on and workarounds are undesirable. If Cloudflare is unreachable, those websites and services that rely on it are then faced with users being unable to connect to their underlying web servers. Outages like this typically stem from one of three things: DNS issues, BGP routing problems, or a configuration change gone wrong.
“Cloudflare is designed to ensure business continuity, yet outages like this result in quite the opposite with no backup or alternative when things go wrong. These systems are architected with strict uptime guarantees and are never supposed to go offline, but in reality, this is not the case. It’s not very easy to have two content delivery network providers, though organisations may now look into this.
“Whilst Cloudflare is aware and investigating, what is likely an enormous global traffic backlog will be building, so we could be waiting a while for things to fully recover. Given that Cloudflare provides a DNS service offered by NCSC (P-DNS), that can be used as a ‘secure DNS’ feed that filters out known bad websites, providing a valuable security function by blocking bad websites, the wider impact could be hugely significant.
“Cloudflare is a large U.S. company and when issues like this occur it highlights how dependent the U.K. is on U.S. cloud providers, who offer services for economic prosperity and cyber security. This is a reminder of how fragile our digital systems can be and how much we rely on just a few key players to keep the internet running smoothly.”
Jano Bermudes, Chief Operations Officer at global cybersecurity consultancy CyXcel, also shared comments: “Today’s Cloudflare outage, and the resulting internet disruption, underscores just how dependent businesses are on cloud infrastructure providers and the inherent risk of single points of failure in cloud-based systems. However, unlike the relatively prolonged AWS outage, Cloudflare resolved the issue swiftly, demonstrating strong preparedness and effective client communications.
“This incident is a stark reminder of the dangers of centralisation. Organisations must go beyond basic resilience measures and rethink dependency models by adopting multi-region architectures, robust failover strategies, and comprehensive contingency planning. Resilience isn’t just a technology challenge, it’s a governance, risk management, and operational continuity imperative.
“Multi-cloud strategies can help reduce reliance on a single provider and mitigate systemic risk. However, they introduce complexity and demand careful planning. While multi-cloud is not a silver bullet, when implemented with clear governance and interoperability standards, it can significantly strengthen resilience without adding unnecessary risk.
“Business continuity planning should be a priority. This includes automated failover systems, distributed architectures, and well-defined incident response protocols. Regular testing of these measures ensures critical operations can continue even during major outages. Ultimately, resilience comes from preparation, not reaction.”
How Were Startups And Businesses Affected?
Forrester principal analyst Brent Ellis commented on the impact this outage would have had on businesses and how it highlights the issue of concentration risk: “The Cloudflare outage is not explicitly caused by or linked to the AWS or Azure outages last month, but like those failures, it shows the impact of concentration risk. In this case, the 3-hour, 20-minute outage could have direct and indirect losses of around $250 million to $300 million when you consider the cost of downtime and the downstream effects of services like Shopify or Etsy that host the stores for tens to hundreds of thousands of businesses.
“Being resilient from failures like this means learning what type of outages that service provider might be vulnerable to and then architecting failover measures. Sadly, resilience isn’t free and businesses will need to decide if they want to make the investment in alternative service providers and failover solutions. Some industries, like financial services, must already address these concerns as part of regulation. Given the high profile of cloud related outages recently, I expect operational resilience regulation might spread outside the financial sector.”
Eileen Haggery, AVP at NETSCOUT, also commented, saying: “In the wake of several major internet outages in recent weeks, today’s Cloudflare outage reveals the relative fragility of the underlying technology that connects us. Modern networks are more distributed, complex, and reliant on third-party services than ever, making it difficult to identify issues and restore services without the right visibility. Unfortunately, disruptions can and do happen to all types of organisations, including the world’s best providers, with the best technology and systems designed and architected to state-of-the-art levels.
“In the wake of a major network outage, organisations may pause, take stock of the business impact, and evaluate their own networks to determine how they can prevent, avoid, or rapidly respond to a similar situation. Organisations can’t stop things from breaking in global service provider environments, but they can build resilience into their own environment and processes. Recent outages have highlighted the need for incident readiness processes that, much like fire drills, require regular practice, rehearsal, and refinement. True observability, which helps understand not just what is broken but why and where, is essential to greater resiliency. This helps organisations understand who to call and what to expect from vendors to limit the impact of outages.”
The following startups reported to us how exactly the outage affected their companies…
Argentum AI
“When Cloudflare goes down, the entire internet feels it. Yesterday’s outage impacting banks, airlines, e-commerce, SaaS tools, and countless enterprise workflows is another reminder of a larger systemic weakness:
“Our digital world is over-centralized. A single network chokepoint can disrupt global commerce in minutes.
“Cloudflare operates as one of the internet’s backbones. When it fails, large chunks of the modern economy become unreachable. This isn’t a Cloudflare problem; it’s a centralization problem.
“It’s a preview of the risks ahead.
“Global AI infrastructure cannot depend on a few centralized chokepoints.
“Argentum AI was built on the opposite model: a decentralized marketplace of compute with no single choke point, no single vendor dependency, and no single network whose failure can take the ecosystem down.”
JobLeads
“We definitely had plenty of disruptions of our work today, in pretty much every department. All our LLM tools like Claude, Perplexity, and ChatGPT became slow or unreachable, which stalled our AI workflows for our developers, data analysts, and marketers who use APIs of these tools.
“At the same time, Zoom calls began dropping or failing to connect across the company, and parts of Microsoft 365, our main toolset, crashed. This included Power BI where we store and track the most important data that is critical to update as fast as possible.”