Scaling Security Insights: A 10x Increase in Global Scanning Capacity

12 June 2026 by
TechStora

Introduction to Security Insights

Security Insights provides actionable security recommendations for every Cloudflare account, using regular scans to identify security risks and misconfigurations. We run automated scans of accounts, zones, and DNS records and analyze the results to surface insights for our customers. Two problems emerged, however: scans ran infrequently, and scanning was opt-in for many free plan accounts. As a result, some security risks went undetected and some accounts were never scanned.

The risks of infrequent or nonexistent scans are rising as automated attacks accelerate, making it crucial to detect security misconfigurations in a timely manner. To address this, we aimed to increase scanning frequencies and enable automatic scanning for all accounts, requiring a 10x increase in scanning throughput, from 10 scans per second to 100 per second.
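To make the throughput target concrete, the sketch below computes how long one full pass over a set of scan targets takes at each rate. The target count is a hypothetical round number chosen for illustration, not a figure from this article, and fullPassDuration is an illustrative helper name.

```go
package main

import (
	"fmt"
	"time"
)

// fullPassDuration returns how long one complete pass over `targets`
// scan targets takes at a sustained rate of `scansPerSecond`.
func fullPassDuration(targets int, scansPerSecond float64) time.Duration {
	seconds := float64(targets) / scansPerSecond
	return time.Duration(seconds * float64(time.Second))
}

func main() {
	// Hypothetical number of scan targets, for illustration only.
	const targets = 10_000_000

	for _, rate := range []float64{10, 100} {
		fmt.Printf("%3.0f scans/s: one full pass takes %v\n",
			rate, fullPassDuration(targets, rate))
	}
}
```

Under that assumed target count, a full pass takes about 11.6 days at 10 scans per second and about 27.8 hours at 100 scans per second, which is why the scan rate directly bounds how often each account can be rescanned.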

System Challenges and Limitations

Our system was already struggling under its existing load: millions of events sat in the backlog waiting to be processed, the API frequently timed out, and processes crashed. Before we could raise scanning throughput, we had to fix the system and then scale it.

To do that, we first had to understand where the bottlenecks were. We examined the architecture to find the components that limited throughput and the places where it could be made to scale, so that the system could sustain the higher scanning frequency.

Scanning Process and Architecture

At a high level, our automatic security scans are triggered by a scheduler, which publishes messages to Apache Kafka, an open-source distributed event streaming platform. These messages fan out to a number of checkers: specialized Go applications that each perform a specific security check and analyze the results. The checkers publish their findings to Kafka topics, which our processing pipeline consumes, analyzing the results and storing them in our database.
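The fan-out shape described above can be sketched in a few lines of Go. This is a minimal in-process illustration, not the production code: Go channels stand in for Kafka topics, and every type, function, and check name here (ScanRequest, Finding, Checker, runPipeline, flagZones) is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ScanRequest stands in for a message published by the scheduler.
type ScanRequest struct{ Zone string }

// Finding stands in for a result message published by a checker.
type Finding struct{ Checker, Zone, Issue string }

// Checker is a hypothetical wrapper around one security check.
type Checker struct {
	Name  string
	Check func(ScanRequest) (issue string, found bool)
}

// runPipeline delivers every request to every checker (fan-out) and
// collects all findings, sorted by zone and then by checker name.
func runPipeline(requests []ScanRequest, checkers []Checker) []Finding {
	findings := make(chan Finding) // stands in for the findings topic
	inputs := make([]chan ScanRequest, len(checkers))
	var wg sync.WaitGroup

	// Each checker consumes from its own input channel.
	for i, c := range checkers {
		inputs[i] = make(chan ScanRequest)
		wg.Add(1)
		go func(c Checker, in <-chan ScanRequest) {
			defer wg.Done()
			for req := range in {
				if issue, found := c.Check(req); found {
					findings <- Finding{Checker: c.Name, Zone: req.Zone, Issue: issue}
				}
			}
		}(c, inputs[i])
	}

	// Scheduler: publish each request to all checkers, then close the inputs.
	go func() {
		for _, req := range requests {
			for _, in := range inputs {
				in <- req
			}
		}
		for _, in := range inputs {
			close(in)
		}
	}()

	// Close the findings channel once every checker has finished.
	go func() { wg.Wait(); close(findings) }()

	// Processing pipeline: consume findings until the channel closes.
	var out []Finding
	for f := range findings {
		out = append(out, f)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Zone != out[j].Zone {
			return out[i].Zone < out[j].Zone
		}
		return out[i].Checker < out[j].Checker
	})
	return out
}

// flagZones builds a toy checker that reports `issue` for the listed zones.
func flagZones(name, issue string, zones ...string) Checker {
	flagged := make(map[string]bool)
	for _, z := range zones {
		flagged[z] = true
	}
	return Checker{Name: name, Check: func(r ScanRequest) (string, bool) {
		return issue, flagged[r.Zone]
	}}
}

func main() {
	requests := []ScanRequest{{Zone: "a.example"}, {Zone: "b.example"}}
	checkers := []Checker{
		flagZones("check-one", "hypothetical issue 1", "a.example"),
		flagZones("check-two", "hypothetical issue 2", "a.example", "b.example"),
	}
	for _, f := range runPipeline(requests, checkers) {
		fmt.Printf("%s: %s reported %q\n", f.Zone, f.Checker, f.Issue)
	}
}
```

In a real deployment the checkers are separate applications and Kafka provides the durability and backlog that plain channels lack. The sketch only shows the message flow: one request reaches every checker, and all findings converge on a single consumer.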

Performance Optimization and Scalability

Reaching the required 10x increase in scanning throughput meant addressing those bottlenecks, improving the architecture, and scaling the infrastructure so that both the security scanners and the processing pipeline could sustain the higher load.

Results and Achievements

Through this work, we increased our scanning throughput by more than 10x, enabling Security Insights for millions of customers and doubling the scanning frequency for all customers. The system now handles the increased load and continues to deliver actionable security recommendations, helping to build a better Internet for everyone.