CrowdStrike Blames Quality Control Bug for Update That Caused Global Windows Outage
A cyber security company has vowed to enhance its error-handling processes and improve future updates.
Cyber security firm CrowdStrike has blamed a major global IT outage on a bug in its quality control software, which allowed an update containing faulty data to be sent to millions of computers running Microsoft Windows.
CrowdStrike regularly deploys configuration updates for its Falcon Sensor product, which is a software suite designed to monitor and safeguard computers against threats.
These updates are distributed in two ways. The first, called “Sensor Content,” updates the Falcon Sensor directly; the sensor itself runs with a high level of access to system resources. The second, “Rapid Response Content,” updates how the sensor detects malware, allowing it to adapt quickly to evolving threats.
However, a faulty file made its way into a Rapid Response Content update released on the morning of July 19 due to a flaw in CrowdStrike’s quality control software.
CrowdStrike’s review of the incident found that although the company runs both automated and manual tests on Sensor Content, it relied too heavily on its Content Validator to check Rapid Response Content, because that tool had previously worked without issues.
Because the Rapid Response Content update was assumed to be safe, the Falcon Sensor loaded the faulty content. This triggered an out-of-bounds memory read, an error that occurs when a program tries to read data outside the memory it is permitted to access. According to the review, the resulting exception crashed the Windows operating system.
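To illustrate what an out-of-bounds read is and how a bounds check turns it into a recoverable error, here is a minimal Python sketch. It is not CrowdStrike’s code: `read_field`, `apply_rule` and `ContentError` are hypothetical names, and the idea of a “rule” that asks for numbered fields is an assumption made only for the example.

```python
from typing import List, Optional, Sequence


class ContentError(Exception):
    """Raised when content asks for a field that was never supplied (hypothetical)."""


def read_field(values: Sequence[str], index: int) -> str:
    # Explicit bounds check. Python would raise IndexError on its own, but in
    # languages such as C, which do not check array bounds, omitting this test
    # lets the program read memory beyond the end of the array.
    if not 0 <= index < len(values):
        raise ContentError(f"field {index} requested, but only {len(values)} supplied")
    return values[index]


def apply_rule(values: Sequence[str], wanted: Sequence[int]) -> Optional[List[str]]:
    # Treat bad content as a recoverable error: skip the rule instead of crashing.
    try:
        return [read_field(values, i) for i in wanted]
    except ContentError as err:
        print(f"rule rejected: {err}")
        return None
```

For example, `apply_rule(["a", "b", "c"], [0, 2])` returns `["a", "c"]`, while `apply_rule(["a", "b", "c"], [0, 3])` prints “rule rejected: field 3 requested, but only 3 supplied” and returns `None` rather than taking the whole program down.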
Following the incident, the share price of CrowdStrike, which is headquartered in Austin, Texas, fell sharply. The company has promised to improve the process it uses to issue critical content updates.
Specifically, CrowdStrike plans to introduce a “staggered deployment strategy” for future updates, releasing them first to a small number of machines before rolling them out globally, an approach known in the industry as a “canary deployment.”
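The general idea of a staggered, canary-first rollout can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CrowdStrike’s deployment system: the stage fractions, the zero-failure threshold and the `deploy` callback (which stands in for pushing the update to a host and reporting whether the host stayed healthy) are all hypothetical.

```python
from typing import Callable, List, Sequence


def staggered_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], bool],
    stages: Sequence[float] = (0.01, 0.1, 0.5, 1.0),  # assumed stage sizes
    max_failure_rate: float = 0.0,  # assumed: halt on any unhealthy host
) -> List[str]:
    """Deploy to growing fractions of hosts and stop if a stage looks unhealthy.

    Returns the hosts that received the update (hypothetical sketch).
    """
    updated: List[str] = []
    for fraction in stages:
        # Number of hosts that should have the update once this stage is done;
        # the first (canary) stage always includes at least one host.
        target = max(1, int(len(hosts) * fraction))
        batch = list(hosts[len(updated):target])
        failures = sum(1 for host in batch if not deploy(host))
        updated.extend(batch)
        if batch and failures / len(batch) > max_failure_rate:
            print(f"halting at {fraction:.0%}: {failures}/{len(batch)} hosts unhealthy")
            break
    return updated
```

With 100 hosts and a healthy update, all 100 receive it in four waves; with an update that crashes every machine, the rollout halts after the single canary host. Keeping the first stage small is the point of the design: a faulty update is caught while it affects only a handful of machines.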
The company also pledged to improve error handling in the Content Interpreter, introduce human testing for Rapid Response Content, add further validation checks to the Content Validator, and give customers control over when and where these updates are deployed.
In a statement following the outages, George Kurtz, the company’s founder and CEO, emphasized, “Nothing is more important to me than the trust and confidence that our customers and partners have put into CrowdStrike. As we resolve this incident, you have my commitment to provide full transparency on how this occurred and steps we’re taking to prevent anything like this from happening again.”