CrowdStrike today released its Root Cause Analysis (RCA) of the faulty software update that crippled 8.5 million Windows machines on July 19, and also outlined changes it will make in the wake of the devastating outage.
The 12-page CrowdStrike Root Cause Analysis report provides a deeper explanation than CrowdStrike’s Preliminary Post-Incident Review (PIR), which was released five days after the massive global outage. The outage could lead to $15 billion in largely uncovered losses for CrowdStrike customers. It has also led to shareholder and customer legal action, and to threats and counterthreats between CrowdStrike and Delta Air Lines over the airline’s lengthy recovery, a dispute that took yet another turn today when Microsoft joined the fray.
CrowdStrike Root Cause Analysis Details Extra Input Parameter Field
One interesting new revelation in the root cause report is that the initial cause of the error occurred back in February when CrowdStrike released sensor version 7.11, which included a new Template Type for Windows interprocess communication (IPC) mechanisms. IPC Template Instances are delivered as Rapid Response Content to sensors via a corresponding Channel File numbered 291.
The new IPC Template Type defined 21 input parameter fields, but the integration code that invoked the Content Interpreter with Channel File 291’s Template Instances supplied only 20 input values to match against. The parameter count mismatch “evaded multiple layers of build validation and testing,” CrowdStrike said, due in part to the use of wildcard matching criteria for the 21st input during testing and in the initial IPC Template Instances.
On July 19, two additional IPC Template Instances were deployed, one of which introduced a non-wildcard matching criterion for the 21st input parameter.
“These new Template Instances resulted in a new version of Channel File 291 that would now require the sensor to inspect the 21st input parameter,” CrowdStrike said. “Until this channel file was delivered to sensors, no IPC Template Instances in previous channel versions had made use of the 21st input parameter field. The Content Validator evaluated the new Template Instances, but based its assessment on the expectation that the IPC Template Type would be provided with 21 inputs.
“Sensors that received the new version of Channel File 291 carrying the problematic content were exposed to a latent out-of-bounds read issue in the Content Interpreter. At the next IPC notification from the operating system, the new IPC Template Instances were evaluated, specifying a comparison against the 21st input value. The Content Interpreter expected only 20 values. Therefore, the attempt to access the 21st value produced an out-of-bounds memory read beyond the end of the input data array and resulted in a system crash.”
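The failure mode described above can be illustrated with a minimal Python sketch. It is not CrowdStrike's code: the names `matches`, `WILDCARD`, and the sample values are hypothetical, and Python's `IndexError` stands in for the out-of-bounds memory read that crashed the system in kernel mode. It only shows why a wildcard in the 21st field hides a 21-versus-20 mismatch, while a concrete criterion exposes it.

```python
WILDCARD = "*"          # hypothetical marker for "match anything"
DEFINED_FIELDS = 21     # fields declared by the Template Type
SUPPLIED_INPUTS = 20    # values the integration code actually passed


def matches(criteria, inputs):
    """Return True if every non-wildcard criterion equals its input value."""
    for index, criterion in enumerate(criteria):
        if criterion == WILDCARD:
            continue  # wildcard: the input at this index is never read
        if inputs[index] != criterion:  # no bounds check before the read
            return False
    return True


inputs = [f"value{i}" for i in range(SUPPLIED_INPUTS)]

# Early instances: wildcard in the 21st field, so the gap stays hidden.
early_instance = ["value0"] + [WILDCARD] * (DEFINED_FIELDS - 1)
early_result = matches(early_instance, inputs)

# A concrete criterion in the 21st field forces a read past the end.
new_instance = [WILDCARD] * (DEFINED_FIELDS - 1) + ["pipe_name"]
try:
    matches(new_instance, inputs)
    crashed = False
except IndexError:
    crashed = True
```

In this sketch the early instance evaluates without error, because index 20 is skipped as a wildcard; the later instance raises as soon as the interpreter tries to read a value that was never supplied.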
CrowdStrike pledged a half-dozen changes in the wake of the global outage:
Validating the number of input fields in the Template Type at sensor compile time
Adding the runtime array bounds check that was missing for Content Interpreter input fields on Channel File 291
Template Type testing covering a wider variety of matching criteria
Template Instance validation expanding to include testing within the Content Interpreter
Staged deployment for template instances, including customer control over rollout
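Two of the changes listed above, checking field counts before content ships and bounds-checking at runtime, can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the company's implementation: `validate_instance`, `safe_matches`, and the sample data are hypothetical names introduced here.

```python
WILDCARD = "*"  # hypothetical marker for "match anything"


def validate_instance(criteria, supplied_input_count):
    """Reject an instance that references a field the sensor never supplies."""
    for index, criterion in enumerate(criteria):
        if criterion != WILDCARD and index >= supplied_input_count:
            return False
    return True


def safe_matches(criteria, inputs):
    """Match criteria against inputs, treating a missing input as a non-match."""
    for index, criterion in enumerate(criteria):
        if criterion == WILDCARD:
            continue
        if index >= len(inputs):  # bounds check before every read
            return False
        if inputs[index] != criterion:
            return False
    return True


inputs = [f"value{i}" for i in range(20)]
bad_instance = [WILDCARD] * 20 + ["pipe_name"]   # uses the 21st field
good_instance = ["value0"] + [WILDCARD] * 20     # 21st field is a wildcard

bad_is_valid = validate_instance(bad_instance, len(inputs))
good_is_valid = validate_instance(good_instance, len(inputs))
bad_match = safe_matches(bad_instance, inputs)
good_match = safe_matches(good_instance, inputs)
```

The two checks are deliberately redundant: the validator stops a mismatched instance before it is delivered, and the bounds check ensures that content which slips through produces a non-match instead of an invalid read.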
Windows Kernel Driver Usage Addressed
CrowdStrike also noted that it moves kernel driver functions to less-sensitive user space as those capabilities evolve.
“As new versions of Windows introduce support for performing more of these security functions in user space, CrowdStrike updates its agent to utilize this support,” the company said. “Significant work remains for the Windows ecosystem to support a robust security product that doesn’t rely on a kernel driver for at least some of its functionality. We are committed to working directly with Microsoft on an ongoing basis as Windows continues to add more support for security product needs in user space.”
Kurtz Apologizes as Microsoft Enters Delta Battle
CrowdStrike also released a statement from CEO and founder George Kurtz on the outage remediation page in conjunction with the report’s release.
“We are deeply sorry for the impact this had on you,” says the statement from Kurtz. “Nothing is more important than regaining your trust and confidence. Since our founding, we have always put customer protection at the forefront. This has been our North Star, and it continues to be our focus every single day.”
But before the incident is completely behind the company, a lengthy legal battle may yet play out.
Microsoft entered the fray today, saying that Delta’s longer outage than its peers appeared to be due to non-Microsoft systems. “In fact, it is rapidly becoming apparent that Delta likely refused Microsoft’s help because the IT systems it was most having trouble restoring – its crew-tracking and scheduling system – was being serviced by other technology providers, such as IBM, because it runs on those providers’ systems, and not Microsoft Windows or Azure,” attorney Mark Cheffo wrote to Delta’s lawyers on behalf of Microsoft.
“Microsoft empathizes with Delta and its customers regarding the impact of the CrowdStrike incident. But your letter and Delta’s public comments are incomplete, false, misleading, and damaging to Microsoft and its reputation,” Cheffo said, noting the company will “vigorously defend itself in any litigation if Delta chooses to pursue that path.”