AI Finds What Humans Missed: OpenAI’s o3 Spots Linux Zero-Day

A zero-day vulnerability in the Linux kernel’s SMB (Server Message Block) server implementation, tracked as CVE-2025-37899, was discovered with the help of OpenAI’s o3 language model. The vulnerability is a use-after-free flaw in the logoff command handler of the ksmbd kernel module.

Security researcher Sean H. documented the process in a detailed technical blog post. He had set out to audit ksmbd, the Linux kernel module that implements an SMB3 server. Although he had intended to take a break from large language model (LLM) tools, curiosity led him to benchmark o3, a new model from OpenAI.

Rather than using complex frameworks or automation tools, Sean used only the o3 API to analyze targeted sections of code. Through this process, o3 uncovered CVE-2025-37899, a zero-day vulnerability in the Linux kernel. The model identified a scenario in which an object shared between concurrent server connections is accessed after it has been freed: a use-after-free in the SMB LOGOFF command handler.

Technical Breakdown of CVE-2025-37899

The issue arises when one thread processing an SMB2 LOGOFF request frees the sess->user object while another thread may still be using it. Because no synchronization protects this access, the second thread can dereference freed memory, which opens the door to kernel memory corruption or arbitrary code execution.

The vulnerability stems from a subtle interaction between SMB session handling and Linux kernel memory management:

Multiple connections may bind to the same SMB session.
One thread (Worker-B) handling a LOGOFF request frees the session’s user object (ksmbd_free_user(sess->user)).
Another thread (Worker-A), still processing requests on the same session, continues to access sess->user, which now points to freed memory.

Depending on timing, the result is either a classic use-after-free or a NULL pointer dereference. Either can crash the system, and a use-after-free can potentially be leveraged for privilege escalation.
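To make the missing synchronization concrete, here is a minimal userspace sketch in C of one generic way to manage a shared object’s lifetime across threads: a lock guards the pointer, and a reference count keeps the object alive while any request is still using it. The struct user and struct session types and every function name are hypothetical stand-ins; this is not ksmbd’s code or the upstream patch for CVE-2025-37899. It assumes POSIX threads and C11 atomics.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT ksmbd's real structures. */
struct user {
    atomic_int refcount;   /* 1 for the session + 1 per in-flight request */
    char name[16];
};

struct session {
    pthread_mutex_t lock;  /* guards the user pointer itself */
    struct user *user;
};

static atomic_int users_freed; /* lets the demo count how often the user is freed */

/* Drop one reference; whoever drops the last one frees the object. */
static void user_put(struct user *u)
{
    if (atomic_fetch_sub(&u->refcount, 1) == 1) {
        free(u);
        atomic_fetch_add(&users_freed, 1);
    }
}

/* Load the pointer and take a reference while holding the lock. */
static struct user *session_get_user(struct session *s)
{
    pthread_mutex_lock(&s->lock);
    struct user *u = s->user;
    if (u)
        atomic_fetch_add(&u->refcount, 1);
    pthread_mutex_unlock(&s->lock);
    return u;
}

/* Plays "Worker-A": keeps serving requests on the shared session. */
static void *request_worker(void *arg)
{
    struct session *s = arg;
    for (int i = 0; i < 100000; i++) {
        struct user *u = session_get_user(s);
        if (!u)
            break;                     /* logged off: stop instead of touching freed memory */
        volatile char c = u->name[0];  /* safe: our reference keeps u alive */
        (void)c;
        user_put(u);
    }
    return NULL;
}

/* Plays "Worker-B": LOGOFF detaches the pointer, then drops the session's reference. */
static void session_logoff(struct session *s)
{
    pthread_mutex_lock(&s->lock);
    struct user *u = s->user;
    s->user = NULL;
    pthread_mutex_unlock(&s->lock);
    if (u)
        user_put(u);
}

/* Runs two request workers against a LOGOFF; returns how many times the user was freed. */
static int run_demo(void)
{
    struct session s;
    pthread_t a1, a2;

    pthread_mutex_init(&s.lock, NULL);
    s.user = calloc(1, sizeof *s.user);
    assert(s.user != NULL);
    atomic_init(&s.user->refcount, 1);
    strcpy(s.user->name, "alice");
    atomic_store(&users_freed, 0);

    pthread_create(&a1, NULL, request_worker, &s);
    pthread_create(&a2, NULL, request_worker, &s);
    session_logoff(&s);
    session_logoff(&s); /* a repeated LOGOFF is a harmless no-op */
    pthread_join(a1, NULL);
    pthread_join(a2, NULL);
    pthread_mutex_destroy(&s.lock);
    return atomic_load(&users_freed);
}
```

In this sketch the lock matters as much as the counter: loading sess->user and taking a reference must be atomic with respect to LOGOFF clearing the pointer, otherwise a worker could read the pointer just before the last reference is dropped. Whatever the interleaving, run_demo() should free the user exactly once.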

Comparative Performance: o3 vs. Other Models

Interestingly, o3 also rediscovered CVE-2025-37778, another use-after-free vulnerability that Sean had previously found manually. This bug resides in the Kerberos authentication path during SMB session setup. o3 detected it in 8 out of 100 runs, while Anthropic’s Claude Sonnet 3.7 managed only 3 detections in 100 tries, and Claude 3.5 failed to detect it at all.

These results reflect both the promise and the current limitations of AI-assisted vulnerability research. o3 showed notable capability but also produced a high number of false positives: 28 of its 100 runs. With 8 true positives against 28 false positives, the true-positive to false-positive ratio is roughly 1:3.5 (about one genuine finding for every 4.5 bug reports), which is still good enough for the model to merit serious consideration in practical workflows.
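The ratios follow directly from the two counts given in this article: 28 false positives divided by 8 true positives gives 3.5 false alarms per real finding, and the 36 total bug reports divided by 8 gives 4.5 reports to review per real bug. The small C helpers below (hypothetical names, written only for this illustration) make that arithmetic explicit.

```c
#include <assert.h>

/* Counts reported in this article for o3 on the CVE-2025-37778 benchmark (100 runs). */
#define O3_TRUE_POSITIVES  8
#define O3_FALSE_POSITIVES 28

/* False positives per true positive: the N in a "1:N" TP:FP ratio. */
static double fp_per_tp(int tp, int fp)
{
    return (double)fp / tp;
}

/* Bug reports a reviewer must read per genuine finding. */
static double reports_per_tp(int tp, int fp)
{
    return (double)(tp + fp) / tp;
}

/* Share of bug reports that are genuine (precision). */
static double precision(int tp, int fp)
{
    return (double)tp / (tp + fp);
}
```

With the article’s counts, fp_per_tp returns 3.5, reports_per_tp returns 4.5, and precision is 8/36, a little over 22%.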

Lessons from o3’s Analysis

One of the most insightful takeaways from o3’s analysis of CVE-2025-37899 was its understanding of concurrency in kernel operations. The model successfully reasoned through non-trivial control flow paths and object lifecycle management under concurrent execution—something even experienced researchers may overlook, especially under time pressure.

Even more compelling, o3 sometimes offered better remediation advice than the human researcher. For example, when addressing CVE-2025-37778, Sean had initially suggested setting sess->user = NULL after freeing it. o3, however, pointed out that such a fix might be insufficient, because the SMB protocol allows multiple connections to bind to the same session.
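The single-threaded replay below illustrates why nulling the pointer alone may not close the window, under the assumption that a worker on another connection bound to the same session has already loaded sess->user before LOGOFF runs. To avoid real undefined behavior, freeing is mocked with a flag; the types and names are hypothetical and this is not ksmbd code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mock: "freeing" only sets a flag, so the replay never touches real freed memory. */
struct user {
    bool freed;
};

struct session {
    struct user *user;
};

static void mock_free_user(struct user *u)
{
    u->freed = true;
}

/*
 * Replays one interleaving step by step on a single thread.
 * Returns true if Worker-A ends up using an object that has been "freed".
 */
static bool replay_logoff_race(bool null_after_free)
{
    struct user u = { .freed = false };
    struct session sess = { .user = &u };

    /* Worker-A (connection 1): checks the pointer and keeps a local copy. */
    struct user *a_user = sess.user;
    if (a_user == NULL)
        return false;

    /* Worker-B (connection 2, bound to the same session): LOGOFF frees the user. */
    mock_free_user(sess.user);
    if (null_after_free)
        sess.user = NULL; /* the proposed fix: clears only the shared pointer */

    /* Worker-A resumes with the copy it loaded before the NULL store. */
    return a_user->freed;
}
```

Both replay_logoff_race(false) and replay_logoff_race(true) report a stale access: in this model the NULL store only protects threads that read sess->user afterwards, not one that already holds a copy.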

Conclusion

Large language models are not yet a replacement for expert analysts. Still, o3’s success in identifying complex flaws highlights its ability to augment human expertise, streamline analysis, and extend the reach of automated security tools. Although the experiment revealed limitations in processing large codebases, it also showed the model’s effectiveness in targeted scans and the importance of developing tools that manage false positives and structure input intelligently.

