Can AI Agents Change How We Detect Malware?

Microsoft has introduced an autonomous agent called Project Ire, and it has already managed something rare in cybersecurity: it detected an active hacking group by reverse engineering files it had never seen before.

The agent pairs large language models with specialised cybersecurity tools to classify malware. The idea is to lessen the load on human analysts, who often face hundreds of files each day. In trials, Project Ire was tested against known malicious samples and harmless Windows drivers.

It correctly classified 90% of the files and recorded a false positive rate of only 2%. One file it caught was a kernel-level rootkit, flagged for suspicious capabilities such as process termination and a command-and-control channel.
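Microsoft has not detailed how the agent weighs such indicators. As a loose illustration only, a static scan for suspicious imports in a Windows driver could be sketched with the open source pefile library; the indicator list and file name below are hypothetical.

```python
import pefile

# Hypothetical indicators: kernel APIs often associated with process
# termination and networking (a possible command-and-control channel).
SUSPICIOUS_IMPORTS = {
    b"ZwTerminateProcess": "process termination",
    b"PsTerminateSystemThread": "thread termination",
    b"Wsk": "Winsock Kernel networking (possible C2 channel)",
}

def scan_imports(path: str) -> list[str]:
    """Return human-readable findings for suspicious imported symbols."""
    pe = pefile.PE(path, fast_load=True)
    pe.parse_data_directories()
    findings = []
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        for imp in entry.imports:
            name = imp.name or b""
            for marker, behaviour in SUSPICIOUS_IMPORTS.items():
                if marker in name:
                    findings.append(f"{name.decode()}: {behaviour}")
    return findings

print(scan_imports("suspect_driver.sys"))  # hypothetical sample path
```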

Researchers at Microsoft described this ability to blindly reverse engineer files as the “gold standard in malware classification.” It is also the first time a reverse engineering system at Microsoft has built a strong enough case against an advanced persistent threat to justify automatically blocking it in Microsoft Defender.

How Did Project Ire Perform In Larger Tests?

In bigger experiments, Project Ire was given 4,000 files that Microsoft’s automated systems could not classify, files that would normally fall to human reverse engineers. The new agent achieved a precision of 0.89, meaning 89% of the files it marked as malicious were genuine threats. Its recall was 0.26, meaning it caught about 26% of all the malware in the set.
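For readers unfamiliar with the two metrics, here is a short worked example with invented counts chosen to reproduce the reported scores:

```python
# Invented counts, purely to illustrate how the two metrics are computed.
# Suppose the agent flagged 100 files as malicious, 89 of which really were,
# while the 4,000-file set contained 342 malicious files in total.
true_positives = 89
false_positives = 11   # flagged as malicious but actually benign
false_negatives = 253  # malicious but never flagged (342 - 89)

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

print(f"precision = {precision:.2f}")  # 0.89: when it flags, it is usually right
print(f"recall    = {recall:.2f}")     # 0.26: most of the malware still slips past
```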

Microsoft researchers said that this performance came without the agent having seen any of these files during training. Other automated tools built by Microsoft could not classify the files at all. Project Ire’s ability to operate on its own without pre-existing knowledge marks it out as a different kind of AI tool for malware work.

The project was built through collaboration between Microsoft Research, Microsoft Defender Research, and Microsoft Discovery and Quantum. Classification has long been hard to automate, as malicious code can look very similar to harmless code. Project Ire manages this through multi-level reasoning and the ability to call open source decompilers and other analysis tools.

How Does The Agent Work?

Each time Project Ire looks at a file, it begins with a triage process that captures details about the file’s structure and likely origin. From there, it reconstructs the file’s control flow graph using open source frameworks such as angr and Ghidra, which lets the agent trace execution paths and identify functions.
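Microsoft has not published the agent's internals, but the step described here, recovering a control flow graph, looks roughly like this with angr; a minimal sketch in which the file name is a placeholder:

```python
import angr

# Load the binary on its own, without pulling in shared library dependencies.
proj = angr.Project("sample.bin", auto_load_libs=False)

# CFGFast statically recovers functions and the edges between basic blocks.
cfg = proj.analyses.CFGFast()

# The knowledge base now maps addresses to recovered functions, giving an
# analysis agent execution paths and function boundaries to reason over.
for addr, func in cfg.kb.functions.items():
    print(hex(addr), func.name, f"({len(list(func.blocks))} basic blocks)")
```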

Through API calls, it can invoke additional tools to examine particular parts of the code. Each finding becomes part of an auditable chain of evidence that human analysts can review. A built-in validator tool then cross-checks claims against knowledge provided by expert reverse engineers during the system’s development, which makes the findings more reliable and transparent.
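The chain of evidence is only described at a high level; one way to picture it is as an append-only record of tool findings that a validator filters against expert-provided knowledge. The types and rule table below are a hypothetical sketch, not Project Ire's actual design:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    tool: str      # which analysis tool produced the claim
    claim: str     # e.g. "imports ZwTerminateProcess"
    evidence: str  # pointer back into the binary (address, function, string)

@dataclass
class EvidenceChain:
    """Append-only record that human analysts can audit after the fact."""
    findings: list[Finding] = field(default_factory=list)

    def add(self, finding: Finding) -> None:
        self.findings.append(finding)

def validate(chain: EvidenceChain, expert_rules: dict[str, bool]) -> list[Finding]:
    """Keep only the claims that expert-provided knowledge corroborates."""
    return [f for f in chain.findings if expert_rules.get(f.claim, False)]

chain = EvidenceChain()
chain.add(Finding("import_scanner", "imports ZwTerminateProcess", "0x140001200"))
print(validate(chain, {"imports ZwTerminateProcess": True}))
```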

 

What About AI In Product Design?

While Project Ire deals with malware, another experiment at Microsoft is testing AI agents in design work. A tool called Lacuna was created during a hackathon to surface hidden assumptions in product documents. Built with Copilot Studio and Azure AI Foundry, Lacuna scans documents such as requirements lists or vision decks and identifies the beliefs embedded in them.

These beliefs can be simple assumptions like “users want this feature” or “this word means the same to everyone.” The system looks for signals such as speculative verbs or vague nouns. It then ranks these assumptions through three lenses: impact, confidence, and reversibility. The outcome is a list of risks and suggestions for how to test them, such as surveys or A/B testing.
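Lacuna's implementation is not public; a toy version of the signal-spotting and ranking idea might look like the sketch below, where the keyword lists, the scoring formula, and the 1-5 scales are all invented:

```python
import re

# Invented signal lists: speculative verbs and vague nouns that often
# mark an untested belief in a product document.
SPECULATIVE_VERBS = r"\b(assume|believe|expect|should|probably|likely)\b"
VAGUE_NOUNS = r"\b(users|everyone|stakeholders|the market)\b"

def find_assumptions(doc: str) -> list[str]:
    """Return sentences containing at least one assumption signal."""
    sentences = re.split(r"(?<=[.!?])\s+", doc)
    return [s for s in sentences
            if re.search(SPECULATIVE_VERBS, s, re.IGNORECASE)
            or re.search(VAGUE_NOUNS, s, re.IGNORECASE)]

def rank(impact: int, confidence: int, reversibility: int) -> int:
    """Toy score on 1-5 scales: high-impact, low-confidence,
    hard-to-reverse beliefs should surface first."""
    return impact * (6 - confidence) * (6 - reversibility)

doc = "We believe users want dark mode. The export format is CSV."
for sentence in find_assumptions(doc):
    print(sentence, "-> priority", rank(impact=4, confidence=2, reversibility=3))
```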

 

Can Lacuna Change Product Thinking?

Lacuna is not designed to fix cognitive bias but to make it visible. Product teams often work with confirmation bias or overconfidence without realising it. Lacuna acts as a collaborator, pointing out where beliefs might need evidence before they harden into code or roadmaps.

The creators describe the system as a way of encouraging reflection rather than output. It cannot push deadlines or provide testing resources, but it creates awareness of the hidden scaffolding behind product decisions. This makes it easier for teams to ask questions before committing to a direction.

Microsoft’s experiments with Project Ire and Lacuna show different ways that AI agents can act as partners in human work. One investigates hidden threats in code; the other reveals hidden beliefs in design. Both demonstrate how AI can operate as an assistant that notices details people might miss.