OpenAI has launched a new benchmark that evaluates how well different AI models detect, patch, and even exploit security vulnerabilities found in crypto smart contracts.

OpenAI released the “EVMbench: Evaluating AI Agents on Smart Contract Security” paper on Wednesday, in collaboration with crypto investment firm Paradigm and crypto security firm OtterSec, to gauge how effectively AI agents could, in theory, exploit 120 smart contract vulnerabilities.

Anthropic’s Claude Opus 4.6 came out on top with an average “detect award” of $37,824, followed by OpenAI’s OC-GPT-5.2 and Google’s Gemini 3 Pro at $31,623 and $25,112, respectively.

Detect awards won by AI agents. Source: OpenAI

As AI agents grow more capable at handling basic tasks, OpenAI said it is increasingly important to evaluate their performance in “economically meaningful environments.”

“Smart contracts secure billions of dollars in assets, and AI agents are likely to be transformative for both attackers and defenders.”

“We expect agentic stablecoin payments to grow, and help ground it in a domain of emerging practical importance,” OpenAI added.

Circle CEO Jeremy Allaire predicted on Jan. 22 that, within five years, billions of AI agents will be transacting with stablecoins for everyday payments on behalf of users. Former Binance boss Changpeng “CZ” Zhao also recently predicted that crypto would end up being the “native currency for AI agents.”

The need to test agentic AI performance in spotting security vulnerabilities comes as attackers stole $3.4 billion worth of crypto funds in 2025, a marginal increase from 2024.

EVMbench drew on 120 curated vulnerabilities from 40 smart contract audits, with most of them sourced from open-source audit competitions. OpenAI said it hopes the benchmark will help track AI progress in spotting and mitigating smart contract vulnerabilities at scale.

Smart contracts weren’t built for humans: Dragonfly

In a post to X on Wednesday, Dragonfly’s managing partner Haseeb Qureshi said crypto’s promise of replacing property rights and legal contracts never materialized, not because the technology failed, but because it was never designed for human intuition.

Qureshi said it still feels “terrifying” to sign large transactions, particularly with drainer wallets and other threats always present, whereas bank transfers rarely provoke the same fear.

Instead, Qureshi believes the future of crypto transactions will be facilitated by AI-intermediated, self-driving wallets, which will take care of those threats and manage complex operations on behalf of users:

“A technology often snaps into place once its complement finally arrives. GPS had to wait for the smartphone, TCP/IP had to wait for the browser. For crypto, we might just have found it in AI agents.”

