SAN FRANCISCO — June 3 — Fourteen leading organizations in blockchain and artificial intelligence, including Cyber, EigenLayer, and Sentient, today announced the formation of the Crypto AI Benchmark Alliance (CAIBA). The community-led initiative aims to establish transparent standards for evaluating AI models and agents in the crypto ecosystem.
The founding members — Alchemy, Cyber, EigenLayer, Goldsky, IOSG, LazAI, Magic Newton, Metis, MyShell, OpenGradient, RootData, Sentient, Surf, and Thirdweb — are contributing datasets, tools, and domain knowledge to build out a benchmarking framework. Each benchmark will include tasks, reference answers, and grading scripts, published on platforms like GitHub and Hugging Face under open licenses when permitted.
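The announcement does not specify a task schema, but the structure it describes (tasks, reference answers, and grading scripts) can be pictured with a minimal, purely hypothetical sketch. Every field name, task, answer, and grading rule below is invented for illustration and is not CAIBA's published format.

```python
# Hypothetical sketch only: CAIBA has not published this schema.
# It illustrates the idea of pairing a task with a reference answer and a grading script.
import json

# Example task record; all fields and values are invented for illustration.
task = {
    "id": "tokenomics-001",
    "prompt": "What is the maximum supply of Bitcoin?",
    "reference_answer": "21 million",
}

def grade(model_answer: str, reference: str) -> bool:
    """Toy substring grader; a real grading script would be task-specific."""
    return reference.lower() in model_answer.lower()

if __name__ == "__main__":
    model_answer = "Bitcoin has a hard cap of 21 million coins."
    result = {"task_id": task["id"], "passed": grade(model_answer, task["reference_answer"])}
    print(json.dumps(result))
```

Publishing tasks, reference answers, and graders together in this way is what would let anyone reproduce a score rather than take it on trust.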
AI models are playing a growing role in crypto, powering everything from trading tools to research assistants. Yet most existing AI benchmarks don’t account for the specific demands of the crypto industry. CAIBA aims to close that gap by developing crypto-specific benchmarks.
“Transparent, rigorous testing is essential,” said Ryan Li, co-founder of Cyber. “Models must not only answer correctly but also act reliably so users can make decisions with confidence.”
The alliance’s first release, a benchmark for Crypto AI Agents (CAIA), is now live. CAIA evaluates AI agents across three key areas:
Knowledge: Accurately answering protocol- and token-related questions.
Planning: Mapping out multi-step tasks.
Action: Using tools like block explorers and APIs to complete tasks.
CAIA includes tasks related to tokenomics, onchain analysis, project research, and transaction workflows. Models being evaluated include general-purpose LLMs like GPT-4o, Claude 4, Gemini 2.5, and DeepSeek-R1, as well as crypto-native models.
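CAIBA has not published its evaluation harness, but as a hypothetical sketch, per-area outcomes (Knowledge, Planning, Action) might be rolled up into comparable pass rates along the following lines; all model names and results here are invented.

```python
# Hypothetical sketch: this is not CAIBA's harness, and the results below are invented.
# It shows one way per-area outcomes (knowledge, planning, action) could be
# aggregated into pass rates for a leaderboard-style comparison.
from collections import defaultdict

# Invented example outcomes: (model, area, passed)
results = [
    ("model-a", "knowledge", True),
    ("model-a", "planning", False),
    ("model-a", "action", True),
    ("model-b", "knowledge", True),
    ("model-b", "planning", True),
    ("model-b", "action", False),
]

def pass_rates(rows):
    """Compute the fraction of passed tasks per (model, area) pair."""
    tally = defaultdict(lambda: [0, 0])  # (passed, attempted)
    for model, area, passed in rows:
        tally[(model, area)][0] += int(passed)
        tally[(model, area)][1] += 1
    return {key: passed / attempted for key, (passed, attempted) in tally.items()}

if __name__ == "__main__":
    for (model, area), rate in sorted(pass_rates(results).items()):
        print(f"{model:8s} {area:10s} {rate:.0%}")
```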
CAIBA’s goal is to make AI in crypto more trustworthy by creating open, domain-specific benchmarks that reflect how these tools are actually used. By testing models on real-world tasks, the alliance establishes a shared standard for evaluating performance in a crypto context.
Additional benchmarks are already in development, and the alliance is open to new contributors. Developers, researchers, and protocols can get involved by submitting models for evaluation or proposing new tasks.
About the Crypto AI Benchmark Alliance (CAIBA)
The Crypto AI Benchmark Alliance is a community-governed initiative that sets standards for evaluating AI model performance in crypto-specific contexts. Through open datasets, reproducible tasks, and public leaderboards, CAIBA provides tools to help developers, researchers, and protocols measure and improve AI systems used in blockchain applications. For more information, visit caiba.ai.