Written by Arijit Sarkar, former writer. Reviewed by Felix Ng, staff editor.

Grok-3 outperforms all AI models in benchmark test, xAI claims

Published Feb. 18, 2025

Grok-3, the latest AI model from xAI, outperformed ChatGPT, Gemini and DeepSeek in a blind AI evaluation, achieving a record-breaking score, according to xAI’s internal analysis.

Update (Feb. 19, 8:34 am UTC): This article has been updated to clarify that LMArena has not independently confirmed whether Grok-3’s ranking represents a significant breakthrough over competitors.

An early version of the newly launched Grok-3, an AI large language model (LLM), has beaten rival AI systems from Google, OpenAI and DeepSeek in a community-driven blind evaluation.

On Feb. 18, Elon Musk announced xAI’s latest AI model release, Grok-3, during a livestream on X. In the discussion, the xAI team revealed it had released an early Grok-3 version on LMArena under the alias “chocolate” for community testing.

Source: LMArena

Grok-3 tops multiple AI performance metrics

Chatbot Arena, a community-driven AI evaluation platform, allows users to compare AI models in a blind test by ranking responses from two anonymous chatbots. According to its website, the platform has recorded over a million votes from users.
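To illustrate how pairwise blind votes can be turned into a leaderboard score, here is a minimal sketch using a plain Elo update. This is an assumption made for illustration: the article does not describe how the platform actually computes its scores, and the function names, the starting rating of 1000 and the K-factor of 32 are all hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, winner, k=32.0):
    """Return new (rating_a, rating_b) after one vote.

    winner is "a", "b" or "tie". k (hypothetical value) controls
    how far a single vote can move a rating.
    """
    actual_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    delta = k * (actual_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


def rate_from_votes(votes, initial=1000.0, k=32.0):
    """Replay (model_a, model_b, winner) votes in order and return ratings."""
    ratings = {}
    for model_a, model_b, winner in votes:
        ra = ratings.get(model_a, initial)
        rb = ratings.get(model_b, initial)
        ratings[model_a], ratings[model_b] = update_ratings(ra, rb, winner, k)
    return ratings


if __name__ == "__main__":
    # Made-up votes between placeholder models, purely for demonstration.
    sample_votes = [
        ("model-x", "model-y", "a"),
        ("model-x", "model-z", "tie"),
        ("model-y", "model-z", "b"),
    ]
    print(rate_from_votes(sample_votes))
```

In a scheme like this, a score only has meaning relative to the other models in the same pool, which is one reason a headline number depends on who is voting and which models are being compared.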

According to xAI’s internal comparison of AI models, Grok-3 scored at least 10 points higher than its biggest competitors (ChatGPT o3-mini, o1, DeepSeek-R1 and Gemini-2 Flash Thinking) in math, science and coding.

Comparison between Grok-3 and other AI models. Source: xAI

Grok-3 dominates AI chatbots across all categories

LMArena also noted that the early Grok-3 model currently ranks first in all categories, including overall with style control, hard prompts and hard prompts with style control, coding, math, creative writing, instruction following, longer query and multi-turn.

Grok-3’s performance across all the top categories. Source: LMArena

Musk and the xAI team reiterated LMArena’s finding that the early Grok-3 model, codenamed chocolate, achieved a record score of 1400. “And it’s still climbing. So we have to keep updating it. It’s 1400 and climbing,” Musk said.

LMArena has not independently confirmed whether Grok-3’s ranking represents a significant breakthrough over competitors or if external factors, such as audience demographics, may have contributed to the model’s ranking.

Elon Musk prepares Grok-powered Tesla Bots for space exploration

Further into the announcement, Musk revealed plans to send a Tesla Bot, powered by xAI’s artificial intelligence model Grok, on SpaceX’s next Mars mission by the end of 2026.

During a discussion, he revealed that most of SpaceX’s projects for Mars exploration are slated for around Q4 2026.

He explained that the Earth-Mars transit window occurs every 26 months, making November 2026 the next ideal opportunity for rocket launches to the Red Planet.

Source: xAI
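The roughly 26-month spacing between Earth-Mars transit windows follows from the two planets' orbital periods. The short calculation below is a sketch of that arithmetic, using approximate sidereal periods (about 365.256 days for Earth and 686.980 days for Mars). The function name and the average month length are illustrative choices, not anything stated in the announcement.

```python
# Approximate sidereal orbital periods in days.
EARTH_PERIOD_DAYS = 365.256
MARS_PERIOD_DAYS = 686.980

# Average calendar month length in days (365.25 / 12), used only for conversion.
AVG_MONTH_DAYS = 30.4375


def synodic_period_days(inner_period, outer_period):
    """Time for the inner planet to gain one full lap on the outer planet."""
    return 1.0 / (1.0 / inner_period - 1.0 / outer_period)


period_days = synodic_period_days(EARTH_PERIOD_DAYS, MARS_PERIOD_DAYS)
period_months = period_days / AVG_MONTH_DAYS

if __name__ == "__main__":
    print(f"Synodic period: {period_days:.0f} days, about {period_months:.1f} months")
```

The result is close to 780 days, which is where the commonly quoted figure of about 26 months comes from.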

Musk also said he may be sending a Tesla Bot and Grok on the Mars mission:

“If all goes well, SpaceX will send Starship rockets to Mars with Optimus robots and Grok.”

Grok-3 engineer exits upon ultimatum

On Feb. 12, an xAI engineer quit over an X post in which he had ranked Grok-3 lower than ChatGPT, sharing his personal opinion prior to the model’s release.

Source: Benjamin DeKraker

“I either had to delete the post quoted below or face being fired,” DeKraker wrote, adding:

“After reviewing everything and thinking a lot, I’ve decided that I’m not going to delete the post -- which is very clearly a harmless personal opinion.”


Cointelegraph is committed to independent, transparent journalism. This news article is produced in accordance with Cointelegraph’s Editorial Policy and aims to provide accurate and timely information. Readers are encouraged to verify information independently.
