Cointelegraph
Written by Arijit Sarkar, Former Staff Writer
Reviewed by Felix Ng, Staff Editor

Grok-3 outperforms all AI models in benchmark test, xAI claims

Grok-3, the latest AI model from xAI, outperformed ChatGPT, Gemini and DeepSeek in a blind AI evaluation, achieving a record-breaking score, according to xAI’s internal analysis.

Update (Feb. 19, 8:34 am UTC): This article has been updated to clarify that LMArena has not independently confirmed whether Grok-3’s ranking represents a significant breakthrough over competitors.

An early version of the newly launched Grok-3, a large language model (LLM), has beaten rival AI systems from Google, OpenAI and DeepSeek in a community-driven blind evaluation.

On Feb. 18, Elon Musk announced xAI’s latest AI model release, Grok-3, during a livestream on X. In the discussion, the xAI team revealed it had released an early Grok-3 version on LMArena under the alias “chocolate” for community testing.

Source: LMArena

Grok-3 tops multiple AI performance metrics

Chatbot Arena, a community-driven AI evaluation platform, allows users to compare AI models in a blind test by ranking responses from two anonymous chatbots. According to its website, the platform has recorded over a million votes from users.
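For readers curious how blind head-to-head votes can become a single leaderboard number, here is a minimal sketch of a generic Elo-style rating update. It is an illustration only: the article does not describe the platform's actual rating method, and the function names, the starting rating of 1000, the K-factor of 32 and the model names "model_x" and "model_y" are all assumptions made for this example.

```python
# Generic Elo-style rating from pairwise votes.
# Assumption: this is NOT LMArena's implementation, only a textbook sketch.

def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, outcome_a, k=32.0):
    """Return new (rating_a, rating_b) after one vote.

    outcome_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie.
    k (hypothetical value) controls how far one vote moves a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (outcome_a - exp_a)
    new_b = rating_b + k * ((1.0 - outcome_a) - (1.0 - exp_a))
    return new_a, new_b


def rate_from_votes(votes, initial=1000.0, k=32.0):
    """Replay (model_a, model_b, winner) votes; winner is 'a', 'b' or 'tie'."""
    outcomes = {"a": 1.0, "b": 0.0, "tie": 0.5}
    ratings = {}
    for model_a, model_b, winner in votes:
        ra = ratings.setdefault(model_a, initial)
        rb = ratings.setdefault(model_b, initial)
        ratings[model_a], ratings[model_b] = update_elo(
            ra, rb, outcomes[winner], k
        )
    return ratings


if __name__ == "__main__":
    # Hypothetical votes between two anonymous models.
    sample_votes = [
        ("model_x", "model_y", "a"),
        ("model_x", "model_y", "tie"),
        ("model_y", "model_x", "b"),
    ]
    print(rate_from_votes(sample_votes))
```

In a scheme like this, a model's score rises when it wins votes against highly rated opponents, so a score depends on who voted and which prompts they chose as well as on the model itself.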

According to xAI’s internal comparison of AI models, Grok-3 scored at least 10 points higher than its biggest competitors (ChatGPT o3-mini, o1, DeepSeek-R1 and Gemini-2 Flash Thinking) in math, science and coding.

Comparison between Grok-3 and other AI models. Source: xAI

Grok-3 dominates AI chatbots across all categories

LMArena also noted that the early Grok-3 model currently ranks first in all categories, including overall with style control, hard prompts and hard prompts with style control, coding, math, creative writing, instruction following, longer query and multi-turn.

Grok-3’s performance across all the top categories. Source: LMArena

Musk and the xAI team reiterated LMArena’s finding that the early Grok-3 model, codenamed chocolate, achieved a record score of 1400. “And it’s still climbing. So we have to keep updating it. It’s 1400 and climbing,” Musk said.

LMArena has not independently confirmed whether Grok-3’s ranking represents a significant breakthrough over competitors or if external factors, such as audience demographics, may have contributed to the model’s ranking.

Elon Musk prepares Grok-powered Tesla Bots for space exploration

Further into the announcement, Musk revealed plans to send a Tesla Bot, powered by xAI’s artificial intelligence model Grok, on SpaceX’s next Mars mission by the end of 2026.

During the discussion, he said most of SpaceX’s Mars exploration projects are slated for around Q4 2026.

He explained that the Earth-Mars transit window occurs every 26 months, making November 2026 the next ideal opportunity for rocket launches to the Red Planet.
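The 26-month figure can be checked with a short calculation of the Earth-Mars synodic period, the time it takes the two planets to return to the same relative alignment. The sketch below assumes circular orbits and uses rounded sidereal orbital periods (about 365.256 days for Earth and 686.980 days for Mars); the function and constant names are illustrative.

```python
# Rough check of the Earth-Mars launch window spacing.
# Assumptions: circular, coplanar orbits and rounded orbital periods.

EARTH_YEAR_DAYS = 365.256  # Earth's sidereal orbital period
MARS_YEAR_DAYS = 686.980   # Mars's sidereal orbital period


def synodic_period_days(inner_period, outer_period):
    """Days between successive identical alignments of two planets."""
    return 1.0 / (1.0 / inner_period - 1.0 / outer_period)


period_days = synodic_period_days(EARTH_YEAR_DAYS, MARS_YEAR_DAYS)
# Convert to average calendar months (one twelfth of an Earth year).
period_months = period_days / (EARTH_YEAR_DAYS / 12)

print(f"{period_days:.0f} days, about {period_months:.1f} months")
```

The result is roughly 780 days, or a little under 26 months, which is consistent with the spacing Musk described.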

Source: xAI

Musk also said he may be sending a Tesla Bot and Grok on the Mars mission:

“If all goes well, SpaceX will send Starship rockets to Mars with Optimus robots and Grok.”

Grok-3 engineer exits upon ultimatum

On Feb. 12, an xAI engineer quit over an X post in which he had ranked Grok-3 lower than ChatGPT, sharing his personal opinion prior to the model’s release.

Source: Benjamin DeKraker

“I either had to delete the post quoted below or face being fired,” DeKraker wrote, adding:

“After reviewing everything and thinking a lot, I’ve decided that I’m not going to delete the post -- which is very clearly a harmless personal opinion.”

Cointelegraph is committed to independent, transparent journalism. This news article is produced in accordance with Cointelegraph’s Editorial Policy and aims to provide accurate and timely information. Readers are encouraged to verify information independently. Read our Editorial Policy https://cointelegraph.com/editorial-policy