Key takeaways
- The o1 models are advanced AI systems designed for complex reasoning tasks. They utilize a step-by-step chain-of-thought process to handle challenging problems.
- The o1 models excel in complex reasoning tasks and benchmarks, such as advanced academic and science questions, whereas GPT-4o remains strong for general language tasks and quick responses.
- The o1 models are available in preview and mini versions, with specific access limits and pricing. GPT-4o remains widely accessible and more affordable.
- AI reasoning models like o1 are advancing toward better context understanding and complex problem-solving, promising more sophisticated interactions and decision-making capabilities in the future.
Artificial intelligence is evolving fast, and OpenAI is leading the way with its innovative large language models (LLMs). You’ve probably heard of GPT-4, but now there’s something new: OpenAI’s much-anticipated o1 model release, previously code-named Strawberry. So, what’s the difference, and why is everyone talking about it?
GPT-4 is a powerhouse for generating text, answering questions and holding conversations. It’s great at language tasks, but the new o1 models bring something more: better reasoning and context handling.
With time, these models’ deeper problem-solving abilities will enable them to provide smarter, more accurate responses.
These models are in the early stages, which is why OpenAI is calling its release of the o1 models a “preview.” It’s important to remember that it’s just the beginning — OpenAI might come up with more surprises.
Let’s explore what the OpenAI o1 models are, how OpenAI o1-preview and OpenAI o1-mini work, how to access them, and how o1 models differ from GPT-4.
What are OpenAI o1 models? o1-preview and o1-mini, explained
OpenAI’s o1 models are designed to handle complex reasoning tasks. These advanced AI systems, released on Sept. 12, “think” through problems, generating a detailed internal process before answering, unlike other models that just respond quickly.
This chain of thought runs internally, and the reasoning ability makes the models strong in subjects like math, coding and science. According to OpenAI, they can perform well in competitive programming, rank high in math competitions and even outperform PhD-level experts on some scientific questions.
Currently, there are two versions available:
- o1-preview: Currently available as a preview. Designed for challenging problems that require broad knowledge.
- o1-mini: A faster, more affordable version ideal for coding, math and science tasks without requiring much general knowledge.
You might think that o1 models render GPT-4 models outdated. Well, that’s not true. GPT-4 models still work better for tasks needing quick answers or image inputs. But if your project demands heavy reasoning, o1 could be a better fit!
Did you know? The o1-preview and o1-mini models lack several advanced features, including memory, custom instructions and web browsing. For these tools, you’ll need to use GPT-4.
What’s new and different about OpenAI’s o1 models?
Learning about AI models is fascinating, isn’t it? Let’s explore what sets OpenAI’s o1 models apart:
- Advanced chain-of-thought reasoning: The o1 models use a step-by-step reasoning process, breaking down problems into sequential steps before answering users’ prompts.
- Improved academic performance: The o1 models score in the 89th percentile on Codeforces and rank among the top 500 in the USA Mathematical Olympiad qualifiers (see image below).
- Enhanced safety: These models perform well in tests for refusing prohibited content and resist jailbreak attempts, making them safer to use in sensitive areas. The o1-preview, in particular, scores highly on jailbreak-resistance evaluations thanks to its ability to reason about safety policies.
- Reduced hallucination rates: In AI, the creation of erroneous or unsupported information is referred to as a hallucination. By employing sophisticated reasoning and a methodical thought process, OpenAI’s o1 models reduce these errors and aim to produce more accurate results.
- Rigorous red teaming for enhanced safety: OpenAI’s o1 models underwent extensive red teaming and evaluations before deployment to meet high safety and ethical standards. Red teaming in LLMs involves testing AI by simulating attacks or tricky prompts to uncover vulnerabilities and ensure safety.
Did you know? Jailbreaking AI models involves bypassing their safety features to produce harmful outputs. This growing security risk is addressed by advanced models like OpenAI’s o1, which show improved resistance against such attacks.
How to access OpenAI o1 models
Here’s how you can access OpenAI o1 models:
Access for ChatGPT users
If you’re using ChatGPT Plus or are part of a team (i.e., shared account or workspace), you can now try out the o1 models in ChatGPT. Both o1-preview and o1-mini are available to experiment with. You can choose which one to use manually from the model selector inside the app.
As of Sept. 13, there are some limits on how many messages you can send using these models:
- o1-preview: 30 messages per week.
- o1-mini: 50 messages per week.
If you are a ChatGPT Free user, don’t worry — OpenAI also plans to bring o1-mini access to all ChatGPT Free users. Be sure to follow future announcements.
Plus, in the future, ChatGPT will be able to automatically choose the ideal model for your request, saving you the trouble of having to choose one manually.
Access for ChatGPT Enterprise and Edu users
If you’re using ChatGPT through an Enterprise or Education account, you’ll get access to both o1-preview and o1-mini starting Sept. 16.
API access for developers
Developers on API usage tier 5 can also immediately build and test apps using the o1 models. The models are currently limited to 20 requests per minute (RPM), but OpenAI plans to raise this limit after further testing.
However, the API for these o1 models still lacks some of the capabilities you may be familiar with, such as function calling, streaming responses or support for system messages. As a developer, you can begin by looking through the API documentation.
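Given those launch limits, a request to the o1 models has to leave out several fields you might otherwise include. Below is a minimal sketch of what a Chat Completions request body could look like under those constraints; it only builds the payload as a dictionary (no API key or network call), and the endpoint shape is assumed to match OpenAI’s standard `/v1/chat/completions` format.

```python
# Sketch of a Chat Completions request body for o1-preview.
# Illustrative only: the o1 API at launch rejects system messages,
# streaming and function calling, so none of those appear here.

def build_o1_request(user_prompt: str, model: str = "o1-preview") -> dict:
    """Build a request body compatible with the o1 models' launch limits."""
    return {
        "model": model,
        # Only a "user" message -- no "system" role, which the
        # o1 API did not support at launch.
        "messages": [{"role": "user", "content": user_prompt}],
        # Deliberately no "stream": True and no "tools"/"functions"
        # keys: streaming and function calling are also unsupported.
    }

body = build_o1_request("How many primes are there below 100?")
print(body["model"])  # o1-preview
```

If OpenAI later enables system messages or streaming for these models, the payload above would simply gain the corresponding fields.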
Did you know? OpenAI’s o1-preview and o1-mini models have the same knowledge cut-off as GPT-4o, set to October 2023.
o1 vs GPT-4o: Which is better?
Both o1 and GPT-4o are powerful language models with unique strengths and weaknesses.
The Transformer neural network architecture, which has transformed natural language processing, serves as the foundation for both models. Each can produce text, translate between languages, generate many kinds of creative content, and provide informed answers to your queries. Due to their extensive training on text data, both models can recognize linguistic patterns and relationships.
However, the o1 models consistently outperform GPT-4o across a wide range of challenging reasoning benchmarks and questions, including machine learning (ML) benchmarks, various exams, PhD-level science questions, and massive multitask language understanding (MMLU) categories, as shown in the image below.
On the other hand, accessing the o1 models through the API is expensive for developers. The o1-preview model costs $15 per million input tokens (the text you send) and $60 per million output tokens (the text the model generates). In comparison, GPT-4o is much cheaper at $5 per million input tokens and $15 per million output tokens. But of course, the output may vary, given the reasoning capability of o1-preview.
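To make the price gap concrete, here is a back-of-the-envelope cost calculator using the per-million-token rates quoted above; the token counts in the example are made up for illustration.

```python
# Per-million-token API prices quoted above (USD).
PRICES_PER_MILLION = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "gpt-4o":     {"input": 5.00,  "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at the quoted rates."""
    p = PRICES_PER_MILLION[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A hypothetical request with 2,000 input and 1,000 output tokens:
print(round(request_cost("o1-preview", 2000, 1000), 4))  # 0.09
print(round(request_cost("gpt-4o", 2000, 1000), 4))      # 0.025
```

Note that the o1 models also consume hidden reasoning tokens that are billed as output tokens, so real-world o1 costs tend to run higher than a naive count of the visible output would suggest.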
Thus, choosing whether to use one of the o1 models or GPT-4o comes down to your expectations in terms of performance and affordability. Also, don’t forget that the o1 models are still new and evolving, while GPT-4o remains a powerful LLM.
Did you know? ML benchmarks, exams and MMLU categories test an AI’s performance across machine learning, academic assessments and diverse knowledge areas. Excelling in these indicates strong versatility and adaptability. Consistent success shows the AI’s ability to tackle various challenges effectively.
Best practices for prompting o1 models
Want to experiment with o1 models? Here are some of the best practices you can follow:
- Skip detailed reasoning steps: As discussed, you don’t need to ask the model to explain its reasoning or to “think step by step.” The o1 models handle their chain of thought internally, so they’ll naturally provide logical responses without extra guidance.
- Keep it simple: Short, direct prompts work well with reasoning-powered models. Avoid overcomplicating your prompts.
- Focus on relevant information: If you provide extra context, especially in retrieval-augmented generation (RAG), include only the most relevant details. Too much context can cause the model to get sidetracked or give overly complicated responses. RAG is a technique where the model combines its built-in knowledge with additional information retrieved from external sources, like documents or databases.
- Prioritize key terms or concepts: Use capitalization or delimiters to draw attention to key terms or concepts while creating a prompt. This focuses the model’s attention on the important details of your request, resulting in outputs that are more precisely tailored.
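The tips above can be combined in a simple prompt builder: keep the instruction short, skip any “think step by step” scaffolding, and use delimiters to separate the task from retrieved context. The function and example text below are hypothetical, just to show the shape of such a prompt.

```python
# Illustration of the prompting tips: a short, direct instruction with
# triple-quote delimiters separating the task from supporting context,
# and no chain-of-thought scaffolding (o1 models reason internally).

def build_prompt(task: str, context: str) -> str:
    """Place the task first, then the retrieved context inside delimiters."""
    return f'{task}\n\nCONTEXT:\n"""\n{context}\n"""'

prompt = build_prompt(
    "Summarize the refund policy in ONE sentence.",  # caps flag the key constraint
    "Refunds are issued within 14 days of purchase for unused items.",
)
print(prompt)
```

Keeping the context block minimal, as the RAG tip above suggests, matters more with o1 than with earlier models: extra material invites the model to reason about details you never asked about.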
The future of AI reasoning models
The future of AI reasoning models is about making machines smarter and more intuitive in understanding and processing information. At present, AI is great at answering questions or generating text. Still, in the future, these models will get much better at understanding context — meaning they’ll remember what was said earlier and use that information to give more thoughtful responses.
Additionally, they will be more adept at applying humanlike thinking to complex tasks, such as multistep problem-solving and considering long-term impacts. Organizations such as OpenAI are already refining AI’s ability to comprehend questions and their underlying meanings in addition to just answering them.
As AI improves at this, we will likely see increasingly sophisticated systems in decision-making tools and customer service, making our interactions with AI more efficient and natural.
But does the advancement in AI reasoning raise concerns about the potential threats to humanity? Well, the risks associated with AI increase with its intelligence and intuitiveness. These concerns include autonomous decision-making in areas that affect people’s personal lives and could invade their privacy.
Organizations like OpenAI must keep improving these systems to ensure their advantages outweigh their risks. The development of AI could result in groundbreaking discoveries, but if not carefully managed, it could also bring unanticipated dangers to society, making it more critical than ever to strike a balance between innovation and security.
Written by Tayyub Yaqoob