Back to list

GPT-5 is Here: Features, Benchmarks, and How to Use It

OpenAI has released GPT-5. Discover key features, model variants, benchmark insights, and how to use it inside your workflows.

OpenAI has officially released GPT-5, its most advanced model yet. With major breakthroughs in reasoning, latency, and multimodal capabilities, GPT-5 pushes the frontier of AI into real enterprise and product workflows.

What Is GPT-5?

GPT-5 is OpenAI's most advanced language model yet—built to be more robust, reliable, and helpful than any of its predecessors. It delivers state-of-the-art accuracy on real-world tasks while significantly reducing hallucinations and deceptive responses. Designed with real workflows in mind, it powers real-time AI agents, developer tools, and enterprise copilots across platforms like Slack, HubSpot, and Microsoft Teams.

Key Improvements in GPT-5

  • State-of-the-art performance on leading AI benchmarks:
    • 94.6% on AIME 2025 (math)
    • 74.9% on SWE-bench Verified (real-world software engineering)
    • 88.4% on GPQA (PhD-level science)
    • 84.2% on MMMU (multimodal problem solving)
  • Fewer hallucinations: GPT-5 responses are up to 80% more factual than previous models like GPT-4o or o3.
  • Safer and more honest: Reduced deceptive behavior in edge cases or underspecified prompts.
  • Improved instruction following: More reliable for agents, workflows, and multi-step tasks.
  • Multimodal reasoning: Enhanced capabilities across text, image, video, and charts.

What makes GPT-5 different?

Rather than a single monolithic model, GPT-5 introduces a router-based model architecture:

  • GPT-5: Fast, general-purpose response generation
  • GPT-5 Thinking: Deep reasoning for complex tasks
  • GPT-5 Pro: Long-context, high-precision model tuned for advanced applications

These models are deployed intelligently behind the scenes, depending on query difficulty, user intent, and context. This allows GPT-5 to feel both fast and deeply capable—delivering expert-level answers where needed, while still handling everyday queries with speed and grace.

Why GPT-5 Feels More Trustworthy

GPT-5 doesn't just score higher—it acts more responsibly, explains better, and lies less.

Fewer Hallucinations, Even When Reasoning

GPT-5 is significantly more factually accurate:

  • 80% fewer hallucinations than GPT-4o when reasoning ("thinking" mode)
  • ~6x fewer factual errors on open-ended benchmarks like LongFact and FActScore

Whether answering complex science questions or writing long-form content, GPT-5 produces more grounded, verifiable answers.

More Honest When Tasks Can't Be Done

GPT-5 is trained to say “I don't know” when the task is impossible, underspecified, or missing key inputs:

  • In deception benchmarks, GPT-5 reduced false claims from 4.8% (o3) to just 2.1%
  • In multimodal tests with missing images, GPT-5 hallucinated only 9% of the time, compared to o3's 86.7%

This means better transparency and fewer misleading answers.

Safety with Nuance

With GPT-5, OpenAI introduced safe completions — a new training paradigm that:

  • Avoids over-refusing ambiguous prompts
  • Answers safely when possible
  • Explains why a task can't be done, instead of deflecting

This enables GPT-5 to be safer without feeling frustratingly limited.

Less Sycophantic, More Thoughtful

GPT-5 feels less like a cheerleader and more like a thoughtful colleague:

  • Sycophantic replies reduced by over 50%
  • Clearer follow-ups and fewer unnecessary flattery

The result? Conversations that are more productive, respectful, and grounded in truth.

Evaluations: How Smart Is GPT-5?

Math Mastery

GPT-5 dominates math evaluations:

  • AIME 2025: 94.6% (no tools) — new state-of-the-art.
  • HMMT: 96.7% (no tools), 100% (with tools).
  • FrontierMath: 26.3% → 32.1% with tool support.
  • GPQA (PhD-level): 88.4% (no tools) → 89.4% (with thinking).

Real-World Coding

On practical software engineering and code editing benchmarks:

  • SWE-bench Verified: 74.9% accuracy (GPT-5) vs 69.1% (OpenAI o3) and 30.8% (GPT-4o).
  • Aider Polyglot: 88% accuracy (GPT-5), far ahead of all previous models.

These improvements suggest GPT-5 can handle significantly more complex engineering workflows.

Instruction Following & Tool Use

GPT-5 is vastly better at multi-step reasoning and agentic behaviors:

  • Scale MultiChallenge (multi-turn): 69.6% vs GPT-4o's 40.3%.
  • BrowseComp (search + browsing): 54.9% vs 49.7%.
  • COLLIE (freeform following): 99.0% accuracy.

It not only follows instructions better but coordinates tools to complete tasks more reliably.

Multimodal Understanding

GPT-5 outperforms on visual, video, and diagram-based tasks:

  • MMMU: 84.2% vs GPT-4o's 72.2%.
  • ERQA: 65.7% vs 35.2% (GPT-4o).
  • CharXiv Reasoning: 81.1% vs 58.8%.

This means stronger capabilities for interpreting images, visual documents, and presentations.

Health Conversations

GPT-5 is the most accurate and least hallucinatory model for medical applications:

  • HealthBench: 67.2% vs GPT-4o's 32.0%.
  • Hallucination Rate: 1.6% (with thinking), down from GPT-4o's 15.8%.

Its performance on sensitive conversations makes it viable for high-trust settings.

Economically Valuable Tasks

GPT-5 beats both o3 and ChatGPT Agent on complex, real-world professional tasks:

  • Outperforms in law, logistics, sales, and engineering.
  • Achieves 47.1% wins over industry experts in internal benchmarks.

This suggests GPT-5 is not just smarter — it's more useful in economically important domains.

Faster, More Efficient Thinking

GPT-5 achieves better performance with fewer output tokens than OpenAI o3:

  • 50–80% fewer tokens across reasoning, coding, and scientific benchmarks.
  • Better performance with less effort and latency — critical for real-time use.

Detailed Benchmarks

GPT-5 Benchmark - intelligence GPT-5 Benchmark - hallucinations GPT-5 Benchmark - coding GPT-5 Benchmark - long-context

Who is GPT-5 for?

  • Business leaders & teams: Analyze documents, monitor operations, and plan strategy with better context awareness and safer outputs.
  • Developers: Use GPT-5 to generate production-quality frontends, debug codebases, and automate software tasks.
  • Healthcare professionals & patients: Understand diagnoses and treatment options with greater clarity and proactive reasoning (GPT-5 scored highest ever on HealthBench).
  • Knowledge workers & writers: Get help crafting thoughtful reports, articles, or even poetry—with deeper structure and emotional impact.

Real-World Use Cases with Runbear

  • Slack-based AI teammates: Use GPT-5 to power intelligent agents in Slack that summarize daily updates, answer policy questions, and track tasks with long-context memory.
  • AI-driven sales agents: Combine GPT-5 with your CRM via Runbear to generate weekly client reports, respond to customer FAQs, and prep for sales calls with context from past interactions.
  • Meeting intelligence agents: Let GPT-5 summarize meetings, surface action items, and connect them to your docs and workflows in Notion or Confluence.

Availability & Pricing

GPT-5 is now available across ChatGPT, the OpenAI API platform. Here's how you can access it and what it costs.

Access Options

ChatGPT

  • Rolling out now to Plus, Pro, Team, and Free users.
  • Enterprise and Education access starts in one week.
  • Available in the Codex CLI for Pro, Plus, and Team users.

OpenAI API

  • Available in three model sizes:
    • gpt-5
    • gpt-5-mini
    • gpt-5-nano
  • Supports:
    • Chat Completions API, Responses API, and Codex CLI
    • reasoning_effort and verbosity parameters
    • Parallel tool use, streaming, structured outputs, and batching
    • Built-in tools like file search, web search, image generation
    • Prompt caching and Batch API for cost-efficient use
  • Non-reasoning version available as gpt-5-chat-latest

Microsoft Integrations

  • GPT-5 is launching across:
    • Microsoft 365 Copilot
    • GitHub Copilot
    • Azure AI Foundry
    • Other Copilot experiences

API Pricing

ModelInput Price / 1M tokensOutput Price / 1M tokens
gpt-5$1.25$10.00
gpt-5-mini$0.25$2.00
gpt-5-nano$0.05$0.40
gpt-5-chat-latest$1.25$10.00
You can explore more in the GPT-5 documentation, pricing details, and prompting guide.

Final Thoughts

GPT-5 sets a new standard in reasoning, reliability, and real-world AI performance. With better benchmarks and lower hallucination rates, it's shaping up to be the most capable foundation model to date.

We're excited to bring it to Runbear the moment API access becomes available — stay tuned.


Benchmark scores and images used in this post are sourced from OpenAI's official GPT-5 materials: Introducing GPT-5 and the Introducing GPT-5 for developers.