    OpenAI Introduces GPT 5.2: A Long Context Workhorse For Agents, Coding And Knowledge Work
    AI News

    December 12, 2025 · 5 Mins Read

    OpenAI has just introduced GPT-5.2, its most advanced frontier model for professional work and long-running agents, and is rolling it out across ChatGPT and the API.

    GPT-5.2 is a family of three variants. In ChatGPT, users see GPT-5.2 Instant, Thinking, and Pro. In the API, the corresponding models are gpt-5.2-chat-latest, gpt-5.2, and gpt-5.2-pro. Instant targets everyday assistance and learning, Thinking targets complex multi-step work and agents, and Pro allocates more compute for hard technical and analytical tasks.
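    In practice, picking between the three API model ids is a routing decision. The sketch below is a minimal illustration of that idea; the `pick_model`/`build_request` helpers and the routing categories are hypothetical, and only the three model id strings come from the announcement.

    ```python
    # Map a coarse task category to one of the three GPT-5.2 API model ids
    # named in the announcement. The routing rules are illustrative only.
    VARIANTS = {
        "chat": "gpt-5.2-chat-latest",  # everyday assistance and learning
        "agent": "gpt-5.2",             # complex multi-step work and agents
        "deep": "gpt-5.2-pro",          # extra compute for hard technical tasks
    }

    def pick_model(task_kind: str) -> str:
        """Return the GPT-5.2 variant id for a coarse task category."""
        try:
            return VARIANTS[task_kind]
        except KeyError:
            raise ValueError(f"unknown task kind: {task_kind!r}")

    def build_request(task_kind: str, prompt: str) -> dict:
        """Assemble a request payload; the payload shape is an assumption."""
        return {"model": pick_model(task_kind), "input": prompt}

    if __name__ == "__main__":
        print(build_request("agent", "Summarize open pull requests")["model"])
    ```

    A team could extend the routing rules with latency or cost thresholds, falling back to the Pro variant only when a cheaper tier fails.
    
    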

    Benchmark profile, from GDPval to SWE-Bench

    GPT-5.2 Thinking is positioned as the main workhorse for real-world knowledge work. On GDPval, an evaluation of well-specified knowledge tasks across 44 occupations in 9 large industries, it beats or ties top industry professionals on 70.9 percent of comparisons, while producing outputs at more than 11 times the speed and under 1 percent of the estimated expert cost. For engineering teams, this means the model can reliably generate artifacts such as presentations, spreadsheets, schedules, and diagrams from structured instructions.

    On an internal benchmark of junior investment banking spreadsheet modeling tasks, average scores rise from 59.1 percent with GPT-5.1 to 68.4 percent with GPT-5.2 Thinking and 71.7 percent with GPT-5.2 Pro. These tasks include three-statement models and leveraged buyout models with constraints on formatting and citations, which is representative of many structured enterprise workflows.

    In software engineering, GPT-5.2 Thinking reaches 55.6 percent on SWE-Bench Pro and 80.0 percent on SWE-bench Verified. SWE-Bench Pro evaluates repository-level patch generation across multiple languages, while SWE-bench Verified focuses on Python.

    Long context and agentic workflows

    Long context is a core design target. GPT-5.2 Thinking sets a new state of the art on OpenAI MRCRv2, a benchmark that inserts multiple identical "needle" queries into long dialogue "haystacks" and measures whether the model can reproduce the correct answer. It is the first model reported to reach near-100 percent accuracy on the 4-needle MRCR variant out to 256k tokens.
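    The needle-in-haystack setup is easy to sketch: scatter several copies of the same question, each with a distinct answer, through a long synthetic dialogue, then check whether the model can reproduce the k-th answer. The snippet below is a simplified illustration of that idea, not OpenAI's actual MRCRv2 harness; all function names and the transcript format are assumptions.

    ```python
    import random

    def build_haystack(n_needles: int, filler_turns: int, seed: int = 0):
        """Build a synthetic dialogue containing n identical 'needle'
        questions, each paired with a distinct answer. Returns the
        transcript and the ordered list of needle answers."""
        rng = random.Random(seed)
        answers = [f"token-{rng.randrange(10**6)}" for _ in range(n_needles)]
        turns = [f"user: filler question {i}\nassistant: filler reply {i}"
                 for i in range(filler_turns)]
        # Choose distinct insertion points, then insert from the back so
        # earlier positions are not shifted by later inserts.
        positions = sorted(rng.sample(range(len(turns)), n_needles))
        for k in range(n_needles - 1, -1, -1):
            turns.insert(positions[k],
                         f"user: what is the magic word?\nassistant: {answers[k]}")
        return "\n".join(turns), answers

    def grade(model_output: str, answers: list, k: int) -> bool:
        """Score one trial: did the model reproduce the k-th needle's answer?"""
        return model_output.strip() == answers[k]
    ```

    A real harness would feed the transcript plus a prompt like "repeat the assistant's answer to the 3rd magic-word question" to the model and run `grade` on the response.
    
    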

    For workloads that exceed even that context, GPT-5.2 Thinking integrates with the Responses /compact endpoint, which performs context compaction to extend the effective window for tool-heavy, long-running jobs. This matters if you are building agents that iteratively call tools over many steps and need to maintain state beyond the raw token limit.
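    The article does not document the /compact endpoint's interface, but the general compaction pattern can be sketched client-side: when the transcript exceeds a token budget, fold older turns into a summary and keep only recent turns verbatim. Everything below is an assumption-level sketch; the token estimate is crude and the summary is a stub where a model call would go.

    ```python
    def rough_tokens(text: str) -> int:
        """Crude token estimate (~4 chars per token); a real agent
        would use an actual tokenizer."""
        return max(1, len(text) // 4)

    def compact(history: list, budget: int, keep_recent: int = 4) -> list:
        """Context-compaction sketch: if the transcript exceeds `budget`
        tokens, replace everything but the most recent turns with a
        single summary line. Stands in for server-side compaction."""
        total = sum(rough_tokens(t) for t in history)
        if total <= budget or len(history) <= keep_recent:
            return history
        old, recent = history[:-keep_recent], history[-keep_recent:]
        # Stub summary; in practice this would be a summarization model call.
        summary = f"[summary of {len(old)} earlier turns]"
        return [summary] + recent
    ```

    An agent loop would call `compact` before each model request, so the effective history can grow far beyond the raw window at the cost of lossy summaries of old turns.
    
    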

    On tool usage, GPT-5.2 Thinking reaches 98.7 percent on Tau2-bench Telecom, a multi-turn customer support benchmark where the model must orchestrate tool calls across a realistic workflow. The official examples show scenarios like a traveler with a delayed flight, missed connection, lost bag, and medical seating requirement, where GPT-5.2 manages rebooking, special-assistance seating, and compensation in a consistent sequence while GPT-5.1 leaves steps unfinished.
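    The kind of orchestration Tau2-bench measures amounts to an agent loop: the model proposes tool calls in sequence and the harness executes them. Below is a minimal sketch with a scripted stand-in for the model's decisions; the tool names (`rebook_flight`, `assign_seat`, `issue_voucher`) and the whole flow are hypothetical, not the benchmark's actual tools.

    ```python
    # Toy tool registry for a rebooking workflow; names are made up.
    TOOLS = {
        "rebook_flight": lambda args: f"rebooked on {args['flight']}",
        "assign_seat": lambda args: f"seat {args['seat']} assigned",
        "issue_voucher": lambda args: f"voucher for ${args['amount']} issued",
    }

    # Scripted stand-in for the model's tool-call decisions. A benchmark
    # like Tau2-bench checks the model completes every required step.
    SCRIPT = [
        ("rebook_flight", {"flight": "UA 212"}),
        ("assign_seat", {"seat": "12C (bulkhead, medical)"}),
        ("issue_voucher", {"amount": 200}),
    ]

    def run_agent(script):
        """Execute each proposed tool call in order and log results."""
        log = []
        for name, args in script:
            result = TOOLS[name](args)
            log.append((name, result))
        return log
    ```

    Scoring then reduces to comparing the executed step sequence against the required one, which is where a model that "leaves steps unfinished" loses points.
    
    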

    Vision, science and math

    Vision quality also moves up. GPT-5.2 Thinking roughly halves error rates on chart-reasoning and user-interface benchmarks such as CharXiv Reasoning and ScreenSpot Pro when a Python tool is enabled. The model also shows improved spatial understanding of images: when labeling motherboard components with approximate bounding boxes, GPT-5.2 identifies more regions with tighter placement than GPT-5.1.

    For scientific workloads, GPT-5.2 Pro scores 93.2 percent and GPT-5.2 Thinking 92.4 percent on GPQA Diamond, and GPT-5.2 Thinking solves 40.3 percent of FrontierMath Tier 1 to Tier 3 problems with Python tools enabled. These benchmarks cover graduate-level physics, chemistry, and biology as well as expert mathematics. In early use, GPT-5.2 Pro reportedly contributed to a proof in statistical learning theory, with the result verified by humans.

    Comparison Table

    | Model | Primary positioning | Context window / max output | Knowledge cutoff | Notable benchmarks |
    | --- | --- | --- | --- | --- |
    | GPT-5.1 | Flagship model for coding and agentic tasks with configurable reasoning effort | 400,000 tokens context, 128,000 max output | 2024-09-30 | SWE-Bench Pro 50.8 percent, SWE-bench Verified 76.3 percent, ARC-AGI-1 72.8 percent, ARC-AGI-2 17.6 percent |
    | GPT-5.2 (Thinking) | New flagship model for coding and agentic tasks across industries and for long-running agents | 400,000 tokens context, 128,000 max output | 2025-08-31 | GDPval wins or ties 70.9 percent vs industry professionals, SWE-Bench Pro 55.6 percent, SWE-bench Verified 80.0 percent, ARC-AGI-1 86.2 percent, ARC-AGI-2 52.9 percent |
    | GPT-5.2 Pro | Higher-compute version of GPT-5.2 for the hardest reasoning and scientific workloads | 400,000 tokens context, 128,000 max output | 2025-08-31 | GPQA Diamond 93.2 percent (vs 92.4 percent for GPT-5.2 Thinking and 88.1 percent for GPT-5.1 Thinking), ARC-AGI-1 90.5 percent, ARC-AGI-2 54.2 percent |

    Key Takeaways

  • GPT-5.2 Thinking is the new default workhorse model: it replaces GPT-5.1 Thinking as the main model for coding, knowledge work, and agents, keeping the same 400k context and 128k max output while posting clearly higher benchmark scores across GDPval, SWE-Bench, ARC-AGI, and scientific QA.
  • Substantial accuracy jump over GPT-5.1 at similar scale: on key benchmarks, GPT-5.2 Thinking moves from 50.8 to 55.6 percent on SWE-Bench Pro, from 76.3 to 80.0 percent on SWE-bench Verified, from 72.8 to 86.2 percent on ARC-AGI-1, and from 17.6 to 52.9 percent on ARC-AGI-2, while keeping token limits comparable.
  • GPT-5.2 Pro is targeted at high-end reasoning and science: the higher-compute variant mainly improves hard reasoning and scientific tasks, reaching 93.2 percent on GPQA Diamond versus 92.4 percent for GPT-5.2 Thinking and 88.1 percent for GPT-5.1 Thinking, along with higher ARC-AGI scores.
    Fintech Fetch Editorial Team