Close Menu
    Facebook X (Twitter) Instagram
    • Privacy Policy
    • Terms Of Service
    • Social Media Disclaimer
    • DMCA Compliance
    • Anti-Spam Policy
    Facebook X (Twitter) Instagram
    Fintech Fetch
    • Home
    • Crypto News
      • Bitcoin
      • Ethereum
      • Altcoins
      • Blockchain
      • DeFi
    • AI News
    • Stock News
    • Learn
      • AI for Beginners
      • AI Tips
      • Make Money with AI
    • Reviews
    • Tools
      • Best AI Tools
      • Crypto Market Cap List
      • Stock Market Overview
      • Market Heatmap
    • Contact
    Fintech Fetch
    Home»AI News»DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds
    DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds
    AI News

    DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds

    June 26, 20266 Mins Read
    Share
    Facebook Twitter LinkedIn Pinterest Email
    kraken

    rewrite this content and keep HTML tags as is. This is content from rss feed and I don’t need their *Daily Debrief Newsletter*, their tags from bottom like this *Share this articleCategoriesTags*, Editorial Process section, phrases like *Featured image from Peakpx, chart from Tradingview.com*, SPECIAL OFFERS and similar sections – just remove such sections and save only article itself:





    DeepReinforce has released Ornith-1.0, an open-source model family built for agentic coding. The lineup spans four sizes, from a 9B dense model to a 397B mixture-of-experts flagship. Every checkpoint ships under the MIT license on Hugging Face. The models are post-trained on top of pretrained Gemma 4 and Qwen 3.5.

    Most coding agents pair a model with a fixed, human-designed harness. Ornith-1.0 instead learns to write its own. The DeepReinforce research team reports state-of-the-art results among open models of comparable size.

    TL;DR

    • Ornith-1.0 ships in 9B, 31B, 35B-MoE, and 397B-MoE sizes under MIT, built on Gemma 4 and Qwen 3.5.
    • The model learns its own scaffold during RL, jointly optimizing the harness and the solution.
    • Ornith-1.0-397B tops Claude Opus 4.7 on both headline benchmarks, but not Opus 4.8 or the larger GLM-5.2-744B.
    • Three layers — fixed trust boundary, deterministic monitor, frozen LLM judge — guard against reward hacking.

    What is Ornith-1.0?

    Ornith-1.0 is a set of reasoning models tuned for coding agents. The variants are 9B Dense, 31B Dense, 35B MoE, and 397B MoE. The 35B model is mixture-of-experts and activates roughly 3B parameters per token. FP8 and GGUF builds are also published for faster local serving.

    notion

    Each model is a reasoning model. Replies open with a block before the final answer. The serving recipes enable a reasoning parser, so that trace returns in a separate reasoning_content field. The models also emit well-formed tool calls for agent loops.

    Deployment is straightforward. The 9B model is about 19GB in bf16 and serves on a single 80GB GPU. Serving recipes target vLLM, SGLang, and Transformers. Each model exposes an OpenAI-compatible endpoint. Standard agent frameworks therefore work without code changes.

    Interactive Explainer

    =5){clearInterval(timer);timer=null;b.textContent=”Auto-run ▶”;}else{doStep();}},1400);
    });
    root.querySelector(‘#resetBtn’).addEventListener(‘click’,function(){
    if(timer){clearInterval(timer);timer=null;root.querySelector(‘#autoBtn’).textContent=”Auto-run ▶”;}
    step=0;reward=0.08;
    root.querySelector(‘#rFill’).style.width=”8%”;
    root.querySelector(‘#rVal’).textContent=”0.08″;
    root.querySelector(‘#scaffTxt’).textContent=scaffs[0];
    root.querySelector(‘#outTxt’).textContent=”Press “Run training step” to begin.”;
    root.querySelector(‘#stepOut’).innerHTML=’Step 0 — untrained policy with a fixed, hand-written harness.’;
    resize();
    });

    /* benchmark data (vendor-reported) */
    var BENCHES=[‘Terminal-Bench 2.1′,’SWE-Bench Verified’,’SWE-Bench Pro’,’SWE-Bench Multilingual’,’NL2Repo’,’ClawEval Avg’];
    var DATA={
    t397:{label:’Ornith-1.0-397B’,hero:’Ornith-1.0-397B’,
    models:[‘Ornith-1.0-397B’,’Qwen3.5-397B’,’Qwen3.7-Max’,’GLM-5.2-744B’,’Minimax-M3-428B’,’DeepSeek-V4-Pro-1.6T’,’Claude Opus 4.7′,’Claude Opus 4.8′],
    vals:[[77.5,53.5,73.5,81.0,64,64,70.3,85],[82.4,76.4,80.4,null,null,80.6,80.8,87.6],[62.2,51.6,60.6,62.1,59,55.4,64.3,69.2],[78.9,69.3,78.3,null,null,76.2,null,null],[48.2,36.8,47.2,48.9,42.1,null,null,69.7],[77.1,70.7,65.2,null,null,75.8,78.2,null]]},
    t35:{label:’Ornith-1.0-35B-A3B’,hero:’Ornith-1.0-35B-A3B’,
    models:[‘Ornith-1.0-35B-A3B’,’Qwen3.5-35B-A3B’,’Qwen3.6-35B-A3B’,’Gemma4-31B’,’Qwen3.5-397B’],
    vals:[[64.2,41.4,52.5,42.1,53.5],[75.6,70,73.4,52,76.4],[50.4,44.6,49.5,35.7,51.6],[69.3,60.3,67.2,51.7,69.3],[34.6,20.5,29.4,15.5,36.8],[69.8,65.4,68.7,48.5,70.7]]},
    t9:{label:’Ornith-1.0-9B’,hero:’Ornith-1.0-9B’,
    models:[‘Ornith-1.0-9B’,’Qwen3.5-9B’,’Qwen3.5-35B-A3B’,’Gemma4-12B’,’Gemma4-31B’],
    vals:[[43.1,21.3,41.4,21,42.1],[69.4,53.2,70,44.2,52],[42.9,31.3,44.6,27.6,35.7],[52,39.7,60.3,32.5,51.7],[27.2,16.2,20.5,10.3,15.5],[63.1,53.2,65.4,32.5,48.5]]}
    };
    var curTier=”t397″,curB=0;
    var bchips=root.querySelector(‘#benchChips’);
    BENCHES.forEach(function(b,i){
    var c=document.createElement(‘div’);c.className=”chip”+(i===0?’ on’:”);c.textContent=b;c.dataset.b=i;
    c.addEventListener(‘click’,function(){curB=i;bchips.querySelectorAll(‘.chip’).forEach(function(x){x.classList.remove(‘on’)});c.classList.add(‘on’);draw();});
    bchips.appendChild(c);
    });
    root.querySelectorAll(‘.chip[data-tier]’).forEach(function(c){
    c.addEventListener(‘click’,function(){curTier=c.dataset.tier;root.querySelectorAll(‘.chip[data-tier]’).forEach(function(x){x.classList.remove(‘on’)});c.classList.add(‘on’);draw();});
    });
    function draw(){
    var d=DATA[curTier];var row=d.vals[curB];var chart=root.querySelector(‘#chart’);chart.innerHTML=”;
    var max=Math.max.apply(null,row.filter(function(v){return v!=null}));
    d.models.forEach(function(m,i){
    var v=row[i];var hero=(m===d.hero);
    var div=document.createElement(‘div’);div.className=”row”+(hero?’ hero’:”)+(v==null?’ na’:”);
    div.innerHTML=’

    ‘+m+’

    ‘+(v==null?’n/a’:v)+’

    ‘;
    chart.appendChild(div);
    (function(bf,val){setTimeout(function(){bf.style.width=(val==null?0:(val/max*100))+’%’;},40);})(div.querySelector(‘.bf’),v);
    });
    root.querySelector(‘#benchNote’).textContent=”Benchmark: “+BENCHES[curB]+’. Bars scaled to the highest score shown. “n/a” = not reported by the vendor. Self-reported, not independently verified.’;
    resize();
    }
    draw();

    /* defenses accordion */
    root.querySelectorAll(‘.layer’).forEach(function(l){
    l.addEventListener(‘click’,function(){l.classList.toggle(‘open’);resize();});
    });

    /* auto-resize for WordPress iframe */
    function resize(){
    try{
    var h=root.offsetHeight+40;
    if(window.parent){window.parent.postMessage({type:’mtp-ornith-height’,height:h},’*’);}
    }catch(e){}
    }
    window.addEventListener(‘load’,resize);
    setTimeout(resize,300);
    window.addEventListener(‘resize’,resize);
    })();

    ” style=”width:100%;border:0;display:block;min-height:600px;overflow:hidden” height=”600″ scrolling=”no” loading=”lazy” title=”Ornith-1.0 Interactive Explainer”>

    The Self-Scaffolding Idea

    Most coding agents rely on a scaffold, also called a harness. A scaffold wraps the model with memory, tools, error handling, and orchestration logic. AI teams usually hand-design one scaffold per task category.

    Ornith-1.0 treats the scaffold as a learnable object instead. During reinforcement learning, the scaffold co-evolves with the model’s policy. Each RL step runs in two stages.

    First, the model reads the task and its previous scaffold. It then proposes a refined scaffold. Second, it uses that scaffold and the task to generate a solution rollout. Reward from the rollout flows back to both stages.

    So the model is optimized to author orchestration, not just answers. Over training, higher-reward scaffolds are mutated and selected automatically. Per-task strategies emerge without hand-engineered harness design.

    Training also runs asynchronously, using a pipeline-RL setup. A staleness weight downweights older, off-policy tokens and drops them past a threshold. The optimization uses a token-level GRPO objective.

    Guarding Against Reward Hacking

    Letting a model write its own scaffold invites reward hacking. A scaffold could read visible test files and hardcode expected outputs. It could also copy an oracle solution sitting in the environment. DeepReinforce team describes three defense layers.

  • The outer trust boundary is fixed and immutable. The environment, tool surface, and test isolation stay outside the model’s reach. The model evolves only its inner policy scaffold.
  • A deterministic monitor flags banned actions. Reading withheld paths or editing verification scripts earns zero reward. Those trajectories are excluded from the advantage computation.
  • A frozen LLM judge acts as a veto. It sits on top of the verifier, not as the primary reward.
  • Benchmark

    DeepReinforce reports vendor numbers across several agentic coding benchmarks. At flagship scale, Ornith-1.0-397B posts 77.5 on Terminal-Bench 2.1 and 82.4 on SWE-Bench Verified. On SWE-Bench Verified, that 82.4 trails only Claude Opus 4.8 (87.6) among the listed models. On Terminal-Bench 2.1, the picture is more mixed.

    Ornith-1.0-397B beats Claude Opus 4.7 (70.3) on Terminal-Bench 2.1. But it trails Claude Opus 4.8 (85) and the larger GLM-5.2-744B (81.0). So the ‘state-of-the-art’ claim is scoped to open models of comparable size.

    The smaller models carry the efficiency case. The 35B model scores 64.2 on Terminal-Bench 2.1, above Qwen 3.5-397B’s 53.5. The 9B model reaches 43.1 on Terminal-Bench 2.1 and 69.4 on SWE-Bench Verified.

    BenchmarkOrnith-1.0-397BQwen3.5-397BQwen3.7-MaxGLM-5.2-744BMinimax-M3-428BDeepSeek-V4-Pro-1.6TClaude Opus 4.7Claude Opus 4.8Terminal-Bench 2.177.553.573.581.0646470.385SWE-Bench Verified82.476.480.4––80.680.887.6SWE-Bench Pro62.251.660.662.15955.464.369.2SWE-Bench Multilingual78.969.378.3––76.2––NL2Repo48.236.847.248.942.1––69.7ClawEval Avg77.170.765.2––75.878.2–

    Use Cases and a Quick Start

    The models target terminal-native coding agents and repository-scale work. Practical fits include multi-file refactors, bug localization, and test-driven patches. The 9B model suits edge or single-GPU setups where latency and cost matter. The 397B model targets maximum accuracy on long, multi-step tasks.

    For example, a dev can run the 9B model locally to triage a failing test suite. A platform team can self-host the 397B model for an internal coding agent.

    Serving is a one-liner with vLLM:

    vllm serve deepreinforce-ai/Ornith-1.0-9B \
    –served-model-name Ornith-1.0-9B \
    –max-model-len 262144 \
    –enable-auto-tool-choice –tool-call-parser qwen3_xml \
    –reasoning-parser qwen3 \
    –trust-remote-code

    Then call it with any OpenAI client:

    from openai import OpenAI

    client = OpenAI(base_url=”http://localhost:8000/v1″, api_key=”EMPTY”)

    resp = client.chat.completions.create(
    model=”Ornith-1.0-9B”,
    messages=[{“role”: “user”, “content”: “Write a Python is_prime(n).”}],
    temperature=0.6, top_p=0.95,
    )
    msg = resp.choices[0].message
    print(getattr(msg, “reasoning_content”, None)) # the trace
    print(msg.content) # the final answer

    The reasoning trace returns in reasoning_content, with the answer in content. Recommended sampling is temperature=0.6, top_p=0.95, top_k=20. The model also plugs into OpenHands, OpenClaw, and OpenCode.

    Check out the Model Weights and Technical details. Also, feel free to follow us on Twitter and don’t forget to join our 150k+ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.

    Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.? Connect with us

    aistudios
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Fintech Fetch Editorial Team
    • Website

    Related Posts

    Exploring the societal impacts of AI | MIT News

    Exploring the societal impacts of AI | MIT News

    June 25, 2026
    Enterprise-grade AI image generation in 2 seconds is here: Krea 2 Raw and Turbo available as open weights under custom license

    Enterprise-grade AI image generation in 2 seconds is here: Krea 2 Raw and Turbo available as open weights under custom license

    June 24, 2026
    Mitigating vendor lock-in with Sakana AI Fugu multi-agent models

    Mitigating vendor lock-in with Sakana AI Fugu multi-agent models

    June 23, 2026
    How to Design Python-First Interactive Dashboards with Prefab Reactive UI Components and Static HTML Export

    How to Design Python-First Interactive Dashboards with Prefab Reactive UI Components and Static HTML Export

    June 22, 2026
    Add A Comment

    Comments are closed.

    Join our email newsletter and get news & updates into your inbox for free.


    Privacy Policy

    Thanks! We sent confirmation message to your inbox.

    aistudios
    Latest Posts
    DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds

    DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds

    June 26, 2026
    Give me 17 Minutes, I’ll Make you 1.5L/Month with AI (Passive)

    Give me 17 Minutes, I’ll Make you 1.5L/Month with AI (Passive)

    June 26, 2026
    AI for Beginners in 2026: Start With One Useful Workflow

    AI for Beginners in 2026: Start With One Useful Workflow

    June 26, 2026
    Mythos AI HACKED ENTIRE NSA In Hours, Top Intel Sen Says

    Mythos AI HACKED ENTIRE NSA In Hours, Top Intel Sen Says

    June 26, 2026
    Bitcoin Didn't Lose to Gold, the Rotation Story Is Wrong: Analyst

    rewrite this title in other words: Bitcoin Didn’t Lose to Gold, the Rotation Story Is Wrong: Analyst

    June 25, 2026
    quillbot
    LEGAL INFORMATION
    • Privacy Policy
    • Terms Of Service
    • Social Media Disclaimer
    • DMCA Compliance
    • Anti-Spam Policy
    Top Insights
    Mining Profits Dry Up Across Bitcoin, DOGE, LTC, and BCH

    rewrite this title in other words: Mining Profits Dry Up Across Bitcoin, DOGE, LTC, and BCH

    June 26, 2026

    rewrite this title in other words: Bitcoin slips under $60K as Polymarket pegs 80% odds of 0 Fed cuts in 2026

    June 26, 2026
    synthesia
    Facebook X (Twitter) Instagram Pinterest
    © 2026 FintechFetch.com - All rights reserved.

    Type above and press Enter to search. Press Esc to cancel.