    Fintech Fetch
    Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation

    December 17, 2025 (3 min read)

    Meta has released SAM Audio, a prompt-driven audio separation model that targets a common editing bottleneck: isolating one sound from a real-world mix without building a custom model per sound class. Meta released three main sizes: sam-audio-small, sam-audio-base, and sam-audio-large. The model is available to download and to try in the Segment Anything Playground.

    Architecture

    SAM Audio uses a separate encoder for each conditioning signal: an audio encoder for the mixture, a text encoder for the natural language description, a span encoder for time anchors, and a visual encoder that consumes a visual prompt derived from video plus an object mask. The encoded streams are concatenated into time-aligned features, which are then processed by a diffusion transformer that applies self-attention over the time-aligned representation and cross-attention to the text features. A DACVAE decoder reconstructs waveforms and emits two outputs: target audio and residual audio.
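    The fusion step described above can be sketched at the level of array shapes. This is a toy illustration, not Meta's implementation: the feature sizes, the random projection matrices, and the helper names fuse_time_aligned and cross_attend are all assumptions, and the real model uses learned encoders and a full diffusion transformer rather than one attention head.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_time_aligned(audio_feat, span_feat, visual_feat):
    """Concatenate per-frame features along the channel axis.

    Every stream must share the same number of frames T (hypothetical layout).
    """
    assert audio_feat.shape[0] == span_feat.shape[0] == visual_feat.shape[0]
    return np.concatenate([audio_feat, span_feat, visual_feat], axis=1)


def cross_attend(frames, text_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: frames are queries, text tokens are keys/values."""
    q = frames @ w_q        # (T, d)
    k = text_tokens @ w_k   # (N, d)
    v = text_tokens @ w_v   # (N, d)
    weights = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (T, N)
    return weights @ v, weights


# Illustrative sizes only: 50 frames, 4 text tokens.
T, N = 50, 4
audio_feat = rng.normal(size=(T, 16))
span_feat = np.zeros((T, 1))
span_feat[10:20] = 1.0                      # frames marked by a span prompt
visual_feat = rng.normal(size=(T, 8))
text_tokens = rng.normal(size=(N, 12))

fused = fuse_time_aligned(audio_feat, span_feat, visual_feat)  # (T, 16 + 1 + 8)

d = 32
w_q = rng.normal(size=(fused.shape[1], d))
w_k = rng.normal(size=(text_tokens.shape[1], d))
w_v = rng.normal(size=(text_tokens.shape[1], d))
attended, weights = cross_attend(fused, text_tokens, w_q, w_k, w_v)
```

    The point of the sketch is the data flow: audio, span, and visual streams share a time axis and can be stacked frame by frame, while text has no time axis and is attended to instead.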

    What SAM Audio does, and what ‘segment’ means here

    SAM Audio takes an input recording that contains multiple overlapping sources, like speech plus traffic plus music, and separates a target source based on a prompt. In the public inference API, the model produces two outputs: result.target (the isolated sound) and result.residual (everything else).

    This target-residual interface maps directly to editor operations. For instance, to remove a dog bark from a podcast track, treat the bark as the target and keep only the residual. Conversely, to extract a guitar part from a concert clip, keep the target waveform instead. Meta uses both examples to illustrate the interface.
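    A minimal sketch of that mapping follows. The SeparationResult class is a hypothetical stand-in that only mirrors the two output names mentioned above (target and residual); it is not the library's return type. The attenuate_target helper additionally assumes that target plus residual approximates the original mixture, which the article does not state.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SeparationResult:
    """Hypothetical stand-in with the two outputs named in the article."""
    target: np.ndarray
    residual: np.ndarray


def apply_edit(result, mode):
    """Pick the waveform an editor would keep.

    mode="remove"  -> drop the prompted sound, keep everything else
    mode="extract" -> keep only the prompted sound
    """
    if mode == "remove":
        return result.residual
    if mode == "extract":
        return result.target
    raise ValueError(f"unknown mode: {mode!r}")


def attenuate_target(result, gain):
    """Turn the prompted sound down instead of deleting it.

    Assumes target + residual is close to the input mixture (an assumption).
    """
    return result.residual + gain * result.target


# Synthetic one-second signals standing in for real separation output.
sr = 16000
t = np.arange(sr) / sr
bark = 0.5 * np.sin(2 * np.pi * 440.0 * t)
speech = 0.3 * np.sin(2 * np.pi * 220.0 * t)
result = SeparationResult(target=bark, residual=speech)

podcast_without_bark = apply_edit(result, "remove")
bark_only = apply_edit(result, "extract")
bark_ducked = attenuate_target(result, gain=0.25)
```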

    The three prompt types Meta is shipping

    Meta positions SAM Audio as a single unified model supporting three prompt types, usable alone or in combination:

  • Text prompting: Describe the sound in natural language, e.g., “dog barking” or “singing voice,” and the model separates that sound from the mixture. Text prompts are a core interaction mode, with an end-to-end example available in the open-source repo using SAMAudioProcessor and model.separate.
  • Visual prompting: Click on a person or object in a video to ask the model to isolate the audio linked to that visual object, implemented by passing video frames and masks into the processor via masked_videos.
  • Span prompting: Mark time segments where the target sound occurs; the model uses those spans to guide separation. This is crucial for ambiguous cases, such as when the same instrument appears multiple times or when a sound is brief, helping to prevent over-separation.
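    One plausible way to picture a span prompt is as a per-frame on/off mask built from start and end times. The sketch below is an illustration under that assumption; the function name spans_to_frame_mask, the frame rate, and the mask representation are hypothetical and do not describe how SAM Audio encodes spans internally.

```python
import numpy as np


def spans_to_frame_mask(spans, duration_s, frame_rate_hz):
    """Convert (start_s, end_s) spans into a boolean mask with one entry per frame.

    A frame is marked True if it overlaps any span. Spans are clipped to the
    clip duration. All names here are illustrative.
    """
    n_frames = int(np.ceil(duration_s * frame_rate_hz))
    mask = np.zeros(n_frames, dtype=bool)
    for start_s, end_s in spans:
        if end_s <= start_s:
            raise ValueError(f"empty or reversed span: ({start_s}, {end_s})")
        first = max(0, int(np.floor(start_s * frame_rate_hz)))
        last = min(n_frames, int(np.ceil(end_s * frame_rate_hz)))
        mask[first:last] = True
    return mask


# Two marked regions in a 6-second clip, at an assumed 10 frames per second.
mask = spans_to_frame_mask([(1.0, 2.0), (4.5, 5.0)], duration_s=6.0, frame_rate_hz=10)
```

    A mask like this shares the time axis of the audio features, which is what lets a span act as a time anchor alongside text or visual prompts.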
    Results

    The Meta team claims SAM Audio achieves state-of-the-art performance across diverse, real-world scenarios and positions it as a unified alternative to single-purpose audio tools. The team published a subjective evaluation across seven categories: General, SFX, Speech, Speaker, Music, Instr(wild), and Instr(pro). General scores are 3.62 for sam-audio-small, 3.28 for sam-audio-base, and 3.50 for sam-audio-large; the Instr(pro) score reaches 4.49 for sam-audio-large.

    Key Takeaways

  • SAM Audio is a unified audio separation model that segments sound from complex mixtures using text prompts, visual prompts, and time span prompts.
  • The core API produces two waveforms per request: target for the isolated sound and residual for everything else, easily mapping to common edit operations like removing noise, extracting stems, or keeping ambience.
  • Meta released multiple checkpoints and variants, including sam-audio-small, sam-audio-base, sam-audio-large, plus TV variants that perform better for visual prompting. The repo also includes a subjective evaluation table by category.
  • The release includes tooling beyond inference: Meta provides a sam-audio-judge model that scores separation results against a text description, evaluating overall quality, recall, precision, and faithfulness.
    Fintech Fetch Editorial Team