    Fintech Fetch
    Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages Across Meet, Translate, and the Live API

    June 10, 2026

    Google has announced Gemini 3.5 Live Translate, its latest audio model for live speech-to-speech translation: spoken audio goes in, and translated spoken audio comes out. The model automatically detects more than 70 languages and generates translated speech that preserves the speaker’s intonation, pacing, and pitch.

    Turn-by-turn systems wait for a speaker to finish before responding. Gemini 3.5 Live Translate generates speech continuously instead, balancing a trade-off between waiting for more context, which improves quality, and translating immediately, which keeps the output in sync with the speaker. As a result, the translation stays a few seconds behind the speaker throughout a session.

    Gemini 3.5 Live Translate

    Gemini 3.5 Live Translate is a single audio model (gemini-3.5-live-translate-preview), not a chat assistant. It processes speech as the audio streams in, rather than after a full sentence. It handles multilingual input without manual configuration, and its noise robustness lets applications run in loud, unpredictable environments.

    The model is rolling out across three surfaces. Developers get it in public preview through the Gemini Live API and Google AI Studio. Enterprises get a private preview in Google Meet starting this month. Everyone else gets it through the Google Translate app on Android and iOS.

    How the Continuous Streaming Works

    The design difference matters for building real-time features. A conversational Live agent uses turn-based interactions. It relies on pauses, intent detection, and interruption handling. Live Translation uses continuous stream processing instead. It translates as the speaker talks, without waiting for turns to end.

    To hold strict real-time latency thresholds, the translation path accepts audio input only. Text input is not supported in translation mode. The model also drops tool use and system instructions in this mode. That keeps it a focused translator pipeline rather than a general agent.
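    The restrictions above can be expressed as a small client-side check. This is only a sketch: the SessionRequest class, its field names, and the "agent" / "translation" mode labels are hypothetical and not part of any Google SDK, and the source does not say whether the API rejects unsupported options or silently ignores them.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRequest:
    """Hypothetical client-side description of a Live API session."""
    mode: str  # "agent" or "translation" (labels assumed for illustration)
    input_modalities: set = field(default_factory=lambda: {"audio"})
    tools: list = field(default_factory=list)
    system_instruction: str = ""


def translation_mode_problems(req: SessionRequest) -> list:
    """Return the reasons a request does not fit translation mode."""
    if req.mode != "translation":
        return []  # the limits described above apply to translation only
    problems = []
    extra = req.input_modalities - {"audio"}
    if extra:
        problems.append(f"unsupported input modalities: {sorted(extra)}")
    if req.tools:
        problems.append("tools are not available in translation mode")
    if req.system_instruction:
        problems.append("system instructions are not available in translation mode")
    return problems


if __name__ == "__main__":
    request = SessionRequest(mode="translation", input_modalities={"audio", "text"})
    for problem in translation_mode_problems(request):
        print(problem)
```

    Checking before opening a session keeps configuration mistakes out of the real-time path, where they are harder to diagnose.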

    Building With the Live API

    Developers configure translation in the Live API session setup by adding a translationConfig block inside generationConfig. Its targetLanguageCode field takes a BCP-47 code, the standard format for language tags such as “en”, “pl”, “es”, or “pt-BR”, and defaults to “en”. The echoTargetLanguage boolean controls what happens when the input is already in the target language: when true, the model echoes that speech; when false, it stays silent. You can also enable inputAudioTranscription and outputAudioTranscription to receive text transcripts.
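    As a rough illustration, the setup described above might be assembled as a JSON message like the one below. The field names translationConfig, targetLanguageCode, echoTargetLanguage, inputAudioTranscription, and outputAudioTranscription come from the article. Everything else is an assumption: the outer "setup" key, the "models/" prefix, placing the transcription fields at the setup level, and the helper function itself. The False default for echo_target_language is this sketch's choice, not a documented API default.

```python
import json


def build_translation_setup(target_language_code="en",
                            echo_target_language=False,
                            with_transcripts=False):
    """Build a hypothetical Live API setup message for translation mode."""
    setup = {
        # The "models/" prefix is assumed; the model ID is from the article.
        "model": "models/gemini-3.5-live-translate-preview",
        "generationConfig": {
            "translationConfig": {
                "targetLanguageCode": target_language_code,  # BCP-47 tag
                "echoTargetLanguage": echo_target_language,
            }
        },
    }
    if with_transcripts:
        # Placement at the setup level is an assumption of this sketch.
        setup["inputAudioTranscription"] = {}
        setup["outputAudioTranscription"] = {}
    return {"setup": setup}


if __name__ == "__main__":
    message = build_translation_setup("pl", echo_target_language=True)
    print(json.dumps(message, indent=2))
```

    Check the exact message shape against the official Live API reference before relying on it.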

    Audio formats are fixed. Input is raw 16-bit PCM at 16kHz, mono, little-endian. Output is raw 16-bit PCM at 24kHz, mono, little-endian. PCM is uncompressed raw audio. You send audio in chunks of 100ms. For client-side apps, ephemeral tokens on the v1alpha endpoint avoid exposing your API key.
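    The audio formats above determine how many bytes each 100 ms chunk carries: 16-bit mono audio uses 2 bytes per sample, so 100 ms is 3,200 bytes at 16 kHz input and 4,800 bytes at 24 kHz output. The sketch below does that arithmetic and slices a raw PCM buffer into chunks. The function names and the test tone are illustrative only, and the source does not say whether a short final chunk needs padding.

```python
import math
import struct

INPUT_RATE_HZ = 16_000   # input sample rate stated in the article
OUTPUT_RATE_HZ = 24_000  # output sample rate stated in the article
BYTES_PER_SAMPLE = 2     # 16-bit PCM, mono


def bytes_per_chunk(sample_rate_hz, chunk_ms=100):
    """Size in bytes of one chunk of 16-bit mono PCM."""
    return sample_rate_hz * chunk_ms // 1000 * BYTES_PER_SAMPLE


def iter_chunks(pcm, sample_rate_hz=INPUT_RATE_HZ, chunk_ms=100):
    """Yield consecutive chunks of raw PCM; the last one may be shorter."""
    size = bytes_per_chunk(sample_rate_hz, chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def sine_pcm(duration_s, freq_hz=440.0, sample_rate_hz=INPUT_RATE_HZ):
    """Generate a test tone as little-endian 16-bit mono PCM."""
    n = int(duration_s * sample_rate_hz)
    samples = (int(12000 * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz))
               for i in range(n))
    return struct.pack(f"<{n}h", *samples)  # "<" = little-endian, "h" = int16


if __name__ == "__main__":
    tone = sine_pcm(0.25)
    print([len(chunk) for chunk in iter_chunks(tone)])
```

    In a real client, each chunk would be handed to the streaming connection as soon as it is captured rather than buffered up front.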

    | Dimension | Live Agent | Live Translation |
    | --- | --- | --- |
    | Model role | Assistant that listens, reasons, and acts | Interpreter / real-time translator pipeline |
    | Interaction | Turn-based, with interruption handling | Continuous stream processing, no turns |
    | Tools | Function calling, Google Search, instructions | Translation only, no tools or instructions |
    | Inputs | Text, audio, video, and image | Audio only, for strict latency |
    | Configuration | Generation, speech, tools, instructions | targetLanguageCode and echoTargetLanguage |

    Use Cases

    The model targets live interpretation across several settings. Google lists multilingual calls, meetings, lessons, and broadcasts. Developer platforms reduce the integration work for real-time media. Agora, Fishjam, LiveKit, Pipecat, and Vision Agents already use the Live API. These platforms handle the complex real-time media streaming infrastructure. That lets developers focus on the user experience instead.

    Google’s example app demonstrates dubbing and simultaneous multi-language translation. Grab is testing the model for driver-and-traveler communication at pickups. Grab users make over 10 million voice calls per month. CJ ENM, LiveKit, and others reported positive feedback on quality, accuracy, and low latency.

    How It Changes Google Meet and Translate

    According to Google’s official release, Google Meet will soon use 3.5 Live Translate for speech translation. The table shows the stated before-and-after for Meet.

    | Capability | Previous Meet | With 3.5 Live Translate |
    | --- | --- | --- |
    | Languages | 5 | 70+ |
    | Combinations per meeting | Only to and from English | 2000+ combinations |
    | Access | Existing interface | Updated interface for instant access |
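    The “2000+ combinations” figure is consistent with any-to-any translation among 70 languages. The source does not say how Google counts combinations, so the sketch below shows two plausible readings: unordered language pairs, and ordered source-to-target directions.

```python
from math import comb


def unordered_pairs(n_languages):
    """Distinct language pairs, ignoring direction: n choose 2."""
    return comb(n_languages, 2)


def ordered_pairs(n_languages):
    """Distinct source-to-target directions: n * (n - 1)."""
    return n_languages * (n_languages - 1)


if __name__ == "__main__":
    print(unordered_pairs(70))  # 2415
    print(ordered_pairs(70))    # 4830
```

    Under either reading, 70 languages yield more than 2,000 combinations, whereas routing every translation through English would allow only one pair per non-English language.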

    The Meet update is in private preview for select business Workspace customers this month. A broader rollout follows later this year. In the Translate app, the Live translate feature works with any connected headphones. It mirrors the speaker’s tone across 70+ languages. Android also gains a listening mode. You hold the phone to your ear like a regular call. The translated audio then streams through the earpiece, without others hearing.

    Key Takeaways

    • Gemini 3.5 Live Translate is Google’s latest audio model for live speech-to-speech translation across 70+ languages.
    • It streams continuously instead of turn-by-turn, staying a few seconds behind the speaker.
    • Developers can configure it via the Live API using targetLanguageCode and echoTargetLanguage; audio-only, 16kHz in, 24kHz out.
    • It rolls out to the Gemini Live API, Google Meet (5→70+ languages), and the Translate app.
    • All generated audio carries an imperceptible SynthID watermark for detectability.

    Fintech Fetch Editorial Team