Google Launches Gemini Omni Flash: A Conversational Video-Generation Model with Avatar Mode Held Back
May 20, 2026 – 9:01 am
At the I/O 2026 developer conference on Tuesday, Google introduced Gemini Omni, a new multimodal model family from Google DeepMind designed to generate and edit video from any combination of image, audio, video, and text inputs. The first model in the family, Gemini Omni Flash, began rolling out the same day in the Gemini app and Google Flow for Google AI Plus, Pro, and Ultra subscribers, and in YouTube Shorts and the YouTube Create app at no cost. API access for developers and enterprise customers will follow in the coming weeks.
Key Features of Gemini Omni Flash:
- Multimodal Input: Combines images, audio, video, and text as input to generate high-quality videos grounded in Gemini’s real-world knowledge.
- Conversational Editing: Edits are made through a conversational interface, with each instruction building upon the previous one.
- Improved Physics Understanding: The model has an improved intuitive understanding of physical forces, including gravity, kinetic energy, and fluid dynamics.
- World Knowledge Integration: Uses Gemini’s existing world knowledge to connect language, imagery, and meaning beyond pattern-matching.
- Consistency Across Revisions: Maintains consistency across multi-turn revisions, preserving character identity and scene continuity.
Avatar Generation:
The release also extends the Omni family to digital-avatar generation, allowing users to record their own voice and likeness to create videos that look and sound like them. Onboarding requires recording yourself and speaking a series of numbers aloud.
Responsible Audio Editing:
Google is explicitly withholding general-purpose audio and speech editing inside Omni for now, stating, "We are still working to test this and better understand how we can bring this capability to users responsibly." Third-party coverage has characterized the decision as a deliberate step back from deepfake-adjacent territory.
SynthID Watermarking:
All videos generated with Omni will carry Google’s SynthID imperceptible digital watermark by default. Users can check whether a clip was generated by Omni in the Gemini app, Gemini in Chrome, and Google Search. The SynthID layer is the same watermarking infrastructure OpenAI adopted earlier this year under the C2PA open standard.