DeepL launches real-time voice-to-voice translation in 40+ languages

April 17, 2026 - 9:11 am

The Cologne-based translation company best known for its text tools has unveiled a full voice product suite covering meetings, conversations, group settings, and an API for enterprise integration. A live demo in Seoul showed delays of one to two sentences, and DeepL’s CPO acknowledged that word-order differences between languages remain a fundamental challenge.

DeepL, the Cologne-based language AI company that built its reputation on high-quality text translation, has launched DeepL Voice-to-Voice: a real-time spoken translation suite designed for live business communication.

The product covers four distinct use cases: virtual meetings, mobile and web conversations, group settings for frontline workers, and enterprise applications through an API. It supports more than 40 languages, including all 24 official EU languages and additions such as Vietnamese, Thai, Arabic, Norwegian, Hebrew, Bengali, and Tagalog.

The suite’s four components are at different stages of availability:

  • Voice for Conversations, which enables real-time translation across mobile and web without requiring app installation, is now generally available.
  • Voice for Meetings integrates with Microsoft Teams and Zoom so participants can speak in their native language while others hear simultaneous translation in theirs; its early access program opens in June.
  • The Voice-to-Voice API lets businesses embed DeepL’s translation engine into their own customer-facing applications, such as those used in call centers; it is already in early access.
  • Spoken Terms, a customisation feature allowing the system to learn industry-specific vocabulary, company names, and personal names, is scheduled to become generally available on May 7.

Jarek Kutylowski, DeepL’s founder and CEO, described the launch as reaching “another frontier in translation.”

“DeepL Voice-to-Voice allows everyone to speak naturally in their own language without the friction or cost of interpreters,” he said.

DeepL has positioned the product as an enterprise tool rather than a consumer one. The company states that its voice technology never uses customer data to train its models and does not permanently store transcription or translation data after a call ends. That security framing distinguishes it from consumer AI voice products and is aimed at regulated industries.

The current system works through a three-step pipeline: speech is converted to text, the text is translated using DeepL’s established translation engine, and the output is then converted back to speech.

DeepL’s competitive argument rests on the quality of the middle step: the company claims its text translation models outperform alternatives, and that advantage propagates through to the voice output.
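
The speech-to-text, translate, text-to-speech flow described above can be sketched in a few lines of Python. This is only an illustration of a generic cascaded pipeline, not DeepL's implementation or API: the `VoicePipeline` class, the three callable types, and the `fake_*` stand-in functions are all hypothetical names invented for this example. Each stage is passed in as a plain function, which shows why a better middle (translation) stage would improve the final spoken output without the other stages changing.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions for illustration, not DeepL APIs).
Recognizer = Callable[[bytes, str], str]       # (audio, source_lang) -> text
Translator = Callable[[str, str, str], str]    # (text, source_lang, target_lang) -> text
Synthesizer = Callable[[str, str], bytes]      # (text, target_lang) -> audio


@dataclass
class VoicePipeline:
    """Cascaded voice translation: recognize, then translate, then synthesize."""
    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer

    def run(self, audio: bytes, source_lang: str, target_lang: str) -> bytes:
        text = self.recognize(audio, source_lang)                     # step 1: speech to text
        translated = self.translate(text, source_lang, target_lang)  # step 2: text translation
        return self.synthesize(translated, target_lang)              # step 3: text to speech


# Toy stand-ins so the sketch runs without any speech or translation service.
def fake_recognize(audio: bytes, source_lang: str) -> str:
    # Pretends the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def fake_translate(text: str, source_lang: str, target_lang: str) -> str:
    # Tiny lookup table; unknown phrases pass through unchanged.
    lookup = {("hello", "en", "de"): "hallo"}
    return lookup.get((text.lower(), source_lang, target_lang), text)


def fake_synthesize(text: str, target_lang: str) -> bytes:
    # Pretends UTF-8 bytes are synthesized audio.
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_recognize, fake_translate, fake_synthesize)
result = pipeline.run(b"hello", "en", "de")
```

In a real system each stand-in would wrap a speech recognition, machine translation, or speech synthesis service, and the stages would operate on streaming audio chunks rather than a complete utterance.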

In blind evaluations commissioned by DeepL and conducted independently by Slator, a language industry research firm, 96% of professional linguists preferred DeepL Voice over the native translation solutions in Google Meet, Microsoft Teams, and Zoom, citing superior fluency and accuracy.