DeepL introduces DeepL Voice, real-time, text-based translation from speech and video
DeepL has made a name for itself with online text translation that it says is subtler and more accurate than services from the likes of Google — a pitch that has landed the German startup a $2 billion valuation and more than 100,000 paying customers.
Now, as the hype around AI services continues to grow, DeepL is adding another mode to its platform: audio. With DeepL Voice, users will be able to listen to someone speaking in one language and have it translated into another automatically, in real time.
English, German, Japanese, Korean, Swedish, Dutch, French, Turkish, Polish, Portuguese, Russian, Spanish and Italian are the languages DeepL can “hear” today. Translated subtitles are available for all 33 languages currently supported by DeepL Translator.
Notably, DeepL Voice stops short of delivering its results as audio or video itself: the service is aimed at live, real-time conversations and video conferencing, and the translations come as text, not audio.
In the first of these, translations can be set to appear as ‘mirrors’ on a smartphone – the idea is that you place the phone between you on the table so that each side can read the translated words – or as a transcript that you share with the other person. In video conferencing, translations appear as subtitles.
That could change over time, Jarek Kutylowski, the company’s founder and CEO, said in an interview. This is DeepL’s first voice product, but it’s unlikely to be the last. “[Voice] is where translation will come into play next year,” he added.
There is some evidence to support that statement. Google – one of DeepL’s biggest competitors – has also started adding real-time translated captions to its Meet video conferencing service. And a number of AI startups are building voice translation services, such as the AI voice specialist Eleven Labs (Eleven Labs Dubbing) and Panjaya, which creates translations using deepfaked voices and video adjusted to match the new audio.
The latter uses Eleven Labs’ API, and according to Kutylowski, Eleven Labs itself uses technology from DeepL to power its translation service.
Audio output isn’t the only feature yet to be implemented.
There is also no API for the voice product yet. DeepL’s core business is B2B, and Kutylowski said the company works with partners and customers directly.
Nor is there a wide choice of integrations: the only video calling service that supports DeepL subtitles right now is Teams, which “covers most of our customers,” Kutylowski said. There’s no word on if or when Zoom or Google Meet will integrate DeepL Voice.
The product will feel like it has been a long time coming for DeepL users, and not just because we’ve been inundated with a plethora of other AI voice services aimed at translation. Kutylowski said this has been the No. 1 request from customers since 2017, the year DeepL launched.
Part of the reason for the wait is that DeepL has taken a very deliberate approach to building its product. Unlike many others in the world of AI applications, which rely on and adapt other companies’ large language models (LLMs), DeepL aims to build its service from the ground up. In July, the company released a new translation-focused LLM that it says surpasses GPT-4, as well as models from Google and Microsoft, not least because its primary purpose is translation. The company has also continued to improve the quality of its written output and its glossary.
Likewise, one of DeepL Voice’s unique selling points is that it works in real time. That matters because many “AI translation” services on the market actually work with a delay, making them difficult or impossible to use in live situations – the very use case DeepL is targeting.
Kutylowski pointed out that this is another reason the new voice product focuses on text-based translation: text can be computed and produced very quickly, while the processing and AI architecture still has some way to go before translated audio and video can be produced just as instantly.
Video conferencing and meetings may be the obvious uses for DeepL Voice, but Kutylowski noted that another big use case the company sees is in the service industry, where front-line workers at, say, restaurants could use the service to communicate with customers more easily.
That may be helpful, but it also highlights one of the thornier aspects of the service. In a world where we’re all suddenly more aware of data protection and concerned about how new services and platforms pick up private or proprietary information, it remains to be seen how comfortable people will be with having their voices captured and used in this way.
Kutylowski emphasized that although the audio travels to DeepL’s servers for translation (the processing does not happen on-device), nothing is stored by its systems or used to train its LLMs. And DeepL will work with its customers to ensure that they are not in breach of GDPR or other data protection regulations.