Real-Time AI-Based Translation/Transcription Coming to AV via Start-Up Gladia; Expect Acquisition

Gladia, an AI transcription and audio intelligence provider, has completed a $16 million Series A funding round. The company will use the funding to develop an end-to-end audio infrastructure – starting with a new real-time audio transcription and analytics engine – enabling voice-first platforms to deliver more value to their users across borders with AI. Founded in 2022, Gladia has now raised a total of $20.3 million.

“I founded Gladia for a very personal reason — I was frustrated that existing audio transcription services were not able to understand my French accent,” explained Jean-Louis Quéguiner, CEO and co-founder, Gladia. “Our international team and customers often switch between languages during meetings, but finding a transcription solution that can handle different languages and accents simultaneously was impossible.”

Most speech recognition models today are trained predominantly on English audio data and may therefore be inherently biased, so Gladia prioritized building the first real-time product that is “truly multilingual.” The new fine-tuned engine delivers advanced real-time transcription in over 100 languages, along with enhanced support for accents and the ability to adapt to different languages on the fly. Gladia’s new engine is unique in its ability to extract insights from a call, such as the caller’s sentiment, key information and a conversation summary, in real time. This means it takes less than a second to generate both the transcript and the insights from a call or meeting using Gladia. Building an accurate, low-latency, multilingual engine in-house is a complex and resource-intensive task. It requires extensive expertise in language understanding and real-time data handling, along with continuous optimization and maintenance. Real-time models also require more computing power and may struggle to produce accurate output immediately because they have limited context.

Gladia claims the real-time speech-to-text engine delivers latency under 300 milliseconds without compromising accuracy, regardless of the language, geography, or tech stack used. If this is accurate, it is astounding!

The company’s first async transcription and audio intelligence API launched in June 2023 and was based on a proprietary version of Whisper ASR. It rapidly gained traction in the enterprise market, particularly with meeting recorders and note-taking assistants. The API has now been adopted by over 600 customers around the world, including Attention, Circleback, Method Financial, Recall, Sana and VEED.IO, and has more than 70,000 users.

Gladia says it will use the new capital to advance its R&D efforts, soon bring to market a one-stop AI toolkit for audio, and expand its product offering with additional à la carte models, including large language models (LLMs) and retrieval-augmented generation (RAG). With several design partners in the contact-center-as-a-service (CCaaS) segment, the company is currently piloting an agent-assist solution powered by its real-time AI engine. Additionally, Gladia will continue to expand its talent base as it prepares for international expansion.

This will be fun to watch — I expect that it will be acquired by Legrand, Zoom or Microsoft within a year.