According to him, this changed the entire paradigm of language technology. Whereas previously a separate model had to be created for each task, now one good language model is enough.
Language technology is a rapidly developing field that directly impacts the survival of smaller languages, like Estonian, in the digital world. TalTech’s Laboratory of Language Technology holds a leading position in the field of language models in Estonia and develops both scientific and practical applications.
Digital survival starts with shared data
Earlier this year, news that Estonian language data had found its way into the training materials of major international AI models sparked debate in society. “We didn’t give the data directly to Meta – it was already publicly available,” clarified Alumäe.
In his view, sharing Estonian language data is in our own interest, because if we want large language technology systems to understand Estonian, we must show them the way to access this data.
According to Alumäe, Estonian cannot survive in the technological future without a clear principle that quality language data must be available in Estonian for both researchers and businesses. “If artificial intelligence doesn’t understand Estonian, the language will gradually disappear in real life too. We must do everything we can to ensure that Estonian is visible and accessible in future-shaping systems.”
“We must do everything we can to ensure that Estonian is visible and accessible in future-shaping systems.”
Estonian language engineers: TalTech’s team fine-tunes models for work
Alumäe distinguishes between commercial models (e.g., OpenAI’s ChatGPT or Google Gemini) and open-source models, as the latter allow for customization and independent application. The work in language technology focuses specifically on open models. “Our goal is not to reinvent the wheel, but to fine-tune existing good models to work for the Estonian language.”
This work is primarily valuable for companies and developers who want to integrate language models into their systems and applications. Whether it’s chatbots, automated customer service, text summarisation, or specific task solutions – when data cannot be shared externally or control over costs and security is desired, open-source models that can be adapted as needed are the right tools.
Although language technology may seem far removed from the business world at first glance, solutions created in TalTech’s lab have already been widely used – including outside academic circles.
TalTech offers the popular service Tekstiks.ee – an online platform where audio recordings (e.g., interviews or meetings) can be uploaded and automatically transcribed. Journalists, social and humanities researchers, and companies that want to quickly create memos or protocols from meetings use this solution.
In addition, the lab’s speech recognition models are used in the parliament, courts, and are also used to create subtitles for ETV live broadcasts. “Where stenographers used to write out speech from scratch, now the focus of the work has shifted to editing automatic transcriptions,” said Alumäe.
Although language technology solutions are becoming increasingly accurate and easier to implement, companies have only just begun to take an interest in specific models. According to Alumäe, companies can approach TalTech mainly with problems requiring a scientific approach. “The lab helps solve complex and novel problems – issues that require a scientific approach and are more complex than regular development work,” said Alumäe.
“The lab helps solve complex and novel problems – issues that require a scientific approach and are more complex than regular development work.”

According to Tanel Alumäe, what has surprised him most in the field of language technology in recent years is how well large language models work. Just recently, such an approach was considered unlikely, but now these models can solve many tasks more efficiently than all previous methods combined. Photo: Unsplash
World-Class solutions
In addition to practical applications, TalTech’s Laboratory of Language Technology is known for its high-level research. Recently, they won an international competition in which language and speech recognition systems were developed in 150 languages. “Our success was due to our ability to identify spoken language with great accuracy, even in cases of strong accents or dialectal speech,” described Alumäe.
The lab is also developing a system to detect emotions in speech acts, and TalTech’s language technologists recently succeeded in an international competition for early Alzheimer’s detection, where they earned third place out of nearly forty top groups.
These achievements demonstrate that TalTech’s small, yet versatile research group is developing practical, adaptive, state-of-the-art technology, placing the lab among the best in the world.
When asked by Tanel Alumäe what has surprised him most in language technology in recent years, he does not hesitate for a moment: “What surprised me was that large language models work so well at all. Just recently, it was believed that such an approach would never work. Now, however, these models solve many tasks better than all the old solutions combined.”
“What surprised me was that large language models work so well at all. Just recently, it was believed that such an approach would never work. Now, however, these models solve many tasks better than all the old solutions combined.”
Looking to the future, Alumäe finds the multimodal approach particularly promising – language models that not only learn from text but also acquire knowledge from audio, images, and video. According to him, this would mean a much deeper understanding of the world