Performance Testing of Machine Translation Engines Based on Different Domain-Specific Texts
Abstract
The rapid development and spread of digital technologies – most notably artificial intelligence – have brought about a profound transformation in the translation industry, fundamentally reshaping translation workflows and professional practices. One of the most significant advances has been the rise of neural machine translation (NMT) engines, which have vastly improved upon earlier rule-based and statistical systems. These modern systems are now capable of producing target-language output that demonstrates markedly better performance in terms of grammar, adherence to language norms, and overall fluency and readability, even in the context of the Hungarian language (Prószéky 2021, Laki and Yang 2022a). Given these significant improvements, it is not surprising that general neural translation engines have become an integral part of contemporary translation workflows, not only within professional language services but also among everyday users (Sulyok 2023, Seresi 2025, ELIS 2025). These tools offer increased efficiency and accessibility, reducing the time and cost associated with human translation. However, despite their widespread use, questions remain regarding the suitability and effectiveness of these translation engines across different domains. It is particularly unclear which systems perform best when translating specialised texts that require domain-specific terminology, stylistic conventions, and a deeper understanding of content.
This exploratory study seeks to examine whether the performance of widely used general-purpose NMT engines varies depending on the domain of the source text. Specifically, the research investigates the English-to-Hungarian output of four prominent neural translation systems (Google Translate, DeepL, eTranslation, and Globalese) when applied to texts from three distinct subject areas: social sciences, economics, and information technology. To assess the quality of the machine-generated translations, the study employs the MQM Core error typology (Lommel 2018), a widely accepted framework for evaluating translation quality across multiple linguistic dimensions. By identifying and analysing the types and frequency of errors in the translated output, the research aims to reveal patterns of strengths and weaknesses in each engine’s handling of specialised, domain-specific content. The findings are intended to provide guidance for professional translators and language service providers: by identifying which engines perform most reliably in specific domains and subject areas, translators can make more informed choices when integrating machine translation into their workflows. Moreover, the present study highlights the persistent challenges in specialised translation that remain difficult for NMT systems to resolve, such as the handling of technical terminology, coherence, and context sensitivity.
Copyright (c) 2025 Lejla Borsiczki, Edina Robin

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.





