TY - GEN
T1 - TermEval: An Automatic Metric for Evaluating Terminology Translation in MT
AU - Haque, Rejwanul
AU - Hasanuzzaman, Mohammed
AU - Way, Andy
N1 - Publisher Copyright:
© 2023, Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
AB - Terminology translation plays a crucial role in domain-specific machine translation (MT). Preservation of domain knowledge from source to target is arguably the most important factor for clients in the translation industry, especially in critical domains such as medical, transportation, military, legal and aerospace. Despite its importance to the translation industry, the evaluation of terminology translation has been a relatively unexamined area of MT research. Term translation quality in MT is usually measured with the help of domain experts, either in academia or industry. To the best of our knowledge, there is as yet no publicly available solution for automatically evaluating terminology translation in MT. In particular, manual intervention is often needed to evaluate terminology translation in MT, which is by nature a time-consuming and highly expensive task. Indeed, this is impractical in an industrial setting, where customised MT systems often need to be updated for many reasons (e.g. the availability of new training data or improved MT techniques). Hence, there is a genuine need for a faster and less expensive solution to this problem, one that could help end-users instantly identify term translation problems in MT. In this study, we propose an automatic evaluation metric, TermEval, for evaluating terminology translation in MT. To the best of our knowledge, no gold-standard dataset is available for measuring terminology translation quality in MT. In the absence of such a test set, we semi-automatically create a gold-standard dataset from an English–Hindi judicial-domain parallel corpus. We train state-of-the-art phrase-based SMT (PB-SMT) and neural MT (NMT) models in two translation directions, English-to-Hindi and Hindi-to-English, and use TermEval to evaluate their performance on terminology translation over the created gold-standard test set. To measure the correlation between TermEval scores and human judgements, the translation of each source term in the gold-standard test set is validated by a human evaluator. The high correlation between TermEval and human judgements demonstrates the effectiveness of the proposed terminology translation evaluation metric. We also carry out a comprehensive manual evaluation of terminology translation and present our observations.
KW - Machine translation
KW - Neural machine translation
KW - Terminology translation
UR - http://www.scopus.com/inward/record.url?scp=85149985564&partnerID=8YFLogxK
DO - 10.1007/978-3-031-24337-0_35
M3 - Conference contribution
AN - SCOPUS:85149985564
SN - 9783031243363
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 495
EP - 520
BT - Computational Linguistics and Intelligent Text Processing - 20th International Conference, CICLing 2019, Revised Selected Papers
A2 - Gelbukh, Alexander
PB - Springer
T2 - 20th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2019
Y2 - 7 April 2019 through 13 April 2019
ER -