[1] S. Ranathunga, E.-S. A. Lee, M. P. Skenduli, R. Shekhar, M. Alam, and R. Kaur, “Neural machine translation for low-resource languages: A survey,” 2021.

[2] S. Thillainathan, S. Ranathunga, and S. Jayasena, “Fine-tuning self-supervised multilingual sequence-to-sequence models for extremely low-resource nmt,” in 2021 Moratuwa Engineering Research Conference (MERCon). IEEE, 2021, pp. 432–437.

[3] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” arXiv preprint arXiv:1910.13461, 2019.

[4] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens et al., “Moses: Open source toolkit for statistical machine translation,” in Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions, 2007, pp. 177–180.

[5] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” arXiv preprint arXiv:1409.3215, 2014.

[6] Y. Liu, J. Gu, N. Goyal, X. Li, S. Edunov, M. Ghazvininejad, M. Lewis, and L. Zettlemoyer, “Multilingual denoising pre-training for neural machine translation,” Transactions of the Association for Computational Linguistics, vol. 8, pp. 726–742, 2020.

[7] Y. Tang, C. Tran, X. Li, P.-J. Chen, N. Goyal, V. Chaudhary, J. Gu, and A. Fan, “Multilingual translation with extensible multilingual pretraining and finetuning,” arXiv preprint arXiv:2008.00401, 2020.

[8] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, and C. Raffel, “mt5: A massively multilingual pre-trained text-to-text transformer,” arXiv preprint arXiv:2010.11934, 2020.

[9] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.

[10] R. Dabre, C. Chu, and A. Kunchukuttan, “A survey of multilingual neural machine translation,” ACM Computing Surveys (CSUR), vol. 53, no. 5, pp. 1–38, 2020.

[11] P. Tennage, P. Sandaruwan, M. Thilakarathne, A. Herath, S. Ranathunga, S. Jayasena, and G. Dias, “Neural machine translation for sinhala and tamil languages,” in 2017 International Conference on Asian Language Processing (IALP). IEEE, 2017, pp. 189–192.

[12] P. Tennage, P. Sandaruwan, M. Thilakarathne, A. Herath, and S. Ranathunga, “Handling rare word problem using synthetic training data for sinhala and tamil neural machine translation,” in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018.

[13] P. Tennage, A. Herath, M. Thilakarathne, P. Sandaruwan, and S. Ranathunga, “Transliteration and byte pair encoding to improve tamil to sinhala neural machine translation,” in 2018 Moratuwa Engineering Research Conference (MERCon). IEEE, 2018, pp. 390–395.

[14] A. Pramodya, R. Pushpananda, and R. Weerasinghe, “A comparison of transformer, recurrent neural networks and smt in tamil to sinhala mt,” in 2020 20th International Conference on Advances in ICT for Emerging Regions (ICTer). IEEE, 2020, pp. 155–160.

[15] T. Fonseka, R. Naranpanawa, R. Perera, and U. Thayasivam, “English to sinhala neural machine translation,” in 2020 International Conference on Asian Language Processing (IALP). IEEE, 2020, pp. 305–309.

[16] R. Naranpanawa, R. Perera, T. Fonseka, and U. Thayasivam, “Analyzing subword techniques to improve english to sinhala neural machine translation,” International Journal of Asian Language Processing, vol. 30, no. 04, p. 2050017, 2020.

[17] B. Janarthanasarma and T. Uthayasanker, “A survey on neural machine translation for english-tamil language pair.”

[18] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv preprint arXiv:1706.03762, 2017.

[19] N. Arivazhagan, A. Bapna, O. Firat, D. Lepikhin, M. Johnson, M. Krikun, M. X. Chen, Y. Cao, G. Foster, C. Cherry et al., “Massively multilingual neural machine translation in the wild: Findings and challenges,” arXiv preprint arXiv:1907.05019, 2019.

[20] A. Arukgoda, A. Weerasinghe, and R. Pushpananda, “Improving sinhala-tamil translation through deep learning techniques,” in NL4AI@AI*IA, 2019.

[21] L. Nissanka, B. Pushpananda, and A. Weerasinghe, “Exploring neural machine translation for sinhala-tamil languages pair,” in 2020 20th International Conference on Advances in ICT for Emerging Regions (ICTer). IEEE, 2020, pp. 202–207.

[22] N. Kalchbrenner and P. Blunsom, “Recurrent continuous translation models,” in Proceedings of the 2013 conference on empirical methods in natural language processing, 2013, pp. 1700–1709.

[23] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.

[24] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[25] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using rnn encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.

[26] K. Epaliyana, S. Ranathunga, and S. Jayasena, “Improving back-translation with iterative filtering and data selection for sinhala-english nmt,” in 2021 Moratuwa Engineering Research Conference (MERCon). IEEE, 2021, pp. 438–443.

[27] H. Choudhary, A. K. Pathak, R. R. Saha, and P. Kumaraguru, “Neural machine translation for english-tamil,” in Proceedings of the third conference on machine translation: shared task papers, 2018, pp. 770–775.

[28] T. Banerjee, A. Kunchukuttan, and P. Bhattacharyya, “Multilingual indian language translation system at wat 2018: Many-to-one phrase-based smt,” in Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation, 2018.

[29] R. Aharoni, M. Johnson, and O. Firat, “Massively multilingual neural machine translation,” arXiv preprint arXiv:1903.00089, 2019.

[30] D. Dong, H. Wu, W. He, D. Yu, and H. Wang, “Multi-task learning for multiple language translation,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015, pp. 1723–1732.

[31] M.-T. Luong, Q. V. Le, I. Sutskever, O. Vinyals, and L. Kaiser, “Multi-task sequence to sequence learning,” arXiv preprint arXiv:1511.06114, 2015.

[32] M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen, N. Thorat, F. Viégas, M. Wattenberg, G. Corrado et al., “Google’s multilingual neural machine translation system: Enabling zero-shot translation,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 339–351, 2017.

[33] B. Zoph and K. Knight, “Multi-source neural translation,” arXiv preprint arXiv:1601.00710, 2016.

[34] O. Firat, K. Cho, and Y. Bengio, “Multi-way, multilingual neural machine translation with a shared attention mechanism,” arXiv preprint arXiv:1601.01073, 2016.

[35] T.-L. Ha, J. Niehues, and A. Waibel, “Toward multilingual neural machine translation with universal encoder and decoder,” arXiv preprint arXiv:1611.04798, 2016.

[36] O. Firat, B. Sankaran, Y. Al-Onaizan, F. T. Y. Vural, and K. Cho, “Zero-resource translation with multi-lingual neural machine translation,” arXiv preprint arXiv:1606.04164, 2016.

[37] Y. Lu, P. Keung, F. Ladhak, V. Bhardwaj, S. Zhang, and J. Sun, “A neural interlingua for multilingual machine translation,” arXiv preprint arXiv:1804.08198, 2018.

[38] S. M. Lakew, M. Federico, M. Negri, and M. Turchi, “Multilingual neural machine translation for zero-resource languages,” arXiv preprint arXiv:1909.07342, 2019.

[39] G. Blackwood, M. Ballesteros, and T. Ward, “Multilingual neural machine translation with task-specific attention,” arXiv preprint arXiv:1806.03280, 2018.

[40] Y. Wang, J. Zhang, F. Zhai, J. Xu, and C. Zong, “Three strategies to improve one-to-many multilingual translation,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 2955–2960.

[41] V. Goyal, S. Kumar, and D. M. Sharma, “Efficient neural machine translation for low-resource languages via exploiting related languages,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, 2020, pp. 162–168.

[42] S. M. Lakew, A. Erofeeva, M. Negri, M. Federico, and M. Turchi, “Transfer learning in multilingual neural machine translation with dynamic vocabulary,” arXiv preprint arXiv:1811.01137, 2018.

[43] S. M. Lakew, M. Cettolo, and M. Federico, “A comparison of transformer and recurrent neural networks on multilingual neural machine translation,” arXiv preprint arXiv:1806.06957, 2018.

[44] B. Zoph, D. Yuret, J. May, and K. Knight, “Transfer learning for low-resource neural machine translation,” arXiv preprint arXiv:1604.02201, 2016.

[45] R. Dabre, T. Nakagawa, and H. Kazawa, “An empirical study of language relatedness for transfer learning in neural machine translation,” in Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation, 2017, pp. 282–286.

[46] T. Q. Nguyen and D. Chiang, “Transfer learning across low-resource, related languages for neural machine translation,” arXiv preprint arXiv:1708.09803, 2017.

[47] G. Neubig and J. Hu, “Rapid adaptation of neural machine translation to new languages,” arXiv preprint arXiv:1808.04189, 2018.

[48] A. F. Aji, N. Bogoychev, K. Heafield, and R. Sennrich, “In neural machine translation, what does transfer learning transfer?” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 7701–7710.

[49] B. Ji, Z. Zhang, X. Duan, M. Zhang, B. Chen, and W. Luo, “Cross-lingual pre-training based transfer for zero-shot neural machine translation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, 2020, pp. 115–122.

[50] T. Kocmi and O. Bojar, “Efficiently reusing old models across languages via transfer learning,” arXiv preprint arXiv:1909.10955, 2019.

[51] M. Maimaiti, Y. Liu, H. Luan, and M. Sun, “Multi-round transfer learning for low-resource nmt using multiple high-resource languages,” ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), vol. 18, no. 4, pp. 1–26, 2019.

[52] Y. Kim, Y. Gao, and H. Ney, “Effective cross-lingual transfer of neural machine translation models without shared vocabularies,” arXiv preprint arXiv:1905.05475, 2019.

[53] M. Maimaiti, Y. Liu, H. Luan, and M. Sun, “Enriching the transfer learning with pre-trained lexicon embedding for low-resource neural machine translation,” Tsinghua Science and Technology, p. 1, 2020.

[54] A. Imankulova, R. Dabre, A. Fujita, and K. Imamura, “Exploiting out-of-domain parallel data through multilingual transfer learning for low-resource neural machine translation,” arXiv preprint arXiv:1907.03060, 2019.

[55] C. Chu, R. Dabre, and S. Kurohashi, “An empirical comparison of domain adaptation methods for neural machine translation,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2017, pp. 385–391.

[56] G. Luo, Y. Yang, Y. Yuan, Z. Chen, and A. Ainiwaer, “Hierarchical transfer learning architecture for low-resource neural machine translation,” IEEE Access, vol. 7, pp. 154157–154166, 2019.

[57] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” 2018.

[58] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.

[59] S. Clinchant, K. W. Jung, and V. Nikoulina, “On the use of BERT for neural machine translation,” in Proceedings of the 3rd Workshop on Neural Generation and Translation, 2019, pp. 108–117.

[60] J. Yang, M. Wang, H. Zhou, C. Zhao, W. Zhang, Y. Yu, and L. Li, “Towards making the most of bert in neural machine translation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, 2020, pp. 9378–9385.

[61] X. Qiu, T. Sun, Y. Xu, Y. Shao, N. Dai, and X. Huang, “Pre-trained models for natural language processing: A survey,” Science China Technological Sciences, pp. 1–26, 2020.

[62] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” arXiv preprint arXiv:1910.10683, 2019.

[63] Z. Chi, L. Dong, S. Ma, S. Huang, X.-L. Mao, H. Huang, and F. Wei, “mt6: Multilingual pretrained text-to-text transformer with translation pairs,” arXiv preprint arXiv:2104.08692, 2021.

[64] E.-S. A. Lee, S. Thillainathan, S. Nayak, S. Ranathunga, D. I. Adelani, R. Su, and A. D. McCarthy, “Pre-trained multilingual sequence-to-sequence models: A hope for low-resource language translation?” arXiv preprint arXiv:2203.08850, 2022.

[65] F. Guzmán, P.-J. Chen, M. Ott, J. Pino, G. Lample, P. Koehn, V. Chaudhary, and M. Ranzato, “The flores evaluation datasets for low-resource machine translation: Nepali-english and sinhala-english,” arXiv preprint arXiv:1902.01382, 2019.

[66] L. Madaan, S. Sharma, and P. Singla, “Transfer learning for related languages: Submissions to the wmt20 similar language translation task,” in Proceedings of the Fifth Conference on Machine Translation, 2020, pp. 402–408.

[67] S. Cahyawijaya, G. I. Winata, B. Wilie, K. Vincentio, X. Li, A. Kuncoro, S. Ruder, Z. Y. Lim, S. Bahar, M. L. Khodra et al., “Indonlg: Benchmark and resources for evaluating indonesian natural language generation,” arXiv preprint arXiv:2104.08200, 2021.

[68] A. Bapna, N. Arivazhagan, and O. Firat, “Simple, scalable adaptation for neural machine translation,” arXiv preprint arXiv:1909.08478, 2019.

[69] Z. Liu, G. I. Winata, and P. Fung, “Continual mixed-language pre-training for extremely low-resource neural machine translation,” arXiv preprint arXiv:2105.03953, 2021.

[70] P.-J. Chen, A. Lee, C. Wang, N. Goyal, A. Fan, M. Williamson, and J. Gu, “Facebook ai’s wmt20 news translation task submission,” arXiv preprint arXiv:2011.08298, 2020.

[71] R. H. Susanto, D. Wang, S. Yadav, M. Jain, and O. Htun, “Rakuten’s participation in wat 2021: Examining the effectiveness of pre-trained models for multilingual and multimodal machine translation,” in Proceedings of the 8th Workshop on Asian Translation (WAT2021), 2021, pp. 96–105.

[72] I. Beltagy, K. Lo, and A. Cohan, “Scibert: A pretrained language model for scientific text,” arXiv preprint arXiv:1903.10676, 2019.

[73] E. Alsentzer, J. R. Murphy, W. Boag, W.-H. Weng, D. Jin, T. Naumann, and M. McDermott, “Publicly available clinical bert embeddings,” arXiv preprint arXiv:1904.03323, 2019.

[74] S. Gururangan, A. Marasović, S. Swayamdipta, K. Lo, I. Beltagy, D. Downey, and N. A. Smith, “Don’t stop pretraining: adapt language models to domains and tasks,” arXiv preprint arXiv:2004.10964, 2020.

[75] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, and J. Kang, “Biobert: a pre-trained biomedical language representation model for biomedical text mining,” Bioinformatics, vol. 36, no. 4, pp. 1234–1240, 2020.

[76] R. Zhang, R. G. Reddy, M. A. Sultan, V. Castelli, A. Ferritto, R. Florian, E. S. Kayi, S. Roukos, A. Sil, and T. Ward, “Multi-stage pre-training for low-resource domain adaptation,” arXiv preprint arXiv:2010.05904, 2020.

[77] L. K. Hansen and P. Salamon, “Neural network ensembles,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 10, pp. 993–1001, 1990.

[78] H. Chen, S. Lundberg, and S.-I. Lee, “Checkpoint ensembles: Ensemble methods from a single training process,” arXiv preprint arXiv:1710.03282, 2017.

[79] R. Sennrich, B. Haddow, and A. Birch, “Edinburgh neural machine translation systems for wmt 16,” arXiv preprint arXiv:1606.02891, 2016.

[80] R. Sennrich, A. Birch, A. Currey, U. Germann, B. Haddow, K. Heafield, A. V. M. Barone, and P. Williams, “The university of edinburgh’s neural mt systems for wmt17,” arXiv preprint arXiv:1708.00726, 2017.

[81] K. Imamura and E. Sumita, “Ensemble and reranking: Using multiple models in the nict-2 neural machine translation system at wat2017,” in Proceedings of the 4th Workshop on Asian Translation (WAT2017), 2017, pp. 127–134.

[82] A. Fernando, S. Ranathunga, and G. Dias, “Data augmentation and terminology integration for domain-specific sinhala-english-tamil statistical machine translation,” arXiv preprint arXiv:2011.02821, 2020.

[83] M. Rajitha, L. Piyarathna, M. Nayanajith, and S. Surangika, “Sinhala and english document alignment using statistical machine translation,” in 2020 20th International Conference on Advances in ICT for Emerging Regions (ICTer). IEEE, 2020, pp. 29–34.

[84] F. Farhath, S. Ranathunga, S. Jayasena, and G. Dias, “Integration of bilingual lists for domain-specific statistical machine translation for sinhala-tamil,” in 2018 Moratuwa Engineering Research Conference (MERCon). IEEE, 2018, pp. 538–543.

[85] R. Sennrich and B. Zhang, “Revisiting low-resource neural machine translation: A case study,” arXiv preprint arXiv:1905.11901, 2019.

[86] A. Fan, S. Bhosale, H. Schwenk, Z. Ma, A. El-Kishky, S. Goyal, M. Baines, O. Celebi, G. Wenzek, V. Chaudhary et al., “Beyond english-centric multilingual machine translation,” Journal of Machine Learning Research, vol. 22, no. 107, pp. 1–48, 2021.

[87] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: a method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002, pp. 311–318.

[88] R. Futrell, K. Mahowald, and E. Gibson, “Quantifying word order freedom in dependency corpora,” in Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), 2015, pp. 91–100.

[89] M. Anand Kumar, V. Dhanalakshmi, K. Soman, and S. Rajendran, “A sequence labeling approach to morphological analyzer for tamil language,” (IJCSE) International Journal on Computer Science and Engineering, vol. 2, no. 06, pp. 1944–195, 2010.