| mySentence: Sentence Segmentation for Myanmar Language using Neural Machine Translation Approach | |
|---|---|
| DOI | |
| Creator | Thura Aung |
| Title | mySentence: Sentence Segmentation for Myanmar Language using Neural Machine Translation Approach |
| Contributor | Ye Kyaw Thu, Zar Zar Hlaing |
| Publisher | Sirindhorn International Institute of Technology, Bangkadi Campus (SIIT-BKD) |
| Publication Year | 2566 BE (2023 CE) |
| Journal Title | Journal of Intelligent Informatics and Smart Technology |
| Journal Vol. | 9 |
| Page no. | 1-9 |
| Keyword | Sentence segmentation, Neural machine translation, Sequence tagging |
| URL Website | https://ph05.tci-thaijo.org/index.php/JIIST |
| Website title | Journal of Intelligent Informatics and Smart Technology |
| ISSN | 2586-9167 |
| Abstract | A sentence is an independent unit: a string of complete words that carries the meaningful information of a text. In informal Myanmar language, the variety on which most NLP applications such as Automatic Speech Recognition (ASR) are used, there is no predefined rule for marking the end of a sentence. In this paper, we contribute the first corpus for Myanmar sentence segmentation and present the first systematic study of the task, using machine-learning-based sequence tagging as the baseline alongside a Neural Machine Translation (NMT) approach. Before conducting the experiments, we prepared two types of data: one containing only sentences and the other containing both sentences and paragraphs. We trained each model on both types of data and evaluated it on both types of test data. Accuracy was measured with Bilingual Evaluation Understudy (BLEU) and character n-gram F-score (chrF++) scores, and Word Error Rate (WER) was used for detailed error analysis. The experimental results show that the sequence-to-sequence NMT model trained on both sentence-level and paragraph-level data achieved the best BLEU score (99.78) and higher chrF++ scores (+18.4 and +16.7) than the best machine learning baselines on both types of test data. |