dc.description.abstract |
This paper investigates code-switching properties of conversational speech from bilingual English-Malay Singaporean speakers, using data from the National Speech Corpus (NSC), and provides baseline language models for various combinations of English-Malay monolingual and code-switching transcripts. Specifically, the study analyzes the correlation between code-switching patterns and (i) trigger words and code-switched word pairs at code-switching points, and (ii) word-wise and pairwise POS tags. Our analysis shows that a certain set of words frequently “triggers” code-switching behavior, and that speakers tend to code-switch more frequently around nouns. Additionally, we provide perplexities for language models built on the selected datasets. These perplexities could serve as baselines for future language models for Singaporean speech, especially English-Malay code-switched speech. |
en_US |