An entropy-based approach for comparing prosodic properties in tonal and pitch accent languages

Raymond W. M. Ng, The Chinese University of Hong Kong
Cheung-Chi Leung, Institute for Infocomm Research
Tan Lee, The Chinese University of Hong Kong
Bin Ma, Institute for Infocomm Research
Haizhou Li, Institute for Infocomm Research, Singapore and Department of Computer Science and Statistics, University of Eastern Finland, Finland

Our previous work showed that tonal and pitch accent languages exhibit strong prosodic characteristics, which lead to better performance in detecting these languages. This study uses an entropy-based approach to analyze prosodic features for effective modeling. Seventeen tonal or pitch accent languages, including a number of under-resourced African languages, are studied. Prosodic trigrams are rated as strong, moderate or weak according to the language-specific information they contain. This three-level rating helps identify the most efficient prosodic trigrams for language recognition. The feature inventory is reduced by 80\% with an acceptable degradation in performance. The important prosodic attributes identified by the analysis agree well with known linguistic facts about the individual languages. With this analysis method, features can be selected from an expanded prosodic feature inventory to pursue better performance in detecting non-tonal languages.