A database of Polish dictionary with words statistics for automatic speech recognition

Mariusz Mąsior, Bartosz Ziółko, Dawid Skurzok, Tomasz Jadczyk

Abstract


A dictionary of Polish implemented as a database for automatic speech recognition is presented. The dictionary improves recognition through language modelling based on statistics stored in the database. The data currently held in the database are also described.

Keywords


speech recognition; ASR; Polish dictionary; text statistics

Full Text:

PDF (in Polish)

DOI: http://dx.doi.org/10.21936/si2011_v32.n2B.314