Ebru Arısoy, Levent Arslan, Murat Saraçlar
We are working on the design of a Turkish dictation system. Dictation
is one of the most challenging areas in automatic speech recognition.
There is a large demand for speech-to-text systems because speaking
is faster than typing in most languages. However, most dictation
systems today do not reach the desired recognition rates, since the
vocabulary can be huge for any given language. Turkish is a
particularly challenging language for speech recognition: it is an
agglutinative language with free word order. These characteristics
lead to vocabulary explosion and make the N-gram language models used
in speech recognition more complex. To alleviate this problem, we
first propose a task-specific Radiology Dictation System.
Using words as recognition units, we achieve 87.06% recognition
performance with a small vocabulary in a speaker-independent system.
Second, we try a large-vocabulary dictation system, Dictation for
Newspaper Content. Here we faced the problems caused by the
agglutinative nature of the language. Therefore, rather than words,
we are searching for new recognition units that cover most of the
language and achieve better recognition performance.
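The vocabulary explosion mentioned above can be illustrated with a toy sketch. The suffix slots below are a hand-picked assumption for the single noun "ev" (house) and are not taken from our system; real Turkish morphology has many more slots and requires vowel harmony, which this sketch ignores.

```python
from itertools import product

# Toy suffix slots for the noun "ev" (house). These lists are illustrative
# assumptions: they are only valid for this stem and ignore vowel harmony.
STEM = "ev"
PLURAL = ["", "ler"]           # singular, plural
POSSESSIVE = ["", "im", "in"]  # none, "my", "your"
CASE = ["", "de", "den"]       # none, locative, ablative

def surface_forms(stem):
    """Concatenate one choice from each suffix slot, in slot order."""
    return [stem + p + s + c for p, s, c in product(PLURAL, POSSESSIVE, CASE)]

forms = surface_forms(STEM)
print(len(forms))  # 2 * 3 * 3 = 18 word forms from a single stem
print(forms)
```

Even three small slots turn one stem into 18 distinct vocabulary entries for a word-based recognizer, whereas a unit inventory of stems and suffixes would need only the stem plus five suffixes.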
This project is supported by the SIMILAR Network of Excellence within
the EU's 6th Framework Programme, WP 9.
Radiology Dictation System:
One common example of task-specific dictation systems is dictation
for radiologists, who are often eyes- and hands-busy at work. In most
hospitals in Turkey, radiologists record their diagnosis of the
patient's X-ray or MRI, and a secretary then converts these
recordings into written form. A dictation system can therefore make
the radiologist's work easier. Despite the agglutinative nature of
Turkish, the specific vocabulary of radiological terminology and the
systematic arrangement of words in sentences make radiology suitable
for dictation applications. In the Turkish radiology dictation
system, the vocabulary can be reduced to only several thousand words,
and the perplexity can be very small. Below is the GUI of our
radiology dictation system. HTK is used for the speech recognition
system. A Radiology Text and Speech Corpus has also been collected.

Newspaper Content Transcription System:
In this research, we focus on the selection of base recognition units
for Large Vocabulary Continuous Speech Recognition (LVCSR)
applications, especially for agglutinative languages. Words are the
usual choice of recognition unit. However, the choice has to be
adapted to the characteristics of the language. For English, words
are a good choice; for agglutinative languages, words fail as
recognition units because of the productive morphology of the
language. Appropriate base recognition units must be long enough, in
terms of acoustic information, to allow a reliable decision, and they
must cover the language with a moderate vocabulary size. Our research
and experiments on different recognition units are for Turkish;
however, the findings can be generalized to other agglutinative
languages such as Finnish and Korean.
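The coverage criterion can be made concrete with a small sketch. The function name `coverage`, the toy sentences, and the hand-made stem/ending segmentation (endings marked with a leading "+") are all assumptions for illustration; they are not data or code from our experiments.

```python
from collections import Counter

def coverage(train_units, test_units, vocab_size):
    """Fraction of test units found among the vocab_size most frequent training units."""
    vocab = {u for u, _ in Counter(train_units).most_common(vocab_size)}
    if not test_units:
        return 0.0
    return sum(u in vocab for u in test_units) / len(test_units)

# Toy data: the same text as whole words and as hand-segmented stem + ending units.
train_words = "evde kaldım evden çıktım evde yemek yedim".split()
test_words = "evde yemek yaptım".split()
train_split = "ev +de kal +dım ev +den çık +tım ev +de yemek ye +dim".split()
test_split = "ev +de yemek yap +tım".split()

word_cov = coverage(train_words, test_words, vocab_size=100)
split_cov = coverage(train_split, test_split, vocab_size=100)
print(f"word units:  coverage {word_cov:.2f}, OOV rate {1 - word_cov:.2f}")
print(f"split units: coverage {split_cov:.2f}, OOV rate {1 - split_cov:.2f}")
```

In this toy case the unseen word "yaptım" is entirely out of vocabulary as a word, while its ending "+tım" is already known from "çıktım", so the split units cover more of the test text.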
First, we try a combined model in which words, stems and endings, and
morphemes are used together as recognition units. This model takes
the advantages of each unit and compensates for the drawbacks by
using all the models together. In this model, the most frequent words
are left as stems, so these words have a better chance of being
recognized correctly. This model solves the problems of a large
number of OOV words and high perplexity; however, no significant
improvement is achieved in recognition performance.
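A minimal sketch of such a combined unit inventory follows. It assumes that the most frequent words are kept unsplit and that every other word is split at the longest matching stem from a stem lexicon. The function `combined_units`, the lexicon, the "+" ending marker, and the frequency cutoff are hypothetical, and the morpheme level of the actual model is left out.

```python
from collections import Counter

def combined_units(tokens, stems, keep_top):
    """Keep the keep_top most frequent words whole; split the rest into stem + ending."""
    frequent = {w for w, _ in Counter(tokens).most_common(keep_top)}
    units = []
    for word in tokens:
        if word in frequent:
            units.append(word)
            continue
        # Longest stem in the (hypothetical) lexicon that is a proper prefix of the word.
        candidates = [s for s in stems if word.startswith(s) and len(s) < len(word)]
        if not candidates:
            units.append(word)  # no known stem: fall back to the whole word
        else:
            stem = max(candidates, key=len)
            units.extend([stem, "+" + word[len(stem):]])
    return units

tokens = "evde evde evde kitaplar evlerde".split()
units = combined_units(tokens, stems={"ev", "kitap"}, keep_top=1)
print(units)
```

Here the frequent word "evde" stays a single unit, while the rarer "kitaplar" and "evlerde" are split, which keeps the vocabulary small without shortening the units that occur most often.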
Previous Research: "Statistical Language Models For Large Vocabulary
Turkish Speech Recognition"
Helin Dutağacı, Levent Arslan
In this project, we have compared four statistical language models
for large vocabulary Turkish speech recognition. Turkish is an
agglutinative language and has productive morphotactics. This
property of Turkish results in vocabulary explosion and misestimation
of N-gram probabilities when designing speech recognition systems.
The solution is to parse the words in order to get smaller base units
that are capable of covering the language with a relatively small
vocabulary size. Three different ways of decomposing words into base
units are described: a morpheme-based model, a stem-ending-based
model and a syllable-based model. These models are compared with the
word-based model with respect to vocabulary size, text coverage,
bigram perplexity and speech recognition performance. We have
constructed a Turkish Text Corpus of 10 million words, containing
various texts collected from the Web. These texts have been parsed
into their morphemes, stems, endings and syllables, and statistics of
these base units have been estimated. Finally, we have performed
speech recognition experiments with models constructed from these
base units.
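Bigram perplexity, one of the comparison criteria above, can be sketched as follows. The add-one smoothing, the `<unk>` token, and the function name `bigram_perplexity` are assumptions chosen to keep the example short; they are not necessarily the estimation method used in the project.

```python
import math
from collections import Counter

UNK = "<unk>"  # placeholder for units never seen in training (an assumption)

def bigram_perplexity(train, test):
    """Perplexity of test under an add-one smoothed bigram model estimated from train."""
    vocab = set(train) | {UNK}
    test = [u if u in vocab else UNK for u in test]
    if len(test) < 2:
        raise ValueError("need at least two test units to form a bigram")
    history = Counter(train[:-1])             # how often each unit starts a bigram
    bigrams = Counter(zip(train, train[1:]))  # counts of adjacent unit pairs
    log_prob = 0.0
    for prev, cur in zip(test, test[1:]):
        p = (bigrams[(prev, cur)] + 1) / (history[prev] + len(vocab))
        log_prob += math.log2(p)
    return 2 ** (-log_prob / (len(test) - 1))

# Toy unit sequence; any base unit (word, morpheme, stem/ending, syllable) can be used.
train = "ev de ev de ev".split()
pp = bigram_perplexity(train, "ev de".split())
print(round(pp, 3))
```

Perplexities computed over different unit inventories are per-unit figures, so comparing a syllable-based model with a word-based one requires normalizing by the number of words rather than the number of units.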
Publications
1. Arisoy, E., Dutagaci, H. and Arslan, L. M., 2006,
"A Unified Language Model for Large Vocabulary Continuous
Speech Recognition of Turkish", Signal Processing, 86(10):2844-2862,
October 2006.
2. Arisoy, E., and Arslan, L. M., 2005, "Turkish Dictation
System for Broadcast News Applications", 13th European Signal
Processing Conference - EUSIPCO 2005, Antalya, Turkey.
3. Arısoy, E., and Arslan, L. M., 2004, "Turkish
Radiology Dictation System", 9th International Conference
Speech and Computer - SPECOM 2004, St. Petersburg, Russia.
4. H. Dutagaci and L.M. Arslan “Comparison
of Statistical Language Models for Turkish Speech Recognition”
in Proceedings of the 7th International Conference On Spoken Language
Processing (ICSLP-2002), Denver, Colorado, USA, Sep. 2002, pp.
729-732.
Publications (in Turkish)
1. Arisoy, E., and Arslan, L. M., 2005, "Türkçe Gazete Haberleri
Dikte Sistemi", SIU 2005 (IEEE 13. Sinyal İşleme ve İletişim
Uygulamaları Konferansı), Kayseri, Turkey.
2. H. Dutagaci and L.M. Arslan “Türkçe
Konuşma Tanıma için İstatistiksel Dil Modelleri”, 2002 Sinyal
İşleme ve İletişim Uygulamaları Kurultayı (SIU-2002), Denizli,
Türkiye, pp. 64-69.
Thesis
1. “Turkish Dictation System for Radiology and Broadcast News
Applications”, M.S. Thesis, by Ebru Arısoy, 2004
2. “Statistical Language Models For Large Vocabulary Turkish Speech
Recognition”, M.S. Thesis, by Helin Dutağacı, 2002