VOS: the Corpus-Based Vietnamese Text-to-Speech System

  • Vo Quang Dieu Ha
  • Nguyen Manh Tuan
  • Cao Xuan Nam
  • Pham Minh Nhut
  • Vu Hai Quan

Abstract

This paper presents a complete specification of the Vietnamese speech synthesis system named VOS (Voice of Southern Vietnam). Because current Vietnamese text-to-speech systems lack naturalness in their synthetic speech output, VOS is based on the unit selection approach, which aims to achieve maximum naturalness. VOS consists of three main parts: a corpus manager, a synthesizer, and a transliteration model. The corpus manager handles automated speech indexing and segmentation for the unit selection performed by the synthesizer, while the transliteration model handles the pronunciation of foreign-language words. A comparative experimental evaluation of VnSpeech, VietVoice, and VOS is conducted using the ITU-T P.85 standard. Results show that VOS outperforms the other two TTS systems.
Published
2010-10-28
Section
Regular Articles