Smaller languages, that is, those spoken by 5,000 people or fewer, are dying at an alarming rate (Krauss 1992). Many are disappearing without ever having been studied acoustically. The methodology discussed in this paper can help build formant-based speech synthesis systems for the documentation and revitalization of these languages. Developing Text-to-Speech (TTS) functionality for smart devices can breathe new life into dying languages (Crystal 2000). In the first tutorial on this topic, Koffi (2020) explained how the Arpabet transcription system can be expanded for use in African languages and beyond. In the present tutorial, the authors lay the foundations for formant-based speech synthesis patterned after Klatt (1980) and Klatt and Klatt (1990). Betine (ISO 639-3: eot), a critically endangered language of Côte d’Ivoire, West Africa, is used to illustrate the processes involved in building a speech synthesis system from the ground up for moribund languages. The steps include constructing a language model, a speaker model, a software model, and an intonation model, then extracting the relevant acoustic-phonetic data and coding them. Ancillary topics such as text normalization, downsampling, and bandwidth calculations are also discussed.
Koffi, Ettien and Petzold, Mark
"A TUTORIAL ON FORMANT-BASED SPEECH SYNTHESIS FOR THE DOCUMENTATION OF CRITICALLY ENDANGERED LANGUAGES,"
Linguistic Portfolios: Vol. 11, Article 3.
Available at: https://repository.stcloudstate.edu/stcloud_ling/vol11/iss1/3
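
As a rough illustration of the formant-based approach named in the abstract, the sketch below implements the second-order digital resonator described in Klatt (1980), in which each formant is defined by a center frequency F and a bandwidth BW. It is not the authors' implementation. The function names, the 10,000 Hz sampling rate, the 120 Hz fundamental frequency, and the formant and bandwidth values in the usage example are assumptions chosen for illustration only; they are not measurements of Betine.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients (A, B, C) of y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / sample_rate_hz  # sampling period in seconds
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    return a, b, c


def resonate(samples, freq_hz, bandwidth_hz, sample_rate_hz):
    """Pass a list of samples through one formant resonator."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz)
    y1 = 0.0  # previous output sample, y[n-1]
    y2 = 0.0  # output sample before that, y[n-2]
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate_hz):
    """Crude voicing source: one unit impulse per pitch period (an assumption)."""
    period = round(sample_rate_hz / f0_hz)
    n = round(duration_s * sample_rate_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize_vowel(formants, f0_hz=120.0, duration_s=0.3, sample_rate_hz=10000):
    """Cascade one resonator per (frequency, bandwidth) pair over the source."""
    signal = impulse_train(f0_hz, duration_s, sample_rate_hz)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate_hz)
    return signal


# Hypothetical (F, BW) pairs in Hz for an open vowel; placeholders, not Betine data.
example_formants = [(700.0, 60.0), (1200.0, 90.0), (2500.0, 150.0)]
vowel = synthesize_vowel(example_formants)
```

The resonators are connected in cascade here, so only the frequency and bandwidth of each formant need to be supplied and the relative formant amplitudes follow from the filter chain. A fuller synthesizer would replace the impulse train with a more realistic glottal source and would vary the formant values over time.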