Learning to control an articulatory synthesizer by imitating real speech

Authors

  • Ian S. Howard
  • Mark A. Huckvale

DOI:

https://doi.org/10.21248/zaspil.40.2005.258

Abstract

The goal of our current project is to build a system that can learn to imitate a version of a spoken utterance using an articulatory speech synthesizer. The approach is informed and inspired by knowledge of early infant speech development. We thus expect our system to reproduce, and exploit the utility of, infant behaviours such as listening, vocal play, babbling and word imitation. We expect the system to develop a relationship between the sound-making capabilities of its vocal tract and the phonetic/phonological structure of imitated utterances. At the heart of our approach is the learning of an inverse model that relates acoustic and motor representations of speech. The acoustic-to-auditory mapping uses an auditory filter bank and a self-organizing phase of learning. The inverse model from auditory to vocal tract control parameters is estimated during a babbling phase, in which the vocal tract is driven in an essentially random manner, much like the babbling phase of speech acquisition in infants. The complete system can be used to imitate simple utterances through a direct mapping from sound to control parameters. Initial results show that this procedure works well for sounds generated by the system's own voice. Further work is needed to build a phonological control level and to achieve better performance with real speech.
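To make the babbling-driven inverse-model idea concrete, the following minimal Python sketch illustrates the pipeline under stated assumptions: the forward function is a toy stand-in for the articulatory synthesizer followed by the auditory filter bank (a fixed random nonlinearity, not the paper's actual synthesizer), the dimensionalities are arbitrary, and plain least-squares regression replaces whatever learning scheme the paper uses for the inverse model.

    import numpy as np

    rng = np.random.default_rng(0)

    N_PARAMS = 6      # vocal tract control parameters (dimensionality assumed)
    N_CHANNELS = 20   # auditory filter-bank channels (assumed)

    # Toy forward model standing in for synthesizer + auditory filter bank:
    # a fixed random nonlinearity mapping control parameters to auditory
    # features. The real system synthesizes a waveform and analyses it.
    W = rng.normal(size=(N_CHANNELS, N_PARAMS))

    def forward(params):
        return np.tanh(W @ params)

    # Babbling phase: drive the vocal tract with random control parameters
    # and collect the resulting (auditory, motor) pairs.
    P = rng.uniform(-1.0, 1.0, size=(5000, N_PARAMS))   # random motor commands
    A = np.tanh(P @ W.T)                                # auditory features produced

    # Inverse model: least-squares linear map from auditory features back to
    # control parameters (the simplest choice, purely for illustration).
    A1 = np.hstack([A, np.ones((len(A), 1))])           # append a bias term
    M, *_ = np.linalg.lstsq(A1, P, rcond=None)

    # Imitation: given a target sound, estimate the motor command that should
    # reproduce it, re-synthesize, and measure the auditory mismatch.
    target = forward(rng.uniform(-1.0, 1.0, size=N_PARAMS))
    est = np.append(target, 1.0) @ M
    print("auditory imitation error:", np.linalg.norm(forward(est) - target))

Because the training pairs come entirely from the system's own productions, an inverse model of this kind fits sounds in its own voice well, while real speech lies outside that training distribution; this mirrors the performance gap the abstract reports.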

Published

2005

How to Cite

Howard, Ian S., and Mark A. Huckvale. 2005. “Learning to Control an Articulatory Synthesizer by Imitating Real Speech”. ZAS Papers in Linguistics 40 (January):63-78. https://doi.org/10.21248/zaspil.40.2005.258.