Introduction to the PhonDat Database of Spoken German

Christoph Draxler

doi:10.21248/zaspil.11.1998.682

Abstract

The PhonDat project, part of the German Verbmobil research initiative, aims to create and make accessible a very large database of symbolic and signal data of spoken High German. The PhonDat database currently consists of two corpora: one of sentences containing all phoneme combinations of High German, and one of sentences from a train enquiry scenario. All symbolic data is held in a Prolog system with a powerful database management system extension; signal data is stored in external files.

The database is accessed through queries over the symbolic data. A query returns either further symbolic data or a reference to signal files and to signal fragments within those files. Two access modes are supported: a toolbox of predefined high-level query predicates for standard, albeit complex, queries; and the full Prolog programming language for custom applications. The PhonDat database interacts with external signal analysis and display applications through interprocess communication.
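As a rough illustration of the two kinds of query result described above, the following minimal sketch is written in Python rather than in the Prolog system that PhonDat actually uses. Every name in it (`Segment`, `fragments_for_phoneme`, `phoneme_counts`), the record layout, and the sample data are assumptions made for illustration only; none of them is taken from the PhonDat schema or its query toolbox.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical symbolic record: a phoneme label plus a pointer into a signal file."""
    utterance: str
    phoneme: str
    signal_file: str   # external file holding the signal data
    start_sample: int  # first sample of the fragment
    end_sample: int    # one past the last sample of the fragment


# Invented sample data, not taken from the PhonDat corpora.
SEGMENTS = [
    Segment("utt001", "a:", "utt001.sig", 0, 1600),
    Segment("utt001", "t", "utt001.sig", 1600, 2400),
    Segment("utt002", "a:", "utt002.sig", 800, 2000),
]


def fragments_for_phoneme(segments, phoneme):
    """Query whose result is a list of references to signal fragments."""
    return [
        (s.signal_file, s.start_sample, s.end_sample)
        for s in segments
        if s.phoneme == phoneme
    ]


def phoneme_counts(segments):
    """Query whose result is again symbolic data: occurrences per phoneme."""
    counts = {}
    for s in segments:
        counts[s.phoneme] = counts.get(s.phoneme, 0) + 1
    return counts
```

In this sketch, `fragments_for_phoneme(SEGMENTS, "a:")` plays the role of a predefined high-level query that yields signal references an external analysis or display tool could open, while `phoneme_counts(SEGMENTS)` yields purely symbolic data of the kind a statistical phoneme analysis would use.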

The PhonDat database has been used in various research applications, including speech recognition training, segmentation comparisons, and statistical phoneme analyses. It is now being extended to hold the Verbmobil multilingual spontaneous speech corpus collected in a scheduling scenario.
