ASR Bundestag – IISYS OpenData

A dataset for Automatic Speech Recognition (ASR) systems, consisting of multiple subsets (publication pending).
The dataset comprises over 1,000 hours of audio, together with transcripts, from political speeches in the German Bundestag.

Download Clean (610h) here.
Download Dirty Delta (156h; the dirty superset with the clean subset removed) here.
Download Unlabeled (1,038h; all audio snippets, without transcriptions) here.

Source of the raw data:
https://www.bundestag.de/mediathek

Terms of use (Nutzungsbedingungen):
https://www.bundestag.de/resource/blob/296016/301050a2c21ce66e24014805c235f9c7/nutzungsbedingungen_de-data.pdf

The content must not be used for "trade or commercial advertising purposes" («gewerbliche oder kommerzielle Werbezwecke»).