The datasets for the ZeroSpeech Challenge 2019 are provided here for download.
- Training dataset (English): english.tgz (2.5G, md5sum:
- Toy dataset (subset of the training dataset): english_small.tgz (156M, md5sum:
- Test dataset: surprise.zip (1.5G, md5sum: 7d48828a055d4b50d39c28529377d54b)
This archive is password-protected. Please read and accept the following agreement to get the password.
In order to use the surprise language data for the Zero Resource Speech Challenge, you must agree to the following terms.
The data may be used only for the Zero Resource Speech Challenge. Any other use, whether for research or commercial purposes, is prohibited.
The data in the corpus shall not be redistributed. It is permissible, however, to cite examples from the corpus to present research results.
All reports/publications using the corpus must acknowledge its use via a citation to the paper describing the source corpus.
The citation will be given to challenge participants after the evaluation has been completed.
Please accept the above agreement to download the dataset for the ZeroSpeech Challenge 2019.
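
After downloading, an archive can be checked against the md5sum listed for it on this page. The following is a minimal sketch in Python, not an official challenge tool: the helper names `md5_of_file` and `matches_md5` are hypothetical, and the example assumes the archive was saved under its original file name in the current directory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that multi-gigabyte archives
    do not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Return True if the file's MD5 digest equals `expected_hex`."""
    return md5_of_file(path) == expected_hex.strip().lower()


# md5sum listed on this page for the test dataset archive.
SURPRISE_MD5 = "7d48828a055d4b50d39c28529377d54b"

# Example (assumes surprise.zip is in the current directory):
#   matches_md5("surprise.zip", SURPRISE_MD5)
```

If the check returns `False`, the download is most likely incomplete or corrupted and should be repeated.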