The datasets for the ZeroSpeech Challenge 2020 are provided here for download. Please note that the archives are protected by a password, which is communicated once you have accepted the agreement below.
|Filename||Description||Size||MD5 checksum|
|zerospeech2020.z01||Data for the 2020 edition (1/3)||10.0 GB||c9906d9062744cec87f4a4048a0c551b|
|zerospeech2020.z02||Data for the 2020 edition (2/3)||10.0 GB||7eaa187d403c3aeef94e13f9053ce861|
|zerospeech2020.zip||Data for the 2020 edition (3/3)||6.0 GB||839a18a0dfe11c706428ddc27d87d5b8|
|baseline.zip||Baseline submission||6.8 GB||b5934920fcbb0b3af90611185696510b|
|2017_vads.zip||VAD for the 2017 wavs||1.3 MB||c78b21df917b7de4d952d60492327a29|
The following script downloads and unzips the datasets:
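#!/bin/bash
# Replace with the password received after accepting the agreement.
PASSWORD=XXXX_REPLACE_WITH_THE_PASSWORD_XXXX

# Download the three parts of the split archive.
for ext in zip z01 z02
do
    wget "https://download.zerospeech.com/archive/2020/zerospeech2020.$ext" || exit 1
done

# Extract the archive, then delete the downloaded parts.
7z x -p"$PASSWORD" zerospeech2020.zip || exit 1
rm -f zerospeech2020.z* || exit 1
exit 0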
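Because the archive parts are large, it may be worth checking each download against the checksums in the table above before extracting. The following is a minimal sketch, not part of the official script. It assumes the 32-character hexadecimal values in the table are MD5 digests and that GNU `md5sum` is available (on macOS the equivalent tool is `md5`). The helper name `verify_md5` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: compare a file's MD5 digest with an expected value.
verify_md5() {
    local file="$1" expected="$2" actual
    actual=$(md5sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK       $file"
    else
        echo "MISMATCH $file (got $actual)" >&2
        return 1
    fi
}

# Checksums copied from the table above (assumed to be MD5).
# Files that have not been downloaded yet are skipped.
while read -r expected file; do
    [ -f "$file" ] || { echo "SKIP     $file (not found)"; continue; }
    verify_md5 "$file" "$expected" || exit 1
done <<'EOF'
c9906d9062744cec87f4a4048a0c551b zerospeech2020.z01
7eaa187d403c3aeef94e13f9053ce861 zerospeech2020.z02
839a18a0dfe11c706428ddc27d87d5b8 zerospeech2020.zip
EOF
```

Run it in the download directory after the `wget` loop and before the `7z x` step. A mismatch stops the script with a non-zero exit code, so a corrupted part can be downloaded again before extraction is attempted.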
In order to receive the archive password, please agree to the following terms regarding the surprise language data for the 2019 task:
The data may be used only for the Zero Resource Speech Challenge. Other usages, both research and commercial, are prohibited. The data in the corpus shall not be redistributed. It is permissible, however, to cite examples from the corpus to present research results. All reports/publications using the corpus must acknowledge its use via a citation to the paper describing the source corpus:
S. Sakti, R. Maia, S. Sakai, T. Shimizu, S. Nakamura, “Development of HMM-based Indonesian Speech Synthesis,” in Proc. O-COCOSDA, pp. 215-220, Kyoto, Japan, November 2008
S. Sakti, E. Kelana, H. Riza, S. Sakai, K. Markov, S. Nakamura, “Development of Indonesian Large Vocabulary Continuous Speech Recognition System within A-STAR Project,” in Proc. TCAST, pp. 19-24, Hyderabad, India, January 2008
Please accept the above agreement to download the dataset for the ZeroSpeech Challenge 2020.