ZeroSpeech 2020
Data
The datasets for the ZeroSpeech Challenge 2020 are available for download here. The archives are password-protected; the password is sent to you once you have accepted the agreement below.
File | Description | Size | MD5 sum |
---|---|---|---|
zerospeech2020.z01 | Data for the 2020 edition (1/3) | 10.0 GB | c9906d9062744cec87f4a4048a0c551b |
zerospeech2020.z02 | Data for the 2020 edition (2/3) | 10.0 GB | 7eaa187d403c3aeef94e13f9053ce861 |
zerospeech2020.zip | Data for the 2020 edition (3/3) | 6.0 GB | 839a18a0dfe11c706428ddc27d87d5b8 |
baseline.zip | Baseline submission | 6.8 GB | b5934920fcbb0b3af90611185696510b |
2017_vads.zip | VAD for the 2017 wavs | 1.3 MB | c78b21df917b7de4d952d60492327a29 |
The following script downloads and extracts the three parts of the 2020 dataset:
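#!/bin/bash
# Replace with the password received after accepting the agreement below.
PASSWORD=XXXX_REPLACE_WITH_THE_PASSWORD_XXXX
# Download the three parts of the split archive.
for ext in zip z01 z02
do
    wget "https://download.zerospeech.com/archive/2020/zerospeech2020.$ext" || exit 1
done
# Extract the archive (the .z01 and .z02 parts must be in the same directory).
7z x -p"$PASSWORD" zerospeech2020.zip || exit 1
# Remove the downloaded archive parts once extraction has succeeded.
rm -f zerospeech2020.z* || exit 1
exit 0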
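The MD5 sums listed in the table can be used to check each download before extraction. The following is a minimal sketch, assuming GNU coreutils `md5sum` is installed; the helper name `verify_md5` is hypothetical and not part of the official script.

```bash
#!/bin/bash
# Hypothetical helper (not part of the official script): returns 0 when the
# MD5 sum of a file matches the expected value, non-zero otherwise.
# Assumes GNU coreutils `md5sum` is available on the PATH.
verify_md5() {
    local expected=$1
    local file=$2
    local actual
    # md5sum prints "<hash>  <filename>"; keep only the hash field.
    actual=$(md5sum "$file" 2>/dev/null | cut -d' ' -f1)
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

Calling it for each archive before the `7z x` step, for example `verify_md5 c9906d9062744cec87f4a4048a0c551b zerospeech2020.z01 || exit 1`, stops the script when a download is incomplete or corrupted.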
In order to receive the archive password, please agree to the following terms regarding the surprise language data for the 2019 task:
The data may be used only for the Zero Resource Speech Challenge. Other usages, both research and commercial, are prohibited. The data in the corpus shall not be redistributed. It is permissible, however, to cite examples from the corpus to present research results. All reports/publications using the corpus must acknowledge its use via a citation to the paper describing the source corpus:
- S. Sakti, R. Maia, S. Sakai, T. Shimizu, S. Nakamura, “Development of HMM-based Indonesian Speech Synthesis,” in Proc. O-COCOSDA, pp. 215-220, Kyoto, Japan, November 2008.
- S. Sakti, E. Kelana, H. Riza, S. Sakai, K. Markov, S. Nakamura, “Development of Indonesian Large Vocabulary Continuous Speech Recognition System within A-STAR Project,” in Proc. TCAST, pp. 19-24, Hyderabad, India, January 2008.
Please accept the above agreement to download the dataset for the ZeroSpeech Challenge 2020.