Zerospeech 2020
To register, send an email to, with the subject “Registration” (the email body can be empty).
After accepting a license agreement, you will receive a password to uncompress the dataset archives.
We will keep you informed at your registered email address of all updates.
CodaLab signup: Sign up as a participant on our CodaLab ZeroSpeech 2020 Challenge using the same e-mail address. Your signup will be verified manually to ensure that you have done the manual registration. Please do this as soon as possible in order to allow us time to verify your signup manually.
The Zero Resource Challenge 2020 Software allows you to validate your submission to ensure that it is complete and in the correct format before submitting, and to run the evaluation on the development languages. It will work on any recent Linux or MacOS distribution with Python 3.
Install the Anaconda Python distribution.
Create a conda environment containing the Zero Resource Challenge 2020 Software by following the instructions here: When using the tools, do not forget to activate your environment:
conda activate zerospeech2020
The package provides two command-line tools: zerospeech2020-validate
is used for submission validation and zerospeech2020-evaluate
is used for submission evaluation.
The training data is exactly the same as in the 2017 and 2019 challenges. The test data has been updated to include additional submission files which will support novel analyses of systems’ behaviour. You must work with the updated datasets.
Follow this link to download the datasets. You must first register (see Registration above), so we will send you the password protecting the archive.
Submission structure and format
The files should be organized in a ZIP archive named
with the following content:
english/{1s, 10s, 120s}/*.txt
french/{1s, 10s, 120s}/*.txt
mandarin/{1s, 10s, 120s}/*.txt
LANG1/{1s, 10s, 120s}/*.txt
LANG2/{1s, 10s, 120s}/*.txt
english/test/*.txt and *.wav
surprise/test/*.txt and *.wav
Root folder
The presence of any other file or folder will invalidate the submission.
Submissions must contain either a 2017
folder (for submissions
to the 2017 Track 1 or Track 2 tasks), a 2019
folder (for submissions
to the 2019 task), or both.
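The root-level rules above can be checked with a few lines of Python before running the official validator. This is only an illustrative sketch: the function name check_root is hypothetical, and the set of allowed top-level entries (metadata.yaml, 2017, 2019) is an assumption drawn from the description on this page, not from the zerospeech2020-validate source.

```python
def check_root(entries):
    """Return a list of problems found in the top-level entries of a submission.

    `entries` is an iterable of file and folder names found at the root.
    The allowed names below are an assumption based on the text above.
    """
    allowed = {"metadata.yaml", "2017", "2019"}
    names = set(entries)
    problems = []
    if "metadata.yaml" not in names:
        problems.append("missing metadata.yaml")
    # At least one of the two challenge folders must be present.
    if not names & {"2017", "2019"}:
        problems.append("need a 2017 folder, a 2019 folder, or both")
    # Any other file or folder invalidates the submission.
    for extra in sorted(names - allowed):
        problems.append(f"unexpected entry: {extra}")
    return problems
```

An empty list means the root folder looks plausible. The official zerospeech2020-validate tool remains the reference check.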
The root metadata.yaml
file must contain the following entries (order does not matter):
authors of the submission
affiliation of the authors (university or company)
open source:
true or false, if true you must provide a 'code' folder in the 2017
and 2019 folders with source code for the submitted system(s).
You are strongly encouraged to submit your code. Participants who submit
their code in this way will be awarded an OPEN SCIENCE badge that will
appear on the leaderboard. The 2017/code
and 2019/code
folders can
contain a full working source tree or a README file with a permanent link to
download your code.
2017 submission folder
english/{1s, 10s, 120s}/*.txt
french/{1s, 10s, 120s}/*.txt
mandarin/{1s, 10s, 120s}/*.txt
LANG1/{1s, 10s, 120s}/*.txt
LANG2/{1s, 10s, 120s}/*.txt
The presence of any other file or folder in the 2017 submission folder will
invalidate the submission. 2017 submissions must contain either a track1
folder (for submissions to the 2017 Track 1 task), a track2
folder (for
submissions to the 2017 Track 2 task), or both. All languages must be present.
The 2017/metadata.yaml
file must contain the following entries (order does
not matter):
system description:
a brief description of your system, pointing to a paper where possible
hyperparameters:
values of all the hyperparameters
track1 supervised:
true if you used a supervised model for track1, false otherwise
track2 supervised:
true if you used a supervised model for track2, false otherwise
The development languages are provided for model building and hyperparameter
tuning. The system you use should be exactly the same for all five
languages, and must not require any language-specific tuning. Any optimization
of parameters beyond what is listed in the hyperparameters
field must be
automated into the training pipeline for the model.
The 2017/code
folder must include your source code if you set the open source
flag in the root metadata file (see above).
2019 submission folder
english/test/*.txt and *.wav
surprise/test/*.txt and *.wav
The presence of any other file or folder in the 2019 submission folder will invalidate the submission. Both languages must be present.
2019 submissions must contain the two test/
folders. These folders must
contain *.txt
files with the learned unit representations corresponding to
all of the audio files in the test data for the given corpus, and the *.wav
files that contain your resynthesis of the audio files
listed in surprise/synthesis.txt
and english/synthesis.txt
. The
folders auxiliary_embedding1/
and auxiliary_embedding2/
are optional
(see Auxiliary Embeddings on the 2019 page for details).
The test/*.txt
files must be the full input to the synthesis
(decoder) component of your system. You are welcome to resynthesize the
other test audio files (this does not invalidate the submission). Only
the files listed in synthesis.txt
will be used for the main evaluation.
The 2019/metadata.yaml
file must contain the following entries (order does not matter):
abx distance:
the ABX distance used for ranking the test embeddings,
must be 'dtw_cosine', 'dtw_kl' or 'levenshtein'
system description:
a brief description of your system, pointing to a paper where possible
hyperparameters:
values of all the hyperparameters
auxiliary1 description:
description of the auxiliary1 embeddings (if used)
auxiliary2 description:
description of the auxiliary2 embeddings (if used)
using parallel train:
true or false, set to true if you used the parallel train dataset
using external data:
true or false, set to true if you used an external dataset
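As a rough illustration of the constraints listed above, the following Python sketch checks a 2019 metadata mapping after it has been loaded from YAML. The function name check_2019_metadata is hypothetical, loading the YAML file is left to the reader, and only the entries whose names appear in the list above are checked. It is not the logic used by zerospeech2020-validate.

```python
# The three accepted values come from the "abx distance" entry above.
ABX_DISTANCES = {"dtw_cosine", "dtw_kl", "levenshtein"}
# Entries described above as "true or false".
BOOLEAN_KEYS = ("using parallel train", "using external data")


def check_2019_metadata(meta):
    """Return a list of problems found in a 2019 metadata mapping (a dict)."""
    problems = []
    if meta.get("abx distance") not in ABX_DISTANCES:
        problems.append(
            "abx distance must be one of " + ", ".join(sorted(ABX_DISTANCES))
        )
    if not meta.get("system description"):
        problems.append("system description is missing or empty")
    for key in BOOLEAN_KEYS:
        # YAML true/false load as Python booleans.
        if not isinstance(meta.get(key), bool):
            problems.append(f"{key} must be true or false")
    return problems
```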
The development language is provided for model building and hyperparameter
tuning. The system you use should be exactly the same for both
languages, and must not require any language-specific tuning. Any
optimization of parameters beyond what is listed in the
hyperparameters field must be automated into the training
pipeline for the model.
The 2019/code
folder must include your source code if you set the open source
flag in the root metadata file (see above).
Format: 2017 Track 1 feature files
The 2017/track1/*/*.txt
files are the features computed on the
evaluation (test) files by your subword modeling system.
They must be text files with the .txt
extension. For each wav file
in the test set (e.g., mandarin/1s/aghsu09.wav
), a corresponding
feature file with the same base name must be present
(e.g., 2017/track1/mandarin/1s/aghsu09.txt).
Each line encodes one frame with a timestamp followed by the feature vector:
<time> <val_1> ... <val_n>
<time> <val_1> ... <val_n>
0.0125 12.3 428.8 -92.3 0.021 43.23
0.0225 19.0 392.9 -43.1 10.29 40.02
The time is in seconds. It corresponds to the center of the frame of each feature. In this example, there are frames every 10ms and the first frame spans a duration of 25ms starting at the beginning of the file, hence, the first frame is centered at .0125 seconds and the second 10ms later. It is not required that the frames be regularly spaced. The timestamp of frame n+1 must be strictly larger than the timestamp of frame n. The timestamps are used by the ABX evaluation, which compares sub-segments of the representations extracted by timestamp.
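The line format and the timestamp rule can be illustrated with a small Python parser. This is a minimal sketch, not the official validator: the function name parse_features is hypothetical, and the check that every frame has the same number of values is an assumption suggested by the `<val_1> ... <val_n>` notation.

```python
def parse_features(text):
    """Parse '<time> <val_1> ... <val_n>' lines into (times, frames)."""
    times, frames = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected a timestamp and at least one value")
        time = float(fields[0])
        values = [float(v) for v in fields[1:]]
        # Timestamps must be strictly increasing from one frame to the next.
        if times and time <= times[-1]:
            raise ValueError(f"line {lineno}: timestamp {time} is not larger than {times[-1]}")
        # Assumption: all frames share the same dimension n.
        if frames and len(values) != len(frames[0]):
            raise ValueError(f"line {lineno}: expected {len(frames[0])} values, got {len(values)}")
        times.append(time)
        frames.append(values)
    return times, frames


example = "0.0125 12.3 428.8 -92.3 0.021 43.23\n0.0225 19.0 392.9 -43.1 10.29 40.02\n"
times, frames = parse_features(example)
```

With the two example lines above, `times` holds the two frame centers and `frames` holds two vectors of five values each.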
Format: 2017 Track 2 cluster files
The 2017/track2/*/*.txt
files are the clusters computed by your spoken
term discovery system. Track 2 does not have a train/test split. The files
should list clusters discovered in the train
folder only.
They must be text files with the .txt
extension. For each of
the five languages, a single cluster file should be provided,
e.g. 2017/track2/mandarin.txt
. The cluster file lists the fragments
that were found and groups them into classes, as follows:
Class <classnb>
<filename> <fragment_onset> <fragment_offset>
<filename> <fragment_onset> <fragment_offset>
Class <classnb>
<filename> <fragment_onset> <fragment_offset>
Class 1
dsgea01 1.238 1.763
dsgea19 3.380 3.821
reuiz28 18.036 18.537
Class 2
zeoqx71 8.389 9.132
The <fragment_onset>
and <fragment_offset>
times are in seconds.
The <filename> must be the basename of a wav file in the train set for
the given language. Do not include the directory path or the .wav
extension. The file must be terminated by a blank line.
Some systems may only do matching, and no further clustering. This is not a problem: submit classes with two elements each. Classes with only one element are also acceptable. This will typically happen in systems that do exhaustive parsing.
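To make the cluster file layout concrete, here is a short Python sketch that reads such a file into a dictionary. The function name parse_clusters is hypothetical, and class numbers are assumed to be integers as in the example above. Blank lines are tolerated anywhere, so the sketch does not enforce the trailing blank line that the format requires.

```python
def parse_clusters(text):
    """Map each class number to its list of (filename, onset, offset) fragments."""
    classes = {}
    current = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            current = None  # a blank line closes the current class
            continue
        if fields[0] == "Class" and len(fields) == 2:
            current = int(fields[1])
            classes[current] = []
            continue
        if current is None or len(fields) != 3:
            raise ValueError(f"line {lineno}: unexpected content {line!r}")
        filename, onset, offset = fields
        # Onset and offset are times in seconds.
        classes[current].append((filename, float(onset), float(offset)))
    return classes


example = (
    "Class 1\n"
    "dsgea01 1.238 1.763\n"
    "dsgea19 3.380 3.821\n"
    "reuiz28 18.036 18.537\n"
    "\n"
    "Class 2\n"
    "zeoqx71 8.389 9.132\n"
    "\n"
)
clusters = parse_clusters(example)
```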
Format: 2019 feature files
The format of embedding files is plain text, with no header, with one discovered unit per line. No requirement is placed on the length of the sequence (i.e., number of lines). The sequence of units need not represent “frames” at any constant sampling rate. Each line corresponds to a unit in a fixed-dimension encoding. Each column represents a dimension. Columns are separated by spaces. The contents must be numerical. “Textual” (non-numerical) encodings must be converted into one-hot representations (one binary dimension per symbol).
Example (dense, continuous encoding):
42.286527175400906 -107.68503050450957 59.79000088588511 -113.85831030071697
0.7872647311548775 45.33505222077471 -8.468742865224545 0
328.05422046327067 -4.495454384937348 241.186547397405 40.16161685378687
Example (binary encoding, converted from a non-numeric representation with an alphabet of size four):
0 1 0 0
0 0 1 0
0 0 1 0
1 0 0 0
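The conversion from a symbolic encoding to the one-hot format can be sketched as follows. The function name one_hot_lines and the four-symbol alphabet are purely illustrative. Any fixed ordering of the alphabet would work, as long as it is used consistently across all files.

```python
def one_hot_lines(symbols, alphabet):
    """Convert a sequence of symbols into space-separated one-hot lines."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    lines = []
    for symbol in symbols:
        row = ["0"] * len(alphabet)
        row[index[symbol]] = "1"  # one binary dimension per symbol
        lines.append(" ".join(row))
    return lines


# Hypothetical alphabet of size four and a sequence of four discovered units.
lines = one_hot_lines(["b", "c", "c", "a"], ["a", "b", "c", "d"])
```

Here `lines` reproduces the four rows of the binary example above, one unit per line.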
2019 synthesis files
The file synthesis.txt
in the dataset specifies the list of
files to be resynthesized. It also specifies which of the
synthesis voices is to be used for resynthesizing a given file. For a given
test audio file <SXXX>_<ID>.wav
, the corresponding resynthesized file
should be called <VXXX>_<ID>.wav
, where <VXXX>
is the name of the
voice indicated in synthesis.txt
. Thus, for example, the file
, which is marked in the English development data
set as going with voice V002
, should be resynthesized in the submission as
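The renaming rule can be expressed as a small helper. This sketch only covers the naming convention described above. The function name resynthesized_name and the file name in the usage example are hypothetical, and reading synthesis.txt is not shown because its layout is not described here.

```python
import os


def resynthesized_name(source_wav, voice):
    """Return '<VXXX>_<ID>.wav' for a test file named '<SXXX>_<ID>.wav'."""
    stem, ext = os.path.splitext(os.path.basename(source_wav))
    # Split once at the first underscore: speaker prefix, then the ID.
    speaker, sep, utterance_id = stem.partition("_")
    if ext != ".wav" or not sep or not speaker or not utterance_id:
        raise ValueError(f"unexpected file name: {source_wav!r}")
    return f"{voice}_{utterance_id}.wav"


# Hypothetical file name, for illustration only.
name = resynthesized_name("S123_0001.wav", "V002")
```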
The zerospeech2020-validate
program (the exact same script
included in the Zero Resource Speech Challenge 2020 Software
described above under Software) will be automatically executed
upon submission. This will verify that all required files
exist and are in conformance with the required format. If the check fails, your
submission will be rejected by Codalab (it will not be counted as one of the two
You are strongly advised to run the validation program on your own before making your submission. To apply the script that will be run when you make your submission, run:
zerospeech2020-validate <submission> [--njobs N]
where <submission>
is the path to your submission (can be a
zip archive ready for submission or a directory containing all the required
files). Use the option --njobs
to specify the number of CPU cores to use
(validation can take a long time, particularly if 2017 Track 1 is present).
Here is an example output:
$ zerospeech2020-validate ./baseline --njobs 4
| validating directory top-level ...
| validating top-level metadata.yaml ...
| submission declared as open source
| validating directory 2017 ...
| validating 2017/metadata.yaml
| validating directory 2017/code ...
| non-empty directory, it will be manually inspected to confirm the submission is open source
| validating directory 2017/track1 ...
| validating directory 2017/track1/LANG1/1s ...
| validating directory 2017/track1/LANG1/10s ...
| validating directory 2017/track1/LANG1/120s ...
| validating directory 2017/track1/LANG2/1s ...
| validating directory 2017/track1/LANG2/10s ...
| validating directory 2017/track1/LANG2/120s ...
| validating directory 2017/track1/english/1s ...
| validating directory 2017/track1/english/10s ...
| validating directory 2017/track1/english/120s ...
| validating directory 2017/track1/french/1s ...
| validating directory 2017/track1/french/10s ...
| validating directory 2017/track1/french/120s ...
| validating directory 2017/track1/mandarin/1s ...
| validating directory 2017/track1/mandarin/10s ...
| validating directory 2017/track1/mandarin/120s ...
| validating directory 2017/track2 ...
| validating 2017/track2/LANG1 ...
| validating 2017/track2/LANG2 ...
| validating 2017/track2/english ...
| validating 2017/track2/french ...
| validating 2017/track2/mandarin ...
| validating directory 2019 ...
| found auxiliary_embedding1
| validating 2019/metadata.yaml
| validating directory 2019/code ...
| non-empty directory, it will be manually inspected to confirm the submission is open source
| validating 2019/english/test directory ...
| validating 2019/english/auxiliary_embedding1 directory ...
| validating 2019/surprise/test directory ...
| validating 2019/surprise/auxiliary_embedding1 directory ...
| success, the submission is valid!
Once your submission passes the validation, you can use the
zerospeech2020-evaluate program to get the scores on the development
datasets. Several options are available, depending on which track you are
working on.
The evaluation program needs to access the ABX task files provided in the
dataset. You can specify the path to the dataset either using the -D / --dataset
option or by setting the ZEROSPEECH2020_DATASET environment variable:
export ZEROSPEECH2020_DATASET=/path/to/the/downloaded/dataset
To evaluate a submission to the 2017 Track 1 task, the ABX metric will be run by the tool on the test set for each language (english, french and mandarin) and for each duration (1 second, 10 seconds, 120 seconds). You can restrict the evaluation to a specific language and duration to make the evaluation run faster on the subset you want.
An example of usage:
zerospeech2020-evaluate 2017-track1 -j10 -l mandarin -dr 1s -o mandarin-1s.json
will run the evaluation only on the features extracted from the Mandarin 1s
test set, using 10 CPU cores and the ABX tasks from $ZEROSPEECH2020_DATASET,
and will write the output to the file mandarin-1s.json.
To evaluate a submission to the 2017 Track 2 task, the metrics of the TDE package will be run. As for the Track 1 evaluation, you can restrict the evaluation to a specific language and duration. An example of usage:
zerospeech2020-evaluate 2017-track2 -l english -o english.json
will run the evaluation on the English dataset and write the output to the file
english.json.
To evaluate a submission to the 2019 task, we use the ABX metric, as in Track 1 of the 2017 challenge. Because the task is different, so is the choice of distances: for 2019, besides the default ABX distances, you can use the Levenshtein distance if your submission is symbolic. An example of usage:
zerospeech2020-evaluate 2019 -j10 -d levenshtein -o 2019_levenshtein.json
will use the Levenshtein distance and write the output to the file
2019_levenshtein.json.
You can evaluate the whole submission (all languages, all durations) in one command:
zerospeech2020-evaluate all -j10 -o baseline.json
Additional options
allow the user to specify the distances to use for the two ABX evaluations.
Submit your single .zip
file using the “Submit” button on CodaLab. Depending
on your network connection and the size of your submission, this can take some
time. (There is currently no upload status indicator on CodaLab. You will be
redirected to another page when your submission is complete.)
If you are experiencing any issues, please contact us at