Instructions

Note

Please remember that the deadline for challenge submission on CodaLab is March 15, 2020, 23h59 GMT-12. This deadline will be strictly enforced.

Registration

Note

You must register with the challenge organizers before doing anything else (including downloading the data and signing up on CodaLab).

  • To register, send an email to zerospeech2020@gmail.com, with the subject “Registration” (the email body can be empty).
  • After accepting a license agreement, you will receive a password to uncompress the dataset archives.
  • We will keep you informed at your registered email address of all updates.
  • CodaLab signup: Sign up as a participant on our CodaLab ZeroSpeech 2020 Challenge using the same e-mail address. Your signup will be verified manually to ensure that you have done the manual registration. Please do this as soon as possible in order to allow us time to verify your signup manually.

Software

Note

The evaluation tools and the submission format have changed since the 2017 and 2019 challenges. Please make sure you are using the Zero Resource Challenge 2020 Software.

The Zero Resource Challenge 2020 Software allows you to validate your submission to ensure that it is complete and in the correct format before submitting, and to run the evaluation on the development languages. The Zero Resource Challenge 2020 Software will work on any recent Linux distribution with Python 3 (Anaconda).

Datasets

  • The training data is exactly the same as in the 2017 and 2019 challenges. The test data has been updated to include additional submission files which will support novel analyses of systems’ behaviour. You must work with the updated datasets.
  • Follow this link to download the datasets. You must first register (see Registration above) so that we can send you the password protecting the archive.

Submission structure and format

Structure

The files should be organized in a ZIP archive named submission.zip, with the following content:

metadata.yaml
2017/
    metadata.yaml
    code/*
    track1/
        english/{1s, 10s, 120s}/*.txt
        french/{1s, 10s, 120s}/*.txt
        mandarin/{1s, 10s, 120s}/*.txt
        LANG1/{1s, 10s, 120s}/*.txt
        LANG2/{1s, 10s, 120s}/*.txt
    track2/
        english.txt
        french.txt
        mandarin.txt
        LANG1.txt
        LANG2.txt
2019/
    metadata.yaml
    code/*
    surprise/
        test/*.txt and *.wav
        auxiliary_embedding1/*.txt
        auxiliary_embedding2/*.txt
    english/
        test/*.txt and *.wav
        auxiliary_embedding1/*.txt
        auxiliary_embedding2/*.txt

Root folder

metadata.yaml
2017/
2019/

The presence of any other file or folder will invalidate the submission. Submissions must contain either a 2017 folder (for submissions to the 2017 Track 1 or Track 2 tasks), a 2019 folder (for submissions to the 2019 task), or both.
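As a quick local pre-check of the rule above, the top-level layout can be tested with a few lines of Python. This is only a minimal sketch and does not replace zerospeech2020-validate; the function name check_root_layout is hypothetical and is not part of the challenge software.

```python
# Hypothetical helper, not part of the Zero Resource Challenge 2020 Software.
ALLOWED_ROOT = {"metadata.yaml", "2017", "2019"}


def check_root_layout(names):
    """Return a list of problems found in the top-level layout.

    `names` is an iterable of archive member paths using "/" separators,
    for example the result of zipfile.ZipFile(...).namelist().
    """
    # Keep only the first path component ("2017/track1/a.txt" -> "2017").
    roots = {name.split("/", 1)[0] for name in names if name}

    problems = []
    for extra in sorted(roots - ALLOWED_ROOT):
        problems.append(f"unexpected top-level entry: {extra}")
    if "metadata.yaml" not in roots:
        problems.append("missing metadata.yaml")
    if not roots & {"2017", "2019"}:
        problems.append("need a 2017 folder, a 2019 folder, or both")
    return problems
```

For an existing archive, one could call check_root_layout(zipfile.ZipFile("submission.zip").namelist()) after importing zipfile; an empty list means no top-level problem was found by this sketch.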

The root metadata.yaml file must contain the following entries (order does not matter):

author:
  authors of the submission
affiliation:
  affiliation of the authors (university or company)
open source:
  true or false, if true you must provide a 'code' folder in the 2017
  and 2019 folders with source code for the submitted system(s).

You are strongly encouraged to submit your code. Participants who submit their code in this way will be awarded an OPEN SCIENCE badge that will appear on the leaderboard. The 2017/code and 2019/code folders can contain a full working source tree or a README file with a permanent link to download your code.

2017 submission folder

Note

All languages must be submitted. This is different from the earlier submission structure in the first run of the Zero Resource Speech Challenge 2017.

2017/
    metadata.yaml
    code/*
    track1/
        english/{1s, 10s, 120s}/*.txt
        french/{1s, 10s, 120s}/*.txt
        mandarin/{1s, 10s, 120s}/*.txt
        LANG1/{1s, 10s, 120s}/*.txt
        LANG2/{1s, 10s, 120s}/*.txt
    track2/
        english.txt
        french.txt
        mandarin.txt
        LANG1.txt
        LANG2.txt

The presence of any other file or folder in the 2017 submission folder will invalidate the submission. 2017 submissions must contain either a track1 folder (for submissions to the 2017 Track 1 task), a track2 folder (for submissions to the 2017 Track 2 task), or both. All languages must be present.

The 2017/metadata.yaml file must contain the following entries (order does not matter):

system description:
  a brief description of your system, pointing to a paper where possible
hyperparameters:
  values of all the hyperparameters
track1 supervised:
  true if you used a supervised model for track1, false otherwise
track2 supervised:
  true if you used a supervised model for track2, false otherwise

The development languages are provided for model building and hyperparameter tuning. The system you use should be exactly the same for all five languages, and must not require any language-specific tuning. Any optimization of parameters beyond what is listed in the hyperparameters field must be automated into the training pipeline for the model.

The 2017/code folder must include your source code if you set the open source flag in the root metadata file (see above).

2019 submission folder

2019/
    metadata.yaml
    code/*
    surprise/
        test/*.txt and *.wav
        auxiliary_embedding1/*.txt
        auxiliary_embedding2/*.txt
    english/
        test/*.txt and *.wav
        auxiliary_embedding1/*.txt
        auxiliary_embedding2/*.txt

The presence of any other file or folder in the 2019 submission folder will invalidate the submission. Both languages must be present.

2019 submissions must contain the two test/ folders. Each test/ folder must contain *.txt files with the learned unit representations for all of the audio files in the test data for the given corpus, and *.wav files with your resynthesis of the audio files listed in surprise/synthesis.txt and english/synthesis.txt. The folders auxiliary_embedding1/ and auxiliary_embedding2/ are optional (see Auxiliary embeddings on the 2019 page for details).

The test/*.txt files must be the full input to the synthesis (decoder) component of your system. You are welcome to resynthesize the other test audio files (this does not invalidate the submission). Only the files listed in synthesis.txt will be used for the main evaluation.

The 2019/metadata.yaml file must contain the following entries (order does not matter):

abx distance:
  the ABX distance used for ranking the test embeddings,
  must be 'dtw_cosine', 'dtw_kl' or 'levenshtein'
system description:
  a brief description of your system, pointing to a paper where possible
hyperparameters:
  values of all the hyperparameters
auxiliary1 description:
  description of the auxiliary1 embeddings (if used)
auxiliary2 description:
  description of the auxiliary2 embeddings (if used)
using parallel train:
  true or false, set to true if you used the parallel train dataset
using external data:
  true or false, set to true if you used an external dataset

The development language is provided for model building and hyperparameter tuning. The system you use should be exactly the same for both languages, and must not require any language-specific tuning. Any optimization of parameters beyond what is listed in the hyperparameters field must be automated into the training pipeline for the model.

The 2019/code folder must include your source code if you set the open source flag in the root metadata file (see above).

Format

Format: 2017 Track 1 feature files

The 2017/track1/*/*.txt files are the features computed on the evaluation (test) files by your subword modeling system.

They must be text files with the .txt extension. For each wav file in the test set (e.g., mandarin/1s/aghsu09.wav), a corresponding feature file with the same base name must be present (e.g., 2017/track1/mandarin/1s/aghsu09.txt).

Each line encodes one frame with a timestamp followed by the feature vector:

<time> <val_1> ... <val_n>
<time> <val_1> ... <val_n>
...

Example:

0.0125 12.3 428.8 -92.3 0.021 43.23
0.0225 19.0 392.9 -43.1 10.29 40.02
...

The time is in seconds and corresponds to the center of the frame. In this example, frames are computed every 10 ms and the first frame spans 25 ms from the beginning of the file, so the first frame is centered at 0.0125 seconds and the second 10 ms later. The frames need not be regularly spaced, but the timestamp of frame n+1 must be strictly larger than the timestamp of frame n. The ABX evaluation uses the timestamps to extract and compare sub-segments of the representations.
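The format above can be produced along the following lines. This is a minimal sketch, not the official tooling: the function names frame_centers and format_features are hypothetical, the 25 ms window and 10 ms shift are only the values of the example, and the choice of six decimals for the timestamp is an assumption.

```python
def frame_centers(n_frames, window=0.025, shift=0.010):
    """Center times in seconds of regularly spaced frames (example values)."""
    return [window / 2 + i * shift for i in range(n_frames)]


def format_features(times, features):
    """Return the text of a feature file: one '<time> <val_1> ... <val_n>' per line."""
    if len(times) != len(features):
        raise ValueError("need exactly one timestamp per feature vector")

    lines = []
    previous = None
    for time, vector in zip(times, features):
        # Compare the timestamps as they will be written (six decimals,
        # an assumption of this sketch), so written values stay distinct.
        time = round(time, 6)
        if previous is not None and time <= previous:
            raise ValueError("timestamps must be strictly increasing")
        previous = time
        values = " ".join(repr(float(v)) for v in vector)
        lines.append(f"{time:.6f} {values}")
    return "\n".join(lines) + "\n"
```

The returned string can then be written to, for instance, 2017/track1/mandarin/1s/aghsu09.txt for the test file mandarin/1s/aghsu09.wav.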

Format: 2017 Track 2 cluster files

The 2017/track2/*.txt files are the clusters computed by your spoken term discovery system. Track 2 does not have a train/test split: the files should list clusters discovered in the train folder only.

They must be text files with the .txt extension. For each of the five languages, a single cluster file should be provided, e.g. 2017/track2/mandarin.txt. The cluster file lists the fragments that were found and groups them into classes, as follows:

Class <classnb>
<filename> <fragment_onset> <fragment_offset>
<...>
<filename> <fragment_onset> <fragment_offset>
<NEWLINE>
Class <classnb>
<filename> <fragment_onset> <fragment_offset>
<...>
<NEWLINE>

Example:

Class 1
dsgea01 1.238 1.763
dsgea19 3.380 3.821
reuiz28 18.036 18.537

Class 2
zeoqx71   8.389  9.132
...etc...

The <fragment_onset> and <fragment_offset> times are in seconds. <filename> must be the basename of a wav file in the train set for the given language. Do not include the directory path or the .wav extension. The file must be terminated by a blank line.

Some systems may only do matching, and no further clustering. This is not a problem: submit classes with two elements each. Classes with only one element are also acceptable. This will typically happen in systems that do exhaustive parsing.
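A cluster file in the format described above could be generated as follows. This is a minimal sketch under stated assumptions: the function name format_clusters is hypothetical, classes are numbered from 1 as in the example, times are written with three decimals, and the check that the offset is larger than the onset is an extra sanity check of this sketch, not a documented rule.

```python
def format_clusters(classes):
    """Return the text of a Track 2 cluster file.

    `classes` is a list of classes; each class is a list of
    (filename, onset, offset) tuples, with times in seconds and
    `filename` given without directory path or .wav extension.
    """
    blocks = []
    for class_nb, fragments in enumerate(classes, start=1):
        lines = [f"Class {class_nb}"]
        for filename, onset, offset in fragments:
            if offset <= onset:  # sanity check added by this sketch
                raise ValueError(f"empty or negative fragment in {filename}")
            lines.append(f"{filename} {onset:.3f} {offset:.3f}")
        # Each class is followed by a blank line, so the file as a
        # whole is terminated by a blank line as well.
        blocks.append("\n".join(lines) + "\n\n")
    return "".join(blocks)
```

A system that only does matching would pass classes of two fragments each, for example format_clusters([[("dsgea01", 1.238, 1.763), ("dsgea19", 3.380, 3.821)]]).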

Format: 2019 feature files

Note

The synthesized audio files must be resynthesized from the representations given in the corresponding feature files. The contents of a feature file must not contain any supplementary information not read by the decoder.

The format of embedding files is plain text, with no header, with one discovered unit per line. No requirement is placed on the length of the sequence (i.e., number of lines). The sequence of units need not represent “frames” at any constant sampling rate. Each line corresponds to a unit in a fixed-dimension encoding. Each column represents a dimension. Columns are separated by spaces. The contents must be numerical. “Textual” (non-numerical) encodings must be converted into one-hot representations (one binary dimension per symbol).

Example (dense, continuous encoding):

42.286527175400906 -107.68503050450957 59.79000088588511 -113.85831030071697
0.7872647311548775 45.33505222077471 -8.468742865224545 0
328.05422046327067 -4.495454384937348 241.186547397405 40.16161685378687

Example (binary encoding, converted from a non-numeric representation with an alphabet of size four):

0 1 0 0
0 0 1 0
0 0 1 0
1 0 0 0
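The conversion from a symbolic sequence to the binary encoding shown above can be sketched as follows. The function name one_hot_lines and the symbols "a" to "d" are hypothetical; the one real constraint taken from the text is that every line has one binary dimension per symbol of the alphabet.

```python
def one_hot_lines(symbols, alphabet):
    """Return the text of an embedding file for a symbolic unit sequence.

    `alphabet` lists every symbol the system can emit. Pass the same
    alphabet for all files so that the dimension stays fixed.
    """
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    lines = []
    for symbol in symbols:
        row = ["0"] * len(alphabet)
        row[index[symbol]] = "1"  # KeyError if the symbol is not in the alphabet
        lines.append(" ".join(row))
    return "\n".join(lines) + "\n"
```

With the alphabet ["a", "b", "c", "d"], the sequence ["b", "c", "c", "a"] gives the four lines of the binary example above.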

Format: 2019 synthesis files

Note

The synthesized audio files must be resynthesized from the representations given in the corresponding feature files.

The file synthesis.txt in the dataset specifies the list of files to be resynthesized. It also specifies which of the synthesis voices is to be used for resynthesizing a given file. For a given test audio file <SXXX>_<ID>.wav, the corresponding resynthesized file should be called <VXXX>_<ID>.wav, where <VXXX> is the name of the voice indicated in synthesis.txt. Thus, for example, the file test/S002_0379088085.wav, which is marked in the English development data set as going with voice V002, should be resynthesized in the submission as test/V002_0379088085.wav.
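The naming rule above can be expressed as a small helper. This is a minimal sketch: the function name resynthesized_name is hypothetical, and the voice is passed in directly because the layout of synthesis.txt is not described here.

```python
import posixpath


def resynthesized_name(source_path, voice):
    """Map <SXXX>_<ID>.wav to <VXXX>_<ID>.wav, keeping the folder.

    `voice` is the voice name indicated in synthesis.txt for this file.
    """
    directory, base = posixpath.split(source_path)
    if not base.endswith(".wav"):
        raise ValueError(f"not a wav file: {source_path}")
    # Split "<SXXX>_<ID>" at the first underscore.
    speaker, separator, utterance_id = base[: -len(".wav")].partition("_")
    if not speaker or not separator or not utterance_id:
        raise ValueError(f"expected <SXXX>_<ID>.wav, got: {base}")
    return posixpath.join(directory, f"{voice}_{utterance_id}.wav")
```

For the example above, resynthesized_name("test/S002_0379088085.wav", "V002") gives "test/V002_0379088085.wav".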

Validation

The zerospeech2020-validate program (the exact same script included in the Zero Resource Speech Challenge 2020 Software described above under Software) will be automatically executed upon submission. It verifies that all required files exist and conform to the required format. If the check fails, your submission will be rejected by CodaLab (it will not be counted as one of the two submissions).

You are strongly advised to run the validation program on your own before making your submission. To apply the script that will be run when you make your submission, run:

zerospeech2020-validate <submission> [--njobs N]

where <submission> is the path to your submission (can be a zip archive ready for submission or a directory containing all the required files). Use the option --njobs to specify the number of CPU cores to use (validation can take a long time, particularly if 2017 Track 1 is present). Here is an example output:

$ zerospeech2020-validate ./baseline --njobs 4
| validating directory top-level ...
| validating top-level metadata.yaml ...
|     submission declared as open source
| validating directory 2017 ...
| validating 2017/metadata.yaml
| validating directory 2017/code ...
|     non-empty directory, it will be manually inspected to confirm the submission is open source
| validating directory 2017/track1 ...
| validating directory 2017/track1/LANG1/1s ...
| validating directory 2017/track1/LANG1/10s ...
| validating directory 2017/track1/LANG1/120s ...
| validating directory 2017/track1/LANG2/1s ...
| validating directory 2017/track1/LANG2/10s ...
| validating directory 2017/track1/LANG2/120s ...
| validating directory 2017/track1/english/1s ...
| validating directory 2017/track1/english/10s ...
| validating directory 2017/track1/english/120s ...
| validating directory 2017/track1/french/1s ...
| validating directory 2017/track1/french/10s ...
| validating directory 2017/track1/french/120s ...
| validating directory 2017/track1/mandarin/1s ...
| validating directory 2017/track1/mandarin/10s ...
| validating directory 2017/track1/mandarin/120s ...
| validating directory 2017/track2 ...
| validating 2017/track2/LANG1 ...
| validating 2017/track2/LANG2 ...
| validating 2017/track2/english ...
| validating 2017/track2/french ...
| validating 2017/track2/mandarin ...
| validating directory 2019 ...
|     found auxiliary_embedding1
| validating 2019/metadata.yaml
| validating directory 2019/code ...
|     non-empty directory, it will be manually inspected to confirm the submission is open source
| validating 2019/english/test directory ...
| validating 2019/english/auxiliary_embedding1 directory ...
| validating 2019/surprise/test directory ...
| validating 2019/surprise/auxiliary_embedding1 directory ...
| success, the submission is valid!

Evaluation

Once your submission passes the validation, you can use the zerospeech2020-evaluate program to get the scores on the development datasets. Several options are available, depending on which track you are working on.

The evaluation program needs to access the ABX task files provided in the dataset. You can specify the path to the dataset either using the -D / --dataset option or by setting the ZEROSPEECH2020_DATASET environment variable:

export ZEROSPEECH2020_DATASET=/path/to/the/downloaded/dataset

  • 2017-track1

    To evaluate a submission to the 2017 Track 1 task, the ABX metric will be run by the tool on the test set for each language (english, french and mandarin) and for each duration (1 second, 10 seconds, 120 seconds). You can restrict the evaluation to a specific language and duration to make the evaluation run faster on the subset you want.

    An example of usage:

    zerospeech2020-evaluate 2017-track1 -j10 baseline.zip -l mandarin -dr 1s -o mandarin-1s.json
    

    will run the evaluation only on the features extracted from the 2017/mandarin/1s test set, using 10 CPU cores and the ABX tasks from $ZEROSPEECH2020_DATASET, and will write the output in the file mandarin-1s.json

  • 2017-track2

    To evaluate a submission to the 2017 Track 2 task, the metrics of the TDE package will be run. As with the Track 1 evaluation, you can restrict the evaluation to a specific language and duration. An example of usage:

    zerospeech2020-evaluate 2017-track2 baseline.zip -l english -o english.json
    

    will run the evaluation on the English dataset and write the outputs in the file english.json.

  • 2019

    To evaluate a submission to the 2019 task, we use the ABX metric, as in Track 1 of the 2017 challenge, but since the task is different, so is the choice of distances. For the 2019 task, besides the default metrics in ABX, you can use the Levenshtein distance if your submission is symbolic. An example of usage:

    zerospeech2020-evaluate 2019 -j10 -d levenshtein baseline.zip -o 2019_levenshtein.json
    

    will use the Levenshtein distance and write the output in the file 2019_levenshtein.json.

  • all

    You can evaluate in one command the whole submission (all languages, all durations):

    zerospeech2020-evaluate all -j10 baseline.zip -o baseline.json
    

    Additional options --distance-2017 and --distance-2019 allow the user to specify the distances to use for the two ABX evaluations.

Submission

Submit your single .zip file using the “Submit” button on CodaLab. Depending on your network connection and the size of your submission, this can take some time. (There is currently no upload status indicator on CodaLab. You will be redirected to another page when your submission is complete.)

Troubleshooting

If you are experiencing any issues, please contact us at zerospeech2020@gmail.com.