Instructions¶
Questions or comments? Contact us at zerospeech2021 [at] gmail [dot] com.
Submission instructions are the same for Track 1 and Track 2. Each submission must indicate which track it is entering.
Individual research groups may make no more than four submissions each.
Software¶
The Zero Resource Speech Challenge 2021 Software is a python3 package running on any recent Linux or macOS distribution. It provides two command-line tools:
zerospeech2021-validate
    validates a submission, ensuring it is complete and in the correct format before submitting.
zerospeech2021-evaluate
    runs the evaluation on the development sets.
See https://github.com/bootphon/zerospeech2021 for installation and usage instructions.
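Installation and usage are documented in the repository README, which is authoritative. For a typical pip-installable Python package, the steps would look something like the sketch below (an assumption, not the official procedure):
git clone https://github.com/bootphon/zerospeech2021.git
cd zerospeech2021
pip install .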
Evaluation dataset¶
The set of evaluation items is released under a Creative Commons 4.0 licence. Download it from https://download.zerospeech.com.
It is made up of four parts (phonetic, lexical, syntactic and semantic), each divided into dev and test subsets.
The wav files have randomized names of the form
beKZpnGdzo.wav
and the gold files (in .csv or .item format) are provided for the dev sets only, in order to run the evaluation. Once uncompressed, the dataset is a directory with the following structure:
README.md
phonetic/
    test-clean/*.wav
    test-other/*.wav
    dev-clean/*.wav
    dev-other/*.wav
    dev-clean.item
    dev-other.item
lexical/
    test/*.wav
    dev/*.wav
    dev/gold.csv
syntactic/
    test/*.wav
    dev/*.wav
    dev/gold.csv
semantic/
    test/librispeech/*.wav
    test/synthetic/*.wav
    dev/librispeech/*.wav
    dev/synthetic/*.wav
    dev/gold.csv
    dev/pairs.csv
Submission format¶
Warning
The submission will be invalidated if any extra file or directory is present, or if any required file or directory is missing.
The files should be organized in a ZIP archive with the following content:
meta.yaml
code/ (optional, see below)
phonetic/
{dev-clean,dev-other}/*.txt
{test-clean,test-other}/*.txt
lexical/
dev.txt
test.txt
syntactic/
dev.txt
test.txt
semantic/
dev/{librispeech,synthetic}/*.txt
test/{librispeech,synthetic}/*.txt
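Once your submission directory matches this layout, the ZIP archive itself can be built with any tool; here is a minimal Python sketch using only the standard library (the directory name is a placeholder):
# Sketch: package a submission directory into submission.zip.
# "submission/" is a placeholder path laid out as described above.
import shutil

# Archives the *contents* of root_dir, so meta.yaml, phonetic/, etc.
# end up at the root of the resulting zip file.
shutil.make_archive("submission", "zip", root_dir="submission")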
/meta.yaml¶
The meta.yaml file must contain the following entries (order does not matter):
author: <str>
    authors of the submission
affiliation: <str>
    affiliation of the authors (university or company)
description: <str>
    description of the submitted system
open_source: <bool>
    true or false; if true you must provide a 'code' folder with the
    source code of the submitted system
train_set: <str>
    description of the train set used (which subset of LibriSpeech or
    libri-light, along with VAD or not, ...)
visually_grounded: <bool>
    true if the submission is visually grounded, false if speech based only.
gpu_budget: <float>
    number of hours * GPU used for training
parameters:
  phonetic:
    metric: <str>
        The metric to use for phonetic evaluation; must be 'euclidean',
        'cosine', 'kl' or 'kl_symmetric'. **WARNING** the 'cosine' metric
        here refers to an angular distance, as in the usual ABX evaluation.
    frame_shift: <float>
        Shift (in s) between two successive feature frames
  semantic:
    metric: <str>
        The metric to use for semantic evaluation. May be any metric
        supported by scipy.spatial.distance.cdist.
    pooling: <str>
        The pooling method to use for semantic evaluation; must be 'min',
        'max', 'mean', 'sum', 'last', 'lastlast' or 'off'.
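For concreteness, a hypothetical meta.yaml instantiating these entries could look as follows (all values are illustrative placeholders chosen from the allowed ones):
author: Jane Doe
affiliation: Example University
description: CPC-based features with k-means quantization (placeholder)
open_source: true
train_set: LibriSpeech 960h, no VAD
visually_grounded: false
gpu_budget: 60.0
parameters:
  phonetic:
    metric: cosine
    frame_shift: 0.01
  semantic:
    metric: cosine
    pooling: max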
/code¶
The code directory must be submitted only if the open_source flag is set to true in meta.yaml. It can contain a full working source tree, or a README file with a permanent link to download your code (on GitHub, for instance).
You are strongly encouraged to submit your code. Participants who submit their code in this way will be awarded an OPEN SCIENCE badge that will appear on the challenge leaderboard.
/phonetic¶
The phonetic folder of the submission must contain the following subdirectories: dev-clean, dev-other, test-clean and test-other.
Each .wav file in the dataset must have its corresponding .txt file in the submission, under the same directory structure. For example, the dataset file /path/to/dataset/phonetic/dev-clean/1272-128104-0000.wav must have its submitted counterpart /path/to/submission/phonetic/dev-clean/1272-128104-0000.txt.
Each .txt file encodes a single 2D numpy array of floats, each line encoding one feature frame. For example:
42.286527175400906 -107.68503050450957 59.79000088588511 -113.85831030071697
0.7872647311548775 45.33505222077471 -8.468742865224545 0
328.05422046327067 -4.495454384937348 241.186547397405 40.16161685378687
The number of columns (the features dimension) must be constant across the files. The number of lines depends on the speech sample duration.
The frame shift (the shift between two successive frames) must be given in meta.yaml, along with the metric used for the evaluation of those features. Each array must contain at least 2 frames (i.e. each file must have at least 2 lines).
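A minimal sketch of writing such a file from numpy (the array is a random placeholder standing in for real model output; any equivalent space-separated text dump works):
# Sketch: save one utterance's features, one frame per line,
# space-separated floats, as required by the submission format.
import numpy as np

features = np.random.rand(500, 256)  # (n_frames, feature_dim), placeholder
assert features.ndim == 2 and features.shape[0] >= 2  # at least 2 frames
np.savetxt("1272-128104-0000.txt", features, fmt="%.10f")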
/lexical and /syntactic¶
The /lexical and /syntactic folders of the submission must contain the two files dev.txt and test.txt. Each *.wav file in the dataset must have a corresponding line, in either dev.txt or test.txt, giving its pseudo-probability (order does not matter). For example, if the dev dataset contains:
/path/to/dataset/lexical/dev
├── aAAfmkmQpVz.wav
├── AaaggUZsvkR.wav
├── aAakhKfuvQI.wav
├── aAaOswLeeBL.wav
├── AaasVuoMJnS.wav
The submitted file dev.txt must contain entries like:
aAAfmkmQpVz -313.37445068359375
AaaggUZsvkR -447.8950500488281
aAakhKfuvQI -383.8902587890625
aAaOswLeeBL -430.2048645019531
AaasVuoMJnS -356.9426574707031
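A minimal sketch of producing such a file, assuming your model has already mapped each file stem to a pseudo-probability (the scores dict is a placeholder reusing the values above):
# Sketch: write lexical/dev.txt from precomputed pseudo-probabilities.
# `scores` is a placeholder; fill it with your own model's outputs.
scores = {
    "aAAfmkmQpVz": -313.37445068359375,
    "AaaggUZsvkR": -447.8950500488281,
    "aAakhKfuvQI": -383.8902587890625,
}
with open("lexical/dev.txt", "w") as f:
    for stem, logprob in scores.items():
        f.write(f"{stem} {logprob}\n")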
/semantic¶
The semantic folder of the submission must contain the following subdirectories: dev/synthetic, dev/librispeech, test/synthetic and test/librispeech.
Each .wav file in the dataset must have its corresponding .txt file in the submission, under the same directory structure. For example, the dataset file /path/to/dataset/semantic/dev/synthetic/aAbcsWWKCz.wav must have its submitted counterpart /path/to/submission/semantic/dev/synthetic/aAbcsWWKCz.txt.
Each .txt file encodes a single 2D numpy array of floats, each line encoding one feature frame. For example:
42.286527175400906 -107.68503050450957 59.79000088588511 -113.85831030071697
0.7872647311548775 45.33505222077471 -8.468742865224545 0
328.05422046327067 -4.495454384937348 241.186547397405 40.16161685378687
The number of columns (the features dimension) must be constant across the files. The number of lines depends on the speech sample duration.
The metric and pooling method used for evaluation must be specified in meta.yaml.
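To illustrate what these two parameters control, the sketch below pools each file's frame-level features into a single vector and compares two files with a scipy distance. This is an assumption about how the evaluation uses them, not the official code, and the second file name is invented:
# Sketch: plausible role of 'pooling' and 'metric' in the semantic
# evaluation (an illustration only, not the official implementation).
import numpy as np
from scipy.spatial.distance import cdist

def pool(frames, method):
    # Collapse an (n_frames, dim) array into a single (dim,) vector.
    if method == "mean":
        return frames.mean(axis=0)
    if method == "max":
        return frames.max(axis=0)
    if method == "last":
        return frames[-1]
    raise ValueError(f"unsupported pooling: {method}")

a = np.loadtxt("semantic/dev/synthetic/aAbcsWWKCz.txt")
b = np.loadtxt("semantic/dev/synthetic/zZplaceholder.txt")  # invented name
dist = cdist(pool(a, "max")[None, :], pool(b, "max")[None, :],
             metric="cosine")[0, 0]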
Validation¶
The zerospeech2021-validate program (as provided by the Software) will be automatically executed upon submission. It verifies that all required files exist and conform to the required format. If the check fails, your submission will be rejected by Codalab.
You are strongly advised to run the validation program yourself before making your submission. To apply the same checks that will be run when you submit, run:
zerospeech2021-validate <dataset> <submission> [--njobs <int>]
where <dataset> is the path to the challenge dataset and <submission> is the path to your submission (either a zip archive ready for submission or a directory containing all the required files). The --njobs parameter specifies the number of CPU cores to use for the phonetic and semantic parts.
Here is an example output:
$ zerospeech2021-validate /path/to/dataset /path/to/submission -j8
Prepare input...
> dataset: /path/to/dataset
> submission: /path/to/submission
Validating root folder...
> meta.yaml
> root folder
> code folder detected: submission will be manually inspected to ensure it is open source
Validating phonetic...
> phonetic/dev
> phonetic/test
Validating lexical...
> lexical/dev
> lexical/test
Validating syntactic...
> syntactic/dev
> syntactic/test
Validating semantic...
> semantic/dev/synthetic
> semantic/dev/librispeech
> semantic/test/synthetic
> semantic/test/librispeech
Success!
Evaluation¶
Once your submission passes the validation, you can use the
zerospeech2021-evaluate
program to get the scores on the development
datasets:
zerospeech2021-evaluate <dataset> <submission> -o <output_directory> [--njobs <int>]
where <dataset> and <submission> are as for validation, and <output_directory> is the folder in which the results are stored as .csv files.
The parameters required to evaluate the phonetic and semantic tasks are read
from <submission>/meta.yaml
.
Note
The evaluation of the lexical and syntactic parts is cheap and computed on a single CPU core. The semantic evaluation is computed on several CPU cores, as controlled by the --njobs parameter. The phonetic part is computed on GPU using pytorch (with fallback to CPU if no GPU is available).
The evaluation process will write the following files:
/path/to/output_directory/
├── score_lexical_dev_by_frequency.csv
├── score_lexical_dev_by_length.csv
├── score_lexical_dev_by_pair.csv
├── score_phonetic.csv
├── score_semantic_dev_correlation.csv
├── score_semantic_dev_pairs.csv
├── score_syntactic_dev_by_pair.csv
└── score_syntactic_dev_by_type.csv
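These are plain CSV files, so they can be inspected with any CSV reader; for example with pandas (assumed installed; the column layout is whatever the evaluator wrote):
# Sketch: load one of the development score files for inspection.
import pandas as pd

scores = pd.read_csv("/path/to/output_directory/score_phonetic.csv")
print(scores.head())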
Submission¶
The submission to the challenge must be done on Codalab at https://competitions.codalab.org/competitions/27711:
Sign in or create an account on Codalab.
Click on the Participate button, then click on Submit to upload your submission to our evaluation server.
Note: There is no feedback on Codalab during the upload, so do not close or refresh the page. When done, a results table will be displayed.
Troubleshooting¶
If you experience any issue related to the software, please open an issue on GitHub: https://github.com/bootphon/zerospeech2021/issues.
For any other issue, please contact us at zerospeech2021@gmail.com.