How to participate
Choosing a train dataset
You can train on any of the standard ZeroSpeech Task 4 train sets listed on the Benchmarks and Datasets page, either individually or combined. You can also train on external datasets, as long as they are publicly available. During the submission process, you will be asked to specify which dataset was used to train your system, providing a link (or publication reference) if it is an external dataset.
The provided datasets can be downloaded using our toolkit, or directly via the URLs provided in our repository.
Using our toolkit
It is recommended to install and use our toolkit to manage, evaluate & upload your submissions. The toolkit is a Python package containing evaluation scripts, scripts to download datasets & other relevant files, and scripts to facilitate uploading results to the leaderboards. You can find instructions on how to download and use our toolkit here.
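Typically the toolkit can be installed directly from PyPI; the package name shown below, zerospeech-benchmarks, is given only as an indication, and the linked instructions remain the authoritative reference.
pip install zerospeech-benchmarks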
Submission Preparation
Each benchmark requires a specific set of files to be prepared.
To facilitate this you can use the
zrc submission:init sLM21 <location>
command from the toolkit to create an empty submission template folder, where <location> is the path where the directory will be created.
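As a rough sketch, a complete sLM21 submission has the following structure (each part is described in detail in the rest of this section; the exact layout produced by the toolkit may differ slightly between versions):
<location>
├── meta.yaml
├── params.yaml
├── lexical
│   ├── dev.txt
│   └── test.txt
├── syntactic
│   ├── dev.txt
│   └── test.txt
└── semantic
    ├── dev
    │   ├── librispeech
    │   └── synthetic
    └── test
        ├── librispeech
        └── synthetic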
meta.yaml
This file contains meta information about the author and how this submission was created.
Example:
model_info:
  model_id: null
  gpu_budget: 60
  system_description: "CPC-big (trained on librispeech 960), kmeans (trained on librispeech 100), LSTM. See https://zerospeech.com/2021 for more details."
  train_set: "librispeech 960, librispeech 100"
publication:
  author_label: "Nguyen et al."
  authors: "Nguyen, T., Seyssel, M., Rozé, P., Rivière, M., Kharitonov, E., Baevski, A., Dunbar, E. & Dupoux, E."
  paper_title: "The zero resource speech benchmark 2021: Metrics and baselines for unsupervised spoken language modeling."
  paper_url: "https://arxiv.org/abs/2011.11589"
  publication_year: 2021
  institution: "EHESS, ENS, PSL Research University, CNRS and Inria"
  team: "CoML Team"
code_url: "https://github.com/zerospeech/zerospeech2021_baseline"
open_source: true
To Note
While most of the information in meta.yaml is optional, we would appreciate it if you took the time to fill it in, as it allows us to verify submissions and keep track of all the systems that use our benchmarks.
We would also appreciate it if you made your code open source and provided a link to it, although we understand that this is not always possible.
The model_id parameter is generated when you submit a system to our backend. If you wish to submit the same system to multiple benchmarks, keep the model_id the same to allow our system to link the submissions.
params.yaml
This file contains various parameters that can override the defaults of each benchmark.
semantic:
  metric: <str>
    The metric to use for the semantic evaluation. May be any metric supported by scipy.spatial.distance.cdist.
  n_jobs: <int>
    The number of parallel processes used to accelerate the semantic evaluation.
  pooling: <str>
    The pooling method to use for the semantic evaluation; must be 'min', 'max', 'mean', 'sum', 'last' or 'lastlast'.
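For example, a params.yaml selecting the cosine distance with max pooling and 4 parallel processes would look like this (the values shown are only an illustration, not a recommendation):
semantic:
  metric: "cosine"
  n_jobs: 4
  pooling: "max"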
model outputs
For each of the tasks a model output is required.
/lexical, /syntactic
The /lexical and /syntactic folders of the submission must each contain the two files dev.txt and test.txt. Each *.wav file in the dataset must have a corresponding line in dev.txt or test.txt with its pseudo-probability (order does not matter).
For example, if the dev dataset contains:
/path/to/dataset/lexical/dev
├── aAAfmkmQpVz.wav
├── AaaggUZsvkR.wav
├── aAakhKfuvQI.wav
├── aAaOswLeeBL.wav
├── AaasVuoMJnS.wav
The submitted file dev.txt must contain entries like:
aAAfmkmQpVz -313.37445068359375
AaaggUZsvkR -447.8950500488281
aAakhKfuvQI -383.8902587890625
aAaOswLeeBL -430.2048645019531
AaasVuoMJnS -356.9426574707031
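Such a file can be produced with a short script. As a minimal sketch in Python, assuming score_utterance is a hypothetical placeholder for your own model's scoring function (it is not part of the toolkit):
from pathlib import Path

def score_utterance(wav_path: Path) -> float:
    # Hypothetical placeholder: return your model's log pseudo-probability for this file.
    raise NotImplementedError

dataset_dir = Path("/path/to/dataset/lexical/dev")
submission_file = Path("/path/to/submission/lexical/dev.txt")

with submission_file.open("w") as out:
    for wav in sorted(dataset_dir.glob("*.wav")):
        # One line per *.wav file: "<file id> <pseudo-probability>" (order does not matter).
        out.write(f"{wav.stem} {score_utterance(wav)}\n")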
/semantic
The semantic folder of the submission must contain the following subdirectories: dev/synthetic, dev/librispeech, test/synthetic and test/librispeech.
- Each .wav file in the dataset must have its corresponding .npy file in the submission under the same directory structure. For example, the dataset file /path/to/dataset/semantic/dev/synthetic/aAbcsWWKCz.wav must have its submitted file /path/to/submission/semantic/dev/synthetic/aAbcsWWKCz.npy.
- Each .npy file encodes a single 2D numpy array of floats, each line encoding one features frame.
- The number of columns (the features dimension) must be constant across the files. The number of lines depends on the speech sample duration.
- The metric and pooling method used for evaluation must be specified in params.yaml.
It is recommended to use .npy files to save your arrays, as this is a binary format and uses less space; the .txt format is also supported for backwards compatibility. Reference for both export methods:
- for .txt: numpy.savetxt
- for .npy: numpy.save
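As a minimal sketch, a single file can be exported as follows (the random array stands in for your model's features and is only a placeholder):
import numpy as np

# Placeholder for your model's output for one utterance: a 2D float array of shape
# (n_frames, feature_dim); feature_dim must be identical across all files,
# while n_frames depends on the speech sample duration.
features = np.random.rand(134, 512).astype(np.float32)

# Recommended binary format:
np.save("/path/to/submission/semantic/dev/synthetic/aAbcsWWKCz.npy", features)

# Text format, kept for backwards compatibility:
# np.savetxt("/path/to/submission/semantic/dev/synthetic/aAbcsWWKCz.txt", features)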
Running the evaluation
Once the submission has been successfully created, you can run the evaluation:
zrc benchmarks:run sLM21 </path/to/submission> -o scores_dir
Your results are created in the scores_dir directory.
Notes:
- A validation step runs before each evaluation; to skip it, use the option --skip-validation
- If the dataset has subsets, you can run the evaluation on a selected subset only, e.g. --sets dev
- If the benchmark has multiple sub-tasks, you can run the evaluation on selected sub-tasks only, e.g. --task lexical semantic
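For instance, assuming the options above can be combined (the toolkit's built-in help is the authoritative reference for the exact syntax), evaluating only the lexical task on the dev set would look like:
zrc benchmarks:run sLM21 </path/to/submission> -o scores_dir --sets dev --task lexical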
Uploading Results
We would appreciate it if you uploaded your results so that we can compile them into our leaderboards. This helps us in several ways:
- It allows us to follow new systems that are evaluated on our benchmarks and compare them.
- It also helps us with creating a central place where all systems trying to solve unsupervised speech processing can be indexed.
- It shows that interest in our benchmarks is still active, which motivates us to create more.
To submit your results you need to create an account on our website (if you do not already have one). You can follow this link to create your account.
Using the toolkit, create a local session with
zrc user:login
and provide your username & password.
Once this is done you can upload your submission using the following command:
zrc submit <submission_dir>
To submit your scores you need to include all the required files in the same directory:
- source files (embeddings/probabilities): the files extracted from your model.
- score files: the result of the evaluation process.
- params.yaml: the parameters of the evaluation process.
- meta.yaml: generic information on the submission.
ProsAudit Benchmark
To run the ProsAudit evaluation task you also need to create a separate submission (as the two benchmarks have been separated).
- Create a submission directory (same as for the sLM21 task)
zrc submission:init prosAudit <location>
- Add your pseudo-probability files
/path/to/submission
├── english_dev.txt
├── english_test.txt
Each txt file is a list of pseudo-probabilities in the same format as for the lexical & syntactic tasks, but computed on the prosAudit dataset this time, as shown in the following example (order does not matter):
10_7_2723 -2.2400479316711426
2_7_2723 -2.21551513671875
10_7_2807 -1.9842218160629272
2_7_2807 -1.886082410812378
10_7_1624 -2.1218247413635254
2_7_1624 -2.1739344596862793
10_7_1540 -1.8953361511230469
2_7_1540 -1.8365209102630615
10_7_3596 -2.123969078063965
2_7_3596 -2.1809585094451904
...
meta.yaml is in the same format as shown above.
Evaluation can be run using the command:
zrc benchmarks:run prosAudit /path/to/submission
Scores are added in the scores subdirectory.
Multiple Submissions
If your system can be used for multiple tasks (for example, Task 1 and Task 3, or Task 1 and Task 4), you are strongly encouraged to submit to all the tasks you can.
To link submissions of a single system you need to use the same model_id in your meta.yaml; it is auto-generated after the first submission.