How to participate

For any issues please email us at issue@zerospeech.com

Choosing a train dataset

You can train on any of the standard ZeroSpeech Task 4 train sets listed on the Benchmarks and Datasets page, separately or combined. You can also train on external datasets, as long as they are publicly available. During the submission process, you will be asked to specify which dataset was used to train your system, providing a link (or publication reference) if it is an external dataset.

The provided datasets can be downloaded using our toolkit or directly using the provided URLs in our repository.

Using our toolkit

We recommend installing and using our toolkit to manage, evaluate and upload your submissions. The toolkit is a Python package containing evaluation scripts, scripts to download datasets and other relevant files, and scripts to facilitate uploading results to the leaderboards. You can find instructions on how to download and use our toolkit here.

Submission Preparation

Each benchmark requires a specific set of files to be prepared.

To facilitate this, you can use the zrc submission:init sLM21 <location> command from the toolkit to create an empty submission template folder, where <location> is the path at which the directory will be created.

`meta.yaml`

This file contains meta information about the author and how this submission was created.

Example:

model_info:
  model_id: null
  gpu_budget: 60
  system_description: "CPC-big (trained on librispeech 960), kmeans (trained on librispeech 100), LSTM. See https://zerospeech.com/2021 for more details."
  train_set: "librispeech 960, librispeech 100"
publication:
  author_label: "Nguyen et al."
  authors: "Nguyen, T., Seyssel, M., Rozé, P., Rivière, M., Kharitonov, E., Baevski, A., Dunbar, E. & Dupoux, E."
  paper_title: "The zero resource speech benchmark 2021: Metrics and baselines for unsupervised spoken language modeling."
  paper_url: "https://arxiv.org/abs/2011.11589"
  publication_year: 2021
  institution: "EHESS, ENS, PSL Research University, CNRS and Inria"
  team: "CoML Team"
code_url: "https://github.com/zerospeech/zerospeech2021_baseline"
open_source: true

To Note

While most of the information in meta.yaml is optional, we would appreciate it if you took the time to fill it in, as it allows us to verify submissions and keep track of all the systems that use our benchmarks.

We would also appreciate it if you made your code open source and provided a link to it, although we understand that this is not always possible.

The model_id parameter is generated when you submit a system to our backend. If you wish to submit the same system to multiple benchmarks, keep the model_id the same so that our system can link the submissions.

`params.yaml`

This file contains various parameters that can override the defaults of each benchmark.

semantic:
  metric: <str>
    The metric to use for semantic evaluation. May be any metric
    supported by scipy.spatial.distance.cdist.
  n_jobs: <int>
    The number of parallel processes used to speed up semantic evaluation.
  pooling: <str>
    The pooling method to use for semantic evaluation. Must be 'min',
    'max', 'mean', 'sum', 'last' or 'lastlast'.
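
To give an intuition for what the pooling and metric parameters control, here is a minimal sketch, not the toolkit's own code. It assumes that each file's frames are pooled along the time axis into a single vector and that two pooled vectors are then compared with a distance. The function name pooled_cosine_distance is hypothetical, the cosine distance is hard-coded as one example of a cdist-style metric, and the 'last' and 'lastlast' pooling methods are left out because this page does not define them.

```python
import numpy as np

# Pooling functions that collapse a (n_frames, dim) array into one (dim,) vector.
# Only the four reductions whose meaning is unambiguous are illustrated here.
POOLING = {
    "min": lambda frames: frames.min(axis=0),
    "max": lambda frames: frames.max(axis=0),
    "mean": lambda frames: frames.mean(axis=0),
    "sum": lambda frames: frames.sum(axis=0),
}


def pooled_cosine_distance(features_a, features_b, pooling="mean"):
    """Pool two frame sequences over time, then return their cosine distance.

    Hypothetical helper for illustration only: the real evaluation is done by
    the toolkit, with the metric and pooling read from params.yaml.
    """
    pool = POOLING[pooling]
    a = pool(np.asarray(features_a, dtype=float))
    b = pool(np.asarray(features_b, dtype=float))
    cosine_similarity = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return 1.0 - cosine_similarity
```

Because pooling runs over the time axis, two files of different durations still yield vectors of the same dimension, which is why the number of columns must be constant across files.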

`model outputs`

For each of the tasks a model output is required.

/lexical, /syntactic

The /lexical and /syntactic folders of the submission must each contain the two files dev.txt and test.txt. Each *.wav file in the dataset must have a corresponding line in either dev.txt or test.txt giving its pseudo-probability (order does not matter). For example, if the dev dataset contains:

   /path/to/dataset/lexical/dev
   ├── aAAfmkmQpVz.wav
   ├── AaaggUZsvkR.wav
   ├── aAakhKfuvQI.wav
   ├── aAaOswLeeBL.wav
   ├── AaasVuoMJnS.wav

The submitted file dev.txt must contain entries like:

   aAAfmkmQpVz -313.37445068359375
   AaaggUZsvkR -447.8950500488281
   aAakhKfuvQI -383.8902587890625
   aAaOswLeeBL -430.2048645019531
   AaasVuoMJnS -356.9426574707031
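
As a minimal sketch of how such a file could be produced, the function below writes one line per .wav file found in a dataset folder. The name write_scores and the scores dictionary (mapping a file id to the pseudo-probability computed by your model) are assumptions made for illustration and are not part of the toolkit.

```python
from pathlib import Path


def write_scores(wav_dir, scores, out_file):
    """Write one '<file id> <score>' line per .wav file found in wav_dir.

    `scores` maps a file id (the .wav name without its extension) to the
    pseudo-probability your model assigned to that file.
    Returns the number of lines written.
    """
    wav_ids = sorted(path.stem for path in Path(wav_dir).glob("*.wav"))

    # Every .wav file of the dataset needs an entry, so fail early otherwise.
    missing = [wav_id for wav_id in wav_ids if wav_id not in scores]
    if missing:
        raise ValueError(f"no score for {len(missing)} file(s), e.g. {missing[0]}")

    out_file = Path(out_file)
    out_file.parent.mkdir(parents=True, exist_ok=True)
    with out_file.open("w", encoding="utf-8") as handle:
        for wav_id in wav_ids:
            handle.write(f"{wav_id} {scores[wav_id]}\n")
    return len(wav_ids)
```

For instance, write_scores("/path/to/dataset/lexical/dev", scores, "/path/to/submission/lexical/dev.txt") would create the dev.txt file for the lexical task. The lines are sorted only for reproducibility, since order does not matter.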

/semantic

The semantic folder of the submission must contain the following subdirectories: dev/synthetic, dev/librispeech, test/synthetic and test/librispeech.

Each .wav file in the dataset must have its corresponding .npy file in the submission under the same directory structure. For example the dataset file /path/to/dataset/semantic/dev/synthetic/aAbcsWWKCz.wav must have its submitted file /path/to/submission/semantic/dev/synthetic/aAbcsWWKCz.npy.
Each .npy file encodes a single 2D numpy array of floats, each row encoding one feature frame.
The number of columns (the feature dimension) must be constant across the files. The number of rows depends on the duration of the speech sample.
The metric and pooling method used for evaluation must be specified in params.yaml.

We recommend saving your arrays as .npy files, since this binary format uses less space. The .txt format is also supported for backwards compatibility. The corresponding numpy export functions are:

for .txt numpy.savetxt
for .npy numpy.save
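
The sketch below shows one possible way to save the features of a single file with numpy.save while mirroring the dataset directory structure. The function name save_features and its arguments are assumptions made for illustration; extracting the features from your model is left to you.

```python
from pathlib import Path

import numpy as np


def save_features(features, wav_path, dataset_root, submission_root):
    """Save a (n_frames, dim) float array at the mirrored .npy location.

    The path of the .wav file relative to dataset_root is reused under
    submission_root, with the extension changed to .npy.
    """
    features = np.asarray(features)
    if features.ndim != 2:
        raise ValueError(f"expected a 2D array, got shape {features.shape}")
    if not np.issubdtype(features.dtype, np.floating):
        raise ValueError(f"expected floats, got dtype {features.dtype}")

    relative = Path(wav_path).relative_to(dataset_root)
    target = Path(submission_root) / relative.with_suffix(".npy")
    target.parent.mkdir(parents=True, exist_ok=True)
    np.save(target, features)
    return target
```

With dataset_root set to /path/to/dataset and submission_root set to /path/to/submission, the file /path/to/dataset/semantic/dev/synthetic/aAbcsWWKCz.wav is saved as /path/to/submission/semantic/dev/synthetic/aAbcsWWKCz.npy.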

Running the evaluation

Once the submission has been created, you can run the evaluation:

zrc benchmarks:run sLM21 </path/to/submission> -o scores_dir

Your results are created in the scores_dir directory.

Notes:

A validation step runs before each evaluation. To skip it, use the option --skip-validation
If the dataset has subsets, you can run the evaluation on a selected subset only using --sets dev
If the benchmark has multiple subtasks, you can run it on selected subtasks only using --task lexical semantic

Uploading Results

DEV-NOTE: The upload functionality will become available in January 2023

We would appreciate it if you uploaded your results so that we can compile them into our leaderboards. This helps us in several ways:

It allows us to follow new systems that are evaluated on our benchmarks and compare them.
It also helps us create a central place where all systems trying to solve unsupervised speech processing can be indexed.
It shows that interest in our benchmarks is still active and motivates us to create more.

To submit your results, you need to create an account on our website (if you do not already have one). You can follow this link to create your account.

Using the toolkit, create a local session with zrc user:login and provide your username and password.

Once this is done, you can upload using the following command: zrc submit <submission_dir>

To submit your scores, you need to include all the required files in the same directory.

source files: (embeddings/probabilities) these are files extracted from your model.
score files: these are the result of the evaluation process.
params.yaml: these are the parameters of the evaluation process.
meta.yaml: generic information on submission
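
Before uploading, a quick local check of the directory layout can catch missing files early. The following sketch only covers the sLM21 files and folders named on this page; the function name missing_entries is hypothetical, the score files are not checked because their exact location is not specified here, and the check does not replace the validation performed by the toolkit.

```python
from pathlib import Path

# Files and folders that this page lists for an sLM21 submission.
REQUIRED_FILES = [
    "meta.yaml",
    "params.yaml",
    "lexical/dev.txt",
    "lexical/test.txt",
    "syntactic/dev.txt",
    "syntactic/test.txt",
]
REQUIRED_DIRS = [
    "semantic/dev/synthetic",
    "semantic/dev/librispeech",
    "semantic/test/synthetic",
    "semantic/test/librispeech",
]


def missing_entries(submission_dir):
    """Return the required files and folders absent from submission_dir."""
    root = Path(submission_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means that every entry listed above is present, for example missing_entries("/path/to/submission") == [].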

ProsAudit Benchmark

To run the ProsAudit evaluation task, you need to create a separate submission, as the two benchmarks have been separated.

Create a submission directory (same as for the sLM21 task)

When using the toolkit, this can be done with the command: zrc submission:init prosAudit <location>

Add your pseudo-probability files

/path/to/submission
├── english_dev.txt
├── english_test.txt

Each txt file contains a list of pseudo-probabilities, in the same format as for the lexical and syntactic tasks but computed on the prosAudit dataset, as shown in the following example (order does not matter):

10_7_2723 -2.2400479316711426
2_7_2723 -2.21551513671875
10_7_2807 -1.9842218160629272
2_7_2807 -1.886082410812378
10_7_1624 -2.1218247413635254
2_7_1624 -2.1739344596862793
10_7_1540 -1.8953361511230469
2_7_1540 -1.8365209102630615
10_7_3596 -2.123969078063965
2_7_3596 -2.1809585094451904
...
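
To sanity-check a pseudo-probability file before running the evaluation, a small parser such as the one sketched below can be used. The function name read_scores is hypothetical, and the checks (two fields per line, numeric score, no duplicate file id) are assumptions derived from the format shown above rather than the toolkit's own validation rules.

```python
def read_scores(path):
    """Parse a '<file id> <score>' file into a dict, checking its format."""
    scores = {}
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            fields = line.split()
            if len(fields) != 2:
                raise ValueError(f"line {line_number}: expected '<file id> <score>'")
            file_id, value = fields
            if file_id in scores:
                raise ValueError(f"line {line_number}: duplicate entry {file_id}")
            # float() raises ValueError if the score is not a number.
            scores[file_id] = float(value)
    return scores
```

For example, read_scores("/path/to/submission/english_dev.txt") returns a dictionary with one entry per line, which can then be compared against the list of files in the dataset.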

meta.yaml uses the same format as described in the meta.yaml section above.

Evaluation can be run using the command zrc benchmarks:run prosAudit /path/to/submission

Scores are written to the scores subdirectory.

Multiple Submissions

If your system can be used for multiple tasks (for example, Task 1 and Task 3, or Task 1 and Task 4), you are strongly encouraged to submit to all the tasks you can. To link the submissions of a single system, use the same model_id in each meta.yaml. The model_id is auto-generated after the first submission.