Getting Started with our Toolbox
A toolbox has been created to let you use the different benchmarks. It can download all resources linked to the benchmarks (datasets, model checkpoints, samples, etc.), run the various benchmarks on your own submissions, and upload the results to our website so they can be included in our leaderboards. The toolbox (for Python) can be installed via pip (source available on our GitHub). Below, we provide documentation and instructions on how to install and use the toolbox.
The zrc ZeroSpeech Benchmark toolbox is a Python package, so you need Python installed on your system before you start.
Once Python is installed, you can install the package using:
pip install zerospeech-benchmarks[all]
If you are a conda user, you can use our conda environment:
conda env create coml/zerospeech-benchmark
Or you can install it directly from source on GitHub.
To verify that the toolbox is installed correctly, you can run
zrc version, which should print the version information.
If it does not, you can open an issue with your errors directly on our GitHub.
By default, all data is saved in /tmp. This can be changed with the environment variable
APP_DIR; on Linux & macOS this can be done in a terminal:
$ export APP_DIR=/location/to/data
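For example, to keep all toolbox data in a persistent folder instead of /tmp (the path below is illustrative):

```shell
# Pick a persistent data directory for the toolbox (path is illustrative)
export APP_DIR="$HOME/zrc-data"
mkdir -p "$APP_DIR"
```

To make the setting permanent, add the export line to your shell profile (e.g. ~/.bashrc).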
Download benchmark datasets
You can start by listing the available datasets with the command
zrc datasets, then download the
dataset you want with the command
zrc datasets:pull <dataset-name>.
When listing datasets, the Installed column shows whether each dataset has been downloaded.
Datasets are installed under the location specified by APP_DIR.
To delete a dataset you can use the command:
zrc datasets:rm <dataset-name>
Some datasets are not available for direct download due to restrictive licensing (for example, abx15 & td15 use the Buckeye corpus). They are marked in the list as having an external origin. These datasets can be obtained by other means; instructions are given in the relevant tasks. You will, however, need to import them into the toolbox for indexing, using the following command:
zrc datasets:import <name> </dataset/location>
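A typical dataset round-trip might look like the transcript below (the dataset names and local path are illustrative; run zrc datasets for the actual list):

```shell
$ zrc datasets                                  # list datasets; Installed column shows local copies
$ zrc datasets:pull sLM21-dataset               # download a dataset (name is illustrative)
$ zrc datasets:import abx15 ~/corpora/buckeye   # index an externally obtained dataset (path is illustrative)
$ zrc datasets:rm sLM21-dataset                 # remove a dataset you no longer need
```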
Download model checkpoints
zrc checkpoints allows you to list available checkpoints.
You can then download each set by typing
zrc checkpoints:pull <name>
Checkpoints are installed under the location specified by APP_DIR.
To delete checkpoints you can use the command:
zrc checkpoints:rm <name>
Download samples
zrc samples allows you to list the available samples.
You can then download each sample by typing
zrc samples:pull <name>.
Samples are installed under the location specified by APP_DIR.
To delete a sample from your system you can use
zrc samples:rm <name> or just delete the relevant folder manually.
Run a benchmark
You can list the available benchmarks by typing the
zrc benchmarks command.
To create a submission, you have to follow the instructions on each of our task pages: Task1, Task2, Task3, Task4.
Once the submission has been created, you can run the benchmark on it with the following command:
zrc benchmarks:run <name> /path/to/submission -o /path/to/scores_dir
Some benchmarks are split into sub-tasks; you can run a subset of them using the following syntax:
zrc benchmarks:run sLM21 /path/to/submission -o /path/to/scores_dir -t lexical syntactic
With this syntax we run the sLM21 benchmark on our submission, but only for the lexical and syntactic tasks, omitting the semantic one.
In the same way, we can also run only on the dev set (or the test set):
zrc benchmarks:run sLM21 /path/to/submission -o /path/to/scores_dir -s dev -t lexical syntactic
This runs the same tasks as previously, but only on the dev set of the benchmark.
Each benchmark has a specific format that a submission has to follow. You can initialize a
submission directory using the following syntax:
zrc submission:init <name> /path/to/submission; this will
create the set of folders corresponding to the selected benchmark.
For more detailed information on each benchmark, see the corresponding Task page.
Once all submission files have been created, you can validate your submission to check that everything is working properly.
To do so use the following syntax :
zrc submission:verify <name> /path/to/submission; this will verify that all files
are set up correctly, or show informative errors if not.
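Putting the submission steps together, a typical session might look like this transcript (the benchmark name and paths are illustrative):

```shell
$ zrc submission:init sLM21 ./my-submission      # scaffold the expected folder layout
$ # ... fill the generated folders with your model's outputs ...
$ zrc submission:verify sLM21 ./my-submission    # catch format errors before a long run
$ zrc benchmarks:run sLM21 ./my-submission -o ./scores -s dev   # quick check on the dev set
```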
Note: during benchmark evaluation, the default behavior is to run validation on your submission; you can deactivate this by adding
Upload a submission
The upload functionality is used to upload your submission and results to our leaderboards.
To submit your results, you need an account on our website (if you do not already have one). You can follow this link to create your account.
Using the toolkit, create a local session with
zrc user:login and provide your username & password.
Once this is done you can upload using the following command
zrc submit <submission_dir>
To submit your scores you need to include all the required files in the same directory:
- source files (embeddings/probabilities): files extracted from your model.
- score files: the results of the evaluation process.
- params.yaml: the parameters of the evaluation process.
- meta.yaml: generic information about the submission.
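Once the directory contains all four kinds of files, the upload itself is two commands (the directory name is illustrative):

```shell
$ zrc user:login                # create a local session; prompts for your website credentials
$ zrc submit ./my-submission    # upload the submission directory to the leaderboard
```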
If your system can be used for multiple tasks (for example, Task 1 and Task 3, or Task 1 and Task 4), you are strongly encouraged to make submissions to all the tasks you can.
To link submissions of a single system, you need to use the same
model_id in your
meta.yaml, which is auto-generated after the first submission.
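For example, a meta.yaml fragment for linking two submissions of the same system might contain a line like the one below (the value is illustrative, and the other auto-generated fields are omitted):

```yaml
# meta.yaml (fragment): keep model_id identical across submissions of one system
model_id: my-system-baseline    # illustrative value; generated on first submission
```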