Benchmarks & Datasets
Summary
sLM-21
Datasets
| Set | Language | Dataset Source | Type | Train Set (Duration/Speakers) | Test Set (Duration/Speakers) | Dev Set (Duration/Speakers) |
|---|---|---|---|---|---|---|
| lexical (sWuggy) | English | | | | | |
| syntactic (sBLIMP) | English | | | | | |
| semantic/synthetic | English | | | | | |
| semantic/librispeech | English | | audiobook | | | |
| Train-Librispeech | English | Librispeech | audiobook | LibriSpeech, Libri-light, etc. | | |
| prosAudit-dataset | English | | audiobook | | | |
Downloading
The datasets can be downloaded from download.zerospeech.com.
Alternatively, pull them with the toolkit using the following commands:
zrc datasets:pull sLM21-dataset
zrc datasets:pull prosaudit-dataset
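When both datasets are needed, the two pull commands can be wrapped in a small script that stops at the first failure. The following is only a minimal sketch: it assumes the `zrc` toolkit is installed and on the `PATH`, and the `ZRC_CMD` variable and `pull_all` function are hypothetical names introduced here for illustration, not part of the toolkit.

```shell
#!/bin/sh
# Sketch: pull both datasets in sequence, stopping at the first failure.

# ZRC_CMD is a hypothetical override introduced for this sketch (it is not a
# toolkit setting); it allows a dry run with ZRC_CMD="echo zrc".
ZRC_CMD="${ZRC_CMD:-zrc}"

pull_all() {
    # Dataset names are taken from the two pull commands in this section.
    for name in sLM21-dataset prosaudit-dataset; do
        # $ZRC_CMD is left unquoted on purpose so "echo zrc" splits into words.
        $ZRC_CMD datasets:pull "$name" || {
            echo "failed to pull $name" >&2
            return 1
        }
    done
}
```

Calling `pull_all` after these definitions runs the real downloads. Setting `ZRC_CMD="echo zrc"` beforehand only prints the commands that would be run, which is a cheap way to check the dataset names before starting a large download.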