Benchmarks & Datasets
Summary
sLM-21
Datasets
Set | Language | Dataset Source | Type | Train Set (Duration/Speakers) | Test Set (Duration/Speakers) | Dev Set (Duration/Speakers) |
---|---|---|---|---|---|---|
lexical (sWuggy) | English | |||||
syntactic (sBLIMP) | English | |||||
semantic/synthetic | English | |||||
semantic/librispeech | English | | audiobook | |||
Train-Librispeech | English | LibriSpeech | audiobook | LibriSpeech, Libri-light, etc. | ||
prosAudit-dataset | English | | audiobook | |||
Downloading
The datasets can be downloaded from download.zerospeech.com, or pulled with the toolkit using the following commands:
zrc datasets:pull sLM21-dataset
zrc datasets:pull prosaudit-dataset
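
To fetch both datasets in one step, the two pull commands above can be wrapped in a small script. The following is a minimal sketch, assuming the `zrc` executable is already installed and on `PATH`. The helper names `build_pull_command` and `pull_datasets` and the injectable `runner` parameter are hypothetical conveniences, not part of the toolkit.

```python
import shutil
import subprocess

# Dataset names copied from the pull commands above.
DATASETS = ("sLM21-dataset", "prosaudit-dataset")


def build_pull_command(name):
    """Return the argument list for `zrc datasets:pull <name>`."""
    return ["zrc", "datasets:pull", name]


def pull_datasets(names=DATASETS, runner=subprocess.run):
    """Pull each dataset in order, stopping at the first failure.

    `runner` is a hypothetical hook that defaults to subprocess.run; it lets
    the function be exercised without the toolkit installed.
    """
    for name in names:
        # check=True makes subprocess.run raise CalledProcessError
        # if the command exits with a non-zero status.
        runner(build_pull_command(name), check=True)


if __name__ == "__main__":
    # Assumption: the toolkit exposes a `zrc` executable on PATH.
    if shutil.which("zrc") is None:
        raise SystemExit("zrc not found on PATH; install the toolkit first")
    pull_datasets()
```

Run as a script, this invokes the same two commands in sequence and aborts if either one fails.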