Benchmarks & Datasets

sLM-21

Datasets

Set Language Dataset Source Type Train Set (Duration/Speakers) Test Set (Duration/Speakers) Dev Set (Duration/Speakers)
lexical (sWuggy) English
syntactic (sBLIMP) English
semantic/synthetic English
semantic/librispeech English audiobook
Train-Librispeech English LibriSpeech audiobook LibriSpeech, Libri-light, etc.
prosAudit-dataset English audiobook

Downloading

The datasets can be downloaded from download.zerospeech.com.

Alternatively, they can be pulled with the toolkit using the following commands:

  • zrc datasets:pull sLM21-dataset
  • zrc datasets:pull prosaudit-dataset
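
When both datasets are needed, for example in a setup script, the two pull commands above can be wrapped in a small helper. The following is only a sketch: it assumes the toolkit is already installed and that the zrc command is available on the PATH. The names DATASETS, pull_command and pull_all are hypothetical and are not part of the toolkit; the only toolkit calls used are the two commands listed above.

```python
import shutil
import subprocess

# Dataset names exactly as they appear in the commands above.
DATASETS = ["sLM21-dataset", "prosaudit-dataset"]


def pull_command(name):
    """Return the argument list for `zrc datasets:pull <name>`."""
    return ["zrc", "datasets:pull", name]


def pull_all(names=DATASETS, dry_run=False):
    """Pull each dataset in turn and return the commands that were issued.

    With dry_run=True nothing is executed; the commands are only built.
    """
    # Assumption: the toolkit exposes `zrc` on the PATH once installed.
    if not dry_run and shutil.which("zrc") is None:
        raise RuntimeError("the `zrc` command was not found on the PATH")
    issued = []
    for name in names:
        cmd = pull_command(name)
        if not dry_run:
            # check=True stops at the first pull that fails.
            subprocess.run(cmd, check=True)
        issued.append(cmd)
    return issued


if __name__ == "__main__":
    # Dry run: print the commands without downloading anything.
    for cmd in pull_all(dry_run=True):
        print(" ".join(cmd))
```

Calling pull_all() without arguments runs both pulls in order. The dry-run mode is included so the commands can be checked before any download starts.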