Zerospeech 2017
The 2017 Challenge appeared as a special session at ASRU 2017 (Dec 16-20, 2017, Okinawa, Japan; see the ASRU 2017 webpage). The challenge's aims and metrics extend those presented in [1], and the main results are summarized in [2]. Below are the baseline results for the hyper training set. The baselines and system results for the hyper test set will be revealed after the paper's acceptance.
Track 1
Baseline and topline
The baseline ABX error rates for Track 1 are given in Table 1 (see also [1]). For the baseline model, we used 39-dimensional MFCC+Delta+Delta2 features computed every 10 ms, and the ABX score was computed using the frame-wise cosine distance averaged along the DTW path. The topline consists of posteriorgrams from a supervised phone-recognition Kaldi pipeline.
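The frame-wise distance described above can be sketched as follows. This is a minimal illustration, not the challenge's evaluation code: it computes pairwise cosine distances between two feature matrices, finds the minimum-cost DTW alignment, and averages the distance along that path. Function names and the simple O(nm) recursion are our own choices for illustration.

```python
import numpy as np

def cosine_distance_matrix(a, b):
    """Pairwise cosine distances between frames of a (n, d) and b (m, d)."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a_n @ b_n.T

def dtw_average_cosine(a, b):
    """Cosine distance averaged along the minimum-cost DTW path (illustrative)."""
    dist = cosine_distance_matrix(a, b)
    n, m = dist.shape
    # acc[i, j]: cost of best path ending at (i, j); length[i, j]: its length
    acc = np.full((n, m), np.inf)
    length = np.zeros((n, m), dtype=int)
    acc[0, 0] = dist[0, 0]
    length[0, 0] = 1
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0:
                candidates.append((acc[i - 1, j], length[i - 1, j]))
            if j > 0:
                candidates.append((acc[i, j - 1], length[i, j - 1]))
            if i > 0 and j > 0:
                candidates.append((acc[i - 1, j - 1], length[i - 1, j - 1]))
            best_cost, best_len = min(candidates, key=lambda c: c[0])
            acc[i, j] = best_cost + dist[i, j]
            length[i, j] = best_len + 1
    return acc[-1, -1] / length[-1, -1]
```

Two identical sequences align along the diagonal and yield an average distance of zero, which is the sanity check one would expect before running the full ABX evaluation.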
Systems comparison
| Authors | Affiliation | DOI | # | English (1s / 10s / 120s) | French (1s / 10s / 120s) | Mandarin (1s / 10s / 120s) | LANG1 (1s / 10s / 120s) | LANG2 (1s / 10s / 120s) | With supervision |
|---|---|---|---|---|---|---|---|---|---|
Track 2
Baseline and topline
For the baseline model, we used the JHU system described in Jansen & Van Durme (2011) on PLP features. It performs DTW matching, using random projections to increase efficiency, followed by connected-component graph clustering as a second step. We also report a topline: an Adaptor Grammar with a unigram model applied to the decoding provided by the phone recognizer. The topline performance is probably not attainable by unsupervised systems, since it uses the gold transcription; it is better seen as a reference for the maximum value that it is reasonable to expect on these scores.
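The second step of the baseline, connected-component graph clustering, can be sketched with a small union-find structure: each DTW match is an edge between two speech fragments, and clusters are the connected components of the resulting graph. This is a hedged illustration of the clustering idea only, with our own function names; it is not the JHU system's actual code.

```python
class UnionFind:
    """Minimal union-find (disjoint sets) with path halving."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def cluster_matches(pairs):
    """Group matched fragments into clusters = connected components of the match graph."""
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    clusters = {}
    for x in uf.parent:
        clusters.setdefault(uf.find(x), set()).add(x)
    return list(clusters.values())
```

For example, matches (f1, f2), (f2, f3) and (f4, f5) yield two clusters, {f1, f2, f3} and {f4, f5}, which would then be scored as candidate word types.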
| Authors | DOI | English (NED / Cov / Words) | French (NED / Cov / Words) | Mandarin (NED / Cov / Words) | LANG1 (NED / Cov / Words) | LANG2 (NED / Cov / Words) |
|---|---|---|---|---|---|---|
Note:
The spoken term discovery baseline can be found here. The Pitman-Yor Adaptor Grammar sampler can be found here. The baseline can be replicated by running the code kept on GitHub.
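The NED column in the table above is a normalized edit distance between the phone transcriptions of discovered fragment pairs (lower is better). A minimal sketch of that computation, assuming transcriptions are given as phone strings, could look as follows; the function names are ours and this is not the official evaluation toolkit.

```python
def levenshtein(a, b):
    """Classic edit distance via dynamic programming (one rolling row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def ned(pairs):
    """Mean normalized edit distance over pairs of phone transcriptions."""
    total = 0.0
    for a, b in pairs:
        total += levenshtein(a, b) / max(len(a), len(b))
    return total / len(pairs)
```

Identical transcriptions give a NED of 0; completely disjoint ones give 1, so a system's NED reflects how phonetically coherent its discovered clusters are.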
Challenge References
- [1] Dunbar, E., Cao, X. N., Benjumea, J., Karadayi, J., Bernard, M., Besacier, L., … & Dupoux, E. (2017, December). The zero resource speech challenge 2017. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp. 323-330). IEEE.