Zerospeech 2017

The 2017 Challenge appeared as a special session at ASRU 2017 (December 16-20, 2017, Okinawa, Japan; see the ASRU 2017 webpage). The challenge's aims and metrics are an extension of those presented in [1], and the main results are summarized in [2]. Below are the baseline results for the hyper training set; the baselines and system results for the hyper test set will be revealed after the papers' acceptance.

Track 1

Baseline and topline

The baseline ABX error rates for Track 1 are given in Table 1 (see also [1]). For the baseline model, we used 39-dimensional MFCC+Delta+Delta2 features computed every 10 ms, and the ABX score was computed as the frame-wise cosine distance averaged along the DTW path. The topline consists of posteriorgrams from a supervised phone-recognition Kaldi pipeline.
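To make the baseline distance concrete, here is a minimal sketch of the feature extraction and DTW-averaged cosine distance, assuming librosa and scipy are available. The function names and parameters are illustrative; the official evaluation software may differ in hop length, normalization, and how ABX trials are aggregated.

```python
import numpy as np
import librosa
from scipy.spatial.distance import cosine

def baseline_features(wav_path, n_mfcc=13, hop_ms=10):
    """39-dim features: MFCC + delta + delta-delta, one frame every hop_ms."""
    y, sr = librosa.load(wav_path, sr=None)
    hop = int(sr * hop_ms / 1000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop)
    d1 = librosa.feature.delta(mfcc, order=1)
    d2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, d1, d2]).T          # shape: (frames, 39)

def dtw_cosine_distance(a, b):
    """Frame-wise cosine distance averaged along the DTW alignment path."""
    _, path = librosa.sequence.dtw(X=a.T, Y=b.T, metric='cosine')
    return float(np.mean([cosine(a[i], b[j]) for i, j in path]))

def abx_error(a, b, x):
    """One ABX trial: X belongs to the same category as A; an error is
    counted when X is not closer to A than to B."""
    return dtw_cosine_distance(x, a) >= dtw_cosine_distance(x, b)
```

The ABX error rate is then the proportion of such trials that come out as errors, averaged over all (A, B, X) triplets for a given phone contrast.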

Systems comparison

Track 2

Baseline and topline

For the baseline model, we used the JHU system described in Jansen & Van Durme (2011) on PLP features. It performs DTW matching, using random projections for efficiency, followed by connected-component graph clustering as a second step. We also report a topline: an Adaptor Grammar with a unigram model run on the decoding provided by the supervised phone recognizer. Since it uses the gold transcription, the topline performance is probably not attainable by unsupervised systems; it is better viewed as a reference for the maximum value that it is reasonable to expect on these scores.
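The second step can be sketched as follows; this is not the JHU code, just a minimal illustration of connected-component graph clustering, assuming the DTW matching step has already produced a list of matched fragment-index pairs (the names cluster_matches, pairs, and n_fragments are hypothetical).

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def cluster_matches(pairs, n_fragments):
    """Group matched fragments into pseudo-word classes: build a graph with
    one node per fragment and one edge per DTW match, then take each
    connected component as one cluster."""
    rows, cols = zip(*pairs)
    graph = coo_matrix((np.ones(len(pairs)), (rows, cols)),
                       shape=(n_fragments, n_fragments))
    n_clusters, labels = connected_components(graph, directed=False)
    return n_clusters, labels   # labels[k] = cluster id of fragment k

# Hypothetical example: fragments 0, 1, 2 match each other; 3 matches 4.
n_clusters, labels = cluster_matches([(0, 1), (1, 2), (3, 4)], n_fragments=5)
```

Connected-component clustering is transitive by construction: two fragments end up in the same class whenever any chain of pairwise matches links them, which keeps the step cheap but can over-merge classes when matches are noisy.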

Challenge References

  • [1] Dunbar, E., Cao, X. N., Benjumea, J., Karadayi, J., Bernard, M., Besacier, L., … & Dupoux, E. (2017, December). The zero resource speech challenge 2017. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp. 323-330). IEEE.