Results

The 2017 Challenge appeared as a special session at ASRU 2017 (December 16-20, 2017, Okinawa, Japan; see the ASRU 2017 webpage). The challenge's aims and metrics extend those presented in [1], and the main results are summarized in [2]. Below are the baseline results for the hyper training set. The baselines and system results for the hyper test set will be revealed after the paper's acceptance.

Track 1

Baseline and topline

The baseline ABX error rates for Track 1 are given in Table 1 (see also [1]). For the baseline model, we used 39-dimensional MFCC+Delta+Delta2 features computed every 10 ms, and the ABX score was computed using the frame-wise cosine distance averaged along the DTW path. The topline consists of posteriorgrams from a supervised phone-recognition Kaldi pipeline.
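The baseline distance above can be sketched as follows. This is not the evaluation code itself, just a minimal illustration of computing the frame-wise cosine distance averaged along a minimum-cost DTW alignment path between two feature sequences:

```python
import numpy as np

def cosine_distance(x, y):
    # Frame-wise cosine distance between two feature vectors.
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)

def dtw_average_cosine(A, B):
    """Average frame-wise cosine distance along the minimum-cost DTW
    alignment path between feature matrices A (n x d) and B (m x d)."""
    n, m = len(A), len(B)
    cost = np.full((n + 1, m + 1), np.inf)    # accumulated distance
    steps = np.zeros((n + 1, m + 1), int)     # alignment-path length
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(A[i - 1], B[j - 1])
            # candidate predecessors: match, insertion, deletion
            preds = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
            pi, pj = min(preds, key=lambda p: cost[p])
            cost[i, j] = cost[pi, pj] + d
            steps[i, j] = steps[pi, pj] + 1
    # average the distance over the number of steps on the path
    return cost[n, m] / steps[n, m]
```

In the actual evaluation this distance is computed between same-phone and different-phone triphone tokens to derive the ABX error rate.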

Systems comparison

Table 1. Track 1 - Across-Speaker Results

Authors | Affiliation | DOI | # | English (1s / 10s / 120s) | French (1s / 10s / 120s) | Mandarin (1s / 10s / 120s) | LANG1 (1s / 10s / 120s) | LANG2 (1s / 10s / 120s) | with supervision

Table 2. Track 1 - Within-Speaker Results

Authors | Affiliation | DOI | # | English (1s / 10s / 120s) | French (1s / 10s / 120s) | Mandarin (1s / 10s / 120s) | LANG1 (1s / 10s / 120s) | LANG2 (1s / 10s / 120s) | with supervision

Track 2

Baseline and topline

For the baseline model, we used the JHU system described in Jansen & Van Durme (2011) on PLP features. It performs DTW matching, uses random projections to increase efficiency, and applies connected-component graph clustering as a second step. We also report a topline: an Adaptor Grammar with a unigram model applied to the decoding provided by the phone recognizer. The topline performance is probably not attainable by unsupervised systems, since it uses the gold transcription; it serves rather as a reference for the maximum values that it is reasonable to expect on these scores.
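The second step of the baseline, connected-component graph clustering, can be illustrated with a small sketch (not the JHU code; a hypothetical union-find over the match graph, where nodes are discovered fragments and edges are DTW matches):

```python
def cluster_pairs(pairs):
    """Group matched fragment IDs into clusters by computing the
    connected components of the match graph with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b in pairs:
        union(a, b)

    # collect each fragment under its component root
    clusters = {}
    for x in parent:
        clusters.setdefault(find(x), []).append(x)
    return list(clusters.values())
```

For example, the matched pairs (f1, f2), (f2, f3), (f4, f5) yield two clusters, {f1, f2, f3} and {f4, f5}; each cluster is then treated as one hypothesized word type.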

Table 3. Track 2 metrics for the baseline and topline models on the English, French and Mandarin datasets.

Authors | DOI | English (NED / cov / words) | French (NED / cov / words) | Mandarin (NED / cov / words) | LANG1 (NED / cov / words) | LANG2 (NED / cov / words)
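As a reminder of the NED column, the normalized edit distance averages, over all matched fragment pairs, the Levenshtein distance between the pair's gold phone transcriptions divided by the length of the longer one (see [1] for the full definition). A minimal sketch:

```python
def edit_distance(a, b):
    # Classic Levenshtein distance with a rolling one-row DP table.
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,              # deletion
                        dp[j - 1] + 1,          # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[n]

def ned(pairs):
    """Mean normalized edit distance over matched fragment pairs,
    each pair given as two phone-transcription sequences."""
    return sum(edit_distance(a, b) / max(len(a), len(b))
               for a, b in pairs) / len(pairs)
```

A perfect system that only matches identical transcriptions gets NED = 0; random matches push NED toward 1.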

Note

The spoken term discovery baseline can be found here. The Pitman-Yor Adaptor Grammar sampler can be found here. The baseline can be replicated by running the code kept on GitHub.

Challenge References

  • [1] Dunbar, E., Cao, X. N., Benjumea, J., Karadayi, J., Bernard, M., Besacier, L., … & Dupoux, E. (2017, December). The zero resource speech challenge 2017. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp. 323-330). IEEE.