
General Resources

On this page we list links to software of specific interest to unsupervised speech learning, with an emphasis on free software. Please follow the instructions provided by the authors for installing and running the software, cite the appropriate reference, and contact the authors in case of any issue.

The CoML team develops pipelines for data analysis, speech processing, and machine learning, and distributes them as open source in our GitHub repository: github.com/bootphon

All the tools linked to the ZRC series are on github.com/zerospeech

Discovery of sub-word units or representations

Discrete units, Bayesian approaches:
- Lee, C. & Glass, J. (2012). A Nonparametric Bayesian Approach to Acoustic Model Discovery, ACL, [github].
- Ondel, L., Burget, L., & Cernocky, J. (2016). Variational Inference for Acoustic Unit Discovery. Procedia Computer Science, 81, 80-86. [github].
Continuous representations, posteriorgrams:
- Chen, H., Leung, C. C., Xie, L., Ma, B., & Li, H. (2015). Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: A feasibility study. In Proceedings of Interspeech. [code].
- Heck, M., Sakti, S., & Nakamura, S. (2016). Unsupervised Linear Discriminant Analysis for Supporting DPGMM Clustering in the Zero Resource Scenario. Procedia Computer Science, 81, 73-79. The code is the same as for the previous entry, plus Kaldi.
Continuous representations, DNNs (this requires spoken term discovery):
- Synnaeve, G., Schatz, T., & Dupoux, E. (2014, December). Phonetics embedding learning with side information. In Spoken Language Technology Workshop (SLT), 2014 IEEE (pp. 106-111). IEEE. [github].
- Thiolliere, R., Dunbar, E., Synnaeve, G., Versteegh, M., & Dupoux, E. (2015). A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling. In Sixteenth Annual Conference of the International Speech Communication Association. [github].

Spoken Term Discovery

DTW-based:
- MODIS: Catanese, L., Souviraa-Labastie, N., Qu, B., Campion, S., Gravier, G., Vincent, E., & Bimbot, F. (2013, August). MODIS: an audio motif discovery software. In Show & Tell-Interspeech 2013.
- Jansen, A., & Van Durme, B. (2011, December). Efficient spoken term discovery using randomized algorithms. In Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on (pp. 401-406). IEEE. [github].
Bayesian approaches (text based):
- Adaptor Grammar: Johnson, M., Griffiths, T. L., & Goldwater, S. (2006). Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models. In Advances in neural information processing systems (pp. 641-648). [website].
Bayesian approaches (signal based):
- Lee, C., O’Donnell, T., & Glass, J. (2015). Unsupervised Lexicon Discovery from Acoustic Input. Transactions of the Association for Computational Linguistics (TACL). [github].