This page lists software of specific interest for unsupervised speech learning, with an emphasis on free software. Please follow the authors' instructions for installing and running each package, cite the appropriate reference, and contact the authors if you encounter any issues.
The bootphon team develops pipelines for data analysis, speech processing and machine learning, and distributes them as open-source software in the bootphon repository on GitHub.
Discovery of subword units or representations
Discrete units, Bayesian approaches:
Lee, C., & Glass, J. (2012). A Nonparametric Bayesian Approach to Acoustic Model Discovery. In Proceedings of ACL. [github].
Ondel, L., Burget, L., & Cernocky, J. (2016). Variational Inference for Acoustic Unit Discovery. Procedia Computer Science, 81, 80-86. [github].
Continuous representations, posteriorgrams:
Chen, H., Leung, C. C., Xie, L., Ma, B., & Li, H. (2015). Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: A feasibility study. In Proceedings of Interspeech. [code].
Heck, M., Sakti, S., & Nakamura, S. (2016). Unsupervised Linear Discriminant Analysis for Supporting DPGMM Clustering in the Zero Resource Scenario. Procedia Computer Science, 81, 73-79. [the code is the same as for Chen et al., plus Kaldi].
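A posteriorgram represents each speech frame by its posterior probability over a set of clusters. The snippet below is a minimal sketch of that idea, assuming a fixed, already-trained Gaussian mixture with diagonal covariances; it is not code from the packages above (which infer the number of components with a Dirichlet process), and the function name `gaussian_posteriorgram` is hypothetical.

```python
import numpy as np


def gaussian_posteriorgram(frames, means, variances, weights):
    """Return a (T, K) matrix whose row t holds P(component k | frame t).

    frames:    (T, D) acoustic features, e.g. MFCCs
    means:     (K, D) component means
    variances: (K, D) diagonal covariances
    weights:   (K,)   mixture weights summing to 1
    """
    diff = frames[:, None, :] - means[None, :, :]  # (T, K, D)
    # Log density of each frame under each diagonal Gaussian.
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=2)
        + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
    )  # (T, K)
    log_joint = log_gauss + np.log(weights)[None, :]
    # Normalise each frame with a max-shifted log-sum-exp for stability.
    m = log_joint.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
    return np.exp(log_joint - log_norm)
```

Each row sums to one, so frames are compared through their cluster posteriors rather than their raw features.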
Continuous representations, DNNs (these require spoken term discovery):
Synnaeve, G., Schatz, T., & Dupoux, E. (2014, December). Phonetics embedding learning with side information. In Spoken Language Technology Workshop (SLT), 2014 IEEE (pp. 106-111). IEEE. [github].
Thiolliere, R., Dunbar, E., Synnaeve, G., Versteegh, M., & Dupoux, E. (2015). A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling. In Sixteenth Annual Conference of the International Speech Communication Association. [github].
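Models of this kind start from pairs of speech fragments that spoken term discovery has judged to be the same word. As an illustration of the dynamic time warping step named in the Thiolliere et al. title, here is a minimal sketch, assuming Euclidean frame distances and the standard three-step recursion, of how two such fragments can be aligned frame by frame; the aligned (i, j) pairs are the kind of same-category frame pairs such a network can train on. `dtw_align` is a hypothetical helper, not part of the linked repositories.

```python
import numpy as np


def dtw_align(x, y):
    """Align feature sequences x (n, D) and y (m, D) with DTW.

    Returns the total alignment cost and the warping path as a list of
    (i, j) frame-index pairs, using Euclidean frame distances and the
    step set {(1, 0), (0, 1), (1, 1)}.
    """
    n, m = len(x), len(y)
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)  # (n, m)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1]
            )
    # Backtrack from the end to recover the frame-to-frame path,
    # preferring the diagonal step on ties.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1, j - 1],
            (i - 1, j): cost[i - 1, j],
            (i, j - 1): cost[i, j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n, m], path
```

For example, aligning `x = [[0], [1], [2]]` with `y = [[0], [0], [1], [2]]` gives a total cost of 0 and the path `[(0, 0), (0, 1), (1, 2), (2, 3)]`: the repeated first frame of `y` is absorbed by a horizontal step. The quadratic loop is acceptable for word-length fragments.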
Spoken Term Discovery
MODIS: Catanese, L., Souviraa-Labastie, N., Qu, B., Campion, S., Gravier, G., Vincent, E., & Bimbot, F. (2013, August). MODIS: an audio motif discovery software. In Show & Tell-Interspeech 2013. [code].
Jansen, A., & Van Durme, B. (2011, December). Efficient spoken term discovery using randomized algorithms. In Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on (pp. 401-406). IEEE. [github].
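One of the randomized techniques behind the Jansen & Van Durme entry is locality-sensitive hashing: each frame is reduced to a short bit signature so that the Hamming distance between two signatures approximates the angle, and hence the cosine similarity, between the frames, which makes the search for similar frames much cheaper. Below is a minimal sketch of random-hyperplane hashing under that reading of the paper; `lsh_signatures` and `approx_cosine` are hypothetical names, and this is not the code from the linked repository.

```python
import numpy as np


def lsh_signatures(frames, n_bits, seed=0):
    """Map each frame (row of a (T, D) array) to an n_bits boolean signature.

    Bit k records on which side of random hyperplane k the frame lies.
    All frames must be hashed with the same hyperplanes to be comparable.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((frames.shape[1], n_bits))
    return frames @ planes >= 0  # (T, n_bits)


def approx_cosine(sig_a, sig_b):
    """Estimate cosine similarity from two signatures.

    A bit differs with probability theta / pi, where theta is the angle
    between the original vectors, so cos(pi * hamming / n_bits) estimates
    cos(theta).
    """
    n_bits = sig_a.shape[-1]
    hamming = np.count_nonzero(sig_a != sig_b, axis=-1)
    return np.cos(np.pi * hamming / n_bits)
```

Longer signatures give tighter estimates at the cost of more memory; comparing bit vectors is much cheaper than computing cosines on the original features.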
Bayesian approaches (text-based):
Adaptor Grammar: Johnson, M., Griffiths, T. L., & Goldwater, S. (2006). Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models. In Advances in Neural Information Processing Systems (pp. 641-648). [website].
Bayesian approaches (signal-based):
Lee, C., O’Donnell, T., & Glass, J. (2015). Unsupervised Lexicon Discovery from Acoustic Input. Transactions of the Association for Computational Linguistics (TACL). [github].