Questions or comments? Contact zerospeech2021 [at] gmail [dot] com.
Columns can be sorted by clicking the icon in each column header. A detailed view of a row's results opens when you click the icon in that row.
The columns are interpreted as follows (see Evaluation metrics for details):
Phonetic (across and within)
ABX error rate on embeddings
Scale is \([0, 1]\), lower is better
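As a rough illustration of what an ABX error rate measures, the sketch below scores (A, B, X) triplets in which A and X belong to the same category and B to a different one; a triplet counts as an error when X is not closer to A than to B. This is a simplified sketch under our own assumptions: each token is a single fixed-length vector compared with cosine distance (real ABX evaluations typically align frame sequences, which this ignores), ties count as half an error, and `abx_error_rate` and `cosine_distance` are hypothetical names. The within/across distinction (whether X comes from the same speaker as A and B) is not modelled here.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def abx_error_rate(category_a, category_b, distance=cosine_distance):
    """Fraction of (A, B, X) triplets where X (another token of category A)
    is not strictly closer to A than to B. Ties count as 0.5 (assumption)."""
    errors = 0.0
    total = 0
    for i, a in enumerate(category_a):
        for j, x in enumerate(category_a):
            if i == j:
                continue  # X must be a different token from A
            for b in category_b:
                d_ax = distance(a, x)
                d_bx = distance(b, x)
                if d_ax > d_bx:
                    errors += 1.0
                elif d_ax == d_bx:
                    errors += 0.5
                total += 1
    if total == 0:
        raise ValueError("need at least two tokens in category_a and one in category_b")
    return errors / total


# Well-separated categories give an error rate of 0.0.
print(abx_error_rate([[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]))
```

Because the score is a fraction of failed triplets, it lies in \([0, 1]\), and a lower value means the embeddings separate the categories better.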
Lexical and Syntactic
Mean correct / incorrect classification accuracy
Scale is \([0, 1]\), higher is better
For Lexical, the "all" column is the mean accuracy over five frequency bins (based on raw frequency counts in LibriSpeech-960: OOV, 1-5, 6-20, 21-100, 101+), and the "in vocab." column leaves out the OOV bin. Only the "all" column was published in the Interspeech summary paper.
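The sketch below shows one way to read the Lexical and Syntactic scores: each test item is a pair (correct item, incorrect item), a pair is counted as correct when the model scores the correct item strictly higher, and the Lexical "all" and "in vocab." values are means of per-bin accuracies. It is a minimal sketch under stated assumptions: the per-item scores (for example, some log-probability-like plausibility value) are hypothetical inputs, ties are counted as wrong, and `pairwise_accuracy` and `lexical_scores` are hypothetical names rather than the official evaluation code.

```python
from statistics import mean

# Frequency bins from the text above (raw counts in LibriSpeech-960).
BINS = ["OOV", "1-5", "6-20", "21-100", "101+"]


def pairwise_accuracy(pairs):
    """pairs: list of (score_correct, score_incorrect) tuples, where a higher
    score means the model finds the item more plausible. A pair is counted as
    correct only if the correct item scores strictly higher (assumption)."""
    return mean(1.0 if correct > incorrect else 0.0 for correct, incorrect in pairs)


def lexical_scores(pairs_by_bin):
    """pairs_by_bin: dict mapping each bin name to its list of pairs.
    Returns ("all", "in vocab."): the mean of the per-bin accuracies over all
    five bins, and over the four bins that exclude OOV."""
    per_bin = {b: pairwise_accuracy(pairs_by_bin[b]) for b in BINS}
    all_score = mean(per_bin[b] for b in BINS)
    in_vocab = mean(per_bin[b] for b in BINS if b != "OOV")
    return all_score, in_vocab


example = {
    "OOV": [(-1.0, -2.0), (-3.0, -2.0)],    # 1 of 2 pairs correct
    "1-5": [(-1.0, -2.0)],                  # 1 of 1
    "6-20": [(-1.0, -2.0), (-1.0, -0.5)],   # 1 of 2
    "21-100": [(0.0, -1.0)],                # 1 of 1
    "101+": [(0.0, -1.0)],                  # 1 of 1
}
print(lexical_scores(example))
```

Averaging per-bin accuracies (rather than pooling all pairs) gives each frequency bin equal weight, which is how we read "mean accuracy over five frequency bins".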
Semantic
Human judgement correlation coefficient (× 100)
Scale is \([-100, 100]\), farther from 0 is better
Mean score across all datasets
Semantic (Weighted): Same as Semantic, but with the mean score weighted by the number of pairs in each dataset. Only the unweighted (Semantic) columns were published in the Interspeech summary paper.
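To make the difference between Semantic and Semantic (Weighted) concrete, the sketch below computes a correlation per dataset, scales it by 100, and then averages the per-dataset scores either plainly or weighted by each dataset's number of pairs. The choice of Spearman's rank correlation is our assumption (the text above only says "correlation coefficient"), the model-side similarity values are hypothetical inputs, and `spearman_x100` and `semantic_scores` are hypothetical names.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_x100(model_similarities, human_judgements):
    """Spearman's rho (Pearson on average ranks), scaled to [-100, 100]."""
    return 100 * pearson(average_ranks(model_similarities), average_ranks(human_judgements))


def semantic_scores(datasets):
    """datasets: list of (model_similarities, human_judgements) per dataset.
    Returns (unweighted, weighted) means of the per-dataset scores; the
    weighted mean uses each dataset's number of pairs as its weight."""
    scores = [spearman_x100(m, h) for m, h in datasets]
    sizes = [len(m) for m, _ in datasets]
    unweighted = mean(scores)
    weighted = sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)
    return unweighted, weighted


datasets = [
    ([0.1, 0.4, 0.9], [1, 5, 8]),            # same ordering: +100
    ([0.9, 0.5, 0.2, 0.1], [1, 2, 3, 4]),    # reversed ordering: -100
]
print(semantic_scores(datasets))
```

With these toy inputs the unweighted mean is 0, while the weighted mean leans toward the larger (negatively correlated) dataset, which is exactly the effect the weighting is meant to capture.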
| Phonetic (Within) | Phonetic (Across) | Lexical | Syntactic | Semantic | Semantic (Weighted) |
|---|---|---|---|---|---|