The columns can be sorted by clicking the sort icon in each column header. A detailed view of the results is available by clicking the details icon in each row.

The columns are interpreted as follows (see Evaluation metrics for details):

  • Phonetic (across and within)

    • ABX error rate on embeddings

    • Scale is \([0, 1]\), lower is better

  • Lexical and Syntactic

    • Mean correct / incorrect classification accuracy

    • Scale is \([0, 1]\), higher is better

    • For Lexical, the "all" column is the mean accuracy over five frequency bins (based on raw frequency counts in LibriSpeech-960: OOV; 1-5; 6-20; 21-100; 101+), and the "in vocab." column leaves out the OOV category. Only the "all" column was published in the Interspeech summary paper.

  • Semantic

    • Human judgement correlation coefficient (x 100)

    • Scale is \([-100, 100]\), far from 0 is better

    • Mean score across all datasets

    • Semantic (Weighted): Same as Semantic, with the mean score weighted by the number of pairs in each dataset. Only the unweighted (Semantic) columns were published in the Interspeech summary paper.
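
The aggregated columns described above can be illustrated with a minimal Python sketch. It assumes that the "in vocab." column is the plain mean over the four non-OOV bins, and that the weighted semantic score is a mean weighted by pair counts. All numbers, dataset names, and function names are hypothetical and chosen only for illustration; this is not the benchmark's own evaluation code.

```python
def mean(values):
    """Plain (unweighted) arithmetic mean."""
    values = list(values)
    return sum(values) / len(values)


def weighted_mean(scores, weights):
    """Mean of scores, each weighted by the matching entry in weights."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical lexical accuracies per frequency bin (illustrative values only).
lexical_by_bin = {"OOV": 0.5, "1-5": 0.6, "6-20": 0.7, "21-100": 0.8, "101+": 0.9}

# "all" column: mean over the five bins.
lexical_all = mean(lexical_by_bin.values())

# "in vocab." column: same mean, leaving out the OOV bin (assumed reading).
lexical_in_vocab = mean(v for k, v in lexical_by_bin.items() if k != "OOV")

# Hypothetical semantic results: dataset name -> (correlation x 100, number of pairs).
semantic_by_dataset = {"dataset_a": (10.0, 100), "dataset_b": (20.0, 300)}
scores = [score for score, _ in semantic_by_dataset.values()]
pairs = [n for _, n in semantic_by_dataset.values()]

semantic = mean(scores)                          # Semantic column
semantic_weighted = weighted_mean(scores, pairs)  # Semantic (Weighted) column

print(lexical_all, lexical_in_vocab, semantic, semantic_weighted)
```

With these made-up values, the larger dataset pulls the weighted score toward its own result, which is why the two semantic columns can differ for the same submission.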

