DementiaBank ADReSS 2020 Challenge

INTERSPEECH 2020 - Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge

This challenge was organized by Saturnino Luz, Fasih Haider, and Sofia de la Fuente Garcia of the University of Edinburgh and Davida Fromm and Brian MacWhinney of Carnegie Mellon University.

The objective of the ADReSS challenge is to provide a benchmark dataset of spontaneous speech, acoustically pre-processed and balanced in terms of age and gender, and to define a shared task through which different approaches to AD recognition in spontaneous speech can be compared. Our JAD review describes the state of research at the beginning of the challenge.

The Challenge

This PowerPoint presentation describes the challenge. It consists of two tasks:
  1. an AD classification task, where you are required to produce a model to predict the label (AD or non-AD) for a speech session. Your model can use speech data, language data (transcripts are provided), or both.
  2. an MMSE score regression task, where you will create a model to infer the subject's Mini-Mental State Examination (MMSE) score based on speech and/or language data.
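To make the language side of task 1 concrete, the sketch below extracts two simple lexical features from a transcript. It is purely illustrative: the function name and features are our own choices, not part of the challenge materials, and a real submission would feed such features (or far richer ones) into a trained classifier.

```python
import re

def lexical_features(transcript):
    """Toy lexical features from a plain-text transcript:
    token count and type-token ratio (a rough measure of
    vocabulary diversity, often reduced in AD speech)."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    ttr = len(set(tokens)) / len(tokens) if tokens else 0.0
    return {"n_tokens": len(tokens), "type_token_ratio": ttr}
```

For example, `lexical_features("the boy is on the stool")` yields 6 tokens and a type-token ratio of 5/6, since "the" repeats.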

After joining as a DementiaBank member, you can gain access to the training and test data from here.

The training data consists of three folders of data (full enhanced audio, normalised sub-chunks, transcriptions) as well as two text files with information on age, gender and MMSE scores for participants with and without a diagnosis of AD (cc_meta_data.txt, cd_meta_data.txt). A README file is also included for further details.

The baseline results are reported in this paper.

Performance on AD classification is evaluated through F scores. Performance on MMSE prediction is evaluated through root mean squared error (RMSE).
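Both metrics are standard; as a quick reference, here is a minimal stdlib-only Python implementation (the names `f1_score` and `rmse` are ours, not from any official challenge toolkit):

```python
import math

def f1_score(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted MMSE scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

For instance, `f1_score([1, 1, 0, 0], [1, 0, 0, 1])` gives 0.5 (precision and recall are both 0.5), and `rmse([20, 25, 30], [22, 25, 26])` is about 2.58 MMSE points.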

This sheet lists the participants in the challenge.

This sheet summarizes the results.

This file gives the labels.

The complete set of conference papers is here.

Age and gender distribution of the AD and non-AD participants:

                  AD              non-AD
Age Interval   Male  Female    Male  Female
[50, 55)          2     0         2     0
[55, 60)          7     6         7     6
[60, 65)          4     9         4     9
[65, 70)          9    14         9    14
[70, 75)          9    11         9    11
[75, 80)          4     3         4     3
Total            35    43        35    43

Each session was segmented for voice activity using a voice activity detection system based on a signal energy threshold. We set the log energy threshold parameter to 65 dB, with a maximum duration of 10 seconds per speech segment. The segmented dataset contains 1,955 speech segments from 78 non-AD subjects and 2,122 speech segments from 78 AD subjects. The average number of speech segments per participant was 24.86 (standard deviation 12.84). Audio volume was normalised across all speech segments to control for variation caused by recording conditions, such as microphone placement.
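The segmentation procedure can be sketched as follows. This is an illustrative reconstruction, not the actual system used: frame length, the dB reference (here, relative to full-scale amplitude, so the threshold is negative), and the merging logic are our assumptions; only the idea of a log-energy threshold with a 10-second cap per segment comes from the description above.

```python
import math

def frame_log_energy(frame):
    """Mean log energy of one frame, in dB (frame: list of float samples)."""
    energy = sum(s * s for s in frame) / max(len(frame), 1)
    return 10.0 * math.log10(energy + 1e-12)  # small floor avoids log(0)

def segment_by_energy(samples, sr, frame_ms=25, threshold_db=-65.0, max_seg_s=10.0):
    """Mark frames whose log energy exceeds threshold_db, then merge
    consecutive voiced frames into segments no longer than max_seg_s.
    Returns a list of (start_sample, end_sample) pairs."""
    flen = int(sr * frame_ms / 1000)
    voiced = [start for start in range(0, len(samples) - flen + 1, flen)
              if frame_log_energy(samples[start:start + flen]) > threshold_db]
    max_len = int(max_seg_s * sr)
    segments, cur_start, prev_end = [], None, None
    for start in voiced:
        end = start + flen
        if cur_start is None:
            cur_start, prev_end = start, end
        elif start == prev_end and end - cur_start <= max_len:
            prev_end = end  # extend the current segment
        else:
            segments.append((cur_start, prev_end))  # gap or cap reached: close it
            cur_start, prev_end = start, end
    if cur_start is not None:
        segments.append((cur_start, prev_end))
    return segments
```

On a toy signal of near-silence around a loud burst, the function returns a single segment covering the burst; in practice such segments would then be volume-normalised, as described above.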


The ADReSS Challenge acknowledges the support and sponsorship of the European Union's Horizon 2020 research programme, under grant agreement No. 769661, towards the SAAM project.