DementiaBank

DementiaBank

ADReSSo 2021 Challenge

INTERSPEECH 2021 - Alzheimer's Dementia Recognition through Spontaneous Speech (audio only): The ADReSSo Challenge

This challenge was organized by Saturnino Luz, Fasih Haider, and Sofia de la Fuente Garcia of the University of Edinburgh and Davida Fromm and Brian MacWhinney of Carnegie Mellon University.

The objective of the ADReSSo-2021 challenge is to make available a benchmark dataset of spontaneous speech, which is acoustically pre-processed and balanced in terms of age and gender, defining a shared task through which different approaches to AD recognition in spontaneous speech can be compared. Our JAD systematic review describes the state of research as background for the challenge.

Alzheimer's Dementia Recognition through Spontaneous Speech (audio only): The ADReSSo Challenge

Dementia is a category of neurodegenerative diseases that entails a long-term and usually gradual decrease of cognitive functioning. The main risk factor for dementia is age, and therefore its greatest incidence is among the elderly. Due to the severity of the situation worldwide, institutions and researchers are investing considerably on dementia prevention and early detection, focusing on disease progression. There is a need for cost-effective and scalable methods for detection of dementia from its most subtle forms, such as the preclinical stage of Subjective Memory Loss (SML), to more severe conditions like Mild Cognitive Impairment (MCI) and Alzheimer's Dementia (AD) itself.

The main features of the ADReSSo (ADReSS, speech only) Challenge are:

The Challenge targets a difficult automatic prediction problem of societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD). The challenge builds on the success of the ADReSS-2020 Challenge at INTERSPEECH 2020 in Shanghai , the first such shared-task event focused on AD, which attracted 34 teams from across the world.
The ADReSSo Challenge will provide a forum for those different research groups to test their existing methods (or develop novel approaches) on a new shared standardized dataset. The approaches that performed best on the original dataset employed features extracted from manual transcripts, which were provided.
The challenge uses a spontaneous speech dataset, and requires the creation of models straight from speech, without manual transcription, although automatic transcription is allowed and encouraged.
In keeping with the objectives of AD prediction evaluation, the ADReSSo challenge dataset will be statistically balanced so as to mitigate common biases often overlooked in evaluations of AD detection methods, including repeated occurrences of speech from the same participant (common in longitudinal datasets), variations in audio quality, and imbalances of gender and age distribution.
This task focuses on AD recognition using spontaneous speech, which marks a departure from neuropsychological and clinical evaluation approaches. Spontaneous speech analysis has the potential to enable novel applications for speech technology in longitudinal, unobtrusive monitoring of cognitive health, in line with the theme of this year's INTERSPEECH, "Speech Everywhere!".

The Challenge

The ADReSSo challenge consists of the following tasks:

an AD classification task, where you are required to produce a model to predict the label (AD or non-AD) for a speech session. You may use the speech signal directly (acoustic features), or attempt to convert the speech into text automatically (ASR) and extract linguistic features from this automatically generated transcript;
an MMSE score regression task, where you create a model to infer the subject's Mini Mental Status Examination (MMSE) score based on speech data;
and a cognitive decline (disease progression) inference task, where you create a model to predict changes in cognitive status over time, for a given speaker, based on speech data collected at baseline (beginning of a cohort study).

You may choose to do one or more of these tasks. You will be provided with access to training and test sets.

Access to the data set

You must first join as a DementiaBank member.

For the diagnosis task you can use this training data

For the progression tasks you can use this training data

For the diagnosis task you can use this testing data

For the progression tasks you can use this testing data

The training data consists of three folders of data (full enhanced audio, normalised sub-chunks, transcriptions). There are also .csv files with information on age, gender and MMSE scores for Task 1 and Tasks 2 and 3.

The training data are organised into two folders: diagnosis and progression. The diagnosis/train/audio/ad folder contains speech from speakers with Alzheimer's dementia diagnosis. The diagnosis/train/audio/cn folder contains speech from controls. The progression/train/audio/decline folrder contains baseline data from patients who exhibited cognitive dcline between their baseline assessment and their year-2 visit to the clinic. The progression/train/audio/no_decline folder has speech from patients with no decline during that period. Decline is defined as a difference in MMSE score between baseline and year-2 greater than or equal to 5 points.

For the AD/CN diagnosis task and the MMSE predication task, each sub-directory contains compressed (ZIP) archives with recordings of a picture description task ("Cookie Theft" picture from the Boston Diagnostic Aphasia Exam). Those recodings have been acoustically enhanced (noise reduction through spectral subtraction) and normalised. The directory structure and files for the disease progression prediction task are similarly organised. They consists of recordings of a laguage fluency task, also normalised and acoustically enhanced.

The diagnosis task dataset has been balanced with respect to age and gender in order to eliminate potential confunding and bias. We employed a propensity score approach to matching (Rosenbaum & Rubin, 1983; Rubin, 1973; Ho et al., 2007). The dataset was checked for matching according to scores defined in terms of the probability of an instance being treated as AD given covariates age and gender, estimated through logistic regression, and matching instances were selected. All standardized mean differences for the covariates were well below 0.1 and all standardized mean differences for squares and two-way interactions between covariates were well below 0.15, indicating adequate balance for the covariates. The propensity score was estimated using a probit regression of the treatment on the covariates 'age' and 'gender' (probit generated a better balanced than logistic regression).

Performance Metrics

Additional Materials

A full description of the ADReSSo Challenge and its datasets, along with a basic set of baseline results can be found in this paper

The ground truth for the test sets is available for task1, task2, and task3.

The Challenge papers submitted for INTERSPEECH2021 are combined into this PDF.

Several Challenge papers, along with related work, are compiled in this Frontiers Research Topic special issue.

We used CLAN's EVAL program to extract a wide range of linguistic features. The raw EVAL results are given here and a summary is given here.

The ADReSSo Challenge acknowledges the support and sponsorship of the European Union's Horizon 2020 research programme, under grant agreement No 769661, towards the SAAM project.

References

de la Fuente Garcia S, Ritchie C, & Luz S. Artificial Intelligence, Speech, and Language Processing Approaches to Monitoring Alzheimer's Disease: A Systematic Review. Journal of Alzheimer's Disease. 2020:1-27.
Luz S, Haider F, de la Fuente S, Fromm D, MacWhinney B. Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge. Proceedings of INTERSPEECH 2020. Also available as arXiv preprint arXiv:2004.06833. 2020.
Rosenbaum, Paul R., and Donald B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70 (1): 41-55. .
Rubin, Donald B. 1973. Matching to Remove Bias in Observational Studies. Biometrics 29 (1): 159. .
Ho, Daniel E., Kosuke Imai, Gary King, and Elizabeth A. Stuart. 2007. Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis 15 (3): 199-236. https://doi.org/10.1093/pan/mpl013.