
Matched Sampling Approaches for Statistical Inference in Large-Scale Clinical Neuroimaging Studies

Periodic Reporting for period 1 - zelig (Matched Sampling Approaches for Statistical Inference in Large-Scale Clinical Neuroimaging Studies)

Reporting period: 2017-06-01 to 2019-05-31

Observational neuroimaging studies undertaken at scale can offer deep insight into the complexities underlying neurological and psychiatric disease. While neuroimaging studies have shown promise over the last decade in the investigation, diagnosis and treatment monitoring of these disorders, causal inference methodology in population neuroimaging is relatively underdeveloped compared with fields where large datasets are common, such as epidemiology and the social sciences.
According to the World Health Organisation, neurological disease in its various forms afflicts tens of millions of people worldwide, and in some of its domains is forecast to rise significantly. In prevalence studies of Alzheimer's Disease (AD), for example, figures are predicted to rise from 46 million today to over 130 million by 2050. While treatment options for many of these afflictions are limited, it is likely that future treatment strategies will be most effective if applied at the earlier stages of disease. This work seeks to improve techniques and models for early disease detection using existing large-scale, high-potential neuroimaging datasets, including the UK Biobank.
The main project objective was to increase the clinical utility of large-scale neuroimaging datasets through improved statistical modelling of the underlying disease factors related to Alzheimer's Disease. We sought both to enhance early disease detection and to advance understanding of the complex neurological mechanisms that foreshadow disease onset. Specifically, we developed statistical and algorithmic techniques from the fields of Causal Inference and Machine Learning for application to observational neuroimaging studies. We wished to build statistical models that go beyond simple associational models to richer classes of technique capable of providing definitive causal links between interventions or risk factors, nascent imaging markers of disease, and health outcomes.
Conclusions: This project investigated the use of matching techniques from causal inference in observational neuroimaging datasets. These techniques have theoretical and empirical properties that render them attractive for clinical application: a treatment effect observed to be significant with these methods is more likely to have a valid interpretation than a naive estimate. On a large neuroimaging dataset we saw strong evidence of a causal relationship between education and certain brain structures, some of which have not previously been associated with educational attainment. Since there is also evidence of a relationship between education and the risk of developing cognitive impairment and/or dementia, this finding holds promise for extension to more complex models of dementia. We are continuing work in these directions.
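As a rough illustration of why a matched estimate can be easier to interpret than a naive one, the following minimal sketch applies greedy 1:1 nearest-neighbour matching on a single confounder to a small synthetic dataset. Everything in it is an assumption made for illustration: the function name match_nearest, the caliper, the toy covariate values and the simulated effect of 1.0. It is not the project's actual matching procedure or data.

```python
from statistics import mean

TRUE_EFFECT = 1.0  # simulated treatment effect (assumption, toy data only)


def outcome(x, treated):
    """Toy outcome: depends on the confounder x and on treatment status."""
    return TRUE_EFFECT * treated + 0.1 * x


def match_nearest(treated, controls, caliper):
    """Greedy 1:1 nearest-neighbour matching on x, without replacement.

    Each unit is a tuple (x, y). A treated unit is paired with the closest
    unused control, provided the distance in x is within the caliper.
    """
    unused = list(controls)
    pairs = []
    for t in treated:
        if not unused:
            break
        best = min(unused, key=lambda c: abs(c[0] - t[0]))
        if abs(best[0] - t[0]) <= caliper:
            pairs.append((t, best))
            unused.remove(best)
    return pairs


# Hypothetical covariate values: controls are, on average, higher in x.
treated_units = [(x, outcome(x, 1)) for x in (30, 35, 40, 45)]
control_units = [(x, outcome(x, 0)) for x in (40, 45, 50, 55, 60, 65, 30, 36)]

# Naive estimate: raw difference in group means, confounded by x.
naive = mean(y for _, y in treated_units) - mean(y for _, y in control_units)

# Matched estimate: mean within-pair difference after matching on x.
pairs = match_nearest(treated_units, control_units, caliper=2)
matched = mean(t[1] - c[1] for t, c in pairs)

print(f"naive estimate:   {naive:.4f}")
print(f"matched estimate: {matched:.4f}")
```

In this toy example the naive difference in means is close to zero because the controls have higher covariate values on average, whereas the matched estimate (0.975) lies close to the simulated effect of 1.0.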
The work performed for this project can be broken down into four major parts: (1) scientific and statistical problem formulation; (2) methods: statistical analysis and software; (3) results; (4) dissemination.

The overall focus of this project was to develop matching techniques useful for understanding and detecting brain disease, and for suggesting suitable interventions, using large observational neuroimaging datasets. We used the UK Biobank project as a testbed to develop and implement the data-analytic aspects. The UK Biobank is a cohort study that has been specifically designed “with the aim of improving the prevention, diagnosis and treatment of a wide range of serious and life-threatening illnesses – including cancer, heart diseases, stroke, diabetes, arthritis, osteoporosis, eye disorders, depression and forms of dementia”.

Our goal was to use the large scale of this dataset to reliably uncover important relationships between key (potentially modifiable) risk factors and outcomes that the previous literature has implicated in future dementia or risk of dementia. There is strong evidence of a significant statistical relationship between Alzheimer's risk and educational attainment, the factor on which we focussed. Furthermore, the use of matched sampling in large imaging studies has hitherto been limited. We now give an overview of the methods and results to date.

Methods
The UK Biobank dataset provided us with more than 20000 subjects with full MRI data. We describe here just the first step in estimating the full mediation model, which was to assess the relationship between level of education and brain structure. To this end we used an image processing method known as voxel-based morphometry to transform each subject's T1-weighted MRI image into a common brain atlas space, in which images could be easily compared across education levels in terms of grey matter density. Computationally, this process took around 10-20 minutes per subject, leading to a total compute time per analysis of about 5-6 days on a 40-CPU Linux cluster.
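
The compute-time figure can be sanity-checked with a short back-of-the-envelope calculation. The sketch below assumes exactly 20000 subjects, one subject per CPU at a time and perfect parallel scaling; the function name wall_clock_days is introduced here for illustration only.

```python
def wall_clock_days(n_subjects, minutes_per_subject, n_cpus):
    """Idealised wall-clock time in days, assuming perfect parallel scaling."""
    cpu_minutes = n_subjects * minutes_per_subject
    return cpu_minutes / n_cpus / (60 * 24)


# Bounds from the reported 10-20 minutes per subject on 40 CPUs.
low = wall_clock_days(20000, 10, 40)
high = wall_clock_days(20000, 20, 40)

print(f"estimated range: {low:.1f} to {high:.1f} days")
```

Under these assumptions the estimate is roughly 3.5 to 6.9 days, a range that contains the reported 5-6 days.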

Results
We fit an adjusted 1x4 ANOVA across the 16631 subjects to test whether there was a significant difference in mean brain structure between the four educational levels. The subject numbers within each ANOVA level were: (1) College or University Degree (9422); (2) A-levels or equivalent (2736); (3) O Levels or equivalent (3866); (4) CSEs or equivalent (607). We first fit a nonparametric adjusted ANOVA model and saw large areas with highly significant differences in grey matter density, primarily in the cerebellum. To simplify the problem somewhat, we collapsed the groups into just two categories, contrasting those who had a degree (9422) with those who did not (7209). As seen in Figure 1, we again saw differences in large areas of the cerebellum, together with additional areas in the fusiform gyrus, temporal pole, parahippocampus, orbito-frontal areas and striatum.
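
The group bookkeeping and the basic form of the test statistic can be sketched as follows. This is a deliberately simplified illustration: it computes a classical, unadjusted one-way F statistic for a single hypothetical voxel using made-up values, whereas the analysis described above was covariate-adjusted, nonparametric and carried out across the whole brain. The names counts, one_way_f and toy_groups are assumptions introduced here.

```python
from statistics import mean

# Subject counts per education level, as reported in the text.
counts = {"degree": 9422, "a_levels": 2736, "o_levels": 3866, "cses": 607}
total = sum(counts.values())            # all subjects in the ANOVA
no_degree = total - counts["degree"]    # pooled group for the two-way contrast


def one_way_f(groups):
    """Classical one-way ANOVA F statistic (unadjusted, parametric)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up grey matter values at one voxel for four groups (toy data only).
toy_groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
    [4.0, 5.0, 6.0],
]
f_stat = one_way_f(toy_groups)

print(total, no_degree, f_stat)
```

The counts reproduce the totals quoted above (16631 subjects overall, 7209 without a degree).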

Dissemination: One manuscript (joint first author) that used a subset of the methods developed in this project was submitted to a clinical journal. This work used MR imaging texture analysis for prediction of BRCA-associated genetic risk. One manuscript is in preparation reporting the effects of education on brain structure under proper conditioning on covariates. Another is in preparation detailing the methods used above, extended to the setting where dementia outcomes are also known. One presentation at a methodological conference is planned for 2020.
There are two major differences between this work and the existing literature. The first is our introduction of neuroimaging into the causal chain of dementia (as a mediating factor), and the second is our manner of mitigating confounding factors that affect educational attainment, such as socioeconomic status, nutrition or other lifestyle factors.
Since dementia is a disease with far-reaching consequences for all those it affects in our society, this work is another step towards building a reliable picture of how social, demographic and lifestyle factors interact with the disease, and in what proportion. In future work, such studies could set the scene for suggesting useful courses of preventative action, for example lifestyle changes likely to mitigate an individual's risk of disease.
Figure 1 - FWE-corrected difference between subjects with a degree vs no degree