Gaurav Bhalerao is a postdoctoral researcher currently working in the Dementias Platform UK (DPUK) project. His research interests include neuroimaging in psychiatry, machine learning, and computational modelling of brain stimulation. In this one-on-one interview, he discusses specific aspects of his neuroimaging research in greater depth.

DPUK postdoctoral researcher Gaurav Bhalerao, who is based at the Department of Psychiatry, University of Oxford.

The availability of open-source platforms has led to increasing amounts of neuroimaging data being shared. Pooling MRI data from different sources is challenging, so researchers rely on harmonization methods. Here we address some pressing questions in this area of research.


Why is sharing neuroimaging data important?

Sharing data accelerates scientific discoveries, enhances open science, and allows for efficient use of public funding and research resources. Many earlier neuroimaging studies were conducted with small samples, which limited how useful their results were.

Sharing neuroimaging data increases sample size and statistical power, and encourages more engagement and collaboration between researchers in neuroscience. Ultimately, this will help us understand the mechanisms of the brain in order to diagnose, treat and prevent brain disorders.

What is data harmonization? And what are the benefits of this method?

Data harmonization improves the quality and usability of data pooled from multiple sources. For neuroimaging data, the method reduces variability caused by the source (such as the site or scanner) while preserving biological variability. It uses what is known about the existing characteristics of the data, and about the actions already taken on the data, to remove unwanted variability between sites or studies. This allows for increased sample size, reduced heterogeneity, better representation of geographic diversity, and increased statistical power, supporting reproducible and generalisable results.
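To make the idea of reducing source variability while preserving biological variability concrete, below is a minimal sketch of a location/scale adjustment in the spirit of ComBat, the method on which neuroHarmonize builds. It is a simplification for illustration, not the procedure used in the study: it handles one feature at a time, keeps a single covariate (age), treats the first site as the reference, and omits ComBat's empirical Bayes step. The function name `harmonize_location_scale` and the simulated data are hypothetical.

```python
import numpy as np

def harmonize_location_scale(y, site, age):
    """Simplified, ComBat-style adjustment of one feature (no empirical Bayes).

    Site mean shifts are removed relative to the first site (acting as a
    reference), each site's residual spread is rescaled to the pooled spread,
    and the modelled age effect (the biology to preserve) is kept.
    """
    y = np.asarray(y, dtype=float)
    site = np.asarray(site)
    age = np.asarray(age, dtype=float)
    sites = np.unique(site)
    # Design: intercept, age, and one-hot dummies for the non-reference sites
    dummies = [(site == s).astype(float) for s in sites[1:]]
    X = np.column_stack([np.ones_like(age), age] + dummies)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    bio = beta[0] + beta[1] * age          # biological signal to preserve
    site_shift = X[:, 2:] @ beta[2:]       # location (mean) effect of each site
    resid = y - bio - site_shift           # variation the model does not explain
    pooled_sd = resid.std()
    out = np.empty_like(y)
    for s in sites:
        m = site == s
        # Scale effect: match each site's residual spread to the pooled spread
        out[m] = bio[m] + resid[m] * (pooled_sd / resid[m].std())
    return out

# Hypothetical data: volume declines with age; site 1 adds +10 and doubles the noise
rng = np.random.default_rng(0)
n = 400
site = np.repeat([0, 1], n // 2)
age = rng.uniform(50, 80, n)
noise = rng.normal(0, 2, n) * np.where(site == 1, 2.0, 1.0)
y = 100 - 0.5 * age + 10.0 * (site == 1) + noise
y_h = harmonize_location_scale(y, site, age)
```

Full ComBat additionally pools the per-site estimates across features with empirical Bayes, which stabilises them when individual sites contribute few subjects.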

Why do you think harmonization is important while dealing with MRI data?

Harmonization is typically performed to reduce heterogeneity across pooled data sets. The aim is to account for differences in MRI acquisition parameters, hardware, quality measures, etc. across scanners and sites, so that their effects can be minimised. In the literature, findings from small neuroimaging samples are often generalised to an entire population without considering intra- and inter-site variation. MRI data harmonization allows neuroimaging analyses with larger sample sizes; the methods preserve biological variability while reducing inter-site variability.

How important are these image acquisition parameters that you’re talking about?

In multi-site MRI studies, it is important to take note of parameters such as image quality, hardware systems, protocol differences, etc. across studies. Ideally, the samples should be large enough for these harmonization parameters to be learned adequately.

This is extremely important in machine learning classification, such as that used in imaging studies, because parameters learned from the training data are then applied during the testing phase. This makes it vital to pay attention to what is being fed into the system.
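The point about training and testing can be illustrated with a fit/apply pattern: harmonization parameters are estimated on the training subjects only and then reused, unchanged, on the test subjects. The sketch below assumes a deliberately minimal harmonizer that only removes per-site mean offsets; the class `SiteMeanHarmonizer` is hypothetical and far simpler than ComBat-style methods.

```python
import numpy as np

class SiteMeanHarmonizer:
    """Hypothetical minimal harmonizer that removes per-site mean offsets.

    Offsets are learned in fit() from training data only and reused,
    unchanged, in transform(), so no test-set information enters them.
    """

    def fit(self, X, site):
        X = np.asarray(X, dtype=float)
        site = np.asarray(site)
        self.grand_mean_ = X.mean(axis=0)
        # Offset of each site's mean feature vector from the training grand mean
        self.offsets_ = {s: X[site == s].mean(axis=0) - self.grand_mean_
                         for s in np.unique(site)}
        return self

    def transform(self, X, site):
        X = np.asarray(X, dtype=float).copy()
        site = np.asarray(site)
        for s in np.unique(site):
            if s not in self.offsets_:
                raise ValueError(f"site {s!r} was not seen during fit")
            X[site == s] -= self.offsets_[s]
        return X

# Simulated features with site-dependent offsets (0, 2 and 4 units)
rng = np.random.default_rng(1)
n, p = 300, 3
site = rng.integers(0, 3, n)
X = rng.normal(0, 1, (n, p)) + site[:, None] * 2.0
idx = rng.permutation(n)
train, test = idx[:200], idx[200:]

h = SiteMeanHarmonizer().fit(X[train], site[train])  # learn on training data only
X_test_h = h.transform(X[test], site[test])          # apply to held-out data
```

Fitting the harmonizer on the full dataset before splitting would let test-set information leak into the parameters; in a cross-validation loop, the fit step belongs inside each training fold.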

Can you briefly explain what you were trying to achieve in your study?

Very little is known about the sample size required to learn the harmonization parameters effectively, even though sample size calculation is of paramount importance in multi-site MRI studies. Samples should be neither too large nor too small, since both can compromise the conclusions drawn from a study. Too small a sample may prevent the findings from being generalised, whereas too large a sample may detect statistically significant differences that are not clinically relevant. In our study, we performed experiments to find the minimum sample size required to achieve multisite harmonization using neuroHarmonize.
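Why sample size matters for learning harmonization parameters can be seen in a toy simulation: the error in estimating even a single site offset shrinks only with the square root of the number of subjects. The sketch below assumes normally distributed features with a known true offset; `offset_estimation_rmse`, `minimum_sample_size` and the tolerance value are hypothetical and do not reproduce the study's experiments, which used neuroHarmonize on structural MRI features. For this setup, the simulated RMSE should be close to sd/sqrt(n).

```python
import numpy as np

def offset_estimation_rmse(n, true_offset=1.0, sd=1.0, n_repeats=2000, seed=0):
    """RMSE of a site's mean-offset estimate from n subjects, by simulation."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated site sample of n subjects
    samples = rng.normal(true_offset, sd, size=(n_repeats, n))
    estimates = samples.mean(axis=1)
    return float(np.sqrt(np.mean((estimates - true_offset) ** 2)))

def minimum_sample_size(tolerance, candidates=(5, 10, 20, 50, 100, 200, 500), **kwargs):
    """Smallest candidate n whose simulated RMSE falls below the tolerance."""
    for n in candidates:
        if offset_estimation_rmse(n, **kwargs) < tolerance:
            return n
    return None  # no candidate was large enough

for n in (5, 20, 100):
    print(n, round(offset_estimation_rmse(n), 3))
print("minimum n for RMSE < 0.15:", minimum_sample_size(0.15))
```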

What were the main findings of your study?

Using multisite structural brain MRI features, we provide a framework for understanding site/scanner effects by demonstrating that the scanner effect can be quantified as the Mahalanobis distance between sites/scanners (relative to a reference distribution). Using this framework, we found that as the Mahalanobis distance increases, the sample size required to correct the scanner effect also increases. The proposed framework will benefit machine learning classification and prediction in multisite MRI studies.
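One way to compute a Mahalanobis distance between a site and a reference distribution is to compare the site's mean feature vector with the reference mean, scaled by the reference covariance: D = sqrt((mu_site - mu_ref)^T Sigma_ref^-1 (mu_site - mu_ref)). This is a sketch under that assumption; the paper's exact definition may differ, and `site_mahalanobis` and the simulated site shifts are hypothetical.

```python
import numpy as np

def site_mahalanobis(site_features, reference_features):
    """Mahalanobis distance of a site's mean feature vector from a reference.

    site_features      : (n_site, p) features from one site/scanner
    reference_features : (n_ref, p) features defining the reference distribution
    """
    ref_mean = reference_features.mean(axis=0)
    ref_cov = np.cov(reference_features, rowvar=False)
    diff = site_features.mean(axis=0) - ref_mean
    # Solve the linear system rather than inverting the covariance explicitly
    return float(np.sqrt(diff @ np.linalg.solve(ref_cov, diff)))

# Simulated reference and sites whose features are shifted by increasing amounts
rng = np.random.default_rng(42)
p = 4
cov = 0.5 * np.eye(p) + 0.5          # unit variances, pairwise correlation 0.5
ref = rng.multivariate_normal(np.zeros(p), cov, size=1000)
distances = {}
for shift in (0.0, 0.5, 1.0):
    site_X = rng.multivariate_normal(np.full(p, shift), cov, size=1000)
    distances[shift] = site_mahalanobis(site_X, ref)
    print(shift, round(distances[shift], 2))
```

With the feature correlation used here, a shift of s in every feature corresponds to a distance of about 1.26 × s, so larger site shifts map to larger distances.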

How does your work add value for future research?

We proposed a framework to understand site effects using the Mahalanobis distance. We also described how various factors in a cross-validation design affect the ability to achieve optimal inter-site harmonization. Using simulated data, the study provides some rules of thumb for the required sample size under different scenarios. Our framework can thus be used for any imaging or non-imaging derived phenotypes wherever batch or site effects are suspected.

Is there anything else that you would like to share?

Further research in this area should account for several factors discussed in this study. These include the multivariate distance between the feature matrices of different sites, the number of features and sites, the effect of covariates (such as age, sex and brain size), and alternative harmonization methods in the literature.

I look forward to further exploring this framework on DPUK data. I think this will be very useful as well as challenging, because the DPUK data come from different sites and clinical populations.

The full paper, ‘Sample size requirement for achieving multisite harmonization using structural brain MRI features’, is available to read in the journal NeuroImage. Gaurav, who currently works in DPUK, is an author of the paper, and the work will be used for future DPUK-related research.