DPUK is set to publish a report titled 'User Requirements of Synthetic Data,' exploring how synthetic data can accelerate dementia research. The report addresses needs that the research community has identified for putting this approach into practice. Lewis Hotchkiss from Swansea University has been discussing his work on AI and neuroimaging within trusted research environments with DPUK's Bota Godwin. His research on synthetic data points to new routes for dementia research and shows how the field is being changed by new technology.
What is synthetic data and why does it matter?
Synthetic data refers to artificially generated information designed to replicate characteristics of real-world datasets without containing actual patient information. It exists on a spectrum of fidelity—from basic structural data that mimics only the format of real data, to high-fidelity versions that closely mirror statistical relationships found in original datasets.
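The two ends of that fidelity spectrum can be illustrated with a small sketch. It is only an illustration under stated assumptions: the "source" data is an invented toy table of ages and test scores (not DPUK data), the function names are hypothetical, and the straight-line fit stands in for the far more capable generators used in practice.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def make_toy_data(n, rng):
    """Invented stand-in for a real dataset: age and a score that falls with age."""
    ages = [rng.uniform(50, 90) for _ in range(n)]
    scores = [30 - 0.2 * (age - 50) + rng.gauss(0, 1.5) for age in ages]
    return ages, scores


def structural_synthetic(ages, scores, rng):
    """Low fidelity: keeps columns and value ranges, samples each column independently."""
    n = len(ages)
    syn_ages = [rng.uniform(min(ages), max(ages)) for _ in range(n)]
    syn_scores = [rng.uniform(min(scores), max(scores)) for _ in range(n)]
    return syn_ages, syn_scores


def higher_fidelity_synthetic(ages, scores, rng):
    """Higher fidelity: fits a line plus noise to the source, then samples from the fit."""
    mean_age, mean_score = statistics.fmean(ages), statistics.fmean(scores)
    slope = (sum((a - mean_age) * (s - mean_score) for a, s in zip(ages, scores))
             / sum((a - mean_age) ** 2 for a in ages))
    intercept = mean_score - slope * mean_age
    residuals = [s - (intercept + slope * a) for a, s in zip(ages, scores)]
    noise_sd = statistics.pstdev(residuals)
    age_sd = statistics.pstdev(ages)
    syn_ages = [rng.gauss(mean_age, age_sd) for _ in range(len(ages))]
    syn_scores = [intercept + slope * a + rng.gauss(0, noise_sd) for a in syn_ages]
    return syn_ages, syn_scores


rng = random.Random(0)
ages, scores = make_toy_data(2000, rng)
print("source correlation:", round(pearson(ages, scores), 2))
print("structural only:", round(pearson(*structural_synthetic(ages, scores, rng)), 2))
print("higher fidelity:", round(pearson(*higher_fidelity_synthetic(ages, scores, rng)), 2))
```

In this toy setup the structural version has the right columns and plausible ranges but loses the link between age and score, while the higher-fidelity version reproduces it closely. Neither version contains a row copied from the source.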
The concept might sound abstract, but its applications are remarkably practical. In dementia research, where patient data is highly sensitive and access is tightly controlled, synthetic data offers a pathway to accelerate discovery while maintaining privacy.
Yet deploying AI models in the real world still presents many challenges, from ensuring that models protect patient privacy to making sure they are robust and generalisable to real-world populations.
How synthetic data transforms dementia research
Dementia research faces unique challenges. Misdiagnosis is common—conditions like frontotemporal or Lewy body dementia are often incorrectly identified as Alzheimer's disease. AI models trained on neuroimaging data can help differentiate between these forms, ensuring patients receive appropriate treatment.
However, developing these models requires access to vast amounts of sensitive health data, which is typically stored in Trusted Research Environments (TREs) like the DPUK Data Portal. While these secure environments are essential for protecting patient privacy, they can create bottlenecks in the research process.
This is where synthetic data becomes transformative. Lewis Hotchkiss and the DPUK team have identified several key applications:
Code development: Researchers often wait months for ethical approvals to access real data. Synthetic versions allow them to develop and test analytical workflows during this waiting period, significantly reducing project delays.
Training and education: Students and early-career researchers struggle to access real data for learning. Synthetic alternatives provide safe practice environments for essential skills like data cleaning and analysis.
Data discovery: Before committing to lengthy access applications, researchers can explore synthetic versions to assess whether datasets will meet their needs.
AI model development: Perhaps most exciting is the potential for training privacy-preserving AI models. These can help with early detection of dementia—crucial when current treatments focus on slowing progression rather than prevention.
Challenges and considerations
DPUK works closely with ADDI to support federated analysis across data environments. In federated analysis, researchers cannot see the actual data and can only run code against it, so synthetic data can help them write and test their queries.
Despite its promise, synthetic data isn't without challenges. A recent researcher workshop report from DPUK highlighted several concerns:
Privacy vs utility: Higher-fidelity synthetic data provides greater analytical value but potentially increases privacy risks. Finding the right balance is crucial.
Trust and transparency: Researchers need clear documentation about how synthetic data was generated to evaluate its reliability.
Data types: Some information is particularly difficult to synthesise effectively, including time-series data, genomics, and highly relational datasets.
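The privacy versus utility trade-off listed above can be made concrete with a toy measurement. The sketch below is an assumption-laden illustration, not the method used by DPUK or described in the workshop report: the generator simply copies invented source values and adds noise, "distance to closest record" serves as a rough privacy proxy, and drift in the standard deviation serves as a rough utility proxy. All function names are hypothetical.

```python
import random
import statistics


def add_noise(records, noise_sd, rng):
    """Toy generator: copy each source value and add Gaussian noise."""
    return [value + rng.gauss(0, noise_sd) for value in records]


def mean_distance_to_closest_record(synthetic, source):
    """Average gap between each synthetic value and its nearest source value.
    Smaller gaps mean synthetic rows sit closer to real individuals."""
    gaps = [min(abs(s - r) for r in source) for s in synthetic]
    return statistics.fmean(gaps)


def spread_error(synthetic, source):
    """Utility proxy: how far the synthetic standard deviation drifts from the source."""
    return abs(statistics.pstdev(synthetic) - statistics.pstdev(source))


rng = random.Random(0)
# Invented source values (for example, ages); not real patient data.
source = [rng.gauss(70, 8) for _ in range(500)]

for noise_sd in (0.1, 8.0):
    synthetic = add_noise(source, noise_sd, rng)
    print(
        f"noise_sd={noise_sd}:",
        "closest-record distance =",
        round(mean_distance_to_closest_record(synthetic, source), 3),
        "| spread error =",
        round(spread_error(synthetic, source), 3),
    )
```

With little noise the synthetic values preserve the source statistics well but sit almost on top of real records. With more noise they move further from any individual, at the cost of distorting the distribution. Real evaluations use richer privacy and utility measures than these two proxies.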
The team is also exploring how synthetic data can address bias in AI models. "People from ethnic minorities can often be disadvantaged by the AI models that we're training because it hasn't really learned how to deal with that kind of data if it hasn't seen it before," Lewis noted in the Brewing Brilliance podcast. Synthetic data offers potential solutions by creating more diverse and representative training sets.
The future of AI in dementia research
The integration of synthetic data into research infrastructure represents just one aspect of how AI is transforming dementia research. From wearable data that can predict conditions like Parkinson's years before symptoms appear, to neuroimaging analysis that improves diagnostic accuracy, AI tools are becoming essential components of the research toolkit.
As DPUK continues to develop these resources, their commitment remains focused on responsible innovation—ensuring that AI models are fair, transparent, and designed with privacy at their core. By building robust frameworks for synthetic data generation and use, they're creating pathways for faster, more inclusive research that ultimately aims to improve outcomes for people living with dementia.