Questions about the Data Portal
We asked researchers from a range of disciplines for their questions about the Data Portal. Do you have a question relating to your proposal in the Data Portal?
How has DPUK cleaned its cohort data? Have the data been changed in any way in order to bring it all together? - Ruby, Postdoctoral researcher
The data have not been changed by DPUK in any way at all. However, the datasets could be in many different spreadsheets and many different formats. Sometimes cohort datasets contain duplicate variables or notes for researchers. When the DPUK team receives cohort data, we organise them by combining multiple spreadsheets if necessary. The variables are then categorised into 22 broad categories and 120 subcategories. Read the blog from one of the researchers involved in this area of DPUK's work.
Who reviews my research proposal? - Catherine, Early Career Researcher
Your proposal is first sense-checked by the DPUK team and then reviewed by each of the cohorts in your application. The DPUK check is designed to help you improve your proposal if necessary, to give you the best chance of getting a positive response from the cohort data owners. DPUK fast-tracks this process by managing communication with cohorts on your behalf.
Can I use DPUK data for my Masters thesis or do I have to be a very experienced researcher? - Sophia, Masters student
Yes. You are encouraged to apply for data for your Masters thesis. As is the case for published research, you must reference DPUK in your thesis. If you are applying for genetic data, you may be asked for a copy of your CV to show that you have some experience with big data, or for evidence that you are working with analysts who can support you in your research. Please note that the application screening process is stricter for genetic data applications.
How exactly is DPUK standardising its cohort data? - Christoph, Postdoctoral researcher
DPUK is standardising cohorts' data according to the DPUK ontology, a naming system based on a scientific rationale developed by senior researchers within DPUK. This involves cleaning the data and then applying the ontology through a STATA script we have programmed. The script renames the variables and their labels across the full cohort dataset, adding the DPUK-assigned unique cohort identifier, ontology category number (for example, 'Physical Health Status'), recoded variable name (for example, 'AGEASTHMA') and the number of arrays within the variable. Arrays are the number of recorded responses within that variable, or levels of that variable. By curating the cohort datasets in this way, DPUK enables better cross-cohort research through normalised variable naming structures.
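To make the renaming step more concrete, here is a minimal sketch in Python. It is not the DPUK STATA script. The underscore-separated layout, the cohort identifier 'C01', the category number 7, the original column name 'asthma_age' and the helper names are all assumptions made for illustration; only the four components (cohort identifier, ontology category number, recoded variable name, number of arrays) and the example name 'AGEASTHMA' come from the answer above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyEntry:
    """One ontology lookup row (hypothetical structure)."""
    category: int  # ontology category number (value assumed for illustration)
    name: str      # recoded variable name, e.g. "AGEASTHMA"
    arrays: int    # number of recorded responses (levels) for the variable


def standardised_name(cohort_id: str, entry: OntologyEntry) -> str:
    """Join the four components with underscores (assumed layout)."""
    return f"{cohort_id}_{entry.category:02d}_{entry.name}_{entry.arrays}"


def rename_columns(columns, cohort_id, ontology):
    """Map each original column name to its standardised name.

    Columns with no ontology entry are left unchanged.
    """
    return {
        col: standardised_name(cohort_id, ontology[col]) if col in ontology else col
        for col in columns
    }


# Hypothetical cohort columns and ontology lookup
ontology = {"asthma_age": OntologyEntry(category=7, name="AGEASTHMA", arrays=3)}
mapping = rename_columns(["asthma_age", "notes"], "C01", ontology)
print(mapping)
```

Because the same lookup can be applied to every cohort, variables that measure the same thing end up with matching names, which is the point of normalising the naming structure for cross-cohort work.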