Synthetic biobank data generation for data sharing with stakeholders
About the client
This leading biobank in the Netherlands has conducted a multigenerational cohort study since 2006, with over 167,000 participants, to collect relevant data and biosamples. The data relates to lifestyle, health, personality, BMI, blood pressure, cognitive abilities, and more. The biobank makes this data available, making it an essential resource for national and international researchers, organizations, policymakers, and other stakeholders who typically focus on preventing, predicting, diagnosing, and treating diseases.
As this biobank is on a mission to make its data more accessible to researchers, organizations, policymakers, and other stakeholders, it is essential to have strategic solutions in place that safeguard the privacy of its participants. Hence, this biobank partners with Syntho to synthesize the data, enhancing its accessibility while preserving the privacy of participants. As an alternative to using real data, everyone now has the option to work with synthetic data. Anyone interested in the data is encouraged to reach out for further information and support.
Before adopting a new solution, this biobank wanted to evaluate synthetic data and Syntho in practice through an initial evaluation study. In this study, the biobank assessed and approved synthetic data from Syntho on accuracy, privacy, and ease of use, in comparison with open-source and commercial solutions. For this dataset, geographical location and longitudinal data are crucial. As a sneak preview, the evaluation compared the distribution of participants' postal codes in the real data with the distribution in the synthetic data, together with a graph that overlays the two. As the graphs overlap closely, this biobank concluded that fidelity and accuracy are preserved. This is only one element of the evaluation; other results are available on request.
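To make the idea of "overlapping distributions" concrete, the comparison of a categorical column such as postal code can be summarized in a single number. The sketch below is only an illustration and not the method used in the evaluation study: it assumes total variation distance as the overlap measure, the function names are hypothetical, and the postal codes are made up.

```python
from collections import Counter


def category_shares(values):
    """Return the relative frequency of each category in `values`."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(real_values, synthetic_values):
    """Half the sum of absolute share differences.

    0.0 means the two distributions are identical,
    1.0 means they share no categories at all.
    """
    real = category_shares(real_values)
    synthetic = category_shares(synthetic_values)
    categories = set(real) | set(synthetic)
    return 0.5 * sum(
        abs(real.get(c, 0.0) - synthetic.get(c, 0.0)) for c in categories
    )


# Made-up postal codes, for illustration only (not biobank data).
real_postal_codes = ["9711", "9711", "9712", "9713", "9713", "9713", "8911", "8911"]
synthetic_postal_codes = ["9711", "9712", "9712", "9713", "9713", "9713", "8911", "8911"]

distance = total_variation_distance(real_postal_codes, synthetic_postal_codes)
print(f"Total variation distance: {distance:.3f}")
```

A value close to 0 corresponds to graphs that overlap closely. What counts as "close enough" is a judgment call that depends on the dataset and the intended analysis.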
Researchers, organizations, policymakers, and other stakeholders now have the opportunity to receive synthetic datasets
This successful evaluation of synthetic data generated by Syntho marks a significant step forward for this biobank in leveraging new solutions to make its data more accessible while preserving the privacy of participants. The biobank now uses synthetic data to create artificial datasets that mirror the statistical properties of the real data without compromising participant privacy. Consequently, researchers, organizations, policymakers, and other stakeholders with an interest in this data now have the opportunity to receive customized synthetic datasets, generated in collaboration with Syntho. By embracing synthetic data, this biobank boosts access to data and accelerates research while maintaining the highest level of privacy protection for its participants. This underlines its commitment to both scientific advancement and privacy preservation.
Faster access to data
Synthetic data allows for faster access to data by minimizing compliance paperwork and procedures. This enables data users to start their analysis sooner, test hypotheses faster, and obtain results earlier, without delays caused by compliance procedures.
Preserve the privacy of participants
With synthetic data, participant information remains secure and sensitive details are effectively safeguarded. Privacy-enhancing techniques such as synthetic data increase participants' confidence that their data is protected, encouraging their active participation in research projects. This fosters trust in this biobank as a reliable resource, further increasing participant engagement.
Increased accessibility of data
Synthetic data opens new possibilities for sharing information with organizations for which access to real data is not preferred, or that have access to only minimal data. This approach increases data accessibility while mitigating the risks associated with sharing actual data.
Preview data before buying with a data catalogue
With data commercialization, potential buyers often prefer to preview the data in something like a sandbox environment before making a purchase. However, using real data for previews is problematic because of compliance paperwork requirements and the risk of devaluing the data if it is exchanged beforehand. A synthetic data catalogue can overcome these challenges: it allows prospective buyers to preview data conveniently, thereby enhancing the commercialization process.
Organization: Leading Biobank
Location: The Netherlands
Size: 100+ employees
Use case: Analytics
Target data: Healthcare historical data
Website: On Request