From Privacy to Possibility: Using Synthetic Data from the Syntho Engine Integrated in SAS Viya to Unlock Privacy-Sensitive Data During the SAS Hackathon
We unlock the full potential of healthcare data with generative AI during the SAS Hackathon.
Why unlock privacy-sensitive healthcare data?
Healthcare badly needs data-driven insights: it is understaffed and under heavy pressure, yet it has the potential to save lives. However, healthcare data is among the most privacy-sensitive data there is and is therefore locked away. This privacy-sensitive data:
- Is time-consuming to access
- Requires extensive paperwork
- Cannot simply be used
This is problematic, as our goal for this hackathon is to predict deterioration and mortality as part of cancer research for a leading hospital. That is why Syntho and SAS collaborate for this hospital: Syntho unlocks the data with synthetic data, and SAS realizes data insights with SAS Viya, the leading analytics platform.
Our Syntho Engine generates completely new, artificial data. The key difference is that we apply AI to mimic the characteristics of real-world data in the synthetic data, to such an extent that it can even be used for analytics. That’s why we call it a synthetic data twin. It is as good as real and statistically identical to the original data, but without the privacy risks.
Syntho Engine integrated in SAS Viya
During this hackathon, we integrated the Syntho Engine API into SAS Viya as a step. Before starting the cancer research, we tested this integrated approach on an open dataset and used various validation methods in SAS Viya to check whether the synthetic data is indeed as good as real.
Is synthetic data as good as real?
The correlations, the relations between variables, are preserved.
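As an illustration of what such a correlation check can look like, here is a minimal Python sketch. It is not the SAS Viya workflow used in the hackathon and does not call the Syntho Engine: the helper names (`pearson`, `correlation_matrix`, `max_correlation_gap`, `make_table`) and the toy columns `age` and `marker` are assumptions made up for this example, and the "synthetic" table is simply a second sample drawn from the same toy process.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Map every pair of column names to its Pearson correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


def max_correlation_gap(real, synthetic):
    """Largest absolute difference between the two correlation matrices."""
    corr_real = correlation_matrix(real)
    corr_synthetic = correlation_matrix(synthetic)
    return max(abs(corr_real[key] - corr_synthetic[key]) for key in corr_real)


def make_table(seed, n=2000):
    """Hypothetical toy table: 'marker' depends linearly on 'age' plus noise."""
    rng = random.Random(seed)
    age = [rng.gauss(60, 10) for _ in range(n)]
    marker = [0.5 * a + rng.gauss(0, 5) for a in age]
    return {"age": age, "marker": marker}


real = make_table(seed=1)
synthetic = make_table(seed=2)  # stand-in for a generator's output (assumption)

gap = max_correlation_gap(real, synthetic)
print(f"largest correlation difference: {gap:.3f}")
```

A small gap indicates that the pairwise relations between variables look alike in both tables; what counts as small enough depends on the use case.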
The area under the curve (AUC), a measure of model performance, is preserved.
And even the variable importance, the predictive power of variables for a model, holds when we compare the original data with the synthetic data.
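The two model-based checks can be sketched in Python as follows: train one model on original data and one on synthetic data, score both on held-out original data, and compare the AUC and the learned weights. This is a minimal sketch under stated assumptions, not the hackathon's SAS Viya pipeline. The data is simulated, the "synthetic" set is just a second sample from the same toy process, all function names are hypothetical, and the weights of a logistic regression on standardized features serve only as a crude stand-in for variable importance.

```python
import math
import random


def auc(labels, scores):
    """Chance that a random positive case outranks a random negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so exp() stays well behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, epochs=100, lr=0.5):
    """Logistic regression fitted with plain batch gradient descent."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, rows):
    w, b = model
    return [sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) for x in rows]


def make_patients(seed, n):
    """Toy data: two standardized features with a known effect on the outcome."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    labels = [int(rng.random() < sigmoid(1.5 * x[0] - 1.0 * x[1])) for x in rows]
    return rows, labels


real_train = make_patients(seed=1, n=1000)
synthetic_train = make_patients(seed=2, n=1000)  # stand-in for generator output (assumption)
real_test_rows, real_test_labels = make_patients(seed=3, n=500)

model_real = fit_logistic(*real_train)
model_synthetic = fit_logistic(*synthetic_train)

# Both models are scored on the same held-out original data.
auc_real = auc(real_test_labels, predict(model_real, real_test_rows))
auc_synthetic = auc(real_test_labels, predict(model_synthetic, real_test_rows))
print(f"AUC, trained on real data:      {auc_real:.3f}")
print(f"AUC, trained on synthetic data: {auc_synthetic:.3f}")
print("weights (real):     ", [round(v, 2) for v in model_real[0]])
print("weights (synthetic):", [round(v, 2) for v in model_synthetic[0]])
```

Scoring both models on the same held-out original data keeps the comparison fair: any difference in AUC then comes from the training data, not from the evaluation set.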
We can therefore conclude that synthetic data generated by the Syntho Engine in SAS Viya is indeed as good as real and can be used for model development. That allowed us to start the cancer research to predict deterioration and mortality.
Synthetic data for Cancer Research for a leading hospital
Here, we used the integrated Syntho Engine as a step in SAS Viya to unlock this privacy-sensitive data with synthetic data.
The result: an AUC of 0.74 and a model that is able to predict deterioration and mortality.
As a result of using synthetic data, we were able to unlock this healthcare data with less risk, more data and faster data access.
Combine data from multiple hospitals
This is possible not only within a single hospital; data from multiple hospitals can also be combined. Hence, the next step was to synthesize data from multiple hospitals. Relevant data from different hospitals was synthesized via the Syntho Engine as input for the model in SAS Viya. Here, we realized an AUC of 0.78, compared with 0.74 for the single hospital, demonstrating that more data results in better predictive power for these models.
And these are the results from this hackathon:
- Syntho is integrated in SAS Viya as a step
- Synthetic data is successfully generated via Syntho in SAS Viya
- Synthetic data accuracy is validated, as models trained on synthetic data score similarly to models trained on original data
- We predicted deterioration and mortality on synthetic data as part of cancer research
- We demonstrated an increase in AUC when combining synthetic data from multiple hospitals
Next steps are to:
- Include more hospitals
- Extend the use cases
- Extend to any other organization, as the techniques are sector agnostic
This is how Syntho and SAS unlock data and realize data-driven insights in healthcare, to help make sure healthcare is well staffed and can save lives under normal pressure.