When synthesizing a dataset, it is essential that the synthetic data holds no sensitive information that could be used to re-identify individuals. This way, we can guarantee that there is no PII (personally identifiable information) in the synthetic data. In the video below, Marijn introduces the privacy measures in our quality report that demonstrate this.
This clip is taken from the Syntho x SAS D[N]A Café about AI Generated Synthetic Data. Find the full video here.
What are the privacy protection measures we take when generating synthetic data?
Mainly, these are metrics that guard against overfitting by looking at distance measures: they check how close the synthetic data is to the original data. If synthetic records get too close to the original records, there might be a privacy risk, so these metrics make sure the synthetic data does not get too close. The Syntho Engine also uses a holdout set, meaning original records that were not used for training, so that this comparison is made in a fair way: the holdout records show how close genuinely unseen records naturally are.
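To make the idea of distance measures combined with a holdout set more concrete, here is a minimal Python sketch of one common check of this kind, a distance-to-closest-record comparison. It is not the Syntho Engine's implementation. The function names (`nearest_distance`, `min_distance_to_training`, `share_closer_to_training`) are hypothetical, and the sketch assumes numeric features already scaled to comparable ranges, Euclidean distance, and training and holdout sets of equal size drawn from the same original data.

```python
import math
from typing import Sequence

# Assumption: a record is a row of numeric, already-scaled feature values.
Record = Sequence[float]


def nearest_distance(record: Record, reference: Sequence[Record]) -> float:
    """Euclidean distance from `record` to its closest row in `reference`."""
    return min(math.dist(record, other) for other in reference)


def min_distance_to_training(synthetic: Sequence[Record], training: Sequence[Record]) -> float:
    """Smallest distance from any synthetic row to the training data; 0.0 means an exact copy."""
    return min(nearest_distance(row, training) for row in synthetic)


def share_closer_to_training(
    synthetic: Sequence[Record],
    training: Sequence[Record],
    holdout: Sequence[Record],
) -> float:
    """Fraction of synthetic rows strictly closer to the training set than to the holdout set.

    Holdout rows come from the same original data but were never used to train
    the generator, so they serve as a baseline for how close genuinely new
    records naturally are. With equally sized training and holdout sets, a value
    near 0.5 is expected; a value near 1.0 suggests overfitting to training rows.
    """
    closer = sum(
        1
        for row in synthetic
        if nearest_distance(row, training) < nearest_distance(row, holdout)
    )
    return closer / len(synthetic)


if __name__ == "__main__":
    # Hypothetical toy data for illustration only.
    training = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0)]
    holdout = [(1.5, 2.5), (3.5, 4.5), (5.5, 6.5), (7.5, 8.5)]

    copied = list(training)  # worst case: the generator reproduces training rows
    fresh = [(1.2, 2.2), (3.3, 4.3), (5.2, 6.2), (7.3, 8.3)]

    print(share_closer_to_training(copied, training, holdout))  # 1.0
    print(min_distance_to_training(copied, training))  # 0.0
    print(share_closer_to_training(fresh, training, holdout))  # 0.5
```

Comparing against the holdout set, rather than against a fixed distance threshold, is what keeps the check fair: a synthetic record is only suspicious if it sits closer to the data the generator learned from than comparable unseen records do. The brute-force nearest-neighbour search here is quadratic and only meant for illustration.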
Contact Syntho, and one of our experts will get in touch with you at the speed of light to explore the value of synthetic data!