Questions about synthetic data?
Understandable! Luckily, we have the answers and we’re here to help.
Please open up a question below and click the links to find more information. Have a more complicated question that is not stated here? Ask our experts directly!
Most asked questions
Whereas original data is collected in all your interactions with persons (clients, patients, etc.) and via all your internal processes, synthetic data is generated by a computer algorithm. This computer algorithm generates completely new and artificial datapoints. Read more.
Since the data quality of synthetic data in comparison to original data is key, we recently hosted a webinar with SAS (market leader in analytics) to demonstrate this. Edwin van Unen, analytics expert from SAS, evaluated generated synthetic datasets from Syntho via various analytics (AI) assessments and shared the outcomes. Watch a short recap of that video here.
If you anonymize your data before performing data testing or data analytics, there are several factors at play:
- In almost all cases, anonymized data can still be traced back to individuals due to specific and unique rows (e.g. medical records)
- The more you anonymize or generalize, the more data you destroy. This lowers the quality of your data and thus your insights
- Anonymization works differently for different data formats. This means it is not scalable and can be very time-consuming
Synthetic data solves all of these shortcomings and more. The difference is so striking that we made a video about it. Watch it here.
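To make the first point above concrete, here is a minimal sketch of how unique rows can survive anonymization. It is an illustration only, not part of the Syntho Engine: the records, the column names (`age_band`, `zip3`, `diagnosis`) and the helper functions are all hypothetical, and the check shown is a simple group-size count in the spirit of k-anonymity.

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)


def smallest_group_size(rows, quasi_identifiers):
    """Return the size of the smallest group (the 'k' in k-anonymity)."""
    return min(group_sizes(rows, quasi_identifiers).values())


def unique_rows(rows, quasi_identifiers):
    """Return rows whose quasi-identifier combination occurs exactly once."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [
        row for row in rows
        if sizes[tuple(row[col] for col in quasi_identifiers)] == 1
    ]


# Hypothetical "anonymized" records: names removed, values generalized.
records = [
    {"age_band": "30-39", "zip3": "101", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "101", "diagnosis": "diabetes"},
    {"age_band": "80-89", "zip3": "945", "diagnosis": "rare condition"},
]

at_risk = unique_rows(records, ["age_band", "zip3"])
print(smallest_group_size(records, ["age_band", "zip3"]), at_risk)
```

A smallest group size of 1 means at least one person is the only one with that combination of attributes, so anyone who knows those attributes could single out their row even though the name is gone.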
Frequently asked questions
Most of our clients use synthetic data for:
- Software testing & development
- Model development & advanced analytics (AI)
- Product demos
With a Synthetic Data Twin, Syntho aims for superior synthetic data quality in comparison to the original data. We do this with our synthetic data software, which uses state-of-the-art AI models. Those AI models generate completely new datapoints and model them in such a way that the characteristics, relationships and statistical patterns of the original data are preserved to such an extent that you can use the synthetic data as if it were original data. Read more.
Yes we do. We offer various value-adding synthetic data optimization and augmentation features to take your data, whether ‘dirty’ or ‘clean’, to the next level. Read more.
Syntho offers a quality report for every generated synthetic dataset. Our quality report contains various basic statistics, including aggregates, distributions and correlations, enriched with more advanced measures, such as multivariate distributions. Read more.
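As a rough illustration of the kind of statistics mentioned above, the sketch below compares a mean, a standard deviation and a correlation between an original and a synthetic table. It is not the Syntho quality report or its API: the `compare` and `pearson` functions, the column names and the tiny example tables are assumptions made up for this example.

```python
import math
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def compare(original, synthetic, col_a, col_b):
    """Collect the same basic statistics for both datasets, side by side."""
    report = {}
    for name, data in (("original", original), ("synthetic", synthetic)):
        a = [row[col_a] for row in data]
        b = [row[col_b] for row in data]
        report[name] = {
            "mean_" + col_a: mean(a),
            "stdev_" + col_a: stdev(a),
            "corr": pearson(a, b),
        }
    return report


# Hypothetical example tables (made-up values, for illustration only).
original = [
    {"age": 25, "income": 30000},
    {"age": 35, "income": 42000},
    {"age": 45, "income": 50000},
    {"age": 55, "income": 61000},
]
synthetic = [
    {"age": 27, "income": 31000},
    {"age": 33, "income": 40500},
    {"age": 47, "income": 52000},
    {"age": 54, "income": 59500},
]

report = compare(original, synthetic, "age", "income")
for name, stats in report.items():
    print(name, stats)
```

If the synthetic data preserves the original patterns, the two rows of statistics should be close to each other, even though no synthetic record is a copy of an original one.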
Yes it is. The synthetic data even preserves patterns that you did not know were present in the original data.
But don’t just take our word for it. The analytics experts of SAS (global market leader in analytics) did an (AI) assessment of our synthetic data and compared it with the original data. Curious? Watch the whole event here or watch the short version about data quality here.
Yes we do. We can link columns in one dataset to columns in another dataset, thus preserving referential integrity and business logic across larger databases. This is especially helpful when using synthetic data for software testing purposes.
Curious to find out more about this? Ask our experts directly.
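For readers who want to see what referential integrity means in practice, here is a minimal sketch of a check you could run on any pair of linked tables. The table and column names (`customers`, `orders`, `customer_id`) and the `orphaned_foreign_keys` helper are hypothetical and do not describe how the Syntho Engine links columns internally.

```python
def orphaned_foreign_keys(child_rows, fk_column, parent_rows, pk_column):
    """Return child rows whose foreign key has no matching parent key."""
    parent_keys = {row[pk_column] for row in parent_rows}
    return [row for row in child_rows if row[fk_column] not in parent_keys]


# Hypothetical synthetic tables: every order should point to a customer.
customers = [{"customer_id": "C-001"}, {"customer_id": "C-002"}]
orders = [
    {"order_id": 1, "customer_id": "C-001"},
    {"order_id": 2, "customer_id": "C-002"},
    {"order_id": 3, "customer_id": "C-001"},
]

# The same orders plus one row that references a customer that does not exist.
broken_orders = orders + [{"order_id": 4, "customer_id": "C-999"}]

print(orphaned_foreign_keys(orders, "customer_id", customers, "customer_id"))
print(orphaned_foreign_keys(broken_orders, "customer_id", customers, "customer_id"))
```

An empty result means every row in the child table still links to an existing row in the parent table, which is what software tests that join tables typically rely on.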
No we don’t. We can easily deploy the Syntho Engine on-premise or in your private cloud in a Docker container. The only thing you have to do is connect your data to the engine, and the synthetic data will be generated behind your firewalls. We cannot access anything. Read more about the deploy & connect process.
Yes we can. When synthesizing a dataset, it is essential that the synthetic data holds no sensitive information that can be used to re-identify individuals. In this video, Marijn introduces privacy measures that are in our quality report to demonstrate this.
The Syntho Engine is shipped in a Docker container and can be easily deployed and plugged into your environment of choice.
Possible deployment options include:
- Any (private) cloud
- Syntho hosting in IBM HyperProtect Cloud
Syntho enables you to easily connect with your databases, applications, data pipelines or file systems. Easily read and share the generated synthetic data from/to your desired location, i.e. on-premise or (private) cloud.
Connection features that we support:
- Plug-and-play with Docker
- 20+ database connectors
- 20+ filesystem connectors
Naturally, the generation time depends on the size of the database. However, it is safe to say that synthesizing generally takes less than 1 hour.
Very scalable. Synthesizing is fast and works the same for every type of dataset. Basically, there is no limit to the number of datasets/databases you can synthesize. Find out more.
Not at all. Although it may take some effort to fully understand the advantages, workings and use cases of synthetic data, the process of synthesizing is very simple and anyone with basic computer knowledge can do it. For more information about the synthesizing process, check out this page or request a demo.
The Syntho Engine works best on structured, tabular data. Within these structures, we support the following data types:
- Structured data formatted in tables (categorical, numerical, etc.)
- Direct identifiers and PII
- Large datasets and databases
- Geographic location data (like GPS)
- Time series data
- Multi-table databases (with referential integrity)
- Open text data
We are experts in synthetic data. But, don’t worry, our team is real!
Contact Syntho and one of our experts will get in touch with you at the speed of light to explore the value of synthetic data!