What is synthetic data?

Syntho enables organizations to boost data-driven innovation in a privacy-preserving manner by providing AI software for generating synthetic data. But what actually is synthetic data? What types of synthetic data exist, and how is AI generated synthetic data by Syntho different?

The definition of what synthetic data is

What is synthetic data?

The answer is relatively simple. Whereas original data is collected at the (original) source, in all your interactions with clients and via all your internal processes, synthetic data is generated by a computer algorithm that creates completely new and artificial datapoints. Although Syntho focuses on structured data (data formatted in tables with rows and columns, as you see in Excel sheets), we like to illustrate the concept of synthetic data with images, because they are more appealing.

Original data

This is a photo of Wim Kees Janssen, one of the co-founders of Syntho, taken with a camera.

This is a photo of Wim Kees Janssen

Synthetic data

This is a photo, generated by a computer algorithm, of a person who does not exist in the real world.

This image illustrates what synthetic data is.

Explore synthetic data

What types of synthetic data exist?

Various types of synthetic data exist within the umbrella domain of synthetic data. The 3 key types are: dummy data, rule based generated synthetic data and artificial intelligence (AI) generated synthetic data. We briefly explain each of the 3 types below.

Dummy data

Dummy data is randomly generated data, produced by a random noise generator. The characteristics, relationships and statistical patterns of the original data are not captured, preserved or reproduced in the generated dummy data. Hence, dummy data is not representative of the original data in any way.
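To make this concrete, here is a minimal Python sketch. The "original" table is made up for illustration (an assumed age column and an income column that rises with age), and the dummy table is filled with independent random noise. The sketch compares the gap in average income between older and younger people in both tables.

```python
import random

def mean_income(rows, lo, hi):
    """Average income of rows whose age falls in [lo, hi)."""
    incomes = [income for age, income in rows if lo <= age < hi]
    return sum(incomes) / len(incomes)

rng = random.Random(0)

# Hypothetical "original" data: income rises with age, plus noise.
original = []
for _ in range(2000):
    age = rng.randint(18, 80)
    original.append((age, 20000 + 600 * age + rng.gauss(0, 5000)))

# Dummy data: each column is drawn independently at random.
dummy = [(rng.randint(18, 80), rng.uniform(20000, 70000)) for _ in range(2000)]

# Gap in average income between ages 60-80 and ages 18-39.
original_gap = mean_income(original, 60, 81) - mean_income(original, 18, 40)
dummy_gap = mean_income(dummy, 60, 81) - mean_income(dummy, 18, 40)

print(f"original gap: {original_gap:.0f}, dummy gap: {dummy_gap:.0f}")
```

In the made-up original table the gap is large, because income depends on age. In the dummy table it is close to zero, because the relationship was never captured.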

Rule based generated synthetic data

Rule based generated synthetic data is synthetic data generated according to a pre-defined set of rules. Examples of such rules are a certain minimum value, maximum value or average value. Any characteristic, relationship or statistical pattern that you would like to see reproduced in the rule based generated synthetic data needs to be pre-defined.

However, this causes challenges when high data quality is essential. First, one can define only a limited set of rules to be captured in the synthetic data. Second, setting up multiple rules typically results in overlapping and conflicting rules. Third, you will never fully cover all relevant rules, and there will always be relevant rules that you are not even aware of. Finally, defining the rules takes a lot of time and energy, which makes the solution non-scalable.

In summary, with rule based generated synthetic data, you will end up in a non-scalable situation with synthetic data quality that is only as good as the quality of the pre-defined rules.
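As a rough sketch of the idea, the Python snippet below generates one column from three pre-defined rules (minimum, maximum and average). The rule values, the `RULES` dictionary and the assumed spread are hypothetical choices made for this example only.

```python
import random

# Hypothetical pre-defined rules for one column (illustrative values).
RULES = {"age": {"min": 18, "max": 80, "mean": 45.0}}

def generate_column(rule, n, rng):
    """Draw n values around the target mean and clip them to [min, max]."""
    # The spread is an assumption of this sketch; it is not one of the rules.
    spread = (rule["max"] - rule["min"]) / 6
    values = []
    for _ in range(n):
        value = rng.gauss(rule["mean"], spread)
        values.append(min(max(value, rule["min"]), rule["max"]))
    return values

rng = random.Random(1)
ages = generate_column(RULES["age"], 5000, rng)
mean_age = sum(ages) / len(ages)

print(f"min={min(ages):.1f} max={max(ages):.1f} mean={mean_age:.1f}")
```

Only what is written down in `RULES` is reproduced. Anything else, such as how age relates to another column, would need a further rule, which is where the scalability problem comes from.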

Syntho focus: artificial intelligence (AI) generated synthetic data

AI generated synthetic data is synthetic data generated by an Artificial Intelligence (AI) algorithm. The AI model is trained on the original data to learn its characteristics, relationships and statistical patterns. Thereafter, the AI algorithm is able to generate synthetic datapoints and model them in such a way that they mimic the characteristics, relationships and statistical patterns of the original dataset. Instead of you studying and defining relevant rules (as with rule based generated synthetic data), the AI algorithm does this automatically for you. This covers not only the characteristics, relationships and statistical patterns that you are aware of, but also those that you are not aware of.
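The "learn from the original data, then generate new data" workflow can be sketched in Python as follows. This is a deliberately simple statistical stand-in (a bivariate Gaussian fitted to two made-up columns), not an AI model and not how the Syntho Engine works; the function names `fit` and `sample` and the data are assumptions for illustration.

```python
import random

def fit(rows):
    """Learn means, standard deviations and correlation from (x, y) rows."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sx = (sum((x - mx) ** 2 for x, _ in rows) / n) ** 0.5
    sy = (sum((y - my) ** 2 for _, y in rows) / n) ** 0.5
    rho = sum((x - mx) * (y - my) for x, y in rows) / (n * sx * sy)
    return mx, my, sx, sy, rho

def sample(model, n, rng):
    """Draw n entirely new (x, y) rows from the fitted bivariate Gaussian."""
    mx, my, sx, sy, rho = model
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
        rows.append((x, y))
    return rows

rng = random.Random(2)

# Hypothetical original data: income depends on age.
original = []
for _ in range(3000):
    age = rng.uniform(18, 80)
    original.append((age, 20000 + 600 * age + rng.gauss(0, 5000)))

model = fit(original)                    # step 1: learn from the original data
synthetic = sample(model, 3000, rng)     # step 2: generate new datapoints
synthetic_model = fit(synthetic)         # re-measure the same statistics

print(f"original correlation:  {model[4]:.3f}")
print(f"synthetic correlation: {synthetic_model[4]:.3f}")
```

No relationship was written down by hand here: the link between the two columns is picked up from the data during fitting and carried over into the newly generated rows. A real AI model would capture far richer patterns than this two-column Gaussian can.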

Our synthetic data engine uses state-of-the-art AI models to generate completely new synthetic data. Instead of using sensitive original data, customers use our AI software to create top-notch synthetic data. We generate entirely new data, but we are able to model those new datapoints to preserve the characteristics, relationships and statistical patterns of the original data to such an extent that the synthetic data can be used as if it were original data. This opens up a wide range of use cases (e.g. in data analytics or testing and development) where synthetic data is preferred over the (sensitive) original data. The Syntho software gives organizations a strong and widely applicable platform to realize innovations with more data, faster data access and zero data privacy risks.

Summary of what synthetic data is and how AI generated synthetic data is different

Data quality is the key differentiator: artificial intelligence (AI) generated synthetic data offers superior data quality.

This image shows the various synthetic data types to illustrate what synthetic data is

Data quality

AI generated synthetic data by Syntho offers superior data quality

Some insights into AI generated synthetic data by Syntho

Syntho develops software for AI generated synthetic data. Instead of using sensitive original data, customers use our AI software to create top-notch synthetic data. We generate entirely new data, but we are able to model those new datapoints to preserve the characteristics, relationships and statistical patterns of the original data to such an extent that the synthetic data can be used as if it were original data. To demonstrate this, Syntho offers a quality report for every generated synthetic dataset. Our quality report contains various basic statistics, including aggregates, distributions and correlations, enriched with more advanced measures, such as multivariate distributions.

Distributions

Distributions give insight into how frequently data records occur for a given category or value, and are captured by the Syntho Engine.

The synthetic data quality report by Syntho includes univariate distributions. In this example, the birth year variable is shown.
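A univariate comparison of this kind could be sketched in Python as below. The birth year values are made up, and the total variation distance is one possible summary measure chosen for this example; it is not necessarily the measure used in the Syntho quality report.

```python
from collections import Counter

def relative_frequencies(values):
    """Map each distinct value to its share of all records."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

def total_variation_distance(p, q):
    """Half the summed absolute difference between two frequency tables."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical birth year columns (made-up values).
original_years = [1980, 1980, 1985, 1990, 1990, 1990, 1995, 2000]
synthetic_years = [1980, 1985, 1985, 1990, 1990, 1990, 1995, 2000]

distance = total_variation_distance(
    relative_frequencies(original_years),
    relative_frequencies(synthetic_years),
)
print(distance)  # 0.125
```

A distance of 0 means the two columns have identical frequencies per value, and 1 means they share no values at all.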

Correlations

Correlations provide insight into the degree to which two variables are related, and are captured by the Syntho Engine.

The synthetic data quality report by Syntho includes correlations. This illustration shows the correlation matrix for the generated synthetic data.
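For readers who want to see how a correlation matrix is built, here is a small Python sketch using the Pearson correlation coefficient. The table and its column names are hypothetical, and Pearson is an assumed choice of measure for this example.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def correlation_matrix(columns):
    """Return {(a, b): r} for every pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical table with three numeric columns.
table = {
    "age": [23, 35, 47, 52, 61, 70],
    "income": [28000, 41000, 50000, 52000, 58000, 61000],
    "children": [0, 2, 1, 3, 2, 1],
}

matrix = correlation_matrix(table)
for (a, b), r in matrix.items():
    print(f"{a:>8} vs {b:<8} {r:+.2f}")
```

Computing the same matrix for the original and for the synthetic dataset, and comparing the two cell by cell, shows how well pairwise relationships were preserved.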

Multivariates

Multivariate distributions and correlations provide insight into combinations of categories and are also captured by the Syntho Engine.

The synthetic data quality report by Syntho includes multivariate distributions and correlations. This illustration shows the multivariate distribution matrix for the generated synthetic data.
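Why look at combinations at all? The Python sketch below uses two tiny made-up datasets with hypothetical gender and region columns. Each column on its own has exactly the same distribution in both datasets, yet the combinations differ.

```python
from collections import Counter

def joint_frequencies(rows):
    """Share of records for each combination of category values."""
    counts = Counter(rows)
    return {combo: count / len(rows) for combo, count in counts.items()}

# Hypothetical (gender, region) records. In both datasets half the records
# are "F" and half are "north", so the univariate distributions match.
original = [("F", "north"), ("F", "north"), ("M", "south"), ("M", "south")]
synthetic = [("F", "north"), ("F", "south"), ("M", "north"), ("M", "south")]

orig_freq = joint_frequencies(original)
synth_freq = joint_frequencies(synthetic)

# Combinations whose share differs between the two datasets.
differing = sorted(
    combo
    for combo in set(orig_freq) | set(synth_freq)
    if abs(orig_freq.get(combo, 0.0) - synth_freq.get(combo, 0.0)) > 1e-9
)
print(differing)
```

All four combinations show up as differing here, which a check on single columns alone would have missed.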

Award winning synthetic data solution

Philips Innovation Award Winner 2020

Photo of Syntho with the Philips Innovation Award after pitching the synthetic data proposition
Boost the realization of data-driven innovation now!