What is synthetic data?

Syntho is an expert in synthetic data generation and implementation, and enables organizations to boost data-driven innovation in a privacy-preserving manner.

But what is synthetic data, actually? What types of synthetic data exist, and how is AI-generated synthetic data by Syntho different?

Introduction

What is synthetic data?

The answer is relatively simple. Whereas original data is collected in all your interactions with people (e.g. clients, patients, employees) and through all your internal processes, synthetic data is generated by a computer algorithm. This algorithm generates completely new and artificial datapoints.

Syntho focuses on structured data (data formatted in tables containing rows and columns, as you would see in an Excel sheet), but we like to illustrate the concept of synthetic data with images, because they are more appealing.

To do so, the following section shows two images. The left image is an original photo of Wim Kees Janssen (CEO), taken with a camera. The right image, however, was generated by a computer algorithm and shows a person who does not exist in the real world. This is what we call a synthetic image.

Original data

This is a photo of Wim Kees Janssen, one of the co-founders of Syntho, taken with a camera.

Synthetic data

This is a photo generated by a computer algorithm of a person that does not exist in the real world.

Types

What types of synthetic data exist?

Three main types exist under the synthetic data umbrella: dummy data, rule-based generated synthetic data and synthetic data generated by artificial intelligence (AI). We briefly explain each of the three types below.

Dummy data

Dummy data is randomly generated data (e.g. by a random noise generator). Consequently, the characteristics, relationships and statistical patterns in the original data are not captured, preserved or reproduced in the generated dummy data. Hence, dummy data is not representative of the original data in any way.
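To make this concrete, here is a minimal sketch in Python. The "original" table is a made-up example (the columns age and income, and the way income depends on age, are assumptions for illustration only). The dummy columns are drawn independently at random, so the relationship between age and income disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)

# Hypothetical "original" table: income rises with age, plus some noise.
ages = [rng.randint(20, 65) for _ in range(1000)]
incomes = [1000 * age + rng.gauss(0, 5000) for age in ages]

# Dummy data: each column is drawn independently at random within a plausible range.
dummy_ages = [rng.randint(20, 65) for _ in range(1000)]
dummy_incomes = [rng.uniform(min(incomes), max(incomes)) for _ in range(1000)]

original_corr = pearson(ages, incomes)
dummy_corr = pearson(dummy_ages, dummy_incomes)
print(f"original correlation: {original_corr:.2f}")
print(f"dummy correlation:    {dummy_corr:.2f}")
```

In the original table the two columns are strongly correlated, while in the dummy table the correlation is close to zero.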

Rule-based generated synthetic data

Rule-based generated synthetic data is synthetic data generated by a pre-defined set of rules. For example, you may want synthetic data with a certain minimum value, maximum value or average value. Any characteristic, relationship or statistical pattern that you would like to see reproduced in the rule-based generated synthetic data needs to be pre-defined.

Hence, the data quality will only be as good as the pre-defined set of rules. This creates challenges when high data quality is essential. First, you can define only a limited set of rules to be captured in the synthetic data. Second, setting up multiple rules typically results in overlapping and conflicting rules. Third, you will never fully cover all relevant rules, and there may be relevant rules that you are not even aware of. Finally, defining rules takes a lot of time and energy, which makes this an inefficient solution.

In summary, rule-based generated synthetic data leaves you with a solution that does not scale and with synthetic data quality that is only as good as the pre-defined set of rules.
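The sketch below illustrates the idea in Python for a single column. The rule set (minimum, maximum and average for a hypothetical income column) and the function name generate_income are assumptions made up for this example. It also shows how rules can conflict: enforcing the average pushes some values below the minimum, and re-applying the minimum moves the average away from its target again.

```python
import random

rng = random.Random(0)

# Hypothetical rule set for an "income" column: every rule is defined by hand.
RULES = {"minimum": 20_000, "maximum": 80_000, "average": 45_000}


def generate_income(rules, n, rng):
    """Draw n values inside [minimum, maximum], then shift them toward the target average."""
    values = [rng.uniform(rules["minimum"], rules["maximum"]) for _ in range(n)]
    shift = rules["average"] - sum(values) / n
    # Shifting fixes the average, but it can push values outside the range,
    # so the minimum and maximum rules are re-applied by clipping.
    return [min(rules["maximum"], max(rules["minimum"], v + shift)) for v in values]


incomes = generate_income(RULES, 1000, rng)
mean_income = sum(incomes) / len(incomes)

print(f"min:  {min(incomes):.0f}")
print(f"max:  {max(incomes):.0f}")
print(f"mean: {mean_income:.0f} (target {RULES['average']})")
```

The minimum and maximum rules hold, but the resulting average ends up slightly above the target. Any relationship that is not written down as a rule, such as a link between income and another column, is simply absent.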

Syntho's focus: synthetic data generated by artificial intelligence (AI)

As the name suggests, synthetic data generated by artificial intelligence (AI) is produced by an AI algorithm. The AI model is trained on the original data to learn its characteristics, relationships and statistical patterns. The model then generates completely new datapoints in such a way that they reproduce the characteristics, relationships and statistical patterns of the original dataset. Instead of you studying and defining relevant rules (as with rule-based generated synthetic data), the AI algorithm does this for you automatically. This covers not only the characteristics, relationships and statistical patterns that you are aware of, but also those that you are not aware of.
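The learn-then-generate principle can be sketched in a few lines of Python. Note the assumptions: this is not Syntho's algorithm. A simple two-column Gaussian model stands in for the AI model, and the "original" table (age and income) is made up for illustration. The point is only that the patterns are learned from the data instead of being written down as rules.

```python
import random


def fit_gaussian(xs, ys):
    """Learn means, standard deviations and correlation from two columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)
    return mx, my, sx, sy, r


def sample(model, n, rng):
    """Generate n new (x, y) rows from the fitted model."""
    mx, my, sx, sy, r = model
    rows = []
    for _ in range(n):
        x = rng.gauss(mx, sx)
        # Draw y conditional on x so that the learned correlation is reproduced.
        y = rng.gauss(my + r * sy / sx * (x - mx), sy * (1 - r * r) ** 0.5)
        rows.append((x, y))
    return rows


rng = random.Random(0)

# Hypothetical "original" table: income rises with age, plus some noise.
ages = [rng.randint(20, 65) for _ in range(1000)]
incomes = [1000 * age + rng.gauss(0, 5000) for age in ages]

model = fit_gaussian(ages, incomes)        # "training" on the original data
synthetic = sample(model, 1000, rng)       # completely new datapoints

synthetic_model = fit_gaussian([a for a, _ in synthetic], [i for _, i in synthetic])
print(f"original correlation:  {model[4]:.2f}")
print(f"synthetic correlation: {synthetic_model[4]:.2f}")
```

No rule about age and income was defined by hand, yet the synthetic rows show nearly the same correlation as the original ones. A real generative AI model follows the same principle but can capture far more complex patterns than this two-column stand-in.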

Summary

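AI-generated synthetic data stands out through superior data quality

Data quality is the key differentiator when we compare AI-generated synthetic data with dummy data and rule-based generated synthetic data. Dummy data preserves none of the patterns in the original data, rule-based data preserves only the patterns you define yourself, and AI-generated synthetic data learns the patterns directly from the original data.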

Synthetic data by Syntho

Synthetic data generated by artificial intelligence (AI) opens up opportunities

Generating a Synthetic Data Twin

Synthetic Data Twin

When generating a Synthetic Data Twin, Syntho mimics the original data as closely as possible while preserving privacy. Syntho generates completely new datapoints and models them in such a way that the properties, relationships and statistical patterns of the original data are preserved. Even complex, hidden patterns, relationships and inefficiencies are captured, so the synthetic data can be used as a direct alternative to the original data.

Data Optimization and Augmentation features

Data Optimization and Augmentation

The foundation for Synthetic Data Optimization and Augmentation is a Synthetic Data Twin. From this foundation, we can optimize and augment your data using smart generative AI based on the requirements, logic and constraints of your business. We offer various value-adding synthetic data optimization and augmentation features to take your data (whether 'dirty' or 'clean') to the next level.

Boost the realization of data-driven innovation now!