Testing & Development


Establish a representative, GDPR-compliant test and acceptance environment by embracing synthetic data.

Does GDPR impact my test environment and my acceptance environment?

Since the introduction of the GDPR, companies have been obliged to define why they need to use personal data and, if they do, to obtain specific permission for it. Application testing and development are typically not the purpose for which personal data was originally collected. Moreover, the "data minimization" principle requires companies to minimize their use of personal data and to use it only when strictly necessary. Hence, using original client data (production data) in your test and acceptance environments is not allowed, even though those environments typically require representative data so that your testers, developers and product owners can assess your software or application in representative scenarios.

The failure of classic anonymization techniques

To overcome this dilemma, one typically applies classic anonymization techniques. Examples that we see in practice:

Technique                   Original data     Test and acceptance data
Generalization              27 years old      Between 25 and 30 years old
Masking / wiping            info@syntho.ai    xxxx@xxxxxx.xx
Row and column switching    aligned           scrambled
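The three techniques in the table can be sketched in a few lines of Python. This is a minimal illustration, assuming a 5-year band for generalization and a masking rule that overwrites every character except `@` and `.`; both rules are assumptions chosen to reproduce the table's examples, and the function names and sample rows are hypothetical rather than taken from any particular tool.

```python
import random


def generalize_age(age: int, band: int = 5) -> str:
    """Generalization: replace an exact age with the band that contains it."""
    low = (age // band) * band
    return f"Between {low} and {low + band} years old"


def mask_email(email: str) -> str:
    """Masking / wiping: overwrite every character except the '@' and '.' separators."""
    return "".join(ch if ch in "@." else "x" for ch in email)


def switch_column(rows: list[dict], column: str, seed: int = 0) -> list[dict]:
    """Row and column switching: shuffle one column's values across rows,
    breaking the alignment between that column and the rest of each record."""
    values = [row[column] for row in rows]
    random.Random(seed).shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


# Hypothetical sample records, for illustration only.
rows = [
    {"name": "Alice", "age": 27},
    {"name": "Bob", "age": 41},
    {"name": "Carol", "age": 35},
]

print(generalize_age(27))            # Between 25 and 30 years old
print(mask_email("info@syntho.ai"))  # xxxx@xxxxxx.xx
print(switch_column(rows, "age"))
```

Each function keeps the column's format but discards or scrambles detail, which is exactly where the loss of data utility discussed next comes from.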

These classic anonymization techniques have three limitations, as illustrated in Figure 1:

  1. Data utility decreases significantly as confidential information is removed, transformed or distorted by the applied classic anonymization techniques.
  2. Classic anonymization techniques only make it harder to link data to individuals; some exposure to privacy risk always remains.
  3. Data interactions and patterns within the dataset are distorted.

The result: a suboptimal test and acceptance environment that still carries privacy risk.

Figure 1: the limitations of classic anonymization techniques


Go synthetic! Use synthetic data for your test environment and acceptance environment!

Synthetic data by Syntho reproduces the statistical characteristics of your original dataset while guaranteeing that no records from the original dataset are present and that specific individuals cannot be traced back. Hence, you can set up test and acceptance environments that share the statistical characteristics of the original production environment without containing any of its records. Using synthetic data for your test and acceptance environments therefore offers three benefits, as illustrated in Figure 2:

  1. Synthetic data approximates the statistical properties of the original data, so interactions and patterns are preserved. Consequently, synthetic data is realistic and representative.
  2. Synthetic data does not contain records from the original dataset. Hence, synthetic data rules out privacy risk.
  3. Original sensitive data, or poorly (classically) anonymized data, does not leave the building, so the likelihood of data breaches is minimized.

The result: a representative test environment and a representative acceptance environment with no privacy risk.

Figure 2: synthetic data for your test environment and acceptance environment

Synthetic data generation for non-existent data

When developing (new) features, there is often too little data, the data does not exist yet, or it does not exist at all, which prevents you from running the test scenarios needed to assess the quality of your application. To overcome this, the Syntho engine operates as a data generator that tailors the data quantity, calibrates the statistical properties or even creates dummy data. This lets you produce data for test scenarios that you otherwise would not be able to perform.

Figure 3: data synthesis and data generation
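The three uses named above (tailoring quantity, calibrating statistical properties and creating dummy data) can be sketched with a small hand-written generator. Everything here is hypothetical: the `Customer` schema, the country codes and their shares, the default age parameters and the 18 to 99 clipping range are illustrative assumptions, and nothing below reflects the Syntho engine's actual interface, which the text does not describe.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    """Hypothetical test-data schema, for illustration only."""
    customer_id: int
    age: int
    country: str


def generate_customers(n: int, age_mean: float = 40, age_sd: float = 12,
                       country_shares: Optional[dict[str, float]] = None,
                       seed: int = 0) -> list[Customer]:
    """Generate n dummy customers: n tailors the quantity, the other
    arguments calibrate the statistical properties."""
    shares = country_shares or {"NL": 0.6, "BE": 0.3, "DE": 0.1}
    rng = random.Random(seed)
    countries, weights = list(shares), list(shares.values())
    customers = []
    for i in range(n):
        # Draw an age from a normal distribution, then clip it to 18..99.
        age = min(max(round(rng.gauss(age_mean, age_sd)), 18), 99)
        country = rng.choices(countries, weights=weights)[0]
        customers.append(Customer(i + 1, age, country))
    return customers


# Tailor quantity: far more records than a new feature's real data may offer.
bulk = generate_customers(10_000)
# Calibrate: simulate an older customer base for an edge-case scenario.
older = generate_customers(1_000, age_mean=70)
print(statistics.fmean(c.age for c in bulk), statistics.fmean(c.age for c in older))
```

Taking the distribution parameters as explicit arguments keeps each test scenario reproducible: the same seed and settings always yield the same dataset.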