The Syntho Engine’s supported data

What types of data are supported by Syntho?

Syntho supports any form of tabular data

Syntho supports any form of tabular data and also supports complex data types. Tabular data is a type of structured data that is organized in rows and columns, typically in the form of a table. Most of the times, you see this type of data in databases, spreadsheets, and other data management systems.

Complex data support

Complex data support

Syntho supports for large multi-table datasets and databases

Syntho supports for large multi-table datasets and databases. Also for multi-table datasets and databases, we maximize the data accuracy for every synthetic data generation job and demonstrate this via our data quality report. In addition, the SAS data experts assessed and approved our synthetic data from an external point of view.

We optimized our platform to minimize computational requirements (e.g. no GPU required), without compromising on the data accuracy. In addition, we support auto scaling, so that one can synthesize huge databases.

Specifically for multi-table datasets and databases, we automatically detect the data types, schemas and formats to maximize data accuracy. For multi-table database, we support automatic table relationship inference and synthesis to preserve the referential integrity. Finally, we support for comprehensive table and column operations so that you can configure your synthetic data generation job, also for multi-table datasets and databases.

Preserved referential integrity

Syntho supports for automatic table relationship inference and synthesis. We automatically infer and generate primary and foreign keys that reflect your source tables and safeguard relationships throughout your databases and across different systems to preserve the referential integrity. Foreign key relationships are automatically captured from your database to preserve referential integrity. Alternatively, one can run a scan to scan for potential foreign key relations (when foreign keys are not defined in the database, but for example in the application layer) or one can add them manually.

Comprehensive table and column operations

Synthesize, duplicate or exclude tables or columns to your preference. When you synthesize a database with multiple tables, one typically would like to be able to configure the synthetic data generation job to include and / or exclude the desired combination of tables.

Table modes:

  • Synthesize: Use AI to synthesize the table
  • Duplicate: copy the table over as-is to the target database
  • Exclude: exclude the table from the target database
multi table datasets

Complex data support

Syntho supports for synthetic data contaning time series data

Syntho supports also for time series data. time series data is a type of data that is collected and organized in chronological order, with each data point representing a specific point in time. This type of data is commonly used in many sectors. This could for example be in finance (for example with customers making transactions) or in healthcare (where patients undergo procedures), and many others where trends and patterns over time are important to understand.

Time series data can be collected at regular or irregular intervals. The data can be univariate, consisting of a single variable such as temperature, or multivariate, consisting of multiple variables that are measured over time, such as a stock portfolio’s value or a company’s revenue and expenses.

Analyzing time series data often involves identifying patterns, trends, and seasonal fluctuations over time, as well as making predictions about future values based on past data. The insights gained from analyzing time series data can be used for a wide range of applications, such as forecasting sales, predicting the weather, or detecting anomalies in a network. Hence, supporting for time series data is often required when synthesizing data.

Supported types of time series data

Auto-correlations are included in our quality assurance report

Supported data

Syntho supports any form of tabular data

Data Type Description Example
Integer A whole number without any decimal places, either positive or negative 42
Float A decimal number with either a finite or infinite number of decimal places, either positive or negative 3,14
Boolean A binary value True or false, yes or no etc.
String A sequence of characters, such as letters, digits, symbols, or spaces, that represent text, categories or other data "Hello, world!"
Date/Time A value representing a specific point in time, either a date, a time, or both (any data/time format is supported) 2023-02-18 13:45:00
Object A complex data type that can contain multiple values and properties, also known as a dictionary, map, or hash table { "name": "John", "age": 30, "address": "123 Main St." }
Array An ordered collection of values of the same type, also known as a list or vector [1, 2, 3, 4, 5]
Null A special value representing the absence of any data, often used to indicate a missing or unknown value null
Character A single character, such as a letter, digit, or symbol 'A'
Any other Any other form of tabular data is supported

User documentation

Request Syntho’s User Documentation!