Is there a book or series of articles on preparing synthetic test data and data profiling?

Software testing

remez, 2017-11-16 12:21:53

System modules often have to be developed without real data, or without any data at all. Yet the implemented software still has to be tested (in my case: ETL, web services, SOA).
Across different companies, I have encountered different approaches to preparing synthetic test data and to data profiling.
The most thorough approach starts with data profiling: identifying (or estimating) the possible values, outliers and extremes; determining the cardinality of relationships; and defining aggregate indicators for checks, such as row counts, averages, and expected input and output data volumes in bytes; etc.
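A minimal Python sketch of this profiling step, assuming numeric column values already loaded into memory. The function names are hypothetical, and the 1.5 × IQR (Tukey fence) rule is just one common outlier heuristic, not necessarily the one used in practice here:

```python
import statistics
from collections import Counter

def profile_column(values):
    """Profile one numeric column: basic aggregates plus Tukey-fence outliers."""
    # Quartiles need at least two values (statistics.quantiles, Python 3.8+).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "rows": len(values),
        "distinct": len(set(values)),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "outliers": sorted(v for v in values if v < low or v > high),
    }

def relationship_cardinality(parent_ids, child_fks):
    """Min/max number of child rows per parent key, e.g. (0, N) for an optional 1:N link."""
    counts = Counter(child_fks)
    per_parent = [counts.get(pid, 0) for pid in parent_ids]
    return min(per_parent), max(per_parent)
```

For the column [10, 12, 11, 13, 12, 11, 500], this sketch flags 500 as an outlier; for parents [1, 2, 3] and child foreign keys [1, 1, 2], it reports a cardinality of (0, 2).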
Next, synthetic test data is prepared and fed to the input of the black box, i.e. the system module under test.
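A sketch of generating such input with Python's standard library, assuming a hypothetical orders schema (order_id, customer_id, amount, status). A fixed seed keeps the data reproducible between runs, and a few edge-case rows (zero amount, missing customer, extreme value) are appended deliberately, since random sampling alone rarely hits them:

```python
import random

def generate_rows(n, seed=42):
    """Generate n synthetic order rows plus edge cases; the schema is hypothetical."""
    rng = random.Random(seed)  # fixed seed -> identical test input on every run
    customers = [f"C{i:04d}" for i in range(1, 51)]
    rows = []
    for order_id in range(1, n + 1):
        rows.append({
            "order_id": order_id,
            "customer_id": rng.choice(customers),
            "amount": round(rng.uniform(1.0, 1000.0), 2),
            "status": rng.choice(["NEW", "PAID", "CANCELLED"]),
        })
    # Deliberate edge cases of the kind found during profiling (extremes, missing keys).
    rows.append({"order_id": n + 1, "customer_id": customers[0], "amount": 0.0, "status": "NEW"})
    rows.append({"order_id": n + 2, "customer_id": None, "amount": 1_000_000.0, "status": "PAID"})
    return rows
```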
Then the reference (expected) output data is prepared, and/or a set of test cases that check the aggregate indicators (row counts, volumes, ...).
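One way to express the aggregate-indicator checks as code, assuming rows are dicts with a hypothetical `amount` field. `math.isclose` tolerates floating-point rounding in sums and averages:

```python
import math

def aggregate_metrics(rows, amount_field="amount"):
    """Compute a few aggregate indicators; rows with a missing amount are skipped for sum/avg."""
    amounts = [r[amount_field] for r in rows if r[amount_field] is not None]
    return {
        "row_count": len(rows),
        "amount_sum": sum(amounts),
        "amount_avg": sum(amounts) / len(amounts) if amounts else 0.0,
    }

def check_aggregates(actual, expected, rel_tol=1e-9):
    """Return human-readable failures; an empty list means every check passed."""
    failures = []
    for name, want in expected.items():
        got = actual.get(name)
        if got is None or not math.isclose(got, want, rel_tol=rel_tol):
            failures.append(f"{name}: expected {want}, got {got}")
    return failures
```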
The output of the black box is then compared against the reference.
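For the comparison itself, a minimal order-independent diff, assuming rows are dicts and `key_fields` (a hypothetical parameter) lists the columns to compare. `collections.Counter` treats each side as a multiset, so dropped or duplicated rows are reported as well:

```python
from collections import Counter

def diff_against_reference(actual_rows, reference_rows, key_fields):
    """Compare two row sets, ignoring order, on the given fields."""
    def project(rows):
        return Counter(tuple(r[f] for f in key_fields) for r in rows)
    actual, reference = project(actual_rows), project(reference_rows)
    missing = reference - actual      # expected but not produced
    unexpected = actual - reference   # produced but not expected
    return list(missing.elements()), list(unexpected.elements())
```

Two empty lists mean the module reproduced the reference exactly on the chosen fields.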
The problem is that there can be millions or even billions of tracked attribute values, plus their combinations in the input data. It is not clear when to stop, or how to decide that the test data is sufficient.
What methods exist for classifying the indicators being analyzed? What automation tools are available?
Is there a comprehensive reference work (a book or a series of articles) that describes and standardizes the preparation of synthetic data?

azShoo, 2017-11-16

If we are talking about preparing data for testing, all of this follows directly from the fairly extensive principles of test design.
Boundary value analysis, equivalence partitioning and pairwise testing let you quickly determine the data sets needed for each scenario and reduce them to the minimum necessary number of variations.
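A minimal Python sketch of these three techniques, assuming an integer field valid in [lo, hi] and hashable parameter values. All function names are hypothetical. The pairwise generator is a simple greedy illustration (seed each row with an uncovered value pair, then fill the remaining parameters to cover as many new pairs as possible); it guarantees that every pair is covered, but not that the row set is minimal:

```python
from itertools import combinations, product

def boundary_values(lo, hi):
    """Classic boundary-value set for an integer field valid in [lo, hi]."""
    return [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]

def equivalence_representatives(lo, hi):
    """One representative per class: below, inside and above the valid range (any member would do)."""
    return {"below": lo - 1, "valid": (lo + hi) // 2, "above": hi + 1}

def _new_pairs(row, k, v, uncovered):
    """How many still-uncovered pairs value v of parameter k forms with values already in row."""
    return sum(((m, row[m], k, v) if m < k else (k, v, m, row[m])) in uncovered for m in row)

def pairwise(params):
    """Greedy all-pairs: every value pair of every two parameters appears in some row."""
    names = list(params)
    # A dict used as an insertion-ordered set of uncovered (i, a, j, b) pairs with i < j.
    uncovered = dict.fromkeys(
        (i, a, j, b)
        for i, j in combinations(range(len(names)), 2)
        for a, b in product(params[names[i]], params[names[j]])
    )
    rows = []
    while uncovered:
        i, a, j, b = next(iter(uncovered))  # seed the row with an uncovered pair
        row = {i: a, j: b}
        for k in range(len(names)):
            if k not in row:
                row[k] = max(params[names[k]], key=lambda v: _new_pairs(row, k, v, uncovered))
        # Mark every pair contained in the finished row as covered.
        for p, q in combinations(range(len(names)), 2):
            uncovered.pop((p, row[p], q, row[q]), None)
        rows.append({names[k]: row[k] for k in range(len(names))})
    return rows
```

For example, with region in {EU, US, APAC}, channel in {web, api} and currency in {EUR, USD}, exhaustive testing needs 3 × 2 × 2 = 12 combinations, while this sketch covers every pair in 6 rows.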
Next comes identifying the scenarios for which data must be prepared. This requires analyzing the system, writing test documentation, and so on.
In general, I would say a couple of foundational books on test design and test analysis will help you understand what to check and with what data.
Beyond that, it is just long, painstaking work on building and synthesizing the test data.