Oracle
Ivan Melnikov, 2020-10-30 16:59:18

How do I split all rows of a table into N equal random samples in Oracle or Teradata?

By a random sample I mean one whose elements are uniformly distributed over the general population.
In Teradata, you can get a random sample of size k like this:

SELECT *
FROM t1
SAMPLE RANDOMIZED ALLOCATION k;

You can, of course, keep going and draw the same kind of sample from the remaining rows, and so on.
If k = (number of rows in the table) / N, this yields N equal-sized random samples.
But that is too much of a hassle. I think there should be a built-in solution that takes one or two lines. I am mainly interested in how to do this in Teradata.
Any advice would be appreciated.

2 answers
Artem Cherepakhin, 2020-11-07
@AltZ

select t1.*, ntile(N) over (order by dbms_random.random) nbatch from t1

Then filter on nbatch to pick the parts you need.
NTILE in Oracle is used for building histograms, so it should be more or less fast.
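
The idea can be sanity-checked without an Oracle instance. Below is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for Oracle: random() replaces dbms_random, and the helper name split_into_batches and the single-column table t1 are made up for illustration. It assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

def split_into_batches(n_rows: int, n_batches: int) -> dict:
    """Return {batch number: row count} after a random NTILE split.

    Illustrative helper only: SQLite stands in for Oracle here, and
    random() plays the role of dbms_random.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO t1 (id) VALUES (?)", ((i,) for i in range(n_rows))
    )
    # Shuffle the rows with a random sort key, then let NTILE cut the
    # shuffled sequence into n_batches groups of near-equal size.
    query = (
        "SELECT nbatch, COUNT(*) FROM ("
        f"  SELECT id, NTILE({int(n_batches)}) OVER (ORDER BY random()) AS nbatch"
        "  FROM t1"
        ") GROUP BY nbatch ORDER BY nbatch"
    )
    sizes = dict(conn.execute(query).fetchall())
    conn.close()
    return sizes

if __name__ == "__main__":
    print(split_into_batches(n_rows=10, n_batches=3))
```

The batch sizes differ by at most one row, so when the row count is not divisible by N the samples are only as equal as they can be.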

alexalexes, 2020-10-30
@alexalexes

In practice, following the Monte Carlo approach, we tag each table row with a number drawn from a random interval and take the portion of interest according to that random value. Since the generator follows a uniform distribution, you get a portion of the data of approximately the expected size.

select *
from (
select t.*, dbms_random.value(0, 100) rnd
  from t1 t
) A
where A.rnd <= 30 -- select roughly 30% of the rows at random from the general population
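
The same random tag can be stretched to N groups by cutting the interval [0, 100) into N equal slices. The sketch below simulates this in plain Python rather than SQL: assign_buckets is a hypothetical helper, and rng.uniform(0, 100) is only an analogue of dbms_random.value(0, 100). Note that the resulting groups are approximately equal in size, not exactly equal, so this does not replace a ranking-based split when exact sizes matter.

```python
import random

def assign_buckets(row_ids, n_buckets, seed=None):
    """Assign each row to one of n_buckets using a uniform random tag.

    Illustrative simulation only; not Oracle code.
    """
    rng = random.Random(seed)
    buckets = {b: [] for b in range(n_buckets)}
    for row_id in row_ids:
        rnd = rng.uniform(0, 100)  # analogue of dbms_random.value(0, 100)
        # Map the tag to a slice index; clamp in case rnd lands on 100.
        b = min(int(rnd * n_buckets / 100), n_buckets - 1)
        buckets[b].append(row_id)
    return buckets

if __name__ == "__main__":
    result = assign_buckets(range(1000), n_buckets=4, seed=1)
    for b, rows in result.items():
        # Sizes hover around 250 but are generally not identical.
        print(b, len(rows))
```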
