C++ / C#
Ivan Kolesnik, 2016-02-14 14:37:22

How do you find the median of two sorted arrays?

A curious problem came my way yesterday. We are given two arrays, each sorted in ascending order in advance and each terabytes in size (so they cannot simply be merged into one common array and re-sorted). The task is to find the median: the element that sits in the middle once the two arrays are combined in sorted order. For example, for the arrays [1, 3, 5, 7] and [2, 6, 10], combining them gives [1, 2, 3, 5, 6, 7, 10], so the median is 5; note that this is not the arithmetic mean. I am looking for solutions in C++ and would be interested to see your ideas, because so far I have not finished my own.
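To pin down the definition, here is a minimal C++ sketch of the naive version (the function name `median_by_full_merge` is hypothetical, and the inputs are assumed to be small in-memory `std::vector`s). It materializes the merged array with `std::merge`, which is exactly what terabyte inputs rule out, so it is only useful as a reference for checking faster methods on small data. It assumes, as the JS code in the first answer does, that for an even total length the median is the mean of the two middle elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Naive reference: materializes the merged array, so it is only usable for small
// inputs (e.g. as a test oracle), never for terabytes.
double median_by_full_merge(const std::vector<std::int64_t>& a,
                            const std::vector<std::int64_t>& b) {
    assert(!a.empty() || !b.empty());  // the median of no elements is undefined
    std::vector<std::int64_t> merged;
    merged.reserve(a.size() + b.size());
    // std::merge combines two sorted ranges into one sorted range in O(m + n),
    // so no re-sorting is needed.
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(merged));
    const std::size_t n = merged.size();
    if (n % 2 == 1) {
        return static_cast<double>(merged[n / 2]);  // odd count: the middle element
    }
    // Even count: mean of the two middle elements (assumed convention).
    return (static_cast<double>(merged[n / 2 - 1]) +
            static_cast<double>(merged[n / 2])) / 2.0;
}
```

For the example from the question, `median_by_full_merge({1, 3, 5, 7}, {2, 6, 10})` returns 5.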
______________________
Update: figured it out :) I didn't see any adequate C++ solutions, but I saw a couple of great ideas and simplified them as much as possible. Soon I'll write the solution in C++ and post it here. Thank you all so much!


2 answers
Rsa97, 2016-02-14
@dart_kinselok

What's the problem? The lengths of the arrays are known, so take half of their total length and advance through both arrays in parallel by that many elements, as in the merge step of merge sort.
Here is a JS version I wrote off the top of my head:

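const arr1 = [1, 3, 5, 7];
const arr2 = [2, 6, 10];
// Indices of the largest not-yet-taken element in each array.
let p1 = arr1.length - 1;
let p2 = arr2.length - 1;
let n = arr1.length + arr2.length;
if (n == 0)
  n--; // both arrays empty: skip both steps below, med stays 0
let med = 0;
// Walk from the largest elements downward, merge-style, taking ceil(n / 2)
// elements. The last one taken is the median (odd total) or the larger of
// the two middle elements (even total).
while (0 < n) {
  if (p2 < 0 || (p1 >= 0 && arr1[p1] > arr2[p2]))
    med = arr1[p1--];
  else
    med = arr2[p2--];
  n -= 2;
}
// Even total: average with the next element in descending order.
if (0 == n) {
  if (p2 < 0 || (p1 >= 0 && arr1[p1] > arr2[p2]))
    med = med + arr1[p1];
  else
    med = med + arr2[p2];
  med /= 2;
}
console.log(med); // 5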
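Since the question asks for C++, here is a sketch of the same merge-style walk in C++. The function name `median_by_walk` is hypothetical, and the data is assumed to be available as in-memory `std::vector`s; real terabyte inputs would presumably be read through memory-mapped files or buffered streams instead. Unlike the JS version it walks upward from the smallest elements, which does not change the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Merge-style walk: advance through both sorted arrays in parallel, always taking
// the smaller head, until the middle of the combined order is reached.
// O(m + n) time, O(1) extra memory; nothing is copied or re-sorted.
double median_by_walk(const std::vector<std::int64_t>& a,
                      const std::vector<std::int64_t>& b) {
    const std::size_t n = a.size() + b.size();
    assert(n > 0);
    std::size_t i = 0, j = 0;  // next unread index in a and in b
    // Returns the next element in merged (ascending) order.
    auto next = [&]() -> std::int64_t {
        if (j == b.size() || (i < a.size() && a[i] <= b[j])) return a[i++];
        return b[j++];
    };
    // Pass over the (n - 1) / 2 elements that precede the middle.
    for (std::size_t k = 0; k < (n - 1) / 2; ++k) next();
    const std::int64_t lower = next();  // element at index (n - 1) / 2
    if (n % 2 == 1) return static_cast<double>(lower);
    const std::int64_t upper = next();  // element at index n / 2
    return (static_cast<double>(lower) + static_cast<double>(upper)) / 2.0;
}
```

For the question's example, `median_by_walk({1, 3, 5, 7}, {2, 6, 10})` returns 5. It avoids extra memory but still reads about half of all the data, so on terabytes it is linear-time.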

P.S. If you first find the range where the values of the two arrays overlap, you can reduce the amount of data that has to be scanned.
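One possible reading of the P.S., sketched in C++ under the assumption that the "area of intersection" means the range of values both arrays share. Elements of one array that lie below the other array's first element come first in merged order, so they can be skipped with a single `std::lower_bound` instead of being walked; and once one array is exhausted (past the overlap), the answer can be indexed directly. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// k-th smallest (0-based) element of the combined order of two sorted arrays.
std::int64_t kth_with_overlap_skip(const std::vector<std::int64_t>& a,
                                   const std::vector<std::int64_t>& b, std::size_t k) {
    assert(k < a.size() + b.size());
    if (a.empty()) return b[k];
    if (b.empty()) return a[k];
    // Elements of a strictly below b.front() precede all of b in merged order (and
    // vice versa). At most one of these prefixes is non-empty, so it can be
    // skipped with one binary search instead of being walked.
    std::size_t i = static_cast<std::size_t>(
        std::lower_bound(a.begin(), a.end(), b.front()) - a.begin());
    std::size_t j = static_cast<std::size_t>(
        std::lower_bound(b.begin(), b.end(), a.front()) - b.begin());
    if (k < i) return a[k];
    if (k < j) return b[k];
    std::size_t steps = k - i - j;  // elements still to pass after the skipped prefix
    for (;;) {
        // Past the overlap: only one array remains, so index into it directly.
        if (j == b.size()) return a[i + steps];
        if (i == a.size()) return b[j + steps];
        if (steps == 0) return std::min(a[i], b[j]);
        // Inside the overlap: ordinary merge step.
        if (a[i] <= b[j]) ++i; else ++j;
        --steps;
    }
}

double median_with_overlap_skip(const std::vector<std::int64_t>& a,
                                const std::vector<std::int64_t>& b) {
    const std::size_t n = a.size() + b.size();
    assert(n > 0);
    const double lower = static_cast<double>(kth_with_overlap_skip(a, b, (n - 1) / 2));
    if (n % 2 == 1) return lower;
    return (lower + static_cast<double>(kth_with_overlap_skip(a, b, n / 2))) / 2.0;
}
```

The saving only appears when the arrays overlap partially; for fully interleaved data this degenerates into the plain walk.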

res2001, 2016-02-14
@res2001

With this much data, there are probably a great many duplicate values.
You could try collecting statistics: for each value, the number of times it occurs.
Then sort these value/count pairs and look for the median using the counts.
That table will probably still be large, but no longer terabytes.
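A minimal C++ sketch of the statistics idea, assuming the values fit in `std::int64_t` and that the number of distinct values is small enough for a `std::map` in memory. The vectors stand in for data that would really be streamed, and the function names are hypothetical. `std::map` keeps its keys sorted, which covers the "sort" step; the median is then found by walking the cumulative counts.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Counts = std::map<std::int64_t, std::uint64_t>;  // value -> number of occurrences

// Collect the statistics over both arrays. std::map keeps keys sorted, which
// takes care of the "sort" step; memory grows with the number of DISTINCT values.
Counts build_counts(const std::vector<std::int64_t>& a, const std::vector<std::int64_t>& b) {
    Counts counts;
    for (std::int64_t v : a) ++counts[v];
    for (std::int64_t v : b) ++counts[v];
    return counts;
}

// Value at 0-based position k of the combined sorted order, via cumulative counts.
std::int64_t kth_from_counts(const Counts& counts, std::uint64_t k) {
    std::uint64_t seen = 0;
    for (const auto& entry : counts) {
        seen += entry.second;
        if (k < seen) return entry.first;  // position k falls inside this value's run
    }
    assert(false && "k is past the end of the data");
    return 0;
}

double median_from_counts(const Counts& counts) {
    std::uint64_t n = 0;
    for (const auto& entry : counts) n += entry.second;
    assert(n > 0);
    const double lower = static_cast<double>(kth_from_counts(counts, (n - 1) / 2));
    if (n % 2 == 1) return lower;
    return (lower + static_cast<double>(kth_from_counts(counts, n / 2))) / 2.0;
}
```

This only pays off when duplicates really are plentiful; with mostly distinct values the map is as large as the input.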
Alternatively, instead of collecting statistics, just count the elements while moving along both arrays: the value at which the count reaches the median position is the median.
If all elements have the same size (for example, 32-bit integers), the position of the median element is easy to calculate from the total size of the arrays. Also, since the arrays are sorted, you can quickly find how many times each particular value occurs.
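A sketch combining the last two ideas, assuming 32-bit integer elements and data accessible as `std::vector<std::int32_t>` (in practice it would more likely be memory-mapped files); the function names are hypothetical. The element count follows from the byte size, and instead of stepping one element at a time, the walk jumps over each run of equal values with `std::upper_bound`, which works because the arrays are sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// With fixed-size elements the element count follows from the byte size,
// e.g. a hypothetical 4 TB (4e12-byte) file of int32_t holds 1e12 elements.
std::uint64_t element_count(std::uint64_t bytes, std::uint64_t element_size) {
    assert(element_size > 0 && bytes % element_size == 0);
    return bytes / element_size;
}

// k-th smallest (0-based) of the combined order, moving along both arrays one
// run of equal values at a time: std::upper_bound finds the end of each run by
// binary search instead of stepping element by element.
std::int32_t kth_by_runs(const std::vector<std::int32_t>& a,
                         const std::vector<std::int32_t>& b, std::uint64_t k) {
    assert(k < a.size() + b.size());
    auto ia = a.begin(), ib = b.begin();
    std::uint64_t seen = 0;  // elements already passed in merged order
    for (;;) {
        // Smallest value not yet passed in either array.
        std::int32_t v;
        if (ia == a.end()) v = *ib;
        else if (ib == b.end()) v = *ia;
        else v = std::min(*ia, *ib);
        // Jump over all copies of v in both arrays at once.
        auto ea = std::upper_bound(ia, a.end(), v);
        auto eb = std::upper_bound(ib, b.end(), v);
        seen += static_cast<std::uint64_t>((ea - ia) + (eb - ib));
        if (k < seen) return v;  // position k lies inside this run
        ia = ea;
        ib = eb;
    }
}

double median_by_runs(const std::vector<std::int32_t>& a,
                      const std::vector<std::int32_t>& b) {
    const std::uint64_t n = a.size() + b.size();
    assert(n > 0);
    const double lower = kth_by_runs(a, b, (n - 1) / 2);
    if (n % 2 == 1) return lower;
    return (lower + kth_by_runs(a, b, n / 2)) / 2.0;
}
```

The run jumps help only when duplicates are common; with mostly distinct values every run has length one, so each step pays for a binary search and this is slower than the plain walk.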
