I
I
isapioff2011-01-10 02:02:53
Combinatorics
isapioff, 2011-01-10 02:02:53

Search for the most suitable sets from a sequence (too many combinations)

There is a set of fixed length bit sequences (L == 1024). The number of sequences is ~43000.
How to find among them a combination of N (N == 10) such that the number of included bits in the bitwise logical addition of the result is maximum?

For example (L = 4, N = 2) The answer will be: The task is complicated by a huge amount of initial data. The number of possible combinations makes me sad. Is it possible to solve this problem in an acceptable time, or rely on chance and run a random algorithm for a couple of days? upd Either I'm already stupid, or I don't understand the proposed solutions. Searching for the maximum number of bits will not return a result. I illustrate: L = 4, N = 2, for the set 1110 - 3 1100 - 2
0100
0010
0110
1000
1100


0110
1100
---- проверяем, какие биты у нас включены
1110




0110 - 2
0001 - 1

if we count the number of bits and sort (take top N), we get 1110 1100
and
we

need
1110
0001

upd2 11100 - 3 11000 - 2 01110 - 2 00010 - 1 00001 - 1 0) m = 11111 1) get the best sequence for m (11111) result = [11100] now m = 00011 2) get the best sequence for m(00011) result = [11100, 01110] now m = 00001 3) get the best sequence for m(00001) result = [11100, 01110, 00001]

now m = 00000, finished

getting the answer - [11100, 01110, 00001]
but this is not the only solution (at step 2, 01110, 00010, 00001 are suitable)
if the amount of initial data is large, then there will be many "best" results. and, of course, not the fact that there is generally a combination that will cover all the bits (I don’t need this). in nature, I will get a bunch of results (if I choose random among the “best” options). from these “best” ones, in the end, I will choose the most suitable one for me (I will sort by the number of covered bits).

Answer the question

In order to leave comments, you need to log in

6 answer(s)
V
VenomBlood, 2011-01-10
@VenomBlood

In upd2, the correct algorithm, you can choose from the “best” the first one that comes across, this absolutely does not affect the resulting result. (If I understand correctly, are you now considering all options with different "best" choices at each step?)

D
Dmitry Guketlev, 2011-01-10
@Yavanosta

What's the problem with counting the bits in a 1024 bit sequence? Where do you store them? I think in the end this task will be reduced to 1024 shifts. Load them into 32 longs and iterate through the array.

long[][] myArray = new long[43000][32];
int[] bitsCount = new int[43000];

//load our sequence into 32 longs. 32*4*8 = 1024 bits
<here is some code>

//determine the number of bits set
for(int i = 0;i<43000;i++) //loop through all sequences
   for (int j = 0;j<32;j++) //loop through all longs
      for(int k = 0;k<32;k++) //loop over all long bits
         if((myArray[i][j] >> k) & 0x1)
            bitCount[i]++;

//next, sort the bitsCount array and take the N top ones. Or we optimize the sorting so that it does not completely sort, but only looks for the top N

//???


//PROFIT!

It won't work for very long. Certainly not a couple of days. Or I did not understand the task?

A
Alexander Belugin, 2011-01-10
@unkinddragon

You only need to read a 5 megabyte file, reading in chunks of 10 kb.
You need to read from each position (from 1 bit to 1024, 2 to 1025, etc.), but if some piece gives a really bad result, you can skip K bits at once, where K is the difference between the result and the top 10 result at the moment (to don't miss anything).
In my opinion, it will turn out pretty quickly, although not at all algorithmically.

M
MikhailEdoshin, 2011-01-10
@MikhailEdoshin

I have this vague feeling that this is an NP-complete task. But I can't believe in algebra :(

L
leventov, 2011-01-10
@leventov

And why are you discarding the most obvious way:
Take any combination, then try each remaining sequence in turn, whether it can improve the combination (the main parameter is an increase in the coverage area, the side parameter is the number of single bits).
Complexity is linear

M
Mrrl, 2011-01-10
@Mrl

There is an example for L=22:
0011 1111 1111 1100 0000 00
0000 0000 1111 1111 1111 00
0101 0101 0101 0101 0101 01
1010 1010 1010 1010 1010 1010 1010 1010 10
units it won't work anymore, other pairs except (3,4) give only 17. So my heuristic doesn't work either.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question