MongoDB map-reduce: a Shlemiel the painter effect?


MongoDB

zxmd, 2014-02-19 18:33:01

Problem: the collection already holds 45 records. Every few hours a map-reduce job runs over the database and aggregates the data. The trouble is that it gets slower and slower every day.
The structure of the original collection is something like this:
company_id:xxx, ts:.....
company_id:xxx, ts:.....
company_id:yyy, ts:.....
company_id:xxx, ts:.....
company_id:yyy, ts:.....
The aggregation turns it into this:
ts:...., xxx:3
ts:...., yyy:2
and so on, where ts is the date (without the time).
That is, the map-reduce keeps running over a huge number of ts values that have already been processed, redoing work that is already done. How are such problems usually handled, i.e. how do I make it run only on new data? I could remember the last ts at the time the job was launched and start the next run with the {ts:{$gt:__stored_ts__}} filter, but that does not seem right to me, because:
1 - it is not clear which ts to store: the one from when the run started or the one from when it finished.
2 - it is not clear how to merge the data into the resulting collection. Example: company_id xxx had 3 for the day ts: 02/20/2014; after the next run, another 6 were counted for the same xxx and the same ts. So the resulting collection should actually hold 9. So far the only option I see is to map-reduce into a separate collection and then run a script that updates the main summary collection by simply iterating over the values.

3 answers

_ _, 2014-02-19
@zxmd

Isn't there an option to update a counter at insert time, i.e. ts: .... => xxx + 1 or ts: ... => yyy + 1?
The operation is cheap, and it will save you a lot of time.
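
To make the suggestion concrete, here is a minimal sketch of such a counter update, assuming that ts is stored as a Date, that days are cut in UTC, and that company ids are safe to use as field names (no dots, no leading $). The helper buildCounterUpsert and the collection name daily_counts are made up for illustration; $inc and the upsert option are standard MongoDB update features.

```javascript
// Hypothetical helper: builds the arguments for an upsert that bumps the
// per-day counter of one company. Assumes event.ts is a Date.
function buildCounterUpsert(event) {
  const t = event.ts;
  // Truncate to midnight UTC so that all events of one day share a document.
  const day = new Date(Date.UTC(t.getUTCFullYear(), t.getUTCMonth(), t.getUTCDate()));
  return {
    filter: { ts: day },
    // $inc creates the field with value 1 if it does not exist yet.
    update: { $inc: { [event.company_id]: 1 } },
    // upsert creates the day's document on the first event of that day.
    options: { upsert: true },
  };
}

const op = buildCounterUpsert({ company_id: "xxx", ts: new Date("2014-02-20T13:45:00Z") });

// With a collection handle from the shell or a driver (name is an assumption):
//   db.daily_counts.updateOne(op.filter, op.update, op.options);
console.log(JSON.stringify(op));
```

Issued next to every insert into the source collection, this keeps the summary current, so the periodic full pass is no longer needed.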

Stanislav Klementiev, 2014-02-24
@Marques

Is Wikipedia no longer helpful?

Alexey, 2014-03-19
@fuCtor

The documentation has an example of incremental execution of MapReduce.
docs.mongodb.org/manual/tutorial/perform-increment...
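
The idea behind the incremental approach can be sketched in plain JavaScript without a server. The map and reduce functions below have the shape MongoDB expects; the in-memory harness (emit, runIncremental, the Map used as output collection) is only a stand-in written for this illustration. It imitates two mapReduce options: query, which limits the run to new records, and out: { reduce: ... }, which re-applies the reduce function to the existing and the new value for a key. The sketch assumes ts is a Date and that records never arrive with a ts older than one already processed; under that assumption, the largest ts actually processed can serve as the stored marker.

```javascript
// Stand-in for MongoDB's built-in emit(); `sink` is set by the harness.
let sink = null;
function emit(key, value) { sink(key, value); }

// Map: one (day, {company: 1}) pair per record; `this` is the document.
function mapFn() {
  const day = this.ts.toISOString().slice(0, 10); // UTC date, e.g. "2014-02-20"
  const counts = {};
  counts[this.company_id] = 1;
  emit(day, counts);
}

// Reduce: sums per-company counters. It returns the same shape it receives,
// so it can also merge an old result with a new one.
function reduceFn(key, values) {
  const total = {};
  for (const counts of values) {
    for (const company in counts) {
      total[company] = (total[company] || 0) + counts[company];
    }
  }
  return total;
}

// Hypothetical harness imitating a run with
//   { query: { ts: { $gt: lastTs } }, out: { reduce: "<summary collection>" } }
function runIncremental(records, lastTs, output) {
  const fresh = records.filter((r) => r.ts > lastTs);
  const groups = new Map();
  sink = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const r of fresh) mapFn.call(r);
  for (const [key, values] of groups) {
    const reduced = reduceFn(key, values);
    // "reduce" output action: merge with what is already stored for the key.
    output.set(key, output.has(key) ? reduceFn(key, [output.get(key), reduced]) : reduced);
  }
  // New marker: the largest ts that was actually processed.
  return fresh.reduce((max, r) => (r.ts > max ? r.ts : max), lastTs);
}

const mk = (company_id, iso) => ({ company_id, ts: new Date(iso) });
const records = [
  mk("xxx", "2014-02-20T08:00:00Z"), mk("xxx", "2014-02-20T09:00:00Z"),
  mk("xxx", "2014-02-20T10:00:00Z"), mk("yyy", "2014-02-20T10:30:00Z"),
  mk("yyy", "2014-02-20T10:45:00Z"),
];
const output = new Map();
let watermark = runIncremental(records, new Date(0), output); // xxx: 3, yyy: 2

// Six more xxx records arrive on the same day before the next run.
for (let h = 11; h < 17; h++) records.push(mk("xxx", `2014-02-20T${h}:00:00Z`));
watermark = runIncremental(records, watermark, output);       // xxx: 3 + 6

console.log(output.get("2014-02-20"));
```

In a real deployment the two functions would be passed to db.collection.mapReduce together with the query and out options, and the output documents would have the form { _id: key, value: ... } rather than the flat shape shown in the question.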