Database
cryptodoxer, 2018-05-12 13:51:16

Which database should I use to store and search URLs when there are more than 100M records?

There is a constantly updated table of URLs that grows by 7-10 million records per day.
Right now it's plain MySQL, with no URL hash column. Up to about 200M records and a database size under 100GB it still copes, but beyond that new inserts and selects become very slow even on very good hardware.
What should I look into, and what storage should I choose?!
Server: 32 Xeon cores, 64GB RAM, 512GB SSD, CentOS.

CREATE TABLE `url` (
  `id` bigint(10) UNSIGNED NOT NULL,
  `source_id` int(10) UNSIGNED NOT NULL,
  `url` varchar(255) NOT NULL,
  `checked` enum('one','two') NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE `url`
  ADD PRIMARY KEY (`id`),
  ADD UNIQUE KEY `url` (`url`),
  ADD KEY `checked` (`checked`),
  ADD KEY `source_id` (`source_id`);
ALTER TABLE `url` ADD FULLTEXT KEY `url_ft` (`url`);

ALTER TABLE `url`
  MODIFY `id` bigint(10) UNSIGNED NOT NULL AUTO_INCREMENT;
COMMIT;

The FULLTEXT index is there because without it things get slow after about 30 million URLs.
Inserts are done like this:

$sql = $db->queryBuilder->batchInsert('url', array('url', 'source_id'), $urlInsert);
$db->createCommand($sql . ' ON DUPLICATE KEY UPDATE url=url')->execute();

"Slow insertion" means a batch like the one above, with 20-30k URLs, takes more than 30 seconds.
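The question notes there is no URL hash column. A common mitigation for slow inserts into a large UNIQUE index over a `varchar(255)` column is to hash each URL into a fixed-width binary key, store it in a `BINARY(16)` column, and put the UNIQUE index on that instead. A minimal sketch in Python (the `url_key` helper and the choice of MD5 are assumptions for illustration, not something stated in the question):

```python
import hashlib

def url_key(url: str) -> bytes:
    """Return a fixed-width 16-byte key for a URL.

    Sketch only: the idea is to store this digest in a BINARY(16)
    column with a UNIQUE index, instead of putting UNIQUE KEY on the
    varchar(255) itself, so the unique index stays compact.
    """
    return hashlib.md5(url.encode("utf-8")).digest()

key = url_key("https://example.com/page?id=1")
print(len(key))  # 16 bytes, regardless of URL length
```

The ON DUPLICATE KEY UPDATE trick in the batch insert above would then collide on the hash column rather than on the full URL string.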


1 answer(s)
Andrey Shatokhin, 2018-05-12
@Sovigod

We have a similar use case, also with more than a billion URLs. We use Elasticsearch, without morphology analyzers of course.
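For reference, a minimal sketch of what an Elasticsearch index mapping along these lines might look like. The field names mirror the MySQL table from the question; the `keyword` type (stored as an exact, non-analyzed value, so no morphology analyzers are involved) is an assumption based on the answer, not something it spells out:

```python
# Hypothetical Elasticsearch mapping for the `url` table's fields.
# `keyword` fields are not run through text analyzers, matching the
# answer's "without morphology analyzers" remark.
mapping = {
    "mappings": {
        "properties": {
            "url":       {"type": "keyword", "ignore_above": 2048},
            "source_id": {"type": "integer"},
            "checked":   {"type": "keyword"},
        }
    }
}

# With the official Python client, creating the index would look
# roughly like this (requires a running cluster, so commented out):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://localhost:9200")
#   es.indices.create(index="url", body=mapping)
```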
