Which database should I use to store and search more than 100M URL records?
There is a constantly growing database of URLs, gaining 7-10 million records per day.
Right now it is plain MySQL, with no hash column for the URL. It holds up to about 200M records and a database size under 100GB; beyond that, new inserts and selects become very slow even on very good hardware.
Where should I dig, and what storage should I choose?
Server: 32-core Xeon, 64GB RAM, 512GB SSD, CentOS.
CREATE TABLE `url` (
  `id` bigint(10) UNSIGNED NOT NULL,
  `source_id` int(10) UNSIGNED NOT NULL,
  `url` varchar(255) NOT NULL,
  `checked` enum('one','two') NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE `url`
  ADD PRIMARY KEY (`id`),
  ADD UNIQUE KEY `url` (`url`),
  ADD KEY `checked` (`checked`),
  ADD KEY `source_id` (`source_id`);

-- the FULLTEXT index needs its own name; MySQL rejects a second index named `url`
ALTER TABLE `url` ADD FULLTEXT KEY `url_ft` (`url`);

ALTER TABLE `url`
  MODIFY `id` bigint(10) UNSIGNED NOT NULL AUTO_INCREMENT;
COMMIT;
// Yii2: batch-insert URLs, silently skipping duplicates via the unique key
$sql = $db->queryBuilder->batchInsert('url', ['url', 'source_id'], $urlInsert);
$db->createCommand($sql . ' ON DUPLICATE KEY UPDATE url=url')->execute();
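Since the question notes there is no URL hash, one common mitigation is to replace the wide UNIQUE KEY on the `varchar(255)` column with a compact unique index on a fixed-width hash of the URL. A minimal sketch (the `url_hash` column name and the choice of MD5 are assumptions, not part of the original schema):

```sql
-- Assumed mitigation sketch: index a 16-byte hash instead of the
-- much wider utf8 varchar(255) value.
ALTER TABLE `url`
  ADD COLUMN `url_hash` BINARY(16) NOT NULL,
  ADD UNIQUE KEY `url_hash` (`url_hash`),
  DROP KEY `url`;

-- Deduplicating insert, keyed on the hash:
INSERT INTO `url` (`url`, `source_id`, `url_hash`)
VALUES ('http://example.com/', 1, UNHEX(MD5('http://example.com/')))
ON DUPLICATE KEY UPDATE `url` = `url`;
```

Each unique-index entry shrinks from up to 765 bytes (utf8 varchar(255)) to a fixed 16 bytes, so far more of the index fits in the InnoDB buffer pool and insert-time uniqueness checks stay fast as the table grows.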
We have a similar use case, also with more than a billion URLs. We use Elasticsearch, without morphology analyzers, of course.
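For exact URL lookups in Elasticsearch, the field would typically be mapped as a non-analyzed `keyword`, which matches the "without morphology analyzers" remark. A minimal mapping sketch (the index name and field list are assumptions; the answer does not show its actual configuration):

```json
PUT /urls
{
  "mappings": {
    "properties": {
      "url":       { "type": "keyword" },
      "source_id": { "type": "integer" },
      "checked":   { "type": "keyword" }
    }
  }
}
```

A `keyword` field is stored untokenized, so term queries on `url` are exact-match lookups rather than full-text searches.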