Python
Iskandar, 2013-12-14 23:24:01

How do I process a Wikipedia dump with Wikipedia Extractor?

I need to process a Wikipedia dump with this tool. From the description on the Wikipedia Extractor website, I could not work out how to do this on Windows.


2 answer(s)
sheknitrtch, 2013-12-15
@KIBIs

Wikipedia Extractor is a Python script that takes an XML dump of the Wikipedia database as input and produces plain text as output, so Python must be installed. Before the database can be fed to the script, it has to be extracted from the BZ2 archive, but the unpacked file takes up a lot of space. The developers therefore recommend unpacking on the fly, without saving the unpacked data to the hard drive. Linux has the bzip2 utility for this. Under Windows, you can use the command-line version of 7-Zip. The command would be as follows:
Everything before the '|' is the unpacking command, and everything after it is the command that launches Wikipedia Extractor with some parameters.
I haven't checked whether this works, since I don't have a Wikipedia dump.
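
If no command-line unpacker is at hand, the same on-the-fly idea can be sketched in Python itself, using only the standard library. This is a minimal, untested-against-a-real-dump sketch: the function name `pipe_bz2_to_command` and the file names in the usage note are made up for illustration, and the options Wikipedia Extractor needs should be taken from the script's own help, not from here.

```python
import bz2
import shutil
import subprocess
import sys


def pipe_bz2_to_command(archive_path, command):
    """Decompress a .bz2 file chunk by chunk and feed it to command's stdin.

    The unpacked data is never written to disk. Returns the exit code
    of the launched command. (Hypothetical helper, not part of
    Wikipedia Extractor.)
    """
    with bz2.open(archive_path, "rb") as unpacked:
        proc = subprocess.Popen(command, stdin=subprocess.PIPE)
        try:
            # Copy in 1 MiB chunks so memory use stays small.
            shutil.copyfileobj(unpacked, proc.stdin, 1024 * 1024)
        finally:
            # Closing stdin tells the child process that the input has ended.
            proc.stdin.close()
        return proc.wait()


if __name__ == "__main__" and len(sys.argv) > 2:
    # First argument: the archive; the rest: the command to launch.
    sys.exit(pipe_bz2_to_command(sys.argv[1], sys.argv[2:]))
```

Saved under an assumed name such as feed_dump.py, it would be started with the archive path first and the extractor command after it, for example: python feed_dump.py dump.xml.bz2 python WikiExtractor.py followed by whatever options the extractor requires. It plays the role of the part before the '|' and the pipe itself.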

Dmitry Guketlev, 2013-12-15
@Yavanosta

Can you remove unnecessary questions?
