How to process a 10 GB text file?
Good day!
For two days now I have been trying to find a way to process a huge text file.
The essence of the task is as follows:
I receive a huge XML text file, about 10 GB in size.
The file has the following structure:
<organization typeof="Organization" about="http://opendata.trudvsem.ru/7710538364-organizations/organizations.xml#315910200403678">
<region rel="dc:references" resource="http://opendata.trudvsem.ru/7710538364-regions/regions.xml#9100000000000"/>
<name property="name">АЛИМЕНКО ДМИТРИЙ НИКОЛАЕВИЧ</name>
<creationDate>2022-03-05</creationDate>
<legalName>АЛИМЕНКО ДМИТРИЙ НИКОЛАЕВИЧ</legalName>
<companyStructureHidden>false</companyStructureHidden>
<ogrn>315910200403678</ogrn>
<inn>910504080415</inn>
<addressCode>9100000000000</addressCode>
<firstRateCompany>Не относится к крупнейшим компаниям</firstRateCompany>
<businessSize>SMALL</businessSize>
<source>EMPLOYMENT_SERVICE</source>
<innerInfo>
<codeExternalSystem>CZN</codeExternalSystem>
<dateModify>2022-03-13</dateModify>
<deleted>false</deleted>
<isModerated>true</isModerated>
<moderationTime>2022-03-13</moderationTime>
<registrationStatus>Получена по интеграции</registrationStatus>
<status>Одобрено</status>
<disableImportInfo>false</disableImportInfo>
<disableImportVacancy>false</disableImportVacancy>
<disableJoinCompany>false</disableJoinCompany>
<disableJoinManager>false</disableJoinManager>
</innerInfo>
</organization>
<organization>
...
</organization>
</organization>
How can I read the file one organization element at a time, select that piece of data, write it to a file, and then continue reading from where I stopped?
There are two options.
The first is the correct but more complex one: search for "stream xml parser c#" and start with the first result.
The second is simple and crude. If the organization tag is one element of a huge list and the file is formatted so that the tags fall on predictable lines, you can load each organization quickly, either by searching for a substring or by reading the file line by line, and then parse it with the non-streaming parsers you already know. The formatting itself can be done with other streaming tools (for example sed with a regex that inserts a line break after each closing organization tag) or in your own program.
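To make the two options concrete, here is a minimal sketch in Python rather than C# (in .NET, `System.Xml.XmlReader` fills the role of the streaming parser). It is an illustration only, and it assumes several things the question does not show: a wrapper root element (called `organizations` here), no XML namespaces on the tags, and, for the second option, that each opening and closing organization tag starts or ends its own line. The function names and the second sample record are made up.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for the 10 GB file. The <organizations> wrapper and the second
# record are assumptions made for illustration only.
SAMPLE = """<organizations>
<organization about="http://opendata.trudvsem.ru/7710538364-organizations/organizations.xml#315910200403678">
<name property="name">АЛИМЕНКО ДМИТРИЙ НИКОЛАЕВИЧ</name>
<inn>910504080415</inn>
</organization>
<organization>
<name property="name">EXAMPLE LLC</name>
<inn>0000000000</inn>
</organization>
</organizations>
"""


def iter_organizations_stream(source):
    """Option 1: stream the file and yield one parsed <organization> at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the "start" of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "organization":
            yield elem  # the caller must finish with elem before asking for the next one
            elem.clear()  # free the element's own children
            root.clear()  # drop processed records from the root so memory stays flat


def iter_organizations_by_line(lines):
    """Option 2: collect the lines of one block, then parse that small chunk."""
    chunk = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(("<organization>", "<organization ")):
            chunk = [line]  # a new record begins on this line
        elif chunk:
            chunk.append(line)  # we are inside a record
        if chunk and stripped.endswith("</organization>"):
            yield ET.fromstring("".join(chunk))  # ordinary non-streaming parse
            chunk = []


def write_selected(organizations, out):
    """Write one tab-separated line (inn, name) per organization; return the count."""
    count = 0
    for org in organizations:
        inn = org.findtext("inn", default="")
        name = org.findtext("name", default="")
        out.write(f"{inn}\t{name}\n")
        count += 1
    return count


if __name__ == "__main__":
    result = io.StringIO()
    write_selected(iter_organizations_stream(io.BytesIO(SAMPLE.encode("utf-8"))), result)
    print(result.getvalue(), end="")
```

For the real file you would pass an opened file instead of the in-memory sample, for example `open(path, "rb")` for the streaming variant or `open(path, encoding="utf-8")` for the line-based one, where `path` is wherever your download lives. Either way only one organization is held in memory at a time, and reading simply continues from where the previous record ended.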