Windows
Pastun, 2018-10-03 03:37:11

How to split a text file into parts with an equal number of lines?

Good day.
I have a text file with an unknown number of lines: maybe fifty, maybe a thousand.
I need to split it into several parts so that every output file has the same number of lines (give or take one). The number of parts is set in the batch file itself.
Since I'm not well versed in CMD, I cobbled together this mess for five parts:

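@echo off
:: Enable delayed expansion once, before the loop label; calling setlocal on
:: every pass through :movech nests it deeper each time (cmd caps the nesting)
setlocal enabledelayedexpansion

:movech
:: Stop once list.txt has been emptied
for %%I in (list.txt) do if %%~zI==0 (goto exit)

::V_Obrabotku1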
 
set file=list.txt
set first=1
set second=1
set out=V_Obrabotku1.txt
 
:: Append line 1 of list.txt to the current output file
set counter=0
<nul set /p x=>>"%out%"
for /f "usebackq tokens=*" %%A IN ("%file%") DO (
 set /a counter=!counter!+1
 if !counter! GEQ %first% (
  if !counter! LEQ %second% (
   echo.%%A>>"%out%"
  )
 )
)

:: Remove the first line from list.txt (more +1 skips it)
set n=1
set File_Src=list.txt
set file_Dest=textfile_out.txt
 
more +%n% < "%File_Src%" > "%file_Dest%"
move /y textfile_out.txt list.txt

::V_Obrabotku2

set file=list.txt
set first=1
set second=1
set out=V_Obrabotku2.txt
 
set counter=0
<nul set /p x=>>"%out%"
for /f "usebackq tokens=*" %%A IN ("%file%") DO (
 set /a counter=!counter!+1
 if !counter! GEQ %first% (
  if !counter! LEQ %second% (
   echo.%%A>>"%out%"
  )
 )
)

set n=1
set File_Src=list.txt
set file_Dest=textfile_out.txt
 
more +%n% < "%File_Src%" > "%file_Dest%"
move /y textfile_out.txt list.txt

::V_Obrabotku3

set file=list.txt
set first=1
set second=1
set out=V_Obrabotku3.txt
 
set counter=0
<nul set /p x=>>"%out%"
for /f "usebackq tokens=*" %%A IN ("%file%") DO (
 set /a counter=!counter!+1
 if !counter! GEQ %first% (
  if !counter! LEQ %second% (
   echo.%%A>>"%out%"
  )
 )
)

set n=1
set File_Src=list.txt
set file_Dest=textfile_out.txt
 
more +%n% < "%File_Src%" > "%file_Dest%"
move /y textfile_out.txt list.txt

::V_Obrabotku4

set file=list.txt
set first=1
set second=1
set out=V_Obrabotku4.txt
 
set counter=0
<nul set /p x=>>"%out%"
for /f "usebackq tokens=*" %%A IN ("%file%") DO (
 set /a counter=!counter!+1
 if !counter! GEQ %first% (
  if !counter! LEQ %second% (
   echo.%%A>>"%out%"
  )
 )
)

set n=1
set File_Src=list.txt
set file_Dest=textfile_out.txt
 
more +%n% < "%File_Src%" > "%file_Dest%"
move /y textfile_out.txt list.txt

::V_Obrabotku5

set file=list.txt
set first=1
set second=1
set out=V_Obrabotku5.txt
 
set counter=0
<nul set /p x=>>"%out%"
for /f "usebackq tokens=*" %%A IN ("%file%") DO (
 set /a counter=!counter!+1
 if !counter! GEQ %first% (
  if !counter! LEQ %second% (
   echo.%%A>>"%out%"
  )
 )
)

set n=1
set File_Src=list.txt
set file_Dest=textfile_out.txt
 
more +%n% < "%File_Src%" > "%file_Dest%"
move /y textfile_out.txt list.txt

Goto movech

:exit


The script works, but it is painfully cumbersome, and it takes a long time to process a thousand lines.
Could you please suggest a more elegant solution? Thanks in advance.


3 answers
Eugene, 2018-10-03
@yellowmew

Forget about cmd; PowerShell will be much easier for you to master.
Here is an example for your case:

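$file = @(Get-Content "path to file")  # @() keeps a one-line file as an array
$parts = 4  # number of parts
$base = [int][math]::Floor($file.Count / $parts)  # lines every part gets
$extra = $file.Count % $parts  # the first $extra parts get one more line
$skip = 0
for ($i = 0; $i -lt $parts; $i++) {
    $size = $base + [int]($i -lt $extra)
    $file | Select-Object -Skip $skip -First $size | Set-Content -Path "path to target folder\part_$i.txt"
    $skip += $size
}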

This is only fast for relatively small files, because the whole file is loaded into memory.
If your file runs into gigabytes, you can try adapting this script:
https://stackoverflow.com/questions/1001776/how-ca...
That script decides where to split by comparing against the size of the target files; you can change that condition to compare against the number of lines instead.
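A rough sketch of that adaptation, written in Python only because it keeps the logic easy to read (the linked script is PowerShell, and the same loop can be written there): stream the input one line at a time and open a new output file whenever the current one reaches a line limit, so memory use does not grow with the file size. The function name `split_by_line_limit`, the `part_N.txt` naming and the UTF-8 encoding are assumptions for illustration; adjust `encoding` if the file uses a legacy Windows code page.

```python
import os
import tempfile


def split_by_line_limit(src_path, out_dir, lines_per_part):
    """Stream src_path into out_dir/part_0.txt, part_1.txt, ...,
    starting a new file every `lines_per_part` lines.

    Only the current line is held in memory. newline="" copies line
    endings unchanged. UTF-8 is an assumed encoding.
    """
    if lines_per_part < 1:
        raise ValueError("lines_per_part must be >= 1")
    paths = []
    out = None
    written = 0
    try:
        with open(src_path, encoding="utf-8", newline="") as src:
            for line in src:
                # Line-count condition in place of the original size condition
                if out is None or written == lines_per_part:
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"part_{len(paths)}.txt")
                    out = open(path, "w", encoding="utf-8", newline="")
                    paths.append(path)
                    written = 0
                out.write(line)
                written += 1
    finally:
        if out is not None:
            out.close()
    return paths


if __name__ == "__main__":
    # Demo on a throwaway 10-line file: expect parts of 4, 4 and 2 lines
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "list.txt")
        with open(src, "w", encoding="utf-8") as f:
            f.writelines(f"line {i}\n" for i in range(10))
        for p in split_by_line_limit(src, tmp, 4):
            with open(p, encoding="utf-8") as f:
                print(os.path.basename(p), len(f.readlines()))
```

Note that a fixed line limit fixes the size of each part, not the number of parts; to get exactly N parts you would first count the lines and set the limit from that count.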

res2001, 2018-10-03
@res2001

You've piled up a huge amount of code; all of this can be done much more compactly.
I didn't try to work through it, since it's really a lot of code for such a task.
From your description it's not clear exactly how the lines should be split:
1. take the first batch of lines and write it to one file, the next batch to another, and so on;
2. put the first line in the first file, the second line in the second file, and so on, and when you run out of files, start again from the first one.
Option 2 is trivial to implement: a single reading loop plus one counter for the current file.
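Option 2 as a sketch, in Python only for readability (the loop maps directly onto one for /f loop with a single set /a counter). Lists stand in for the output files, and `split_round_robin` is a hypothetical name.

```python
def split_round_robin(lines, parts):
    """Deal lines out like cards: line 0 goes to part 0, line 1 to part 1,
    ..., and after the last part we wrap back to part 0.

    `lines` can be any iterable of lines (for example an open file object),
    so the input is read in a single pass. Lists stand in for output files.
    """
    if parts < 1:
        raise ValueError("parts must be >= 1")
    outputs = [[] for _ in range(parts)]
    current = 0  # the single "which file are we writing to" counter
    for line in lines:
        outputs[current].append(line)
        current = (current + 1) % parts  # wrap around after the last file
    return outputs


if __name__ == "__main__":
    print(split_round_robin(["a", "b", "c", "d", "e"], 2))
    # prints [['a', 'c', 'e'], ['b', 'd']]
```

The trade-off: this needs one pass and no line count, and the part sizes automatically differ by at most one, but each output file receives every N-th line, so the original order is interleaved across files rather than kept in consecutive blocks.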
Option 1: you first need to count the total number of lines (with a loop and a counter, or with find /c /v "" <file name>; find will obviously be much faster, but it's not entirely clear how find handles empty lines, so you'd need to experiment) and work out how many lines go into each file. Then use for /f "skip=X" to read the file line by line, skipping the required number of lines and counting the lines copied. That isn't very difficult either. I think after your heroic efforts you'll manage to simplify your code.
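Option 1 in the same style: a Python sketch (again chosen only for readability) of the "how many lines go into each file" arithmetic with the ±1 requirement from the question, followed by the skip-and-copy walk. `part_sizes` and `split_contiguous` are hypothetical names, and a list stands in for the input file.

```python
def part_sizes(total, parts):
    """Number of lines per output file so that sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    base, extra = divmod(total, parts)
    # The first `extra` files each take one additional line
    return [base + 1 if i < extra else base for i in range(parts)]


def split_contiguous(lines, parts):
    """Consecutive blocks of lines, one block per output file.

    Pass 1 is len(lines), the line count; pass 2 walks the list, skipping
    the lines that earlier files already took (like for /f "skip=X").
    """
    chunks = []
    skip = 0
    for size in part_sizes(len(lines), parts):
        chunks.append(lines[skip:skip + size])
        skip += size
    return chunks


if __name__ == "__main__":
    print(part_sizes(10, 4))  # prints [3, 3, 2, 2]
    print(split_contiguous(list("abcdefg"), 3))
    # prints [['a', 'b', 'c'], ['d', 'e'], ['f', 'g']]
```

Using divmod spreads the remainder over the first few files, so sizes never differ by more than one; rounding total/parts up and using that for every file can leave the last file much shorter (10 lines in 4 parts gives 3, 3, 3, 1).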
If you have questions, ask here.
PS: PowerShell is of course much more powerful, but its syntax seems worse to me than batch syntax. Perhaps that's because I know the cmd language but don't know PowerShell yet :-)

Directumov, 2021-02-26
@Directumov

It's easier to use a ready-made solution; why waste time writing a batch file? This solution is HTML5-based and splits multi-gigabyte files in a fraction of a second.
