How do I correctly count the rows in a CSV file that has a field larger than the field limit (131072)?
Hello.
I need to transfer data from Oracle to Vertica.
I decided to do it through a CSV file, because Vertica loads them quickly.
On the test tables everything worked without problems.
Then I started working with the production schema, and everything would have been fine, BUT according to the statistics 40,000 rows were exported from Oracle to CSV, and after loading into Vertica there were 300 fewer.
There were no errors or warnings.
I want to figure out why this is happening. The only option I found is to count the number of rows in the CSV file; in theory there should be 40,000 of them.
It seems trivial:
import csv

with open(filename, "r", encoding='UTF8') as f:
    reader = csv.reader(f, delimiter=";")
    data = list(reader)
    row_count = len(data)
    print(row_count)
But the script fails with an error:
    data = list(reader)
_csv.Error: field larger than field limit (131072)
Count the number of \n in a file?
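A minimal sketch of that idea, reading the file in binary chunks so that its size does not matter. The helper name `count_newlines` and the chunk size are hypothetical choices for illustration. Keep in mind that a quoted field containing an embedded line break spans several physical lines, so this number can be higher than the number of CSV records.

```python
def count_newlines(path, chunk_size=1024 * 1024):
    # Count newline bytes in the file, one fixed-size chunk at a time.
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total
```

If the last line of the file has no trailing newline, the result is one less than the number of physical lines.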
import sys
import csv
csv.field_size_limit(sys.maxsize)
Thank you all.
Here is a working version that takes sim3x's remarks into account:
import sys
import csv

maxInt = sys.maxsize
decrement = True
while decrement:
    # Decrease maxInt by a factor of 10
    # for as long as OverflowError occurs.
    decrement = False
    try:
        csv.field_size_limit(maxInt)
    except OverflowError:
        maxInt = maxInt // 10
        decrement = True

with open(filename, "r", encoding='UTF8') as f:
    reader = csv.reader(f, delimiter=";")
    data = list(reader)
    row_count = len(data)
    print(row_count/2)
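For a large export, the same count can probably be obtained without holding every row in memory. The sketch below is one way to do it, not the poster's method: the helper names `raise_field_limit` and `count_csv_rows` are hypothetical, and it assumes the file is UTF-8 with `;` as the delimiter, as in the code above. Opening the file with `newline=""` follows the recommendation in the `csv` module documentation, so that line breaks inside quoted fields are parsed as part of the field.

```python
import csv
import sys


def raise_field_limit():
    # Set the csv field size limit as high as the platform allows.
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # The value did not fit into a C long; try a smaller one.
            limit //= 10


def count_csv_rows(path, delimiter=";", encoding="utf-8"):
    # Count parsed CSV records one at a time instead of building a list.
    raise_field_limit()
    with open(path, "r", encoding=encoding, newline="") as f:
        return sum(1 for _ in csv.reader(f, delimiter=delimiter))
```

Note that `csv.field_size_limit` changes a process-wide setting, so the raised limit stays in effect for any later `csv` parsing in the same script.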