How do I set a dictionary-based restriction when exporting texts?
The gist:
1) I have a large number of texts from VK
2) and a dictionary of words that I work with
I need to extract from this text database only those texts that relate to the selected words (say, "apartment" and "house").
I sort of did it...
BUT...
I need the exported texts to contain no other words from my dictionary!
I.e., if a text contains "apartment", "chair", "wardrobe", then this text should not be exported.
In other words, the result should be a set of texts where only a selected word from the dictionary occurs, with none of the others.
Here is the code itself:
import csv
from collections import Counter

# Target words: "квартира" (apartment), "дом" (house)
house_list = {"квартира", "дом"}

in_csv = open("C:\\Hun\\texts_for_topicminer\\Vk_csv_full_lem_CORRECTED.csv", "rt", newline="")
out_csv = open("C:\\Hun\\dasha\\house_counter.csv", "wt", newline="")
full_house = open("C:\\Hun\\dasha\\house_list-2.csv", "rt", newline="")

reader = csv.reader(in_csv, delimiter=";")
writer = csv.writer(out_csv)
full_house_reader = csv.reader(full_house, delimiter=";")

# Read the full dictionary: one word per row, in the first column
full_house_list = set()
for row in full_house_reader:
    full_house_list.add(row[0])
print(full_house_list)

# Remove the target words, leaving only the other dictionary words
for house in house_list:
    full_house_list.remove(house)

writer.writerow(["line_number", "auth_id", "date", "text", "city", "region", "text_length", "квартира", "дом"])

for num, row in enumerate(reader):
    words_list = row[0].split()
    # Note: this skips the row only when ALL of the other dictionary words occur in the text
    if set(full_house_list).issubset(words_list):
        continue
    else:
        cnt = Counter(words_list)
        # True if at least one target word occurs in the text
        two_house = False
        for house in house_list:
            if cnt[house] != 0:
                two_house = True
        if two_house:
            house_counter = {}
            for house in house_list:
                house_counter[house] = cnt[house]
            writer.writerow([num + 1, row[1], row[4], row[0], row[7], row[8], len(words_list), house_counter["квартира"], house_counter["дом"]])

in_csv.close()
out_csv.close()
full_house.close()
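The requirement described above (keep a text only if it contains a selected word and none of the other dictionary words) can be sketched with set operations. `issubset` is true only when every one of the other dictionary words occurs in the text, whereas `isdisjoint` checks that none of them occur. The function name `keep_text` and the sample words below are hypothetical, and the sketch assumes that a text containing both selected words is acceptable:

```python
def keep_text(words, target_words, dictionary):
    """Return True if `words` has a target word and no other dictionary word."""
    words = set(words)
    # Dictionary words that must NOT appear in an exported text
    other_words = set(dictionary) - set(target_words)
    has_target = not words.isdisjoint(target_words)
    has_other = not words.isdisjoint(other_words)
    return has_target and not has_other


# Illustrative data only; the real dictionary would come from the CSV file
targets = {"квартира", "дом"}
dictionary = {"квартира", "дом", "стул", "шкаф"}

print(keep_text("продам квартира центр".split(), targets, dictionary))  # True
print(keep_text("квартира стул шкаф".split(), targets, dictionary))     # False
```

In the loop over `reader`, a check of this kind could replace the `issubset` test, so that a row is skipped as soon as any other dictionary word appears in it.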
The code is not idiomatic. It's not clear what this does. Is full_house_list a list of words or of texts?
for house in house_list:
    full_house_list.remove(house)
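Judging by the question's code, full_house_list appears to be a set of words read from the first column of the dictionary file, and the loop removes the selected words from it. If that reading is right, the same step could be written as a set difference. This is a minimal sketch with hypothetical sample words standing in for the file contents:

```python
house_list = {"квартира", "дом"}
# Hypothetical sample; in the question this set is read from a CSV file
full_house_list = {"квартира", "дом", "стул", "шкаф"}

# Set difference: every dictionary word that is not a selected word
other_words = full_house_list - house_list
print(other_words)
```

Unlike `set.remove`, the difference does not raise `KeyError` when a selected word is missing from the dictionary.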