How to remove unwanted characters in a string?
I wrote a parser using Selenium and Tesseract.
When I print the image_link variable, I get unwanted characters along with the numbers. Is it possible to remove them and keep only the numbers?
```python
from selenium import webdriver
from time import sleep
from PIL import Image
from pytesseract import image_to_string


class Bot_dzen:
    def __init__(self):
        self.driver = webdriver.Firefox(executable_path='C:\\Users\\ilya_pc\\Documents\\gecko\\geckodriver.exe')
        self.navigate()

    def views_recon(self):
        image = Image.open('views.gif')
        image_link = image_to_string(image).split('@ ')
        # int() raises ValueError if OCR left stray characters in a chunk.
        # Note: the values are computed here but never returned or printed.
        views_dzen = int(image_link[0])
        views_dzen_2 = int(image_link[1])
        views_dzen_3 = int(image_link[2])

    def crop(self, location, size):
        image = Image.open('dzen_pars.png')
        x = location['x']
        y = location['y']
        width = size['width']
        height = size['height']
        # Cut the element's bounding box out of the full-page screenshot.
        image.crop((x, y, x + width, y + height)).save('views.gif')
        self.views_recon()

    def take_screen(self):
        self.driver.save_screenshot('dzen_pars.png')

    def navigate(self):
        self.driver.get('https://zen.yandex.ru/media/id/5a9d345c1aa80c262cd25c42/3-ujasnye-oshibki-v-otjimaniiah-meshaiuscie-rostu-grudi-5aa7c0739b403cd7a6cc68f4')
        views = self.driver.find_element_by_xpath('/html/body/article/div/div[2]/div')
        sleep(3)
        self.take_screen()
        location = views.location
        size = views.size
        self.crop(location, size)


def main():
    b = Bot_dzen()


if __name__ == '__main__':
    main()
```
Ilya, good evening.
A question: is the goal of the script specifically to get the numbers from the image views.gif?
If not, you can get the numbers directly from the site, and then there will be no problem with "broken" characters.
If you do need to parse the image, there are a couple of options:
1) If the code fails on the views_dzen = int(image_link[0]) call, try cropping more horizontally in the crop method.
2) A regex: after image_link = image_to_string(image), try extracting the groups of digits (\d+) from image_link.
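A minimal sketch of option 2, assuming the OCR output contains the view counts as separate runs of digits mixed with stray characters (such as the `@` the question splits on). The function name `extract_numbers` is hypothetical.

```python
import re


def extract_numbers(ocr_text):
    """Return every run of digits in the OCR output as an int, in order."""
    # \d+ matches one or more consecutive digits; everything else
    # (spaces, '@', stray punctuation, newlines) is skipped.
    return [int(group) for group in re.findall(r"\d+", ocr_text)]


# Example with noisy OCR output similar to what the question describes.
print(extract_numbers("1520 @ 734 @, 12%\n"))  # [1520, 734, 12]
```

In views_recon this would replace the split('@ ') call, e.g. `numbers = extract_numbers(image_to_string(image))`. One caveat: if a counter uses a space as a thousands separator (e.g. "1 520"), `\d+` splits it into two numbers, so the pattern would need adjusting for that case.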
As far as I can tell, the views there are loaded by JS, but I don't know how to work with JS from Selenium. Could you explain? :)
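Selenium itself ships `WebDriverWait` (in `selenium.webdriver.support.ui`) for waiting until JS-rendered content appears, and that is usually the better choice. The idea behind it is plain polling, sketched below without any dependencies; `wait_for` and its parameters are hypothetical, modeled loosely on `WebDriverWait`'s timeout, poll interval and ignored exceptions.

```python
import time


def wait_for(condition, timeout=10.0, interval=0.5, ignore=()):
    """Call `condition` repeatedly until it returns a truthy value.

    Exceptions listed in `ignore` are treated as "not ready yet".
    Returns the truthy value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
        except ignore:
            result = None  # e.g. the element is not in the DOM yet
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

With the question's code, something like `text = wait_for(lambda: self.driver.find_element_by_xpath('/html/body/article/div/div[2]/div').text.strip(), ignore=(NoSuchElementException,))`, with `NoSuchElementException` imported from `selenium.common.exceptions`, would replace the fixed `sleep(3)` and read the number from the page text instead of a screenshot. In Selenium 4 the lookup is written `driver.find_element(By.XPATH, ...)`.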
Why make it so hard on yourself and on Yandex? It is much cheaper to take the data from here:
https://zen.yandex.ru/media-api/publication-view-s...
without Selenium, using urllib2, for example (in Python 3 the equivalent is urllib.request).
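A minimal sketch of fetching that endpoint without Selenium, using Python 3's `urllib.request` and `json`. The link above is truncated, so the URL is left as a parameter, and this thread does not show the structure of the response (which field holds the view count). `fetch_json` and the User-Agent value are illustrative assumptions.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Download `url` and decode the response body as JSON."""
    # Assumption: some servers reject requests without a browser-like
    # User-Agent; adjust or drop this header as needed.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # json.load accepts the binary response and detects UTF-8/16/32.
        return json.load(response)
```

Calling `data = fetch_json(endpoint_url)` returns the decoded dict or list; printing it with `json.dumps(data, indent=2, ensure_ascii=False)` is a quick way to find the field with the view counter.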