Is there a ready-made XML or HTML parser?
Most languages have libraries for parsing HTML or XML. Is there a ready-made product for this case? Say I download 1000 HTML pages with something like Teleport Pro. Then I point the program at the folder of files and give it an extraction template, for example: take the contents of the headings or lists from each file. Has anyone seen something like this? I'm also interested in a ready-made solution for extracting data from XML.
This kind of problem can be solved in any programming language, but you won't find a ready-made solution; you'll have to write it yourself. I would write such a parser in JavaScript and simply package it as a small local HTML file: open the file in the browser, select a folder from disk through an <input type="file"> field, then read every file in the selected folder with JavaScript and parse it with new DOMParser().
Why JavaScript and not PHP or Python? Simply because JavaScript is, in my view, the best-suited language for parsing HTML. The browser gives it a rich set of tools for working with HTML out of the box; I don't think any other language works with HTML as well as JavaScript does, since it was literally created for this.
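For comparison, here is the same "folder plus template" workflow as a rough Python sketch rather than the JavaScript this answer recommends. It uses only the standard library's html.parser. The names TagTextCollector and sample_folder are made up for illustration, the "template" is just a list of tag names, and the sketch assumes every wanted element has an explicit closing tag.

```python
from html.parser import HTMLParser
from pathlib import Path


class TagTextCollector(HTMLParser):
    """Collect the text inside every occurrence of the wanted tags."""

    def __init__(self, wanted_tags):
        super().__init__()
        self.wanted = set(wanted_tags)
        self.current = None   # tag currently being captured
        self.depth = 0        # > 0 while inside a wanted element
        self.buffer = []      # text pieces of the element being read
        self.results = []     # finished (tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if self.depth == 0 and tag in self.wanted:
            self.current = tag
            self.depth = 1
            self.buffer = []
        elif self.depth > 0 and tag == self.current:
            # Same tag nested inside itself, e.g. <li> within <li>.
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.current:
            self.depth -= 1
            if self.depth == 0:
                self.results.append((tag, "".join(self.buffer).strip()))

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def sample_folder(folder, wanted_tags):
    """Return {file name: [(tag, text), ...]} for every .html file in folder."""
    found = {}
    for path in sorted(Path(folder).glob("*.html")):
        collector = TagTextCollector(wanted_tags)
        collector.feed(path.read_text(encoding="utf-8", errors="replace"))
        collector.close()
        found[path.name] = collector.results
    return found
```

Calling sample_folder("html", ["h1", "h2", "li"]) would then return the heading and list-item text for each saved page, assuming the pages sit in a folder named html.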
Maybe one exists, but there is no "magic" button. You need to know a little about the structure of an HTML document. I do this in Python. Parsing in Python can be learned in a week or two, and faster if you already know other programming languages. And if you write the parser yourself, there is no limit to what it can do.
Here is an example:
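import os
from bs4 import BeautifulSoup

def parsing(filename):
    # Read one saved page and print the text of its question title.
    with open(filename, encoding="utf-8") as file:
        data = file.read()
    soup = BeautifulSoup(data, "html.parser")
    title = soup.find('h1', class_='question__title')
    # Skip pages that have no matching <h1> instead of crashing.
    if title is not None:
        print(title.text.strip())

# Process every file in the "html" folder.
os.chdir('html')
file_list = os.listdir('./')
for file in file_list:
    parsing(file)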
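The question also asks about XML. The same loop works there too; below is a minimal sketch using the standard library's xml.etree.ElementTree. The function name sample_xml and the example path ".//title" are made up for illustration, and the path expression is limited to the XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def sample_xml(folder, path_expr):
    """Return {file name: [text, ...]} for elements matching path_expr."""
    found = {}
    for path in sorted(Path(folder).glob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            # Skip files that are not well-formed XML.
            continue
        # itertext() also gathers text from child elements.
        found[path.name] = [
            "".join(el.itertext()).strip() for el in root.findall(path_expr)
        ]
    return found
```

For example, sample_xml("xml", ".//title") would collect the text of every title element from each file in a folder named xml.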