What is the easiest way to collect a bunch of links from a site?
Please suggest a wildly undemanding site parser in PHP. The task is utterly primitive: I need to collect product links and descriptions from an online store. I'm not doing this to steal content, but to build an XML file for Regmarkets.
I had already done this successfully with simple_html_dom, but on a different computer. Now only a mega-weak, old machine is available, so the library grinds away for about five minutes with nothing to show for it. The hang happens at the stage of parsing the code and searching it for the needed tags. I tried it on Denver and OpenServer; it does not depend on the server.
Perhaps it's worth writing one from scratch, but I've never written parsers, and it's probably faster to use a ready-made solution, as long as it is something very simple. What I need: get the product links from the catalog, follow each link, take the description from the desired div there, and save it all to Excel.
I don't know; I, on the contrary, have never used simple_html_dom and write everything with regular expressions. I find it very convenient and fast.
If you need to save all of this to Excel on the local machine, I would do it directly with Excel's own tools. Using WinHttp.WinHttpRequest.5.1, we fetch the page data:
'---------------------------------------------------------------------------------------
' Purpose   : Hit the server and fetch the result
'---------------------------------------------------------------------------------------
' sQuery    - request URL
' sResponse - response text, returned by reference
Function Runhttp(sQuery As String, ByRef sResponse As String) As Boolean
    On Error GoTo ErrorHandler
    Dim oHttp As Object

    Set oHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
    With oHttp
        ' Synchronous GET request with browser-like headers
        .Open "GET", sQuery, False
        .SetRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.137 YaBrowser/17.4.1.955 Yowser/2.5 Safari/537.36"
        .SetRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
        .SetRequestHeader "Accept-Language", "uk,ru;q=0.8,en;q=0.6"
        .SetRequestHeader "Connection", "keep-alive"
        .Send ""
    End With

    If oHttp.Status = 200 Then
        sResponse = oHttp.responseText
        Runhttp = True
    Else
        sResponse = CStr(oHttp.Status)
        Runhttp = False
    End If

ErrorExit:
    Set oHttp = Nothing
    On Error GoTo 0
    Exit Function

ErrorHandler:
    If Err.Number = -2147012889 Then ' no-connection error
        sResponse = "No connection"
    End If
    Runhttp = False
    Resume ErrorExit
End Function
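
For completeness, here is a minimal usage sketch showing how Runhttp might be paired with VBScript.RegExp to drop the collected links onto the active sheet. The catalog URL and the href pattern below are placeholders, not something taken from the store in question, so adjust them to the real markup.

' Minimal sketch: the URL and the regex pattern are placeholders
Sub CollectProductLinks()
    Dim sHtml As String
    Dim oRegExp As Object, oMatch As Object
    Dim lRow As Long

    ' Fetch the catalog page with the Runhttp function above
    If Not Runhttp("https://example.com/catalog", sHtml) Then Exit Sub

    ' Pull href values out of the HTML with a regular expression
    Set oRegExp = CreateObject("VBScript.RegExp")
    oRegExp.Global = True
    oRegExp.IgnoreCase = True
    oRegExp.Pattern = "<a[^>]+href=""([^""]+)"""

    ' Write each link found into column A of the active sheet
    lRow = 1
    For Each oMatch In oRegExp.Execute(sHtml)
        ActiveSheet.Cells(lRow, 1).Value = oMatch.SubMatches(0)
        lRow = lRow + 1
    Next oMatch

    Set oRegExp = Nothing
End Sub

Following each collected link and pulling the description out of the desired div works the same way: call Runhttp on every URL and run a second regular expression over the response.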