How to search for text in your own comments on Habr?
I have 1200 comments. I want to find the ones that contain the word "write". How can I do that?
- Searching the site for this word returns a pile of links to comments from all users, not just mine.
- Searching Google / Yandex on the Habr site for my nickname plus this word returns a pile of pages with other people's comments containing the word, and mine without it.
- Searching Google / Yandex for this word on tangro.habrahabr.ru/comments/ returns nothing.
- Paging through the comments and running the browser's find on each page is tedious (there are many pages). All the comments can't be opened on a single page (or at least I don't know how).
Are there any sensible ways (other than "write a spider to collect all the comment pages and search them")?
tangro &&/+3 "write" site:habrahabr.ru - for Yandex.
+3 means the second word is no more than three sentences after the first. In my experience, most of the results are your own comments.
robots.txt (http://%username%.habrahabr.ru/robots.txt) prohibits indexing of everything on the user's subdomain, so search engines cannot find anything there:
User-agent: *
Disallow: /
Host: %username%.habrahabr.ru
You have already described all the possible ways yourself. The only one left is the fashionable geek option: find an SQL injection bug and search the database :)
Comments live on pages like %USERNAME%.habrahabr.ru/comments/page%NUMBER%/
Find the number of the last page by hand: hover the mouse over the arrow.
Further options:
0) (YQL, unfortunately, is ruled out by the indexing ban in robots.txt.)
1) A shell script that calls wget with a delay. That gives N HTML files, which you can then search.
For Windows, if wget is not available or you don't feel like writing a batch file, VBScript / JScript will do; that doesn't take long either.
2) A JavaScript one-liner in the browser's address bar that adds N iframes to the page with a delay.
Turn off images and Flash in the browser to get a bare text page, and Ctrl+F does the rest.
If this doesn't fall under the definition of "write a spider", I think it is a perfectly reasonable way out.
And here are the solutions:
1) Windows / wget - it doesn't fit on one line, and delays are awkward in CMD:
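set MAXPSTO=30
set HABRUSER=tangro
for /L %i in (1,1,%MAXPSTO%) DO @echo http://%HABRUSER%.habrahabr.ru/comments/page%i/ >> tmp.url
wget -w 5 -i tmp.url
(These lines are meant to be typed at the CMD prompt; inside a .bat file, write %%i instead of %i.)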
$ for i in {1..30} ; do wget http://tangro.habrahabr.ru/comments/page$i/ && sleep 5 ; done
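Once the pages are saved, the remaining step is the search itself. Here is a minimal sketch, assuming wget was run in an otherwise empty directory, so the pages were stored as index.html, index.html.1, index.html.2 and so on (wget's default naming for URLs that end in a slash). The function name find_word is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list the saved comment pages that contain a given word.
# Assumption: the pages sit in one directory as index.html, index.html.1, ...

# find_word WORD [DIR]
# Prints the names of the saved pages in DIR (default: current directory)
# that contain WORD. -F treats WORD as a plain string, -i ignores case,
# -l prints only file names instead of matching lines.
find_word() {
  local word=$1
  local dir=${2:-.}
  grep -l -i -F -- "$word" "$dir"/index.html* 2>/dev/null
}

# Example (hypothetical):
#   find_word "write"
```

The match is a plain substring search over the raw HTML, so it can also hit page markup and not only comment text; for a handful of hits that is easy to check by opening the listed files in a browser.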