PHP
Bigdgon, 2017-01-19 11:36:54

An autotest that crawls all pages of a site and checks that content is present?

Hello.
Please advise on the following. There is a site with a lot of different pages. I need to write an autotest that will visit every page of the site and check whether the page is available, and whether its static assets (images, JavaScript, etc.) load. Now the actual questions:
1. How can I collect links to all the pages of the site, other than manually? I found several utilities for this, but one found 49,999 links and stopped, and the other found only 783.
2. How can such autotests be written, and are there ready-made solutions? Links to anything covering this would be welcome. I have written simple autotests with Codeception — can it handle this task?
The site is written in PHP.


3 answers
Dmitry Eremin, 2017-01-19
@EreminD

Search Google for "web crawler" — there are plenty of ready-made ones.
It is also not hard to write one yourself (a rough sketch follows the list):

  1. Go to a page.
  2. Collect all the links on it and add them to the list of links to crawl.
  3. Exclude the unnecessary ones (keep an exclusion list: links to external resources, for example those starting with https://vk.com).
  4. Run whatever other checks you need.
  5. Take the first link from the list.
  6. Follow it and remove it from the list to crawl.
  7. Repeat from step 2 until there are no links left to crawl.
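
A rough sketch of this loop in plain PHP is below. The start URL is a placeholder, only absolute and root-relative links are followed, and the resolveUrl() helper is an improvised stand-in for proper URL resolution — adjust everything to your site.

<?php
// Minimal crawler following the steps above: keep a queue of links to visit,
// fetch each page, record pages that are not available, collect new links.

$startUrl = 'https://example.com/';              // placeholder: your site root
$host     = parse_url($startUrl, PHP_URL_HOST);
$toVisit  = [$startUrl];                         // list of links to crawl
$visited  = [];                                  // links already crawled
$broken   = [];                                  // URL => HTTP code for failed pages

while ($toVisit) {
    // Steps 5-6: take the first link from the list and remove it.
    $url = array_shift($toVisit);
    if (isset($visited[$url])) {
        continue;
    }
    $visited[$url] = true;

    // Step 1: fetch the page.
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    $html = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // Step 4: a simple check — remember pages that did not answer with 200.
    if ($code !== 200 || !is_string($html) || $html === '') {
        $broken[$url] = $code;
        continue;
    }

    // Step 2: collect all links on the page.
    $dom = new DOMDocument();
    @$dom->loadHTML($html);                      // tolerate imperfect markup
    foreach ($dom->getElementsByTagName('a') as $a) {
        $abs = resolveUrl($a->getAttribute('href'), $startUrl);

        // Step 3: exclude external resources and unusable links.
        if ($abs === null || parse_url($abs, PHP_URL_HOST) !== $host) {
            continue;
        }
        if (!isset($visited[$abs])) {
            $toVisit[] = $abs;
        }
    }
    // Step 7: the while loop repeats until the list of links runs out.
}

print_r($broken);                                // pages that failed the check

// Improvised helper: handles only absolute and root-relative links;
// a real crawler would resolve relative URLs properly.
function resolveUrl(string $href, string $base): ?string
{
    if ($href === '' || $href[0] === '#' || strpos($href, 'mailto:') === 0) {
        return null;
    }
    if (preg_match('#^https?://#i', $href)) {
        return $href;                            // already absolute
    }
    if ($href[0] === '/') {
        return parse_url($base, PHP_URL_SCHEME) . '://'
             . parse_url($base, PHP_URL_HOST) . $href;
    }
    return null;                                 // skip other relative forms here
}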

Nikolai Konyukhov, 2017-01-19
@heahoh

It seems to me that you are setting too large a task for testing. Is there an economic benefit?
Say the "About the company" page with contact details breaks: you lose a couple of clients who could not get in touch — estimate that damage. You write a test for this page: you spend developer time — estimate the cost of that work. Whether those two figures are comparable, and whether it is worth spending time testing this page, is the real question. And do not forget about keeping the tests usable over time: whenever a page changes, you will again spend developer time adapting the test to the new input data.
If the test is really simple — check the server response code for each URL and the presence of the required elements (JS, CSS, HTML structures) — then I think you can extend a crawler so that, besides collecting the site's URLs, it also visits each collected page and checks the response code and that the assets are present. "Weighty" pages, such as an order form or a login form, can be fully checked with Codeception.
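
If Codeception fits, such a check for one of the "weighty" pages might look roughly like the sketch below. It assumes the PhpBrowser module and an AcceptanceTester; the /order page, the #order-form selector and app.js are made-up examples — substitute the pages and assets that actually matter on your site.

<?php
// tests/acceptance/OrderPageCest.php — a sketch, not a drop-in test.

class OrderPageCest
{
    public function orderPageIsAvailable(AcceptanceTester $I)
    {
        $I->amOnPage('/order');                    // placeholder URL
        $I->seeResponseCodeIs(200);                // the page is available
        $I->seeElement('form#order-form');         // key HTML structure is present
        $I->seeElement('link[rel=stylesheet]');    // CSS is linked
        $I->seeElement('script[src*="app.js"]');   // JS is linked (placeholder name)
    }
}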

Mikhail Shvedkov, 2017-01-25
@kosolapus

For simply poking at pages ("are you still alive, my old lady?"), Xenu (Xenu's Link Sleuth) works fine. It can also generate reports and build a sitemap. On a site with 80k+ pages the report hung my system a little, but I think the reason is already clear :) By the way, it checks every resource (HTML, JS, CSS, images, video).
