How to save a site page with all its dependent content?
Folks, how can I save a page together with all the files it needs, so that the path structure is preserved but everything lands in a different folder? I read that something like PhantomJS can do this, but I couldn't find an example...
Thank you!
Wait... PhantomJS is for something a bit different...
If you just need to download the site while preserving all the paths and dependent files, use wget!
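For example (substitute your own URL; the flags are explained below):

wget -r -k -l 7 -p -E -nc http://example.com/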
Where:
-r - recursively follow links on the site in order to download pages.
-k - convert all links in the downloaded files so that they work on the local machine (offline).
-p - download all the files required to display the pages (images, CSS, and so on).
-l - the maximum nesting depth of pages that wget should download (the default is 5; in the example we set it to 7). Many sites have deeply nested pages, and wget can dig in endlessly, downloading more and more new pages; the -l option prevents that.
-E - append the .html extension to downloaded files.
-nc - do not overwrite existing files. This is convenient when you need to resume a download that was interrupted last time.
P.S. You write that this will be a service; in that case it's not hard to wrap this into a BASH script (see the sketch below). You will still need to account for all the subtleties of your particular task, though...
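As a starting point, a minimal sketch of such a script, assuming the URL and the target folder are passed as arguments (the script name and argument handling here are just an illustration):

#!/usr/bin/env bash
# Download a site with all its dependent files using wget.
# Usage: ./mirror.sh <url> <target-dir>
set -euo pipefail

url="${1:?usage: $0 <url> <target-dir>}"
dir="${2:?usage: $0 <url> <target-dir>}"

mkdir -p "$dir"

# Same flags as explained above; -P saves everything under the given
# folder, which also covers the "different folder name" part of the question.
wget -r -k -l 7 -p -E -nc -P "$dir" "$url"

This is only a sketch; a real service would likely need extra wget options as well (user agent, rate limiting, domain restrictions, and so on).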