Java
ShabaevMB, 2017-07-17 16:35:37

How to get the HTML code of a page in Java in the background?

Hello.
The question is: how can I get the HTML code of a page in the background using Java? I want to type the URL of a page into a text field, click OK, and have the program give me that page's HTML code. I'm not interested in the site's scripts; I just need the raw HTML.
Please help. Which Java classes or methods can I use for this? I'm quite new to all of this.



2 answer(s)
Alexander Aksentiev, 2017-07-17
@Sanasol

Maybe you actually mean JavaScript?
If you do it directly from JavaScript, the browser will not give you the HTML because of CORS.
So it can only be done through a backend, or with a hack like https://stackoverflow.com/a/18447625, and I'm not sure that will work.

dimkss, 2017-07-18
@dimkss

I use this code all the time (it's clumsy, I never got around to polishing it):

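import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public static String readPageFromUrl(String strURL) throws IOException {
  URL pURL = new URL(strURL);

  HttpURLConnection urlCon = (HttpURLConnection) pURL.openConnection();
  urlCon.setConnectTimeout(30000); // timeouts are in milliseconds: 30 s
  urlCon.setReadTimeout(30000);
  urlCon.setRequestProperty("User-Agent", "Mozilla");

  // try-with-resources closes the stream even if reading fails.
  // No charset is passed, so the platform default is used (see the note below).
  try (BufferedReader in = new BufferedReader(new InputStreamReader(urlCon.getInputStream()))) {
    StringBuilder result = new StringBuilder();
    String readLine;
    while ((readLine = in.readLine()) != null) {
      result.append(readLine).append('\n'); // readLine() drops line breaks, so add them back
    }
    return result.toString();
  }
}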
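On Java 11 or newer, the built-in java.net.http.HttpClient can do the same job as readPageFromUrl with less manual stream handling. A minimal sketch, assuming Java 11+; the class and method names (PageFetcher, buildRequest, fetchHtml) and the 30-second timeouts are illustrative choices, not part of the original answer. HttpResponse.BodyHandlers.ofString() decodes the body with the charset from the response's Content-Type header and falls back to UTF-8, so it covers the common encoding case, though not pages that declare their encoding only in a <meta> tag.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class PageFetcher {
    // One client can be reused for many requests; it follows redirects like a browser
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(30))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Builds a GET request with a browser-like User-Agent (the value is only an example)
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .header("User-Agent", "Mozilla/5.0")
                .GET()
                .build();
    }

    // Downloads the page and returns its HTML; the body is decoded with the charset
    // from the Content-Type header, or UTF-8 if the header names none
    static String fetchHtml(String url) throws IOException, InterruptedException {
        HttpResponse<String> response =
                CLIENT.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchHtml(args.length > 0 ? args[0] : "https://example.com/"));
    }
}
```

Like the URLConnection version, this throws an IOException for error status codes instead of returning the error page.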

Caveats: it does not handle the charset, i.e. on some sites the text will come out in the wrong encoding.
This is solved by replacement.
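For the URLConnection version, one possible fix is to read the charset parameter of the Content-Type response header and decode the bytes with it, falling back to a default when it is missing or unknown. A minimal sketch under that assumption (Java 9+ for readAllBytes); CharsetSniffer, charsetOf and readPage are hypothetical names, UTF-8 as the fallback is an assumption, and a charset declared only in a <meta> tag inside the HTML is not detected.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetSniffer {
    // Matches e.g. "text/html; charset=windows-1251" or charset="UTF-8"
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in a Content-Type value, or the fallback if absent or unknown
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) return fallback;
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) return fallback;
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) { // illegal or unsupported charset name
            return fallback;
        }
    }

    // Reads the whole body as bytes, then decodes it with the detected charset
    // (assumption: UTF-8 when the server does not name one)
    static String readPage(String url) throws IOException {
        URLConnection con = URI.create(url).toURL().openConnection();
        con.setConnectTimeout(30_000);
        con.setReadTimeout(30_000);
        con.setRequestProperty("User-Agent", "Mozilla");
        Charset cs = charsetOf(con.getContentType(), StandardCharsets.UTF_8);
        try (InputStream in = con.getInputStream()) {
            return new String(in.readAllBytes(), cs);
        }
    }
}
```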
And some sites are protected, of course. That is solved by sending the right User-Agent and other request parameters. To keep things simple, in such cases I just call wget from Java.
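Calling wget from Java can be done with ProcessBuilder. A minimal sketch, assuming wget is installed and on the PATH and Java 9+; WgetFetcher, wgetCommand and fetchWithWget are hypothetical names, and decoding wget's output as UTF-8 is an assumption. The -q flag turns off wget's progress messages, -O - writes the page to standard output, and --user-agent sets the User-Agent header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

class WgetFetcher {
    // Builds the wget command line: quiet mode, page written to stdout, custom User-Agent
    static List<String> wgetCommand(String url, String userAgent) {
        return List.of("wget", "-q", "-O", "-", "--user-agent=" + userAgent, url);
    }

    // Runs wget and returns what it printed to stdout (the page's HTML)
    static String fetchWithWget(String url, String userAgent)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(wgetCommand(url, userAgent))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // ignore wget's stderr
                .start();
        byte[] body;
        try (InputStream out = process.getInputStream()) {
            body = out.readAllBytes(); // read everything first so the pipe cannot fill up
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("wget exited with code " + exitCode);
        }
        return new String(body, StandardCharsets.UTF_8); // assumption: page is UTF-8
    }
}
```

Passing the arguments as a list rather than as one shell string means the URL is never interpreted by a shell.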
