How to avoid duplicate iteration when using parallel()?
Good day!
I will try to briefly describe the problem.
There is the following snippet:
LongStream.range(1L, 10000000L)
        .parallel()
        .mapToObj(String::valueOf)
        .forEach(index -> {
            try {
                String url = "http://example.com?ID=" + index;
                driver.get(url);
                if (driver.getStatusCode() == 200) {
                    System.out.println("Opening URL: " + url);
                    String pageSource = driver.getPageSource();
                    Document doc = Jsoup.parse(pageSource);
                    Elements centerBox = doc.select(".center-box");
                    if (!centerBox.isEmpty()) {
                        String name = centerBox.select("[data-bind=text: name]").text();
                        String email = centerBox.select("[data-bind=text: email]").text();
                        Company company = new Company(name, email);
                        System.out.println("Info: \n" +
                                url + "\n" +
                                name + "\n" +
                                email);
                        companies.add(company);
                        PrintWriter pw = new PrintWriter(new FileWriter(pathToFile, true));
                        pw.println(name + "\t" + email);
                        pw.close();
                    } else {
                        System.out.println("URL not found: " + url);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
driver.quit();
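You are using streams incorrectly: stream operations should not modify external state. In your snippet the shared driver, the companies list and the output file are all touched from several threads at once, which is the likely source of the repeated entries. The large block inside forEach can be broken down into small, side-effect-free (idempotent) operations such as map and filter, with their results gathered by a collector.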
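A minimal sketch of that idea, under stated assumptions rather than as a drop-in fix. The names ParallelScrapeSketch, scrape, parse and the fetch parameter are hypothetical and not from the question, and the "name;email" page format is a stand-in for the real Jsoup parsing. The fetch function is assumed to be safe to call from several threads, which for a browser driver would probably mean one driver instance per thread rather than a single shared one.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.LongFunction;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

final class ParallelScrapeSketch {

    // Immutable value object standing in for the Company class from the question.
    static final class Company {
        final String name;
        final String email;

        Company(String name, String email) {
            this.name = name;
            this.email = email;
        }
    }

    // Hypothetical parser: assumes a page body of the form "name;email".
    // In the real code this is where Jsoup.parse(...) and the selectors would go.
    static Optional<Company> parse(String page) {
        String[] parts = page.split(";", 2);
        return parts.length == 2
                ? Optional.of(new Company(parts[0], parts[1]))
                : Optional.empty();
    }

    // fetch is an assumption: it must be thread-safe and return
    // Optional.empty() when the page does not exist.
    // No step here writes to shared state; the collector builds the result list.
    static List<Company> scrape(long fromInclusive, long toExclusive,
                                LongFunction<Optional<String>> fetch) {
        return LongStream.range(fromInclusive, toExclusive)
                .parallel()
                .mapToObj(fetch)                    // ID -> Optional page source
                .flatMap(Optional::stream)          // drop missing pages (Java 9+)
                .map(ParallelScrapeSketch::parse)   // page -> Optional<Company>
                .flatMap(Optional::stream)          // drop pages without company data
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stub fetcher for demonstration: only even IDs "exist".
        LongFunction<Optional<String>> stub = id ->
                id % 2 == 0
                        ? Optional.of("Company" + id + ";c" + id + "@example.com")
                        : Optional.empty();

        List<Company> companies = scrape(1L, 11L, stub);
        companies.forEach(c -> System.out.println(c.name + "\t" + c.email));
    }
}
```

With this shape, the file can be written once from the collected list after the stream has finished, instead of being reopened inside every parallel task.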