Java
Orkhan Hasanli, 2019-12-23 01:00:54

How to avoid duplicate iteration when using parallel()?

Good day!
I'll try to describe the problem briefly.
Here is the snippet:

LongStream.range(1L, 10000000L)
        .parallel()
        .mapToObj(String::valueOf)
        .forEach(index -> {
            try {
                String url = "http://example.com?ID=" + index;
                driver.get(url);

                if (driver.getStatusCode() == 200) {
                    System.out.println("Opening URL: " + url);

                    String pageSource = driver.getPageSource();
                    Document doc = Jsoup.parse(pageSource);

                    Elements centerBox = doc.select(".center-box");

                    if (!centerBox.isEmpty()) {
                        String name = centerBox.select("[data-bind=text: name]").text();
                        String email = centerBox.select("[data-bind=text: email]").text();

                        Company company = new Company(name, email);
                        System.out.println("Info: \n" +
                                url + "\n" +
                                name + "\n" +
                                email);
                        companies.add(company);

                        PrintWriter pw = new PrintWriter(new FileWriter(pathToFile, true));
                        pw.println(name + "\t" + email);
                        pw.close();
                    } else {
                        System.out.println("URL not found: " + url);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
driver.quit();

When I run this code, I observe the following:
1) Duplicate Company objects are added to the List.
2) Duplicate lines are written to the file (via PrintWriter).
3) The URL recorded with a Company is not the URL where the information was found, but some other URL, handled by another parallel thread, whose page contains no information.
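These symptoms are consistent with the single `driver` being shared by all worker threads of the parallel stream: one thread calls `driver.get(urlA)`, another calls `driver.get(urlB)`, and the first thread then reads the page source of `urlB`. Browser drivers such as Selenium's `WebDriver` are not thread-safe. Below is a minimal sketch of one remedy, giving each worker thread its own driver through `ThreadLocal`; `FakeDriver` is a hypothetical stand-in for the real driver so the example runs offline.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class PerThreadDriverDemo {

    /** Hypothetical stand-in for a browser driver: it holds mutable "current page" state, like a real one. */
    static class FakeDriver {
        private String currentUrl;

        void get(String url) {
            currentUrl = url;
        }

        String getPageSource() {
            // Pretend the page body simply echoes the URL it was loaded from.
            return "<html>" + currentUrl + "</html>";
        }
    }

    /** One driver per worker thread, so no two threads ever share navigation state. */
    private static final ThreadLocal<FakeDriver> DRIVER = ThreadLocal.withInitial(FakeDriver::new);

    /** Loads each URL and returns "url|pageSource" pairs, in ID order. */
    public static List<String> fetchAll(long from, long to) {
        return LongStream.range(from, to)
                .parallel()
                .mapToObj(id -> "http://example.com?ID=" + id)
                .map(url -> {
                    FakeDriver driver = DRIVER.get(); // this thread's own instance
                    driver.get(url);
                    return url + "|" + driver.getPageSource();
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> results = fetchAll(1, 1000);
        // Count pairs whose page source does not belong to the URL it was recorded with.
        long mismatches = results.stream()
                .filter(r -> {
                    String[] parts = r.split("\\|", 2);
                    return !parts[1].equals("<html>" + parts[0] + "</html>");
                })
                .count();
        System.out.println(results.size() + " pages, " + mismatches + " mismatches");
    }
}
```

With real drivers, each thread-local instance also has to be closed with `quit()` at the end, so a real version would need to keep track of the instances it creates; a single `driver.quit()` after the stream would no longer cover them.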
How can I fix these problems? And can this code be simplified so that, for example, instead of creating a Company object and adding it to the list by hand, it uses collect(toList()) or something similar?
Thank you in advance!


1 answer(s)
Sergey Gornostaev, 2019-12-23
@azerphoenix

You are using streams incorrectly: stream operations should not modify external state. The large chunk of code inside forEach can be broken down into small, independent operations without side effects, and their results gathered with a collector.
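Here is a minimal sketch of that decomposition. It assumes hypothetical `fetchPage` and `parseCompany` helpers that stand in for the driver call and the Jsoup parsing; they are stubbed here so the example runs offline. `Company` is redeclared as a record that also carries its URL. Each stage only transforms its input, `collect(Collectors.toList())` builds the list, and the file is written once afterwards on the calling thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class CompanyScraper {

    /** Immutable result: the question's Company plus the URL it was found at. */
    public record Company(String url, String name, String email) {}

    /** Hypothetical fetch stub: pretends only every 1000th ID has a company card. */
    static Optional<String> fetchPage(String url) {
        long id = Long.parseLong(url.substring(url.indexOf("ID=") + 3));
        return id % 1000 == 0
                ? Optional.of("name=Company " + id + ";email=info" + id + "@example.com")
                : Optional.empty();
    }

    /** Hypothetical parser stub, standing in for the ".center-box" lookup in the question. */
    static Optional<Company> parseCompany(String url, String page) {
        String[] fields = page.split(";");
        if (fields.length != 2) {
            return Optional.empty();
        }
        return Optional.of(new Company(url,
                fields[0].substring("name=".length()),
                fields[1].substring("email=".length())));
    }

    /** The whole scrape as one pipeline; no stage touches shared mutable state. */
    public static List<Company> scrapeAll(long from, long to) {
        return LongStream.range(from, to)
                .parallel()
                .mapToObj(id -> "http://example.com?ID=" + id)
                // Pages that are missing or unparsable become empty streams and drop out.
                .flatMap(url -> fetchPage(url)
                        .flatMap(page -> parseCompany(url, page))
                        .stream())
                .collect(Collectors.toList());
    }

    /** File output happens once, after the parallel work has finished. */
    public static void writeTsv(List<Company> companies, Path file) throws IOException {
        List<String> lines = companies.stream()
                .map(c -> c.name() + "\t" + c.email())
                .collect(Collectors.toList());
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<Company> companies = scrapeAll(1, 10_000);
        Path out = Files.createTempFile("companies", ".tsv");
        writeTsv(companies, out);
        System.out.println(companies.size() + " companies written to " + out);
    }
}
```

Because every URL travels through the pipeline together with its own page, a Company can no longer be paired with another thread's URL, provided the real `fetchPage` does not share one `driver` between threads. Note also that a parallel stream runs on the common ForkJoinPool, which is sized for CPU-bound work; for millions of blocking network requests, a dedicated `ExecutorService` may give better control over concurrency.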
