.NET
ambulatur, 2020-11-22 02:04:42

C# Scraper fast as hell, tasks or threads?

Good day. I have my own web server backed by a database, and a client for it.
The client receives results from the web server via GET requests.
Each request is sent from the client using the code below:

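public void GetInfo()
{
    // Each thread loops forever, issuing one GET request per iteration.
    while (true)
    {
        using (var request = new HttpRequest()) // Leaf.xNet, used as an analog of HttpClient
        {
            try
            {
                var response = request.Get("https://site.org/show").ToString();
                Console.WriteLine(response);
            }
            catch (Exception)
            {
                // Errors are deliberately ignored; the loop just moves on to the next request.
            }
        }
    }
}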

It is then called like this:
int maxThreads = 200;
for (int i = 0; i < maxThreads; i++)
{
    new Thread(() => GetInfo()).Start();
}

As a result, the console shows lines like "request number # info XXXX".
The question is how to speed the whole thing up as much as possible, apart from switching to IHttpClientFactory because creating an HTTP client is expensive (xNet uses httpclientfactory under the hood).
Using tasks is not an option. Wrapping request.Get("https://site.org/show").ToString() in await Task.Run(() => { ... }) would be more correct, but the speed of receiving data from the server drops sharply. In my practical experience, creating new threads that execute the requests is faster than async and faster than using the ThreadPool.

2 answer(s)
Vasily Bannikov, 2020-11-22
@ambulatur

If you want to speed something up, speed up the CPU-bound parts and reduce the number of allocations (ideally to 0).
> try-catch
Try to do without exceptions. Personally, I know of only one high-level HTTP client so far that works without them, ClusterClient, but I'm not sure whether it supports proxies, in case you use them.
> faster than using ThreadPool
If you have that many requests, you will quickly hit the OS thread limit :)
It is better to increase the thread pool size at startup.
Also try moving from .NET Framework to .NET 5: you get a speedup almost for free if you use System.Net.Http.HttpClient (a whole bunch of low-level memory optimizations were done there).
PS: I know nothing about xNet myself or what it uses internally, but judging by the fact that its repository has had no updates for 5 years, it is unlikely to use anything modern.
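
A minimal sketch of the thread pool and System.Net.Http.HttpClient suggestions above, assuming .NET 5 or later. One shared HttpClient replaces the per-request client, and a SemaphoreSlim caps the number of requests in flight instead of dedicating a thread to each one. The names Scraper and RunAsync, the total of 1000 requests, and the limit of 200 are illustrative assumptions, not taken from the question; the original loop runs forever, while this one stops after a fixed count. Whether this beats the 200-thread version depends on the server and the network, so it is worth measuring both.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class Scraper
{
    // One shared HttpClient for the whole process: it is safe for concurrent
    // use and reuses pooled connections between requests.
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical helper: runs `total` invocations of `fetch`,
    // with at most `maxParallel` of them in flight at any moment.
    public static async Task<string[]> RunAsync(
        Func<CancellationToken, Task<string>> fetch,
        int total, int maxParallel, CancellationToken ct = default)
    {
        using var gate = new SemaphoreSlim(maxParallel);
        var tasks = Enumerable.Range(0, total).Select(async _ =>
        {
            await gate.WaitAsync(ct);          // wait for a free slot
            try { return await fetch(ct); }
            finally { gate.Release(); }        // free the slot even on failure
        }).ToArray();
        return await Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        // Raise the pool's minimum thread count at startup so that bursts of
        // work do not wait for the pool to grow gradually.
        ThreadPool.SetMinThreads(200, 200);

        // GetStringAsync throws HttpRequestException on a non-success status;
        // the (string, CancellationToken) overload requires .NET 5+.
        var results = await RunAsync(
            ct => Client.GetStringAsync("https://site.org/show", ct),
            total: 1000, maxParallel: 200);

        Console.WriteLine($"Received {results.Length} responses");
    }
}
```

Because the fetch delegate is passed in, the same RunAsync can be pointed at a stub (for example, ct => Task.FromResult("ok")) to check the concurrency logic without touching the network.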

Leonid Rozhnov, 2020-11-23
@Fulborg

Do you have any approximate figures for how much it needs to be sped up? "Maximum" is too abstract a metric to work with.
Is the speed problem on the client side? Does the server respond to requests quickly enough on its end? It is rather doubtful that initializing the HTTP client is the most critical bottleneck here.
What is the overall goal? To get some data from the web server in real time? How much data? If it does not have to be real time, but you need to receive the data as fast as possible, you could look at alternative transports (for example, message queues, see RMQ).
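
One way to get such figures is to time a fixed number of requests and compute requests per second. This is only a rough sketch: the names Throughput and MeasureAsync are hypothetical, and Task.Delay stands in for a real request to the server. Running it against the real endpoint with a single sequential caller gives a baseline for server latency, which helps tell whether the client or the server is the bottleneck.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Throughput
{
    // Runs `count` sequential calls of `request` and returns
    // the number of completed requests per second.
    public static async Task<double> MeasureAsync(Func<Task> request, int count)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            await request();
        sw.Stop();
        return count / sw.Elapsed.TotalSeconds;
    }

    public static async Task Main()
    {
        // Task.Delay(10) is a placeholder for a real HTTP call.
        double rps = await MeasureAsync(() => Task.Delay(10), 50);
        Console.WriteLine($"{rps:F1} requests/second");
    }
}
```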
