C++ / C#
st_rapon, 2016-05-26 19:28:10

Is it possible to download an article from Wikipedia using C#?

Hello everyone. I have a music site written in ASP.NET, and I need each artist's page to display information about the artist from Wikipedia. The question is how to write a function that pulls the text from a Wikipedia page. Thanks in advance!


3 answer(s)
Alexey Ukolov, 2016-05-26
@alexey-m-ukolov

https://www.mediawiki.org/wiki/API:Main_page/ru
https://www.google.ru/search?q=c%20sharp%20http%20...
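
As a rough sketch of what the MediaWiki API route linked above could look like in C#. It assumes the TextExtracts query (`prop=extracts`) that Wikipedia exposes, the default JSON response layout, and `System.Text.Json`; the class name `WikipediaExtract`, its method names, and the User-Agent string are made up for illustration and are not part of any library.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class WikipediaExtract
{
    private static readonly HttpClient Http = new HttpClient();

    static WikipediaExtract()
    {
        // Hypothetical identifier; replace with your own site name and contact.
        Http.DefaultRequestHeaders.TryAddWithoutValidation(
            "User-Agent", "MusicSiteDemo/0.1 (admin@example.com)");
    }

    // Builds a query URL asking for the plain-text intro of one article.
    public static string BuildUrl(string title, string lang = "en")
    {
        return $"https://{lang}.wikipedia.org/w/api.php?action=query&prop=extracts" +
               "&exintro=1&explaintext=1&redirects=1&format=json" +
               "&titles=" + Uri.EscapeDataString(title);
    }

    // Reads the first "extract" field from the default JSON layout:
    // { "query": { "pages": { "<pageid>": { "extract": "..." } } } }
    // Returns null when the page is missing or has no extract.
    public static string ParseExtract(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            if (!doc.RootElement.TryGetProperty("query", out JsonElement query) ||
                !query.TryGetProperty("pages", out JsonElement pages))
            {
                return null;
            }

            foreach (JsonProperty page in pages.EnumerateObject())
            {
                if (page.Value.TryGetProperty("extract", out JsonElement extract))
                {
                    return extract.GetString();
                }
            }
            return null;
        }
    }

    // Downloads the JSON and returns the intro text of the article.
    public static async Task<string> GetIntroAsync(string title, string lang = "en")
    {
        string json = await Http.GetStringAsync(BuildUrl(title, lang));
        return ParseExtract(json);
    }
}
```

Calling `await WikipediaExtract.GetIntroAsync("Gorillaz")` from the server side would then give the intro paragraph as plain text, or null if the article was not found. Splitting URL building and JSON parsing from the network call keeps both pieces easy to check without hitting Wikipedia.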

MrDywar Pichugin, 2016-05-26
@Dywar

Definitely possible.
- Pure JS on the client: the browser does everything itself. (Search for how to send a GET request to another site, then receive and parse the response, keeping only the parts you need.)
- ASP.NET: the work is done on the server and the result is sent straight to the client. (Consider whether the server really needs to do this.)

st_rapon, 2016-05-26
@st_rapon

I couldn't figure out the API, but I solved the problem by parsing the page instead.
I used HtmlAgilityPack to extract the article text from the page. Here is the code I used to test the parsing; it may be useful to someone:

// Needs: using System; using System.Text; using HtmlAgilityPack;
public static void GetArticle()
{
    string url = "https://en.wikipedia.org/wiki/Gorillaz";
    var web = new HtmlWeb
    {
        AutoDetectEncoding = false,
        OverrideEncoding = Encoding.UTF8,
    };
    HtmlDocument doc = web.Load(url); // Download the whole HTML page

    // From the element whose class attribute is exactly 'mw-content-ltr',
    // take every <p> that is its direct child.
    HtmlNodeCollection paragraphs =
        doc.DocumentNode.SelectNodes("//div[@class='mw-content-ltr']/p");

    var outputText = new StringBuilder();
    // SelectNodes returns null when nothing matches
    if (paragraphs != null)
    {
        foreach (HtmlNode paragraph in paragraphs)
        {
            // Append each paragraph (the original assignment kept only the last one)
            outputText.AppendLine(paragraph.InnerText);
        }
    }

    Console.WriteLine(outputText.ToString());
}
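
Text taken from `InnerText` this way usually still carries HTML entities and footnote markers such as `[1]`. A minimal cleanup sketch, assuming the markers appear as bracketed numbers, single letters, or "citation needed"; the class `WikiTextCleaner` and its patterns are hypothetical helpers for illustration, not part of HtmlAgilityPack:

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class WikiTextCleaner
{
    // Assumed marker shapes: [12], [a], [citation needed]
    private static readonly Regex CitationMarker =
        new Regex(@"\[(\d+|[a-z]|citation needed)\]", RegexOptions.Compiled);

    private static readonly Regex Whitespace =
        new Regex(@"\s+", RegexOptions.Compiled);

    public static string Clean(string innerText)
    {
        // Turn entities such as &amp; or &#91; back into characters
        string decoded = WebUtility.HtmlDecode(innerText ?? "");
        // Drop the footnote markers
        string withoutMarkers = CitationMarker.Replace(decoded, "");
        // Collapse runs of whitespace and trim the ends
        return Whitespace.Replace(withoutMarkers, " ").Trim();
    }
}
```

Each paragraph's `InnerText` could be passed through `WikiTextCleaner.Clean` before it is appended to the output.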
