Why can't I get the page I need from the Internet in C#?

Computer networks

Alex45G, 2016-01-09 22:55:17

Hello everybody!
I'm new to C# and am trying to parse some web pages. I'm writing the parser in Visual Studio 2015. I ran into this problem: when I request the page I need through a proxy, I get the following text instead of the page:
package ru.sbogomolov.template;
public class servletBase extends HttpServleterror: java.lang.NullPointerException
This happens only for one particular class of pages; all other pages of the site load fine. Could this be protection against automated page reading, and if so, how can it be bypassed? I checked: the page I requested does exist on the site.
The code I used:

HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://id.npfte.ru/identDeclDrug/main?act=submit&countryE=countryE_RU&codeCD=codeCD_%D0%A4%D0%9C01&CodeSybbol=CodeSybbol_%D0%94&shortNum=54904&DateTypeSearch=beginDate&day=day_26&mount=mount_11&year=year_2014");

            WebProxy myproxy = new WebProxy(textBox1.Text, Convert.ToInt32(textBox2.Text));
            req.Proxy = myproxy;
            req.Timeout = 50000; // timeout: wait up to 50 seconds for a response
            try
            {
                System.Diagnostics.Stopwatch swatch = new System.Diagnostics.Stopwatch(); // create the timer
                swatch.Start(); // start timing
                // send the request; the input stream is taken from the response below
                HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
                swatch.Stop(); // stop timing
                MessageBoxButton buttOk = MessageBoxButton.OK;
                MessageBox.Show("Elapsed time, s = " + swatch.ElapsedMilliseconds / 1000.0, "Notification", buttOk);

                StreamReader istrm = new StreamReader(resp.GetResponseStream(), Encoding.GetEncoding(1251));
                // declared locally so that the snippet is complete
                int ch;
                string textBuffer = "";
                while (true)
                {
                        ch = istrm.Read();
                        if (ch == -1) break; // end of stream
                        if (ch == 10) // line feed
                        {
                            textBox.Text = textBox.Text + textBuffer + Environment.NewLine;
                            textBuffer = "";
                            continue; // do not carry the line feed into the next line
                        }
                        textBuffer = textBuffer + (char)ch;
                }
                textBox.Text = textBox.Text + textBuffer;
                MessageBox.Show("Done!", "Notification", buttOk);
                // close the response; this also closes the input stream istrm
                resp.Close();
            }
            catch (System.Net.WebException ex)
            {
                MessageBoxButton buttOk = MessageBoxButton.OK;  // Could not connect to the proxy! :(
                MessageBox.Show(ex.Message + " (proxy " + textBox1.Text + ":" + textBox2.Text + ")", "Error", buttOk);
            }
        }

3 answers

VZVZ, 2016-01-10
@VZVZ

> Could this be protection against automatic reading of pages? And how can this be bypassed?
Possibly. You can get around it by using a sniffer (for example, Fiddler) to inspect the requests the browser sends and then reproducing them exactly in C# (including all headers, cookies, and so on).
You can also experiment with headers and cookies in Fiddler itself (the Composer tab).
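
The cookie part of this advice can be sketched roughly as follows. This is only an illustration that stays with `HttpWebRequest`, as in the question; the class and method names (`SessionSketch`, `CreateRequest`, `Fetch`) are made up for the example, and whether this particular site depends on a session cookie or a referer is an assumption that only the Fiddler capture can confirm.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper names; not part of the asker's project.
static class SessionSketch
{
    // Builds a request that shares one cookie jar with earlier requests,
    // so cookies set by previous responses are sent back to the server.
    public static HttpWebRequest CreateRequest(string url, CookieContainer cookies, string referer)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.CookieContainer = cookies;               // response cookies are stored here automatically
        if (referer != null) req.Referer = referer;  // Referer must be set through the property
        return req;
    }

    // Sends the request and returns the whole body decoded with the given encoding.
    public static string Fetch(HttpWebRequest req, Encoding encoding)
    {
        using (var resp = (HttpWebResponse)req.GetResponse())
        using (var reader = new StreamReader(resp.GetResponseStream(), encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

One way to use it: create a single `CookieContainer`, fetch the site's entry page first so the server can set its cookies, then request the target URL with the same container and the entry page as the referer. The proxy can still be assigned to `req.Proxy` exactly as in the question.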

Ai Lab, 2016-01-09
@vpuhoff

Perhaps some headers are missing and the server therefore refuses to build the page. Try adding headers like these:
Accept: */*
Accept-Encoding: gzip, deflate, sdch
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4
User-Agent: Mozilla/5.0 (Windows NT …; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36
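
A minimal sketch of how such headers might be set on an `HttpWebRequest`. The class name `BrowserHeadersSketch` is hypothetical, and the User-Agent is passed in as a parameter on the assumption that you copy the exact string from your own browser. Note that `sdch` is left out here, because .NET can only decompress gzip and deflate itself.

```csharp
using System.Net;

// Hypothetical helper; the header values follow the list above.
static class BrowserHeadersSketch
{
    public static HttpWebRequest Create(string url, string userAgent)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);

        // Accept and User-Agent are "restricted" headers on HttpWebRequest:
        // they have to be set through properties, not through req.Headers.
        req.Accept = "*/*";
        req.UserAgent = userAgent;

        // Accept-Language is not restricted and can go into the header collection.
        req.Headers[HttpRequestHeader.AcceptLanguage] = "ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4";

        // Asks for gzip/deflate and decompresses the response body transparently,
        // instead of setting Accept-Encoding by hand and receiving compressed bytes.
        req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

        return req;
    }
}
```

The returned request can then be given the proxy and timeout and read in the same way as in the question's code.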

wkololo_4ever, 2016-01-09
@wkololo_4ever

It looks like the error is on the server itself. Post a link to the page that is giving the error.
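
To help tell a server-side failure from a proxy or network failure, the `catch` block could report more than `ex.Message`. The following is a rough sketch with a hypothetical helper name (`ErrorInspectionSketch.Describe`): when the server answers with an error status, `WebException.Response` carries the status code and body; when there is no HTTP response at all, only `WebException.Status` is available.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper for diagnostics; not taken from the question.
static class ErrorInspectionSketch
{
    public static string Describe(WebException ex)
    {
        var http = ex.Response as HttpWebResponse;
        if (http == null)
        {
            // No HTTP response: proxy unreachable, DNS failure, timeout and similar.
            return "No HTTP response (" + ex.Status + "): " + ex.Message;
        }

        // The server did answer, but with an error status; show the code and the body.
        using (http)
        using (var reader = new StreamReader(http.GetResponseStream()))
        {
            return "HTTP " + (int)http.StatusCode + " " + http.StatusDescription
                 + Environment.NewLine + reader.ReadToEnd();
        }
    }
}
```

If the server sends its error text with a normal success status, which may be what happens in the question since the text arrived as the page body, no exception is thrown, and checking `resp.StatusCode` on the regular response is the place to look.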