JSON
Anton Rodionov, 2018-08-28 21:43:00

How do I fix an encoding bug when loading data through the WebBrowser control in C#?

Good afternoon, comrades. I've run into a glitch. I am parsing a JSON file from a server: I point a WebBrowser control at the desired page, read the data from its stream, cut out just the JSON part (about 9 MB), and work with it from there. The problem is with the capital Russian letter R: it comes out broken, as "�".
Perhaps this happens with other capital letters as well.
At the same time, if I save the JSON file manually through the Opera browser, it is saved correctly. I read somewhere that this is due to peculiarities of the UTF-8 encoding. Is it possible to fix this bug without switching away from the WebBrowser control? I realize I could write a fix that patches the data after receiving it, but I would really rather not.
This is how I start the thread:
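For context, here is a minimal sketch of how a single damaged byte produces "�". It rests on one fact and one assumption. The fact: the Cyrillic capital letter Р (U+0420) is encoded in UTF-8 as the two bytes D0 A0. The assumption, not confirmed anywhere in this thread, is that the second byte gets altered somewhere between the server and the `StreamReader`. `Utf8Demo` and `Decode` are names made up for this illustration.

```csharp
using System;
using System.Text;

static class Utf8Demo
{
    // Decodes raw bytes as UTF-8; invalid sequences become U+FFFD ("�").
    public static string Decode(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }

    static void Main()
    {
        // U+0420 is the Cyrillic capital letter ER.
        byte[] intact = Encoding.UTF8.GetBytes("\u0420");
        Console.WriteLine(BitConverter.ToString(intact)); // D0-A0

        // Hypothetical damage: the continuation byte 0xA0 replaced by 0x20.
        byte[] damaged = { 0xD0, 0x20 };
        Console.WriteLine(Decode(damaged)[0] == '\uFFFD'); // True
    }
}
```

If a hex dump of the received stream shows something other than D0 A0 where the letter should be, the damage happens before decoding, and changing the `Encoding` passed to `StreamReader` will not help.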

Thread tr = new Thread(GetDoc);
tr.SetApartmentState(ApartmentState.STA);
tr.Start();
Thread.Sleep(20000);
tr.Abort();
// Wait for the thread to terminate
tr.Join();
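As a side note, the fixed 20-second sleep followed by `Abort` could be replaced by a timed `Join`, which returns as soon as the worker finishes. The following is only a sketch: `TimedWait` and `RunWithTimeout` are hypothetical names, and the worker is a stand-in for `GetDoc`. Note also that `Thread.Abort` is not supported on .NET Core and later, where it throws `PlatformNotSupportedException`.

```csharp
using System;
using System.Threading;

static class TimedWait
{
    // Starts the work on a background thread and waits at most timeoutMs.
    // Returns true if the work finished in time, false on timeout.
    public static bool RunWithTimeout(ThreadStart work, int timeoutMs)
    {
        Thread worker = new Thread(work);
        // For a WebBrowser worker, worker.SetApartmentState(ApartmentState.STA)
        // would go here (Windows only).
        worker.IsBackground = true; // a stuck worker will not keep the process alive
        worker.Start();
        return worker.Join(timeoutMs);
    }

    static void Main()
    {
        // Stand-in workload: sleeps briefly, well inside the 20-second limit.
        bool finished = RunWithTimeout(() => Thread.Sleep(100), 20000);
        Console.WriteLine(finished);
    }
}
```

For this to work with the code in the question, the `DocumentCompleted` handler would have to end the message loop with `Application.ExitThread()` instead of `Thread.CurrentThread.Abort()`, so that `GetDoc` returns normally.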

And this is how I get the JSON file:
static void GetDoc()
{
    web = new WebBrowser();
    web.DocumentCompleted += web_DocumentCompleted;
    web.Navigate("site URL here");
    Application.Run();
}

// Load the JSON and save it to the file source.json
static void web_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    File.Delete("source.json");
    FileInfo MyFile = new FileInfo("source.json");
    FileStream fs = MyFile.Create();
    fs.Close();
    FileStream fileStream = new FileStream("source.json", FileMode.Open);
    StreamWriter streamWriter = new StreamWriter(fileStream);
    streamWriter.BaseStream.Seek(fileStream.Length, SeekOrigin.End); // write to the end of the file
    Encoding encoding = Encoding.GetEncoding("utf-8");
    //Encoding encoding = Encoding.GetEncoding(web.Document.Encoding);
    string temp = null;
    Stream stream = web.DocumentStream;
    StreamReader sr = new StreamReader(stream, encoding);
    temp = sr.ReadToEnd();
    stream.Close();
    //string temp = web.DocumentText;
    // Trim the surrounding HTML
    Regex regex1 = new Regex("<BODY><PRE>");
    Regex regex2 = new Regex("</PRE></BODY></HTML>");
    Match m1 = regex1.Match(temp);
    Match m2 = regex2.Match(temp);
    temp = temp.Substring(m1.Index + 11, m2.Index - 11 - m1.Index);
    streamWriter.Write(temp);
    streamWriter.Close();
    fileStream.Close();
    downJSONok = true;
    Thread.CurrentThread.Abort();
    //MessageBox.Show("Download complete. Press OK to continue.", "Done", MessageBoxButtons.OK, MessageBoxIcon.Asterisk);
    //Environment.Exit(0);
}
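
If the page can be fetched without a full browser, a sketch like the following avoids both the `<BODY><PRE>` wrapper and any re-encoding by the control: it downloads the raw response bytes and decodes them as UTF-8 itself. This rests on two assumptions: the page may in fact need cookies or scripts that only the `WebBrowser` control provides, and the server is taken to send UTF-8. `RawDownload`, `FetchBytes`, and `SaveAsUtf8` are illustrative names, and the URL is supplied as a command-line argument because the real address is not given in the question.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class RawDownload
{
    // Downloads the response body exactly as the server sent it.
    public static byte[] FetchBytes(string url)
    {
        using (WebClient client = new WebClient())
        {
            return client.DownloadData(url);
        }
    }

    // Decodes the bytes as UTF-8 (assumed encoding) and writes the text
    // to a file as UTF-8 without a byte order mark.
    public static string SaveAsUtf8(byte[] raw, string path)
    {
        string json = Encoding.UTF8.GetString(raw);
        File.WriteAllText(path, json, new UTF8Encoding(false));
        return json;
    }

    static void Main(string[] args)
    {
        // Placeholder: the real address is passed as the first argument.
        if (args.Length == 0)
        {
            Console.WriteLine("usage: RawDownload <url>");
            return;
        }
        string json = SaveAsUtf8(FetchBytes(args[0]), "source.json");
        Console.WriteLine(json.Length);
    }
}
```

Since the response body is the JSON itself, no regular expressions or `Substring` arithmetic are needed to cut it out of an HTML page.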


1 answer(s)
Artem, 2018-08-29
@Viper029

This is not code, this is some kind of nonsense, sorry.
I won't even look into it.
Use proper parsing tools.
And never use regular expressions to parse HTML.
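
As an illustration of what a proper parsing tool could look like once the raw JSON text is in hand, here is a minimal sketch using `System.Text.Json` (built into .NET Core 3.0 and later; on .NET Framework it is available as a NuGet package, and Json.NET is a common alternative). The sample document and the `name` property are invented for the example, since the real structure of the 9 MB file is not shown in the question. `JsonDemo` and `ReadString` are hypothetical names.

```csharp
using System;
using System.Text.Json;

static class JsonDemo
{
    // Returns the string value of a top-level property,
    // or null if it is absent or not a string.
    public static string ReadString(string json, string property)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement value;
            if (doc.RootElement.ValueKind == JsonValueKind.Object &&
                doc.RootElement.TryGetProperty(property, out value) &&
                value.ValueKind == JsonValueKind.String)
            {
                return value.GetString();
            }
            return null;
        }
    }

    static void Main()
    {
        // Invented sample: one property holding a Cyrillic word.
        string sample = "{\"name\": \"\u0420\u043e\u0441\u0441\u0438\u044f\"}";
        Console.WriteLine(ReadString(sample, "name"));
    }
}
```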
