visual studio
LakushaFujin, 2019-04-12 15:52:28

Why does the same file show different contents in different editors?

Hello. I have an mkv file. If I open it in Notepad, or in binary mode in Visual Studio, I get a set of characters like "→EЯ??B┼?☺Bt?♦BuB'?matroskaB╪?☺". But when I accidentally opened it in Sublime Text, the result was immediately ideal for further processing: a sequence of hexadecimal numbers (for example, 1a45 dfa3 0100 0000).
It would seem that converting these characters to binary in VS should give the same result. But problems arise when I convert the characters to int (or uint32_t): already the third and fourth characters of the sequence above are converted to negative numbers (or, with uint, to very large positive numbers), so the binary code comes out wrong.
Why is that? How do I get a normal sequence of numbers in VS?
[Screenshot of the program output: 5cb087ada8ff6992270445.jpeg]
Here is the result of converting these characters first to int and then to binary.
After the word Test there are two numbers: the hexadecimal numbers from Sublime Text, which I converted to binary.
If you ignore the first 8 zeros in the sequence obtained in VS (where do they come from if they are not there in the other editor?), as well as the incorrectly converted characters, the numbers are the same.
And here is the code I use to open the file and do the conversion.

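#include <bitset>
#include <clocale>
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

int main()
{
  setlocale(LC_ALL, "ru");

  string path = "video.mkv";

  ifstream mkv;

  mkv.open(path, fstream::binary);

  if (!mkv.is_open())
  {
    cout << "Not" << endl;
  }
  else
  {
    cout << "Success!" << endl;
    // Print each of the first 5 characters, its value as int, and that value in binary
    for (int i = 0; i < 5; i++)
    {
      char byte;
      mkv >> byte;
      cout << byte << "\t" << (int)byte << "\t" << bitset<16>((int)byte) << "\t" << i << endl;
    }
    // Reference values taken from the Sublime Text view
    cout << "Test: " << bitset<16>(0x1a45) << "\t" << bitset<16>(0xdfa3) << endl;
  }
}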

1 answer(s)
Griboks, 2019-04-12
@Griboks

The file contains bytes, not text. An editor can only show you text, so it has to convert the bytes into characters, usually by looking each byte up in an encoding table. Your editors use different tables, so the same file is displayed as different text.
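
To make this concrete for the code in the question, here is a minimal sketch, not the only way to do it. It assumes the cause of the negative numbers is that char is signed (the default in Visual Studio), so bytes above 0x7f such as 0xdf and 0xa3 become negative when widened to int. The extra 8 zeros would then simply come from printing an 8-bit value through bitset<16>. Note also that operator>> skips whitespace bytes even in binary mode, which read() does not. The helper names read_bytes, to_hex, widen_signed and widen_unsigned are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Read up to `count` raw bytes from the start of a file.
// read() is unformatted input: unlike operator>>, it does not skip
// bytes that happen to look like whitespace (0x20, 0x0a, ...).
// Returns fewer bytes (possibly none) if the file is short or cannot be opened.
std::vector<std::uint8_t> read_bytes(const std::string& path, std::size_t count) {
    std::ifstream in(path, std::ios::binary);
    std::vector<std::uint8_t> bytes(count);
    in.read(reinterpret_cast<char*>(bytes.data()),
            static_cast<std::streamsize>(count));
    bytes.resize(static_cast<std::size_t>(in.gcount()));
    return bytes;
}

// Format bytes as space-separated two-digit hex, e.g. "1a 45 df a3".
std::string to_hex(const std::vector<std::uint8_t>& bytes) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0) out << ' ';
        // Cast to unsigned so the byte is printed as a number, not a character.
        out << std::setw(2) << static_cast<unsigned>(bytes[i]);
    }
    return out.str();
}

// Widening through a signed 8-bit type sign-extends bytes above 0x7f.
int widen_signed(std::uint8_t b) {
    return static_cast<int>(static_cast<signed char>(b));
}

// Widening an unsigned byte keeps the value in the range 0..255.
int widen_unsigned(std::uint8_t b) {
    return static_cast<int>(b);
}
```

With these helpers, something like to_hex(read_bytes("video.mkv", 8)) should print the same digits that Sublime Text showed, assuming the file really starts with the bytes quoted in the question. For the bit pattern, bitset<8> on the unsigned value avoids both the padding zeros and the sign extension.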
