littleguga, 2016-01-08 16:46:44
Programming

How are encodings "updated"?

Suppose a few more characters of some language are added to UTF-8. How do devices around the world start to display and recognize these characters?
What should I search for or read to understand this topic?

2 answer(s)
Mercury13, 2016-01-08
@Mercury13

UTF-8 is an encoding that (theoretically, in its original design) has room for about 2 billion characters.
Unicode's current limit comes from the UTF-16 encoding, which can address only about 1.1 million code points.
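The two limits above can be checked with a little arithmetic. This is only a sketch: the constant and function names are made up for illustration, and the 2 billion figure assumes the original UTF-8 design with sequences of up to 6 bytes (31 payload bits), not today's UTF-8, which is capped at the UTF-16 limit.

```python
import sys

# Original UTF-8 design: up to 6 bytes per sequence, 31 payload bits.
UTF8_ORIGINAL_LIMIT = 2 ** 31

# UTF-16: 2**16 values in the base plane, plus 1024 * 1024 surrogate-pair
# combinations for the 16 supplementary planes.
UTF16_LIMIT = 2 ** 16 + 1024 * 1024


def utf8_length(code_point: int) -> int:
    """Number of bytes Python's UTF-8 codec uses for one code point."""
    return len(chr(code_point).encode("utf-8"))


if __name__ == "__main__":
    print("original UTF-8 code space:", UTF8_ORIGINAL_LIMIT)
    print("UTF-16 code space:        ", UTF16_LIMIT)
    print("largest code point Python accepts:", hex(sys.maxunicode))
    for cp in (0x41, 0x416, 0x20AC, 0x1F600):
        print(f"U+{cp:04X} takes {utf8_length(cp)} byte(s) in UTF-8")
```

The UTF-16 figure is the same ceiling that Python itself enforces through `sys.maxunicode`.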
For devices around the world to show new characters, the following has to happen:
1. The Unicode Consortium assigns the new character a code point somewhere within this million. About 120 thousand are currently assigned; roughly another 130 thousand (a small range in the base plane plus two full planes of 2^16 code points each) are declared private-use and can be occupied by anyone within their own OS or program.
2. The Unicode Consortium publishes a reference image of the character and updates the character property tables:
• Type: letter / digit / space / punctuation mark / technical character / control / combining / …
• Behavior in bidirectional text: left-to-right / right-to-left / neutral (adapts to its surroundings) / control. If the character adapts, it may be mirrored in right-to-left text (brackets, for example); there is a separate field for that too.
• Script: Latin / Cyrillic / musical notation / emoticons / currency symbols ...
• How to convert it to uppercase and to normalized form. The case mapping can be overridden by the locale, but there is also a "general Unicode" table.
3. The OS developer converts the tables into the OS's internal format and updates the fonts. With the next OS update, the fonts and tables reach the computer, and the characters become available.
4. Most often it is the browser that displays unfamiliar characters incorrectly. To deal with this, browser developers use clever algorithms to search the user's computer for a font that contains the character. For example, on the Brogue wiki (brogue.wikia.com) all the monster symbols used to display on my computer, and now a couple of totems do not. Apparently a suitable font had been installed along with some piece of software, and now it is gone. On my work machine running Windows 10, everything is fine.
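The property tables from step 2 can be inspected from code. Here is a rough sketch using Python's standard `unicodedata` module, which ships its own copy of the tables (so it reflects the Unicode version bundled with the interpreter, not the OS tables from step 3). The `describe` helper and the `PRIVATE_USE_TOTAL` constant are names invented for this example.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few of the per-character properties listed above."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),        # type: Lu, Ll, Ps, Co, ...
        "bidi_class": unicodedata.bidirectional(ch),  # L, R, ON, ...
        "mirrored": bool(unicodedata.mirrored(ch)),   # swapped in RTL text?
        "upper": ch.upper(),                          # locale-independent mapping
        "nfc": unicodedata.normalize("NFC", ch),      # normalized form
    }


# Private-use ranges, counted as whole ranges the way the answer does:
# U+E000..U+F8FF in the base plane plus planes 15 and 16.
PRIVATE_USE_TOTAL = (0xF8FF - 0xE000 + 1) + 2 * 2 ** 16

if __name__ == "__main__":
    print("Unicode tables bundled with this Python:", unicodedata.unidata_version)
    for ch in ("ж", "(", "א", "\uE000"):
        print(repr(ch), describe(ch))
    print("private-use code points:", PRIVATE_USE_TOTAL)
```

A character newer than the bundled tables shows up here as unnamed with category `Cn` (unassigned) until the interpreter itself is updated, which mirrors what happens at the OS level.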

Oleg Tsilyurik, 2016-01-08
@Olej

Let's say a few more characters of any language are added to utf-8,

First of all, UTF-8 is not a character table in its own right but a way of encoding the Unicode tables, so nothing can be added to UTF-8 itself (the code tables belong to Unicode ... 32 bits per character, that's enough for everyone ;-)).
And Unicode does not add individual characters but entire pages. To work with them, you must have the appropriate locales installed on your system for those pages.
So don't worry about UTF-8 ;-)
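To illustrate why nothing has to be added to UTF-8 itself, here is a minimal sketch of the encoding rule written by hand. The function name `utf8_encode` is made up for this example. The rule is pure bit manipulation on the code point number, so it gives the same result whether or not a character has been assigned to that number yet.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (modern 1 to 4 byte form)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp!r}")
    if cp < 0x80:                      # 7 bits: single byte
        return bytes([cp])
    if cp < 0x800:                     # 11 bits: 2 bytes
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 16 bits: 3 bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 21 bits: 4 bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    # U+50000 is used here as an example of a code point with no character
    # assigned; the encoder handles it exactly like the others.
    for cp in (0x41, 0x416, 0x20AC, 0x1F600, 0x50000):
        print(f"U+{cp:04X} ->", utf8_encode(cp).hex(" "))
```

The results match Python's built-in codec, for example `chr(0x20AC).encode("utf-8")`.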
