It was designed for backward compatibility with ASCII. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single octet with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well. Since ASCII bytes do not occur when encoding non-ASCII code points into UTF-8, UTF-8 is safe to use within most programming and document languages that interpret certain ASCII characters in a special way, such as "/" in filenames, "\" in escape sequences, and "%" in printf.
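The properties described above can be checked with a short Python sketch. It relies only on the built-in `str.encode`; the helper names `utf8_lengths` and `has_ascii_bytes` and the sample string are illustrative assumptions, not part of any standard API.

```python
def utf8_lengths(text):
    """Return (character, code point, UTF-8 byte count) for each character."""
    return [(ch, ord(ch), len(ch.encode("utf-8"))) for ch in text]


def has_ascii_bytes(ch):
    """True if any byte in the UTF-8 encoding of ch falls in the ASCII range."""
    return any(b < 0x80 for b in ch.encode("utf-8"))


# Hypothetical sample: two ASCII characters followed by code points of
# increasing numerical value.
sample = "A/é€😀"

for ch, cp, n in utf8_lengths(sample):
    print(f"U+{cp:04X} {ch!r}: {n} byte(s), ASCII bytes present: {has_ascii_bytes(ch)}")

# ASCII text produces identical bytes under both encodings.
assert "plain ASCII".encode("ascii") == "plain ASCII".encode("utf-8")

# Encoding non-ASCII code points never yields the byte for "/", "\" or "%".
encoded = "é€😀".encode("utf-8")
assert all(special not in encoded for special in (b"/", b"\\", b"%"))
```

The byte counts grow with the code point value, and the non-ASCII characters never produce a byte below 0x80, which is why a byte-oriented parser looking for "/" or "%" is not confused by multi-byte sequences.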
Figure: usage of the main encodings on the web from 2001 to 2012, as recorded by Google,[6] with UTF-8 overtaking all others in 2008 and nearing 50% of the web in 2012.
Note that the ASCII-only figure includes every web page that contains only ASCII characters, whatever encoding its header declares.