UTF-8 Encoding
UTF-8 (UCS Transformation Format 8-bit) is a multibyte character encoding for Unicode. Like UTF-16 and UTF-32, UTF-8 can represent every character
in the Unicode character set. Unlike UTF-16 and UTF-32, it is backward-compatible with ASCII, and because it is defined as a sequence of single bytes it avoids
the complications of endianness and the resulting need for a byte order mark (BOM). For these and other reasons, UTF-8 has become the dominant character encoding
for the World Wide Web, accounting for more than half of all Web pages.
The '8' means it uses 8-bit blocks (bytes) to represent a character. The number of blocks needed to represent a character varies from 1 to 4.
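As a rough illustration of how the block count depends on the character, here is a minimal Python sketch. The function name `utf8_length` is made up for this example, and the four-byte range (0x10000 to 0x10FFFF) comes from the Unicode code space rather than from the text of this post. Note that the surrogate range 0xD800 to 0xDFFF cannot actually be encoded in UTF-8, and the sketch does not check for it.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for one Unicode code point."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a valid Unicode code point")
    if code_point <= 0x7F:      # ASCII range
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4                    # 0x10000 .. 0x10FFFF


# Cross-check against Python's built-in encoder.
for ch in ("A", "é", "€", "😀"):
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_length(ord(ch))} byte(s)")
```

The loop at the end compares the hand-written ranges with the result of `str.encode("utf-8")`, so a mistake in the thresholds would raise an `AssertionError`.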
For any character whose code point is 127 (hex 0x7F) or below, the UTF-8 representation is one byte. It is just the lowest 7 bits of the full Unicode value, which is
also the same as the ASCII value.
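The ASCII compatibility is easy to check. A small Python sketch, using an arbitrary sample string chosen for this example:

```python
text = "Hello, UTF-8!"  # sample string containing only ASCII characters

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure ASCII text the two encodings produce identical bytes.
assert ascii_bytes == utf8_bytes

# Every byte is 0x7F or below, i.e. the top bit is clear.
assert all(b <= 0x7F for b in utf8_bytes)
```

This is why a file containing only ASCII text is already valid UTF-8 without any conversion.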
For characters from 128 (hex 0x80) up to 2047 (hex 0x07FF), the UTF-8 representation is spread across two bytes. The first byte will have the two high bits set and the
third bit clear (i.e. 0xC2 to 0xDF). The second byte will have the top bit set and the second bit clear (i.e. 0x80 to 0xBF).
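The two-byte layout can be sketched with a few bit operations. The helper name `encode_two_bytes` is hypothetical. It puts the upper five bits of the code point into the first byte (pattern 110xxxxx) and the lower six bits into the second byte (pattern 10xxxxxx).

```python
def encode_two_bytes(code_point: int) -> bytes:
    """Encode a code point in the range 0x80..0x7FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point is outside the two-byte range")
    first = 0xC0 | (code_point >> 6)      # 110xxxxx: upper 5 bits
    second = 0x80 | (code_point & 0x3F)   # 10xxxxxx: lower 6 bits
    return bytes([first, second])


# U+00E9 (é) should match Python's built-in encoder.
assert encode_two_bytes(0xE9) == "é".encode("utf-8")
print(encode_two_bytes(0xE9).hex())  # c3a9
```

Because the smallest code point in this range is 0x80, the first byte can never be lower than 0xC2, which matches the range given above.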
For all characters from 2048 (hex 0x0800) up to and including 65535 (hex 0xFFFF), the UTF-8 representation is spread across three bytes.
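The three-byte case follows the same idea. The sketch below assumes the standard layout 1110xxxx 10xxxxxx 10xxxxxx, which is not spelled out in this post. The helper name `encode_three_bytes` is hypothetical. Surrogate code points (0xD800 to 0xDFFF) fall inside this numeric range but are not valid in UTF-8, so the sketch rejects them.

```python
def encode_three_bytes(code_point: int) -> bytes:
    """Encode a code point in the range 0x800..0xFFFF as three UTF-8 bytes."""
    if not 0x800 <= code_point <= 0xFFFF:
        raise ValueError("code point is outside the three-byte range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    first = 0xE0 | (code_point >> 12)            # 1110xxxx: upper 4 bits
    second = 0x80 | ((code_point >> 6) & 0x3F)   # 10xxxxxx: middle 6 bits
    third = 0x80 | (code_point & 0x3F)           # 10xxxxxx: lower 6 bits
    return bytes([first, second, third])


# U+20AC (euro sign) should match Python's built-in encoder.
assert encode_three_bytes(0x20AC) == "€".encode("utf-8")
print(encode_three_bytes(0x20AC).hex())  # e282ac
```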