Skip to content
๐ŸŒ Unicode

๐ŸŒ Unicode

Unicode solves the chaos of incompatible encodings by defining a single, consistent catalog of characters for every language, symbol, and emoji. Itโ€™s not an encodingโ€”itโ€™s the semantic backbone that encoding schemes like UTF-8, UTF-16, and UTF-32 rely on.

๐Ÿ”ข Code Points

  • Each character is assigned a unique number: U+XXXX
  • Example: 'A' โ†’ U+0041, 'ฯ€' โ†’ U+03C0, '๐Ÿ˜€' โ†’ U+1F600

๐Ÿงฉ Planes & Ranges

Plane NameRangeNotable Content
Basic Multilingual PlaneU+0000โ€“U+FFFFMost common scripts, symbols
Supplementary MultilingualU+10000โ€“U+1FFFFHistoric scripts, rare symbols
Supplementary IdeographicU+20000โ€“U+2FFFFRare CJK ideographs
Supplementary SpecialU+E0000โ€“U+EFFFFTags, variation selectors
Private Use AreasU+E000โ€“U+F8FF, etcCustom characters (non-standard)

๐Ÿ” Unicode currently defines over 143,000 characters across 159 scripts.

๐Ÿงฌ Character Properties

  • General Category: Letter, Number, Symbol, Punctuation, etc.
  • Combining Class: Used for accents and diacritics
  • Bidirectional Class: Controls rendering for RTL scripts
  • Script: Latin, Cyrillic, Arabic, Han, etc.

๐Ÿ”„ Normalization Forms

FormDescriptionUse Case
NFCComposed formWeb, file systems
NFDDecomposed formCanonical comparison
NFKCCompatibility composedSearch, indexing
NFKDCompatibility decomposedAudit, equivalence checking

โš ๏ธ 'รฉ' can be U+00E9 (composed) or U+0065 U+0301 (decomposed). Normalization ensures consistency.

๐Ÿงฐ Use Cases

  • Multilingual Text: Enables consistent rendering across languages
  • Emoji Support: Unicode defines emoji sequences and modifiers
  • Search & Indexing: Normalization ensures accurate matching
  • Security Audits: Prevents spoofing via visually similar characters

๐Ÿงฎ Unicode vs Encoding

LayerRoleExample
UnicodeDefines characters'ฯ€' โ†’ U+03C0
UTF-8Encodes Unicode in bytesU+03C0 โ†’ 0xCF 0x80
UTF-16Encodes Unicode in 2โ€“4 bytesU+1F600 โ†’ 0xD83D 0xDE00
UTF-32Encodes Unicode in 4 bytesU+1F600 โ†’ 0x0001F600
Last updated on