Java Character Set Encoding

  • Java stores characters internally as UTF-16.
  • Java uses translation tables to map between external encodings and UTF-16.
    • Map from external encoding to UTF-16 on input.
    • Map from UTF-16 to external encoding on output.
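  • These translations can be lossy: a character with no mapping in the target encoding is replaced with a substitute (often '?') or rejected with an error.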
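
As a rough illustration of the lossy case, the sketch below encodes a string to bytes and decodes it back. The class name `LossyEncodingDemo` and the helper `roundTrip` are made up for this example; `String.getBytes(Charset)` and `new String(byte[], Charset)` are the standard library calls doing the actual translation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyEncodingDemo {

    // Hypothetical helper: UTF-16 string -> external bytes -> UTF-16 string.
    static String roundTrip(String text, Charset charset) {
        byte[] external = text.getBytes(charset); // UTF-16 to external encoding
        return new String(external, charset);     // external encoding back to UTF-16
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"; U+00E9 has no US-ASCII mapping

        // UTF-8 can represent every Unicode character, so nothing is lost.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original)); // prints true

        // US-ASCII cannot represent U+00E9, so getBytes substitutes '?'.
        System.out.println(roundTrip(original, StandardCharsets.US_ASCII)); // prints caf?
    }
}
```

Note that `String.getBytes` substitutes silently. If silent substitution is not acceptable, a `CharsetEncoder` configured with `CodingErrorAction.REPORT` should raise an error instead.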

  • Java only deals with UTF-16 internally.
  • Inbound characters are converted to UTF-16.
  • Outbound characters are converted from UTF-16 to whatever the output encoding is.
  • Unless you’re reading and writing UTF-16, all character I/O requires conversion to and from Java’s canonical UTF-16 encoding.
  • This is a perfectly reasonable and sound approach.
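
The inbound and outbound conversions can be sketched with the standard reader and writer classes. This is a minimal example, assuming the input is ISO-8859-1 and the output should be UTF-8. The class name `TranscodeDemo` and the method `transcode` are invented for illustration, and in-memory byte arrays stand in for real files or sockets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {

    // Hypothetical helper: decodes bytes from one encoding and re-encodes them in another.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The reader maps external bytes to UTF-16 chars; the writer maps them back out.
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer out = new OutputStreamWriter(sink, to)) {
            char[] buffer = new char[1024]; // UTF-16 code units
            int count;
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer was closed (and flushed) above, so the sink is complete.
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // "café" in ISO-8859-1: the final byte 0xE9 is U+00E9.
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // prints 5 (U+00E9 becomes 0xC3 0xA9 in UTF-8)
    }
}
```

Naming the charset explicitly on both sides avoids depending on the platform default encoding, which can differ between machines.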


Written on August 16, 2016