Browsers use character sets (charsets) to convert a stream of bytes into readable characters. Each character is represented by a numeric value, and a table assigns each value its corresponding character. Hundreds of character encodings are in use. Here are a few common character encodings used on the web, ordered by popularity:
- UTF-8 (Unicode) Covers: Worldwide
- ISO-8859-1 (Latin alphabet part 1) Covers: North America, Western Europe, Latin America, the Caribbean, Canada, Africa
- WINDOWS-1252 (Latin I)
- ISO-8859-15 (Latin alphabet No. 9) Covers: Similar to ISO-8859-1 but replaces some less common symbols with the euro sign and some other missing characters
- WINDOWS-1251 (Cyrillic)
- ISO-8859-2 (Latin alphabet part 2) Covers: Eastern Europe
- GB2312 (Chinese Simplified)
- WINDOWS-1253 (Greek)
- WINDOWS-1250 (Central Europe)
- US-ASCII (basic English)
Note that the popularity of a particular charset depends greatly on the geographical region. You can find all names for character encodings in the IANA registry.
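To make the idea of a value-to-character table concrete, here is a minimal Python sketch that looks up the same byte value in several of the charsets listed above. The byte 0xA4 is chosen for illustration because it is one of the positions where ISO-8859-15 differs from ISO-8859-1; the variable names are illustrative only.

```python
# Byte value 0xA4 looked up in different single-byte character tables.
raw = bytes([0xA4])

in_latin1 = raw.decode("iso-8859-1")    # generic currency sign (U+00A4)
in_latin9 = raw.decode("iso-8859-15")   # euro sign (U+20AC)

# US-ASCII only defines values 0-127, so 0xA4 has no entry in its table.
try:
    raw.decode("us-ascii")
    ascii_has_entry = True
except UnicodeDecodeError:
    ascii_has_entry = False
```

The same byte therefore yields a different character, or no character at all, depending on which table is applied.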
As you can see, there are many encodings to choose from, so character encoding information should always be specified in the HTTP Content-Type response header sent together with the document. If you do not specify a charset, you risk that characters in your document will be interpreted and displayed incorrectly.
In the Hypertext Transfer Protocol (HTTP), a header is the part of a message that contains additional text fields sent to or from the server. When a browser requests a webpage, the web server sends, in addition to the page's HTML source code, fields containing metadata that describe the settings and operational parameters of the response. In other words, the HTTP header is a set of fields containing supplemental information about the user's request or the server's response.
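As a rough illustration of how header fields sit alongside the HTML in an HTTP/1.1 response, the following sketch splits a hand-written response into its status line, header fields, and body. The response bytes and the server name `ExampleServer` are made up for this example, and the parsing is deliberately simplified; it is not a complete HTTP parser.

```python
# A hypothetical raw HTTP/1.1 response: status line, header fields,
# an empty line, then the document itself.
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: ExampleServer\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<html><body>Hi</body></html>"
)

# The first empty line separates the header section from the body.
head, _, body = raw_response.partition(b"\r\n\r\n")
status_line, *field_lines = head.decode("ascii").split("\r\n")

# Each field is "Name: value"; field names are case-insensitive,
# so they are stored in lowercase here.
headers = {}
for line in field_lines:
    name, _, value = line.partition(":")
    headers[name.strip().lower()] = value.strip()
```

After this runs, `headers["content-type"]` holds the value the browser uses to decide how to decode `body`.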
In the example above, the “Response Headers” contain several fields with information about the server, content, and encoding, where the line
Content-Type: text/html; charset=utf-8
informs the browser that the characters in the document are encoded using the UTF-8 charset.
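A client can read the charset parameter from this header and use it to decode the document. The sketch below uses Python's standard `email.message.Message` class to parse the header value; the helper name `charset_from_content_type` and the UTF-8 fallback are assumptions made for this example, not behavior that browsers are required to follow.

```python
from email.message import Message

def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type value.

    Falls back to `default` when no charset is given (the fallback
    itself is an assumption made for this sketch).
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default

# Bytes as a server might send them for an ISO-8859-2 encoded page.
body = "Zażółć".encode("iso-8859-2")
charset = charset_from_content_type("text/html; charset=ISO-8859-2")
decoded = body.decode(charset)
```

Note that `get_content_charset()` returns the charset name in lowercase, so `charset` here is `"iso-8859-2"`.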