Windows utility.
Lets you visually check a text file's character set. Displays its Byte Order Mark.
Converts a text file character set.
Byte Order Mark and Unicode
Unicode character sets
There are two Unicode character sets: 16-bit and 32-bit. The 16-bit character set is the most common.
Unicode characters can be encoded in various ways. These encodings have a name that begins with UTF (Unicode Transformation Format).
Character set     Encodings
16-bit Unicode    UTF-7 (intended for email), UTF-8 (web and XML), UTF-16 Big Endian and UTF-16 Little Endian. UTF-16 is sometimes just called Unicode.
32-bit Unicode    UTF-32 Big Endian and UTF-32 Little Endian
The terms Big Endian and Little Endian designate the order of the bytes that represent a character. When this order is not mentioned, it is Big Endian.
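The byte order difference can be illustrated with a short Python sketch. It is not part of this software; the helper name encoded_bytes is hypothetical, and the codec names are those of the Python standard library.

```python
def encoded_bytes(ch: str) -> dict:
    """Map each encoding name to the hex bytes that represent ch."""
    names = ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")
    return {name: ch.encode(name).hex(" ").upper() for name in names}


# Show how one character, U+00E9 (e with acute accent), is stored.
for name, hex_bytes in encoded_bytes("\u00e9").items():
    print(f"{name:<10} {hex_bytes}")
```

For U+00E9 the UTF-16 big endian bytes are 00 E9 and the little endian bytes are E9 00: same character, reversed byte order.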
Documentation (unicode.org: Unicode consortium website) on Byte Order Marks and UTF-8, UTF-16, and UTF-32 charsets.
Byte Order Mark
A text file can begin with a Byte Order Mark (BOM). It is a short sequence of bytes that indicates the file character set.
A byte order mark is usually used to describe a Unicode character set encoding.
Encoding                Byte order mark
UTF-16 big endian       FE FF
UTF-32 little endian    FF FE 00 00
UTF-32 big endian       00 00 FE FF
The BOM at the beginning of a Unicode file is optional.
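As an illustration of how a byte order mark can be recognized, here is a minimal Python sketch. It does not describe how this software is implemented; the function name detect_bom is hypothetical. It relies on the BOM constants of the Python codecs module, which also cover the UTF-8 BOM (EF BB BF) and the UTF-16 little endian BOM (FF FE) that the table above does not list.

```python
import codecs

# Longest BOMs first: the UTF-32 little endian BOM (FF FE 00 00)
# begins with the UTF-16 little endian BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32 big endian"),
    (codecs.BOM_UTF32_LE, "UTF-32 little endian"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16 big endian"),
    (codecs.BOM_UTF16_LE, "UTF-16 little endian"),
]


def detect_bom(data: bytes):
    """Return (encoding name, BOM bytes), or (None, b"") when no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, bom
    return None, b""


name, bom = detect_bom(b"\xfe\xff\x00A")
print(name, bom.hex(" ").upper())
```

Because the BOM is optional, a result of None only means that no BOM is present, not that the file is not Unicode.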
Using this software
Converting a file character set is done in 2 steps:
Opening a file
If the selected text file begins with a byte order mark, this software shows the corresponding encoding name and the byte values of the byte order mark.
If the file begins with a byte order mark and you select a character set that begins with a byte order mark, the text is decoded starting after the byte order mark.
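Decoding after the byte order mark can be sketched in Python as follows. This is an assumption about the general technique, not this software's code; the function name decode_after_bom and its parameters are hypothetical.

```python
import codecs


def decode_after_bom(data: bytes, encoding: str, bom: bytes) -> str:
    """Decode data with the selected encoding, skipping its BOM when present."""
    if bom and data.startswith(bom):
        data = data[len(bom):]  # start decoding right after the BOM
    return data.decode(encoding)


raw = codecs.BOM_UTF16_BE + "Hi".encode("utf-16-be")
print(decode_after_bom(raw, "utf-16-be", codecs.BOM_UTF16_BE))
```

In Python, the "utf-8-sig" and "utf-16" codecs already consume a leading BOM themselves, so the explicit skip is only needed with codecs that name a byte order, such as "utf-16-be".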
Checking a file character set
When a text file is open, and its character set is selected, its text is displayed.
This helps you check that the selected character set matches the one used by the file.
You can also verify the file character set by searching for a portion of known text. If the searched text is found, its location is displayed. This position is a character count, not a byte count. If the file begins with a byte order mark, it is not counted.
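The difference between a character count and a byte count can be shown with a small Python sketch. The function name find_text is hypothetical, and Python counts positions from zero, which may differ from the convention this software uses.

```python
import codecs


def find_text(data: bytes, encoding: str, needle: str) -> int:
    """Return the character position of needle in the decoded text, or -1."""
    text = data.decode(encoding)
    # A decoded BOM appears as the character U+FEFF; drop it so it is not counted.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.find(needle)


# UTF-8 file with a BOM; two of its characters take two bytes each.
data = codecs.BOM_UTF8 + "na\u00efve caf\u00e9".encode("utf-8")
char_pos = find_text(data, "utf-8", "caf\u00e9")
byte_pos = data.find("caf\u00e9".encode("utf-8"))
print(char_pos, byte_pos)
```

Here the searched text is found at character position 6 but at byte offset 10, because the BOM takes three bytes and one earlier character takes two bytes.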
Saving a file
If you save a converted file in the same location as the file you opened, you must give it a different name (this software does not allow overwriting a file).
When this software saves a file using an encoding associated with a byte order mark, it inserts the byte order mark at the beginning of the file.
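The saving behavior described above can be approximated in Python as follows. This is only a sketch under assumptions: the name save_text and the BOM_FOR mapping are hypothetical and do not come from this software.

```python
import codecs
import os
import tempfile

# Hypothetical mapping from a Python codec name to the BOM written before the text.
BOM_FOR = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
}


def save_text(path: str, text: str, encoding: str) -> None:
    """Write text in the given encoding, preceded by its BOM when it has one."""
    bom = BOM_FOR.get(encoding, b"")
    # Mode "xb" raises FileExistsError instead of overwriting an existing file.
    with open(path, "xb") as f:
        f.write(bom + text.encode(encoding))


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "converted.txt")
    save_text(target, "Hi", "utf-16-le")
    with open(target, "rb") as f:
        saved = f.read()
print(saved.hex(" ").upper())
```

Encodings without an entry in the mapping, such as "latin-1", are written without any BOM.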
Character sets list
The character sets of the opened file and of the saved file are each chosen from a character set list.
These two lists can display either a short or an extended set of character sets.
The type of list is selected from a toolbar list.
Font used to display text
You can select the font used to display text from a toolbar list. This allows displaying some Unicode characters that exist in some fonts only.
The default system font is first in the list, followed by two generic fonts. The other fonts are sorted alphabetically.
Checking a web page character set
Saving a web page to disk with a browser does not let you inspect its original content, because the browser transforms the received data before saving it.
You need to use a tool like Fiddler (Telerik).
Go to the TextFileCharsetConverter download page.