CodeLobe.com/tools/cp437-converter


Codepage 437 Converter


What is this?

This tool converts text between Codepage 437 and Unicode.

During the BBS era, much online art was made with text using the "high ASCII" character set technically known as the OEM character set, or Codepage #437 (CP437). See textfiles.com for some classic ASCII and ANSI art, or check out 16colo.rs for old and new textual art pack releases.

Today's dominant character set is Unicode, and UTF-8 is its most popular storage format. UTF-8 is binary compatible with the first 128 characters of the old ASCII codepage (#0 to #127), so simple ASCII text files can be read as UTF-8 as long as they don't use the "extended" characters #128 to #255 (AKA "high ASCII"). Unicode does include codepoints for the "high ASCII" chars of CP437, but they do not have the same numeric values: CP437 char #176 (░) is Unicode codepoint U+2591, which UTF-8 stores as the three bytes E2 96 91.
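
As a concrete illustration, here is a minimal JavaScript sketch (not the tool's actual source) of decoding CP437 bytes with a lookup string for the upper half, #128 to #255. The names `CP437_HIGH` and `decodeCp437` are hypothetical, and bytes below #128 are passed straight through as plain ASCII, ignoring the control-code glyphs.

```javascript
// CP437 chars #128..#255 in order: index 0 of this string is byte 0x80.
// Every entry is a single UTF-16 code unit, so plain string indexing works.
const CP437_HIGH =
  "ÇüéâäàåçêëèïîìÄÅ" + // 0x80
  "ÉæÆôöòûùÿÖÜ¢£¥₧ƒ" + // 0x90
  "áíóúñÑªº¿⌐¬½¼¡«»" + // 0xA0
  "░▒▓│┤╡╢╖╕╣║╗╝╜╛┐" + // 0xB0
  "└┴┬├─┼╞╟╚╔╩╦╠═╬╧" + // 0xC0
  "╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀" + // 0xD0
  "αßΓπΣσ\u00B5τΦΘΩδ∞φε∩" + // 0xE0 (\u00B5 is the micro sign µ)
  "≡±≥≤⌠⌡÷≈°\u2219\u00B7√ⁿ²■\u00A0"; // 0xF0 (0xFF is a no-break space)

// Decode an array/Uint8Array/Buffer of CP437 bytes into a JS string.
function decodeCp437(bytes) {
  let out = "";
  for (const b of bytes) {
    // Below 0x80: plain ASCII. Otherwise: look the glyph up in the table.
    out += b < 0x80 ? String.fromCharCode(b) : CP437_HIGH[b - 0x80];
  }
  return out;
}

console.log(decodeCp437([0xb0, 0xb2, 0xdc])); // ░▓▄
// The same glyph needs three bytes once stored as UTF-8:
console.log(Buffer.from("░", "utf8").toString("hex")); // e29691
```

The numeric mismatch is why a plain byte copy cannot work: byte 0xB0 has to become codepoint U+2591 first, and only then does the UTF-8 encoder turn it into E2 96 91.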

I wrote a CP437 to Unicode converter to help preserve some ASCII art so it can be viewed normally on modern systems. After conversion, many GNU/Linux terminals can display CP437 ASCII / ANSI art. This tool is a JavaScript port of that simple conversion program.

Before Unicode conversion: belinda.ans printed to a terminal using a classic PC font & 80 column width.

 wget http://artscene.textfiles.com/ansi/artwork/belinda.ans
 cat belinda.ans

After Unicode conversion:

How Do I Use It?

Open an ASCII / ANSI art file that was saved using the "high ASCII" of Codepage 437 in a text editor or web browser. You could point your browser at one of the .ANS files on TextFiles.com; for example, open belinda.ans in your browser (or download it: right-click, Save Link As). A browser or text editor will likely default to ISO-8859-1 encoding for CP437 files and display characters like "°²Ü" instead of "░▓▄". If it doesn't, most decent editors let you set the encoding to ISO-8859-1 manually.

Either copy and paste the ISO-8859-1 text into the Input field above, or use the → Load CP437 File ← button to load the saved file. Then press the ▼ Convert to UTF-8 ▼ button.
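
Pasting ISO-8859-1 text works because ISO-8859-1 maps byte values 0 to 255 one-to-one onto codepoints U+0000 to U+00FF, so each pasted character's code equals the original CP437 byte. A minimal sketch in Node.js (the helper name `bytesFromLatin1` is hypothetical). It assumes the text really was decoded as ISO-8859-1; browsers treat the "ISO-8859-1" label as windows-1252, which displays bytes #128 to #159 as different characters, so those may not map back this simply.

```javascript
// Recover the original byte values from text that was decoded as ISO-8859-1.
// Each character U+0000..U+00FF corresponds to exactly one byte.
function bytesFromLatin1(text) {
  const bytes = [];
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (code > 0xff) {
      throw new RangeError(`not ISO-8859-1 text: U+${code.toString(16)}`);
    }
    bytes.push(code);
  }
  return bytes;
}

const mojibake = "°²Ü"; // what an editor shows for CP437 "░▓▄"
console.log(bytesFromLatin1(mojibake).map((b) => "0x" + b.toString(16))); // [ '0xb0', '0xb2', '0xdc' ]
// Node's Buffer implements "latin1" as the same 1:1 byte mapping:
console.log(Buffer.from(mojibake, "latin1")); // <Buffer b0 b2 dc>
```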

The default converter settings preserve the char codes for carriage return, linefeed and tab (Cr/Lf/Tb), as well as ANSI escape codes (←[ or ←], where ← is char #27, ESC). Most other "Low ASCII" (#0 - #31) control codes are converted to their visual representations in Unicode. Glyphs for card suits (♥♦♣♠), musical notes (♪♫), arrows (↑↓↕←→↔), happy faces (☺☻), etc. will be visible in the Output field, but the Cr/Lf/Tb and ANSI escapes remain control codes so that ANSI art can be viewed on an ANSI-aware terminal.

If you disable preservation of ANSI escape codes, char #27 (ESC) is converted to its Unicode glyph (←) even when followed by a bracket, so ANSI-aware terminals no longer interpret the escapes as control codes. I find this useful when writing documentation about ANSI escape codes.

All of the CP437 control codes have visual symbols. When Preserve Cr/Lf/Tb is disabled, the carriage return, linefeed and tab chars are converted to their visual representations (♪◙○). This can be useful when converting character raster data, such as a memory dump from a text-based game.

When Preserve Controls is enabled, none of the control code characters (#0 to #27) are converted. This may be desirable when ANSI art uses vertical tab or other control characters. This option overrides the Cr/Lf/Tb and Esc options, i.e., disabling those will not disable preservation of the carriage return, linefeed, tab or escape control codes.
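
The interaction of the three preservation settings can be sketched roughly as follows. This is a hypothetical reconstruction from the description above, not the tool's source: the option names `crLfTb`, `esc` and `controls` are assumptions, NUL (#0) is left unchanged for lack of a glyph, Preserve Controls covers #0 to #27 as stated above, and only codes below #32 are handled to keep the example short.

```javascript
// CP437 glyphs for the "Low ASCII" control codes #0..#31.
// NUL (#0) has no glyph; this sketch simply leaves it unchanged.
const CP437_LOW =
  "\u0000☺☻♥♦♣♠•◘○◙♂♀♪♫☼" + // #0..#15
  "►◄↕‼¶§▬↨↑↓→←∟↔▲▼"; // #16..#31

const TAB = 9, LF = 10, CR = 13, ESC = 27;
const PRESERVE_CONTROLS_MAX = 27; // "#0 to #27", per the description above

// Hypothetical option names mirroring the three checkboxes.
const DEFAULTS = { crLfTb: true, esc: true, controls: false };

// `text` holds one character per CP437 byte (e.g. loaded as ISO-8859-1).
// Only codes below 32 are touched; everything else passes through here.
function convertLowAscii(text, options = {}) {
  const opts = { ...DEFAULTS, ...options };
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const c = text.charCodeAt(i);
    let keep = c >= 32;
    // Preserve Controls keeps #0..#27 regardless of the other two settings.
    if (opts.controls && c <= PRESERVE_CONTROLS_MAX) keep = true;
    if (opts.crLfTb && (c === CR || c === LF || c === TAB)) keep = true;
    // ESC is kept only when it starts an ANSI sequence: ESC[ or ESC].
    const next = text[i + 1];
    if (opts.esc && c === ESC && (next === "[" || next === "]")) keep = true;
    out += keep ? text[i] : CP437_LOW[c];
  }
  return out;
}

console.log(JSON.stringify(convertLowAscii("\x1b[1m\x03\r\n"))); // "\u001b[1m♥\r\n"
console.log(convertLowAscii("\x1b[1m", { esc: false })); // ←[1m
console.log(convertLowAscii("\r\n\t", { crLfTb: false })); // ♪◙○
console.log(JSON.stringify(convertLowAscii("\r\x1c", { controls: true, crLfTb: false }))); // "\r∟"
```

In the last call, Preserve Controls still keeps the carriage return even though `crLfTb` is off, while #28 falls outside the preserved range and becomes ∟.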

Converting from Unicode to CP437

Use the → Load UTF-8 File ← button or paste Unicode text into the Input box, then press the ▼ Convert to CP437 ▼ button. If the Unicode text is already in the Output box, you can press the ▼▲ Swap In & Out ▲▼ button instead to move it to the Input box.

When converting to CP437, the preservation settings are ignored. The codepoints for happy faces, card suits, arrows, etc. are mapped back to their CP437 glyphs, and control codes (#0 to #31), including carriage return, linefeed and tab, also become their CP437 equivalents.

Conversion into CP437 can be considered a "lossy" operation, since glyphs once again share character numbers with control codes, whereas in Unicode they were separate codepoints. Any Unicode codepoint that has no direct CP437 equivalent is stripped from the output.
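
The reverse direction can be sketched by inverting a full 256-entry table into a `Map` from character to byte. Again this is a hypothetical sketch rather than the tool's code (`CP437`, `TO_BYTE` and `encodeCp437` are made-up names): raw control codes U+0000 to U+001F and U+007F map to their own byte values, each glyph maps to the byte it shares with a control code, and anything else is dropped.

```javascript
// Full CP437 table as a 256-character string: index = byte value.
const LOW = "\u0000☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"; // #0..#31
const ASCII = Array.from({ length: 95 }, (_, i) => String.fromCharCode(32 + i)).join(""); // #32..#126
const HIGH =
  "ÇüéâäàåçêëèïîìÄÅÉæÆôöòûùÿÖÜ¢£¥₧ƒ" + // #128..#159
  "áíóúñÑªº¿⌐¬½¼¡«»░▒▓│┤╡╢╖╕╣║╗╝╜╛┐" + // #160..#191
  "└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀" + // #192..#223
  "αßΓπΣσ\u00B5τΦΘΩδ∞φε∩≡±≥≤⌠⌡÷≈°\u2219\u00B7√ⁿ²■\u00A0"; // #224..#255
const CP437 = LOW + ASCII + "⌂" + HIGH; // #127 is ⌂ (U+2302)

// Invert the table. Raw control codes map to the same byte as their glyphs,
// so e.g. both "☺" and U+0001 become byte 1: this is the lossy part.
const TO_BYTE = new Map();
for (let b = 0; b < 256; b++) TO_BYTE.set(CP437[b], b);
for (let b = 0; b < 32; b++) TO_BYTE.set(String.fromCharCode(b), b);
TO_BYTE.set("\u007f", 127);

function encodeCp437(text) {
  const bytes = [];
  for (const ch of text) { // iterates by codepoint, not by UTF-16 unit
    const b = TO_BYTE.get(ch);
    if (b !== undefined) bytes.push(b); // no CP437 equivalent: stripped
  }
  return Uint8Array.from(bytes);
}

const encoded = encodeCp437("☺\u0001A€");
console.log(Array.from(encoded)); // [ 1, 1, 65 ]
// Decoding back shows the loss: the control code became a second ☺, € is gone.
console.log(Array.from(encoded, (b) => CP437[b]).join("")); // ☺☺A
```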