Unicode detective

Monday, August 7th 2006 @ 8:13 pm | Computer & Gadgets + Work | 2 comments

I was asked to provide translations for a couple of words in “tricky” languages, and the message said “Unicode values will do.” Well, I can ask vendors for the translations, but I started wondering about the Unicode values. This time it isn’t too difficult to look up the character values one by one, but what if I needed to process bigger chunks of text?

FileFormat.info is a wonderful site that I use all the time (at work). There you can find oodles of information on a character, and you can even paste the character itself (a Devanagari letter, in my case) into the search field and it really finds it! Of course there is the official Unicode site, but I haven’t yet learnt to use it to my full advantage. Its best feature, in my opinion, is the ≡ information (a character is identical to another character or a combination of characters).

Macromedia Dreamweaver is quite handy for determining the decimal HTML entity behind a character. (I’m not actually sure whether you can choose to convert the characters to hexadecimal HTML entities instead.) You just paste the text into the design view and the entities appear in the code view. For this particular assignment the client eventually needs the HTML entities.
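For comparison, the same kind of conversion can be sketched in a few lines of Python. This is only an illustration and makes no claim about how Dreamweaver works internally; the function name `to_ncr` and its `hex_form` parameter are made up for this example. It leaves Basic Latin (ASCII) characters alone and turns everything else into a decimal or hexadecimal numeric character reference.

```python
def to_ncr(text: str, hex_form: bool = False) -> str:
    """Replace every non-ASCII character with a numeric character reference.

    Hypothetical helper for illustration; ASCII characters are left as-is,
    so '&', '<' and '>' are NOT escaped here.
    """
    parts = []
    for ch in text:
        cp = ord(ch)  # the character's Unicode code point as an integer
        if cp < 128:
            parts.append(ch)
        elif hex_form:
            parts.append(f"&#x{cp:X};")  # hexadecimal reference
        else:
            parts.append(f"&#{cp};")     # decimal reference
    return "".join(parts)


print(to_ncr("Müller"))                 # M&#252;ller
print(to_ncr("Müller", hex_form=True))  # M&#xFC;ller
```

For the decimal form alone, Python's built-in `"Müller".encode("ascii", "xmlcharrefreplace").decode("ascii")` should give the same result without any helper.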

But the question remains: if I needed to find out the Unicode value of each character in a big chunk of text, how would I do it?
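One possible answer, offered as a minimal sketch rather than a recommendation: a short Python script can walk through a string and report each character's code point in U+XXXX notation together with its Unicode name. The function name `describe` is invented for this example, and the names come from whatever Unicode version ships with the Python installation, so very new characters may come back as `<no name>`.

```python
import unicodedata


def describe(text: str):
    """Yield (character, 'U+XXXX', Unicode name) for each character in text."""
    for ch in text:
        code = f"U+{ord(ch):04X}"                   # at least four hex digits
        name = unicodedata.name(ch, "<no name>")    # fallback for unnamed characters
        yield ch, code, name


# Example: a Latin letter with diaeresis and a Devanagari letter
for ch, code, name in describe("üक"):
    print(code, name)
# U+00FC LATIN SMALL LETTER U WITH DIAERESIS
# U+0915 DEVANAGARI LETTER KA
```

For a bigger chunk of text, the same loop could be fed the contents of a file read with `open(path, encoding="utf-8")`, assuming the file is saved as UTF-8.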

Comments

Kory (August 8, 2006 @ 9:31 am)

I remember a lot of Unicode talk around the latest Notepad++ but does this – BabelPad – do what you’re looking for?

Entity Conversion:

* Convert all HTML Entities (e.g. &uuml;) in the selected text to Unicode characters.
* Convert all non-Basic Latin characters in the selected text to HTML Entities or hexadecimal Numeric Character References (NCRs).
* Convert all Numeric Character References (e.g. &#252; or &#xFC;) in the selected text to Unicode characters.
* Convert all non-Basic Latin characters in the selected text to hexadecimal Numeric Character References (NCRs).
* Convert all non-Basic Latin characters in the selected text to decimal Numeric Character References (NCRs).
* Convert all Universal Character Names (e.g. \u00FC) in the selected text to Unicode characters.
* Convert all non-Basic Latin characters in the selected text to Universal Character Names (UCNs).
* Convert all characters in the selected text to their Unicode Names (e.g. LATIN SMALL LETTER U WITH DIAERESIS).
* Convert all characters in the selected text to U+XXXX notation (e.g. U+00FC).

Minna (August 8, 2006 @ 9:56 am)

That editor looks very promising, thanks! I just found this Unicode Code Converter which is quite useful when you need a converter online.

Decode Unicode looks interesting as well.
