Unicode Conversion Issue

kfitch's picture

Hello,

Our users are experiencing a very discouraging issue in regards to how MS Word (in Windows) handles non-unicode characters. This issue is confirmed in both Word 2007 and the Word 2010 Beta using Windows XP SP3; I suspect it works the same way in 2003.

Issue:
1) A user creates a document using a non-unicode font, entering characters to represent scientific notations. For example, he enters a Mu (µ). Note: I pasted in a unicode-compliant Mu for reference.
2) The user opens his document and attempts to copy / paste this non-unicode character representing a Mu into a web browser for entry into our system. It pastes as an unrecognized character. This is expected.
3) The user opens his document, selects the non-unicode character and adjusts its font to "Arial Unicode MS," saving the document. He closes / re-opens the document for good measure. Once re-opened, he copies what should be a unicode Mu and pastes it into the web browser. It is still represented as an unrecognized character.
4) The user creates a new document, sets the font to "Arial Unciode MS" and creates a Mu. He copies this Mu into the web browser and it pastes over in Unicode, as expected.

Conclusion:
Word is not actually converting non-unicode characters into unicode characters when it should, when a unicode font is selected. Instead, it is taking a best-guess for display reasons but doing no actual conversion.

How do I overcome this problem?
* Can I change some setting in Word to force a conversion? Preferable.
* Is there a "cleaner" app or Word macro that will do this?
* Other solutions?

Additional Notes:
* Re-typing the affected documents using unicode is not an option
* This is not an issue in Mac OS X using the most recent version of Word. A sample case such as in (3) results in a unicode Mu being pasted into the browser.

Please help!

Michael_Rowley's picture

'Word is not actually converting non-unicode characters into unicode characters when it should'

I'm assuming that you mean by non-unicode font a font that has glyphs that differ from the typical representation of the character indicated by the Ansi and Unicode number. If, for instance, the Greek lower-case mu is not at 181 in Ansi, then Word will not display a glyph that looks like a mu. Word does not do conversions since at the the latest Word 2000. It simply selects the glyph corresponding to the Ansi or Unicode code point in the particular font you are using. It is often the case with pre-OpenType fonts that the glyphs selected are simply those that suit the font designers fancy and for perfectly legitimate reasons.

Syndicate content Syndicate content