Life Outside Unicode

A. Scott Britton's picture

Where can I put characters with no Unicode assignments? I've got a character from an old Chinese romanization system that appears in both upper and lower case (I'm fairly certain it has no assignment, could be wrong though).

A. Scott Britton's picture

Never mind, got it. I'm still learning to look for answers before blindly submitting questions.

pablohoney77's picture

well share the answer with the rest of us, why doncha? ;^)

dezcom's picture

Does anyone know the proper Unicodes for Polish Kreska characters?

twardoch's picture


There aren't any. Normally, the oblique mark placed above letters in French, Spanish, Polish or Czech is always encoded as "acute". A font designer should decide whether to design an acute that is a compromise between the needs of Spanish, French, Czech and Polish typography (e.g. Palatino Linotype), or to supply a default acute form suitable for languages such as Spanish or Czech (usually a bit flatter) plus a stylistic alternate suitable for Polish, accessible through OpenType Layout features (locl, salt). Regardless of the solution within the font, the letters are always encoded using the same Unicode values.

Adam Twardoch
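
As a rough illustration of the last point, here is a minimal Python sketch using only the standard unicodedata module. The two sample words are my own picks, not from Adam's post. It shows that the accented letters in a Polish word and a Spanish word resolve to the same code points and the same combining acute, so any difference in the accent's shape has to live in the font, not in the encoding.

```python
import unicodedata

# Hypothetical sample words (assumptions for illustration), written with
# explicit escapes so the precomposed letters are unambiguous.
POLISH_WORD = "m\u00F3wi\u0107"   # "mówić"
SPANISH_WORD = "canci\u00F3n"     # "canción"


def describe(text):
    """Return (code point, Unicode name) for every non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if ord(ch) > 0x7F
    ]


def accent_marks(text):
    """Decompose the text (NFD) and return the names of its combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return [unicodedata.name(ch) for ch in decomposed if unicodedata.combining(ch)]


print(describe(POLISH_WORD))
print(describe(SPANISH_WORD))
print(accent_marks(POLISH_WORD))
print(accent_marks(SPANISH_WORD))
```

Both words report the o-acute as U+00F3, and every accent decomposes to the same COMBINING ACUTE ACCENT. There is no separate "kreska" code point to find.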

A. Scott Britton's picture

I'll share, by all means (I just hope I'm actually right).

And the answer is...

[drumroll please]

-Private Use Area-


A. Scott Britton's picture

Let me delve deeper for the sake of anyone who may find that they've wandered into a similar predicament...

Browsing the codepages in FontLab, you'll notice that several are labeled "Unassigned zone". You might be tempted at first (as I was) to use the codepoints in these pages, but I'm pretty sure you're not supposed to. Here's why: those spots are reserved for future assignments. You might design a font, give it a range in an unassigned zone, and then nine months later the consortium assigns those very codepoints to, say, Klingon (which sounds like a joke, but someone actually proposed it [and it got turned down]).

The Private Use Area is pretty big, so there shouldn't be a lack of space.
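
For anyone who wants to check a code point before using it, here is a small sketch with Python's standard unicodedata module. The classify helper is a hypothetical name of mine, and the answer for "unassigned" depends on which Unicode version your Python ships with, so treat it as a sanity check and not as the final word.

```python
import unicodedata


def classify(code_point):
    """Roughly classify a code point by its Unicode general category.

    'Co' is private use, 'Cn' is unassigned (this also covers the
    permanently reserved noncharacters); everything else is lumped
    together here as 'assigned'.
    """
    category = unicodedata.category(chr(code_point))
    if category == "Co":
        return "private use"
    if category == "Cn":
        return "unassigned"
    return "assigned"


# The Private Use Area in the Basic Multilingual Plane runs U+E000..U+F8FF.
bmp_pua_size = sum(
    1 for cp in range(0xE000, 0xF900) if classify(cp) == "private use"
)

print(classify(0x0041))   # LATIN CAPITAL LETTER A
print(classify(0xE000))   # first code point of the BMP Private Use Area
print(classify(0x0378))   # a gap in the Greek block, as of current Unicode data
print(bmp_pua_size)       # number of private-use code points in the BMP
```

The BMP area alone holds 6,400 code points, and there are two further private-use planes beyond it, so space really shouldn't be the problem.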

I just wish there were some other way to access the characters outside of the first 255; there's just something nice about getting to a character with the keyboard.

pablohoney77's picture

I've heard about Klingon getting turned down, but was there ever any final decision on Tengwar? How would one find out the status of such proposals? Oh and back on topic... you can create your own keyboard layout (i believe) with Microsoft's Keyboard Layout Creator. I haven't tried it before, but lemme know how it works!
Oh, and i'm guessing that'll only work for windows. I dunno if there's anything similar for mac, I'm sure someone else in here would know that tho.

pablohoney77's picture

the letters are always encoded using the same Unicode values

What do you mean by that Adam? Are multiple glyphs able to be mapped to the same unicode values? Or did i misunderstand what you meant?

A. Scott Britton's picture

I got excited about Keyboard Layout Creator when I first heard about it, then I realized it doesn't work in Windows ME. I'd like to try it out, but I'm not ready to do Windows XP yet.

Hey, does Windows Character Map only display the first 255 glyphs? Or is there some way I can select characters residing in the "grey" area with it too? The only program I have that lets me see and select these glyphs is Word (by selecting 'insert' > 'symbol'). Should I be able to do the same in WordPerfect? (Because I've tried and I can't.)

John Hudson's picture

I've got a character from an old Chinese romanization system that appears in both upper and lower case (I'm fairly certain it has no assignment, could be wrong though).

What are the characters? Unless very obscure and idiosyncratic, Chinese romanization characters are likely to be encoded already. If not, they are candidates for encoding and should be documented.

A. Scott Britton's picture

I'm not sure what the two characters are named, John. I was going to post a scan shortly in hopes that someone could identify them. I know what they're used for, of course--in the name "Lao Tze" the character in question replaces the "t". In the uppercase it resembles a stylized "3", in the lowercase it looks very similar to a lowercase cursive "z".

Thomas Phinney's picture

Oh, that would probably be the "yogh": U+021C, U+021D. Used also in older Scots, I think.
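
A quick way to confirm what those two code points are, sketched with Python's standard unicodedata module (the variable names are just for illustration):

```python
import unicodedata

CAPITAL_YOGH = chr(0x021C)
SMALL_YOGH = chr(0x021D)

# Print the official character name for each code point.
for ch in (CAPITAL_YOGH, SMALL_YOGH):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The two are registered as a case pair, so ordinary case mapping works.
print(CAPITAL_YOGH.lower() == SMALL_YOGH)
```

The same lookup works for any candidate code point, which is handy when trying to match a mystery letter against what is already encoded.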


dezcom's picture

So then is the consensus that there is no way to use both Polish and the Romance languages in the same font without the InDesign Glyph palette? That doesn't seem fair. In this OpenType world there should be a place for a Kreska.


A. Scott Britton's picture

Aww, Thomas, I got so excited when I thought yogh was it. It is, unfortunately, not the one.

John and Thomas, let me give you a little background on the characters. As far as I can tell, they may have been devised specifically for a series of books (translations of some Asian religious and philosophical texts) entitled the "Sacred Books of the East", initiated by F. Max Müller. The entire romanization system used in the series is way off by today's standards (pre Wade-Giles, Pinyin, etc.). The system uses only one non-Latin character: this one...

(All apologies for the bitmap image and its poor quality, I'm working with reverse engineering; the images came directly from one of the texts which was magnified several times then polished enough for scanning and subsequent vectorization. The outlines in my font-in-progress are significantly cleaner, but of course you'll see those images much much later.)


Thomas Phinney's picture

You're right. The cap is close enough for argument, but the lowercase in particular is really not much like the yogh, with the tail curling back up and to the right.



A. Scott Britton's picture

To tell you the truth, I kind of hope these characters HAVE NOT been assigned codepoints--I'd kind of like to try my hand at proposing and passing such a thing.

Regardless, if any of you happen to run across information on these (either in Unicode or elsewhere), let me know. In the meantime I'll begin what is sure to be the tedious process of searching through the 1400+ pages of the Unicode Standard.

andreas's picture

To apply for new Unicode assignments, it's best to contact Deborah Anderson.
PDF: Deborah Anderson: Script Encoding Initiative

dezcom's picture

thank you for the references.


A. Scott Britton's picture

Yeah, definitely thank you for the links Andreas, I've already begun to peruse the information with great interest.

vaissi's picture

To answer one of the questions in the thread (hund's question): yes, there are some good utilities for the Mac for building your own keyboards (see still in beta, but it worked just fine for me, for my own Sogdian input method). Microsoft Keyboard Layout Creator also worked for XP.

