Convertion to Unicode, getting started

Ken Krugh's picture

Hi Guys,

I'm combining a client's series of fonts into Unicode/Opentype and I'm wondering about the best way to get started. The fonts were originally created in Fontographer 3.5 and were used primarily as Type1 with some occasional use in TrueType.

I'd planned on opening the one of the fonts and moving the glyphs from the other fonts into that one font. The Ascent and Descent values for the various fonts do differ a bit, not sure whether that matters.

Opening the Type1 in FontLab 5 the Ascent and Descent values are different from what was in Fontographer. Opening the TrueType they seem to match. Am I better off using the TrueType as a starting point?

Some of the glyphs have stacked accents. Is that something I need to be concerned about beyond setting an acceptable Ascent value?

Does anyone know of a good source for further information regarding the PUA? Are there any even loose "standards" in existence?

Many thanks for any insight that can be offered.


oldnick's picture

I'd planned on opening the one of the fonts and moving the glyphs from the other fonts into that one font

You can save yourself a lot of time--and get some sleep: 4:45 am?--by opening both fonts and using the TOOLS > MERGE FONTS function.

I doubt that the differing metrics will cause too many problems because the baseline will remain the same. If the final product is going to be a CFF-flavored font, you're probably better off sticking with the Type 1 originals. And the accents shouldn't figure in setting your Ascent values, but will affect the bbox values, which will be calculated automatically.

Ken Krugh's picture

7:45 my time, not so bad. :o)

That was my feeling as well, but I was getting a bit nervous as someone else (NOT in this forum) mentioned having trouble doing just this. They were, however, talking about Fontographer some 12 years ago.

Yes, it will be a CFF.

Yet another issue just came up today. In the separate Type 1 fonts things were done like making the punctuation in the Greek font just a smidge heavier to better hold with the Greek. Is there anyway with Open Type to know, for instance, that Greek is being used and to go get an alternate character? I'm thinking probably not.

Thanks for your time oldnick.


charles ellertson's picture

Best I remember, when you merge fonts, if there is a name conflict, you get A.1 (or something) for the "A" merged in. Now, suppose in your Greek font that you have Alpha in the "A" position -- with the name "A" (PostScript Type 1 goes by names). If you used any of the layout programs with hard encoding (Quark, PageMaker, etc.), it had to be that way -- "Alpha" wasn't in the encoding vector.

In the new, program-merged font, it will be named "A.1", and no Unicode number will be assigned. Etc. I also had trouble with the kerning. Best I remember, when a character was in the leading position, all the kerns made it, but when it was in the trailing position, no. No idea why.

Give it a try. I think your original idea was best, but who knows? Think I was working with 4.2 at the time, FLS5 may do better with the kerning. But so far, no program can guess what a mis-named glyph *should* be. I believe it was one of your posts where I estimated 2-3 days for roman + italic, assuming the kerning and OT features needs aren't too great.

Is there anyway with Open Type to know, for instance, that Greek is being used and to go get an alternate character? I'm thinking probably not.

Sure. It doesn't even depend on language, just glyph names are needed. For example, in a calt feature, you can write something like

sub sigma' space by;
sub sigma' comma by;
sub sigma' period by;

etc. or, make up a class that has as it's members all the things where a preceding sigma would be the final sigma., and write

sub sigma' @clasname by;

Actually, I'd not recommend this for sigma, it is just an example. But does show the importance of having things properly named, either in accordance with the Adobe standard, or (PITA) by using the Unicode number, preceded by "uni" for the glyph name.

Does anyone know of a good source for further information regarding the PUA? Are there any even loose "standards" in existence?

The thing about loose standards is they keep changing. Adobe use to put all kinds of things in the PUA. Believed they have stopped. Characters in the Private Use Area HAVE MO MEANING! If you aren't going to be using a layout program/text editor that can only access characters with a Unicode number, don't give one. If, for some reason, characters have to have a Unicode number, you can use PUA. Use to be they'd be stripped out of a subsequent PDF file, though.

You need a full-blown expert to do this work, and I'm limited to what works with InDesign, where printed books are the final product. You can either spend the time to learn, and it won't be quick, or hire it out.

Sorry for the water on the dreams.

Ken Krugh's picture

Thanks Charles! That's good news regarding the calt feature. I was wondering about that one but hadn't yet had the time to investigate. I can't thank you enough for taking the time to answer these questions and saving me a WHOLE bunch of time. Unfortunately the client is moving forward too fast without knowing or even considering the breath and scope of what has to happen here. Per the usual, I guess.

Oh, and I definitely WILL be copying and pasting to the correct Unicode positions. There aren't THAT many glyphs and I'd rather know everyone is in their happy place. That's turning out to be the least of my worries.

The main use of the font will be with Word to prepare manuscripts and InDesign to do the final layout. One of the things they're looking to do with this, however, is get their content to the web. I suppose I now also have to consider what OpenType features various web authoring/display software is capable of using. For instance if I do a calt function will it even work on a web page?!?! Yet another thing to investigate.

In regard to PUA: If you aren't going to be using a layout program/text editor that can only access characters with a Unicode number, don't give one.

Do you mean don't assign a Unicode value and just give the glyph a name? Seems like assigning a Unicode value couldn't hurt and may help later on with conversions and such. Are there any negatives to assigning the a Unicode value?

Use to be they'd be stripped out of a subsequent PDF file, though.

Do you mean characters in the PUA?!?! That's a bit horrifying! Is that still something to worry about?!?!

Oy. You're right, SOME TIME. Now if I only had it. I'd very much consider hiring you on a consulting basis if you're at all interested. I'd have to check things on my end but let me know if you think that’s a possibility.

And thank you, thank you, thank you again.

Theunis de Jong's picture

Ken, about your special characters in the PUA: the glyphs in the PUA only have a semantic meaning in the exact font they are defined in -- that puts the "P" for "Private" in "PUA".

With regard to special characters, yo have two options:

(a) You assign no Unicode to these characters, but (as Charles said) you can only use them in programs that allow inserting any glyph from any font (as with InDesign's Glyph panel). You probably could use OTF features to call them up (i.e., if you have an alternate double arrow glyph, replace "=>" with that) but then you would be at the mercy of OTF support in the programs you work with.

(b) You assign PUA codes to your special characters. That means you can use them in any program (provided it can handle Unicode text; but you've got to start somewhere), and you don't need to rely on custom features.

But: in both cases you cannot use any text in this font for web content. Characters without a Unicode cannot be represented in HTML, and neither can PUA characters -- a typical web viewer, John W. Browser, who doesn't have your font but sees Times and Arial instead, will either see the PUA characters of Times/Arial in that position (which can be anything else), or the default "I dunno what this is supposed to be" Missing Glyph Squares.

That said: What sort of special characters were you thinking of?
Some limitations can be worked around. Suppose you want to include several different kinds of bullets (only graphic variations, not semantic ones). You could put the variations as 'alternates' under the regular Bullet character, and use the full range with advanced software, or copy them into HTML text, and you'd get the standard bullet.

charles ellertson's picture

Everything I wrote assumes you have kerning data in each of the fonts you're trying to preserve. If you have no kerning data, you have a much easier chore. You also probably have cruddy fonts, but that's a different story.

Pasting a glyph into the correct cell may or may not solve all your problems. As I remember, if you place the glyph in the proper cell, it will have the right Unicode number. It may not have the right name, and in FL, you write features based on glyph names. You can use FL to generate the right name from the Unicode number. Exactly how this affects kerning data depends. If you type in the name by hand (control+G+R), FL will update the character name everywhere IN THAT FILE, depending on which boxes you've checked off. But not in the other files, of course. That means the other file's AFM will no longer have the *KPX ? ? value* you need for you new file.

For that reason, when I work by hand, I make sure the old type 1 font files (well, the .vfb and .afm files) have the names as they will be in the new file. Then cut & paste, both characters via FL, and metrical data in a text editor.

It is almost impossible to list a procedure in detail that will meet your needs and involve no screwups. So, (1) try small files to completion until you get a technique that covers all your needs. (2) save often, and with different file names, so you can backtrack when something goes wrong. It will, and you don't want to start completely over.

Sorry about the "character stripping in the PDF file" -- not well put. They will appear in the PDF, but if you subsequently extract text from the PDF, they may be missing.

You need to look at what constitutes canonically correct Unicode. If, for example, you want a "c" with a dot under and an acute over, canonically correct Unicode is the character "c" followed by U+0323 followed by U+0301 (the latter are combining diacriticals). Now *rendering* this is the job of the layout program / text editor. But the *meaning* is correct in Unicode and for any Unicode-compliant program. I believe, but do not know, that MS Word will render this with no extra work on your part. InDesign will not, without a further OT feature(s), either *mark* and *mkmk*, or *ccmp*. With *mark* and *mkmk*, no extra glyph is needed. Each method has its share of compromises. With *ccmp* you do make up a glyph, and it's name must be uni006303230301, and it is subbed in by a ccmp line,

sub c uni0323 uni0301 by uni006303230301

or, if you've *named* the combing acute "acutecomb" and the dot below as "dotbelowcomb", use those instead e.g.

sub c dotbelowcomb acutecomb by uni006303230301

The replacement glyph is named uni006303230301, (Adobe requirement for PDF) but has no Unicode number.

Or, you put it in the PUA and call if *Fred*. But subsequent use of the files will ignore Fred, he has no meaning.

* * *

No, you can't hire me, and a good thing. There are people who hang out on Typophile who are real experts, and I'm not one, save for making font files that will set with InDesign, for printing books, and where the text can be extracted for latter use. AND always assuming that the files coming in are in a certain form. For example, we have never had a manuscript (file) where the author used language tags, so I don't bother with them. Etc.

All these tips can do is help with the testing you'll have to do. Again, "Save As" often, and make a list of what's in each version, because I guarantee you'll be backtracking some at first.

charles ellertson's picture

BTW, forgot to add: If your Type 1 fonts (roman and italic etc.) have the same encoding, if your programming skills are up to it, you can make a PFA from the PFB, and write a list (program) to rename all the glyphs in both the PFA(s) and the AFM(s). Then open the PFA's in FL.

There were some TeX routines that would do just this, dunno if they're still available.

Ken Krugh's picture

Me again,

I've started to move the characters from our "special" font into the correct Unicode positions. For many of the characters FontLab does a great job of creating the glyphs in one way or another, but for some it can not associate a name with a Unicode. Two instances are a cap Latin Y and lowercase y with macrons.

These characters appear in the Unicode code charts as part of the Latin Extednded-B code set.

Search for "y with macron".

That code set document shows the Unicode number for those characters but also indicates with a conguence symbol that they are made up of the Y and the macron giving the Unicode for both of those.

I don't THINK this is something I need be concerned with as I'm simply putting my glyph (NOT a composit) in the Y macron Unicode position. However, during this process every time I turn around there is something else I don't yet know/understand so I'm posing the question here. OK to simpy put my Y with a macron glyph in the Unicode position?

Thanks again, and again, and again.

DTY's picture

It's fine to put them in their proper Unicode position. The name issue is because the Adobe glyph list doesn't contain standard names for Y and y with macron. You should give them names in the form uni0232 and uni0233.

charles ellertson's picture

Isn't this fun? -- Actually, I enjoy it when not under time pressure. FL will help you with names. The *Rename glyph* pallet has two green things off to the right of the place you enter the name and Unicode number. If you have one filled out, try clicking the greenie of the other. If FL is to assign the name, a further menu setting is needed. Use the Adobe name WITHOUT Afl(whatever) -- nothing wrong with the "Afl (or something similar), but it will drive you nuts writing features, which are name-based in FL.

This trick is esp useful if you know the name but not the Unicode number. No pawing through the book trying to fnd the number. Try this. Cut and paste an alpha somewhere in the PUA region. Bring up "rename glyph" (control G R on a PC). Type in "alpha" in the name spot. Click on the green button next to *Unicode*, and voila, FL will assign the right Unicode number. Further, it checks to insure you haven't already used either that name or number. And it will "move" the character to the correct position (that's a visual move only, glyph order (index) is set differently). Now, if you've already got an alpha in your new font, this won't work, because FL is checking. If you've already done "alpha", delete it & try this trick. As always, save the file before, don't save after, unless all the metrical data will come later.

I usually work this way, (1) when doing things by hand and (2) I know the name. All this is "a faster way to use the program" not "must do it this way."

Ken Krugh's picture

Thank you archaica. Happily that's what I settled on, which makes me think I'm getting somewhere with all this!

Charles, your "Isn't this fun?" brought a smile, and I agree, if it weren't for the rediculous time crunch this would be an interesting task.

Thanks again,

Jens Kutilek's picture

Charles, for a non-expert your advice is pretty solid :)

Ken Krugh's picture

I'm guessing Charles is just modest.

Well, I'm back with more questions, nothing is every easy I guess :o) I thought the PUA was just a place to put anything and the lists the Private Use Area and supplementary areas A and B. Makes sense, just more places to put stuff.

However! In FontLab 5, when viewing in Unicode mode, there are 8 subcategories: "Windows Symbol Font Area," "Apple Thai Addition" and so on.

What the heck are those?!?! I was going to simply start at the beginning of the PUA and work my way through but should I avoid those positions?


Thanks again,

Ken Krugh's picture

Hmm, sorry, just had another thought.

Any reason not to skip some positions within the PUA so I can always be assured to have room for similar characters that might get added later?

For instance, I could start the PUA with various Greek alphas, skip a bunch of positions for possible later use and put my various Greek epsilons together, skip more and do the omicrons and so on...

I'm thinking that this would make the similar characters group together in things like InDesign's glyph menu.

Does anyone know of any potential problems with that?

Thank you again,

Theunis de Jong's picture

That *is* a good idea, and I don't think it should cause any problem at all.

As for the ID Glyph panel order: you can toggle between Unicode and Glyph id sorting. The former is useful if you know (roundabouts) what you are looking for, but the second one can also be very useful -- if the font designer anticipated its use, that is. I used Glyph order (I think FL calls it Character Index) to group 'similar' phonetic characters together, whereas in Unicode they are all over the font.

You are already free to choose any order in your PUA, but you could use this to offer an additional character sort.

Ken Krugh's picture

Thanks Theunis, I hoped as much.

Any idea what those sub categories are that FL lists under the private use area?

All Best.

DTY's picture

Various companies and organizations have internal standards for placement of glyphs in the PUA. Some of them are listed in the FontLab interface; others aren't. Nothing's going to break if your font overlaps with those assignments. The main potential issues are (1) if somebody changes the text to some other font, they might get a different character, which is an unavoidable issue anyway with PUA slots, and (2) if you ever need your font to be compatible with, say, the Medieval Unicode Font Initiative internal standard, then you may find you have conflicts.

Ken Krugh's picture

Yeah, I considered trying to see if Adobe and/or Microsoft have internal PUA standards but I'm not sure it's worth it.

Thank you archaica.

charles ellertson's picture

Before I say something stupid, why do you want to put characters in the PUA? And maybe this breaks done to "what applications programs do you need the variant characters available in?"

For example, I put in most of the Grec du Roi ligatures in a font; they can be accessed by any OT program that has a glyph pallet, or one supporting stylistic sets (Does Word support stylistic sets? Are 20 sets enough for you?). I believe that as of today, characters without a Unicode number not accessible in MS Word, but I bet that changes. Won't display on the web, but they wouldn't anyway.

What I know about the Private Use Area can be written in a match book -- as long as what you write is "Don't use it if at all possible" The only reason is that you can't access the characters in Word, or, Unicode has abandoned you, and won't give a codepoint, a situation the medievalists face. How important is that for your fonts, anyway? If they are just stylistic variants, they can be handled better in an OT-compliant program like InDesign in other ways.

BTW, If they do wind up in the PUA, if you name them something like alpha.01, alpha.02 etc, text extracted from a subsequentPDF will show them as alpha's. PDF is name based, and everything after the period drops out. XML-compliant coding is different, as it presupposes Unicode. I believe they will just be gone.

Ken Krugh's picture

Here's another! I finally have the font to a point where I can use it a bit and was using the Unicode Greek. All was great until I notice that the Greek seems quite tight.

Well . . . in the "old" Type1 Greek font the width of the space is just nearly DOUBLE that of the "regular" Latin font with which it was combined. I started with the Latin font and was moving in the glyphs from the Greek font so the width of the space from the Latin font is now being used within the Greek.

My next step is to go and investigate this but I thought I'd take the Lazy route as well and ask here whether it is possible to set up a space within the font that would be used when the Greek characters are being set?



Ken Krugh's picture

Woo-hoo!!! Wrote my first calt feature!

Of course, Word doesn't use it and neither does the web! Oh well, cool anyway.

We're playing with this idea but starting with taking a median value of the two spaces.

Big meeting tomorrow should be interesting to see what the client things of all we've discovered.

Thanks again to all that helped me out with this. Maybe some day soon I'll have enough knowledge to give back a bit.

All Best,

charles ellertson's picture

I wasn't going to answer until I get into work tomorrow (late, have a meeting), but did wonder if you got calt to work with a GPOS routine? I'd worry that if you just substituted a new space character (GSUB), you'd lose justification with InDesign.

Try it and see. If so, I'd try making a class of the Greek characters, and use a space as the context. Then add to the right sidebearing of the Greek letters, as is done with the spacing of capitals (see the cpsp code). But I'm not sure this will work.

Couple of other points: You first goal was to have everything Unicode. You can have that and still have two fonts. Each, of course, would have a separate spaceband character. Not as useful, but would work. And would work with Word and on the web.

Secondly, a number of fonts mix the Latin and Greek alphabets in a single font. What is so odd about your Greek that it needs a wider space?

charles ellertson's picture

OK, I couldn't get the calt to work as outlined above. Less elegantly, you can do it with a class kern,

pos @_greek space 100;

Which as written adds a 100 unit kern between any character that is a member of "greek" and a space.

you'd have to work something out for punctuation, such as a calt or stylistic set that swapped commas, periods, whatever, to get a different set to include in "greek."

Numerous ways to do things with OpenType. One of the pros might even be able to get the calt-sideberaring notion to work.

Jens Kutilek's picture

I could think of several ways to do it, depending on which OT features your applications support.

If your app can use the »Localized Forms« (locl) feature:

feature locl {
script grek;
language dflt;
sub space by space.grek;
} locl;

Or as part of the kern feature, with no extra glyph:

feature kern {
script grek;
language dflt;
pos space <50 0 100 0>;
} kern;

charles ellertson's picture

Ah. I thought one of the experts would have other ideas.

I'll say that none of the manuscripts we get in has ever had language tags, the downside to these approaches. Which doesn't mean authors don't use numerous languages, just that neither they or the editors tag them.

I'm assuming that the font isn't smart enough to apply a language tag simply by filling the in Unicode ranges supported, but I don't know. I figured that too many languages share a common alphabet. As typesetters, we're sure not going to go into the MS and insert language tags.

Just points up how interrelated fonts, editorial, and composition can/have become.

Ken Krugh's picture

Thanks guys!

I think the space value in the Greek was done as it is just to make the Greek set well by default, hard to say though, I wasn't on board at the time.

Good point about the justification and makes perfect sense. I did create a class in to which I would put all the Greek characters. With that I did a very simple calt, shown below. I was mainly fiddling to see if anything was even possible and only checked a single line of type in InD. I bet justification WOULD be an issue.
feature calt {
sub @class4 space ' by space.greek;
} calt;

I'm not sure what you mean by GPOS routine.

We are knocking around the idea of multiple fonts, thanks for the suggestion. Right now I think we're going to simply make some decisions of how things will go, (oh did I ever mention we may put Greek accented small caps in the font as well?) put together the Unicode font and give it to the guys handling the web and see what they say.

Thanks, Jens for chiming in, much appreciated! The Kern feature seems to be a possibly. Does using the "grek" mean I have to define certain glyphs as Greek? Can a glass be used?

With the few features I've been fiddling with I've been opening other fonts to see how they're written. I found the registered features on Microsoft's site which lists what they do but I've not found a place showing syntax for the features. Is that available anywhere?

And last but not least annoying: How the heck does one rename a class in FontLab 5? Creating and filling one is easy, but I've been unable to rename it to something that makes sense.

Thanks a gain for the time and input.

All Best,

charles ellertson's picture

As usual when I answer your posts, I'm sitting at the kitchen table, with no copy of FontLab in the house. So working from memory . . .

I never work on features within the FontLab application. No editing capabilities. Screen area too small. Type too small. Etc.

FontLab lets you write off the features (I believe it is "Save Features". Open up the resulting .fea file in your favorite text editor (I like something with power, like VEDIT or now, EditPad Pro). Rename classes as you want. Treating text as a block, having an all uppercase and all lowercase, etc., tools in a text editor is handy. The features file is just a text file, treat it as such & play around a bit -- just don't wipe out what you need to save by something silly like using the same file name.

Be sure your class definitions are in the feature file (.fea file). Or, if for some reason, you want to keep features and classes separate, write off a class file in the same way. When you do you S&R, if you have two files, obviously you need to do them in both. One reason I prefer everything in one file. Always a compromise.

To import the file, again open the FL OT panel, and click on (I believe) Open Features, read it in, and compile. Then start fixing the inevitable syntax errors you make at first, import again, compile, etc. Or maybe you won't make any errors, who knows?

Be careful about the "generate kerning" or some such line in the OT panel. If your kerning is all within the FL metrics file, that may be OK. After generating kerning the first time, I keep it separate, so after that, if I hit the wrong button, I lose my kerning.

Which brings up a point. There are many ways of working. You better have a plan.

For example, I keep all the fonts we're serious about in their own family subfolder, within a !fontwork folder. Within each individual font-family folder, I keep the .fea files. All class-based kerning is in these .fea files, with a separate ADD.txt file with a copy of any class-based kerning, just in case I hit the wrong button. Been known to do that. I use FLs metrics file only for exception kerning, which is always pairs.

These files within the folder are considered the gospel, rather than the stuff in FontLab. If further work is to be done later, it is done in these .fea files. I'm not particularly advocating this, but trying to make the point that when you go back in a year from now (and you will), you better have a system in place to know what's current, what's gospel. For that reason, I NEVER work in the FL OT features panel. If you prefer that, the only sane thing to do is treat the .fea files you may make as temporary, and delete them once FL will compile the features. Your choice, but decide on a system fairly early, changing you mind involves a painful amount of work.


Along those lines, If you want, you can also do all your OpenType work in Adobe's AFDKO 2.5, using FL only for the glyphs. If you are a serious programmer, that may be the way to go. There are things available in AFDKO not yet available in FL, such as mark and mkmk positioning routines. How you work depends on how the fonts are to be used. For example, I work only on fonts we use to set type. We don't sell fonts. I need a method where I can quickly go into a font, add a character or feature, recompile, and generate the font. Fairly often, one of the comps is waiting for me to get that done, in order to finish a typesetting job. A very different situation from someone who sells fonts as a product.


Ken Krugh's picture

That's what I ended up doing, saving renaming, then reading back in.

I'll have to give the structure some thought and download AFDKO to check it out.

Thanks again,

Syndicate content Syndicate content