Glyph names and searching

Ken Krugh's picture

I want to add a t with a cedilla under it to an existing font. Unicode shows that character encoded as 0163 but the font already has a t with a comma accent encoded with 0163 with a glyph name of tcommaaccent.

I'm adding it to the PUA and THOUGHT that I could name it t_cedilla to make search engines find it when the regular t is searched. I've also tried naming conventions of tcedilla, t.cedilla and uni007400B8 (0074 is the t and 00B8 is the cedilla) but to no avail. The tcommaaccent character that was already in the font is found when a t is searched.

I'm exporting from InD CS3 and searching in Acrobat 9.

I THOUGHT I'd done this before in another font, but of course can't find where.

Is there something that the tcommaaccent glyph has associated with it that allows it to be found or should what I've done be working?

Many thanks,

twardoch's picture

I recommend using the name "uni0163" rather than "tcommaaccent". The "*commaaccent" names are not recognized by older versions of Mac OS X (they won't even display on screen if the font is OpenType PS .otf).

For the other glyph, "t_cedilla" should actually work, although "t_uni0327" or "uni00740327" would be better, because U+0327 is the Unicode codepoint of combining cedilla, which should be used in that case. Have you tried not using the PUA? Such glyphs should be accessed through the "ccmp" feature:

feature ccmp {
sub t uni0327 by t_uni0327;
} ccmp;

Also, you might try the glyphname "uni0163.salt" and see what happens.

twardoch's picture

Ah, yes, of course. Sorry, my advice was misguided.

Actually, this should be done completely differently:

T with comma should be named uni021A and have the Unicode U+021A
T with cedilla should be named uni0163 and have the Unicode U+0163

More references:

(Note that the latter threads contain more up-to-date information about best practices.)

And a short summary of the entire discussion, with best-practice recommendations as of 2009, that I posted on the FontLab forum.

Ken Krugh's picture

I saw that wiki entry and admittedly did the forehead slap with a Homeresque "DOH!" in frustration. Unfortunately, I can't change what's already in the font otherwise I'd make it "right."

You're advice wasn't misguided at all Adam, you gave me the answer I needed, the ccmp. I knew about that comma vs. cedilla thing and really wanted to make sure the glyph names I was adding were as "safe" as could be and why they weren't being found in my search of the PDF.

I mentioned thinking I'd done this before, and driving home last night dawn broke over Marblehead and I remembered that the font I was working on at the time had a massive ligature feature to which I had added the accented characters I was building.

I've decided to encode my new glyphs in the PUA and name them using "uni" method. I also added the ccmp feature as you suggest and all is well!!

I had thought it was the glyph name only that made the search thing I mentioned work. But it appears that it's the OT feature that makes it work and that the name really doesn't matter?

Many thanks for all the help,

Ken Krugh's picture

Oy! Next question.

As described above I used a ccmp but when I insert that glyph from InDesign's glyph panel it sees it 2 seperate characters. It's using the glyhph I created but when the text is copied from InD (or a PDF generated from InD!) to something else (Word for instace) I get two characters, the t and the cedilla next to one another.

Is there anything I can do in the font to prevent InD from doing that?

We normally import InDesign tagged text files, and when I use the tagged text code for that character (<0xE2A1>) it imports as the single glyph and as well as copies and PDFs from InD as the single glyph. Which (I guess?) makes sense, right?

Oy! (again) and thanks again.

twardoch's picture

It's two characters because that's the only possible way to encode the text in Unicode. If you use PUA, then from the linguistic point of view, InDesign won't "know" what the letter is. And the letter is t with cedilla. In order to process the text (search, hyphenate, change the case when you tell it to change it from lowercase to uppercase), it needs to store the text using Unicode. And the only way to do this is t followed by a combining cedilla. (That assuming that you don't want to use 0163).

Uppercasing is a good example: when encoded as t followed by combining cedilla, InDesign can change the text to uppercase correctly (the result will be T followed by combining cedilla, which again your "ccmp" could replace with one glyph). But if you use PUA, InDesign won't be able to change case, because "logically" this character will be an "anonymous glyph".


Syndicate content Syndicate content