Encoding and glyph name questions

PabloImpallari's picture

Hi everyone,
I have a few questions about glyph names and Unicode:

00A0: Should be named uni00A0 or nbspace?
.null or NULL?

00B7: Should be named periodcentered or middot?
Is it the same as 2219/uni2219? Should I include both?

00AF: Should be named macron or overscore?
Is it the same as 02C9/uni02C9? Should I include both?

0394 and 2206: 2206/Delta, 2206/increment or 0394/uni0394?

0130: Idotaccent or Idot?
Is idotaccent (lowercase) used by any language?

015E: Scedilla or uni015E?
015F: scedilla or uni015F?

0218: Scommaaccent or uni0218?
0219: scommaaccent or uni0219?

0162: Tcommaaccent or Tcedilla or uni0162?
And what about 021A/uni021A?

0163: tcommaaccent or tcedilla or uni0163?
And what about 021B/uni021B?

0111: dcroat or dslash?

0237: dotlessj or uni0237?

ff or f_f, fi or f_i, fl or f_l, ffi or f_f_i, ffl or f_f_l?

I've been looking at fonts from different foundries, and everyone seems to be doing it differently...

John Hudson's picture

Generally speaking, you can always use a uniXXXX-format name in preference to one of the Adobe Glyph List names: both will map correctly in Acrobat name parsing. So, for instance, either /uni00A0/ or /nbspace/ is fine. Unless otherwise noted below, this applies to the glyphs you ask about, e.g. S/s with cedilla.
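As a rough illustration of why the two naming styles are interchangeable, here is a minimal Python sketch of how a name parser might resolve either form to the same code point. The function name code_point_for and the AGL_SUBSET table are hypothetical, the table holds only a handful of names mentioned in this thread, and real parsers such as Acrobat's handle many more cases.

```python
import re

# Tiny illustrative subset of Adobe Glyph List names (assumption: the real
# list is far longer; only names discussed in this thread are included).
AGL_SUBSET = {
    "nbspace": 0x00A0,
    "periodcentered": 0x00B7,
    "macron": 0x00AF,
    "Scedilla": 0x015E,
    "scedilla": 0x015F,
    "dcroat": 0x0111,
}

# "uni" followed by exactly four uppercase hexadecimal digits.
UNI_NAME = re.compile(r"uni([0-9A-F]{4})")

def code_point_for(glyph_name):
    """Return the code point a simple glyph name maps to, or None if unknown."""
    match = UNI_NAME.fullmatch(glyph_name)
    if match:
        return int(match.group(1), 16)
    return AGL_SUBSET.get(glyph_name)
```

Under these assumptions, code_point_for("uni00A0") and code_point_for("nbspace") both resolve to U+00A0, so either name serves.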

/NULL/ is officially correct, as only /.notdef/ should begin with a dot, but in practice it doesn't seem to make any difference.

Unicode, following earlier encodings, conflated the spacing macron and the APL overbar characters. Designwise, I tend to treat this as a macron.

U+02C9 is properly a high tone modifier sign, and may have different vertical alignment than the macron; it is a character that really only makes sense in the context of some phonetic transcription systems. I don't know why Microsoft started encoding it in their Latin 1 fonts. It doesn't do any harm, but if you are making a font that actually supports phonetic transcription you may want two glyphs, one for /macron/ and one for /uni02C9/.

The Greek Delta/increment sign is a tricky one, because different software handles them differently, so it is safest to create two glyphs, name them independently, e.g. /Delta/ and /uni2206/, but ensure that they are identical.

/Idotaccent/ is the Adobe Glyph List preferred name. There is no /idotaccent/ character to which such a name would map. Turkish, which distinguishes İ/i from I/ı, requires special case mapping rules.
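To make the Turkish case-mapping point concrete, here is a small Python sketch. The helper names turkish_upper and turkish_lower are hypothetical, and the approach (swapping the dotted and dotless pairs by hand before applying the default rules) is only one simple way to approximate locale-aware casing, not how any particular application does it.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı

def turkish_upper(text):
    """Uppercase with Turkish rules: i -> İ and ı -> I."""
    swapped = text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I")
    return swapped.upper()

def turkish_lower(text):
    """Lowercase with Turkish rules: İ -> i and I -> ı."""
    swapped = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return swapped.lower()

# Python's default (locale-independent) lowercasing of İ yields "i" plus a
# combining dot above (U+0307), not a single precomposed character.
default_lower = DOTTED_CAPITAL_I.lower()
```

The default result is two code points, which fits the remark that there is no single lowercase character for an /idotaccent/ name to map to.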

I now recommend including both T/t with cedilla and with commaaccent, and using the uniXXXX format names to ensure they are distinguished. The reason for this is that although the T/t with cedilla does not correctly occur in any orthography, when software uses the old Romanian encoding and does not apply locale-specific glyph shaping, users prefer both the S/s and T/t to display with the same incorrect cedilla mark rather than for one to display with the cedilla and the other with commaaccent.
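For reference, a short Python sketch that derives the uniXXXX names for the eight S/s and T/t glyphs involved. The helper uni_name and the table name are hypothetical; the code points are the standard Unicode ones for these letters.

```python
# Unicode code points for the cedilla and comma-below forms of S/s and T/t.
ROMANIAN_RELATED = {
    "S with cedilla": 0x015E,
    "s with cedilla": 0x015F,
    "T with cedilla": 0x0162,
    "t with cedilla": 0x0163,
    "S with comma below": 0x0218,
    "s with comma below": 0x0219,
    "T with comma below": 0x021A,
    "t with comma below": 0x021B,
}

def uni_name(code_point):
    """Build a uniXXXX glyph name from a BMP code point."""
    return "uni{:04X}".format(code_point)

# One unambiguous glyph name per character.
glyph_names = {label: uni_name(cp) for label, cp in ROMANIAN_RELATED.items()}
```

Because each name is built directly from the code point, the cedilla and commaaccent glyphs cannot be confused with each other.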

/dcroat/ is preferred.

re. the ligatures. If you want a name-parsed ligature glyph to map back to the underlying letters, then use the _ convention; this is generally the best solution if a font contains only one set of ligature glyphs. If you use e.g. /fi/ that will map to the alphabetic presentation form character, rather than to the underlying letters.
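The difference between the two ligature naming styles can be sketched in Python as follows. The function ligature_text and both tables are hypothetical and cover only the f-ligatures asked about; the presentation-form code points are the standard U+FB00 to U+FB04.

```python
# Component letters used by the underscore convention.
LETTERS = {"f": 0x0066, "i": 0x0069, "l": 0x006C}

# Single-name ligatures map to alphabetic presentation form characters.
PRESENTATION_FORMS = {
    "ff": 0xFB00,
    "fi": 0xFB01,
    "fl": 0xFB02,
    "ffi": 0xFB03,
    "ffl": 0xFB04,
}

def ligature_text(glyph_name):
    """Return the text a ligature glyph name would map back to."""
    if "_" in glyph_name:
        # f_f_i -> "f" + "f" + "i": the underlying letters are recovered.
        return "".join(chr(LETTERS[part]) for part in glyph_name.split("_"))
    # fi -> U+FB01: a single presentation form character.
    return chr(PRESENTATION_FORMS[glyph_name])
```

With this sketch, ligature_text("f_i") gives the two letters f and i, while ligature_text("fi") gives the single character U+FB01, which is the behaviour that makes the underscore form better for text search and copy.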

Since the only major software that cares about glyph naming conventions is Acrobat, my recommendation is to examine what recent Adobe fonts are doing, and follow that approach. Note, however, that Adobe try to provide one-to-one mappings, so include duplicate glyphs for e.g. smallcaps that map to caps -- e.g. /A.c2sc/ -- and smallcaps that map to lowercase -- e.g. /a.smcp/.
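The suffixed names work because name parsing ignores everything from the first period onward, as described in the Adobe Glyph List specification. A minimal Python sketch of that one step follows; the function name base_name is hypothetical.

```python
def base_name(glyph_name):
    """Drop everything from the first period onward, keeping the base name."""
    return glyph_name.split(".", 1)[0]

# Two small-cap glyphs that may look identical but map to different characters.
from_caps = base_name("A.c2sc")       # maps back to uppercase A
from_lowercase = base_name("a.smcp")  # maps back to lowercase a
```

Note that a name beginning with a period, such as .notdef, yields an empty base name under this rule and so maps to no character.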

PabloImpallari's picture

Thanks John, Awesome!

Synthview's picture

Is there any way to correctly map Tcommaaccent and Tcedilla to their respective Unicode values in FontLab?
It seems that FL sets 0162 to Tcommaaccent, while the correct Unicode value is 021A.

Bendy's picture

Yes, there's a bug in the alias.dat file. See this thread.
