Another brick in the Unicode wall

Igor Freiberger:

Some questions regarding diacritics, precomposed glyphs, and Unicode; I found only partial answers in other threads:

Unicode philosophy does not admit further inclusion of letters with diacritics if they can be achieved by combining characters already encoded. So glyphs like Yoruba's E acute with dot below will not be included in any future Unicode version: they must be assembled in the client program (text editor) using the base glyph plus combining diacritics.

The best way to get these combined glyphs without a codepoint is the mark feature. But this is still not supported by FontLab and most client programs. Even if we could do it with FontLab, there are kerning problems caused by some diacritics.

Let's say you use the mark feature to build something like ï. Here the diaeresis extends beyond the base glyph's sidebearings, so the kerning must be adjusted.

(1) How should kerning issues produced by diacritics be handled in a mark scenario? Does the font need specific kerning exceptions for each combined glyph whose diacritics produce this effect?

On the other hand, you can simply ignore the mark feature for now and add the precomposed glyphs you need for the desired language support. Typed sequences are defined as substitutions in the ccmp feature and no mark positioning is necessary.

Anyway, these precomposed glyphs have no codepoint, as they are outside the Unicode specification. Without a codepoint, these glyphs suffer some limitations: they cannot be entered with keyboard codes or the Windows Character Map, they cannot be used in find-and-replace commands (except in InDesign CS4+), and they are not searchable in PDF documents. Even keyboard layout editors cannot reach them, since these editors work based on Unicode.

(2) Isn't it reasonable to assign PUA codepoints to these precomposed glyphs? I see this is not ideal and far from Unicode's original idea, but this method was adopted in some fonts. Although I understand this option has drawbacks, don't the advantages pay off?

Finally, combining diacritics are used both by the mark feature and as components of precomposed glyphs. They must have zero width, so they are kept entirely before the zero point. I cannot find any instruction about the position of these diacritics within this negative space. What I conclude from some fonts: the usual way is to align the combining diacritic as if there were an o before the zero point. If you have uppercase variants, align them with O. Like this:

[image: combining diacritics shown aligned over the widths of o and O]
(3) Is this the correct criterion, or are there other issues related to combining diacritic positioning? Actually, when anchors are used to handle components, the position is not relevant. But for mark it seems to be essential.

Sorry for the long post, but I was not able to describe these details more briefly.

John Hudson:

1. If using mark positioning, then you need contextual kern lookups to avoid collisions. For kerning to glyphs preceding the diacritic, you need a pair adjustment that uses the following mark glyphs as context. For glyphs that follow the diacritic, you can kern off the marks, although you need to take into account the possibility of stacked marks and mark order.
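A minimal sketch of such a contextual kern in OpenType feature (FEA) syntax, assuming hypothetical glyph names and an illustrative value: the advance of the preceding base glyph is widened only when a combining diaeresis follows the i, so the mark clears the f's overhang.

```fea
feature kern {
    # Chained contextual positioning: adjust f's advance only in the
    # context f i uni0308 (i followed by combining diaeresis).
    pos f' <0 0 50 0> i uni0308;
} kern;
```

Kerning off the marks themselves for following glyphs works the same way, but, as noted above, stacked marks and mark order multiply the contexts that must be covered.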

As I've expressed elsewhere on Typophile, the interaction of base glyph spacing and positioning (kerning) with mark positioning is the weakest aspect of the OpenType architecture. The OT lookup structure makes this kind of work both arduous and inefficient.

2. Avoid PUA codepoints like the plague. They should never be used for anything with semantic content. The correct way to handle precomposed diacritic glyphs representing combinations of Unicode base+mark(s) characters is to map them with ligature lookups in the 'ccmp' OT Layout feature.
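A minimal sketch of such a ligature lookup in FEA syntax, assuming the precomposed glyph is named with the uniXXXXYYYY convention (base codepoint followed by mark codepoint); the substitution maps the typed Unicode sequence to the unencoded glyph:

```fea
feature ccmp {
    # U+1EB9 (e with dot below) + U+0301 (combining acute)
    # -> hypothetical precomposed glyph uni1EB90301
    sub uni1EB9 uni0301 by uni1EB90301;
} ccmp;
```

Because the underlying text stream remains the plain Unicode base+mark sequence, searching, copying, and PDF extraction all keep working, which is exactly what PUA codepoints would break.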

3. There is no standard offset for marks on their zero widths. I tend to offset them as you have shown, over a medium width vowel such as o, since this ensures that they will have at least reasonable positioning even in software that doesn't support GPOS mark positioning. But it is just as valid to e.g. optically centre them on the zero-width. What I strongly recommend is that you offset them consistently, so that the same anchor or composite position can be used for all marks on a common base glyph.
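The consistency point can be sketched in FEA syntax, with hypothetical glyph names and coordinates: because every mark's attachment anchor sits at the same offset (here drawn over an imaginary o before the zero point), a single base anchor serves the whole mark class.

```fea
# Marks drawn in negative space; all share one anchor position.
markClass [uni0300 uni0301 uni0308] <anchor -250 650> @TOP_MARKS;

feature mark {
    # One "above" anchor per base covers every mark in the class.
    pos base [a e o] <anchor 260 650> mark @TOP_MARKS;
} mark;
```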

Igor Freiberger:

Thanks a lot, John. As usual, you give excellent information.

Applying kerning to marks is really complicated. Even if FL6 supports marks, I'd prefer to use precomposed glyphs – at least for now.

I still have not included my precomposed glyphs in the ccmp feature. After this is properly set up, could the user get the precomposed glyph through a glyph named uniXXXXYYYY? Of course, one needs to use a consistent naming scheme to make this work.

Jens Kutilek:

But this is still not supported by FontLab and most client programs.

Which applications do not support the mark and mkmk features? I know a few which do:

  • TextEdit (tested on 10.6)
  • InDesign (tested: CS3)
  • Word (tested: 2010)
  • Internet Explorer (IIRC)
Igor Freiberger:

Yes, some support it. But many do not: Word (earlier versions), TextEdit (earlier versions), Photoshop, Illustrator, QuarkXPress, Scribus, CorelDRAW, Xara, ACD programs, and most browsers. Also note that older versions are still widely used.
