Combining diacritical offsets

Ray Larabie's picture

When designing zero-width combining diacriticals, how do I determine how far to the left they should be placed?

Seymour Caprice's picture

I use John Hudson's method and it seems to get the desired effect:

1. Design the spacing diacritic mark on the width of the lowercase o.

2. For the combining diacritic mark use the spacing mark, then drag the left sidebearing onto the right.

The result is a combining mark that will give good position over a preceding a, e or u -- and perfect position over the o.
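As a rough sketch of step 2, and only as I read it: dragging the left sidebearing onto the right is the same as shifting every outline point left by the advance width and then setting the advance to zero. The coordinates and the 560-unit width below are made-up numbers, and `make_combining` is a hypothetical helper, not a FontLab function.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Contour = List[Point]


def make_combining(contours: List[Contour], advance: int) -> Tuple[List[Contour], int]:
    """Shift a spacing mark left by its advance width and zero the advance."""
    shifted = [[(x - advance, y) for (x, y) in contour] for contour in contours]
    return shifted, 0


# Hypothetical acute drawn in a cell as wide as the lowercase o (560 units).
spacing_acute = [[(230, 560), (330, 740), (410, 740), (290, 560)]]
combining_acute, new_advance = make_combining(spacing_acute, advance=560)

print(combining_acute[0][0], new_advance)  # (-330, 560) 0
```

The whole outline now sits to the left of the origin, so when the mark follows an o it lands where the spacing mark sat inside the o-width cell.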

Ray Larabie's picture

I hope I wasn't far off: I was setting the left "neighbor" in FontLab as an o and aligning to that. (checks) Yup, same result.

I don't have much experience using combining diacriticals. When a font is used, how does the resulting accent end up in the correct spot over a dotless i, a capital W, and so on?

Ray Larabie's picture

What happens to below diacriticals in italics? Should I offset them as if they were under an italic o, in which case the below accents end up a bit further to the left, or should they align with their upper counterparts? For example: uni030A (ring comb) vs. uni0325 (ring below comb)

Theunis de Jong's picture

There is no perfect way, and (hopefully) people working with your zero-width diacritics are aware of that. Unless you make your font monospaced, of course! ;-)

Theoretically, lower diacritics in italics should be offset a bit to the left, just as the upper ones should be offset to the right. I use something like an old-style 'phi' character -- an 'o' with a slightly slanted vertical line through it -- to visually align diacritics. But that's only going to look perfect on the o's, slightly less so on a, e, g, n, and u, and like crap on an i, f, m or w; it's the same here as with the roman accents, only angled slightly.

Theoretically (again), you should use the OpenType "mark" and "mkmk" features, which are designed to deal with exactly this; but I find programming them incredibly difficult, and the output software I use (Adobe InDesign) still doesn't support these features to the level I need it to.
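For what it's worth, the arithmetic behind "mark" attachment is small: each base glyph carries an anchor, each mark carries a matching anchor, and the layout engine moves the mark so the two coincide. A minimal sketch of that idea follows. The anchor coordinates are invented for illustration and `mark_origin` is a hypothetical helper, not part of any font tool.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]


def mark_origin(base_anchor: Point, mark_anchor: Point) -> Point:
    """Where the mark's origin lands, in the base glyph's coordinates,
    when the mark's anchor is made to coincide with the base's anchor."""
    return (base_anchor[0] - mark_anchor[0], base_anchor[1] - mark_anchor[1])


# Hypothetical "top" anchors on three base glyphs (font units).
top_anchor: Dict[str, Point] = {
    "o": (280, 500),
    "dotlessi": (130, 500),
    "W": (480, 700),
}
# Hypothetical anchor on a combining acute drawn left of its origin.
acute_anchor: Point = (-280, 500)

for name, anchor in top_anchor.items():
    print(name, mark_origin(anchor, acute_anchor))
```

With these numbers the mark's origin over the o comes out at (560, 0), which is simply the end of a 560-unit-wide o, so nothing moves. Over the dotless i and the W the mark is shifted to (410, 0) and (760, 200), which is the per-glyph correction a fixed zero-width offset cannot give.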

Theunis de Jong's picture

As for this:

When a font is used, how does the resulting accent end up in the correct spot over a dotless i, a capital W, and so on?

these are the worst-case scenarios, so I made them as precomposed characters with "the usual accents" and used "rlig" to look them up (I think nowadays I'd probably use "ccmp"; only on principle, 'cause I don't think it really makes that much of a difference).

charles ellertson's picture


I do what Theunis does, except I always make up a precomposed character (and use ccmp). Remember to name the precomposed character with base names -- the *i* rather than the *dotlessi*. So, for example, an i with a macron and acute could be named uni012B0301, or uni006903040301. There would be no Unicode index assigned.

The two possible names bring up one weakness in the system. AFAIK, there is no Unicode requirement to use one or the other. Since you never know what the user will do, safest is to have, in the ccmp

sub i uni0304 by uni012B;
sub uni012B uni0301 by uni012B0301;

(and remember that in FontLab, for example, uni0301 might be named "acutecomb", in which case the second item would be *sub uni012B acutecomb by uni012B0301;*)

which covers the two ways a user could enter the character. If you have a lot of these though, writing the ccmp feature can get pretty complex. See, for example, the ccmp for SIL's Charis.
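When there are many such combinations, generating the rules from a table is less error-prone than typing them. Below is a minimal sketch that prints a feature block for the i + macron + acute case. The helper names `uni_name` and `ccmp_feature` and the lookup names `compose_1`, `compose_2` are my own. Putting each rule in its own lookup is an assumption on my part: the idea is that the second rule should see the output of the first, and I have not checked how any particular compiler would treat the two rules if they shared a lookup.

```python
from typing import List, Sequence, Tuple


def uni_name(*codepoints: int) -> str:
    """Build a 'uniXXXX...' glyph name from one or more code points."""
    return "uni" + "".join(f"{cp:04X}" for cp in codepoints)


def ccmp_feature(steps: Sequence[Tuple[List[str], str]]) -> str:
    """Emit one lookup per (input glyphs, output glyph) pair, in order."""
    lines = ["feature ccmp {"]
    for n, (inputs, output) in enumerate(steps, start=1):
        lines.append(f"    lookup compose_{n} {{")
        lines.append(f"        sub {' '.join(inputs)} by {output};")
        lines.append(f"    }} compose_{n};")
    lines.append("} ccmp;")
    return "\n".join(lines)


imacron = uni_name(0x012B)
steps = [
    (["i", uni_name(0x0304)], imacron),                       # i + macron
    ([imacron, uni_name(0x0301)], uni_name(0x012B, 0x0301)),  # imacron + acute
]
print(ccmp_feature(steps))
```

The same `uni_name` helper gives the longer spelling of the glyph name too: `uni_name(0x0069, 0x0304, 0x0301)` returns `uni006903040301`.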

Khaled Hosny's picture

I usually make precomposed glyphs for the combinations that are in Unicode and use the 'mark' feature for the rest. No one has complained about 'mark' so far, and fortunately FontForge makes it dead simple to build precomposed glyphs from 'mark' anchors that are already in place, so it shouldn't be hard to support the 'ccmp' scenario if requested.

Igor Freiberger's picture

At first I used precomposed glyphs, but this produces very large fonts if you plan wide language support and include small caps (or, even worse, petite caps and swashes as well). So I changed my procedure to this:

1. I added all precomposed base+diacritic coded in Unicode according to the language support.

2. Troublesome combinations, although unencoded, are also added as precomposed glyphs to produce proper design.

For example: open E with ogonek has no codepoint in Unicode. But the ogonek connection is rarely good when the base open E is combined with the 'generic' ogonek, so I made a precomposed glyph (sample). The same goes for overlines and underlines, whose length should match the width of the base letter. This is not possible when combining 'generic' over/underlines with base glyphs, so I added these as precomposed glyphs.

Other tricky diacritics include slash, cedilla, horn and hook.

3. Remaining combinations will be done with mark/mkmk features and some contextual kerning (which can be set in the mark/mkmk code). As this feature is not supported by FontLab 5, it needs to be done with another tool (as Khaled does), or you wait for FL6.

Typical candidates for contextual kerning are base letters narrower than the diacritics, such as |f|i|j|l|r|t|.

4. Further improvement can be made with variations in the combining diacritical marks: different glyphs for uppercase and lowercase use, for over and below positions, and also for stacked diacritics (for example: dieresis and circumflex would be combined in a precomposed glyph where their design is changed to get a better result).

Theunis de Jong's picture

Charles, on your example (i + macron, zw acute):

... Since you never know what the user will do, safest is to have, in the ccmp

sub i uni0304 by uni012B;
sub uni012B uni0301 by uni012B0301;

I'd have to have my OT code in front of me to check, but wasn't it possible with ccmp to first decode an i_macron glyph into a 'loose' i and macron, then recode them as i macron acute to get your singular glyph "i with macron and acute"?

charles ellertson's picture


As I remember, you can't sub many from one in OT features. But even if you could, you would lose your imacron -- unless you meant to break it apart, do all your combining with other accents, then rebuild it -- which would be just as much work.

At our shop, we solve the issue by running a script on an incoming manuscript, so every Unicode character with a codepoint is so encoded, even if the author used combining accents in the manuscript. That means I know imacron is already a single character. But a customer using a script isn't something a font designer can count on . . .
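I can't show the shop script itself, but the effect described is close to what Unicode NFC normalization does, so here is a minimal sketch using Python's standard `unicodedata` module. Treating the script as plain NFC is my assumption, and `precompose` is a hypothetical name.

```python
import unicodedata


def precompose(text: str) -> str:
    """Replace base + combining sequences with precomposed characters
    wherever Unicode defines one (NFC normalization)."""
    return unicodedata.normalize("NFC", text)


typed = "i\u0304\u0301"  # i, combining macron, combining acute
fixed = precompose(typed)
print([f"U+{ord(ch):04X}" for ch in fixed])  # ['U+012B', 'U+0301']
```

The i and macron collapse into U+012B, and the acute stays as a combining mark because Unicode has no precomposed i with macron and acute. That leftover pair is exactly what the second ccmp rule has to catch.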

Ray Larabie's picture

Thanks, everyone. It's all very helpful.
