Zero-Width Combining Diacritics

amv's picture

I've been trying to get a better handle on exactly how zero-width combining diacritics fit into a font. I've been combing through the Unicode chart PDFs, have found their Unicode code points, and have integrated those into my glyph-generating script, but I can't find them actually being used anywhere in the Pro fonts from Adobe. I figured those fonts would be likely candidates to support such a feature, but I can't find any actual example of the marks in use.

Can anyone point me in the direction of fonts that make use of these glyphs? Thanks.

charles ellertson's picture

What are you trying to do? A zero-width combining accent won't automatically position itself over a letter in any layout engine I know -- of course, I don't know that many; mainly InDesign CS2. TeX would center them left-right over a character (not always correct), but not vertically unless you wrote some extra code, & then you could also control the left-right fit.

For OT fonts & InDesign, I make them up when re-working an Adobe font only so that if we get in a text where they have been used (i.e., specified by the Unicode number in the combining diacritics range), we have a good idea which letter they go over. In some texts you might have several such marks around a single letter -- a Latin lower case a with a macron, acute, and ogonek, for example. In this case, before you kern the diacriticals, the macron will overprint the acute, but we can usually determine what's what, and of course, you can always look at the text stream.

If a particular accented letter happens more than about 5 times in the text, I go back in the font & make it up in a precomposed form & write a feature that calls the new character (unless it is already in Unicode). Otherwise, the comp has to kern all of the diacritics by hand.
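The counting step described above can be sketched in a few lines of Python. This is only an illustration, not anyone's production workflow: the function names are made up, the threshold of 5 comes from the "more than about 5 times" rule of thumb, and the text is first normalized to NFC so that only sequences with no single Unicode code point are counted.

```python
import unicodedata
from collections import Counter

def count_marked_clusters(text):
    """Count each base character + combining-mark sequence that survives NFC."""
    counts = Counter()
    cluster = ""
    # NFC composes whatever Unicode can precompose; what remains needs
    # either mark positioning or a made-up precomposed glyph.
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.combining(ch):      # non-zero combining class: a mark
            if cluster:                    # ignore a stray mark with no base
                cluster += ch
        else:
            if len(cluster) > 1:           # previous base carried marks
                counts[cluster] += 1
            cluster = ch
    if len(cluster) > 1:
        counts[cluster] += 1
    return counts

def precompose_candidates(text, threshold=5):
    """Clusters that occur more than `threshold` times in the text."""
    counts = count_marked_clusters(text)
    return sorted(c for c, n in counts.items() if n > threshold)
```

Running `precompose_candidates` over a manuscript gives a list of the letter-plus-accent sequences that might be worth building as precomposed glyphs; everything below the threshold would still be kerned by hand.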

You can put a "mark" with the individual letters -- numerous threads in the build forum on this -- but I've never done it -- it is a lot of work, and I've never found one "mark" point that will work for all the diacriticals. The argument is made that it will make for a smaller font, and I guess that is true. The larger font doesn't seem to be a problem for us, and I, anyway, find it easier to get things right when I precompose the letters.

But that's our work situation -- a comp shop -- and you might have a different objective in mind.

Thomas Phinney's picture

Although no current Adobe western fonts use mark attachment, we will be using this functionality in the foreseeable future.
(just posted, should be live in a few minutes)



Michel Boyer's picture

The pronunciation of the French word "main" (meaning "hand") is normally represented (using IPA) with an "m" followed by 'latin letter open e' with a tilde (which represents a nasalized open e). In the DejaVu Sans Serif font, the zero-width combining tilde is placed on the open e using marks and anchors. To make sure that no ligature table was being used behind my back, I added a dot to the open e, as a tracer. Then I used XeTeX with the following input:

and here is the output:

The zero-width tilde is properly placed, using the anchors only, with XeTeX.
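For readers wondering what "placed using the anchors" amounts to, here is a rough Python sketch of the arithmetic behind mark-to-base attachment. It assumes left-to-right text, coordinates in font units, and a mark that is drawn at the pen position after the base glyph has advanced; the names and the numbers in the usage note are invented for illustration and are not taken from DejaVu.

```python
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

def mark_offset(base_anchor, mark_anchor, base_advance):
    """Shift needed so the mark's anchor lands on the base glyph's anchor.

    base_anchor is given in the base glyph's coordinates, mark_anchor in
    the mark glyph's coordinates. The mark would otherwise be drawn at the
    pen position, which sits base_advance units right of the base origin.
    """
    return Point(base_anchor.x - mark_anchor.x - base_advance,
                 base_anchor.y - mark_anchor.y)
```

With a hypothetical base glyph 500 units wide whose top anchor is at (250, 480), and a tilde whose own anchor is at (0, 480), `mark_offset(Point(250, 480), Point(0, 480), 500)` gives `Point(x=-250, y=0)`: the tilde is pulled back 250 units so it sits centered over the letter.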


amv's picture

Ahh... perhaps I misunderstood then. I've been under the impression that they had more general purpose use. I suppose that explains why I'm not finding them in any of my own faces.

aric's picture


The official policy of the Unicode Consortium is not to introduce any new precomposed characters when such can be represented by a combination of existing characters plus combining diacritics. This approach has some theoretical advantages, especially for encoding text, but it has a number of practical disadvantages, especially for graphically representing text. The overwhelmingly predominant approach of the font design community has been to ignore combining diacritics completely or else to put in a handful of them without adding the mark attachment information to make them usable. (Plantagenet Novus, in preparation, appears to be a refreshing exception to this rule.)
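A quick way to see this policy in practice is to ask whether a base-plus-marks sequence collapses to a single code point under NFC normalization. The sketch below uses only Python's standard `unicodedata` module; the helper name is made up.

```python
import unicodedata

def has_precomposed_form(sequence):
    """True if NFC normalization reduces the sequence to one code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1

# e + combining acute: Unicode has a precomposed é (U+00E9)
print(has_precomposed_form("e\u0301"))
# open e (U+025B) + combining tilde: no precomposed form exists
print(has_precomposed_form("\u025b\u0303"))
```

Sequences for which this returns False can only be displayed well if the font positions the combining marks itself or supplies its own unencoded precomposed glyphs.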

I work on a daily basis with the Aleut language, which can't be represented in Unicode without combining diacritics. The Aleut community is using a rather disappointing font that replaces the characters [ { ] } with accented characters. I mainly use Charis SIL, which has excellent support for combining diacritics. I may switch to Gentium Basic as my default font when the final release comes out. I'd love to see more fonts that supported complex phonetic transcription and the special characters of Latin-based Native American orthographies. Although at present the market is almost entirely served by free fonts, I'd pay money for good-looking, feature-rich fonts that served my needs as a linguist, and I know several others who feel the same way.

aric's picture


That is wonderful news.

Michel Boyer's picture

The overwhelmingly predominant approach of the font design community has been to ignore combining diacritics completely or else to put in a handful of them without adding the mark attachment information to make them usable.

Segoe (© Microsoft) is another exception.

John Hudson's picture

Segoe (© Microsoft) is another exception.

All the MS core fonts were updated to include combining mark support for the Vista release.

Nick Shinn's picture

approach of the font design community has been to ignore combining diacritics completely

I think the community is aware of them, but you can't do them properly with FontLab, so unless a type designer knows Volt, they're unlikely to get put into a font.

charles ellertson's picture


I’d pay money for good-looking, feature-rich fonts that served my needs as a linguist, and I know several others who feel the same way.

Adobe, at any rate, allows the end user to modify the fonts they purchase, as long as they count the modification as one of the allowed fonts -- from the number of copies that can be installed on different computers, or number of computers served from a server.

If, in Aleut, subbing [{}] (which amount to four characters) comes even close to meeting your needs, it isn't hard to make up the needed characters. They don't get a Unicode encoding, but a name such as uniXXXXXXXXXXXX, where the groups of 4 X's are the Unicode code points of the component characters. Then, add a ccmp feature that tells an OT-savvy application to substitute uniXXXXXXXXXXXX for the string (name1) (name2) (name3).
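As a rough illustration of the naming scheme and the ccmp rule, here is a small Python sketch that builds both from a character sequence. The function names are hypothetical, and it assumes the component glyphs are themselves named uniXXXX; in a real font they may well be called something like "a" or "macroncomb", in which case the left side of the rule has to use those names instead. The output follows the usual feature-file syntax for a ligature-style substitution.

```python
def sequence_glyph_name(text):
    """Build a 'uni' + 4-hex-digit-groups name from the component characters."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError("uniXXXX groups only cover the BMP")
    return "uni" + "".join(f"{ord(ch):04X}" for ch in text)

def ccmp_rule(text):
    """One substitution rule: the component glyphs -> the precomposed glyph."""
    # Assumption: each component glyph is named uniXXXX in the font.
    components = " ".join(f"uni{ord(ch):04X}" for ch in text)
    return f"sub {components} by {sequence_glyph_name(text)};"

def ccmp_feature(sequences):
    """Wrap the rules for several sequences in a ccmp feature block."""
    rules = "\n".join("    " + ccmp_rule(s) for s in sequences)
    return f"feature ccmp {{\n{rules}\n}} ccmp;"

# a + combining macron + combining acute
print(ccmp_feature(["a\u0304\u0301"]))
```

The precomposed glyph itself still has to be drawn (or assembled from components) in the font editor; the script only saves typing the names and rules by hand.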

Once you figure out how to use FontLab -- esp. reading in a font that already has class-based kerning -- it is pretty simple. If I were at work, I'd steal & post an example, but my example would be no better than the examples that come with FontLab.

Since you are only modifying a font, all the other goodies you want are still there when you recompile the features.

I've looked at the beta version of Gentium basic book, which will print well, I think. But it needs ligatures, kerning, and extended numbers (oldstyle & proportional). Charis too is nice -- Charter is a font I use a fair bit for work in the social sciences -- but again, kerning, ligatures and numbers are absent. I think it would be easier to modify a font along the lines I've suggested than to try to add what is needed to Charis or Gentium.

aric's picture

Michel, John, thanks for the information.

Nick, until I read Thomas's blog I wasn't aware that FontLab lacked this support. That does put a damper on things. But the poor support for these characters may also stem from a fear of diminishing returns for the time invested in supporting these features. I'm not in marketing, but I wouldn't be surprised if the demand for these features were not so great, and I certainly don't fault font designers for leaving them out. If linguists and the communities they worked with had chosen more standard characters for their orthographies, a lot of these issues would be resolved. Of course, at the time many of the orthographies were under development, the biggest criterion for new characters wasn't whether they were used in Western or Central Europe but whether you could recreate them on a typewriter. Ugh.

Charles, thanks for the information on modifying fonts to support accented characters. For Aleut, the process you describe is probably sufficient. As far as Charis and Gentium Basic go, they already provide the characters I need for Aleut. I haven't yet tested the limits of Gentium Basic, but Charis supports all the phonetic symbols I need as well as all the characters used by the various Alaskan Native languages. It's not quite on par with Charter in terms of quality, but it's very nice and it's free. I appreciate the info on the Adobe EULA and will bear that in mind as projects come up; one of these days I'm going to learn how to modify a font. But most linguists, and probably most academic presses, indigenous communities, etc., don't have the time, the skills, or the desire to modify fonts, and for them, fonts that provide the necessary characters and features out of the box would be just the ticket.

anagnost's picture


Volt is not the only tool that lets you add combining mark support. In particular, FontForge supports the 'mark' and 'mkmk' features, so many fonts designed with FontForge (like Peter Baker's Junicode or my own Old Standard) position combining diacritics properly.

Nick Shinn's picture

Alexey, perhaps I would have been further ahead in my (as yet unreleased) multi-language fonts if I had developed them in FontForge from the beginning. When I started on them over three years ago I was coming from a background of commercial type design, making Type 1, Latin 1 fonts, first with Fontographer, then with FontLab, on a Mac. So I chose those tools and formats because they are the norms in my line of work, and it's important to stay connected.

Also, I didn't have quite such a clear goal as you did with Old Standard. I had a similar sentiment to you, no doubt--an appreciation for the 19th century Modern style, and its tone of scholarship. But I didn't originally intend to do Cyrillic and Greek, let alone polytonic Greek, it just kept growing! Not because I had a specific market of scholars in mind, but because it seemed an interesting thing to do, and commercial font development is moving in a multi-lingual direction.

I realize my fonts will not have quite as much language support as yours--but they should be reasonably serviceable with their precomposed characters. And they have small caps! The combining accents are in the fonts, but not connected. I think the main thing I should do is release them fairly soon, and perhaps revise them later with "mark" features.
