The Kingdom of Siam / Thai font diacritics

xensen's picture

The Asian Art Museum in San Francisco is publishing a major catalogue of classic Thai art from the former kingdom of Ayutthaya, to be called The Kingdom of Siam. The desire is for the catalogue to reflect the style of earlier Western writings from around 1700 (in other words, around the time of the Romain du roi and early Franco-Dutch typography). A sample page of one of the models is provided.

The typeface must also accommodate Thai diacritics, which have a couple of peculiarities, notably a backwards cedilla that has to combine with a macron, and a breve-like finishing stroke to some of the glyphs, which also must combine with a macron. I will upload funky approximations of these from a poor-quality source.

I'm prepared to create the special characters if I have to, but I would prefer to find an existing typeface that meets the project needs. To further complicate things, all of my work is PostScript, but some people need to work with the diacritics on non-PostScript Windows machines. Maybe OpenType is a way around this?

Any suggestions much appreciated.

hrant's picture

OK, so now I have a "cultural" question again: if you were forced to decide on a way to reduce the vertical height of a sara_ii/sara_uee followed by a mai ek, what would be least offensive?

hhp

John Hudson's picture

> But there you can see the bad news: the fifth most frequent pair is a top tier riser...

Surely it would be worse news, in terms of impact on linespacing, if such sequences were very rare, i.e. if Thai required tall linespacing only to accommodate rare sequences. The fact that such sequences are so common makes the linespacing justifiable, and your suggested innovations less so.

hrant's picture

No, because mostly you're worried about collisions with sara_u and sara_uu (the bottom accents - phinthu is too rare to worry about). If either the top or the bottom accents were very rare, there would be less of a problem - because you can never totally avoid collisions (unless economy is a non-issue). Think of the accented caps in Latin.

But I guess it's a matter of thresholds, and an accent would have to be very rare to be safely ignorable. Looking at the frequencies, mai_han-akat/mai_tho isn't rare enough either, so fooling only with the sara_ii/sara_uee might be pointless. On the other hand, mai_han-akat might be able to "cradle" a higher accent in a way that sara_ii/sara_uee can't. So you still gain a bit just by addressing those. If you work your way down and start fooling with the worst cases first, and eventually stop at a good threshold, you can gain some room. But it has to appear consistent too, although never really 100%.

hhp

hrant's picture

Or maybe we should just worry about sara_u and sara_uu? Chanop, do you see any decent way of making those two use less vertical room?

hhp

chanop's picture

Here are some rare and not so rare combinations

which sometimes clash with upper vowel+tonemark combination.

I have no idea yet how to make sara_u and sara_uu use less room. They already look a little compressed in Lucida Grande and Tahoma.
They are OK for on-screen reading, but I wouldn't use them for print. I haven't come across (or noticed) a font with compressed vertical proportions that still looks good. BTW Tahoma was one on XP; neither upper vowels, tone marks, nor lower vowels want to be shaped by the Panther engine. It's an AAT vs OTF thingy, I guess.

hrant's picture

Like I was implying, the fact that the 3rd and 4th characters in that image sometimes clash with the lower line should be seen as unavoidable; if they're rare enough it's not worth sacrificing so much economy for them.

BTW, does tho_than ever take a lower accent? That would be the worst one.

> They looked compressed a little bit already in LucidaGrande and Tahoma.

Much more damaging though is the dysfunctional vertical congruence between the two scripts.
Sickly Modernism.

hhp

John Hudson's picture

> Much more damaging though is the dysfunctional vertical congruence between the two scripts.
> Sickly Modernism.


In the case of Tahoma and Lucida Grande you're not looking at modernism, sickly or otherwise: you're looking at the compromises of adding Thai script support to an existing font in which you need to maintain existing vertical metrics. Note that these are both system UI fonts, which means they need to respond to specific technical requirements before typographic or cultural considerations.

hrant's picture

Alignment is a form of Regularity, and Regularity is a cornerstone of Modernism (you could even say of Western thought). I certainly understand that there's an issue with apparent size versus vertical span limits (which is why some people scale down the Latin part of a Thai/Latin system), but making the Thai bo-height match the Latin x-height sacrifices functionality for no good reason. On the other hand maybe you're saying that system (or "UI") fonts have to be set flush (no leading), so I guess it's a special problem. My questions would be: is Lucida Grande really a UI font? Do non-UI Latin/Thai systems usually not suffer from this?

hhp

matteson's picture

>does tho_than ever take a lower accent

AFAIK, there's a separate glyph (typically in the PUA) that is tho than without the lower mark. It combines with sara u, sara uu and phinthu. So it's really not as big of a problem as it may seem.

John Hudson's picture

> AFAIK, there's a separate glyph (typically in the PUA) that is tho than without the lower mark. It combines with sara u, sara uu and phinthu. So it's really not as big of a problem as it may seem.

PUA = bad.
OpenType contextual glyph variants = good.

In my OT Thai fonts, I have variant forms of yo ying and tho than without the below mark, and these are contextually substituted when followed by a low combining mark. The combining mark replaces the normal below mark on these letters.
[Illustration: Yo Ying and Tho Than with combining marks below]
Note: I cannot show my Thai typeface yet, so the illustration uses Linotype's Sukothai.
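
As a rough illustration of the contextual substitution described above, here is a minimal feature-file sketch. It is not taken from the fonts under discussion: the glyph names are AGL-style assumptions, the ".less" variants (drawn without the built-in below mark) are hypothetical, and the choice of the ccmp feature is likewise an assumption.

```fea
# All glyph names are assumptions; ".less" marks the hypothetical
# variants drawn without the built-in below mark.
@BelowMarks = [sarauthai sarauuthai phinthuthai];

feature ccmp {
    script thai;
    # The marked glyph (') is replaced only when a below mark follows it.
    sub yoyingthai' @BelowMarks by yoyingthai.less;
    sub thothanthai' @BelowMarks by thothanthai.less;
} ccmp;
```

Because the substitution happens at the glyph level, the text stream keeps the standard Unicode characters and no PUA codepoints are needed.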

matteson's picture

>PUA = bad.

Yeah, that's what I keep hearing. Obviously this isn't the thread for discussing it - perhaps I'll start a different one. But doesn't the MS shaping engine (and/or the 874 codepage) for Thai access the markless tho than and yo ying from uniF700 and uniF70F? Or is something else going on?

In fact, when I've looked at the Thai fonts that ship with XP, both those characters, and all the variants of the above and below marks, are in the PUA. From F700 to F71A. Is this just for compatibility with non-OT apps?

matteson's picture

Speaking of the thread: how's the project going, Tom?

John Hudson's picture

The current MS Thai engine (Office 2003) handles this glyph substitution in two ways, either by doing a buffered character substitution to the PUA codepoint or by applying OpenType Layout features. The latter is preferable; also one can use OpenType for additional substitutions, not handled on a character-to-PUA basis by the basic engine (e.g. ligation of po pla and mai han-akat and substitution of small stacking forms for tone marks).

The kind of buffered character substitution used in the older approach is the most benign kind of PUA use -- certainly much better than actually using PUA characters in the text stream --, but it is still reliant on wholly private agreement about which PUA codepoints to use. There is nothing to stop another software vendor from specifying a different PUA for the same Thai variant glyphs, in which case your fonts only reliably work in specific, narrow environments.

matteson's picture

John, do the OpenType Layout features work equally well in Macintosh environments? Also, the documentation on Microsoft's Thai shaping specifies the mark & mkmk features—and says that they're required. Are these GPOS lookups absolutely necessary for the shaping to work? Or is it possible to do everything with GSUB?

E.g., can you substitute mai ek (0E48) with its variant at F70A when it follows a consonant—rather than positioning it lower? Or is that inadvisable because it destroys the underlying text stream?

Does this even make sense?

Thomas Phinney's picture

Sadly, no, unless the Mac app completely rolls its own support for the mark and mkmk features. Apple does not yet support any OpenType layout features other than GPOS 'kern' and even that is rather spottily done.

One can do a lot of this via GSUB, although it may be less efficient in some cases (particularly for TrueType fonts, it will bloat the font file size a lot). Assuming the app does things in a sane fashion, GSUB won't destroy the underlying text stream because the app stores the *base* Unicode prior to text processing.

Cheers,

T

John Hudson's picture

You could use GSUB for everything, but GPOS is much more efficient and provides finer control in a greater number of situations.

There are essentially two ways to do Thai solely with GSUB:

1. Make a precomposed glyph for every possible combination of base + mark(s) and use the <ccmp> feature to access these as ligatures.

2. Create a variety of combining mark variants at various heights and different horizontal offsets, and contextually substitute these using the <calt> feature.

The second method is certainly practical, but you don't escape from GPOS completely -- I'm guessing that your question might stem from a desire to do all your work in FontLab, which doesn't support much in the way of GPOS --, because Thai requires contextual kerning for tall vowels following stacks.
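
The two GSUB-only methods listed above could be sketched in feature-file syntax roughly as follows. This is only an illustration: the base and mark names are AGL-style assumptions, and the precomposed glyph and the ".high" tone mark variants are hypothetical. A real font would need many more rules of each kind.

```fea
# Method 1: one precomposed glyph per base + mark(s) combination,
# accessed as a ligature. The target glyph name is hypothetical.
feature ccmp {
    script thai;
    sub khokhaithai saraiithai maiekthai by khokhai_saraii_maiek;
} ccmp;

# Method 2: mark variants at different heights, swapped in by context.
# The ".high" variants are hypothetical glyphs drawn at a raised position.
@UpperVowels = [maihanakatthai saraithai saraiithai sarauethai saraueethai];
@Tones       = [maiekthai maithothai maitrithai maichattawathai];
@TonesHigh   = [maiekthai.high maithothai.high maitrithai.high maichattawathai.high];

feature calt {
    script thai;
    # A tone mark that follows an upper vowel takes its raised variant.
    sub @UpperVowels @Tones' by @TonesHigh;
} calt;
```

Method 1 grows with every base and mark combination, while method 2 grows with the number of heights and horizontal offsets that have to be drawn as separate glyphs.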

John Hudson's picture

Speaking of Thai kerning, note also that for some kerning you want to ignore combining marks, i.e. kerning between base letters when the presence of combining marks above or below has no impact on the desired base-to-base spacing, but at other times you want to kern off the combining marks, e.g. when sara o follows a combination of bo-height base with one or two marks above. So you need to appropriately set the LookupFlag bits in individual lookups.
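
To make the two cases concrete, here is a hedged feature-file sketch with one lookup that skips marks and one that does not. The glyph names are AGL-style assumptions, the base-to-base pair and both values are made up for illustration, and the sketch assumes the marks are classified as marks in GDEF.

```fea
feature kern {
    script thai;

    # Base-to-base kerning: marks are skipped, so this pair still
    # matches when combining marks sit on the first base.
    lookup BaseToBase {
        lookupflag IgnoreMarks;
        pos ngonguthai saraaathai -15;   # hypothetical pair and value
    } BaseToBase;

    # Kerning off a mark: marks are processed, so the above vowel
    # itself is the first glyph of the pair. This matches only when
    # sara o directly follows sara ii.
    lookup MarkToTallVowel {
        lookupflag 0;
        # Shift and widen the tall vowel; the mark keeps zero width.
        pos saraiithai <0 0 0 0> saraothai <80 0 80 0>;
    } MarkToTallVowel;
} kern;
```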

matteson's picture

I've actually done some work with VOLT, but I've found that support for GPOS lookups among different apps is a bit spotty. That drove my initial thought to use GSUB instead. I thought it might enjoy more universal support. Though I could be wrong about that. I haven't done anything with it in a while, but I was following your second option: using <calt>. It seems like that's essentially what the MS shaping engine does with the PUA points.

>when sara o follows a combination

John, I'm assuming from this that you kern sara o with preceding bo-height consonants? And that when above marks are present, sara o needs to shift right to avoid a collision with the vowel and/or tone. Personally, I've never seen sara o, sara ai maimuan, or sara ai maimalai kerned with preceding consonants. I've always assumed (perhaps wrongly) that it was because those three vowels attached to the following consonants. Although it's probably dumb on my part (or overly wishful) to think that phonology/morphology would affect typical kerning conventions.

matteson's picture

Hmm. I think I realized last night on my way home that I was mistaken in my thinking about your kerning, John. I can be pretty thick so, at the risk of derailing this thread slightly...

ko khai | sara o = no kerning, normal letterfit

ko khai | sara ii | sara o = positive kerning to correct the space between the 2 vowels, which would otherwise be too tight

ko khai | sara ii | mai ek | sara o = same kerning as above, but it has to be contextual—i.e., a higher level GPOS lookup(?)—because mai ek sits between the characters that need to be kerned

John Hudson's picture

> ko khai | sara o = no kerning, normal letterfit

Correct.

> ko khai | sara ii | sara o = positive kerning to correct the space between the 2 vowels, which would otherwise be too tight

Correct.

> ko khai | sara ii | mai ek | sara o = same kerning as above, but it has to be contextual

Bendy's picture

Resurrecting this ancient thread...

I'm trying to get to grips with the IgnoreMarks declaration in relation to contextual kerning in Thai. I have kerned the sara ii and sara o. I am wondering whether I can avoid the need to explicitly list contextual kerning triplets by putting in an IgnoreMarks statement to ignore any additional marks above the sara ii. Or would IgnoreMarks mess up the kerning between the sara ii and the sara o?

John Hudson's picture

You can specify groups of marks to ignore in the IgnoreMarks statement, so what I do in Thai fonts is to put all the secondary above marks into a separate group, and just ignore those when kerning between the primary mark and a following ascender.

[I'd forgotten all about this thread. I was surprised to read all that stuff that I'd apparently written.]

Bendy's picture

Thanks John, I was hoping you'd reply. I'm glad the straightforward approach will work. I was also wondering about a third way of doing this by using alternates of the kerning (tall) vowel signs with different sidebearings to substitute after a double stack. I guess there's more than one way to do all this, but the ignore way seems to be the neatest. Thank you.

Bendy's picture

I'm not sure I've understood exactly. This is what I thought would work:

The marks to ignore, alternate versions of tone marks that are smaller and higher:
@tone2 = [t.maiek.alt t.maitho.alt t.maitri.alt t.maichattawa.alt ... ];

Then in the kerning feature:
lookup kern1 {
script thai;
lookupflag IgnoreMarks @tone2;
pos t.voweli t.vowelo 80;
...
} kern1

Looks like I haven't understood the syntax fully, as it's not compiling.

John Hudson's picture

I'm afraid I can't offer much help with AFDKO syntax: I do all my OTL work in VOLT. But if AFDKO follows the font table structure in this regard, the process marks flag would be inclusive, rather than exclusive, i.e. you define a group of marks that you want to be processed, not to be ignored. So, for example, I define a group of 'above_marks_main', which includes only the full-size vowel and tone marks that apply directly to the base letter, and not the small variant tone marks that would sit above the main marks.

I apply the kerning between tall vowels and preceding marks as a DX and Width adjustment to the second glyph, i.e. to the tall vowel, rather than as a width adjustment to the mark glyph. Windows likes glyphs classified as marks to be zero-width, and while it is possible to use a separate GPOS lookup to give width to individual mark glyphs, in pair positioning adjustments they should remain zero-width. This is how it looks in VOLT:

Bendy's picture

Thanks John. The VOLT way you describe then still kerns this pair when there could be secondary marks stored in between? I'll see whether I can figure this out with the FDK, otherwise VOLT may be the way to go.
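
In feature-file terms, the inclusive filtering John describes might look roughly like the sketch below. It assumes a compiler that accepts the UseMarkFilteringSet lookup flag and that all the Thai marks are classified as marks in GDEF. The glyph names are AGL-style assumptions, the class here lists only above vowels for brevity, and the value of 80 is arbitrary. MarkAttachmentType with a class name is an older flag that is likewise inclusive.

```fea
# Marks that should be processed when this lookup is applied. Every
# other mark, including the small secondary tone marks, is skipped.
# Glyph names are assumptions, not taken from any particular font.
@AboveMain = [maihanakatthai saraithai saraiithai sarauethai saraueethai];

feature kern {
    script thai;
    lookup TallVowelAfterMark {
        lookupflag UseMarkFilteringSet @AboveMain;
        # The adjustment goes on the tall vowel (second glyph), so
        # the mark glyph itself stays zero-width.
        pos saraiithai <0 0 0 0> saraothai <80 0 80 0>;
    } TallVowelAfterMark;
} kern;
```

With this flag the pair still matches when a secondary tone mark sits between sara ii and sara o, because that mark is not in the filtering set and is therefore skipped.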
