VOLT programming for Malayalam script

jonpinhorn_type's picture

I'm currently in the process of using Microsoft VOLT to programme a Malayalam typeface and I have run into a few problems that I just don't understand how to fix. I'm currently testing my file in Microsoft Word 2010.

1, Malayalam has glyphs known as chillaksharam (chillu letters) which, as I understand it, are formed by inputting a string like this:
na (ന) + virama ( ്) + ZWJ [Control+Shift+1]= ൻ.

Within VOLT, I have included the relevant strings into a post-base substitution lookup and feature. This lookup sits below Akhand and Below-base form lookups that I have created.
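For reference, the input string described above can be listed code point by code point. This is only a minimal sketch using Python's standard unicodedata module; the helper name describe is made up for illustration. It is worth comparing against what the test application actually stores, since Unicode 5.1 and later also encode the chillus as single characters (for example U+0D7B), and a font may meet either form.

```python
import unicodedata

# na + virama + ZWJ: the chillu input sequence quoted in the post.
CHILLU_N_SEQUENCE = "\u0D28\u0D4D\u200D"


def describe(text):
    """Return one 'U+XXXX NAME' string per character (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(CHILLU_N_SEQUENCE):
    print(line)
```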

2, Malayalam uses a rakar glyph that substitutes for ra (ര) + virama ( ്). The substitution itself works, but the glyph isn't repositioned to precede its base consonant. I assume this is the job of the shaping engine, so I do not understand what I'm either failing to do or preventing from happening.

3, Finally, and more generally, I'm unsure which script I should be using: Malayalam, Malayalam v.2, or both? I need support for both Windows and Mac OS X systems, and I often find problems that vary between them. I have been trying to follow closely the specifications laid out in Microsoft's OpenType specification: http://www.microsoft.com/typography/OpenTypeDev/malayalam/intro.htm

Hopefully someone can point me in the right direction.

Thanks in advance

Jonny Pinhorn

Attachment: VOLT_errors.jpg (113.01 KB)

Theunis de Jong's picture

Just to make sure: can you test the same string with WordPad? Its Unicode engine should be the same, but it misses lots of the extra "fluff" Word carries along. (And if I remember correctly it should even work with Notepad.)

jonpinhorn_type's picture

I've just checked: WordPad shows the same rakar and chillu problems. Thanks for your speedy response, Theunis.

John Hudson's picture

Sorry not to have responded sooner; I just noticed this.

The current OT Malayalam spec is somewhat misleading with regard to what the shaping engine actually does. My guess is that eventually the shaping engine behaviour will need to be revised, because it's sort of broken; in the meantime, though, it is possible to make a Malayalam font that works reasonably well, but you have to put all conjunct ligature lookups in the {akhn} feature and not use the {cjct} and {pres} features as you would for other Indic scripts.

In response to your particular questions (using my conventional development glyph names):

1. Chillu substitutions should be the first lookup in the {akhn} feature, processed before any others. The sequence you have is correct: letter + virama + ZWJ. Note that you might also want a ligature for
mNa mVirama ZWJ mRra -> mChilluNRra
which should precede the shorter chillu strings in the lookup.

2. For v.2 shaping, the input for the prescript rakar is
mVirama mRa -> mRakar
and this lookup must be associated with the {pref} feature. The layout engine knows to reorder the output of the {pref} feature to the beginning of the cluster.

3. You should only need to implement the Malayalam v.2 shaping and can ignore the older spec. The v.2 shaping is used in all versions of Windows from Vista on, and Malayalam is also one of the scripts for which Adobe's World Ready Composer implements the v.2 shaping, albeit with some bugs. [See http://www.typophile.com/node/94543 ]
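The ordering advice in points 1 and 2 can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of how VOLT or the Windows shaping engine is actually implemented: it assumes a lookup tries its rules in listed order at each glyph position, and it treats the glyph list as a single cluster. The glyph name mChilluN is hypothetical; the other names are the ones used above.

```python
# Toy model of ligature lookups. Rules are tried in listed order at each
# position, so the longer chillu string is listed before the shorter one.
# "mChilluN" is a hypothetical glyph name used only for illustration.
AKHN_CHILLU = [
    (("mNa", "mVirama", "ZWJ", "mRra"), "mChilluNRra"),
    (("mNa", "mVirama", "ZWJ"), "mChilluN"),
]
PREF = [(("mVirama", "mRa"), "mRakar")]


def apply_lookup(glyphs, rules):
    """Scan left to right and apply the first matching rule at each position."""
    out, i = [], 0
    while i < len(glyphs):
        for pattern, ligature in rules:
            if tuple(glyphs[i:i + len(pattern)]) == pattern:
                out.append(ligature)
                i += len(pattern)
                break
        else:
            # No rule matched here: copy the glyph through unchanged.
            out.append(glyphs[i])
            i += 1
    return out


def reorder_pref(cluster, pref_glyph="mRakar"):
    """Stand-in for the engine's reordering: move the {pref} output to the
    start of the cluster. Assumes the list holds exactly one cluster."""
    if pref_glyph not in cluster:
        return list(cluster)
    rest = list(cluster)
    rest.remove(pref_glyph)
    return [pref_glyph] + rest


print(apply_lookup(["mNa", "mVirama", "ZWJ", "mRra"], AKHN_CHILLU))
# With the shorter rule listed first, the longer ligature can never form.
print(apply_lookup(["mNa", "mVirama", "ZWJ", "mRra"], AKHN_CHILLU[::-1]))
print(reorder_pref(apply_lookup(["mPa", "mVirama", "mRa"], PREF)))
```

In this model the first call yields the single glyph mChilluNRra, the second yields mChilluN followed by a stranded mRra, and the third yields mRakar ahead of mPa, which mirrors the reordering attributed to the layout engine above.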

Malayalam is the most difficult Indic script for which I've had to program in VOLT, not just because of the issues with the shaping engine and spec, but because the orthographic conventions of how Malayalam is written represent a subset of the capabilities of the writing system. In modern Malayalam, conventional preference dictates that some conjuncts must be written with ligatures, some with subscripts, and some with explicit virama, while others are acceptable in more than one form. This means your VOLT programming needs to handle prioritisation of particular conjunct formation using contextual rules. So, for instance, when three letters occur in a conjunct, you need to know which two-letter portion of the conjunct takes preference as a ligature or subscript form and where an explicit virama should be.

I'm afraid I don't have this information in a form that I can publicly share yet.

jonpinhorn_type's picture


On closer inspection, I have found that my Malayalam font has some strange errors that don't make sense to me. Having made the alterations you suggested above, my chillu forms substitute correctly and my rakar form now jumps to its correct pre-base position, but two new issues have occurred.

All but one of my below-base marks substitute correctly on Windows but not within InDesign CS6. Why only the one consonant (mLa) substitutes and not the others is puzzling; I'd have thought it was all or nothing.

The other issue is that I cannot make rakar ligatures in Windows or CS6. For example:
mPa mVirama mRa -> mPra

I've also tinkered with the strings in these ways without success:
'mPa mRakar -> mPra' or 'mRakar mPa -> mPra'

I assumed they have to be held within the AKHN lookup because you stated above that's where all conjunct ligatures should be located?

The Malayalam example within the 'CS6IndicTagTest.pdf' document you kindly shared on Typophile shows neither below-base conjuncts nor rakar ligatures where there is the potential for them; this suggests it's not yet possible to implement these within CS6, as we have previously discussed?


John Hudson's picture

I've not done very much testing of Malayalam in InDesign, so it is entirely possible that there are problems beyond those I noted in my earlier documentation. All my Malayalam work to date has been targeting Microsoft's implementation, and that too with the caveat that the shaping may need to change in future.

You are handling the below-base substitutions for consonants in the {blwf} feature, yes?

I have to admit that subscript -La is an oddity: something I can make work without really understanding why it is done this way. It has its own {blwf} lookup (which isn't unusual, given that most of the subscripts have context exceptions), but then the conjunct ligatures involving subscript -La are handled subsequently in {blws}, rather than in the {akhn} feature with the other ligatures. I first noticed this in Microsoft's Kartika font, but neither anyone at MS nor my Malayalam contact has been able to suggest why this might be so. I'm hoping I might get this cleared up when I look at the Malayalam shaping engine issues in more detail later this month.

When you refer to rakar ligatures, I presume you mean traditional lipi forms. I've not had occasion to try to implement these, as all my Malayalam fonts have been for the reformed orthography. But my assumption would be that these substitutions would need to be performed in the {akhn} feature, before the {pref} feature substitutes the prescript -Ra, because the latter would trigger reordering by the layout engine. So I would expect (using my glyph naming conventions)

mPa mVirama mRa -> mPRa

in the {akhn} feature to work. If it doesn't then

I can probably spare half an hour or so to look at your VOLT source, if that might help.
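The ordering argument above ({akhn} ligature first, {pref} substitution afterwards) can be sketched as a toy pipeline. This only illustrates the reasoning, under the assumption that features are applied one after another in a fixed order; it does not model the real shaping engine, and mKa is a hypothetical glyph name added for contrast.

```python
# Toy model: each feature is a list of ligature rules, and features run in
# the order listed. "mKa" is a hypothetical glyph name used for contrast.
FEATURES = [
    ("akhn", [(("mPa", "mVirama", "mRa"), "mPRa")]),
    ("pref", [(("mVirama", "mRa"), "mRakar")]),
]


def substitute(glyphs, rules):
    """Apply the first matching rule at each position, left to right."""
    out, i = [], 0
    while i < len(glyphs):
        for pattern, result in rules:
            if tuple(glyphs[i:i + len(pattern)]) == pattern:
                out.append(result)
                i += len(pattern)
                break
        else:
            out.append(glyphs[i])
            i += 1
    return out


def shape(glyphs, features=FEATURES):
    """Run every feature in order over the glyph list."""
    for _tag, rules in features:
        glyphs = substitute(glyphs, rules)
    return glyphs


# {akhn} consumes the whole string before {pref} can see virama + ra.
print(shape(["mPa", "mVirama", "mRa"]))
# No ligature rule for mKa, so {pref} produces the prescript form instead.
print(shape(["mKa", "mVirama", "mRa"]))
# If {pref} ran first, the {akhn} rule would never find its input.
print(shape(["mPa", "mVirama", "mRa"], FEATURES[::-1]))
```

In this sketch the first call gives the single ligature mPRa, while the reversed order leaves mPa followed by mRakar, because the virama and ra have already been replaced by the time the ligature rule is tried.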

