the Hell of ligatures..

lama su's picture

Hi!

I have a problem: I am working with FontForge on a font for a very complex artificial language (with hundreds of glyphs) that needs a lot of complex ligatures (for example, the superposition of two or more glyphs).
Something like Tibetan, if you like.

I am encoding these glyphs in the Unicode Private Use Area, so, for instance, I have the following ligature: uniE2C2 + uniE2C2 = uniE2C3

To compose them I use the ccmp feature.
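As a rough illustration of what the ccmp rule above asks a layout engine to do, here is a minimal Python sketch. It is only a model of ligature substitution over glyph names, not FontForge's or any shaping engine's actual code; the function name `apply_ligatures` and the `LIGATURES` table are made up for the example.

```python
# Hypothetical model of a ligature substitution such as the ccmp rule
# uniE2C2 + uniE2C2 -> uniE2C3. Not a real shaping engine.

LIGATURES = {
    ("uniE2C2", "uniE2C2"): "uniE2C3",
}


def apply_ligatures(glyphs, rules=LIGATURES):
    """Scan left to right, replacing the longest matching glyph sequence."""
    out = []
    i = 0
    # Try longer sequences first so multi-glyph ligatures win over shorter ones.
    lengths = sorted({len(seq) for seq in rules}, reverse=True)
    while i < len(glyphs):
        for n in lengths:
            seq = tuple(glyphs[i:i + n])
            if len(seq) == n and seq in rules:
                out.append(rules[seq])
                i += n
                break
        else:
            # No rule matched at this position: keep the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out


print(apply_ligatures(["uniE2C2", "uniE2C2", "uniE2C2"]))
```

With three uniE2C2 glyphs in a row, the first two are merged and the third is left alone, which is what an application that applies the feature would be expected to show.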

Now, on my Mac these ligatures work perfectly, but if I put two uniE2C2 in a text editor on Windows (like Word, or even OpenOffice) they don't form any ligature...

My problems are:

Why?

What do I have to do?

Is it possible that it is due to the Unicode Private Use Area?
And given that these ligatures are for an artificial language and an artificial script (which therefore, obviously, doesn't appear in the FontForge lists), what do I have to set in the language options? Default? Or Martian?

Thanks in advance

Fontgrube's picture

OpenType features are not supported by every application. AFAIK Word 2010 (currently in beta) should support them, but no earlier version does. OpenOffice.org does not give them much priority in its further development.

Andreas

PS: Maybe the Martians should simplify their writing system when going interplanetary. The last Martian invasion was stopped by a flu virus; the next one will be prevented by Microsoft's lack of OT support (SCNR).

lama su's picture

Uhm... OK for Office, so I have to get Office 2010 to check.

But about OpenOffice, a strange thing happens: my font works well in OpenOffice for Mac, but it doesn't work in the same version of OpenOffice for Windows. Is that possible/normal?

Is there really a double problem, with MS Office and with Windows in general?

And I was thinking: given that Office is able to support ligatures for some scripts (Arabic, Devanagari, ...), is there a way to "cheat", leading the text editor to treat my glyphs in the Private Use Area as Indic or Arabic glyphs, with their respective ligatures?

Thanks!

P.S. "Independence Day" fan? :-)

Michael_Rowley's picture

'I try to put two uniE2C2 in text editor on windows (like Word, or even openoffice) they don't make any ligature'

Word will insert any character for which there is a glyph, even if the Private Use Area has to be used, provided you use the right procedure for Unicode characters. You must type the four hexadecimal digits of the code point (not its UTF-8 bytes) and then press Alt+X (if you have Windows). Of course, there has to be a glyph in the font. You can also check whether there is a glyph in the font by using Search in Windows Character Map.
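For reference, the digits typed before Alt+X are the hexadecimal value of the code point, which is not the same thing as the character's UTF-8 byte sequence. A small Python check, using only the standard library and the code point from this thread, shows the difference and confirms that the character falls in the private-use category:

```python
import unicodedata

ch = chr(0xE2C2)  # the PUA code point used in this thread

code_point_hex = format(ord(ch), "04X")      # digits of the code point
utf8_hex = ch.encode("utf-8").hex().upper()  # bytes of the UTF-8 encoding
category = unicodedata.category(ch)          # "Co" means private use

print(code_point_hex)  # E2C2
print(utf8_hex)        # EE8B82
print(category)        # Co
```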

John Hudson's picture

Windows OTL support is driven by the Uniscribe shaping engine, which makes decisions about what script is being used and what features to apply based on Unicode encoding and text analysis. My guess is that Uniscribe simply doesn't make any assumptions about PUA codepoints, and applies no layout features to them. Your report of Mac behaviour suggests that Apple have chosen to apply some (which?) features to PUA codepoints. I can see an argument for either approach. Obviously it is more helpful for you if some generic layout features such as ccmp are applied to PUA codepoints; on the other hand, since a PUA character can be anything, who is to say which layout features are appropriate?

twardoch's picture

John,

it's not even the question of which features to apply, but above all: in which OpenType Layout languagesystem the features should be registered. Uniscribe examines each Unicode codepoint of a string and assigns a known languagesystem to it, and then splits the Unicode string into "runs" that are of the same languagesystem. Then, for each run, it applies the OpenType Layout features registered in the font for the corresponding languagesystem.
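The itemize-then-apply process described above can be sketched in a few lines of Python. This is only a toy model: the code point ranges, the `script_tag` classifier, and the sample `font` table are illustrative assumptions, not Uniscribe's real classification data or API.

```python
# Simplified model of script itemization. The ranges and tags below are
# illustrative assumptions, not Uniscribe's real classification tables.

def script_tag(ch):
    cp = ord(ch)
    if 0x0041 <= cp <= 0x024F:
        return "latn"
    if 0x0600 <= cp <= 0x06FF:
        return "arab"
    return None  # unknown in this model, e.g. PUA code points


def itemize(text):
    """Split text into (tag, substring) runs that share one script tag."""
    runs = []
    for ch in text:
        tag = script_tag(ch)
        if runs and runs[-1][0] == tag:
            runs[-1] = (tag, runs[-1][1] + ch)
        else:
            runs.append((tag, ch))
    return runs


def features_for_runs(text, font_features):
    """Pair each run with the features the font registers for its tag."""
    return [(tag, run, font_features.get(tag, [])) for tag, run in itemize(text)]


# Hypothetical font: ccmp is registered under both 'latn' and 'DFLT'.
font = {"latn": ["ccmp", "liga"], "DFLT": ["ccmp"]}
for item in features_for_runs("ab\ue2c2\ue2c2", font):
    print(item)
```

In this model the two PUA characters form a run with no tag at all, so the lookup finds no features for it, even though the font registers ccmp under 'DFLT'.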

Interestingly, the Windows 7 Uniscribe assigns the "DFLT" languagesystem to U+E000 and to U+F8FF, but no languagesystem to any of the PUA codepoints in-between. So in essence, Uniscribe doesn't "know" *which* languagesystem to apply the features for, and therefore it applies none.

I guess it might make sense to suggest to Microsoft that future versions of Uniscribe should assign the languagesystem "DFLT" to all PUA codepoints, rather than just to the first and the last. But perhaps Microsoft had a good reason to do it the way they have done.

Adam

lama su's picture

Thank you for your answers!

So the problem is in the languagesystem, or rather in the absence of a languagesystem assigned by Microsoft to the PUA... I feared this possibility. :-(

twardoch, can I ask where you found the information about the languagesystem used by Windows 7 Uniscribe?

Is there maybe a table or something like that showing which languagesystem is assigned to which Unicode codepoint?

Because if this is the problem, then I could move my glyphs to some Unicode codepoints assigned to a languagesystem supporting ccmp, right?

And by the way, what about the codepoints beyond the Basic Multilingual Plane, like 0x13000 and so on? Do you know which languagesystem is assigned to them by Microsoft, if any?

twardoch's picture

> can I ask you where have you found the information about the
> languagesystem used by windows 7 uniscribe?

We wrote some simple code that called the new ScriptItemizeOpenType() Uniscribe function on one-character Unicode strings (each being one Unicode character from the Unicode Standard, version 5.2). This call returned an array of OpenType script tags, or basically one OpenType script tag per character. For the PUA characters, Uniscribe returned SCRIPT_TAG_UNKNOWN (0x00000000).

Adam

sergeym's picture

> if i try to put two uniE2C2 in text editor on windows (like Word, or even openoffice) they don't make any ligature...

I can't tell exactly what Word is doing with PUA characters; it may be intentional and hardcoded, or it may be an accidental leftover from older versions. But I see at least one reason for Word developers not to shape them. Windows supports so-called end-user-defined characters (EUDC), used widely in East Asia. They can be associated with all fonts or with a particular font, and will be automatically displayed in place of a PUA character. This is done by GDI automatically. There are good and bad sides to this approach. The good news for an application is that it does not need to do any special processing; EUDC just works. But there is also a problem. EUDC can only be substituted at the character level, which means that once you switch to glyph mode and go to Uniscribe for shaping, EUDCs stop working. So blind shaping of any character is not something Word can afford to do.

> For the PUA characters, Uniscribe returned SCRIPT_TAG_UNKNOWN (0x00000000).

There are two tags associated with a script in Uniscribe. First, there is the one applied by default when you call the old API (ScriptShape or ScriptPlace). This tag may or may not be mandatory. For example, Arabic text should always be shaped under the 'arab' tag. ASCII digits are shaped with 'latn' by default, but can be shaped with any script tag. This is what the script tag returned by ScriptItemizeOpenType means. A specific tag returned means it is enforced, and a function like ScriptShapeOpenType will return an error if another tag is passed. If the returned script tag is SCRIPT_TAG_UNKNOWN, the client can pass any tag based on additional information it has about the document, user settings, etc. For PUA characters the script tag is not enforced, but they will be shaped with the 'DFLT' tag by default.
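The enforcement rule described above can be written down as a small decision function. This is a sketch of the logic only, under the assumption that the description is complete; `choose_shaping_tag` is a made-up name, `None` stands in for SCRIPT_TAG_UNKNOWN, and nothing here calls the real Uniscribe API.

```python
SCRIPT_TAG_UNKNOWN = None  # stand-in for the 0x00000000 value mentioned above


def choose_shaping_tag(itemized_tag, client_tag=None, default_tag="DFLT"):
    """Pick the script tag to shape a run with, per the rules described above."""
    if itemized_tag is not SCRIPT_TAG_UNKNOWN:
        # A specific tag from itemization is enforced.
        if client_tag is not None and client_tag != itemized_tag:
            raise ValueError(
                f"tag {client_tag!r} conflicts with enforced {itemized_tag!r}"
            )
        return itemized_tag
    # Unknown tag: the client may choose; otherwise fall back to the default.
    return client_tag if client_tag is not None else default_tag


print(choose_shaping_tag("arab"))                      # enforced tag
print(choose_shaping_tag(SCRIPT_TAG_UNKNOWN))          # falls back to default
print(choose_shaping_tag(SCRIPT_TAG_UNKNOWN, "latn"))  # client's choice
```

Under this reading, a font whose ccmp lookup is registered under 'DFLT' would be shaped for PUA text as long as the application actually hands the run to the shaper.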

> Interestingly, the Windows 7 Uniscribe assigns the "DFLT" language system to U+E000 and to U+F8FF, but no language system to any of the PUA codepoints in-between.

I am not sure what you mean, but it sounds strange :). All PUA characters are classified the same way in Uniscribe, so they should have exactly the same properties.

Thanks,
Sergey

twardoch's picture

Sergey,

> I am not sure what you mean, but it sound strange :). All PUA characters are classified
> the same in Uniscribe, so should have exactly same properties.

Thanks, it must have been my method of testing for the singular characters that may have been wrong. Basically, I tested one-character strings each time (precisely because I did not want the script-neutral characters to inherit the shaping properties of some neighboring characters in the string). Or maybe I actually skipped the PUA :)
