Diacritics are treated differently in the Arial font depending on whether the base character is Latin or Greek.

Belloc's picture

The figure below was obtained in Word using the Arial font. It shows, first, the LATIN SMALL LETTER A (0061) progressively combined with the diacritics COMBINING ACUTE ACCENT (0301), COMBINING TILDE (0303) and COMBINING OVERLINE (0305). When I try the same combinations with the GREEK SMALL LETTER ALPHA (03B1), one can see that the COMBINING ACUTE ACCENT is placed normally, centered on top of the character α. But once the further diacritics are added, the marks are displaced to the right and upwards. Why does the Arial font do this? What is the purpose of these displacements?

quadibloc's picture

I would guess that this has something to do with making characters like ἢ or ἳ come out right when they are built up from combining forms - for writing polytonic Greek, where the font does not have the prebuilt characters available.

Belloc's picture

Quadibloc

Thanks for your reply. I certainly agree with you, especially because of this paragraph, on pages 44/45 of Chapter 2, "General Structure", of the Unicode Standard:

"Such override behavior is associated with specific scripts or alphabets. For example, when used with the Greek script, the “breathing marks” U+0313 combining comma above (psili) and U+0314 combining reversed comma above (dasia) require that, when used together with a following acute or grave accent, they be rendered side-by-side rather than the accent marks being stacked above the breathing marks. The order of codes here is base character code + breathing mark code + accent mark code. This example demonstrates the script-dependent or writing-system-dependent nature of rendering combining diacritical marks."
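The code order given in the quote (base + breathing mark + accent mark) can be checked at the text level with a small sketch. It uses Python's standard `unicodedata` module; the helper name `compose` and the choice of sample sequences are illustrative assumptions. Note that this only shows Unicode canonical composition of the character sequence, not how any particular font or layout engine draws it.

```python
import unicodedata

def compose(sequence):
    """Apply Unicode canonical composition (NFC) to a string."""
    return unicodedata.normalize("NFC", sequence)

# Order from the quoted passage: base + breathing mark + accent mark.
sequences = {
    "alpha + psili + acute": "\u03b1\u0313\u0301",
    "eta + psili + grave": "\u03b7\u0313\u0300",
    "iota + dasia + grave": "\u03b9\u0314\u0300",
}

for label, seq in sequences.items():
    result = compose(seq)
    codes = [f"U+{ord(ch):04X}" for ch in result]
    names = [unicodedata.name(ch) for ch in result]
    print(label, "->", codes, names)
```

Each of these three sequences has a precomposed polytonic character, so NFC collapses it to a single code point; a font lacking those glyphs would have to build the same shapes from the combining marks.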

But I believe that this does not necessarily imply the displacements shown in the picture above; at the very least, there must be some explanation justifying those displacements.

John Hudson's picture

When you enter lowercase alpha + combining acute, the Microsoft layout engine is automatically mapping this sequence to the precomposed alphatonos character (U+03AC). This is typical behaviour for any Unicode diacritics with canonical decompositions, since it is faster to display the precomposed characters than to go via GSUB or GPOS.
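The canonical equivalence behind this mapping can be illustrated with Unicode normalization. The following is a minimal sketch using Python's standard `unicodedata` module; it demonstrates NFC composition of the code points only, and is not a model of what the Microsoft layout engine does internally. The helper name `describe` is an assumption made for the example.

```python
import unicodedata

def describe(text):
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# alpha + combining acute
alpha_acute = "\u03b1\u0301"
# alpha + combining acute + combining tilde + combining overline
stack = "\u03b1\u0301\u0303\u0305"

nfc_single = unicodedata.normalize("NFC", alpha_acute)
nfc_stack = unicodedata.normalize("NFC", stack)

print(describe(nfc_single))
print(describe(nfc_stack))
```

Under NFC the first sequence becomes the single precomposed character U+03AC, while in the longer sequence only the alpha and the acute compose; the tilde and overline remain as separate combining marks that the font still has to position.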

What your illustration indicates to me is that something in the Arial glyph lookups subsequently breaks this precomposed display contextually when followed by a second combining mark. This would be the case if the intention was to avoid having to define GPOS mark-to-base attachment lookups for every precomposed diacritic, and is something that I have done in Cambria and Brill (but which turns out to cause a problem with a bug in InDesign, acknowledged by Adobe but not yet fixed). I presume whoever did the Arial OTL work for Microsoft used a similar technique. What happens in such a situation is that mark-to-base positioning of the first combining mark to the letter is applied, and then mark-to-mark for each subsequent combining mark.

In your illustration, I reckon what is happening is that the font does not contain correct mark-to-base positioning for the combining acute relative to the alpha, and hence when the precomposed diacritic first displayed is decomposed, that mark ends up in the wrong place. The subsequent marks are positioned relative to the acute correctly, but the whole stack is in the wrong place relative to the letter.
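The effect described here can be sketched with a toy model of anchor attachment. Everything in it is hypothetical: the `Mark` class, the `position_stack` function and all coordinates are invented for illustration and are not taken from Arial or from any real GPOS table. The first mark is attached to an anchor on the base (mark-to-base), and each later mark is attached to an anchor on the mark below it (mark-to-mark).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mark:
    name: str
    attach: tuple  # point on the mark that lands on the anchor below it
    top: tuple     # anchor this mark offers to the next mark in the stack

def position_stack(base_anchor, marks):
    """Return the origin of each mark in a stack.

    The first mark attaches to the base anchor (mark-to-base); every
    later mark attaches to the previous mark's top anchor (mark-to-mark).
    """
    origins = []
    anchor = base_anchor
    for mark in marks:
        origin = (anchor[0] - mark.attach[0], anchor[1] - mark.attach[1])
        origins.append(origin)
        anchor = (origin[0] + mark.top[0], origin[1] + mark.top[1])
    return origins

# Hypothetical anchor data in font units.
marks = [
    Mark("acute", attach=(0, 500), top=(0, 750)),
    Mark("tilde", attach=(0, 500), top=(0, 700)),
    Mark("overline", attach=(0, 500), top=(0, 650)),
]

good = position_stack((250, 500), marks)  # assumed correct alpha anchor
bad = position_stack((400, 600), marks)   # assumed misplaced alpha anchor

shifts = [(b[0] - g[0], b[1] - g[1]) for g, b in zip(good, bad)]
print(shifts)
```

In this model every mark is shifted by the same amount when the base anchor is wrong: the marks stay correctly spaced relative to one another, but the whole stack moves right and up relative to the letter.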

Belloc's picture

John

I can understand what you said here:

"In your illustration, I reckon what is happening is that the font does not contain correct mark-to-base positioning for the combining acute relative to the alpha, and hence when the precomposed diacritic first displayed is decomposed, that mark ends up in the wrong place. The subsequent marks are positioned relative to the acute correctly, but the whole stack is in the wrong place relative to the letter."

But how would you explain this?

One other question, John: what would be the mechanism used by the Arial OTL to combine the GREEK SMALL LETTER ALPHA WITH TONOS (03AC) with the COMBINING TILDE (0303) and the COMBINING OVERLINE (0305) to get this (which seems to be correct)?

What I'm trying to say here is that there seems to be no breaking of the precomposed character 03AC when followed by these diacritics.

John Hudson's picture

I would explain the first new image in the same way as before: it looks like the alphas with individual marks are being rendered using precomposed alphavaria and alphabreve characters from the polytonic set, but that this is somehow being inhibited by the addition of a second mark.

Without analysing the lookup structures in detail, I can't say exactly what is going on.

John Hudson's picture

PS. I see that the same thing happens in Times New Roman, which suggests that this is something particular to how things were done in that generation of MS core fonts.

Belloc's picture

I didn't realize there was a precomposed Greek character canonically equivalent to the combination of the GREEK SMALL LETTER ALPHA (03B1) with the COMBINING GRAVE ACCENT (0300), namely the character GREEK SMALL LETTER ALPHA WITH VARIA (1F70). But I couldn't find in Unicode any character called "GREEK SMALL LETTER ALPHA WITH BREVE".

John Hudson's picture

U+1FB0 GREEK SMALL LETTER ALPHA WITH VRACHY
(vrachy is the Greek word for the short vowel marker)
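One way to find such a character without guessing its name is to compose the base letter with the combining mark and ask for the name of the result. A minimal sketch with Python's standard `unicodedata` module follows; the helper name `precomposed` is an assumption made for the example.

```python
import unicodedata

def precomposed(base, mark):
    """Return (code point, name, decomposition) of the NFC result,
    or None if the pair does not compose to a single character."""
    result = unicodedata.normalize("NFC", base + mark)
    if len(result) != 1:
        return None
    return (
        f"U+{ord(result):04X}",
        unicodedata.name(result),
        unicodedata.decomposition(result),
    )

alpha = "\u03b1"
grave_result = precomposed(alpha, "\u0300")  # COMBINING GRAVE ACCENT
breve_result = precomposed(alpha, "\u0306")  # COMBINING BREVE

print(grave_result)
print(breve_result)
```

The combining mark keeps the name COMBINING BREVE, while the precomposed Greek letters use the Greek terms (VARIA, VRACHY) in their names, which is why a search for "WITH BREVE" finds nothing.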

Belloc's picture

Perverse! I was looking for the name GREEK SMALL LETTER ALPHA WITH BREVE!

That was great John. Many thanks.
