Hanging polytonic accents

Nick Shinn's picture


To facilitate neat left-column edges in Polytonic setting, it makes sense to build-in the "hang" by giving the accented caps the same metrics as their non-accented counterparts.

However, this takes up a lot of the space preceding a capitalized word in running text.
Therefore, a contextual adjustment should be applied. But what?

1. Substitute a wider space (eg en-space character) prior to accented caps.
2. Add positive kerning between regular space and accented cap.

Some applications, of course, ignore the space character, and substitute a layout-level value. (e.g. CCI newspaper pagination system). How much of an issue is that for setting Polytonic -- can one assume that the users of Opentype Polytonic fonts will have Quark 7 and InDesign CS?

speter's picture

I use TeX to set polytonic Greek, so I would prefer the kerning approach.

Nick Shinn's picture

I'm afraid I don't know much about TeX.
Does it support OpenType features?

speter's picture

The variant XeTeX does.

Nick Shinn's picture

Steve, would kerning vs. en-space substitution make any difference, if the software (XeTeX) supports OpenType substitutions, and the en-space character is included in the font?

If you don't use XeTex, would you consider switching to get access to OpenType features in a polytonic font, such as iota-subscript alternates and historical variants, as stylistic sets?

speter's picture

TeX (and XeTeX) doesn't use the space character for horizontal spacing, so it would make a difference. I use XeTeX on a number of my projects and am moving more and more to it, specifically to get access to the OpenType features.

Nick Shinn's picture


This shows how things line up at left margin, with hanging accents/breathings on the caps.

TeX (and XeTeX) doesn’t use the space character for horizontal spacing.

Then neither of my proposals would work.
#1 relies on replacing a normal space character with a wider one, to "fill out" the space intruded into by cap accents,
#2 relies on adding kerning between the space character and the accented cap.
But if the layout application ignores space characters...

This is frustrating -- I've examined a variety of polytonic texts on the Internet, and they all appear to contain hacks of one sort or another; in other words, they are not properly encoded. This is assuming that spacing can be handled in-font, rather than in-layout, an important principle.

Spacing is an issue for another reason: some typographers use a space, followed by a combining accent, for various purposes. On principle, that's a non-character. There seems to be some confusion over the use of the various comma-shaped "marks".

They are:

U+2019 Mark: Apostrophe
U+0313 Accent: Combining comma above
U+1FBD Breathing: Koronis
U+1FBF Beathing: Psili
U+02BC Breathing: Spacing clone of Psili


I've designed each to be slightly different, as it seems appropriate that the semantic differences should be represented visually.

In this setting:


1. A straight grab from the internet. Note that the apostrophes are represented by a combination of space and combining accent. Also, the breathing on "Hellados" is too close to the preceding nu (because the breathing is designed to hang at column edge).

2. Adjusted for proper spacing.

3. And now with correct apostrophes.

Disclaimer: My Ancient Greek was barely pass-mark, and that was 40 years ago, so I may be off-base...

John Hudson's picture

Nick, according to Yannis Haralambous there is a particular convention of hanging breathing marks on the first line of a paragraph, i.e. it is not a universal setting. I don't know much more than that, but I will direct Yannis' attention to this discussion.

Regarding the form of U+2019, although you would expect the desire to be for the semantically distinct apostrophe sign to have a distinct form from the smooth breathing mark, in fact the convention of Greek typography is that they should be identical. This implies a language-level distinction between the punctuation character employed with Latin text and that employed with Greek.

In your illustration 2 -- 'Adjusted for proper spacing' -- I think the apostrophe is too tightly spaced relative to the rho. It shouldn't give the impression of being over the letter, even a little bit, but should clearly occupy space beside the letter.

Nick Shinn's picture

Thanks John. I noted the need for hanging accents from a 2002 paper by Haralambous, Keeping Greek Typography Alive, in which he stated, "Greek diacritics (breathings and eventually accents) in front of capital letters have a very special behaviour: whenever they are encountered inside a paragraph, they are spacing diacritics ; but whenever the capital letters have to aligned, then diacritics become hanging diacritics. This is the case for the first words of: items in lists, verses in poetry, paragraphs, etc."

I suggest that the hang be built-in to precomposed characters, because there are no Unicode combining breathing characters, and the spacing breathing characters--which have width--will be used for accenting capitals within paragraphs, although strictly speaking that is not Unicode-correct. However what other option is there for the typesetter, if the spacing adjustments that I proposed above don't work in XeTeX etc?

Well, there is one option, and that is to have no built-in hang for accented caps, but for the typesetter to manually apply negative paragraph indenting, or back kerning, to move the accents into the margin.

This implies a language-level distinction between the punctuation character employed with Latin text and that employed with Greek.

And that would be made by the typesetter.
So the best practice is for them to set U+1FBF for the apostrophe, not set a space followed by U+0313.

Yannis Haralambous's picture

I have written a small text (http://www.polytoniko.gr/thessaloniki2007.pdf) presenting accents and breathings I added to Verdana and Georgia. This text has been distributed at the Thessaloniki conference. As you can see on page 3, the diacritics will hang also after an m-dash, and more generally whenever the letter is at the start of some semantic block (table cell, list item, etc.) but not when it is at line begin, in an arbitrary line of the paragraph.

I have been asked to use Stylistic sets for uppercase letters with hanging diacritics, although I find this is not necessarily the most logical choice, but maybe it is the most efficient with respect to compatibility with software. I even wrote a Python script (http://www.polytoniko.org/font-kit.php?trans=yes) which produces them out of regular letters with diacritics.

This rule as well as the one of using the glyph of the smooth breathing for the apostrophe character are rules based on Monotype priniting in the 20th century. There has never been an official specification but it is my personal belief that the most beautiful books of the 20th century have been produced that way, and therefore it is worth studying/adopting these rules.

...pour distinguer l'extérieur d'un aquarium,
mieux vaut n'être pas poisson

...the ball I threw while playing in the park
has not yet reached the ground

Es gab eine Zeit, wo ich nur ungern über Schubert sprechen,
nur Nächtens den Bäumen und Sternen von ihm vorerzählen mögen.

Nick Shinn's picture

Good to have you participating, Yannis.

Let's discuss the apostrophe first.
Here is the Georgia + XeTeX treatment, from your Thessaloniki 2007 pdf:

The apostrophe is the same glyph as the smooth breathing, but it has a different vertical position, and is actually a different character -- two characters, in fact! When I copy the text from the PDF into an InDesign document, this is what I get:


The above font is Shinn Scotch Modern, the below font Apple Times.
Here is a highlight of the metrics:

This reveals that there are two characters set to create the apostrophe--a space, and the "combining comma above" (U+0313).
In my font, this character has zero width and is backspaced "out of the box".
In Times, the character is a spacing character, with the glyph centred in its box.

Leaving aside the question of whether or not combining accents should be spacing characters, there are two issues.
Firstly, this is not the way Unicode is supposed to work, because the typographic quality of the text is document-specific, and not consistently transportable.
Is there a discrepency between XeTeX and InDesign?
Secondly, is there a problem with the typesetting practice?
It seems that way, if the best practice (based on 20th century Monotype setting) is to use the soft breathing mark as an apostrophe, but now digital typographers are using the "combining comma above" character in lieu of what should be the correct character, either U+1FBF "Psili" (soft breathing) or U+02BC "Spacing clone of Psili".

Speaking as an "independent" foundry, I am delighted to follow the typographic standards you are upholding for Greek typography. Realistically, I am constrained to follow the technical standards set by Unicode, Microsoft, Apple, Adobe, and XeTeX (and steer clear of "hacks") -- and that is why it is so important for me that the standards are consistent.

John Hudson's picture

because there are no Unicode combining breathing characters

?

There are combining mark encodings for all Greek accents and breathing marks. But I don't see how this applies to the question of hanging marks, because this is a layout issue, not an encoding issue.

After discussing this question with Yannis a few months ago, I advised including in fonts a second set of accent uppercase letters with reduced left sidebearings, such that when these are substituted the marks hang in the margin (or into a space, over an em dash, etc. as the case may be at the discretion of the typesetter). And I advised using a stylistic set feature for these, simply as a means to implement this in existing software. It isn't ideal, but it enables the typesetter to make decisions and apply them.

There isn't a supported OTL feature that meets all the circumstances that Yannis identifies for hanging Greek marks. The Optical Bounds feature would address the general case of hanging things in the margin, but it is not supported anywhere, and it might not be applicable as discreetly as Yannis would like.

When I copy the text from the PDF into an InDesign document, this is what I get...

Caveat: text retrieved from a PDF doesn't necessarily reflect the text used to create that PDF.

Yannis may have used the combination of the space and U+0313 characters for the Greek apostrophe in his example, but there may have been technical reasons for this (e.g. lack of access to language-specific variant of U+2019). This may be a hack used to ensure that the apostrophe has the same form as the smooth breathing mark.

The Unicode Standard is quite explicit that the correct encoding for the apostrophe is U+2019. Getting this character to display as a glyph that is identical in form to the smooth breathing mark is, as Nick Nicholas writes, 'a Serbian Italics issue yet again', i.e. a situation that calls for language-specific glyph substitution.

John Hudson's picture

It seems that way, if the best practice (based on 20th century Monotype setting) is to use the soft breathing mark as an apostrophe, but now digital typographers are using the “combining comma above” character in lieu of what should be the correct character, either U+1FBF “Psili” (soft breathing) or U+02BC “Spacing clone of Psili”.

Note that U+1FBF has a compatibility decomposition to space + U+0313, i.e. U+0313 is the combining smooth breathing mark.

Nick, your comment confuses encoding and display. The Greek apostrophe should look like the Greek smooth breathing mark; this does not mean that the correct encoding for the Greek apostrophe is to use U+1FBF or U+02BC. The correct encoding for the Greek apostrophe is the apostrophe character U+2019. Anything else is an encoding hack to achieve a particular display, which is contrary to the basic character/glyph model of both Unicode and OpenType.

Do this in your font:

script: grek
language system: dflt
feature: locl
lookup: quoteright -> quoteright.grek

Nick Shinn's picture

There are combining mark encodings for all Greek accents and breathing marks.

?

The way I understand it, there are two sets of Unicode characters for Latin accents.
Most are included first in the basic encoding "range", and then again in the separate range "combining diacritical marks".

For Greek, there is a basic set of accents and breathings in the "Extended Greek" range, but these are not duplicated elsewhere, in the "combining diacritical marks" range, which is what one would have expected.

It isn’t ideal,

It's a font hack, to make up for layout software deficiencies -- the kind of thing you have in the past berated me for suggesting!
However, given that certain layout applications don't use the space character, it may be a better hack than my suggestion of making the hang the default, and adjusting by space-character dependent substitutions elsewhere.

However, in terms of the portability of text, the most robust strategy must surely be to incorporate hang in precomposed characters for starting paragraphs, and within paragraphs to set the accent and the capital as separate characters.

Caveat: text retrieved from a PDF doesn’t necessarily reflect the text used to create that PDF.

I've come across the same anomaly in online Greek text, so it doesn't appear to be a PDF phenomenon.

The Unicode Standard is quite explicit that the correct encoding for the apostrophe is U+2019.

Hmmm. Surely it's a question of semantics and practice, rather than standards; after all, there are separate non-language-specific Unicode characters for quote marks -- both "curly" and "guillemets" -- so why not proceed on the understanding that the Latin apostrophe is U+2019, and the Greeek apostrophe is U+02BC, or even U+0313 -- which seems to be the practice?

But yes, it's quite straightforward to include a line of code in the Greek language tag that substitutes U+2019 by U+02BC.

Nick Shinn's picture

Note that U+1FBF has a compatibility decomposition to space + U+0313, i.e. U+0313 is the combining smooth breathing mark.

What is "compatability decompositon"?

Nick, your comment confuses encoding and display.

Not at all. I'm just observing what has been set.
In the texts I've copied from various sources, the "apostrophe" is encoded as space and U+0313, not U+2019.
But perhaps this has something to do with compatability decomposition.

lookup: quoteright -> quoteright.grek

Does this mean I have to name U+1FBF "quoteright.grek"?

John Hudson's picture

For Greek, there is a basic set of accents and breathings in the “Extended Greek” range, but these are not duplicated elsewhere, in the “combining diacritical marks” range, which is what one would have expected.

Yes, they are included in the Combining Diacritical Marks block, where they are unified with combining marks also used for Latin, and hence need to be distinguished at the glyph level:

U+0300 = varia
U+0301 = oxia / tonos
U+0342 = perispomeni
U+0313 = psili
U+0314 = dasia

If you look at the canonical decompositions of the diacritic letters in the Greek Extended block, these will confirm these encodings for the combining marks.
_____

It’s a font hack, to make up for layout software deficiencies...

As I understand Yannis' explanation of the conventions for hanging accents in Greek, it seems that this is a stylistic preference applied in certain situations, and not globally after the manner of hanging punctuation. So it looks to me like something that requires a degree of discretionary, user control. I don't think using a discretionary stylistic feature to access a discretionary stylistic behaviour is a hack. The reason I don't think the stylistic set option is ideal is because it has limited interchange capability, and it would be better to have a dedicated feature for this aspect of Greek typography.
_____

However, in terms of the portability of text, the most robust strategy must surely be to incorporate hang in precomposed characters for starting paragraphs, and within paragraphs to set the accent and the capital as separate characters.

Separate characters or separate glyphs? You shouldn't need to re-encode your text in order to achieve a particular layout result. So I would be looking for a robust solution within that context of maintaining the text encoding. Something like a GPOS contextual single adjustment in the kern feature might do it. You could set the default spacing for in-text use, i.e. provide a wide enough left sidebearing for the pre-cap accents/breathings to follow a wordpspace with appropriate spacing. Then you could have a contextual lookup that collapses that left-sidebearing to hang the accents/breathings when not preceded by any glyph (using the ignore function in FontLab/AFDK code or the EXCEPT function in VOLT). A second context or lookup could provide for the spacing following an em dash as recently mentioned by Yannis.

I don't know whether FontLab/AFDK supports this kind of lookup yet. You might need to use VOLT. Also, this approach will only work with applications that use a glyph to define a wordspace, otherwise there is no way to exclude the spacing adjustment when preceded by a wordspace.
_____

Surely it’s a question of semantics and practice, rather than standards

No, it's a question of standards, and encouraging people to adopt practices that conform to standards so that they can exchange consistently encoded text. The fact that four different possible encodings are currently being used by various people to encode the Greek apostrophe is a problem. They are doing it because they are trying to get something that looks the way they want, in particular fonts, because correct display of the correct encoding (U+2019) relies on applications and fonts doing something more than most are currently doing.
_____

What is “compatability decompositon”?

Think of it as a one-way road, contrasted with a canonical decomposition. A canonical decomposition is a reversible one, i.e. if a diacritic character is decomposed into a base + combining mark(s), that process is cleanly reversible and the resulting character sequence can be re-composed into the diacritic character. So, for example, the Greek lowercase alpha with psili character (U+1F00) decomposes into U+03B1 U+0313, and that sequence composes to U+1F00.

A compatibility decomposition is one in which a particular character maps to another character (or characters) in a way that is not reversible. This happens when there are multiple possible relationships between different characters as, of course, there can only be on canonical decomposition. It also happens when Unicode deliberately excludes a canonical composition for a combining mark sequence, which is the case with most combining marks.* So, for example, the Greek spacing psili mark U+1FBF has a compatibility decomposition to U+0020 U+0313, which means that it may, under certain circumstances, be decomposed to that character sequence. But the process cannot be reversed.

*Unicode only encodes spacing variants of accents if a) they existed in some earlier encoding with which Unicode provides roundtrip compatibility, in which case they are provided with either canonical or compatibility decompositions, or b) they are used as distinct characters in e.g. IPA notation, in which case they are not provided with decompositions.
_____

Does this mean I have to name U+1FBF “quoteright.grek”?

No. I would include a separate, unencoded glyph as a Greek variant of the /quoteright/ (U+2019). There are circumstances in which U+1FBF may occur in text as a psili, not as an apostrophe (especially in text converted from some older Greek encodings), and you wouldn't want the glyph name to incorrectly associate the glyph to U+2019 instead of U+1FBF.

John Hudson's picture

In the texts I’ve copied from various sources, the “apostrophe” is encoded as space and U+0313, not U+2019.
But perhaps this has something to do with compatability decomposition.

No, it has to do with poor users trying to get text to look the way they want it to look without benefit of applications and fonts that will allow them to do so using a standard encoding for the apostrophe. Given the situation, I wouldn't be at all surprised to find that individual users might use different encodings for the apostrophe depending on the font they are using. It is very unfortunate.

Nick Shinn's picture

Accented caps:

Separate characters or separate glyphs? ...this approach will only work with applications that use a glyph to define a wordspace,

It's because XeTeX doesn't use the space character that I'm suggesting setting accented caps in paragraphs as two separate characters. (The precomposed accented caps in the font would include built-in "hang" that would protrude too much into application's "non character" spacing.)

The theory is that it's easier to set in-paragraph accented caps as two characters than it is to manually create hang for paragraph-starting "no hang" accented caps.

This would be a work-around for XeTeX users until such time as XeTeX works properly to recognize spaces.

However, for InDesign, I would include positive kerning between space and precomposed accented cap, so that InDesign users would get correct spacing both for in-paragraph situations, and at the start, with no work involved.

Your system, with alternate glyphs for begining situations, activated by stylistic sets, is more "Unicode correct", but doesn't work when OT is not supported, eg online. But I wonder whether hanging accents will be clipped by some layout programs...

Must get Steve back in this thread, and get a typographer's perspective.

speter's picture

I'm here, but Typophile disappeared :-)

I've asked the developer of XeTeX to chime in with regard to the possibility of XeTeX ever recognizing the space character.

I'm fine using two characters for in-paragraph accented caps. Until recently, I set everything via ASCII, which required multi-character sequences anyway (e.g. \textgreek{>'agei}). Just so long as some sort of documentation is provided.

Nick Shinn's picture

The intractable issue seems to be that cap accents are required to hang (a) at the start of a paragraph, but not (b) at the beginning of a line in the middle of a sentence.

Building-in "hang" works for a pre-composed accented cap at (a), but can't be contextually over-ridden in InDesign, to insert an extra spacing character in situation (b).

It appears that the only solution is that adopted by Harambolous for Georgia and Verdana -- the addition of a stylistic set of 120+ duplicates of all accented caps with built-in "hang", for use at (a), while the normal accented caps with no hang are used within sentences, including situation (b).

Even if XeTeX were to recongize space characters, it wouldn't solve the (b) problem.

There is of course this:

I’m fine using two characters for in-paragraph accented caps.

That may be a working solution, but now that I'm aware of how Verdana/Georgia tackles the problem, I face a dilemma.

On the one hand, there is the Adobe de facto standard -- Minion Pro has built-in hang for precomposed accented caps (requiring typographers like Steve to find some way of dealing with in-paragraph spacing, either by adding a space character, adding kerning, or setting the accented "letter" as two separate characters).

On the other hand, there will be the Microsoft de facto standard, with default accented caps in Georgia and Verdana having NO built-in hang, and the typographer required to use a stylistic set at the beginning of paragraphs, as described by Yannis Haralambous in the Thessaloniki2007.pdf document mentioned above.

Should I follow the Microsoft or Adobe standard?
Whose fonts do most (polytonic) Greek typographers use -- or will be using?

The Adobe standard would certainly be easier for me to follow.

John Hudson's picture

On the one hand, there is the Adobe de facto standard — Minion Pro has built-in hang for precomposed accented caps

This is recognised by Adobe as an error. Minion is not a good model, because it was Adobe's first attempt at polytonic Greek. It would be better to look at Arno Pro, which represents their latest approach, arrived at after a lot of consultation.

In addition to the other problems raised above, you should also be aware that there is a bug in MS Word (up to and including the current Office 2007 release), which fails to apply any kerning to Greek characters that are not part of the MS Greek (monotonic) 8-bit codepage. This has been confirmed as a bug by one of the Office managers with whom I corresponded, and it is something they apparently intend to fix (despite potential document reflow issues), but for the time being it means that relying on a positive space + diacritic letter kerning pair for correct in-line spacing of accented caps is a really bad idea.

I think it is a bad idea in any case. Hanging marks is a specialised, paragraph-level layout behaviour, so that is the level at which it should be addressed. It is a bad idea to make the default spacing of a font conform to a specialised layout behaviour instead of to the more general case, which is spacing within text. The default behaviour of a font should always try to conform as closely as possible to the more general case, not to the specific or specialised case. Sensible sidebearings are your first line of defence against major problems in text; everything else, including something as simple as kerning, is a secondary mechanism for refinement. And you can't necessarily rely on any secondary mechanism, not even kerning.

...until such time as XeTeX works properly to recognize spaces.

Heh. I think there are a lot of TeX users who would disagree with you over what constitutes 'proper' space handling. The idea that a space is a glyph is a fairly recent one, and the idea that this glyph might be used in layout lookups is an even more recent one. Even Adobe resisted using the space glyph in OTL lookups until they hit Bickham Script Pro and found it was impossible to do what they wanted without it. But it remains a contentious issue, because through much of the history of typesetting and also in some approaches to digital typography the space is not a glyph, but is simply an advance, specified by the typesetter. The idea of spaces in TeX, and hence XeTeX, is quite particular, and is one of the distinctive features of Knuth's typesetting system: spaces are glue, which holds letters and words together but which can also be stretched or compressed.

Adobe actually implement spaces in two different ways: the typical word space is handled as a glyph, but if you insert a thin space, hair space, em space, en space, etc. from the white space menu in InDesign, you are getting an advance, not a glyph. They do this because most fonts do not support those variant white spaces as glyphs. One of the implications of this is that font lookups that utilise the space glyph might work in context of a word space, but not in context of other spaces.

Syndicate content Syndicate content