Contextual Historic Ligatures?

anagnost's picture

Hi all,

I am new to this board and this is my first post. I am currently working on an "Old Style" Greek font, which will include several historic ligatures. It is a quite obvious fact that many of the Greek ligatures are context-dependent: for example, there are usual abbreviations for some common conjunctions or word endings which should not be used in other cases (i.e. when the same sequence of letters occurs in the middle of a word). However, the OpenType specification seems to provide only one feature specially intended for contextual ligatures (namely 'clig'), and most OpenType renderers enable this feature by default, which, of course, is not exactly what I want. On the other hand, the recommended implementation for the 'hlig' feature is GSUB lookup type 1, i.e. a plain, non-contextual substitution.

So I am wondering which feature tag I should use for historical context-dependent ligatures, if I would like them to be disabled in normal contexts. Can anybody suggest a proper solution?
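To make the requirement concrete, here is a toy Python model of the behaviour I am after. It is not an OpenType lookup, only a sketch of the rule: the function name and the `enabled` switch are hypothetical, and the "kai" symbol stands in for any word-level historic ligature.

```python
import re

KAI_SYMBOL = "\u03d7"  # GREEK KAI SYMBOL

# Match the conjunction only as a whole word, with or without the grave accent
# (\u1f76 is the precomposed iota with varia).
_KAI_WORD = re.compile("\\bκα[ι\u1f76]\\b")


def apply_historic_kai(text: str, enabled: bool = False) -> str:
    """Replace the stand-alone conjunction with the kai symbol.

    The substitution is off by default, mirroring a feature that the user
    has to switch on, and it never fires inside a longer word.
    """
    if not enabled:
        return text
    return _KAI_WORD.sub(KAI_SYMBOL, text)
```

With the switch on, the conjunction in "και δικαιος" is replaced while the same letter sequence inside "δικαιος" is left alone; with the switch off the text passes through unchanged.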

paul d hunt's picture

you could put this in a stylistic set, i would think.

anagnost's picture

Yes, I have already seen this recommendation somewhere in this forum. However, the recommended implementation for stylistic sets is, again, GSUB lookup type 1, which is not appropriate for my case. So why is using stylistic sets for contextual substitutions considered better than assigning a GSUB lookup type 8 to the 'hlig' feature, if both solutions seem to contradict the specification?

paul d hunt's picture

well, if you want to be strict about it, you could make a stylistic set of your greek letters, which are duplicates of your regular letters and then use these alternates to create your ligatures from in the clig feature.
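A rough Python sketch of that two-step idea, as a toy model rather than real font code. The glyph names and both tables are hypothetical: step one is the one-to-one stylistic set mapping, step two is a ligature pass that only knows the duplicate glyphs, so it never fires on regular text.

```python
# Step 1: hypothetical stylistic set, mapping regular glyphs to exact duplicates.
SS_DUPLICATES = {
    "omicron": "omicron.hist",
    "upsilon": "upsilon.hist",
    "sigma": "sigma.hist",
    "tau": "tau.hist",
}

# Step 2: hypothetical ligature table built only from the duplicate glyphs.
LIGATURES = {
    ("omicron.hist", "upsilon.hist"): "omicron_upsilon",
    ("sigma.hist", "tau.hist"): "sigma_tau",
}


def shape(glyphs, stylistic_set_on=False):
    """Return the glyph run after the optional set mapping and the ligature pass."""
    if stylistic_set_on:
        glyphs = [SS_DUPLICATES.get(g, g) for g in glyphs]
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])  # two glyphs become one ligature
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out
```

The ligature pass runs every time, but without the stylistic set it sees only regular glyph names and leaves the run untouched.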

dezcom's picture

What if you put different forms in {calt} and the {liga} would find those pairings and convert them to your special ligs?

ChrisL

anagnost's picture

Well, I considered solutions like these, and they don't look very elegant, especially because I have no special "historic" forms for most letters, so I would indeed have to create a set of exact duplicates of my regular letters. In fact, I would prefer to assign a lookup type 8 to 'hlig', if there are no strict "contraindications" against such practice (at least with ICU/XeTeX my lookup works just fine).

I am wondering, however, why the OT specification forces developers to resort to such ugly tricks if only they want to strictly follow it. I think it would be quite possible either to define special features for historic contextual ligatures (and historic contextual alternates), or to explicitly allow using different lookup types for 'hist' and 'hlig'.

John Hudson's picture

In my heavily ligatured Greek font, I used a two-level approach. Simple, tying ligatures that I was happy to have active by default went in the 'liga' feature; more elaborate, archaic ligatures went in the 'dlig' feature. Given the definition of GSUB layout features and their current implementations, I would be confident putting contextual lookups in any feature. So in the case you describe I would probably put appropriate contextual lookups for these ligatures in the 'dlig' feature.

I am wondering, however, why the OT specification forces developers to resort to such ugly tricks if only they want to strictly follow it.

You need to bear in mind that many of the OTL feature definitions were written at a time when no one had actually implemented OpenType Layout. They represent Microsoft and Adobe's best guess at what would be desirable and how it might work. Also bear in mind that the feature descriptions are not part of the OT specification proper: they are an appendix. I consider them, after many years of discussion with Microsoft and Adobe engineers and managers, as informational and suggestive, rather than binding. The only restriction I recommend and adhere to is that GSUB and GPOS lookups should not be mixed in the same feature and that a clean distinction should be made between GSUB and GPOS features.

I have suggested in the past that a more sensible feature model would have allowed for three states for any layout feature -- required (on by default, cannot be turned off), desired (on by default, can be turned off), and discretionary (off by default, can be turned on) -- and any feature could explicitly include either non-contextual lookups, contextual lookups, or both. But this is a hypothetical suggestion (and something to bear in mind when spec'ing the font format of tomorrow).
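A minimal Python sketch of that hypothetical three-state model, for illustration only. Nothing here exists in OpenType: the enum, the function and its parameters are invented names, and the example tags simply follow the always on / on by default / off by default reading of 'rlig', 'liga' and 'dlig'.

```python
from enum import Enum


class FeatureState(Enum):
    REQUIRED = "required"            # on by default, cannot be turned off
    DESIRED = "desired"              # on by default, can be turned off
    DISCRETIONARY = "discretionary"  # off by default, can be turned on


# Hypothetical assignment of states to feature tags.
STATES = {
    "rlig": FeatureState.REQUIRED,
    "liga": FeatureState.DESIRED,
    "dlig": FeatureState.DISCRETIONARY,
}


def active_features(states, turned_on=(), turned_off=()):
    """Return the set of feature tags in effect after the user's choices."""
    active = set()
    for tag, state in states.items():
        if state is FeatureState.REQUIRED:
            active.add(tag)  # user requests to disable it are ignored
        elif state is FeatureState.DESIRED and tag not in turned_off:
            active.add(tag)
        elif state is FeatureState.DISCRETIONARY and tag in turned_on:
            active.add(tag)
    return active
```

In this model the state says only when a feature applies; it says nothing about whether the lookups inside it are contextual or not.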

anagnost's picture

Thank you, John, this is exactly what I wanted to know. Just a small question: are there any particular reasons to prefer 'dlig' over 'hlig' for historical ligatures? I always supposed that 'dlig' is intended for those cases where it is impossible to determine the applicability of a particular ligature in the given context with a simple algorithm, so that the user has to turn the feature on and off whenever appropriate (as with standard f-ligatures in German and ae/oe in Latin). Since there are similar cases in Greek, I planned to put them into 'dlig', keeping 'hlig' for more "safe" ligatures, which normally don't require any manual control.

I would also be very interested to know exactly which ligatures you have defined in your Greek font, and which of them have been placed in 'liga'. I am asking this because designing Greek ligatures is a captivating process... but, I think, it is necessary to stop somewhere, especially because historic ligatures are not going to be actively used anyway. So it would be interesting to determine a minimal set of ligatures sufficient for achieving an effect of "old style" charm. That's why I would like to know how other developers approach the same task.

John Hudson's picture

I ignore the 'hlig' feature (and 'clig'), and use only 'liga', 'dlig' and 'rlig'. These cover all the situations that I think matter: on by default, off by default, always on. So for the 'safe' ligatures for which you were thinking of using 'hlig' I would simply use 'liga'.

I'm just getting ready to hit the road for a couple of days, but will try to remember to get back to this thread with some comments about Greek ligature sets.

Nick Shinn's picture

I am putting historic forms in the Greek fonts I'm working on.
There are the Unicode "historic variants" of beta, kappa, theta, phi, pi, sigma, cap upsilon, etc., abbreviations, and the "number" forms of koppa etc.

Because they break down into categories which don't fully correspond to a neat split between contemporary and historic, I am avoiding any "hist" designations, and arranging them in Stylistic Sets.
The categories are:
- contemporary (default)
- "scripty" historical variants
- obscurely historic (round sigma and omega-shaped pi)
- historic abbreviations

The "kai" character is included, but no actual ligatures -- however, I am including a fully marked set of omicron-upsilon abbreviations, and the sigma-tau abbreviation.
Because these are abbreviations, not ligatures, they are going in a Stylistic Set.
As far as I am aware, these abbreviations were well-used in classical Greek typography of the 18th and 19th century, so are appropriate to 19th-century revivals, which is what my faces are.

The scripty "historical variants" of theta etc. seem quite at home today, especially in light of the orthodoxy (as represented by Adobe, Microsoft, and GFS) that Greek is inherently scripty, and their inclusion in Unicode as separate characters with their own code points, which mandates their inclusion in any fonts with pretensions of completeness.

IMO, making them separate characters is absurd, like making the single and double-bowl "a" and "g" in Latin separate characters, or, for that matter, Roman and Italic.

anagnost's picture

Nick,

I don't think omicron-upsilon and sigma-tau can be called abbreviations. The only point on which they differ from standard Latin ligatures is that they are formed from vertically (not horizontally) stacked elements, but, I think, this doesn't prevent them from being real ligatures in every other respect. But, of course, it is really difficult to distinguish ligatures from abbreviations as far as early Greek printing is concerned.

I also don't believe all Greek letterforms defined in Unicode can be easily separated into categories you have listed. Let's take "script" theta for example: it would be a completely wrong idea to assume that this form is intended mainly for script or italic typefaces. In fact, it is as "contemporary" as the closed form, and it is a matter of typographic conventions to decide which one should be preferred. For example, in traditional German (and Russian) typography the "closed" theta was almost unknown, and the loopy form is still considered normal and preferred in most contexts.

Nick Shinn's picture

it is really difficult to distinguish ligatures from abbreviations

I don't think so.
In ligatures, two characters are joined by a ligating element -- as is most obvious in the Latin "quaints" c_t and s_t.
However, in the omicron-upsilon and sigma-tau combinations, the individual characters are lost in a composite form that quite literally abbreviates (removes) parts of the components. There is no sense of two characters being joined by a ligature, or extra material. In fact, rather than the addition of a joining component or ligature, body parts are removed.

Some interpretations of omicron-upsilon look somewhat like an upsilon sitting on top of omicron, but where is the ligature that joins them? Other versions, such as my italic here, are even further removed. In sigma-tau, where is the stem of the tau?

This compression is quite different from ligation.

Nick Shinn's picture

I also don’t believe all Greek letterforms defined in Unicode can be easily separated into categories you have listed...it would be a completely wrong idea to assume that this form is intended mainly for script or italic typefaces.

I agree with you, which is why I criticized the separate Unicode points for what are in essence stylistic variants.
I use the term "scripty" only to indicate that these versions of the character are more, shall we say, calligraphic.
But I do think they can be usefully grouped in stylistic sets.

Of course, there are still problems with these groupings.
For instance, perhaps the "scripty" beta should be contextually encoded to replace the descendered form in the middle of words.
And it does seem strange that the "historic variant" of phi is *less* scripty than the default!
However, I'm following Unicode (although I have added a variant rho to the scheme).
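As a rough illustration of what a contextual beta could mean, here is a toy Python model, not a font lookup. It takes one reading of "in the middle of words", namely any beta preceded by another letter, and that reading, like the function name, is an assumption.

```python
import re

CURLY_BETA = "\u03d0"  # GREEK BETA SYMBOL, the curly form

# A beta that directly follows a letter (digits and underscore excluded),
# i.e. any beta that is not word-initial.
_NON_INITIAL_BETA = re.compile("(?<=[^\\W\\d_])β")


def apply_contextual_beta(text: str) -> str:
    """Keep the regular beta at the start of a word, use the curly form elsewhere."""
    return _NON_INITIAL_BETA.sub(CURLY_BETA, text)
```

A word such as "βιβλος" keeps its initial beta and gets the curly form for the second one.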

anagnost's picture

You should keep in mind that the whole concept of connecting lines is peculiar to the Latin script and almost unknown for the Greek writing (and early Greek printing). Most Greek ligatures are formed in such a way that the characters involved share some of their basic features and, naturally, may lose components which would otherwise be considered important. This is more like Latin 'ae' than 'ct'. In the case of 'omicron-upsilon' and 'sigma-tau' the connection is very close indeed, and yet both parts are still recognizable (sigma-tau is essentially a lunate sigma over a tau, although the right arm of the tau is not present in most designs), neither of them being abandoned or completely subdued by the other. That's why I still think they are ligatures, and they were always considered ligatures in the literature devoted to the history of the Greek printing.

Another reason why I think the term "abbreviations" is inappropriate here is that it normally denotes an entity which is used to substitute a specific word, or a specific morpheme, rather than just a common sequence of letters. Cf. the case of the Greek "kai" symbol: it corresponds to the Greek conjunction "kai", but, of course, not just to the 'kappa-alpha-iota' letter sequence. But of course this might be just a terminology issue.

anagnost's picture

The problem is that you cannot put Greek glyphs into the same set basing just on their shape and Unicode mappings, but ignoring the history of each particular letterform. For example, in your particular case I would probably make the x-shaped kappa and the "scripty" rho the default forms, as they would be more appropriate for a typeface implemented in a German style. Curly beta represents a completely special case (it should never be used for typesetting classical Greek, unless the French contextual rule is applied), so please don't put it into the same stylistic set with "scripty" theta, or you will make that set completely unusable.

On the other hand, you have correctly noticed that Unicode itself breaks the idea of grouping "scripty" forms together in an alternate set by making loopy phi the default glyph. I would also mention the case of epsilon, which is very similar, as obviously the default 8-shaped form is more "calligraphic" than the Porsonic (lunate) one. Practically this means it is impossible to conform fully to the Unicode code chart if you want to make the default set of your Greek letters look really good together.

Fortunately the shapes shown in the code chart are by no means prescriptive. The difference between the "default" and "alternate" forms may be really important only in mathematical contexts, and the 'mgrk' feature can be used to resolve this problem. So there is actually no need to "follow Unicode" at this point.

Nick Shinn's picture

the whole concept of connecting lines is peculiar to the Latin script and almost unknown for the Greek writing

I'm afraid I don't see it that way. IMO, ligatures are inherently connecting strokes used in cursive writing, and "Byzantine" Greek typography is full of them. A very basic instance: many fonts had an epsilon-iota ligature in which the normal "flipped 3" epsilon was replaced by a lunate form, with the central stroke "ligaturing" over to the iota.

they were always considered ligatures in the literature devoted to the history of the Greek printing.

I'm no expert on that.
I have been influenced by C. S. Van Winkle's The Printer's Guide (Printer to the University of New York), 1818, see below.

The problem is that you cannot put Greek glyphs into the same set basing just on their shape and Unicode mappings, but ignoring the history of each particular letterform.

That's a good point, and I was aware of the beta usage, and I will implement it contextually.

Nonetheless, I do think that the idea of grouping the scripty versions into a stylistic set is practical, as it offers the typographer a choice between a plain, contemporary default, and a look which is older and more ornate. That is why I would make the Latin kappa the default. I will post some comparison specimens shortly.

I don't really consider my serif face to be in the German style, as it is based on applying the structural forms of the mid-19th century American Scotch Modern to the Greek alphabet. When I did the same for Cyrillic, with no knowledge of 19th century Cyrillic faces, I subsequently discovered that the process produced results that were remarkably close to actual Cyrillic faces of the 19th century, so I have confidence in this approach.

Nick Shinn's picture

Here's the comparison of default with the "scripty" stylistic set (although the beta usage requires fixing).
The idea is for the default to have a cool and modern repose, in relation to the more traditional liveliness of classical Greek typography, represented by the stylistic set.

I would have taken this further and done a more mannered alternate of eta, were it not for the massive number of glyphs involved!

Nick Shinn's picture

And here are the other stylistic sets, including the abbreviations.
I've put the "kai" character in the archaic set, but perhaps it belongs with the abbreviations.

John Hudson's picture

Nick: However, in the omicron-upsilon and sigma-tau combinations, the individual characters are lost in a composite form that quite literally abbreviates (removes) parts of the components.

But this is true of very many of the Byzantine cursive 'ligatures', so unless one is going to invent some new term for them one has to accept two different kinds of ligatures. Terminologically, I sometimes use the distinction tying ligatures and morphographic ligatures.

I don't think calling things like omicron_upsilon and sigma_tau 'abbreviations' is a good idea, because this is misleading. Abbreviation implies to me that letters are missing from the text, i.e. that the character string is abbreviated, regardless of how the resulting text is displayed. In the omicron_upsilon and sigma_tau ligatures, the two letters are very much present, even if they are not in their regular form. There are other sequences of both ligated and unligated letters in Greek and also in Latin that are actual abbreviations -- e.g. nomina sacra -- and these need to be independently analysed at an orthographic level, and hence at a text encoding level, not at the visual/glyph level.

Nick Shinn's picture

John, "abbreviation", meaning shortened, makes perfect typographic sense for these glyphs where two occupy the space usually taken by one. However, orthography takes precedence, so I'll follow Aleksey's and your advice and term them ligatures.

guifa's picture

I seem to recall that Unicode's justification for the alternate forms of β, ρ, κ, π, φ, and θ is that the alternate glyphs are used by Greeks when they need to represent pi the number, as opposed to pi the letter. I could be wrong, so a Greek here could correct me, but I'm pretty sure that that was the justification. It is the same reason the Fraktur letters are encoded: even though technically they are just a particular style of the Roman alphabet, they are used for specialty purposes in mathematics, where they contrast with the Antiqua forms.

«El futuro es una línea tan fina que apenas nos damos cuenta de pintarla nosotros mismos». (La Luz Oscura, por Javier Guerrero)

John Hudson's picture

The variant Greek letters are in Unicode because they occur in mathematics with distinct semantics. This is why the Unicode character names refer to them as 'symbols' even though they have letter properties. This isn't about Greeks using the letters as numbers, but about mathematicians using them as symbols.

Greeks traditionally did use letters as numbers, but so far as I can tell they did not favour particular variant forms for this purpose.

Nick Shinn's picture

That doesn't make sense John.
Latin symbols with "distinct semantics" don't have a separate Unicode point.
Why two pi's (3.141...) and only one e (2.718...)?
Perhaps that's a future project for Unicode workers :-)

John Hudson's picture

Latin symbols with “distinct semantics” don’t have a separate Unicode point.

They do for mathematics:
http://www.unicode.org/charts/PDF/U1D400.pdf
