The eszett ligature in German

Belloc's picture

I typed the word groß in a Word document and checked whether it would be replaced automatically by gross just by changing the font applied to the word. To my surprise, only one font, "Showcard Gothic", did the substitution. I don't know why, but it seemed to me that this substitution should occur more often. Is there any explanation for this behavior, other than the obvious one, that this is the only way to write this word?

Frode Bo Helland's picture

I wouldn't like it if Word auto-replaced å by aa, or ø by oe. The writer should be able to write what he/she wants to without the software interfering.

HVB's picture

Showcard Gothic did NOT do any substitution at all! In the Germandbls position Showcard Gothic has a single glyph that consists of two esses.

If you want MS Word to automatically substitute SS or ss for every appearance of ß (I don't know why you'd want to), go to Tools/ AutoCorrect options / and enter the appropriate eszett character and what you want to replace it with in the "replace" and "with" table.

I don't believe that the germandbls (eszett) is a ligature in the OpenType sense, but I could be wrong.

- Herb

Belloc's picture

@Herb

>> Showcard Gothic did NOT do any substitution at all! In the Germandbls position Showcard Gothic has a single glyph that consists of two esses. <<

I used the German keyboard to type the word groß, and as far as I can understand, the font in question replaced it with gross when I changed the font, using some OpenType lookup. At least that is how I reasoned the substitution occurred.

>> If you want MS Word to automatically substitute SS or ss for every appearance of ß (I don't know why you'd want to) <<

I don't want anything. I'm just trying to understand what the program (MS Word) is doing with the font's OpenType tables, if my thinking is correct.

Belloc's picture

@Frank

My expectations were based on the fact that the German-speaking population in Switzerland has not used the character ß for some time already. Maybe they would appreciate this substitution for some specific font.

HVB's picture

MS Word prior to Word 2010 does not support OpenType ligatures at all; however, the replacement method I described above will work with any version.

What you're trying is completely dependent on the font. For the decomposition to work automatically, the eszett would have to be defined as a ligature. Just looking at some of the OpenType fonts provided with Microsoft Windows, such as Segoe UI, they do NOT define it as a ligature, just as a selectable character.

John Hudson's picture

Belloc: My expectations were based on the fact that the German-speaking population in Switzerland has not used the character ß for some time already. Maybe they would appreciate this substitution for some specific font.

That's a spelling issue, not a glyph display issue. The Swiss do not use the eszett character, ergo they never need to substitute anything for it.

gargoyle's picture

It seems worth mentioning that Showcard Gothic is an all-caps font. It's not uncommon in such fonts to find an "SS" glyph in the slot for the eszett, since that's the conventional all-caps translation (with the capital eszett still competing for wide acceptance). Showcard Gothic isn't doing any substitution aside from substituting its own design of the "ß" glyph, just as it does with every other selected glyph.

Belloc's picture

@gargoyle and @Herb

You're right. I checked the file SHOWG.TTF, which is the MS file for the Showcard Gothic font. It doesn't even have a GSUB table.
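For anyone who wants to repeat that check without a font editor, here is a minimal Python sketch that reads the table directory at the start of a TrueType/OpenType file and reports whether a GSUB table is listed. The function names are made up for illustration, and the sketch assumes a single-font .ttf/.otf file (font collections such as .ttc start with a different header).

```python
import struct

def list_sfnt_tables(data: bytes) -> list[str]:
    """Return the table tags listed in an sfnt (TrueType/OpenType) table directory."""
    # The file starts with a 12-byte header: sfntVersion (uint32),
    # numTables (uint16), then three uint16 binary-search helper fields.
    _version, num_tables = struct.unpack(">IH", data[:6])
    tags = []
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        start = 12 + 16 * i
        tags.append(data[start:start + 4].decode("latin-1"))
    return tags

def has_gsub(data: bytes) -> bool:
    """True if the font's table directory lists a GSUB (glyph substitution) table."""
    return "GSUB" in list_sfnt_tables(data)
```

Usage would be something like `has_gsub(open("SHOWG.TTF", "rb").read())`, where the path is whatever location the font file has on your machine. A font without GSUB cannot perform any OpenType glyph substitution, ligature or otherwise.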

But then, why would any font replace the ligature ffi with its components f + f + i ? I'm pretty sure I've read somewhere that this decomposition exists, or am I wrong again ? If it exists, what would be the purpose of it ?

Excuse me if my questions seem a little dumb. I'm just trying to learn a little bit about this matter.

Thanks

Belloc's picture

@John Hudson

I understand what you said. But what if we had a font that did the reverse, i.e., replaced 'ss' with 'ß'? Wouldn't that be helpful ? Especially for those who don't have access to, or don't know how to access, a German keyboard.

ahyangyi's picture

Do you mean "Unicode compatibility decomposition"?
http://en.wikipedia.org/wiki/Unicode_equivalence
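To make that concrete, here is a small Python sketch using the standard `unicodedata` module. It shows that the ffi ligature character (U+FB03) has a compatibility decomposition into f + f + i, whereas ß (U+00DF) has no decomposition at all in the Unicode data. Note that this is character-level normalization done by software, not something a font does; the helper name is only illustrative.

```python
import unicodedata

FFI_LIGATURE = "\uFB03"  # LATIN SMALL LIGATURE FFI
ESZETT = "\u00DF"        # LATIN SMALL LETTER SHARP S

def compat_decompose(text: str) -> str:
    """Apply Unicode compatibility decomposition (NFKD) to text."""
    return unicodedata.normalize("NFKD", text)

for ch in (FFI_LIGATURE, ESZETT):
    decomposed = compat_decompose(ch)
    print(f"U+{ord(ch):04X} ->", [f"U+{ord(c):04X}" for c in decomposed])
```

The ligature comes back as three separate letters, which is useful for searching and sorting; the eszett comes back unchanged.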

JamesT's picture

Belloc,

With the huge difference in the way fonts are handled in different applications, it would be best, I think, when designing a typeface, to not do things like code the ß as a ligature of ss. Even if it was language dependent, you can't count on it working correctly in all software, in all operating systems or assume that all users have the correct language chosen when they use the font.

Not to mention that you would be changing people's expectations when they switch fonts (there's enough confusion among users already).

Belloc's picture

@JamesT

I think I'm confused with what I read here

"A language system may modify the functions or appearance of glyphs in a script to represent a particular language. For example, the eszet ligature is used in the German language system, but not in French or English (see Figure 2b). And the Arabic script contains different glyphs for writing the Farsi and Urdu languages. In OpenType Layout, language systems are defined within scripts.

A charming mess
Le cahier Français
Das Wasser war heiß

Figure 2b. Differences in the English, French, and German language systems"

Somehow I got the impression that the eszet ligature would be used in some fonts to replace 'ss' with 'ß' or the reverse.

This also helped to increase my confusion : "ß is still used as a ligature and is replaced by 'SS' or 'SZ' in capitalized spelling".

Michel Boyer's picture

The word Wasser is not written Waßer . So far as I know, you can't tell when ss is replaced with ß without looking at a dictionary.

Michel Boyer's picture

I think the term "ligature" is confusing. For instance, in French there is a letter "œ" and it is described as a ligature. However, the wiki entry concerning it, http://en.wikipedia.org/wiki/French_alphabet#Ligatures, clearly states "The two ligatures œ and æ have orthographic value." You cannot write oe when it should be œ, nor conversely. You need to know, or refer to a dictionary, or use a spell checker.

Edit: in fact, if you write oe when it should be œ, only the purists will mind.

ralf h.'s picture

“ß” is a single character in the alphabet of Germany and Austria—but not Switzerland. In lowercase/mixed-cased setting it can’t be replaced by anything else. It has a distinct phonetic purpose. (Unlike the f-i ligature for example)
So no text or layout application should do a ß → ss or ss → ß conversion. This would be as useful as doing an automatic f → ph substitution. It doesn’t make any sense and it is not included in any OpenType feature code either.

There are however reasons to make this conversion:
a) When text should be converted to ASCII, like for international travel papers, bank transactions and so on. European diacritical marks are then replaced (ö → oe and ß → ss).

b) Uppercase writing. Traditionally there was no uppercase ß (ẞ). The Unicode code point for it was assigned only 4 years ago, and the official orthography in Germany and Austria still uses SS as the official replacement, so “groß” would become “GROSS”. Applications might have this replacement hardcoded in their text engine, so text that is converted automatically to uppercase will have this replacement applied as well. When you use an uppercase-only font, the slot for the ß character will therefore likely have an SS in it.
You can find some more articles on the confusion due to the missing capital ß here in my blog: http://opentype.info/blog/tag/capital-sharp-s/
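Both cases can be sketched in a few lines of Python. The uppercase behaviour is built into Python's own `str.upper()`, which follows the Unicode case mappings; the ASCII fallback table, on the other hand, is just an illustrative assumption (real transliteration rules for travel documents or banking may differ and cover many more characters).

```python
# Hypothetical fallback table for German text only; not an official standard.
ASCII_FALLBACKS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def to_ascii_german(text: str) -> str:
    """Replace German umlauts and ß with their conventional ASCII spellings."""
    return "".join(ASCII_FALLBACKS.get(ch, ch) for ch in text)

# Case (a): ASCII conversion.
print(to_ascii_german("Größe"))

# Case (b): the text engine, not the font, turns ß into SS when uppercasing.
print("groß".upper())
```

This is the "hardcoded in the text engine" behaviour described above: the string itself changes from four characters to five, independently of whichever font is applied afterwards.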

Frode Bo Helland's picture

FYI, text-transform: uppercase; does this in HTML. Some browsers also keep the SS pair at default distance (from each other) even if the rest of the word is tracked.

Belloc's picture

@Ralf H.

Excellent explanation ! That's what I like about this site. I always get some valuable information from my discussions here. Thanks very much.

Belloc's picture

@Michel Boyer

>> Edit: in fact, if you write oe when it should be œ, only the purists will mind. <<

Didn't follow you on this. Could you elaborate ? Thanks.

Never mind. I got it already. Thanks.

Belloc's picture

@frank

>> Some browsers also keep the SS pair at default distance (from each other) even if the rest of the word is tracked.<<

I have no clue about what you said here. Thanks.

Belloc's picture

Using the same arguments some of you used above, why would the first 'ccmp' feature, shown on the figure below exist in a font ? What would be the purpose of this decomposition ?

The figure caption says : "The rationale for the decomposition illustrated above is to take advantage of the color diacritic feature found in Microsoft applications like Word and Publisher".

Could someone explain to me what this is all about ?

hrant's picture

ö → oe

Just curious, is "œ" converted to "oe" as well?

It would be nice to have a list of such "downgrades", by country.

hhp

ahyangyi's picture

Belloc:

I think the feature is there to help you achieve this effect:



Both from an example of colorful diacritics

Belloc's picture

@ahyangyi

I'm sorry but I can't see the relationship between the aforementioned decomposition and a different color for diacritics. Take for example this article. It doesn't say anything about decomposition to attain the effect of having the diacritics printed in a different color.

Perhaps that's the reason why I was not able to show the character U+0623 with its diacritics in red, even though I set up the option to print it with this color !

ahyangyi's picture

It's easier to color a glyph, than to color a particular part of a pre-composed glyph.

If I ask a computer to color the dieresis of an "ӓ" red, a problem arises: there is not enough information about which two of the 4 paths in an "ӓ" correspond to the dieresis.

However, with the decomposition feature, "ӓ" is decomposed into "a" followed by a Combining Diaeresis (U+0308). This Unicode sequence should produce the same visual output, but now the computer should be able to color the dieresis easily.
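As a rough character-level analogy, here is a Python sketch using the standard `unicodedata` module. It canonically decomposes a precomposed character and separates the base letter from the combining marks, which is the information a program needs before it can treat the mark differently. Keep in mind this is only an analogy: the font's ccmp feature works on glyphs inside the font, while this code works on Unicode characters. The function name is made up for illustration.

```python
import unicodedata

def split_base_and_marks(text: str) -> tuple[str, str]:
    """Canonically decompose text and separate base characters from combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    bases = "".join(c for c in decomposed if not unicodedata.category(c).startswith("M"))
    marks = "".join(c for c in decomposed if unicodedata.category(c).startswith("M"))
    return bases, marks

# U+00E4 is Latin a with diaeresis; U+0623 is Arabic alef with hamza above.
for ch in ("\u00E4", "\u0623"):
    bases, marks = split_base_and_marks(ch)
    print(f"U+{ord(ch):04X} ->", [f"U+{ord(c):04X}" for c in bases + marks])
```

Once the mark exists as a separate item, coloring it is just a matter of drawing that one item in a different color.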

Belloc's picture

ahyangyi

I've tried again to obtain diacritics in red in a Word document. The figure below shows what I've got.

1. Surprisingly the diacritics over the character 'a' didn't print as expected. Maybe this is just a bug in Word.

2. But the Arabic character U+0623 didn't print correctly either. Does that mean that the aforementioned decomposition is not in the font Arial that I used to print these characters ?

But I'm also keen to know, what are the mechanics used by the computer to paint the diacritics with a different color. Would the following be a reasonable explanation ?

1. First the character U+0623 is decomposed in two glyphs, the base character and the diacritics.
2. The diacritics is painted with the specified color.
3. The glyph corresponding to the character U+0623 is again composed, using the two glyphs obtained in steps (1) and (2).

Thanks for your feedback

ahyangyi's picture

Just checked. Arial doesn't contain a decomposition table for Latin characters. So "à" cannot be decomposed. "b◌̀" is a different story as it is actually not 1 character, but two ("b" followed by a combining grave).

Belloc's picture

I'm sorry but that doesn't make sense to me. I used the same keystrokes to type the 'a' and the other Latin characters, i.e., first the character then the combining mark. I believe this is a bug, or was my fault while typing the characters, for I have just repeated the operation, and now I was able to obtain the diacritics in red, over all the Latin characters. I've also obtained colored diacritics over several Arabic characters, using the same procedure : first the character followed by the combining mark.

The only thing that didn't work so far, was exactly the character U+0623 !!

John Hudson's picture

Belloc,

Re. the eszett:

But if we had a font that did the reverse, i.e., replaced the 'ss' by 'ß'. Wouldn't that be helpful?

No, because German, especially since the spelling reform of the 1990s, makes a spelling distinction between 'ss' and 'ß' in lowercase, i.e. conversion between them is not reliable without dictionary support.
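A short Python sketch of why the conversion cannot be automated reliably. The word pairs are ordinary German words that differ only in ss versus ß; the naive replacement function is a deliberately wrong strawman, not something any font or application is known to do.

```python
# Pairs of distinct German words that differ only in "ss" versus "ß".
MINIMAL_PAIRS = [("Masse", "Maße"), ("Busse", "Buße")]

def naive_ss_to_eszett(word: str) -> str:
    """Strawman: blindly replace every 'ss' with 'ß' (this is the wrong approach)."""
    return word.replace("ss", "ß")

for with_ss, with_eszett in MINIMAL_PAIRS:
    # Uppercasing maps both words to the same string, so it cannot be undone.
    print(with_ss.upper(), with_eszett.upper())

# The blind replacement produces a misspelling for a word that needs "ss".
print(naive_ss_to_eszett("Wasser"))
```

Both words of each pair collapse to the same uppercase form, and the blind replacement turns Wasser into the misspelling Waßer, so only a dictionary (or the writer) can decide which spelling is meant.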

Re. Arabic and other decomposition:

Microsoft Word provides an option to differentially colour diacritic signs. This is based on identification of the signs as 'marks' in the font GDEF table. In order that this feature might apply consistently to all diacritic signs, including those that are part of precomposed combinations such as alif+hamza, these combinations are decomposed in the ccmp feature of some fonts.

This functionality depends on the font a) decomposing precomposed combinations and b) correct identification of the decomposed diacritics as marks in the GDEF table. Most likely one also wants correct anchor positioning attachment for these marks in the GPOS table.
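The two conditions can be illustrated with a toy model in Python. Nothing here reads a real font: the glyph names, the decomposition table and the class table are hypothetical stand-ins for what a font would store in its GSUB (ccmp) and GDEF tables. The only value taken from the OpenType specification is that GDEF glyph class 3 means "mark" (1 is base).

```python
GDEF_BASE, GDEF_MARK = 1, 3  # GlyphClassDef values for base and mark glyphs

# Hypothetical font data, for illustration only.
CCMP_DECOMPOSE = {"alefHamzaabove": ["alef", "hamzaabove"]}
GDEF_CLASSES = {"alef": GDEF_BASE, "alefHamzaabove": GDEF_BASE, "hamzaabove": GDEF_MARK}

def apply_ccmp(glyphs: list[str]) -> list[str]:
    """Replace each precomposed glyph with its components, if the font defines any."""
    out = []
    for glyph in glyphs:
        out.extend(CCMP_DECOMPOSE.get(glyph, [glyph]))
    return out

def color_run(glyphs: list[str], base_color: str, mark_color: str) -> list[tuple[str, str]]:
    """Assign mark_color to glyphs classified as marks in GDEF, base_color to the rest."""
    return [
        (g, mark_color if GDEF_CLASSES.get(g) == GDEF_MARK else base_color)
        for g in glyphs
    ]

# Without decomposition the hamza is buried inside one base glyph and stays black.
print(color_run(["alefHamzaabove"], "black", "red"))
# With decomposition the hamza is a separate mark glyph and can be colored.
print(color_run(apply_ccmp(["alefHamzaabove"]), "black", "red"))
```

In this model, a font that lacks the decomposition behaves like the first call: the application has no separate mark glyph to color.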

Note that although, as you've discovered, it is possible to apply this Word coloured-diacritic function to Latin text, it exists primarily to support Arabic, in which differentially colouring marks has a very long history, dating back to the earliest Koran manuscripts and the early development of the script.

[You may also notice that in some fonts ccmp is used to contextually decompose precomposed diacritic glyphs when they are followed by additional combining mark characters. This is done so that GPOS mark-to-base positioning does not need to be defined in the font for every precomposed glyph. So, for instance, a font might decompose ë when followed by ́ (combining acute) so that, instead of applying a 'mark' feature lookup of the combining mark to the precomposed diacritic, the sequence can be displayed as a 'mark' positioning of a combining diaeresis on the e, followed by a 'mkmk' positioning of the acute to the diaeresis: ë́ (which may or may not render nicely in your browser).

By the way, there is a nasty bug in current versions of InDesign that causes such ccmp contextual decompositions to fail. Unfortunately, I didn't get this bug diagnosed and reported to Adobe in time for it to be fixed in the new CS6.]
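The contextual variant described in the bracketed note can be sketched the same way. Again this is a toy model in Python with hypothetical glyph names, not code that reads or shapes with a real font: a precomposed glyph is decomposed only when the next glyph is a combining mark.

```python
# Hypothetical font data, for illustration only.
DECOMPOSITIONS = {"edieresis": ["e", "dieresiscomb"]}
MARK_GLYPHS = {"dieresiscomb", "acutecomb"}

def contextual_ccmp(glyphs: list[str]) -> list[str]:
    """Decompose a precomposed glyph only when a combining mark follows it."""
    out = []
    for i, glyph in enumerate(glyphs):
        next_glyph = glyphs[i + 1] if i + 1 < len(glyphs) else None
        if glyph in DECOMPOSITIONS and next_glyph in MARK_GLYPHS:
            out.extend(DECOMPOSITIONS[glyph])
        else:
            out.append(glyph)
    return out

# Followed by a combining acute: decomposed, so 'mark' and 'mkmk' positioning can stack the marks.
print(contextual_ccmp(["edieresis", "acutecomb"]))
# Followed by an ordinary letter: the precomposed glyph is left alone.
print(contextual_ccmp(["edieresis", "x"]))
```

The benefit is the one stated above: the font only needs anchors on the plain base letters and on the marks, not on every precomposed glyph.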

ahyangyi's picture

John, thanks for your informative post!

Belloc's picture

John Hudson,

It took me a while to read and understand (more or less) what you said about the decomposition of a precomposed character or glyph.

For instance :

>> [You may also notice that in some fonts ccmp is used to contextually decompose precomposed diacritic glyphs when they are followed by additional combining mark characters. This is done so that GPOS mark-to-base positioning does not need to be defined in the font for every precomposed glyph. So, for instance, a font might decompose ë when followed by ́ (combining acute) so that, instead of applying a 'mark' feature lookup of the combining mark to the precomposed diacritic, the sequence can be displayed as a 'mark' positioning of a combining diaeresis on the e, followed by a 'mkmk' positioning of the acute to the diaeresis: ë́<<

I gather from this that the Arial font does not promote this decomposition. Look at the screenshot I got from Word. It seems like any combining mark placed on a precomposed glyph, such as alif+hamza, is not decomposed, as the mark seems to be always placed at the center of the precomposed glyph. The same could be said about the precomposed character 'e' with diaeresis.

Also, I presume that if this decomposition existed in the Arial font, I should obtain something like the figure below, every time I inserted the alif+hamza precomposed character in a document :

and I'm not getting this, as I've already alluded in a prior post.

John Hudson's picture

I gather from this that the Arial font does not promote this decomposition.

Correct.

Re. the alif + hamza, note that the dotted ring is inserted dynamically in whatever program you are using to peruse the ccmp lookups, as a means of illustrating that this is a sequence of letter plus combining mark. In actual text using a font that decomposes the alif + hamza character, e.g. the MS Arabic Typesetting font, the resulting display of letter plus combining mark should be indistinguishable from the precomposed form until you turn on the coloured marks function.

Belloc's picture

Now I'm beginning to see the light. Below you'll find the difference between the Arial (first) and the Arabic Typesetting (second) fonts.

From now on, I'll consider you my guru in terms of Arabic fonts ! Many thanks again. Surely, I'll have other questions, but I'll try not to be intrusive.
