Modifying Apple's Myriad Pro

acm
25.Aug.2007 2.03am

Hello,

I’m planning to work on my website design and I’m going to use a dynamic text replacement technique that automatically replaces the headlines with images. That way I can use fonts other than the usual Arial, Verdana, Georgia, Courier and Times New Roman, and so far I’m going with the Myriad Pro available in Mac OS X.

However, there’s a slight problem: Romanian, the language I use on my website, has some characters not available in the default Myriad Pro like Ş and Ţ and I thought I could try and add those by hand.

Am I allowed to do that? And am I allowed to upload a font from the operating system to the server so I can use it to generate images with it?

Thank you.



James Arboghast
25.Aug.2007 4.48am

The End User License Agreement for most commercial fonts prohibits any modification or reverse engineering of the font data, so opening the font up in an editing program and adding characters is a no-go.

j a m e s


dan_reynolds
25.Aug.2007 5.02am

Doesn’t Adobe’s EULA allow for modification for personal use, though? Wouldn’t this fall under that? I am not a lawyer…


acm
25.Aug.2007 5.15am

Here’s the thing: it’s Apple’s Myriad Pro that has only some slight differences from the Adobe Myriad Pro and I’m guessing there are different EULAs. I can’t find Apple’s font EULA on the web so I don’t really know what I can and can’t do.

Thanks for your input.


Michel Boyer
25.Aug.2007 6.18am

I found these characters in Adobe Myriad Pro version 002.000. According to this Wikipedia link, those are the characters you are looking for.


Michel Boyer
25.Aug.2007 6.39am

Note: The glyphs at Unicode positions U+021A and U+021B in Myriad Pro 002.000 are the same as those of Tcommaaccent and tcommaaccent above.


Miguel Sousa
25.Aug.2007 1.27pm

> it’s Apple’s Myriad Pro

Huh!? How did you get this font? Are you an Apple employee?
Last time I checked it was not being bundled with the OS...
http://docs.info.apple.com/article.html?artnum=25710
http://docs.info.apple.com/article.html?artnum=301332
http://en.wikipedia.org/wiki/List_of_fonts_in_Mac_OS_X


acm
25.Aug.2007 2.05pm

Um, I might have mixed things up. I have 10 otf files with MyriadPro in their name in /Library/Fonts. I thought it was Apple’s Myriad Pro, but now I notice that there’s a copy of Photoshop, Illustrator and InDesign installed. It’s a second-hand MacBook so fonts might have been installed with those applications.

As you might have noticed, I’m really, really new in typography. I’ve just learned the difference between serif and sans-serif :)


Miguel Sousa
25.Aug.2007 2.50pm

> I have 10 otf files with MyriadPro in their name in /Library/Fonts

Open FontBook and tell us the version of these fonts. (Go to Preview->Show Font Info, or press Apple+I)

> I notice that there’s a copy of Photoshop, Illustrator and InDesign installed

Which versions?


sii
25.Aug.2007 5.03pm

Apple Myriad is the branding font they use internally, right? The only version of Myriad that Apple ever claimed to have shipped would have been the 3rd generation (first color) iPod UI font, but they’ve since removed references to that from their site.


Michel Boyer
25.Aug.2007 7.17pm

I googled and could fairly rapidly find this


and the links are not dangling.


sii
25.Aug.2007 9.24pm

My guess is that there can’t be much difference between these and the Adobe versions, otherwise the Apple lawyers would have been on this, like a...


acm
26.Aug.2007 12.53am

Here’s what I got from FontBook for MyriadPro-Bold:

PostScript name MyriadPro-Bold
Full name MyriadPro-Bold
Family Myriad Pro
Style Bold
Kind OpenType PostScript
Language English, French, German, Spanish, Italian, Dutch, Swedish, Danish, Finnish, Portuguese
Version Version 2.007;PS 002.000;Core 1.0.38;makeotf.lib1.7.9032
Location /Library/Fonts/MyriadPro-Bold.otf
Unique name 2.007;ADBE;MyriadPro-Bold
Designer Robert Slimbach and Carol Twombly
Copyright © 2000, 2004 Adobe Systems Incorporated. All Rights Reserved. U.S. Patent D454,582.
Trademark Myriad is either a registered trademark or a trademark of Adobe Systems Incorporated in the United States and/or other countries.
Enabled Yes
Duplicate No

I guess it’s the Adobe Myriad Pro, not Apple Myriad.

From what I’ve read, Apple’s Myriad “incorporates minor spacing and weight differences from the standard varieties, and includes Apple-specific characters such as the company logo” (from the Wikipedia article), and it may not be as cross-platform as the Adobe version.

I’m sorry for the mix-up :)


Michel Boyer
26.Aug.2007 6.01am

> My guess is that there can’t be much difference between these and the Adobe versions

Well, here is a grab taken from the output of AFDKO’s “tx -pdf” command applied to an .otf re-encoded version of the said .ttf font.


sii
26.Aug.2007 8.17am

U+2044 appears to be the stick the lawyers will beat you with if you post the font. ;-)


Miguel Sousa
26.Aug.2007 5.37pm

> Romanian, the language I use on my website, has some characters not available in the default Myriad Pro like Ş and Ţ
> Full name MyriadPro-Bold
Version Version 2.007;PS 002.000;Core 1.0.38;makeotf.lib1.7.9032

Andrei, what makes you think that the version you say you have does not contain the characters needed for Romanian?
BTW, v2.007 is the latest shipping version of Myriad Pro.


acm
27.Aug.2007 3.42am

Miguel, forgive my ignorance, but I used a Romanian keylayout for the Mac and when I pressed those letters in, let’s say Pages, I got a different font. I’m guessing the software automatically changes the font so it will still print that letter.


Michel Boyer
27.Aug.2007 5.21am

On my Mac, here is what the Romanian keyboard looks like:


I thus get the Ș just under the escape and Ț under  at the left of the CR. How did you type Ș and Ț in your message above? Where is “Pages” on your keyboard? If they are the up and down arrow, that’s not the right choice! [edit] for typing your characters.


acm
27.Aug.2007 7.01am

Pages is an Apple application that is part of the iWork suite :)

Romanian layouts have been a major problem in the last few years. Pre-Windows Vista keyboard layouts produce S and T with a cedilla, not a comma underneath, so most applications and documents use the non-standard Microsoft way. And most Windows fonts, like Arial, Verdana and Times New Roman, don’t have the S and T with a comma below (Microsoft issued an update a couple of months ago, but you have to search for it; it’s not in the mainstream Windows Update), so if you write a web page with those letters or the equivalent HTML entities, Internet Explorer will display empty squares instead and Firefox will try to “build” the letters.

It seems that the letters T and S with cedilla are not part of Myriad Pro (or at least they won’t show up as regular Myriad Pro letters).

Thanks for helping me clear things out.


Michel Boyer
27.Aug.2007 7.18am

Now everything is clear. Here is an s with cedilla, is it not? (We have c cedilla in French.)


and I don’t see either a correct Tcedilla or tcedilla; all I see is a T and a t with a comma underneath. No font in my system seems to have a Tcedilla or a tcedilla. [edit] By the way, the above Scedilla and scedilla are from Myriad.


Michel Boyer
27.Aug.2007 7.27am

Correction: I found many other fonts with a correct Tcedilla and tcedilla, but I don’t get them in my Myriad Pro.


Michel Boyer
27.Aug.2007 7.51am

What is even more mysterious to me is that the S and s with a comma underneath do not even show in this Character map on FontShop.


Michel Boyer
27.Aug.2007 9.51am

With a SIL Ukelele-modified Romanian keyboard (not the one that comes with the Mac), I could type the following directly in Excel:


That does not solve your problem with Myriad, but Ukelele is a useful tool. The keyboard I used is here; put it in your Library/Keyboard Layouts folder, then log out and log in again.


sii
27.Aug.2007 10.07am

The ’true’ Romanian forms for these characters were a recent addition to Unicode...

U+0218 Ș LATIN CAPITAL LETTER S WITH COMMA BELOW
U+0219 ș LATIN SMALL LETTER S WITH COMMA BELOW
U+021A Ț LATIN CAPITAL LETTER T WITH COMMA BELOW
U+021B ț LATIN SMALL LETTER T WITH COMMA BELOW

So older fonts, older keyboard layouts and older apps may only support or expect the legacy code points that were ’shared’ (uneasily) with Turkish: U+015E, U+015F, U+0162, U+0163.
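For reference, Python’s standard unicodedata module reports the official names of both sets of code points. A minimal sketch; the describe helper is only illustrative:

```python
import unicodedata

# Legacy code points (shared with Turkish) and the later comma-below additions.
LEGACY = [0x015E, 0x015F, 0x0162, 0x0163]
COMMA_BELOW = [0x0218, 0x0219, 0x021A, 0x021B]

def describe(code_points):
    """Return (U+XXXX, character, Unicode name) tuples for the code points."""
    return [(f"U+{cp:04X}", chr(cp), unicodedata.name(chr(cp)))
            for cp in code_points]

for row in describe(LEGACY + COMMA_BELOW):
    print(*row)
```

The names make the point of this thread visible: U+0162/U+0163 are officially “WITH CEDILLA”, while U+021A/U+021B are “WITH COMMA BELOW”.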

Cheers, Si


Thomas Phinney
27.Aug.2007 10.28am

The necessary letters are in the fonts.

I strongly suspect that “acm” is running into a Mac OS issue wherein they are using glyph names to determine encoding, instead of using the encoding built into the font, and then on top of that they may not have always recognized the glyph names as defined in Adobe’s glyph naming standards, in favor of their own schemes. The latter problem is likely better in newer versions of Mac OS, even if the former is not.

What version of OS X are you running, acm?

Regards,

T


sii
27.Aug.2007 10.43am

Off topic a bit, but does the Acrobat Reader end-user license also grant font modification rights, like the stand-alone Adobe font EULA does (or at least like the FAQ does), covering modification of the fonts it supplies?

If so, I wonder if Myriad might become an alternative to Bitstream Vera / DejaVu, in that modified versions may be freely shared between anyone who has a validly licensed Acrobat Reader on their desktop or device?


Michel Boyer
27.Aug.2007 10.56am

Thomas,

I must confess that I would like to have a look at the pdf file produced by the command
tx -pdf -1 MyriadPro-Regular.otf > Myriad.pdf
on the last version of the font.

Michel

[edit] The first three pages would be enough.


acm
27.Aug.2007 11.54am

Thomas, I’m not sure I really understood the problem, but the About This Mac window tells me I have Mac OS X version 10.4.10. As far as I know, the entire system is up to date. It might be a problem with the keyboard; as soon as I have the time I’ll look into Ukelele and try to fix things.

It’s great to know that I don’t have to modify a font because, honestly, I was pretty sure I would do a lousy job.

I had no idea that the world of typography was so complex :) Fonts, glyphs, UTF numbers... I really have to buy those three books recommended on the Typowiki page.


Miguel Sousa
27.Aug.2007 1.38pm

Michel:
I appreciate your efforts but hold on a second as you’re creating more confusion than helping to solve Andrei’s problem. Thanks.

Andrei:
One thing is for sure, the characters you need are in the fonts you have — as Thomas said —, so there’s no need to modify them.

Now, the problem you’re having is most likely related to Mac OS X not reading the fonts’ ’cmap’ table and relying on the glyphs’ names instead. I’m also running OS X v10.4.10. I don’t use Pages, but I did a test with TextEdit and had no problems with the said characters (Ș ș Ț ț). Can you please confirm this on your end?

On the other hand, I tested — still in TextEdit v1.4 (220) — a few other “related” characters (Ş ş Ţ ţ) and some of them — highlighted in red — were replaced by Lucida Grande, despite Myriad Pro v2.007 containing them. I reckon this is a bug in Mac OS X or TextEdit, not in the font.


Miguel Sousa
27.Aug.2007 5.09pm

BTW, Arno Pro displays correctly, just because the glyphnames happen to be in uniXXXX form.


Michel Boyer
28.Aug.2007 8.44pm
Michel Boyer's picture

In their link About the Unicode Standard, the Unicode consortium states that the entire content of the Unicode Standard including the Character Code Charts is available online. Here are their online Unicode Character Code Charts and here is the Latin Extended-A chart. And here is a grab of the glyphs for U+0162 and U+0163 from the Latin-Extended-A chart provided by the Consortium:


Michel Boyer
29.Aug.2007 7.19am

Of course, and as pointed out by Thomas, the real issue here is a naming issue. In the Myriad Pro Regular font that I bought directly from Adobe on Aug 27, the Unicode characters U+0162 and U+0163 (look above) are named Tcommaaccent and tcommaaccent; if they are renamed Tcedilla and tcedilla respectively, then TextEdit and Word gain access to those Myriad Pro characters and insert them correctly in your file. So those characters are not missing from Myriad Pro after all. I checked.

Michel


Miguel Sousa
29.Aug.2007 12.15pm

The same Latin-Extended-A chart also points out that “a glyph variant with comma below is preferred for Romanian”:

And that is also the opinion of two esteemed expert residents:
http://www.typophile.com/node/2764#comment-22015
http://www.typophile.com/node/3970#comment-29637


Miguel Sousa
29.Aug.2007 12.16pm

Thomas also points out that “they may not have always recognized the glyph names as defined in Adobe’s glyph naming standards”, which seems to be true given the fact that the Adobe Glyph List For New Fonts v1.6 associates the name Tcommaaccent to codepoint 0162, and tcommaaccent to codepoint 0163:
http://partners.adobe.com/public/developer/en/opentype/aglfn13.txt

It’s also worth mentioning that the history of these two characters has been eventful, judging by the notes on that same file:

# 1.1 [17 April 2003] Renamed [Tt]cedilla back to [Tt]commaaccent:
#
# 1.0 [31 Jan 2003] Original version. Derived from the AGLv1.2 by:
# - removing the PUA area codes
# - removing duplicate Unicode mappings, and
# - renaming tcommaaccent to tcedilla and Tcommaaccent to Tcedilla
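For readers new to glyph-name-based mapping, here is a simplified sketch of how a glyph name can be turned into a code point along the general lines of Adobe’s glyph naming rules: drop any “.suffix”, look the name up in a glyph list, otherwise parse a uniXXXX or uXXXX to uXXXXXX name. The table is a hypothetical four-entry excerpt holding only names discussed in this thread (not the real AGLFN), and the sketch ignores ligature names joined with underscores.

```python
import re

# Hypothetical excerpt; the real Adobe glyph list has hundreds of entries.
GLYPH_LIST_EXCERPT = {
    "Scedilla": 0x015E,
    "scedilla": 0x015F,
    "Tcommaaccent": 0x0162,  # AGLFN maps this name to U+0162, as quoted above
    "tcommaaccent": 0x0163,
}

UNI_FORM = re.compile(r"uni([0-9A-F]{4})")  # e.g. uni021B (uppercase hex only)
U_FORM = re.compile(r"u([0-9A-F]{4,6})")    # e.g. u1F600

def _valid(cp):
    # Reject surrogates and values beyond the Unicode range.
    return not (0xD800 <= cp <= 0xDFFF) and cp <= 0x10FFFF

def glyph_name_to_unicode(name, glyph_list=GLYPH_LIST_EXCERPT):
    """Return the code point implied by a glyph name, or None (simplified)."""
    base = name.split(".", 1)[0]  # "uni0163.cedilla" -> "uni0163"
    if base in glyph_list:
        return glyph_list[base]
    for pattern in (UNI_FORM, U_FORM):
        match = pattern.fullmatch(base)
        if match:
            cp = int(match.group(1), 16)
            return cp if _valid(cp) else None
    return None
```

Under rules like these, a name such as Tcommaaccent resolves only if the system’s glyph list contains it, which is the crux of the problem Thomas describes, whereas a uniXXXX name resolves without any list.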


Michel Boyer
29.Aug.2007 1.48pm

Miguel

Here is how I read the notes from the Standard that you displayed above concerning U+0162 and U+0163. A first point says that U+0163 is used in Romanian, for Semitic transliterations and there is a ... that usually suggests other uses. The next point insists that the character is used in Romanian data, and will thus stay in archives even if a new glyph is now preferred. The third point states explicitly that for Romanian, the new preferred glyph, a t with comma below, is to be found at U+021B.

Great! Romanian was granted four new characters, namely U+0218, U+0219, U+021A and U+021B, as mentioned above by sii. I don’t see how this should imply that people in Semitic studies (and others) should now be deprived of their glyphs in U+0162 and U+0163.

If there is any doubt left, there is a last line; it is in Backus-Naur form (BNF), commonly used in computer science (in particular for the syntax of programming languages), and it leaves no ambiguity: it defines the glyph U+0163 as a composed glyph, composed of those in U+0074 and U+0327. U+0074 is just a “t”; now what is U+0327? It is so small on the screen that we may have doubts. So we just go to the page The Unicode Character Code Charts By Script, enter “0327” in the slot for “Look up by character code” and then click “go”. The most current chart is said to be here. We look and we find this nice glyph:


and in the notes under, we learn how the corresponding character is to be named: “combining cedilla”

So U+0163 is defined in BNF as a “t” with a “combining cedilla” under. The people in Semitic studies are safe (and maybe others as well), they may keep their “tcedilla” glyph in U+0163!

Michel


Michel Boyer
29.Aug.2007 2.16pm

By the way, U+0162 was also defined unambiguously as the glyph in U+0054 composed with the glyph in U+0327, and the last chart containing U+0327 leaves no room for interpretation.

[edit] My interpretation may be biased by the fact that I have been on the faculty of a Computer Science department for over 20 years, and for me a BNF definition prevails over any comment.

Michel


twardoch
29.Aug.2007 2.56pm

> [edit] My interpretation may be biased by the fact that I have been on the faculty of a Computer Science department for over 20 years, and for me a BNF definition prevails over any comment.

Well, I guess that is where we differ. I trust the human understanding about the human language more than the computer understanding about the human language.

Besides, any Backus-Naur form i.e. a context-free grammar that describes a formal language has been written by someone. Therefore, there’s no difference between it and “any comment”.

A.


twardoch
29.Aug.2007 3.10pm

There are several issues that come together here:

1. Mac OS X ignores the glyph-to-Unicode mapping provided in the “cmap” table of OpenType PS (CFF/.otf) fonts, while it uses it for OpenType TT (.ttf) fonts. For OpenType PS fonts, Mac OS X uses the glyph-to-glyphname mapping provided in the font and then maps the glyphnames to Unicodes itself.

2. Unfortunately, Mac OS X does not recognize the “*commaaccent” glyphnames that are defined by Adobe for Romanian and Baltic languages (such as Tcommaaccent, Rcommaaccent, Kcommaaccent, Ncommaaccent) but instead only recognizes the “*cedilla” names (Tcedilla, Rcedilla, Kcedilla, Ncedilla) or the “uni****” names (uni0162, uni0156, uni0136, uni0145). This means that Mac OS X will fail to recognize the glyphs Tcommaaccent, Rcommaaccent, Kcommaaccent, Ncommaaccent and map them to their respective Unicodes.

3. On top of that, there is another confusion. Originally, the Unicode consortium defined the codepoints U+015E, U+015F, U+0162, U+0163 as suitable for both Turkish and Romanian, and defined them as containing the cedilla accent. Turkish indeed uses the cedilla in U+015E, U+015F but does not make any use of U+0162, U+0163. But the Romanian standardization delegation raised an objection to those mappings because in the Romanian typographic tradition, glyphs with a commaaccent are preferred. So the Unicode consortium added the codepoints U+0218, U+0219, U+021A, U+021B, and defined them as containing a commaaccent.

4. Unfortunately, many of the Romanian locale definitions used in operating systems still use the “old” mappings rather than the “new” mappings. So it is common practice that Romanian texts contain U+015E, U+015F, U+0162, U+0163 rather than U+0218, U+0219, U+021A, U+021B.

5. Nonetheless, the preference that the Romanian characters should be rendered using a commaaccent still applies. Since a “T/t with cedilla” does not seem to be used in any living language, many type designers have decided to draw U+0162, U+0163 as a “T/t with commaaccent”. So the glyphs for U+0162, U+0163 and for U+021A, U+021B are identical in many fonts, and that’s how it should be.

6. Since the U+015E, U+015F are used as “S/s with cedilla” in Turkish but are also used in “old” locales for Romanian, many OpenType fonts now contain a “locl” glyph substitution that replaces the glyphs Scedilla (U+015E), scedilla (U+015F) with the glyphs uni0218, uni0219 in Romanian context. This is not a very elegant solution but seems a pragmatic one.
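Points 1 and 2 can be illustrated with a toy model: compare the mapping stored in a font’s “cmap” with the mapping a system would rebuild from glyph names alone if its name table only knew the “*cedilla” spellings. CMAP and KNOWN_NAMES below are made-up data modelled on the description above, not Apple’s actual implementation.

```python
import re

# Made-up 'cmap' data: glyph name -> code point, as stored in the font.
CMAP = {
    "Scedilla": 0x015E, "scedilla": 0x015F,
    "Tcommaaccent": 0x0162, "tcommaaccent": 0x0163,
    "uni0218": 0x0218, "uni0219": 0x0219,
    "uni021A": 0x021A, "uni021B": 0x021B,
}

# Assumed system name table that only knows the "*cedilla" spellings
# (a simplified model of point 2, not Apple's real list).
KNOWN_NAMES = {
    "Scedilla": 0x015E, "scedilla": 0x015F,
    "Tcedilla": 0x0162, "tcedilla": 0x0163,
}

UNI_NAME = re.compile(r"uni([0-9A-F]{4})")

def name_based_mapping(glyph_names, known=KNOWN_NAMES):
    """Rebuild glyph -> code point from names alone, ignoring the cmap."""
    mapping = {}
    for name in glyph_names:
        if name in known:
            mapping[name] = known[name]
        else:
            match = UNI_NAME.fullmatch(name)
            if match:
                mapping[name] = int(match.group(1), 16)
    return mapping

def lost_code_points(cmap, known=KNOWN_NAMES):
    """Code points in the cmap that the name-based rebuild fails to reproduce."""
    derived = name_based_mapping(cmap, known)
    return sorted(cp for name, cp in cmap.items() if derived.get(name) != cp)

print([f"U+{cp:04X}" for cp in lost_code_points(CMAP)])  # ['U+0162', 'U+0163']
```

In this model, renaming the two glyphs to uni0162 and uni0163 (recommendation a) for font developers below) leaves nothing lost.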

To resolve the problem, Apple should do two things:

a) change the Romanian locale/keyboard layout definition so that it no longer uses the U+015E, U+015F, U+0162, U+0163 codepoints but instead uses U+0218, U+0219, U+021A, U+021B.

b) change the way it treats OpenType PS (.otf) fonts so that it no longer relies on glyphnames to build a glyph-to-Unicode mapping but instead directly uses the “cmap” table included in the font.
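Recommendation a) amounts to a character-level migration. For text known to be Romanian, a minimal Python sketch could normalize to NFC (so decomposed sequences such as t + U+0327 are caught) and then swap the four legacy code points for the comma-below ones. The function name is hypothetical, and applying it to Turkish text would be wrong, since Turkish legitimately uses S/s with cedilla.

```python
import unicodedata

# Legacy code points (shared with Turkish) -> comma-below code points.
LEGACY_TO_COMMA_BELOW = str.maketrans({
    "\u015E": "\u0218",  # S with cedilla -> S with comma below
    "\u015F": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021A",  # T with cedilla -> T with comma below
    "\u0163": "\u021B",  # t with cedilla -> t with comma below
})

def migrate_romanian_text(text: str) -> str:
    """Re-encode Romanian text that uses the legacy cedilla code points.

    NFC first, so decomposed input such as "t" + U+0327 is also caught.
    Only call this on text known to be Romanian.
    """
    return unicodedata.normalize("NFC", text).translate(LEGACY_TO_COMMA_BELOW)

legacy = "\u0162ar\u0103 \u015Fi \u0163\u0103rm"  # "Ţară şi ţărm", legacy encoding
print(migrate_romanian_text(legacy))  # Țară și țărm
```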

To avoid problems with current Mac OS X versions, font developers should do three things:

a) do not use the “*commaaccent” glyphnames but use the “uni****” glyphnames instead

b) draw the “Scedilla” and “scedilla” glyphs with a cedilla, and draw the “uni0162”, “uni0163”, “uni0218”, “uni0219”, “uni021A”, “uni021B” glyphs with a commaaccent (“commaaccent” does not mean that the accent has to look 100% like a small comma, but it should be disconnected from the base letter, should be thicker at the top and thinner at the bottom)

c) provide the following OpenType feature definition code in their fonts:

feature locl { # Localized Forms
script latn; # Latin
language MOL; # Moldavian
sub [Scedilla scedilla] by [uni0218 uni0219];
language ROM; # Romanian
sub [Scedilla scedilla] by [uni0218 uni0219];
} locl;

Optionally, the type designer could provide stylistic alternates “uni0162.cedilla” and “uni0163.cedilla” that would indeed be “T/t with cedilla”, suitable for Semitic studies and similar applications. These could be made available through the “ss**” and “salt” OpenType layout features.

Should a type designer feel that they desperately need to draw the “uni0162”, “uni0163” glyphs using a cedilla, they should provide the following OpenType feature definition code in their fonts:

feature locl { # Localized Forms
script latn; # Latin
language MOL; # Moldavian
sub [Scedilla scedilla] by [uni0218 uni0219];
sub [uni0162 uni0163] by [uni021A uni021B];
language ROM; # Romanian
sub [Scedilla scedilla] by [uni0218 uni0219];
sub [uni0162 uni0163] by [uni021A uni021B];
} locl;
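What the locl rules above do at the glyph level can be modelled in a few lines. This is only a sketch: a real shaping engine reads the lookups from the font’s GSUB table and needs the application to tag the text as Romanian or Moldavian. The dictionaries and apply_locl are hypothetical stand-ins for the two variants shown.

```python
# Hypothetical glyph-level model of the two 'locl' variants above.
BASIC_LOCL = {"Scedilla": "uni0218", "scedilla": "uni0219"}
EXTENDED_LOCL = {**BASIC_LOCL, "uni0162": "uni021A", "uni0163": "uni021B"}

ROMANIAN_LANG_TAGS = {"ROM", "MOL"}  # the language system tags used above

def apply_locl(glyphs, lang_tag, rules=BASIC_LOCL):
    """Substitute glyphs one-for-one when the text is tagged ROM or MOL.

    lang_tag=None models an application that ignores OpenType language systems.
    """
    if lang_tag not in ROMANIAN_LANG_TAGS:
        return list(glyphs)
    return [rules.get(glyph, glyph) for glyph in glyphs]

run = ["Scedilla", "a", "uni0163"]
print(apply_locl(run, "ROM", EXTENDED_LOCL))  # ['uni0218', 'a', 'uni021B']
print(apply_locl(run, None, EXTENDED_LOCL))   # ['Scedilla', 'a', 'uni0163']
```

With no language tag (the situation John Hudson discusses later, where applications ignore OpenType language systems), nothing is substituted, so the encoding of the text alone decides which glyphs appear.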

Regards,
Adam Twardoch
Fontlab Ltd.


Miguel Sousa
29.Aug.2007 3.18pm

> Here is how I read the notes from the Standard that you displayed above [...]

That is not exactly the way the notes are to be read. The entries with a bullet (•) are Informative Notes, the ones with an arrow (→) are Cross References, and the ones with an identical-to sign (≡) are Canonical Decompositions. According to the Unicode 5.0 reference book, these are defined as:

Information About Languages
An informative note may include a list of one or more of the languages using that character where this information is considered useful. For case pairs, the annotation is given only for the lowercase form to avoid needless repetition. An ellipsis “...” indicates that the listed languages cited are merely the principal ones among many.

Cross References
Cross references are used to indicate a related character of interest, but without indicating the nature of the relation. Possibilities are a different character of similar appearance or name, the other member of a case pair, or some other linguistic relationship.

Decompositions
The decomposition sequences (one or more letters) given for a character is either its canonical mapping or its compatibility mapping. The canonical mapping is marked with an identical to symbol ≡.

In addition, on page 564 one can read the following:
Images in the Code Charts and Character Lists
Each character in these code charts is shown with a representative glyph. A representative glyph is not a prescriptive form of the character, but rather one that enables recognition of the intended character to a knowledgeable user and facilitates lookup of the character in the code charts. In many cases, there are more or less well-established alternative glyphic representations for the same character.

Designers of high-quality fonts will do their own research into the preferred glyphic appearance of Unicode characters.[...]
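The canonical decompositions (the ≡ lines) can be checked with Python’s unicodedata module. The sketch below shows that U+0163 decomposes to t + COMBINING CEDILLA and U+021B to t + COMBINING COMMA BELOW; per the passage just quoted, this fixes the character’s identity, while the chart image remains only a representative glyph. The canonical_parts helper is illustrative.

```python
import unicodedata

def canonical_parts(ch):
    """Canonical decomposition of ch as (U+XXXX, character name) pairs."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c))
            for c in unicodedata.normalize("NFD", ch)]

for ch in ("\u0163", "\u021B"):
    # decomposition() returns the raw Unicode Character Database field.
    print(f"U+{ord(ch):04X}", unicodedata.decomposition(ch), canonical_parts(ch))

# Composition goes the other way: t + COMBINING COMMA BELOW becomes U+021B.
print(unicodedata.normalize("NFC", "t\u0326") == "\u021B")  # True
```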


Michel Boyer
29.Aug.2007 3.31pm

> Well, I guess that is where we differ. [added with edit]

Ok, that’s fair.

So more research is needed (or maybe just call people from the Consortium, they are people too and there is a phone number on their site).

Here is a citation from Chapter 7, page 228, of the Unicode Standard version 5.0 to be found here.


In Turkish and Romanian, a cedilla and a comma below sometimes replace one another depending on the font style, as shown in example 4 in Figure 7-1. The form with the cedilla is preferred in Turkish, and the form with the comma below is preferred in Romanian. The characters with explicit commas below are provided to permit the distinction from characters with a cedilla. Legacy encodings for these characters contain only a single form of each of these characters. ISO/IEC 8859-2 maps these to the form with the cedilla, while ISO/IEC 8859-16 maps them to the form with the comma below. Migrating Romanian 8-bit data to Unicode should be done with care.
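The warning about migrating 8-bit data is easy to demonstrate with Python’s built-in codecs: the same four bytes decode to the cedilla code points under ISO/IEC 8859-2 and to the comma-below code points under ISO/IEC 8859-16.

```python
# The byte positions of S/s and T/t (with cedilla or comma below) in both code pages.
RAW = bytes([0xAA, 0xBA, 0xDE, 0xFE])

for codec in ("iso8859_2", "iso8859_16"):
    text = RAW.decode(codec)
    print(codec, [f"U+{ord(c):04X}" for c in text])

# iso8859_2  ['U+015E', 'U+015F', 'U+0162', 'U+0163']
# iso8859_16 ['U+0218', 'U+0219', 'U+021A', 'U+021B']
```

So whether converted Romanian text ends up with cedilla or comma-below code points depends entirely on which legacy code page the converter assumes.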

And here is Fig 7-1.


The cedilla form being preferred in Turkish, why then remove it completely from the font?

Michel


Michel Boyer
29.Aug.2007 3.51pm

@Miguel

Well, I now see the syntax is not quite the one I was expecting; I also feel bad for people that depend on characters that are not accessible in their font.

Michel

[edit] In fact, I am not sure the way you describe the interpretation is fundamentally inconsistent with mine.


John Hudson
29.Aug.2007 3.58pm

> The cedilla form being preferred in Turkish, why then remove it completely from the font?

Turkish doesn’t use a T/t with cedilla, only S/s and C/c.

Let me provide some background here to the practice to date with regard to the T/t with cedilla, and then outline what I am doing now and what I recommend to other font developers. This has changed recently.

When font developers started extending their fonts to support central and eastern European languages, they began documenting the orthographies of these languages, noting glyph preferences and correlating these to Unicode characters. Romanian was quickly identified as a problem in terms of both encoding and glyph preference. The encoding problem was due to the fact that Unicode provided a single codepoint encoding two text entities that, it turns out, needed to be distinguished: S/s with cedilla below and S/s with comma accent below. It should be noted that Unicode also encoded a number of other characters nominally ’with cedilla’, but for which a comma accent form is preferred in all the European orthographies that use these diacritic letters: K/k, R/r and, importantly it turned out, T/t. So although Unicode calls these characters letters ’with cedilla’ the expectation in most fonts is that these will actually be displayed with a comma accent below, as per the editorial notes in the Unicode Standard and user expectations.

But the S/s with cedilla created a different situation, because that diacritic, with an actual cedilla not a comma accent, is a common feature of virtually all Turkic language orthographies using the Latin script. This meant that font developers and text processing engineers had a problem because a single codepoint encoded two possible forms that needed to be distinguished for different languages. So Unicode and WG2, with input from the Romanian national standards body, decided to add separate codepoints for the two Romanian diacritics S/s and T/t with comma accent.

That should have solved the problem, but it hasn’t because pre-existing 8-bit Romanian character sets — which continued to be used by some computer systems and that bizarrely continue to influence encoding and display behaviour even in some nominally Unicode environments (notably Mac OS X) — reference the old, unified S/s and T/t ’with cedilla’ codepoints. And of course there are a lot of existing Romanian documents that use those codepoints. And change-over to the new, comma accent codepoints has been slow and inconsistent.

Now, bearing in mind again that what the majority of font developers were trying to do was to support the modern orthographies of a subset of European languages, and not all the languages of the world and not Semitic transliteration and other specialised uses, it appeared that the most efficient way to give the most desirable display of the greatest number of characters for Romanian users was to present the T/t ’with cedilla’ character with a comma accent form, just as is done for K/k and R/r ’with cedilla’. Further, in acknowledgement that the old cedilla codepoints would continue to be an issue for Romanian users, fonts were made future-oriented toward projected support for OpenType language system tagging, by mapping the S/s with cedilla to the comma accent glyph forms via language-specific substitution lookups for Romanian. What this means is that in applications like InDesign CS3 that support such tagging, both S/s and T/t will display with the comma accent regardless of which character pairs are used to encode the text.

Given the information available — the clear preference of Romanian users for the comma accent form, the lack of use of T/t with cedilla in any of the target orthographies, and the glyph processing options available —, I think this was all pretty reasonable decision making.

What we hadn’t taken into account, because no one had raised the issue until very recently, is that the Romanians might have a different preference in the condition in which systems and applications do not take advantage of the OpenType language system tagging. In such situations, the S/s diacritic, when encoded using the old codepoints, displays with a cedilla not a comma accent, and the only way to correct this is to change the encoding to the new codepoints (which, as far as I’m concerned, is exactly what should be done). As explained above, the approach in the past ten years has been for font developers to minimise the number of incorrectly displayed Romanian diacritics to this single S/s diacritic, and to map the T/t ’with cedilla’ character to a comma accent glyph, so that this diacritic displays correctly.

Last year, I began to hear, through Microsoft’s Romanian marketing people, that in this situation, where display of the S/s diacritic is with the cedilla, their preference is, in fact, for T/t to also display with cedilla for the sake of consistency. This is a notable instance of two wrongs making a right: it is preferable for both diacritics to display incorrectly with the cedilla than for one diacritic to display correctly while the other displays incorrectly.

So I have recently revised my approach, and advise other font developers to do the same: start including those T/t with cedilla glyphs mapped to U+0162 and U+0163, because that, it turns out, is what the Romanians want. Of course, under the Romanian language system tag, these should then be mapped to comma accent forms as a glyph-level rather than a character-level solution to the Romanian display issue.

And it should go without saying that anyone making fonts that are targeted at broader language use and things like semitic transliteration should already be including T/t with cedilla as appropriate.


Michel Boyer
29.Aug.2007 7.11pm

> do not use the “*commaaccent” glyphnames but use the “uni****” glyphnames instead

Since this thread started with a practical problem caused by an existing font using “*commaaccent” glyphnames, I will ask whether it would be a lot of work to provide end-users with a downloadable “patch” that, when applied to that font, globally replaces the “*commaaccent” glyphnames with the appropriate “uni****” ones.
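The renaming itself is mostly bookkeeping. Here is a minimal sketch of the name mapping only, assuming the glyph order has already been read from the font; reading and writing the .otf (for instance with a font-editing library such as fontTools) is not shown. rename_plan is hypothetical, and the table pairs the *commaaccent names mentioned in this thread with their Latin Extended-A code points. It keeps suffixes like .sc and refuses to create duplicate glyph names.

```python
# Hypothetical table: *commaaccent glyph names -> Latin Extended-A code points.
COMMAACCENT_TO_CP = {
    "Kcommaaccent": 0x0136, "kcommaaccent": 0x0137,
    "Ncommaaccent": 0x0145, "ncommaaccent": 0x0146,
    "Rcommaaccent": 0x0156, "rcommaaccent": 0x0157,
    "Tcommaaccent": 0x0162, "tcommaaccent": 0x0163,
}

def rename_plan(glyph_order, table=COMMAACCENT_TO_CP):
    """Return {old_name: new_name} turning *commaaccent names into uniXXXX.

    Suffixes are kept (Tcommaaccent.sc -> uni0162.sc). Raises ValueError if
    a new name would collide with an existing or another new glyph name.
    """
    plan = {}
    for name in glyph_order:
        base, dot, suffix = name.partition(".")
        if base in table:
            plan[name] = f"uni{table[base]:04X}{dot}{suffix}"
    kept = set(glyph_order) - set(plan)
    new_names = list(plan.values())
    clashes = kept.intersection(new_names)
    if clashes or len(set(new_names)) != len(new_names):
        raise ValueError(f"renaming would duplicate glyph names: {sorted(clashes)}")
    return plan

print(rename_plan(["A", "Tcommaaccent", "tcommaaccent.sc"]))
```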

Michel


k.l.
30.Aug.2007 2.47am

Doesn’t this thread belong to the Build forum now?

Michel Boyer — The cedilla form being preferred in Turkish, why then remove it completely from the font?

There is no Tcedilla/tcedilla in Turkish.
However, John Hudson’s paragraph ’So I have recently revised my approach’ adds another aspect to consider which suggests to keep it. (Thank you for this.)

Michel Boyer — And here is Fig 7-1.

What looks interesting to me is number (2) in this illustration. How acceptable are the dcaron/tcaron/lcaron versions with a ’real’ caron above in the countries that use them? With spacing/kerning in mind, these would be much easier to deal with. I am curious to hear comments.

Karsten


dezcom
30.Aug.2007 7.53am

Great post John! Very clearly stated (and appreciated).

ChrisL


Michel Boyer
30.Aug.2007 8.25am

> Great post John!

Indeed! And thanks.


John Hudson
30.Aug.2007 2.44pm

Karsten: What looks interesting to me is number (2) in this illustration. How acceptable are the dcaron/tcaron/lcaron versions with a ‘real’ caron above in the countries that use them? With spacing/kerning in mind, these would be much easier to deal with. I would be curious to hear comments.

Note that the figure illustrates possible glyph variants at the script level, not at the individual language level. The ‘real caron’ forms are legitimate ways of representing these diacritic characters within the Latin script system and, indeed, there may be situations or languages for which these forms are preferred. But for Czech and Slovak the apostrophe-like form is very much the norm, and I’ve not seen the other form used with L/l, d or t in any Czech typography. So again the issue is one of which languages you are targeting. The majority of Latin font development is targeted at languages of European use or origin, so the expectation is that the Czech and Slovak forms of these diacritic letters will be the default forms in most fonts.

As the target language coverage for fonts expands, the support of language-specific glyph variation will become more and more important. Thankfully, we’re beginning to see it implemented, albeit not in an ideal way, in major apps like InDesign CS3 (and even in CS2 ME).


k.l.
30.Aug.2007 4.41pm

Hello John, thank you. “But for Czech and Slovak the apostrophe-like form is very much the norm, and I’ve not seen the other form used with L/l, d or t in any Czech typography.” This indeed answers my question. I wondered whether, maybe not at once but in the future, national habits might change in favor of a common denominator, and with some luck that common denominator would be forms that are less likely to collide.

Karsten


Michel Boyer
30.Aug.2007 6.13pm

> Unfortunately, Mac OS X does not recognize the “*commaaccent” glyphnames

I took the time to test and in fact, in Myriad Pro, the names [G/g, K/k, L/l, N/n, R/r, S/s]commaaccent are all used and are all processed correctly by my Mac.

It is only [T/t]commaaccent at 0x0162 and 0x0163 that cause a problem, as far as I can see. Could you be more precise in your statement?

Michel


Michel Boyer
30.Aug.2007 6.59pm

More precisely, here is the display of a text written with TextEdit in Myriad Pro using a customized keyboard layout.


To see exactly what was typed, we need only look at the text of the saved RTF file; here it is:


{\rtf1\mac\ansicpg10000\cocoartf824\cocoasubrtf410
{\fonttbl\f0\fnil\fcharset77 MyriadPro-Regular;}
{\colortbl;\red255\green255\blue255;}
\margl1440\margr1440\vieww10380\viewh8460\viewkind0
\pard\tx566\tx1133\tx1700\tx2267\tx2834\tx3401\tx3968\tx4535\tx5102\tx5669\tx6236\tx6803\ql\qnatural\pardirnatural

\f0\fs96 \cf0 \
\uc0\u291 \u290 \u311 \u310 \u316 \u315 \u326 \u325 \u343 \u342 \u537 \u536 \
}

We see that the characters are all in MyriadPro-Regular (the font \f0), and we also have each character’s decimal value;
here they are, converted to hexadecimal, followed by their glyph name in the OTF file of Myriad Pro Regular:


     \u291 ; 0x0123 ; gcommaaccent
     \u290 ; 0x0122 ; Gcommaaccent
     \u311 ; 0x0137 ; kcommaaccent
     \u136 ; 0x0136 ; Kcommaaccent
     \u316 ; 0x013C ; lcommaaccent
     \u315 ; 0x013B ; Lcommaaccent
     \u326 ; 0x0146 ; ncommaaccent
     \u325 ; 0x0145 ; Ncommaaccent
     \u343 ; 0x0157 ; rcommaaccent
     \u342 ; 0x0156 ; Rcommaaccent
     \u537 ; 0x0219 ; scommaaccent
     \u536 ; 0x0218 ; Scommaaccent
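The decimal-to-hexadecimal step in this table can also be done mechanically, which avoids transcription slips such as the `\u136` entry corrected further down. A minimal sketch with a hypothetical function name; it relies only on the RTF convention that `\uN` carries a signed 16-bit decimal value, so code points above 32767 are written as negative numbers.

```python
import re

# RTF Unicode escapes look like \u291; the value is a signed 16-bit decimal.
# "\uc0" (a different control word) does not match because "c" is not a digit.
RTF_UNICODE = re.compile(r"\\u(-?\d+)")


def rtf_code_points(rtf: str) -> list[int]:
    points = []
    for match in RTF_UNICODE.finditer(rtf):
        value = int(match.group(1))
        if value < 0:  # values above 32767 are written as negatives
            value += 65536
        points.append(value)
    return points


if __name__ == "__main__":
    body = r"\uc0\u291 \u290 \u311 \u310 \u316 \u315 \u326 \u325 \u343 \u342 \u537 \u536 "
    for cp in rtf_code_points(body):
        print(f"\\u{cp} ; 0x{cp:04X} ; {chr(cp)}")
```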

Michel


Michel Boyer
31.Aug.2007 1.17pm

Correction: of course, the line
     \u136 ; 0x0136 ; Kcommaaccent
above should read
     \u310 ; 0x0136 ; Kcommaaccent

Michel


Michel Boyer
31.Aug.2007 4.59pm

I tried other things and here is what I found. The characters whose glyphs are


are at 0x0162 and 0x021A for the majuscule and at 0x0163 and 0x021B for the minuscule, and here is what Adobe names them in Myriad Pro:

     uni0162 : Tcommaaccent
     uni0163 : tcommaaccent
     uni021A : uni021A
     uni021B : uni021B

It is the characters at 0x0162 and 0x0163 that are not recognized by the Mac. If we rename the above four glyphs as follows:

     uni0162 : Tcedilla
     uni0163 : tcedilla
     uni021A : Tcommaaccent
     uni021B : tcommaaccent

then all the characters in the resulting font are recognized by the Mac, whether they are called *commaaccent or *cedilla. So it is not the names themselves that cause the problem but what they name. Is Tcommaaccent uni0162 or is it uni021A? As Adam pointed out, in a quite different style, one way to resolve the disagreement is to avoid the names Tcommaaccent and tcommaaccent altogether and use uni0162 and uni0163, on which everyone agrees.
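The disagreement can be made concrete by comparing the two naming schemes listed above. In this minimal sketch (hypothetical function names; the two dictionaries simply transcribe those lists), `name_conflicts` reports glyph names that the two schemes attach to different code points, while `uni_code_point` shows why uniXXXX names avoid the problem: the code point is spelled out in the name itself.

```python
from typing import Optional

# The two schemes listed above, as code point -> glyph name.
ADOBE_MYRIAD = {0x0162: "Tcommaaccent", 0x0163: "tcommaaccent",
                0x021A: "uni021A", 0x021B: "uni021B"}
RENAMED = {0x0162: "Tcedilla", 0x0163: "tcedilla",
           0x021A: "Tcommaaccent", 0x021B: "tcommaaccent"}


def name_conflicts(a: dict[int, str],
                   b: dict[int, str]) -> dict[str, tuple[int, int]]:
    """Glyph names used in both schemes but for different code points."""
    in_a = {name: cp for cp, name in a.items()}
    in_b = {name: cp for cp, name in b.items()}
    return {name: (in_a[name], in_b[name])
            for name in sorted(in_a.keys() & in_b.keys())
            if in_a[name] != in_b[name]}


def uni_code_point(name: str) -> Optional[int]:
    """Decode a uniXXXX glyph name; other names carry no code point."""
    if len(name) == 7 and name.startswith("uni"):
        try:
            return int(name[3:], 16)
        except ValueError:
            return None
    return None


if __name__ == "__main__":
    print(name_conflicts(ADOBE_MYRIAD, RENAMED))
```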

Michel

PS. Notice that if we execute the command

curl -s http://www.unicode.org/Public/UNIDATA/NamesList.txt | egrep '^0162|^0163|^021A|^021B'

to get the names from Unicode’s NamesList.txt, we get this:

    0162 LATIN CAPITAL LETTER T WITH CEDILLA *
    0163 LATIN SMALL LETTER T WITH CEDILLA *
    021A LATIN CAPITAL LETTER T WITH COMMA BELOW *
    021B LATIN SMALL LETTER T WITH COMMA BELOW *

and those are the names displayed by the Macintosh character palette.
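The same four names can also be looked up offline, using the copy of the Unicode Character Database that ships with Python’s standard `unicodedata` module. A minimal sketch (the helper name is hypothetical). The bundled database is newer than the 2007 NamesList.txt, but Unicode character names never change once assigned, so the results match.

```python
import unicodedata


def unicode_names(code_points):
    """Map each code point to its formal Unicode character name."""
    return {cp: unicodedata.name(chr(cp)) for cp in code_points}


if __name__ == "__main__":
    for cp, name in unicode_names([0x0162, 0x0163, 0x021A, 0x021B]).items():
        print(f"{cp:04X} {name}")
    # 0162 LATIN CAPITAL LETTER T WITH CEDILLA
    # 0163 LATIN SMALL LETTER T WITH CEDILLA
    # 021A LATIN CAPITAL LETTER T WITH COMMA BELOW
    # 021B LATIN SMALL LETTER T WITH COMMA BELOW
```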


Michel Boyer
31.Aug.2007 5.46pm

I must add that I fail to understand by what mechanism such a disagreement would make characters inaccessible. If you have any idea, please tell me.

Michel


Michel Boyer
3.Sep.2007 5.13am

And I am even more puzzled when, after a curl -s and a join, I get the following comparative table of Adobe names and the names in Unicode’s file NamesList.txt:

  0122  Gcommaaccent  LATIN CAPITAL LETTER G WITH CEDILLA
  0123  gcommaaccent  LATIN SMALL LETTER G WITH CEDILLA
  0136  Kcommaaccent  LATIN CAPITAL LETTER K WITH CEDILLA
  0137  kcommaaccent  LATIN SMALL LETTER K WITH CEDILLA
  013B  Lcommaaccent  LATIN CAPITAL LETTER L WITH CEDILLA
  013C  lcommaaccent  LATIN SMALL LETTER L WITH CEDILLA
  0145  Ncommaaccent  LATIN CAPITAL LETTER N WITH CEDILLA
  0146  ncommaaccent  LATIN SMALL LETTER N WITH CEDILLA
  0156  Rcommaaccent  LATIN CAPITAL LETTER R WITH CEDILLA
  0157  rcommaaccent  LATIN SMALL LETTER R WITH CEDILLA
  015E  Scedilla      LATIN CAPITAL LETTER S WITH CEDILLA *
  015F  scedilla      LATIN SMALL LETTER S WITH CEDILLA *
  0162  Tcommaaccent  LATIN CAPITAL LETTER T WITH CEDILLA *
  0163  tcommaaccent  LATIN SMALL LETTER T WITH CEDILLA *
  0218  Scommaaccent  LATIN CAPITAL LETTER S WITH COMMA BELOW *
  0219  scommaaccent  LATIN SMALL LETTER S WITH COMMA BELOW *
  021A  uni021A       LATIN CAPITAL LETTER T WITH COMMA BELOW *
  021B  uni021B       LATIN SMALL LETTER T WITH COMMA BELOW *

If the other “commaaccent” glyphs are recognized, why not [T/t]commaaccent too?
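The join can be reproduced with `unicodedata` as well, and doing so makes the puzzle explicit. In this minimal sketch (hypothetical names; the dictionary transcribes the Adobe column of the table above), every “commaaccent” glyph whose Unicode character name says CEDILLA is flagged. Twelve code points are flagged, G/g, K/k, L/l, N/n and R/r as well as T/t, so the mismatch between glyph name and character name does not by itself single out T/t.

```python
import unicodedata

# Adobe glyph names from the table above (code point -> glyph name).
ADOBE_NAMES = {
    0x0122: "Gcommaaccent", 0x0123: "gcommaaccent",
    0x0136: "Kcommaaccent", 0x0137: "kcommaaccent",
    0x013B: "Lcommaaccent", 0x013C: "lcommaaccent",
    0x0145: "Ncommaaccent", 0x0146: "ncommaaccent",
    0x0156: "Rcommaaccent", 0x0157: "rcommaaccent",
    0x015E: "Scedilla", 0x015F: "scedilla",
    0x0162: "Tcommaaccent", 0x0163: "tcommaaccent",
    0x0218: "Scommaaccent", 0x0219: "scommaaccent",
    0x021A: "uni021A", 0x021B: "uni021B",
}


def comma_vs_cedilla(names: dict[int, str]) -> list[tuple[str, str, str]]:
    """Rows where the glyph name says commaaccent but Unicode says CEDILLA."""
    rows = []
    for cp, glyph in sorted(names.items()):
        unicode_name = unicodedata.name(chr(cp))
        if glyph.endswith("commaaccent") and unicode_name.endswith("WITH CEDILLA"):
            rows.append((f"{cp:04X}", glyph, unicode_name))
    return rows


if __name__ == "__main__":
    for row in comma_vs_cedilla(ADOBE_NAMES):
        print("  ".join(row))
```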

Michel


k.l.
3.Sep.2007 5.59am

Don’t spend too much thought on this. Since it is a Mac OS bug, the question is not ‘why?’ but ‘when will it be fixed?’ :) Follow Adam’s and John’s advice regarding glyph naming and the locl feature, and the font will work fine, at least in Mac OS X 10.4 and later.


Michel Boyer
3.Sep.2007 6.40am

Here is something else that does not quite fit the world I am used to. It comes from this chart of the Unicode standard:


The name mentions a cedilla, and the definition in BNF style says that it is built from a G and the character 0327, which is indeed a cedilla, yet the Gcedilla they display has a comma. I must confess that I don’t like this. [edit] This obviously confirms Miguel’s comment about my overly strict interpretation of what looked to me like a BNF definition. The same holds for g, K, k, L, l, N, n, R and r. Only S, s, T and t are shown with a cedilla in that chart.

Michel


Michel Boyer
3.Sep.2007 6.51am

> Don’t spend too much thought on this.

I am learning, and I have no need to follow advice because I am not a developer. My problem is with Unicode’s definition itself, and it is probably somewhat academic for the time being.

Michel


Michel Boyer
3.Sep.2007 7.15am

> Don’t spend too much thought on this. (again)

I am indeed spending too much time, but it is quite fascinating. For instance, I have a Teach Yourself Romanian that dates back to 1970. People did not have computers at home back then. I don’t know when fonts started to be digitized. Well, in that book all the t “cedillas” have a comma below. As for the s, on the very same page, very close to one another, I can see one with a comma, one with a hook and one with the cedilla of Times New Roman.


John Hudson
3.Sep.2007 10.57am

The name mentions a cedilla, and the definition in BNF style says that it is built with a G and a character 0327 which is indeed a cedilla, yet the Gcedilla they display is with a comma. I must confess that I don’t like this.

Go back and read my long post again. All the ’with cedilla’ characters in Unicode except C/c cedilla and S/s cedilla are properly displayed with the comma accent form in the European orthographies that use these characters. The unification of cedilla and comma accent under the name cedilla was an early error in Unicode, and one which for stability reasons they chose not to correct except in the case of the S/s and T/t comma accent for Romanian (and given the massive confusion and conflicting text encodings that that correction has produced, one can see why they would avoid throwing the Baltic languages that use the other ’cedilla’ characters into the same mess). It was a mistake to conflate these two diacritic marks and a mistake to call them ’with cedilla’ in the formal names, but it is an old mistake and one that we have to live with.
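The unification John describes is visible in Unicode’s canonical decompositions, which is what the chart Michel quoted was showing. A minimal sketch using the standard `unicodedata` module (the helper name is hypothetical): the Baltic letters such as U+0122 and the original T/t at U+0162 decompose to COMBINING CEDILLA (U+0327), while the later Romanian additions U+0218 to U+021B decompose to COMBINING COMMA BELOW (U+0326).

```python
import unicodedata


def combining_mark_name(cp: int) -> str:
    """Name of the combining mark in a precomposed letter's canonical
    decomposition, e.g. U+0162 -> '0054 0327' -> COMBINING CEDILLA.
    Assumes a two-part canonical decomposition (true for these letters)."""
    _base_hex, mark_hex = unicodedata.decomposition(chr(cp)).split()
    return unicodedata.name(chr(int(mark_hex, 16)))


if __name__ == "__main__":
    for cp in (0x0122, 0x0156, 0x0162, 0x0218, 0x021A):
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {combining_mark_name(cp)}")
```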


Michel Boyer
3.Sep.2007 11.51am

> it is an old mistake and one that we have to live with.

Thanks for clarifying. Are there other instances in the Unicode “specification” that require such an exegesis?

Michel


Michel Boyer
3.Sep.2007 12.48pm

This does not answer all my questions. If I look at the “cedillas” in Times New Roman, I see this:


All the “cedillas”, whether attached or detached, match, except those under the Romanian “t”; this inconsistency must have a justification. Are they all “commas below”, just looking different (except of course for S and s cedilla)?

Michel

[added] You mention “the other ‘cedilla’ characters”. Maybe you are expecting too much from me, taking for granted a background I don’t have. For me a cedilla may be detached, and when I write a c cedilla in French, it will most probably not be connected to the c, even if it is printed connected. So, for me something that looks like a cedilla even if it does not look like an attached cedilla is a cedilla; mixing detached cedillas with attached cedillas is no problem for me. But mixing a comma with a detached cedilla feels really weird.


Michel Boyer
3.Sep.2007 1.17pm

Here is a (rare) example of a detached cedilla in my old Teach Yourself Romanian.


Could this be a good example of a ‘scommaaccent’? All the T and t have real commas below, like those of Times New Roman.


Michel Boyer
3.Sep.2007 2.29pm

> It should be noted that Unicode also encoded a number of other characters nominally ’with cedilla’, but for which a comma accent form is preferred in all the European orthographies that use these diacritic letters: K/k, R/r and, importantly it turned out, T/t.

Should I conclude that the above R/r characters in Times New Roman are wrong?

Michel


John Hudson
3.Sep.2007 5.29pm

Are there other instances in the Unicode “specification” that require such an exegesis?

Yes, quite a few. Perversely, the relative messiness of Unicode as a standard is a testament to its success: it was willing to accept dubious encodings and politically motivated proposals (e.g. the Arabic presentation forms and the composed Hangul syllables), at least during the early years, in order to get the standard off the ground.

So, for me something that looks like a cedilla even if it does not look like an attached cedilla is a cedilla; mixing detached cedillas with attached cedillas is no problem for me. But mixing a comma with a detached cedilla feels really weird.

An unattached cedilla is probably quite acceptable to a Romanian or Baltic reader. Indeed, there have been attempts to design what Chuck Bigelow calls a ‘commadilla’, a deliberately ambiguous, disconnected form that could be read as either a cedilla or a comma accent according to the preference of the reader.


John Hudson
3.Sep.2007 5.35pm

Should I conclude that the above R/r characters in Times New Roman are wrong?

Not in themselves: as I just wrote, this kind of unattached curved shape is probably an acceptable ’commaaccent’, but it would be better if all the commaaccent glyphs were consistent. It isn’t crucial for European languages, because the T/t commaaccent is only used alongside S/s commaaccent, not alongside the Baltic diacritics.

By the way, at which version of Times New Roman are you looking? The Windows Vista version distinguishes T/t with cedilla from T/t with commaaccent.


Michel Boyer
3.Sep.2007 6.50pm

> which version of Times New Roman are you looking

It is Monotype version 3.05, which probably came with Microsoft Office 2004. My PC is not working at the moment, but in any case it does not run Vista; Vista is not supported by our staff. [added] I am working on my Mac almost all the time.


Michel Boyer
4.Sep.2007 6.36pm

My disk is dead; I couldn’t even try installing Office 2007. I presume that when the Unicode character name contained CEDILLA they chose one of the CEDILLA subglyphs below, and when it contained the word COMMA they chose the COMMA subglyph below:


[added] ... except, of course, for g.

Michel


twardoch
18.Aug.2008 10.13am
twardoch's picture

Fontlab Ltd.’s current recommendation is to design four glyphs using a cedilla accent, giving the S with cedilla glyphs the *cedilla names and the T with cedilla glyphs either uniXXXX names or *cedilla names. The notes that follow the glyph names are not the Unicode character names but descriptive names:

U+015E "Scedilla" Latin capital S with cedilla
U+015F "scedilla" Latin small s with cedilla
U+0162 "uni0162" or "Tcedilla" Latin capital T with cedilla
U+0163 "uni0163" or "tcedilla" Latin small t with cedilla

The remaining glyphs in question should include glyphs with the commaaccent diacritic and should use uniXXXX names, not *commaaccent names.

U+0122 "uni0122" Latin capital G with commaaccent below
U+0123 "uni0123" Latin small g with turned commaaccent above
U+0136 "uni0136" Latin capital K with commaaccent below
U+0137 "uni0137" Latin small k with commaaccent below
U+013B "uni013B" Latin capital L with commaaccent below
U+013C "uni013C" Latin small l with commaaccent below
U+0145 "uni0145" Latin capital N with commaaccent below
U+0146 "uni0146" Latin small n with commaaccent below
U+0156 "uni0156" Latin capital R with commaaccent below
U+0157 "uni0157" Latin small r with commaaccent below
U+0218 "uni0218" Latin capital S with commaaccent below
U+0219 "uni0219" Latin small s with commaaccent below
U+021A "uni021A" Latin capital T with commaaccent below
U+021B "uni021B" Latin small t with commaaccent below
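This recommendation can be turned into a simple naming check. The following is a minimal sketch with hypothetical names: the table transcribes the list above, and the font is represented by a plain cmap dictionary (code point to glyph name) rather than an actual font file. Where two names are allowed (U+0162, U+0163), either is accepted, and a missing glyph is reported as `None`.

```python
from typing import Optional

# Fontlab Ltd.'s recommended glyph names, transcribed from the list above
# (code point -> set of acceptable glyph names).
RECOMMENDED: dict[int, set[str]] = {
    0x015E: {"Scedilla"},
    0x015F: {"scedilla"},
    0x0162: {"uni0162", "Tcedilla"},
    0x0163: {"uni0163", "tcedilla"},
}
# The comma-accent glyphs all take uniXXXX names.
for _cp in (0x0122, 0x0123, 0x0136, 0x0137, 0x013B, 0x013C, 0x0145,
            0x0146, 0x0156, 0x0157, 0x0218, 0x0219, 0x021A, 0x021B):
    RECOMMENDED[_cp] = {f"uni{_cp:04X}"}


def audit_names(cmap: dict[int, str]) -> list[tuple[int, Optional[str], list[str]]]:
    """List (code point, actual name or None if missing, allowed names)
    for every code point whose glyph name deviates from the table."""
    problems = []
    for cp, allowed in sorted(RECOMMENDED.items()):
        actual = cmap.get(cp)
        if actual not in allowed:
            problems.append((cp, actual, sorted(allowed)))
    return problems


if __name__ == "__main__":
    myriad_like = {0x0162: "Tcommaaccent", 0x0163: "tcommaaccent",
                   0x021A: "uni021A", 0x021B: "uni021B"}
    for cp, actual, allowed in audit_names(myriad_like):
        print(f"U+{cp:04X}: {actual} -> {' or '.join(allowed)}")
```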