Adobe Latin Character Sets
Over on Typblography, Thomas has posted notes about the future extended Latin character sets. Comments are welcome.
http://blogs.adobe.com/typblography/2008/08/extended_latin.html
28.Aug.2008 6.49pm
Oops, Miguel beat me to posting!
T
28.Aug.2008 7.06pm
Tracking
ChrisL
28.Aug.2008 8.01pm
Are we to respond & ask questions here, or on the blog? In either case, a lot to think about.
28.Aug.2008 11.27pm
Don't know if Thomas has a preference, but I'd say that posting here allows for more discussion, and you can post images if needed. In any case, both places are good for collecting comments. Thomas has started threads on two mailing lists as well, so it's unlikely that all comments will end up in the same place. So, the bottom line is, post where you prefer. We'll be keeping an eye on all of them. Thanks!
29.Aug.2008 8.47am
Either here or on the blog is fine. (Email on the ATypI or OpenType mailing list is harder to track, at least for me.)
Cheers,
T
30.Aug.2008 12.29am
Can I use these character sets (AL-1 through -5) in the fonts I publish, and mention in my documentation/publicity that my fonts support these "code pages"?
If so, isn't this all getting rather non-standard, with foundries such as H&FJ and their "Latin-X" encoding, and Microsoft with WGL-4?
What is the difference between a code page, a glyph list, and a Unicode chart?
AL-3 doesn't support ISO 8859-14 Latin 8 (Celtic), which seems to be part of the established ISO set of 1 through 10 encodings.
Where is the capital Eszett?
"almost equal to" is not really "approximately equal to", it sounds more like "slightly less than".
I note that you worked with Paratype on the Adobe Extended Cyrillic character set, which is very similar in coverage to the Paratype "Asian Cyrillic" code page. I followed the Paratype Asian Cyrillic code page in my Modern Suite fonts. I like the utilitarian rationale.
31.Aug.2008 2.07pm
> Can I use these character sets (AL-1 through -5) in the fonts I publish,
Yes.
> and mention in my documentation/publicity that my fonts support these “code pages”?
You are welcome to mention support for these character sets (or more properly glyph complements) in your documentation. (They aren't code pages... see below for some handy definitions.)
> If so, isn’t this all getting rather non-standard, with foundries such as H&FJ and their “Latin-X” encoding, and Microsoft with WGL-4?
"Latin-X" is either a character set or a glyph complement, but it's not an encoding. But to your point, sure this is all "non-standard" because there were no standards in this area which we found useful. If there were standards, it would be an open question as to whether they would be keeping up with changes and developments in Unicode....
Why did we not just adopt these other companies' standards? Basically they didn't meet the needs we were trying to meet, in terms of global language support. I'm not saying they aren't useful and of course they support lots of languages and that's great. They just didn't do some things we thought were important.
WGL-4 is completely euro-centric: it includes support for languages spoken by only tens of thousands of people (e.g. Sami) but doesn't include support for languages spoken by tens of millions of people (e.g. Pilipino/Tagalog in the Pacific and Yoruba in Africa).
I haven't studied H&FJ's Latin-X character set closely (Latin-X is a registered trademark of H&FJ, so not sure we could use the name even if we used the character set), but I see that it doesn't support Pinyin transliteration of Chinese, something that I put in AL-4 (and hence in AL-5 as well). Neither does WGL-4, of course.
> AL-3 doesn’t support ISO 8859-14 Latin 8 (Celtic), which seems to be part of the established ISO set of 1 through 10 encodings.
Okay. Is this a problem? Why?
> Where is the capital Eszett?
U+1E9E. It's in AL-4 and AL-5. It was subject to some debate internally. One of us favored relegating it to AL-5 only, under the theory that it isn't going to see much use at all. But we are putting it in AL-4, and we may very well add it to most of our AL-3 fonts as well in some retrofit some day.
> “almost equal to” is not really “approximately equal to”, it sounds more like “slightly less than”.
I have no idea what you're referring to here. Can you clarify?
> I note that you worked with Paratype on the Adobe Extended Cyrillic character set, which is very similar in coverage to the Paratype “Asian Cyrillic” code page. I followed the Paratype Asian Cyrillic code page in my Modern Suite fonts. I like the utilitarian rationale.
Yes, I asked Emil Yakupov from Paratype to review the work. Similarly, I got feedback from several folks on aspects of the new extended Latin charset, most notably Robert Bringhurst on pan-Athapaskan coverage.
I wrote up a set of definitions for you as well, but they got eaten by the Web. I'll rewrite those and post them on my blog....
Cheers,
T
31.Aug.2008 2.42pm
Trackin' too.
31.Aug.2008 8.40pm
Here are some definitions for Nick:
http://blogs.adobe.com/typblography/2008/08/character_set_terms.html
Cheers,
T
1.Sep.2008 3.15pm
I've spent some time with the spreadsheet(s) & have a question on the IAST Indic transliteration -- perhaps this is more for the Unicode consortium than Adobe, but here goes:
Shouldn't the "line below" characters really be "macron below" characters? That is how I've always constructed them. Compare, for example, 1E5D with 1E5F. As I understand it, the combining low line (0332) connects on the left & right; the combining macron below (0331) would have the same width as a macron, 0304 (macron above).
This question comes up fairly often for us. Kiowa too uses either a "line below" or "macron below." We usually try to steer people to the macron below, which will have the same width regardless of the setwidth of the character. And while it isn't supposed to come up in Kiowa (I'm told only one macron below per word), if you have two adjacent characters with a combining line below, you get, in effect, a single line under two characters.
* * *
In general, I think I'm in a bad position to comment on the new character sets. All my experience is with reacting to texts sent to us for typesetting. In that position, I have to draw up only the glyphs needed for a particular job. Moreover, we can, and sometimes have to, accommodate authors/publishers with differing orthographies for the same language. On the other hand, someone who is primarily a font designer and who isn't going to revisit the fonts very often has to plan ahead. From my perspective, Thomas' & Miguel's job is far harder than mine.
The only thing I'd question is if the combining diacritical marks shouldn't be included at a rather basic level. These, not the legacy diacriticals, are where things should be headed.
1.Sep.2008 7.54pm
I haven't really talked about implementation yet, but in short: we'll be making use of combining diacritics using OpenType mark positioning, and in fact many, many combinations should "just work" which are not listed (and won't be explicitly tested, either). More on this in a future blog post.
I agree with Charles Ellertson that "macron below" is probably the right description for those IAST Indic transliteration characters (although the source I was using used the "line below" description).
Cheers,
T
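[Editorial sketch, not from Thomas's post.] One way to see why combining marks matter for combinations that are "not listed": only some base-plus-mark pairs have a precomposed Unicode code point, and the rest can only be written as a base letter followed by a combining mark. The Python below uses the standard unicodedata module to check this for the macron below (U+0331). The helper name has_precomposed is made up for illustration, and the sketch says nothing about how Adobe's fonts implement mark positioning.

```python
import unicodedata

MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW


def has_precomposed(base: str, mark: str) -> bool:
    """Return True if base + mark composes to a single code point under NFC.

    Hypothetical helper for illustration: it only asks whether Unicode
    defines a precomposed character, not whether any font supports it.
    """
    composed = unicodedata.normalize("NFC", base + mark)
    return len(composed) == 1


if __name__ == "__main__":
    for base in "rdx":
        status = "precomposed" if has_precomposed(base, MACRON_BELOW) else "combining only"
        print(f"{base} + U+0331: {status}")
```

A pair reported as "combining only" has no code point of its own, so it depends entirely on the font placing the mark correctly over or under the base glyph.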
2.Sep.2008 2.08am
> Shouldn’t the “line below” characters really be “macron below” characters?
The Unicode names say "WITH LINE BELOW", but yes, what should be used is a macron. If you look at the second page of the Latin Extended Additional chart, it actually says that,
In this block the names "WITH LINE BELOW" refer to a macron below the letter.
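[Editorial sketch.] The chart note can be checked against the Unicode character database that ships with Python. Assuming the standard unicodedata module, the snippet below looks up the canonical decomposition of a character and names each piece; decomposition_names is a hypothetical helper, not part of any library. For U+1E5F the combining mark it reports is U+0331 COMBINING MACRON BELOW, not U+0332 COMBINING LOW LINE.

```python
import unicodedata


def decomposition_names(ch: str) -> list[str]:
    """Names of the code points in ch's one-step decomposition mapping.

    Hypothetical helper for illustration. Compatibility tags such as
    <compat> are skipped; an undecomposable character gives [].
    """
    fields = unicodedata.decomposition(ch).split()
    codes = [f for f in fields if not f.startswith("<")]
    return [unicodedata.name(chr(int(code, 16))) for code in codes]


if __name__ == "__main__":
    for ch in ("\u1e5d", "\u1e5f"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
        for name in decomposition_names(ch):
            print("    " + name)
```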