How many of those "extended" characters do you really need?

Type Minds's picture

I'm in the process of planning out a new type family and I'd like it to cover as much of the Latin, Greek, and Cyrillic scripts as is practical (i.e. in modern use - I'm not too worried about ancient or archaic languages). Looking through the Unicode Standard, I realize just how little I really know about the Latin script alone! So I'm trying to sort through the various code charts to figure out which characters I need.

I already know to include all the characters in Basic Latin, Latin-1 Supplement, and Latin Extended-A. I know the basic diacritics, punctuation, "letterlike symbols," etc. I can identify the basic necessities for the Greek and Cyrillic scripts. But beyond that I'm lost. The following is a list of Unicode pages I'm interested in. Any advice on which to include and which to ignore would be great.

Latin Extended-B, -C, -D, & Additional
IPA Extensions
Spacing Modifier Letters (besides the basic diacritics)
Combining Diacritical Marks (besides the basics)
Greek and Coptic (beyond the basic alphabet)
Cyrillic (same)
Cyrillic Supplement
Phonetic Extensions & Supplement
Comb. Diacritics Supp.
Greek Ext.
General Punctuation, Supp. Punctuation
Superscripts and Subscripts
Currency Symbols
Letterlike Symbols
Cyrillic Ext.-A & -B

Igor Freiberger's picture


Some of your doubts were mine some time ago.

Do the opposite: define which languages you want to support and then include the characters they need. Unicode blocks are heterogeneous, and the Latin ones bring together glyphs needed for less common European languages alongside Asian languages, medieval characters, and even Roman signs.

For example: Latin Extended Additional includes 1E9E, the German uppercase double S. It seems OK to include that. In the same block you find the Vietnamese accented and double-accented vowels. Of course, you would ignore them if Vietnamese support is outside your target. This Unicode block also adds several precomposed combinations to support Indic, Hebrew, and Cyrillic transliterations. Again, you must choose whether transliterations will be supported.

A similar decision must be made about phonetic alphabets, currency, and letterlike symbols. This will take some research, but you will avoid including unnecessary glyphs.
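A minimal sketch of that language-first approach, in Python. The `LANGUAGE_EXTRAS` table and its two entries are hypothetical, deliberately incomplete samples (not full alphabets); the only fact taken from the Unicode Standard is that Latin Extended Additional spans U+1E00 to U+1EFF.

```python
# Hypothetical, deliberately incomplete samples: NOT full alphabets.
LANGUAGE_EXTRAS = {
    "German": "ÄÖÜäöüß\u1E9E",              # includes U+1E9E, uppercase sharp s
    "Vietnamese (sample)": "\u1EA6\u1EA7",  # A/a with circumflex and grave
}

# Latin Extended Additional occupies U+1E00..U+1EFF.
LATIN_EXT_ADDITIONAL = range(0x1E00, 0x1F00)


def required_codepoints(languages):
    """Union of the sample characters for the chosen languages."""
    needed = set()
    for language in languages:
        needed.update(ord(ch) for ch in LANGUAGE_EXTRAS[language])
    return needed


def from_block(codepoints, block):
    """Sorted subset of codepoints that fall inside one Unicode block."""
    return sorted(cp for cp in codepoints if cp in block)


german_only = required_codepoints(["German"])
print([f"U+{cp:04X}" for cp in from_block(german_only, LATIN_EXT_ADDITIONAL)])
```

With only German selected, the single character pulled from Latin Extended Additional is 1E9E; the Vietnamese vowels enter the set only once that language is added to the list.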

Some basic information about the blocks you mentioned:

01. Latin B has mostly glyphs for African languages, but also mixes in rarely used European glyphs and Pinyin transliteration.

02. Latin C brings glyphs for Cyrillic transliterations and old African orthographies. 2C6D and 2C70 are needed to complete African support; the rest are probably out of your scope.

03. Latin D has many medieval additions and support for old orthographies. The whole block is probably unnecessary for you.

04. Latin Additional is a mess. Define the languages and look at this block in detail.

05. IPA and Phonetic blocks. Needed only for phonetic support.

06. Super and subscripts. Add the whole block.

07. Currencies. The very basic set is: dollar, cent, pound, yen, currency (generic), and euro. An improved set also includes Thai baht, colón, naira, won, new sheqel, hryvnia, tenge, new rupee (20B9), and tugrik. Most of the others are historical.

08. Letterlike. nº, liter, estimated, TM, and Ohm are basic. Others may be included or not, depending on your scope, but are not essential.

09. Punctuation. Basic: 2002, 2003, 2013, 2014, 2018 to 2022, 2026, 2027, 2032, 2033, 2039, 203A, and 2044.
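
To see what the code points in item 09 actually are, here is a small sketch using Python's standard `unicodedata` module. The `BASIC_PUNCTUATION` string and the `expand` helper are illustrative names of my own; the "A-B" range notation is just a convenience for "2018 to 2022".

```python
import unicodedata

# The list from item 09 as hex code points; "A-B" is an inclusive range.
BASIC_PUNCTUATION = "2002 2003 2013 2014 2018-2022 2026 2027 2032 2033 2039 203A 2044"


def expand(spec):
    """Turn a spec such as '2013 2018-2022' into a list of integer code points."""
    codepoints = []
    for token in spec.split():
        first, _, last = token.partition("-")
        start = int(first, 16)
        end = int(last, 16) if last else start
        codepoints.extend(range(start, end + 1))
    return codepoints


# Print each code point with its official Unicode character name.
for cp in expand(BASIC_PUNCTUATION):
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp), '<no name>')}")
```

The same helper can be reused for any of the other code point lists in this thread.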

The thread where I presented my own font project has some good information about this, kindly given by fellow typophilers. The link above points to where this discussion began.

Other threads I started may also be useful:
Unicode and diacritics
Eng and hooked N
Currencies and others
Greek and Cyrillic
Slashed letters

Ray Larabie's picture

> I already know to include all the characters in Basic Latin, Latin-1 Supplement, and Latin Extended-A.

Here are a few never-gonna-get-used-outside-of-a-historical-context Latin ext glyphs you can skip.

Aringacute 01FA
aringacute 01FB
kgreenlandic 0138
napostrophe 0149
longs 017F

Type Minds's picture

Freiberger: Thanks for your thorough information and useful links. Having read (or at least skimmed through) a dozen or so threads, I am thinking I'll probably cut back on my original plans and stick to a more standard set of characters. Unfortunately, I do not have your skill nor your dedication! (Given how long you've been working on Palimpsest, I'd probably need at least ten years to do a project that size!)

Fortunately, there seem to be plenty of good examples for "semi-extended" character sets in Latin, Greek, and Cyrillic. And if I ever become ambitious enough to attempt full Latin support, I will certainly look to your Palimpsest and Mr. Stötzner's Andron to get a better idea of the glyphs I would need.

typodermic: Thanks for the heads-up on those characters.

Igor Freiberger's picture

For a "semi-extended" character set, the one from Minion Pro seems excellent, and it is already a huge project. Just note that the position Minion uses for the Vietnamese circumflex+grave (e.g., 1EA6) is not the preferred one.

Thanks for your kind reference to PProject. Actually, it was born much smaller, but curiosity did not let me stop at a sensible size. It would be more logical to start with a narrow target and slowly widen it, as you plan to do.

You have already identified a wonderful source for research: Andron is a masterpiece, and its support (including a large medieval set still not present in Unicode) is the result of deep knowledge. Another good source is Gentium, from SIL.

Hope to hear good news about your font in the near future.

Type Minds's picture

I'll definitely look into Minion and Gentium (I think I have both of those around here somewhere). I'll post some more once the actual design has gotten underway.

Pomeranz's picture

> define which languages you want to support and then include the proper characters

For that, check this:

Ray Larabie's picture

and . . .

01FC AEacute
01FD aeacute
01FE Oslashacute
01FF oslashacute
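
For reference, Unicode gives each of the acute-accented characters 01FA to 01FF a canonical decomposition into a base letter plus the combining acute (0301), so text can also reach them as base + mark sequences. A short sketch with Python's standard `unicodedata` module; it only inspects the Unicode data and says nothing about how a given font builds or omits these glyphs.

```python
import unicodedata

for cp in (0x01FA, 0x01FB, 0x01FC, 0x01FD, 0x01FE, 0x01FF):
    ch = chr(cp)
    # decomposition() returns space-separated hex code points, e.g. "00F8 0301".
    parts = unicodedata.decomposition(ch)
    print(f"U+{cp:04X}  {unicodedata.name(ch)}  =  {parts}")

# NFD splits oslashacute into o-slash + combining acute; NFC recombines them.
decomposed = unicodedata.normalize("NFD", "\u01FF")
recomposed = unicodedata.normalize("NFC", decomposed)
print([f"U+{ord(c):04X}" for c in decomposed], f"U+{ord(recomposed):04X}")
```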

Frode Bo Helland's picture

The Saami people still use oslashacute!

Type Minds's picture

Obviously I'm going to have to do a bit more research than I had expected when I began this journey - so thanks, everyone, for your advice.

By the way, I started a new thread here sharing several Fontlab-compatible encodings and codepages I compiled for handling large character sets. There are Adobe Latin-3, -4, and -5 encoding (.enc) files, and some double-byte codepages (.cpg) that support whole Unicode planes.

Freiberger, I'd be interested to know how much of (or more than) Adobe Latin-5 your current project covers.

kentlew's picture

> The Saami people still use oslashacute!

Yes, didn’t we just have someone on these forums asking how he could achieve an /oslashacute/ because it appeared in a name in a magazine and his font didn’t include this glyph?

Ah, here it is:

Frode Bo Helland's picture

How she ...

kentlew's picture

[ Frode — Really? I thought Riku was a masculine name. But you would know better. My mistake. ]

Frode Bo Helland's picture

I know a girl named Rikke (his/her nickname), so I just made that connection.

Igor Freiberger's picture

> Freiberger, I'd be interested to know how much of (or more than) Adobe Latin-5 your current project covers.

The font includes the whole of Adobe Latin-5 plus additional glyphs to support lesser-known languages (African, Native American). A number of tricky diacritical combinations were also added as precomposed glyphs (for example, letters with line below, 0332, where the line varies to fit the letter width).
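
Some background on why combinations with 0332 end up as font-internal glyphs: as far as Unicode normalization goes, a letter followed by 0332 (combining low line) has no precomposed code point to fall back on, unlike some letters with 0331 (combining macron below). The sketch below checks this with Python's standard `unicodedata` module; the choice of "b" and the comparison with 0331 are only an illustration.

```python
import unicodedata

# b + U+0331 (combining macron below) has a precomposed form, U+1E07.
with_macron_below = unicodedata.normalize("NFC", "b\u0331")
# b + U+0332 (combining low line) has none, so it stays two code points.
with_low_line = unicodedata.normalize("NFC", "b\u0332")

for label, text in (("U+0331", with_macron_below), ("U+0332", with_low_line)):
    print(label, [f"U+{ord(c):04X}" for c in text])
```

Any precomposed glyph for such a sequence therefore has to be unencoded in the font and reached through the base + mark sequence.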

If you plan to take Latin-5 as a reference, note that one third of that table refers to IPA/APA alphabets. If you cut phonetic support in your first version, it becomes a lot easier.

Type Minds's picture

I'm not planning on Latin-5 support, but as you have said, curiosity may well get the best of me and I'll wind up finishing it off anyway. Thanks again!

Ray Larabie's picture

Frode: I couldn't find a Sami language where oslashacute was still in regular use. I'd be interested in learning more about this.

blank's picture

Maybe oslashacute is another theoretical Danish accented letter like Aringacute.

Frode Bo Helland's picture

Ray: Have a look at that linked thread! Also, when my type buddy and lingomaniac Sindre is back from his holidays, I’m sure he can elaborate. (I also suspect he’s got a thing or two to say about those language support lists most foundries have.)

Michel Boyer's picture

While you wait for Sindre, you can read comments he and others made on that subject in an earlier thread

Frode Bo Helland's picture

Yeah :) That's about how our tiny typostammtisches in Oslo play out.
