Fonts language support

goloub's picture

Hello everybody, I have a question and I can't find an answer on the internet.

I'm in the process of designing a typeface: TT Nord. The link does not show the final design. Anyway, what bothers me is that I've drawn many glyphs, including Cyrillic Extended, Latin Extended-A and -B, and others. Is there any way to find out which languages my font supports? Maybe a script or some kind of web service?

Thanks in advance!

ChristTrekker's picture

I don't know of any automatic way to do it. Your face supplies the glyphs for scripts (Latin, Cyrillic, etc.), of which various subsets are used to write various languages. It would be nice to know that language Foo uses all of SomeBlock except Bar, with the addition of Baz and Qux from AnotherBlock. That would make it easy to say, "My font provides full support for language Foo", which is going to be more relevant to most people than saying, "My font provides full support for the Latin script". This information may be partially compiled somewhere (e.g. "Latin alphabets" on the English Wikipedia), but it's a manual verification process.
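The rule described above (all of one block, minus some characters, plus a few from another block) maps naturally onto set arithmetic. Here is a minimal Python sketch, assuming a made-up "language Foo". The block range is the real Latin-1 Supplement range, but the exclusions, the additions and the function names are purely illustrative and are not real orthography data:

```python
# Hypothetical definition of "language Foo": every character of one block,
# minus some exceptions, plus a few characters from another block.
# The specific choices below are illustrative, not real orthography data.

LATIN_1_SUPPLEMENT = set(range(0x00A0, 0x0100))  # Unicode block U+00A0..U+00FF


def language_charset(block, excluded, added):
    """Return the set of code points a language needs."""
    return (set(block) - set(excluded)) | set(added)


def missing_for(font_codepoints, required):
    """Return the code points the font still lacks, sorted."""
    return sorted(set(required) - set(font_codepoints))


foo = language_charset(
    LATIN_1_SUPPLEMENT,
    excluded={0x00D7, 0x00F7},  # multiplication and division signs
    added={0x0152, 0x0153},     # OE ligatures from Latin Extended-A
)

font = set(range(0x0020, 0x0100))  # pretend the font covers U+0020..U+00FF
print([f"U+{cp:04X}" for cp in missing_for(font, foo)])
```

An empty result would justify the claim "full support for language Foo"; anything else is the to-do list of glyphs still to draw.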

Igor Freiberger's picture

There is no such tool. I did extensive research on languages and alphabets between 2010 and 2013 for my own font project. Unfortunately, no source is complete. If you are interested, here are some good sources of information:

The Wikipedia article about Latin-derived alphabets is the nearest thing to a single source that I know of. For deeper information, you can look at the individual letter and language articles on Wikipedia. Each letter article includes a table of diacritical combinations.

A good starting point for the Latin script is this post from Adobe's Typblography. You may also find this site about diacritics useful.

Regarding transliterations, there is an excellent archive made by Thomas Pedersen. It's almost hidden in the Estonian Language Institute site.

You can also consider doing the opposite: define which languages you want to support and then include the proper characters. Unicode blocks are heterogeneous, and the Latin ones bring together glyphs needed for less common European languages alongside Asian languages, medieval characters and even Roman signs.

For example: Latin Extended Additional includes 1E9E, the German uppercase double S. It seems OK to include that. In the same block you find the Vietnamese accented and double-accented vowels. Of course, you would ignore them if Vietnamese support is outside your scope. This Unicode block also adds several precomposed combinations to support Indic, Hebrew and Cyrillic transliterations. Again, you must decide whether transliterations will be supported.

A similar decision must be made about phonetic alphabets, currency and letterlike symbols. This will take some research, but you will not include unnecessary glyphs.

Some additional data about Unicode blocks:

  1. Latin B has mostly glyphs for African languages, but also mixes in rarely used European glyphs and Pinyin transliteration.
  2. Latin C brings glyphs for Cyrillic transliterations and old African orthographies. 2C6D and 2C70 are needed to complete African support; the rest are probably outside your scope.
  3. Latin D has many medieval additions and support for old orthographies. The whole block is probably unnecessary for you.
  4. Latin Additional is a mess. Define the languages first, then look at this block in detail.
  5. IPA and Phonetic blocks. Needed only for phonetic support.
  6. Superscripts and subscripts. Add the whole block if you want to keep up with contemporary type trends.
  7. Currencies. The very basic set is: dollar, cent, pound, yen, currency (generic) and euro. An improved set also includes baht, colón, naira, won, new sheqel, hryvnia, tenge, new rupee (20B9) and tugrik. Most of the others are historical.
  8. Letterlike. Numero (nº), liter, estimated, TM and Ohm are basic. Others may be included or not according to your scope, but are not essential.
  9. Punctuation. Basic: 2002, 2003, 2013, 2014, 2018 to 2022, 2026, 2027, 2032, 2033, 2039, 203A and 2044.
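To see what the bare hexadecimal values in the punctuation item stand for, they can be expanded into Unicode character names. This is a small sketch using Python's standard `unicodedata` module; the names `BASIC_PUNCTUATION` and `describe` are mine, and "2018 to 2022" is read as an inclusive range:

```python
import unicodedata

# Basic punctuation set as given in item 9 above.
# "2018 to 2022" is interpreted as an inclusive range of code points.
BASIC_PUNCTUATION = (
    [0x2002, 0x2003, 0x2013, 0x2014]
    + list(range(0x2018, 0x2022 + 1))
    + [0x2026, 0x2027, 0x2032, 0x2033, 0x2039, 0x203A, 0x2044]
)


def describe(codepoints):
    """Return (U+XXXX label, Unicode name) pairs for each code point."""
    return [
        (f"U+{cp:04X}", unicodedata.name(chr(cp), "<unnamed>"))
        for cp in codepoints
    ]


for label, name in describe(BASIC_PUNCTUATION):
    print(label, name)
```

The same helper works for any of the other code points mentioned in this list, such as 2C6D, 2C70 or 20B9, provided the Python build ships a recent enough Unicode database.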

Here in Typophile you will find a number of quite informative threads. Some:
http://typophile.com/node/67458
http://typophile.com/node/84563
http://typophile.com/node/77983

The thread where I presented my own font project has some good information about this, kindly given by fellow Typophilers. The link above points to where this discussion began.

Other threads I started may also be useful:
Unicode and diacritics
Eng and hooked N
Currencies and others
Greek and Cyrillic
Slashed letters

charles ellertson's picture

Wonderful resource, thanks!

Igor Freiberger's picture

I am glad to know you find this useful, Charles. Here are other sites that helped me during this research:

Atlas of Languages
Ethnologue
Etnolinguistica
Evertype
LanguageGeek
Linguasphere
Omniglot
Script Source
Endangered Languages
MultiTree
Language Exploration

goloub's picture

Thank you, Chris, for your advice, and Igor, for such wonderful resources; they will definitely help me a lot!

I want to fill an almost empty niche of supporting minor languages, at least of ethnic minorities, particularly those in Russia.

Forgive my ignorance, but on MyFonts.com each font has its supported languages specified. Since I've never submitted anything to MyFonts, I'd like to know whether this is done automatically or whether each author describes language support manually.

wollmersdorfer's picture

There is no such tool, and such a tool would never be complete.

In addition to the links posted by Igor Freiberger:

- the eki-letter database (oldish, but still good): http://www.eki.ee/letter/

- Unicode CLDR has a list of exemplar characters per language (base, extended, punctuation, etc.). You can read it from the database or browse it:

http://cldr.unicode.org/cldr-features
http://cldr.unicode.org/index/charts
http://www.unicode.org/cldr/charts/dev/by_type/index.html
http://www.unicode.org/cldr/charts/dev/by_type/core_data.alphabetic_information.html

Unicode CLDR only contains information about the most widely used languages and their "official" characters in current orthography or typography. For example, there is no information about less common languages like Yiddish. Old orthography, such as the long s (used in most European languages in the 18th century) or LETTER U WITH SMALL E ABOVE (used in German instead of the umlaut, U with diaeresis), is not supported.

Igor Freiberger's picture

"Forgive my ignorance, but on MyFonts.com each font has its supported languages specified. Since I've never submitted anything to MyFonts, I'd like to know whether this is done automatically or whether each author describes language support manually."

Maybe they use an internal tool or rely on information sent by the designer. But note that MyFonts does not identify all languages supported by a given font. The advanced search offers just 13 groups of languages, mostly based on scripts. In the technical info they list common character sets, again related to a handful of languages.


"I want to fill an almost empty niche of supporting minor languages, at least of ethnic minorities, particularly those in Russia."

Begin here: Wikipedia: Languages of Russia

I am doing exactly this with my font projects, besides supporting Cyrillic variations for Bulgarian and Serbian. AFAIK, there are just four or five typefaces with this kind of support, and none of them is really good.

Unfortunately, when I began doing this, I did not record my findings in an organised way: I just made a large table with the needed glyphs and some notes, but didn't list glyphs by language.

Some lesser-known minority languages are poorly documented, and it is really difficult to find data about the proper way to design their glyphs. But as a native Russian, you are in a position to find much more than I did.

This would give an idea (sorry for the large image):

I know, it is quite a strange goal for a Brazilian, but I am addicted to Cyrillic. If you want to get in touch, you can reach me at contato ( ) if . pro . br.

Thomas Phinney's picture

“There is no such tool.” There are actually several such tools!

Fontaine was first released in March 2009. http://fontaine.sourceforge.net/

Speakeasy in October 2010. https://github.com/typekit/speakeasy

Neither had all that impressive a data collection, so I drew on both those sources and added a LOT more data at Extensis. We released that data in April of last year; you can download it here:
http://blog.webink.com/custom-font-subsetting-for-faster-websites/

It is not perfect, but it covers over 150 languages and character sets. We use the data internally with a proprietary scanning tool.

I do agree with Igor that “no source is complete,” and neither is my data. But it is considerably better than nothing. Most type designers just want to figure out either what characters to add to cover a given language, or what languages are supported by a given character set, and have a one-stop answer to the question. For the languages covered by this data—and there are a lot of them—it should be reasonably solid. Where I am suspicious of the data source or have not done sufficient cross-checks, that is noted in the file.
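Both questions named above (which characters are missing for a given language, and which languages a given character set supports) reduce to set comparisons once per-language data exists. The sketch below uses a tiny hand-made table, which is an illustrative assumption and not the Extensis data; the function names are mine as well. In practice the font's code points would be read from the font's character map, for example with a library such as fontTools:

```python
# Toy orthography table: language -> required code points.
# These entries are illustrative assumptions, not the Extensis data.
BASIC_LATIN = set(range(0x41, 0x5B)) | set(range(0x61, 0x7B))  # A-Z, a-z

ORTHOGRAPHIES = {
    "English": BASIC_LATIN,
    "German": BASIC_LATIN | {0xC4, 0xD6, 0xDC, 0xE4, 0xF6, 0xFC, 0xDF},
}


def supported_languages(font_codepoints, table):
    """Languages whose required code points are all present in the font."""
    return sorted(
        lang for lang, required in table.items()
        if required <= font_codepoints
    )


def missing_characters(font_codepoints, table, language):
    """Characters still to be added for one language, as a string."""
    return "".join(chr(cp) for cp in sorted(table[language] - font_codepoints))


# A font with A-Z, a-z and only the lowercase umlauts.
font = BASIC_LATIN | {0xE4, 0xF6, 0xFC}

print(supported_languages(font, ORTHOGRAPHIES))
print(missing_characters(font, ORTHOGRAPHIES, "German"))
```

The quality of the answer depends entirely on the table, which is why the caveat about incomplete sources still applies.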

ChristTrekker's picture

Thomas - thank you for this information! I'd never heard of these. I wish the fontaine and speakeasy sites were a bit more verbose, and screenshots would be helpful.

—CT

goloub's picture

Thank you Thomas, those are precious links!
However, I'd agree with ChristTrekker, I wish I knew how to use those tools.

ChristTrekker's picture

I am now including fontaine-generated language reports on my individual font pages. I wonder if Mr. Trager would import your data set to improve its output. BTW, www.unifont.org/fontaine is better for learning about it than the SF page.

Haven't looked at speakeasy since I don't do ruby.

—CT

Thomas Phinney's picture

I have heard from a couple of folks who said they were interested in doing such a thing. I have started pinging them; it would be good to make that happen.

Thomas Phinney's picture

Actually, it seems Dave Crossland integrated our data file into PyFontaine a while back. He has already integrated the newest one as well. :)
https://github.com/davelab6/pyfontaine

This is a Python re-implementation of Fontaine. However, it does not currently use the Extensis data as broadly as it could/should. I pointed that out and I gather it will be updated pretty quickly. I gather there is a lot more happening on PyFontaine in the last few years than on Fontaine proper.

Richard Fink's picture

fontaine
pyfontaine

This is a little far afield, I know, but does anybody here remember Frank Fontaine?

Betcha Dezcom does.

Thomas Phinney's picture

PyFontaine has been updated to use the data in all the appropriate ways. Very slick!

Michel Boyer's picture

Thomas

I had a look at your xml file. Nice initiative! The number of languages you cover is impressive.

However, your code for French (my mother tongue) does not appear to match what I read on your blog. If the characters listed in subsetting-codepoints are considered accessory, then you have put there characters that are so frequently used in French that they are on the French AZERTY keyboard as well as the Canadian French keyboard, namely çèàùé (for the others you need to use a dead key); they are certainly not accessory.

The French alphabet is very clearly listed on the German Wiki Französisches Alphabet as well as the French Wiki Alphabet français, where characters specific to French (to be added to what you call "English") are listed separately.

So, the small characters needed to be in the font to contain the French alphabet (and my understanding is that they should appear in scanning-codepoints similarly to what was done for Portuguese) are (using intervals)

0x00E0,0x00E2,0x00E6-0x00EB,0x00EE,0x00EF,
0x00F4,0x00F9,0x00FB,0x00FC,0x00FF,0x0153

The corresponding capitals are

0x00C0,0x00C2,0x00C6-0x00CB,0x00CE,0x00CF,
0x00D4,0x00D9,0x00DB,0x00DC,0x0178,0x0152

That list assumes, of course, that if parent is English then the scanning code points of English are inherited.
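As an illustration of the interval notation and of inheriting from a parent, the two lists above can be expanded and merged with a few lines of Python. This is only a sketch: `parse_codepoints` is a hypothetical helper, not code from Fontaine or the Extensis tool, and the "English" parent is assumed here to be just A-Z and a-z:

```python
def parse_codepoints(spec):
    """Parse a list like '0x00E0,0x00E6-0x00EB' into a set of integers."""
    result = set()
    for item in spec.replace("\n", "").split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            start, end = (int(part, 16) for part in item.split("-"))
            result.update(range(start, end + 1))  # intervals are inclusive
        else:
            result.add(int(item, 16))
    return result


FRENCH_LOWER = parse_codepoints(
    "0x00E0,0x00E2,0x00E6-0x00EB,0x00EE,0x00EF,"
    "0x00F4,0x00F9,0x00FB,0x00FC,0x00FF,0x0153"
)
FRENCH_UPPER = parse_codepoints(
    "0x00C0,0x00C2,0x00C6-0x00CB,0x00CE,0x00CF,"
    "0x00D4,0x00D9,0x00DB,0x00DC,0x0178,0x0152"
)

# Assumption: the parent "English" contributes only the basic letters.
ENGLISH = set(range(0x41, 0x5B)) | set(range(0x61, 0x7B))

french = ENGLISH | FRENCH_LOWER | FRENCH_UPPER  # child inherits the parent's set
print("".join(chr(cp) for cp in sorted(FRENCH_LOWER)))
print(len(french), "code points required for French")
```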

Michel

Thomas Phinney's picture

Thanks, I think you are the first non-Extensis person to point out a bug in our file!

French is one of the ones we inherited from one of the other data sources. Clearly it had some significant issues. :( I have updated it for the next release.

Michel Boyer's picture

Thomas,

I noticed a few other things.

In Cyrillic you list 108 characters including 65 from Russian.
Why not just list the 43 new characters that are added to Russian since Russian is parent?

For Ukrainian, you use Cyrillic as parent; 8 of the 9 characters you list in the scanning-codepoints for Ukrainian are already in Cyrillic. Why not choose Russian as parent for Ukrainian?
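The redundancy I mean can be detected mechanically: split the code points listed for a child into those already inherited from the parent and those that are genuinely new. A minimal sketch with toy data follows; the sets are deliberately small and are not the contents of the XML file, and `delta_over_parent` is a name I made up:

```python
def delta_over_parent(listed, parent):
    """Split a child's listed code points into (new, already inherited)."""
    listed, parent = set(listed), set(parent)
    return sorted(listed - parent), sorted(listed & parent)


# Toy data (illustrative assumption, far smaller than the real file).
russian = set(range(0x0410, 0x0450))  # the contiguous range U+0410..U+044F
cyrillic_listed = russian | {0x0404, 0x0406, 0x0407, 0x0454, 0x0456, 0x0457}

new, redundant = delta_over_parent(cyrillic_listed, russian)
print(len(cyrillic_listed), "listed;", len(new), "new;", len(redundant), "inherited")
```

Running the same split against each candidate parent shows which choice leaves the shortest list to maintain.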

In the languages Tajik, TalyshCyrillic, TurkmenCyrillic, Tuvan, Udmurt you have Cyrillic as parent-name; the attribute "parent-name" is used nowhere else. Was that intended? Maybe parent should have been Russian (or Cyrillic)?

The scanning codepoints you list for Urdu are those I see in the file src/orthographies/Urdu.h of the fontaine sources. However the fontaine file fontface.cpp contains the code

    // Arabic:
    //
    if(_checkOrthography(Arabic::pData)){
        _checkOrthography(Farsi::pData);
        _checkOrthography(Urdu::pData);
        _checkOrthography(Kazakh::pData);
        _checkOrthography(Pashto::pData);
        _checkOrthography(Sindhi::pData);
        _checkOrthography(Uighur::pData);
    }

I would personally have put Arabic as parent for each of those languages before checking any further.
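In other words, the derived orthographies are only examined once the parent passes, so each child only needs to list what it adds. Here is a rough Python sketch of that gating, under my own assumptions: the Arabic set is the basic letter ranges, and the Farsi and Urdu additions are a simplified, partial selection for illustration, not the contents of the fontaine headers:

```python
# Parent orthography: basic Arabic letters (U+0621..U+063A, U+0641..U+064A).
ARABIC = set(range(0x0621, 0x063B)) | set(range(0x0641, 0x064B))

# Children list only their additions over the parent.
# Simplified, partial selections for illustration only.
CHILDREN = {
    "Farsi": {0x067E, 0x0686, 0x0698, 0x06AF},
    "Urdu": {0x0679, 0x0688, 0x0691, 0x06D2},
}


def check_arabic_family(font_codepoints):
    """Check the parent first; test children only if the parent is covered."""
    supported = []
    if ARABIC <= font_codepoints:
        supported.append("Arabic")
        for name, additions in CHILDREN.items():
            if additions <= font_codepoints:
                supported.append(name)
    return supported


print(check_arabic_family(ARABIC | CHILDREN["Farsi"]))
print(check_arabic_family(CHILDREN["Urdu"]))  # parent missing, nothing reported
```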

Michel

Thomas Phinney's picture

Thanks for all the feedback! It's great to have somebody else actually looking at this.

These sound like good improvements (and in some cases outright bugs). I'll do a review of the file next week and issue a revision. :)

dezcom's picture

Crazy Guggenheim
