Celeste's picture

Maybe this discussion has already taken place somewhere on Typophile, but I was wondering : which language using the Latin alphabet as its normal writing system (i.e. not romanized Eastern languages such as Vietnamese, Thai, etc.) uses the greatest number of diacritics ?

JanekZ's picture

Slovak 19
Niderlandish(?) 17
Portuguese 16
Czech and French 15
(A. Tomaszewski, Leksykon pism drukarskich, Warszawa 1996)

Florian Hardwig's picture


Is that supposed to be Dutch [Nederlands, in German: Niederländisch]?
If so, this number is probably due to the fact that every vowel can have an acute to show emphasis: á, é, í, ó, ú, but also íj (Erik van Blokland has designed a special j acute for Eames, see page 9 of this manual pdf); plus capitals. I don’t know how widely used accentuated caps are in Dutch – if at all. And Dutch spelling uses the trema/dieresis, in words like ‘Indië’.

clauses's picture

They are used to signify sounds that are not covered by the Latin (language) alphabet. One example is Czech:

Igor Freiberger's picture

Tomaszewski's work is wrong about Portuguese.
There are 13 diacritics in Portuguese (áàâãçéêíóôõúü).
In Brazil, there are just 12 now, as ü was dropped in 2009, when an orthographic reform took place. This reform has not yet been implemented in other Portuguese-speaking countries.
I believe it's also wrong about French (14) and Slovak (17).

JanekZ's picture

Thanks Igor, good to know. One way or another Slovak is the leader:
(Slovak) á ä č ď é í ĺ ľ ň ó ô ŕ š ť ú ý ž (17)
(not native, used in names) + české hlásky (Czech) ě, ř, ů + nemecké hlásky (German) ö, ü (+5)
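
For anyone who wants to check tallies like these mechanically, here is a minimal Python sketch. It is an illustration, not anything used by the posters above: the helper name `letters_with_diacritics` is made up, and it only counts letters that Unicode decomposes into a base letter plus a combining mark. Letters such as ø or ł, which have no such decomposition, would not be counted, and sequences with no precomposed form (like some Tlingit letters) are not handled.

```python
import unicodedata

def letters_with_diacritics(text):
    """Return the distinct letters in `text` that decompose into base + combining mark."""
    found = set()
    # Compose first so each accented letter is a single code point where possible.
    for ch in unicodedata.normalize("NFC", text):
        if not ch.isalpha():
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        # unicodedata.combining() is non-zero for combining marks.
        if any(unicodedata.combining(c) for c in decomposed):
            found.add(ch)
    return found

slovak = "á ä č ď é í ĺ ľ ň ó ô ŕ š ť ú ý ž"
portuguese = "áàâãçéêíóôõúü"

print(len(letters_with_diacritics(slovak)))      # 17
print(len(letters_with_diacritics(portuguese)))  # 13
```

This counts lowercase forms only, as in the lists above; passing capitals as well would count them as separate letters.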

Celeste's picture

OK. Thank you very much everybody.

Jongseong's picture

Be careful; Vietnamese uses the Latin alphabet as its normal writing system.

Nick Shinn's picture

Number of accented characters in the alphabet is not the same thing as number of accented characters that occur in text.
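
To illustrate the distinction, a rough Python sketch of the second measure: the share of letters in a piece of running text that carry a diacritic. The function name `accented_share` is hypothetical, and the approach assumes accented letters decompose into a base letter plus a combining mark under Unicode normalization, which is not true of every letter (ø and ł, for instance).

```python
import unicodedata

def accented_share(text):
    """Fraction of alphabetic characters in `text` that carry a combining mark."""
    letters = [ch for ch in unicodedata.normalize("NFC", text) if ch.isalpha()]
    if not letters:
        return 0.0
    accented = [
        ch for ch in letters
        if any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch))
    ]
    return len(accented) / len(letters)

# One accented letter out of five in the Dutch example word from earlier in the thread.
print(accented_share("Indië"))  # 0.2
```

Run over a representative corpus rather than a single word, this would give a figure for how often accents actually occur in text, independent of how many accented letters the alphabet contains.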


From a type design perspective, Slovak is far and away the most demanding on spacing/kerning, due to d-caron, l-caron, and t-caron and i-acute.

Interestingly, Hungarian is not that demanding, because the hungarumlaut is "acute", so may be kerned the same as the plain letter when following T, V, W and Y.

Thomas Phinney's picture

"(i.e. not romanized Eastern languages such as Vietnamese, Thai, etc.)"

I wonder if perhaps this comment reflects a misunderstanding? The standard writing system for Vietnamese is Latin with diacritics. Thai (like Korean, Japanese, and Chinese) has a standard writing system that is non-Latin, so their romanized forms are less widely used.

Though, that being said, the demand for the diacritic forms needed for transliteration of Chinese will only be rising, and not all of them are available in common character sets.



Igor Freiberger's picture

Thomas, do you mean that Pinyin actually needs more characters than those defined in Unicode, or that there are other transliteration systems under growing adoption?

Thomas Phinney's picture

I just meant that many common extended-Latin character sets, such as WGL-4, do not have all the characters needed for Pinyin transliteration. There are a couple of semi-obscure things in there not needed by much of anything else, unfortunately.



jcrippen's picture

The Canadian orthography of Tlingit uses 17 letters with diacritics, all on vowels: ąą́ą̀ą̂áàâéèêíìîúùûÿ. The ones with ogoneks are not terribly common, though they do occur in the word «mą̀» or «mą̂» ‘how’.

There are probably quite a few other minority languages which have large inventories of diacritics, although I don’t know if anyone has done a survey of any kind.

Igor Freiberger's picture

Thomas: thanks for the info.

James: after your message in the topic about the font I'm developing, I did some research about languages which use the Latin alphabet. Although Celeste's original question seems to be about European languages, this info may be useful.

These are the ones with more diacritics, besides Vietnamese, Tlingit and Slovak:

Apache uses 16 characters with diacritics, plus the glottal stop mark (').

Sami uses 17 if you sum up all the language variations.

Navajo uses 18 and also glottal stop mark.

Twi seems to use 18 too, but the info I found is not complete.

Kolkata is not a language but a transliteration scheme for Indic languages (actually the most widely adopted scheme). It uses 20.

dezcom's picture

"Slovak is far and away the most demanding on spacing/kerning, due to d-caron, l-caron, and t-caron and i-acute."

Yes, Nick, kerning would be much easier if we did not have these protrusions into the x-height :-) Add to that all of the i diacritics which don't have enough horizontal space to prevent overlap of neighboring diacritics and you have a "crash course" in diacritical abutment :-)


Kristians Sics's picture

Well, Latvian uses ā, ē, ī, ū, ģ, ķ, ļ, ņ, č, š, ž, and there also used to be ŗ. So not more than 12. But in the blackletter era (before 1930) there was a long s with a slash!

Nick Shinn's picture

Igor, Sami is a European language.

Igor Freiberger's picture

Thanks Nick. I believe Sami is Santa Claus' mother tongue, no? :-)

Jchthys's picture

Note that while Esperanto uses only six (ĉ ĝ ĥ ĵ ŝ ŭ, plus their uppercase equivalents), five of them occur in no other language.

Nick Shinn's picture

"Thanks Nick. I believe Sami is Santa Claus' mother tongue, no? :-)"

Joni Mitchell has Sami heritage, but AFAIK hasn't written any songs in the language.

Jongseong's picture

Funnily enough I was just watching a video of 'Čáhcceloo', ABBA's 'Waterloo' sung in Northern Sami by Sofia Jannok, quite beautifully I might add.

John Hudson's picture

Gotta add that to my list of surreal cover versions that are strangely better, or at least more interesting, than the original. Thanks, Brian.

WType's picture

"Though, that being said, the demand for the diacritic forms needed for transliteration of Chinese will only be rising, and not all of them are available in common character sets."

There are only 4 tones for all Chinese characters: ā, á, ǎ and a`. They are quite simple to design, but you are right that most common character sets don't have them.

These 3 - "ā, á, ǎ" I cut and pasted from Wikipedia (http://en.wikipedia.org/wiki/Romanization_of_Chinese; I wonder what font they use), but I can't find "a`" on my keyboard, so I basically typed a combination of "a" and " ` " with poor kerning.
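
As an aside, the fourth form does not have to be faked with a spacing grave. Unicode has combining marks for all four tone accents, and normalization will compose them with the vowel into a single precomposed character. A minimal Python sketch, assuming only the standard `unicodedata` module (the names `TONE_MARKS` and `tone_forms` are made up for this illustration):

```python
import unicodedata

# Combining macron, acute, caron and grave, in the same order as the four tones above.
TONE_MARKS = ["\u0304", "\u0301", "\u030C", "\u0300"]

def tone_forms(vowel):
    """Compose `vowel` with each tone mark into a single precomposed character (NFC)."""
    return [unicodedata.normalize("NFC", vowel + mark) for mark in TONE_MARKS]

print(tone_forms("a"))  # ['ā', 'á', 'ǎ', 'à']
```

Whether a given font actually contains glyphs for the composed characters is a separate question, which is the character-set gap mentioned earlier in the thread.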

FYI, no one reads the Chinese language in its romanized version. All publications are written in the original Chinese alphabets. It's not just "less widely used"; it is wrong to use it. To typeset an article in the romanized version is almost like a sin against the Chinese language, and you don't write Chinese in the romanized version at all. The romanized version is only for the purpose of pronunciation, designed as a simplified version to replace the original "Han Yi Pinyi" system, which uses its own unique set of alphabets and is very complicated. It's almost like a language by itself, and I don't think it qualifies to be called diacritics at all.

WType's picture

In contrast to the Chinese language, the Malay language (Bahasa Malaysia), which is the official language of Malaysia, Singapore and Indonesia (in Indonesia it is called Bahasa Indonesia), is written entirely in its romanized version.

The English alphabet A-Z is sufficient to form all Malay words, including their phonetic function, so no diacritics are needed. The original Malay language was written in "Jawi", a combination of Arabic and Persian scripts, with additional letters added as an adaptation to local phonetics. Under the influence of the Dutch and British, the language was romanized starting in the 17th century, and today all official documents, publications and school textbooks use the romanized version of the Malay language. Very few people can read the original Jawi, except in the context of Islamic literature. The original Jawi is thus facing the danger of extinction today. Most students in Malaysia today don't even know that Malay was originally written in Jawi. It's not even taught in school at all.
