Accented characters -- actual possible positions in words

Mark Simonson's picture

Does anyone know of a resource that has information about where in words in various languages accented letters are possible (or most likely) to appear? Alternatively, lists of words in different languages would help me work it out myself.

For example, as far as I know, Ñ and ñ never fall at the start or end of a word in any language. So, if I'm making alternate glyphs that are meant only to be used at the start or end of a word, there is no reason to make such alternates for those two characters.

The "easy" thing would be to proceed as if every accented character could appear in any position in any word in any language. My problem is that the font I'm working on has a large number of alternate characters, and adding accents to all of them will increase the glyph count dramatically. A big chunk of those glyphs may never be used, or may not even be possible to use in a normal word. I don't want to include glyphs like that if I can help it.
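For anyone who wants to work this out from plain word lists, here is a rough Python sketch of the idea. It assumes you already have a list of words per language; the function name `accent_positions` and the sample words are made up for illustration, and the result only tells you what was seen in the list, not what is impossible in the language.

```python
import unicodedata
from collections import defaultdict

def accent_positions(words):
    """Map each non-ASCII letter to the set of word positions where it was seen."""
    seen = defaultdict(set)
    for word in words:
        # Lowercase first, then compose, so that ñ is one character, not n + tilde.
        word = unicodedata.normalize("NFC", word.lower())
        last = len(word) - 1
        for i, ch in enumerate(word):
            # Skip plain ASCII letters, punctuation and stray combining marks.
            if ch.isascii() or not ch.isalpha():
                continue
            if i == 0:
                seen[ch].add("initial")
            if i == last:
                seen[ch].add("final")
            if 0 < i < last:
                seen[ch].add("medial")
    return dict(seen)

# Hypothetical sample; a real run would read a full word list per language.
sample = ["ñandú", "piñata", "côté", "été"]
positions = accent_positions(sample)
```

A letter that never shows up as "initial" or "final" in a large enough list is a candidate for skipping the corresponding alternate, but a small list can easily miss rare words.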

blank's picture

I recently came across the OneLook online dictionary, which combines numerous online dictionaries and word lists to provide a wildcard-searchable dictionary of European languages and Asian transliterations. Just be careful, as it is case-sensitive.

This link goes to a list of diacriticals at the beginning of, inside of, and at the end of real words. It includes every position I could find using online dictionaries and lists of place names. It does not show a breakdown by language (although you could probably cook up a screen scraper to compile that), but none of the words are from obscure languages (although some place names are obscure). The character set I use is a subset of one of the pan-European FontLab on Steroids encodings, without Esperanto characters or the dot accents for the old Irish words.

The list does include some transliterated words from Asian languages, but if those don’t concern you they aren’t hard to spot.

Mark Simonson's picture

Thanks, James! OneLook looks perfect for what I'm trying to figure out, and your compiled list looks good, too.

DTY's picture

Beware of oddities like loanwords from non-European languages. For instance, there's a journal of Andean archaeology called Ñawpa Pacha (ñ can occur at the start of a word in Quechua).

Cristobal Henestrosa's picture

For example, as far as I know, Ñ and ñ never fall at the start or end of a word in any language

Ñ and ñ can start words in Spanish: ñoño, ñandú, ñáñara.

Mark Simonson's picture

Thanks, guys. Good to know.

Theunis de Jong's picture

James, in case anyone doesn't have your InDesign CS3, here is the list as plain text. (Only the lowercase strings, for clarity.) It sure does look like avant-garde poetry -- are you sure all of these exist? ;-)

(Oops -- removed all caps and deleted all Cap I dot entries. Sorry.)

À à àrbatax unità
È èdre accès esistè
Ò òglach eòrna comò
Ì ìquāc izglìtìbai colorì
Ù ùikhí dlùth nandù
Ẁ mẁg

Á ábyrgd vzdělání abucheará
Ǽ ǽþryttan þrǽd
Ć ćwiczyć działalność życzyć
É élémentaire Örvényes émigré
Í írskur ólafsvík vzdělání
Ĺ stĺp gĺ
Ń ńdízídígíí armiński przyjaźń
Ó ónýtur cabhróidh aggódó
Ǿ gǿr
Ŕ vŕba
Ś śana kaścit kubuś
Ú ústrojí íbúgvi gairmiúil
Ẃ gẃraidd
Ý ýla þýða každý
Ź źródło gwiździny łabędź

 âgée jâsekmè hâlâ
Ê être albanês xanxerê
Î învăţătură
Ô ôsakajô côté cocô
Û ûntstean brûlée kyûshû
Ŵ ŵy malaŵi halibalŵ
Ŷ llŷn tŷ

Č češčina počátečních kabeláč
Ďď ďačovĕwiri horažďovice odpověď
Ě ěn vzdělání hřiště
Ľ ľubiša cigeľ hhľhhľhh
Ň ňadro drahňov píseň
Ř řehoř bedřich akcionář
Š širok vyšší botoš
Ťť ťuhýk liešťany byť
Ž živa hadžić paříž

à ãnveastã educação nsulã
Ĩ anyĩtsi tu’ĩ
Ñ ñukung piñata toruñ
Õ õhtu camões fernándõ
Ũ ngũgĩ kĩkũyũ

Ä äpäräpä täielikule öistä
Ë është drieërlei danaë
Ï ïdar aïoli sinaï
Ö öğrenim göre enkö
Ü Ürdün özün türünü
Ẅ cyẅres
Ÿ l’haÿ-les-roses ysaÿe croÿ

Å åbenrå reliåiskajâm missförstå
Ů hnůj dolů

Ā āustā jābūt dhāñgā
Ē ēower spējām valdē
Ī īwake izglītību kurzī
Ō ōcēlōtl ahōge ōshima
Ū ūsai jābūt zhīzhū

Ă ăpūs învăţătură Şopârlă
Ĕ ĕwiri ďačovĕwiri muľerĕ
Ğ ğínwar öğrenim samandağ
Ĭ ŋăĭn dundgovĭ
Ŏ ŏgang kaŏshì okchŏ

Ċ ċek þeċċan iċ
Ė ėlovar sugebėjimus kernavė
Ġ ġimgħa dryġan żebbuġ
İ İGUS AİLE GİBİ
Ŀ ŀ coŀlecció
Ż żiemel bełżyce nieśwież

Ő őrült előtte alapvető
Ű szűcsvár fésű

Ç çakalle educação gülüç
Ş şansă priveştes beliş
Ţ ţiteră învăţătură ţinţ

Ą chorąży mokslą
Ę będzin proszę
Į bįįh jį
Ų sųłiné lygmenų

Ģģ ģibuļi reliģiskajām
Ķ ķīpsala miķeļi
Ļ ļussery itāļmō nädīļ
Ņ pulksteņiem
Ŗ bōŗ läpš
Ș șerbu dușman mureș
Ț țicău crețu ganț

Đđ đurđevac anđelak Đuveđ
Ħ ħaż ngħalla miftuħ
Ŧ ruoŧŧa

Æ æghwylc onwæcnan nebulæ
ı ırımşık aydın çalğı
Ŋ ŋgìlìŋ aŋan gúraluŋ
Ø ørret højere kaysø
Œ œthel bœuf ashlœ
Ł łebcz człowiek kościół
Ðð þrúðheim aðgerð
Þ þilfar íþróttir mæġeþ

Alex Kaczun's picture

Wow... this list of accented words (usage) is great.

Just what I was looking for, now that I'm incorporating all the extended Latin accented glyphs into my new font releases.

Many thanks.

JanekZ's picture

I suppose that "toruñ" (ntilde at the end) is a miswritten Toruń (nacute).
Unfortunately, ñ and ń occupied the same code position in some old fonts.
Another correction: Ć ćwiczyć zaćma życzyć
James: excellent work!
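In Unicode the two letters are distinct, so this kind of mix-up can be caught mechanically. A small Python sketch, using only the standard `unicodedata` module; the helper names `describe` and `suspicious_polish` are invented for this example, and the check assumes the word list is known to be Polish.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, names of the combining marks)."""
    decomposed = unicodedata.normalize("NFD", ch)
    marks = [unicodedata.name(m) for m in decomposed[1:]]
    return f"U+{ord(ch):04X}", unicodedata.name(ch), marks

n_tilde = describe("\u00f1")  # ñ
n_acute = describe("\u0144")  # ń

def suspicious_polish(words):
    """Flag words containing ñ, which Polish orthography does not use."""
    return [w for w in words if "\u00f1" in unicodedata.normalize("NFC", w)]
```

Here `n_tilde` reports U+00F1 with a combining tilde and `n_acute` reports U+0144 with a combining acute accent, which makes it easy to spot entries that came out of a legacy encoding with the wrong accent.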

JanekZ's picture

Xx (AlwaysCapitalised/lc) (middle) (end) [language]
- never in this position
? to be filled in

Ćć Ćmielów/ćwiczyć zaćma życzyć [polish]
Ńń -/- ormiański przyjaźń [polish]
Ńń ?/ńdízídígíí ? ? [?]
Óó ?/ónýtur cabhróidh aggódó [?]
Óó -/ów żółw - [polish]
Śś ?/śana kaścit ? [sanskrit?]
Śś Śliwiński/ściana wiśnia kubuś [polish]
Źź -/źródło gwóźdź łabędź [polish]
Żż ?/żiemel ? ? [maltese]
Żż Żurawski/żuraw łyżka pawęż [polish]
Ąą -/- chorąży ręką [polish]
Ąą ?/? ? mokslą [lithuanian]
Ęę -/- będziemy proszę [polish]
Łł Łeba/ławka człowiek kościół [polish]

It would be great to attach a revised list to the first post...
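If the revised list keeps the notation from the legend above, it can be read by a script. Below is a minimal Python sketch; it assumes every entry has exactly four space-separated fields followed by a bracketed language tag, and the names `parse_entry` and `status` are hypothetical.

```python
def parse_entry(line):
    """Split one 'Xx Cap/lc middle end [language]' line into named fields."""
    head, _, lang = line.rpartition("[")
    letters, initial, medial, final = head.split()
    cap, _, lc = initial.partition("/")
    return {
        "letters": letters,
        "initial_cap": cap,   # example that is always capitalised
        "initial_lc": lc,     # lowercase example
        "medial": medial,
        "final": final,
        "language": lang.rstrip("]").strip(),
    }

def status(field):
    """'-' means never in this position, '?' means not yet filled in."""
    return {"-": "never", "?": "unknown"}.get(field, "attested")

entry = parse_entry("Ńń -/- ormiański przyjaźń [polish]")
```

With this, `status(entry["initial_lc"])` gives "never" for Polish ń, while `entry["medial"]` and `entry["final"]` carry the example words. Lines that do not follow the four-field layout would need extra handling.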

clauses's picture

Here are some Danish combis:

æÆ æble Æble, birketræ
øØ østers Østers, fæstemø
åÅ åbenhed Åbenhed, afslå

cuttlefish's picture

I'm digging through the Typophile archives to find lists such as this and others to test kerning pairs. I know there have been others over the years, but the Google searches are not proving as effective as I hoped. If anyone remembers where the other ones are, please point me in the right direction.

filip blazek's picture

A few corrections:

Ď ďábel horažďovice odpověď (the first word does not exist)
Ě xx vzdělání hřiště (at least in Czech, words cannot start with ě)
Ľ ľubiša cigeľ poľovník (the last word does not exist)
Ů xx hnůj dolů (in Czech, words cannot start with ů)

Birdseeding's picture

And one Hungarian correction:

Ű űrhajó szűcsvár fésű

(The mistaken idea that ű does not appear at the beginning of words is probably due to its relative rarity, mostly in words with űr (space) or űz (pursue) in them. It's normally collated together with ü in dictionaries as well, making it harder to check.)
