Typeface testing: typical bits?

Andreas Stötzner's picture

I wonder if anyone has thought about compiling typical syllables or letter combinations of different languages, for the purpose of testing fonts. For example: keit, sch, ung for German; ough for English; ghi for Italian.

Here is a most humble beginning:

  • Deutsch
  • auch tz lich keit sch ich ver ung ehr

  • English
  • whe sh ough sty our thr ly the yth ould ing

  • French
  • eaux onne arde esse ente ndre

  • Italiano
  • ace ghi ghe chi che sc iamo enno ano azza ezza nza

    – contributions?

    nina's picture

    Very good.
    I have to wonder whether, instead of trying to think of these off the top of our heads, this sort of data (frequent/«typical» letter combinations) could be better retrieved from linguistic databases. I seem to remember linguistic data for frequent two-letter combinations (Hrant?) and of course for frequent words; I can't see a reason why this shouldn't also exist (or be retrievable) for combinations of three or more characters.
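For anyone without access to such a database, counting combinations of any length from a plain text is straightforward. The following is only a minimal sketch: the function name `ngram_counts` and the sample sentence are made up for illustration, and it assumes combinations should be counted inside words only, never across spaces or punctuation.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count letter sequences of length n, staying inside word boundaries."""
    counts = Counter()
    # [^\W\d_]+ matches runs of letters only (accented letters included),
    # so digits, underscores and punctuation split words.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


# Hypothetical sample; any longer text in the target language works better.
sample = "Die Möglichkeit und die Wirklichkeit der Forschung"
for sequence, count in ngram_counts(sample, 3).most_common(5):
    print(sequence, count)
```

The quality of the result depends entirely on the text fed in: a short or very formal sample will skew the counts.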

    Andreas Stötzner's picture

    eason. :-)

    Personally, I’m not in a position to retrieve things from such databases.
    But maybe someone else is?

    I’d like to have it not only for the major languages. Also for Czech, Icelandic, Greek, ... any.

  • Latin
  • ad sub super com que tion bus

    eliason's picture

    My impression is this is what the Just Another Foundry Text Generator does. Options for Arabic, Persian, Hebrew, Greek, Russian, Czech, Welsh, Danish, German, English, Spanish, Finnish, French, Icelandic, Italian, Latin, Dutch, Polish, Portuguese, Romanian, Swedish, and Turkish.

    riccard0's picture

    I’m not sure exactly what you mean by “typical syllables or letter combinations”.
    If it’s typical as in “most frequent”, as Nina suggests, the database approach would be a sensible one (maybe starting here http://www.onelook.com/).
    If it’s typical as in “less found in other languages”, it could become a difficult task.

    As for Italian, one example of the latter type is "gli" (and also the common use of "l’" and "L’" followed by any uppercase or lowercase vowel, but that’s a different problem, I suppose).

    Edit: I looked at the generator Craig linked to and think it would be less effective than an actual (good) text, at least for Italian.

    hrant's picture

    Andreas, linguistic frequency data tends to be quite plentiful.
    But if you want to compile a list of "tricky" sequences you'd
    have to be more inventive: for example, assign a trickiness level
    to each character and write code to go through all the frequent
    sequences and order them according to their total trickiness value.
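A rough sketch of that idea, for illustration only. The trickiness weights below are invented placeholders, not measured values, and the names `TRICKINESS`, `trickiness` and `rank_by_trickiness` are hypothetical; a type designer would substitute their own judgement about which letters cause spacing or shape trouble.

```python
# Placeholder weights: higher means "harder to space or draw next to neighbours".
# These numbers are assumptions made up for this example.
TRICKINESS = {"f": 3, "j": 3, "y": 3, "g": 2, "t": 2, "r": 2, "k": 2, "v": 2, "w": 2}
DEFAULT_TRICKINESS = 1  # every letter not listed above


def trickiness(sequence):
    """Total trickiness of a sequence: the sum of its per-character weights."""
    return sum(TRICKINESS.get(ch, DEFAULT_TRICKINESS) for ch in sequence)


def rank_by_trickiness(frequent_sequences):
    """Order sequences from most to least tricky; ties are sorted alphabetically."""
    return sorted(frequent_sequences, key=lambda s: (-trickiness(s), s))


# Input would normally be a list of frequent sequences from frequency data.
print(rank_by_trickiness(["ing", "thr", "sty"]))
```

Because the score is a plain sum, longer sequences tend to rank higher; dividing by the sequence length would be one way to compare sequences of different lengths.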


    Andreas Stötzner's picture

    The “Just Another Foundry Text Generator” is indeed a funny and noteworthy tool! Thanks for sharing. To my great distress it does not allow mixing Latin with Greek or Cyrillic.
    Anyway, it is not quite what I am looking for.
    Let me try to be clearer. By “typical” I do not mean “most frequent”, but rather “most characteristic”, like heit lich ung in German, for instance.
    Maybe it’s a question similar to that of actually useful ligatures …
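One possible way to turn “most characteristic” into something computable is to compare how often a combination occurs in one language with how often it occurs in the others. This is only a sketch of one such measure, a simple frequency ratio; the function names, the `floor` parameter and the toy counts are all assumptions made for illustration, not an established method from this thread.

```python
from collections import Counter


def relative_freqs(counts):
    """Convert raw counts into shares of the total."""
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def characteristic_ngrams(target_counts, other_counts, top=10, floor=1e-6):
    """Rank combinations by (share in target language) / (share elsewhere).

    `floor` stands in for combinations that never occur in the other
    languages, so the division is always defined.
    """
    target = relative_freqs(target_counts)
    others = relative_freqs(other_counts)
    scores = {g: f / max(others.get(g, 0.0), floor) for g, f in target.items()}
    return sorted(scores, key=lambda g: (-scores[g], g))[:top]


# Toy counts, invented for the example; real input would come from large texts.
german = Counter({"ung": 5, "ein": 4, "the": 1})
everything_else = Counter({"the": 8, "ein": 1, "ing": 1})
print(characteristic_ngrams(german, everything_else))
```

With real data, very rare combinations would need a minimum-count cutoff, otherwise one-off spellings and foreign words dominate the top of the list.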

    blank's picture

    Just compile the same paragraph(s) in a wide selection of relevant languages from the UN Universal Declaration of Human Rights. I test with thirty-plus languages that way.

    hrant's picture

    But of course that document has not been authored
    to highlight characteristic strings... It would be nice
    to have what Andreas describes.

    BTW, perhaps a more practical document than the UN
    declaration is the small warning fold-out that comes
    with pretty much any Lego set - it has around the same
    number of languages and features the type of text that
    one is more likely to run into! One would however have
    to re-type all of it...


    froo's picture

    ch, cz, rz, dz, dź, dż, sz
    ść, ąc, ąć, dzi, szcz, cy, ły, wy, wz, ył, yt, zy
    cki, icz, nic, nie, ski, wan

    (Polish syntax is too rich to provide a list that is both short and characteristic.)

    I doubt one can find characteristic sets in any kind of manual (and I include the UN Declaration here) because of (1) repetitions, (2) formal language, (3) foreign words.
    Here is an example of such a Polish text, where every fourth word comes from Latin (formal use) or English (technical loan):
    technologie, konfigurowanie, telefonu, laptopa, telewizora, internetowej, oprogramowanie, funkcję, minimalizmem, skomplikowane, aplikację, menu, projektowanie, humorem, praktycznymi, cytatami, organizowania, interfejsy.
    I suppose you understand the topic more or less, don't you?

    Andreas Stötzner's picture


    – ?

    docunagi's picture

    A few years ago Typotheque released a piece of software called "LetterFrequencyMeter", which did the job pretty well. You could instantly see which phonemes were used most in a text. If I remember correctly, you could choose the length of the phonemes and analyze any text (I ran some tests with typical texts from different languages: Shakespeare for English, Balzac for French, Brecht for German…). But it is not Lion-compatible :(
