Fontmakers: How do you decide upon character coverage?

Andreas Stötzner's picture

Starting from questions in this thread, I’d like to pursue this issue separately.

“In these days, I think one can reasonably expect Latin fonts of a certain kind supporting all Latin-related languages. Why shouldn’t it?”

“Can you clarify what you mean by 'of a certain kind'?”

I mean fonts that clearly present themselves as text fonts: typefaces for setting body text.

… where exactly (…) to get information about the necessary glyph repertoire?

*That* is the point. I think: nowhere. For myself, I know of nothing but relying on my own research. I started with default codepages. Then I took the impressive UDHR-in-Unicode text samples for testing the coverage of my fonts. Not an absolutely safe test, but quite a useful one.
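The kind of coverage test Andreas describes can be sketched in a few lines. Below is a minimal illustration: the `supported` set is a hand-made stand-in for what would, in practice, come from a font's cmap table (e.g. via fontTools), and the sample is a short Vietnamese phrase in the spirit of the UDHR test texts.

```python
def missing_chars(supported, sample_text):
    """Return the characters in sample_text that the font does not
    cover, ignoring whitespace."""
    return {ch for ch in sample_text
            if not ch.isspace() and ord(ch) not in supported}

# Stand-in for a font's actual codepoint coverage; in practice this
# would come from the font file itself (e.g. with fontTools:
# set(TTFont("MyFont.ttf").getBestCmap())).
supported = {ord(ch) for ch in "abcdefghijklmnopqrstuvwxyz"}

# A Vietnamese sample phrase used as the test text.
sample = "tất cả mọi người sinh ra đều được tự do"

print(sorted(missing_chars(supported, sample)))
```

Running this against real cmap data for each UDHR sample text gives a quick, if not absolutely safe, picture of which languages a font actually covers.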

… especially those with emerging orthographies would be IMHO very difficult.

Yes of course they are. You can’t set up a fixed schedule for an orthography still evolving. But such cases hardly matter here, i.m.h.o.

… but with ~4000 languages spoken on earth, no designer is going to be able

Slowly, slowly. We talked about the field of Latin-written languages. After all, that field is quite comfortably mapped, with little risk of omitting essentials.

… a perfectly reasonable position to take, but it still excludes a great many latin-based languages. My point is simply that designers must make some sort of decision of this type, and that decision will inevitably exclude some language which someone else might think should be part of that minimum target.

Well, it’s up to you to exclude or to include. It is not so difficult to set up a scheme that comprises, let’s say, all Germanic, Romance, Celtic, Slavonic, Baltic, Finno-Ugric and Turkic languages, along with some others, to complete the (European) map of Latin writing. And it’s not so tricky to decide whether to include or exclude the Vietnamese and/or African/Latin-B blocks … Maybe information about (native) American orthographies is scarcer, but it should be possible to assemble this as well.

However, what I think you are thinking about is actually a question for a kind of standard, one which neither Unicode tables nor default codepages represent. So it’s up to ourselves – a great chance (i.m.h.o.).

blank's picture

For the most part I just trust Tiro/Fontlab and use character sets based on the Fontlab on Steroids character sets minus aringacute.

Jongseong's picture

In my view, for practical reasons, you do need to decide on a rather specific group of target languages and/or specialist needs, not just to determine the characters to cover but the appropriate forms.

I might, for example, decide to make a Latin font specifically for an Asian Studies department at a university. This will mean Vietnamese as well as letters needed for major transliteration systems of Chinese, Japanese, Korean, Sanskrit, Pali, Tibetan, etc. That is already a huge collection of special characters, ill supported by most fonts out there on the market.

In particular, some characters will require different glyphs depending on the use. The preferred form of the capital Eng will be different depending on whether it is used for Sámi or for African orthographies. The uk should probably be designed differently for Native American orthographies (where it will resemble an 8) than for transitional Romanian (where it will be confined to the x-height and resemble a v and an o stacked vertically). Characters used for the International Phonetic Alphabet will require some unusual treatment (italic double-storey a, a latinized beta). Theoretically, you can provide all possible alternates and hope language tags can be relied upon to choose the appropriate forms, but it's best to have a primary use in mind that you want to support by default.

By the way, to my mind, supporting the current Latin-based orthographies of all living languages of Europe is already quite a specific goal that nevertheless requires extensive character coverage. Thankfully, this is the portion of Latin-based scripts that is comfortably mapped; it also represents all the needs of 99% of likely users of Latin fonts. But you can't just go ahead and claim that this represents "all Latin-based languages".

One important thing to point out: Unicode doesn't try to encode all letter-and-diacritic combinations in use, but encodes them as a base letter plus combining diacritics. The precomposed letters with diacritics that have their own Unicode code points are those inherited from earlier standards (and are thus heavily biased toward those used in European languages). So it would be incorrect to conclude that the precomposed letters in the Unicode charts represent all documented combinations.
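This point is easy to demonstrate with Python's standard `unicodedata` module (a small illustration, not tied to any particular font): a combination inherited from earlier standards composes to a single code point under NFC, while an equally legitimate combination with no precomposed form stays decomposed.

```python
import unicodedata

# "e" + combining acute has a precomposed form (U+00E9 é), inherited
# from earlier 8-bit standards, so NFC collapses it to one codepoint.
composed = unicodedata.normalize("NFC", "e\u0301")
print(len(composed))  # → 1

# "m" + combining macron (seen in some transliteration schemes) has no
# precomposed codepoint, so it remains two codepoints even after NFC.
still_decomposed = unicodedata.normalize("NFC", "m\u0304")
print(len(still_decomposed))  # → 2
```

So a font that only covers the precomposed chart entries still needs combining-mark support (and mark positioning) to render such combinations properly.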

eliason's picture

Anyone have a working link for Fontlab on Steroids?

agisaak's picture

@Andreas,

I think our disagreement stems from the fact that in my original remark ("no font can reasonably be expected to support all [Latin-based] languages") I was using the word *all* in a much more literal sense than I think you are.

To me, supporting *all* latin-based languages would necessarily include not only established, national languages, but also:

• minority languages, including those which have several competing orthographic conventions in use.
• languages which are dead or moribund.
• transcriptional systems used for languages written primarily in non-Latin alphabets, including both those in common use and the more obscure.

Note also that by claiming that we cannot reasonably expect a font to support all of the above, I am not disputing that this is a worthwhile goal. As a linguist, I am certainly in favour of providing the broadest language support one can, but I don't think that supporting all latin-based languages is really an attainable goal; the point I was making was simply that I don't think it is fair to be critical of a design which fails to provide support for some particular language (though omitting major European languages would likely be a bad marketing decision).

Very few fonts on the market currently include obsolete characters such as yogh and wynn. If I were to criticize designers for this omission and they were to acquiesce to my demands and include such characters, I could then criticize them for omitting the various special characters used in Anglo-Saxon abbreviations, or the characters needed for pre-reform Irish, or the characters needed to transcribe Aramaic, or the diacritics needed to represent some language with only seven speakers remaining, none of whom are even literate in the language in question, &c. At some point a designer must declare their character set to be done, despite the knowledge that there are many omissions.

However, what I think you are thinking about is actually a question for a kind of standard.

There are many standards out there already. I don't see a need for establishing new ones. I think that the question of which minimal character set one wants to adopt is a decision for individual designers to make.

André

Frode Bo Helland's picture

*Tracking*

DTY's picture

Thomas Phinney wrote up a very useful summary of Adobe's plans on this a couple of years ago:
http://blogs.adobe.com/typblography/2008/08/extended_latin.html

From my own experience, I can echo what Brian and André have already said: What is needed depends on the intended use. I've had occasion to add precomposed dot-under letters for Semitic transliteration to Latin fonts before, and on one occasion also the denarius and sestertius signs (needed for Latin support, in the literal sense), but none of these are used in any present European language. On the other hand, I really don't expect these characters to be supported by most font designers, since their use is rather specialized.

Nick Shinn's picture

Since I have been making OpenType fonts, I have included the Unicode blocks that cover all the present-day European Latin script languages, plus the cap German double S. That's quite enough characters, if one is including typographic features such as small caps and alternate figure styles, and several weights with italics.

I did produce some much larger fonts, with support for Greek, Cyrillic, and some older languages. For instance, I included the characters from the Unicode block for Old English, but I don't think having merely yogh and wynn (even if in bold italic small caps) in an expensive commercial font makes any sense. If a foundry is going to the trouble of producing characters for a minority user group, then it also has to market to that group, or it's a waste of time, and it may be a waste of time anyway.

What with Adobe and Microsoft creating free fonts for minority users, and educational publishers doing likewise, and members of the academic user groups as well...

Is there a corporate branding market that will support commercial fonts for minority users? (Other than, say, addressing the needs of a corporation operating in the European Union.)

dezcom's picture

My character sets usually run over 800 glyphs without non-latin scripts. Part of this is figure sets and small caps. Drawing the line is hard for me as well. I try to think about the amount of work it might take and problems it might entail vs the number of potential buyers it may bring. I don't look at historic languages that are not used today because none of the fonts I have started seem suitable for those languages. The Latin scripted language that tips the scale for me is Vietnamese. The large number of stacking diacritics causes me nightmares. I would be at a loss for proper testing of the language and don't feel confident enough in the attempt. I have done Greek and Cyrillic and at my age, don't feel I have the years left to go beyond those 2 scripts. I greatly enjoyed my Cyrillic Weekend with Maxim a couple of years ago and spoke Greek as a child so I have at least some confidence in pursuing those 2 scripts.

ChrisL

Andreas Stötzner's picture

@ jongseong:

I might, for example, decide to make a Latin font specifically for an Asian Studies department […] a huge collection of special characters, ill supported by most fonts out there on the market.

– Why?!
Sad to say, but experience with interested parties often reveals: they tend to moan about anything missing (“YOUR FONT IS NOT GOOD ENOUGH FOR US BÆÄÄÄH :-( ”) – yet they are stubborn enough to complain if some IPA beta bears the serif on the wrong side (and *they* know it better, of course); however, they stick to lousy laymen’s fonts of poor quality (“free”!), producing awful typography, rather than getting (and paying) a professional to do it right.
That’s reality, but a few lucky exceptions at least give proof of another kind of practice. – As it happens, I have met both types of fellows … ;-)

… some characters will require different glyphs depending on the use. The preferred form of the capital Eng will be different depending on whether it is used for Sámi or for African orthographies …

Well put, but your overall approach seems rather pessimistic to me. What will you do with the Eng when a user steps in and requires both Sámi and African usability of your font? In my humble opinion, that kind of old-fashioned fractured thinking ought to be overcome. I understand that we cannot reach a breakthrough solution for such cases on our own at once, but making fonts *today* has something of a universal aspect, which we should take care of.

… it's best to have a primary use in mind …

Yes, OK, OK, but here again you’re confining the fontmaker’s target to very tight boundaries from the outset. That is NOT the future.

… you can't just go ahead and claim that this represents "all Latin-based languages"

I can. And all Cyrillic ;-)
If a user of my font(s) happens to spot any essential gap, I’ll fix it within a few months’ time, or faster.

… The precomposed letters with diacritics with Unicode points are those inherited from earlier standards …

Yes, yes, we are already aware of this. It is not the point here.

_ _ _ _
@ agisaak:
I am certainly in favour of providing the broadest language support which one can …

I agree that it is an issue to differentiate
– current languages
– older variant orthographies
– extinct languages
– extinct or specialist writing systems
– special transcriptional systems
– local glyph variant preferences
a.s.o.

And yet, where is the problem? – It is just about sorting it out and compiling it to the best possible degree, performing improvements with the next due version upgrade. A bit of thinking, collecting and organisation.
– Should we not be able to manage it?

Very few fonts on the market currently include obsolete characters such a yogh and wynn.

Not that few, I suppose, but very few provide such ch.s in a proper design.
– There it is again: the complaint about “something that fails to give me what I want”. But it is a *market* (as you put it), so you’re likely to get what you want, perhaps within 48 hours. There are plenty of people around here on stand-by, awaiting your commissions. It is entirely up to YOU to get a yogh or whatsoever in Semibold Italic …

I could then criticize them for omitting the various special characters used in Anglo-Saxon abbreviations, or the characters needed for pre-reform Irish, or the characters needed to transcribe Aramaic, or the diacritics needed to represent some language with only seven speakers remaining …

OK, then let’s volunteer to sort out what is needed in which field of study and set up a glyph coverage schedule which serves as a useful guideline. I’d go for it. But don’t forget: there is work involved.

There are many standards out there already.

Could you be a bit more precise on this? What kind of standards are you referring to?

– – – – – – – –
@ archaica:
What is needed depends on the intended use.

The use I intend to provide my fonts for is “text”. That’s my target.

_ _ _ _ _ _ _
@ Nick Shinn:
If a foundry is going to the trouble of producing characters for a minority user group, then it also has to market to that group, or it's a waste of time …

You’re certainly right about this. However: *everyone* belongs to a minority user group. I refuse to give up the dream of fonts serving as a universal means to bring those minorities together … anyway.

Jongseong's picture

Me, earlier: I might, for example, decide to make a Latin font specifically for an Asian Studies department […] a huge collection of special characters, ill supported by most fonts out there on the market.

Andreas: – Why?!

Andreas, are you asking why I would want to make a font for this specific purpose, or why these characters are ill supported by most fonts out there on the market?

As for having a specific purpose, I think it makes sense to try to serve a well-defined set of needs, which makes it easier to manage. One has time to be thorough and address the user community's needs. If one just chooses a set of special characters to design, but leaves out some characters that will also be required by anyone using these or otherwise fails to address the needs of the user community, then the resulting font will be unusable for the intended purpose and all the effort will have been pointless.

The characters I was talking about are those used for transliterating Asian languages. These are letters with macrons, dots, and breves, and you wouldn't deny that it is rare to find fonts that support them. How many fonts support Vietnamese, even? And that is a living language with millions of potential users. Characters needed for scholarly transliteration are rarer still.

Sad to say, but experience with interested parties often reveals: they tend to moan about anything missing (“YOUR FONT IS NOT GOOD ENOUGH FOR US BÆÄÄÄH :-( ”) – yet they are stubborn enough to complain if some IPA beta bears the serif on the wrong side (and *they* know it better, of course); however, they stick to lousy laymen’s fonts of poor quality (“free”!), producing awful typography, rather than getting (and paying) a professional to do it right.

There are things that the user sees and there are things the type designer sees, and they can be rather different. The type designer will see the inconsistent stem widths, ugly curves, uneven colour, and faulty spacing that the user can't. However, the user will have an 'eye' for the correct shape of the letter—certain expectations about normative letter forms—that one who has never used the character doesn't have. In many cases, there is some existing typographic or fine writing tradition that has trained 'native' users to expect certain normative shapes, ducti, and modulation, and type designers need to listen to user input to see these.

A glyph, however well designed from the type designer's point of view, will be a failed design if it looks wrong from the user's point of view. This is most clearly illustrated when designing for foreign writing systems. Or let me try a different analogy. You might be a great singer, but when you sing in a foreign language the speakers of the language complain about the pronunciation and incorrect phrasing. They prefer the same song performed by a native speaker who sings out of tune and with horrible technique. You and the audience both have demanding ears, but are listening for different things; you have the better musical ear, but the audience has the better ear for the language. You should be careful not to dismiss such criticism so easily.

What will you do with the Eng when a user steps in and requires both Sámi and African usability of your font –?

One can provide alternate glyphs, as the SIL fonts do. With OpenType and language tags, appropriate forms can be automatically chosen. But you can't always count on software support or correct language tags being present, and you have to decide which variant to put as the default. Right now, the best method is to provide alternate versions of fonts and let the user choose the one that suits them. SIL's TypeTuner is a great solution which lets users choose default glyphs and behaviours in selected fonts of theirs.

Me, earlier: … you can't just go ahead and claim that this represents "all Latin-based languages"

Andreas: I can. And all Cyrillic ;-)
If a user of my font(s) happens to spot any essential gap, I’ll fix it within a few month’s time, or faster.

You misunderstood me. What I said, in full, was that the characters required to support the current Latin-based orthographies of all living languages of Europe, as numerous as they are, do not represent "all Latin-based languages".

When you talk about "essential" gaps, you are talking about making judgement calls about what characters can be considered inessential. There are always characters for highly specialized uses that will seem inessential until a publisher suddenly has to set a book on that esoteric subject.

I admire your commitment to cover as many Greek-Latin-Cyrillic characters as possible with your Andron Mega. I must admit I have had similar aspirations. But one also has to be wary of the limitations of a one-size-fits-all approach to type design. Typeface design may be a series of compromises one after another at the best of times, but trying to harmonize characters that have wildly different historic provenances and that have never been used together into a single font may not always produce the results everyone prefers. For a particular purpose (say, Vietnamese) it may be preferable to fine-tune the diacritics placement, vertical metrics, and other details with only that use in mind without having to worry about what the decisions will mean for all the other characters one could conceivably support with the font.

agisaak's picture

I think you're making a mountain out of a molehill -- all I'm really taking issue with here is your use of the term 'all'.

From your comments (correct me if I am wrong), it seems that what you really want is to include support for all modern latin-based languages which have stable, established orthographies. That's not quite the same as 'all'. In fact, languages with stable, established orthographies are very much in the minority (in terms of numbers of languages, not number of speakers) and the conventions in use by minority languages are not always documented (at least not in places where they would be obtainable without doing actual fieldwork -- and if we don't restrict ourselves to extant languages many have no surviving documentation at all).

My original statement was that one couldn't reasonably expect any font to support *all* Latin languages. Let me revise this statement in light of your idiosyncratic use of the word 'all': no font can reasonably be expected to support all Latin languages *unless* that is the goal which the designer explicitly set out to achieve.

Put differently, if someone designs a font which targets only a particular set of languages, it would be reasonable to request that the designer incorporate additional languages into future versions (a request which the designer is of course free to ignore), but it isn't really reasonable to chastise the designer for an omission if they never claimed that comprehensive coverage was their intention.

Me: There are many standards out there already.

Andreas: Could you be a bit more precise on this? What kind of standards are you referring to?

You were the one who brought up the issue of standards. What I was thinking of, however, were the ISO/IEC 8859-n character sets, unicode blocks, the various Adobe Glyph sets, WGL4, etc.

Each of these groups languages together in ways which are at least partially arbitrary and which may or may not be appropriate for a particular typeface.

This whole thread started with my somewhat facetious claim that the hcircumflex character sometimes makes me want to drop Esperanto support.

Were I to design a face which specifically targeted Esperanto, I'd probably want to design it such that the ascender height is very nearly equal to the cap height, a property which not all faces have. If the ascender height differs radically from the cap height, it's rather difficult to place the circumflex in a way which doesn't muck with one's vertical metrics.

So the question I'd like to raise is: if a face is *really* inappropriate for (e.g.) Esperanto, does it make sense to include characters necessary for Esperanto simply because the face includes other characters from the Latin Extended-A block? I'd say that letting the way Unicode (or any other set of standards) groups languages together dictate your character set will inevitably lead fonts to include some characters which just don't fit with the design.

If one's goal is to design a face which is intended for near-universal support, that's going to constrain certain design choices, so such a goal really isn't appropriate for the majority of faces.

André

John Hudson's picture

So the question I'd like to raise is: if a face is *really* inappropriate for (e.g.) Esperanto, does it make sense to include characters necessary for Esperanto simply because the face includes other characters from the Latin Extended-A block? I'd say that letting the way Unicode (or any other set of standards) groups languages together dictate your character set will inevitably lead fonts to include some characters which just don't fit with the design.

Especially since Unicode does not group languages in any systematic or consistent way. If characters needed for a particular language were proposed for inclusion in Unicode, or inherited from some earlier standard, as a group, then it is likely that they will appear together in the same Unicode block, but it is just as common, or more so, that characters for a particular language will be located across multiple blocks.

I do have a client who prefers to support complete Unicode blocks, but they're software developers and being able to claim such support is convenient for them. Even so, just because they support one or two characters from a block doesn't mean that they want to support that particular block: they target specific blocks, but will admit strays from other blocks (e.g. the currency symbols block) if it makes sense to include these characters based on the language coverage they have arrived at through supporting the targeted blocks.

For the most part, I try to be more selective and target specific languages when planning glyph sets in consultation with clients, since I don't like wasting my time designing glyphs that will never be used (especially since the rarer, weirder characters are often the most difficult and time-consuming to design). Of course, this means a lot of research into language character sets and usage, but I'd rather spend my time doing research once than designing unnecessary glyphs multiple times.

Sometimes, I'm hired to consult on character sets, and I usually take a ‘Chinese box’ or ‘Russian doll’ approach, designing a series of expandable sets with different levels of language support. The lowest level generally conforms to individual 8-bit character sets plus variant glyphs and any stray characters that improve support for the covered languages; the next level(s) increase language support on a regional basis; and the upper level involves complete Unicode block support, albeit sometimes still omitting certain characters (e.g. the Old Church Slavonic characters in the Cyrillic block or the Coptic characters in the Greek block).
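The 'Russian doll' approach described above amounts to a chain of strict supersets, which is easy to model and verify mechanically. The codepoint groupings below are illustrative stand-ins, not any foundry's actual character set definitions:

```python
# Each level of language support strictly contains the previous one.
# The specific codepoints here are illustrative examples only.
basic_latin = set(range(0x0041, 0x005B)) | set(range(0x0061, 0x007B))  # A–Z, a–z
western_european = basic_latin | {0x00E9, 0x00E8, 0x00FC, 0x00DF}      # é è ü ß …
central_european = western_european | {0x0119, 0x010D, 0x0151}         # ę č ő …
pan_european = central_european | {0x021B, 0x1E9E, 0x014B}             # ț ẞ ŋ …

levels = [basic_latin, western_european, central_european, pan_european]

# Verify the nesting property: every level is a strict subset of the next.
for smaller, larger in zip(levels, levels[1:]):
    assert smaller < larger

print([len(s) for s in levels])  # → [52, 56, 59, 62]
```

Keeping the nesting invariant explicit like this makes it cheap to check that a regional upgrade of a font never silently drops characters from a lower level.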

If one's goal is to design a face which is intended for near-universal support, that's going to constrain certain design choices, so such a goal really isn't appropriate for the majority of faces.

I entirely agree.

Andreas Stötzner's picture

@ agisaak:
… ISO/IEC 8859-n character sets, unicode blocks, the various Adobe Glyph sets, …

… a bunch of mismatched standards, each of which covers the very same subject differently. Really funny.
To be serious about it: all of these help in making decisions about coverage. But *none* of them is suitable as a definitive plan for my final font layout. And as for the ISO 8859-n series: it belongs in the museum. It may make sense to check whether a font matches some 8859-n layout, but, honestly, who is going to offer a font today which “meets ISO 8859-x standard”? It would be ridiculous.

WGL4

What is that?

agisaak's picture

To be serious about it: all of these help in making decisions about coverage. But *none* of them is suitable as a definitive plan for my final font layout.

That's exactly my point. And whatever character set you eventually arrive at isn't necessarily going to be useful to anyone else unless their goals are identical to your own. That's why I suggest we don't *need* additional standards -- the existence of such character sets tends to constrain designers in artificial ways.

If, on the other hand, you are proposing developing some sort of resource which identifies which glyphs are used in particular languages, then I would wholeheartedly support such an endeavour.

Note, though, that even for specific languages it generally won't be possible to easily specify some list of required glyphs without referencing the basic purpose of the font. Is gtilde, for example, required for Tagalog? It's no longer used, but appears in printed material much more recently than one finds (e.g.) long s in English. How much time must pass before a particular character becomes entirely obsolete? Will characters which appear only in loan words be considered part of the language? Does it matter how common those loans are? What about characters found only in lexicographic materials?

Even if you clearly specify the set of languages which you intend to support, there's still going to be a lot of fuzziness in terms of which glyphs will be necessary, and this will depend largely on the font's intended use.

WGL4 = Windows Glyph List 4.

André

Andreas Stötzner's picture

… whatever character set you eventually arrive at isn't necessarily going to be useful to anyone else unless their goals are identical to your own.

I agree.

… some sort of resource which identifies which glyphs are used in particular languages

This is what I meant. For instance, a scheme collecting “all Latin ch.s necessary for rendering European (and neighbouring) languages in current orthographies” would make sense in my opinion. Additional sets, e.g. long-s usage for older German or English, may be dealt with as add-ons. Same for other scripts.

What about characters found only in lexicographic materials?

It gets more delicate, of course, when it comes to more specific usage. A wide field, but one might be able to master this as well, to a certain extent at least.
A good example of this is MUFI (the Medieval Unicode Font Initiative), giving specialists a comprehensive and concise character repertoire applicable in one particular subject. Works quite well.

Thomas Phinney's picture

My approach for Adobe was a lot like John's: nested sets of increasing language support, each a superset of the previous.

http://blogs.adobe.com/typblography/2008/08/extended_latin.html has the Adobe extended Latin character set definitions.

Cheers,

T
