Naming glyphs with no coded components

Igor Freiberger's picture

Glyph naming convention for base letter+diacritic is well known: uni+[base code]+[diacritic code]. But how to name letters when the combined diacritic has no codepoint?

This issue arises regarding L with double slash and Lambda with bar, both used in native North American languages. The slanted bar may be considered the 0337, but I'm not sure if this is correct. And the double Polish bar has no Unicode at all.

phrostbyte64's picture

You should probably put it into the private use unicode section and work the substitution into the OpenType code for stylistic alternates or stylistic sets for language substitution. Or is there a code for languages? LANG? I don't remember now. I even go so far as to create unicodes for the elements of diacritics such as the macron over the "i" (macron.salt). I hate overhanging macrons. Even if you write the code for the substitution, you must have a unicode because some software doesn't see the character if it doesn't have a unicode. Also, the naming conventions should be something similar to lambda.salt or L.salt because some software and postscript printers won't recognise names outside that convention. Or, so I've read. That should give a starting point for further research until someone far more knowledgable than I (which wouldn't be hard) posts a better solution.

JLS

John Hudson's picture

Igor, might the L/l with double slash be treated as glyph variants of U+2C60/U+2C61?

If not, then they should be proposed for encoding, because Unicode's preference now is to encode characters in which combining bars or slashes intersect with the base form, realising that the techniques for dynamic mark positioning that work so well for floating marks are problematic in this cases, and hence these text elements will tend to be displayed with precomposed glyphs.

Yes, the single combining slash is U+0337.

The lowercase Latin lambda with bar is encoded as U+019B, but it currently has no uppercase form encoded (it originated in Americanist phonetic notation, which uses only lowercase). If this uppercase form is in use, then it definitely needs to be documented and proposed for encoding.

Igor Freiberger's picture

James and John, thank you very much.

About PUA codepoints, I'm following technical recomendations I received here in Typophile and thus I'm avoiding them by now. OT substitution would be an option if these characters were actual variations of others. But I do not know if the double slashed L is just a language variation from the single slashed version.

Uppercase barred lambda is included in Huronia Pro, as well as the L with double slash. A non-barred version of Lambda is available in LaserSalish fonts. Unhappily, I cannot find further information about them (maybe I should ask Ross Mills about it).

Curiously, the lowercase Latin labda with slanted bar is used in Wikipedia as an icon to native North American languages. It seems to be a typical letter for some of these tongues.

Khaled Hosny's picture

The general rule is that glyph name should reflect the character(s) used as input to get this glyph, as the point of standardised glyph naming is to allow recovering the original input from an array of glyphs (e.g. in PDF files where the input text in not usually preserved but only the resulting glyphs).

Igor Freiberger's picture

Khaled, this is a non standardised situation as the double slash is not available as separeted diacritic. Thus, there is no input to be emulated by some uni+0000 combination.

Khaled Hosny's picture

So how the user is supposed to get this glyph? there must some sort of input even if PUA (whether a special OpenType features is needed or not is largely irrelevant).

Igor Freiberger's picture

So how the user is supposed to get this glyph?

This is part of my question: if I know how to name it, I could adopt a proper way to access it. Note there are a number of special keyboard layouts designed to make easier the input of less known languages –and even specific text editors were developed to achieve the same.

BTW, it always may be reach with a glyph panel.

Michel Boyer's picture

Why is there any need to see uni019B as a composed character? To input uni019B or any unicode character (whether it looks composed or not), all you need is an appropriate keyboard. I see that there are more than 170 different keyboards for Ametican Native Languages on languagegeek. There are also tools to make your own keyboard. Why is that not enough?

Igor Freiberger's picture

Michel, my question is not about the U+019B character, but its companion uppercase, which does not have codepoint. Anyway, I consider the answer was given by John: the slanted bar is the U+0337 diacritic and a simple uni+0000+0000 naming schema is enough.

The remaining doubt is how to name L/l with double bar.

Michel Boyer's picture

> The remaining doubt is how to name L/l with double bar.

John asked if they cannot be seen as variants of U+2C60/U+2C61. Is there a reason? What use is reserved for those two last glyphs?

Michel Boyer's picture

I have found the proposal for those two characters: http://scripts.sil.org/cms/scripts/render_download.php?&format=file&medi... and the proposed use is quite different, I must confess. I have nothing to add to what John said.

Added: If I felt I had no choice but to find something (but never to make public), I might be tempted to use U+20EB as the combining diacritical for L and l even if the bars are "too long"; my guess is that it is unlikely that such a choice could cause any conflict for any personal use. For a font to make public, that is unclear.

Igor Freiberger's picture

I don't know if they are variants of these, but the specimen for Huronia shows them as different characters:

John Hudson's picture

Igor, checking Huronia, it looks like Ross handled the l with double slash as a 'ccmp' composition of l + short combining slash + short combining slash. That's a clean encoding at least, although I still think this letter should be proposed for separate encoding.

Ross Mills's picture

The double-barred 'l', or double slash 'l' are used in Ktunaxa (Kootenay) orthographies as a voiceless l. You could see the barred l (ł) used (eg. in Boas), the double-slashed l or the double-barred l. Given the last is encoded, the double-slashed l could be mapped as a variant of either ł or ⱡ, in lieu of it having a separate encoding. It could also be mapped via [ccmp]. I would tend to say you're more likely to see the double-bar form.

The uppercase barred lambda is included for the sake of complete typographic case options for the orthographies which use the barred lambda. It is a logical inclusion (at least from a typographic standpoint), and I try to at least give users the options of having uppercase, lowercase and smallcaps for most everything. People typesetting eg. Nuuchahnulth have said they would like an uppercase option (and by extension a smallcap option). If any precedent is set for such use, then it of course should get encoded as the uppercase form of ƛ—as it stands, access mapping via OT lookups could probably be done in a couple of ways; none particularly elegant that spring to mind.

Incidentally, unless you are relying solely on mark positioning, there should probably be a precomposed version of the glottal-barred-lambda as well which needs fairly careful positioning of the glottal mark. The lambda and barred lambda* are used in Salish languages as voiced and voiceless lateral affricatives (both can take a glottal). Traditionally there has been no uppercase version because, well, no one bothered to make them.

*The subject of 'latinate' Greek for the purpose of phonetics has been covered elsewhere.

Syndicate content Syndicate content