Symbols and Unicode

agisaak's picture

The unicode codecharts specifically state that the sample glyphs shown are not intended to be normative, but I'm wondering how much leeway is actually intended. I'm specifically interested in symbol/picture characters rather than characters from actual scripts.

As a concrete example, both Linotype Astrology Pi No. 1 and Linotype Astrology Pi No. 2 include glyphs in the range U+2648 through U+2653 (i.e. zodiacal symbols). The glyphs in LAP#1 closely match the symbolic representations shown in the standard whereas those in LAP#2 contain pictorial representations of these signs which bear no resemblance to the glyphs shown in the standard. This seems legitimate, though, since each set of glyphs match the descriptions given in the standard (aries, taurus, etc.)

However, I'm wondering about cases where the description given in the standard describes an actual _shape_.

For example, in the standard U+26E8 is described as BLACK CROSS ON SHIELD (=hospital). Would it be inappropriate to encode a glyph in this slot such as the hospital glyph from Adobe Carta (encoded in the private use area at U+E02D) despite the fact that it is actually a white cross on a black circle?

Note that the above is just an example. I'm just trying to get a feel for whether the graphic description takes priority or whether a graphic or semantic match is sufficient (or near-match for that matter - with pictorial symbols in some cases the semantics are a bit fuzzy).

n.b. I'm pretty sure Adobe Carta was released before this codepoint had been defined, so their decision to use a PUA codepoint is not revealing of their take on the matter.

André

John Hudson's picture

Generally speaking, this has to be determined by criteria external to the Unicode Standard. Such criteria have to take into account user expectations, and must reflect the degree to which a certain shape is normative in a particular usage. The zodiac symbols are a good example, because users employ both symbolic and graphical representations; hence the Linotype Astrology font approach makes good sense in supporting both usages.

Where a Unicode character name does specify something like particular colour arrangement -- e.g. black on white -- then that can usually be expected to form user expectation, such that someone may legitimately complain if this character is represented in white on black. If U+26E8 were simply named CROSS ON SHIELD, then I think reversing the colours would be legitimate and while some users might not like it they would have no grounds to complain.

Note that some mathematical symbols in Unicode have recognised variant forms that are meant to be displayable using Variation Selector code. This is supported in OpenType using a format 14 cmap table, which maps combinations of character code and VS codes to glyph variants.

agisaak's picture

Thanks for the reply. Your position on U+26E8 makes sense.

More generally, what I'm trying to decide is, when encoding a pi font, whether it's better to assign non-PUA unicode values to symbols only when an exact match can be found, or whether a 'close' match is sufficient.

I'd prefer to avoid PUA codepoints where a more appropriate assignment exists. The big problem here, though, is figuring out exactly what constitutes a close match. The standard doesn't always specify why certain symbols were encoded in the first place, which makes it difficult to figure out whether a precise shape might be expected in some technical contexts.

André

John Hudson's picture

As I hoped I tried to convey, there is no general answer to your question independent of the usage of the font.

ilyaz's picture

> The standard doesn't always specify why certain symbols were encoded in the first place

I go to
http://search.cpan.org/~ilyaz/UI-KeyboardLayout-0.13/lib/UI/KeyboardLayo...
and I see
http://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=ng...
http://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=kt...
These are 2 examples of how to find documents specifying the “whys” of encoding (the nDDDD ones). The first page was found via going to Scripts/Cyrillic/search-for-Unicode Status; the second one can be reached via Scripts/Latin. I cannot find “Unicode Status” for “Miscellaneous symbols” though… Maybe googling for site:scriptsource.org "unicode status" "miscellaneous symbols" (or some such) may help you…

Bob H's picture

I see
http://scriptsource.org/...
http://scriptsource.org/...
These are 2 examples of how to find documents specifying the “whys” of encoding (the nDDDD ones). The first page was found via going to Scripts/Cyrillic/search-for-Unicode Status; the second one can be reached via Scripts/Latin. I cannot find “Unicode Status” for “Miscellaneous symbols” though…

The ScriptSource community have been adding Unicode Status pages for various scripts just because the background information for characters is, as you've found, difficult to find. I'm glad you have found the info helpful, but please understand that this is a work-in-progress -- and we invite anyone with relevant knowledge to contribute to the project.

I would point out that for characters that have been in Unicode for a long time documentation is both harder to find and, in a practical sense, generally less important to developers. For that reason ScriptSource contributors have focused on characters added more recently.

Case in pont: the zodiac characters you mention (U+2648, U+2653) are very old -- dating back to Unicode 1.1.

Syndicate content Syndicate content