unicase ???

Rob O. Font's picture

Does anyone know: is the OT feature unicase referring to u.c. forms made to join the l.c., to l.c. forms made to join the U.C., or both? I.e., is it ambiguous? Or what?

andyclymer's picture

According to the specification, it sounds like it's "a mixed set of lowercase and small capital forms", in the sense of Filosofia: U.C. made to join the l.c., from both U.C. and l.c. text.

Rob O. Font's picture

"U.C. made to join the l.c., from both U.C and l.c. text."
?

andyclymer's picture

I'll clarify what I meant: I take the specification to mean that the user should expect to be able to type in a case-insensitive manner and have everything remap to a single case...
sub [F f] by f.unic;
...where the .unic set of glyphs could be a mixed set of capital and lowercase drawings at the same height like Filosofia. Do you agree?
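
For what it's worth, here's a minimal sketch of how I'd write the whole feature, assuming a hypothetical set of .unic glyphs covering the alphabet (the glyph names are just illustrative, not anything prescribed by the spec):

feature unic {
    # both cases of each letter map to a single unicase form
    sub [A a] by a.unic;
    sub [B b] by b.unic;
    sub [C c] by c.unic;
    sub [F f] by f.unic;
    # ... and so on through the alphabet
} unic;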

John Hudson's picture

The feature is deliberately ambiguous in this regard, David. Emigre requested the feature because they wanted it for the OT version of Filosofia. I think it can be legitimately interpreted in different ways, including making giant lowercase letters to mix with uppercase letters or, as in Filosofia, making small uppercase letters to mix with lowercase. There is also no reason why the result couldn't be something in between upper- and lowercase. The point of the feature is to remove the normal visual distinction of upper and lowercase and display text in a single case. I considered calling it the Uncial feature, but thought that made too strong a reference to particular historical styles.

Rob O. Font's picture

"The feature is deliberately ambiguous in this regard, David."
Thanks, John. That was my first thought, but I wanted to make sure. I'm going to put what I was calling "biforms" in there, actually "uppercase biforms", and be glad I don't have a second or third set, "lowercase biforms" or "smallcap biforms", in this design. Otherwise... I don't know what I'd do... Does anyone?

John Hudson's picture

If you had multiple unicase implementations in the same font, you would need to decide on one of them to be the default associated with the unicase feature, and then use the stylistic set features to access the others as variants of the first. In a situation like that, you would probably want redundant glyphs and perform a complete substitution of the entire alphabet to .unic variants, rather than only substituting the specific letters that change form in this feature, since you wouldn't want the stylistic set feature to be affecting random letters if applied to non-unicase text.
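
To sketch that in feature code -- assuming hypothetical .unic and .unic2 glyph sets, with the names purely illustrative:

feature unic {
    # substitute the entire alphabet, including letters whose .unic
    # form merely duplicates the default drawing, so that ss01 below
    # has a stable set of target glyphs
    sub [A a] by a.unic;
    sub [B b] by b.unic;
    # ... through z
} unic;

feature ss01 {
    # second unicase interpretation, reached only via glyphs that
    # the unic feature has already substituted
    sub a.unic by a.unic2;
    sub b.unic by b.unic2;
    # ... through z.unic
} ss01;

Because ss01 only targets .unic glyphs, it does nothing when applied to ordinary mixed-case text, which is the point of the redundant substitution.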

Rob O. Font's picture

Yep! Choices must be made. It seems OT is settled on a default state of case that is lower. Is that your understanding? So, is the "case" feature also ambiguous?

John Hudson's picture

I think there probably is a presumption somewhere that lowercase text, as the most frequent, is in some sense the default. But this presumption precedes OpenType. It is seen in all those old smallcap fonts with smallcap glyphs mapped to lowercase positions and not to uppercase positions. OpenType has actually corrected the presumption with regard to smallcaps, enabling them to be mapped from both lowercase and uppercase.
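
In feature terms, the correction amounts to having two routes to the same smallcap glyphs (a sketch, with illustrative .sc names):

feature smcp {
    # smallcaps from lowercase
    sub a by a.sc;
    sub b by b.sc;
    # ...
} smcp;

feature c2sc {
    # smallcaps from uppercase -- the mapping the old fonts lacked
    sub A by a.sc;
    sub B by b.sc;
    # ...
} c2sc;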

I don't think the unicase feature contains such a presumption: the results of the feature should be the same regardless of what case of text it is applied to. The decision of whether the results of the unicase feature look more like lowercase, more like uppercase, or not predominantly like either, is an individual design decision.

The case feature, on the other hand, is not ambiguous at all; it is just poorly named. The description makes it clear that it presumes affected glyphs to be lowercase-harmonising by default, and the feature substitutes uppercase-harmonising forms and positioning.
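
In feature code it comes down to something like this sketch, with illustrative .case glyph names (positioning adjustments, e.g. raising punctuation for all-caps setting, would be handled with GPOS rules in the same feature):

feature case {
    # replace lowercase-harmonising defaults with
    # uppercase-harmonising forms
    sub hyphen by hyphen.case;
    sub endash by endash.case;
    sub parenleft by parenleft.case;
    sub parenright by parenright.case;
    sub at by at.case;
    # ...
} case;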

Rob O. Font's picture

John, I'm sure u can imagine this is confused. Most designs lack small caps, some lack lowercase, no faces in my experience, with Latin, lack UC, and many display faces only have that one case.... You'd think just lookin' at the keyboard, not touching it, would give a clue, but I guess not ;) The font format, on the other paw, should be equally adept in any case. There oughta be one feature (each) for lower case, upper case and small caps. (period)...

Nick Shinn's picture

you are so right david
and such a system
with the lower case feature
would have the added benefit
of being useful for hardcore modernists

John Hudson's picture

I'm not sure what your point is, David.

Nick seems to be wanting a glyph substitution feature to display all characters as e.g. lowercase, regardless of how they are encoded. In the first place, this is unnecessary, as InDesign's display-all-text-as-uppercase function demonstrates: this is something that can be done outside of font layout features using buffered character casing and referencing the cmap. In the second place, though, it is a really bad idea anyway, as the InDesign function also demonstrates. Like it or not, we have a bicameral alphabet in which case is semantically significant. If you display lowercase characters as uppercase glyphs or vice versa, you are just creating unnecessary text processing problems for yourself. I never use that InDesign function: if I want uppercase glyphs, I change the text case, which is just as easy to do and has the benefit of matching the text encoding to the text display. If you display lowercase characters as uppercase glyphs, you have text strings in your document that look like all caps but which will not be found in a case-sensitive search for all caps. In order to find them, you have to employ extra formatting sensitivity in your search, but then you miss actual all-caps character sequences that do not share the same formatting as the fake all caps.

Rob O. Font's picture

In "InDesign’s display-all-text-as-uppercase function", "all text" means what? 26 characters + whatever gsubs happen, don't it?

The point is not that the "Change Case" command only works on A-Z changing to and from a-z... the fact is that many fonts now aspire to, and some contain, a much broader crowd of case-sensitive forms.

I like the bicameral alphabet very, very much. I agree completely that case is semantically significant (else I'd not argue for an upper default in OT fonts ;) violet goes pale, you see).

Can I rest my...point?

John Hudson's picture

I'm afraid I still don't get what your point is :)

InDesign's display-all-text-as-uppercase function employs character case mapping without changing the backing string. So it should work for any character with straight casing behaviour. I say straight casing because Adobe have not implemented Unicode Special Casing support, either in this function or in the actual case-changing function, so there are some lowercase characters that slip through the net -- as Nick found when working with polytonic Greek -- or which case correctly only for certain languages.

I'm not sure what you mean by 'an upper default in OT fonts'. For the most part, there is no assumption of a default text state, and less of an assumption than there used to be in Type 1 smallcap and expert set fonts. I think the 'case' feature is unique in presuming that the default forms and positioning for e.g. hyphen height are going to be for use with lowercase or mixed-case text. But this seems to me a very fair presumption about the nature of most text. The fact that there are some fonts that only contain uppercase forms doesn't seem very relevant: these fonts would have appropriate forms and positioning for their design, and wouldn't need the 'case' feature at all. The majority of text is mixed case, and within that the majority of consecutive character strings are lowercase. That is a reasonable basis for the default form and positioning of punctuation, symbols and, yes, numerals, in mixed-case fonts being designed for use in such contexts.

Rob O. Font's picture

:)"display-all-text-as-uppercase" in indesign, is really "display all alphabetics as uppercase alphabetics, (i.e. the same as "shift"). " Display all text as uppercase", I believe, is a different thing to users, and thus to type designers.

There are 5 cases, and a point to be made for a sixth to account for better composition with a particular foreign script (and esp). But the 5 cases we work with most of the time are upper, lower, small cap, infs. and sups. These are not ambiguously used, which is why they should not be ambiguously defined. It's sometimes confusing, because people work in narrow design spaces and don't think all the time about all the space.

Here is a picture: Garamond is the best of the bad examples. If you choose "lowercase" in InDesign, only the "lowercase" letters change, and the figs are still "too big". Can You See? Next worst is Trajan, where the figures are not managed for the user as in Garamond, and "lowercase" it says, but small caps they are, SEEEE? Is this a good way to teach users? Zapfino shows the true metalheadedness of the situation. As the cases diverge in whatever direction the designer chooses, ht. & wt. being the most effective wedges, the characters we usually think of as appropriate for both (multicase in FB terms) are stressed, and the need for case-specific characters increases.

Is the OT ideal that the user needs to manage all of the case outside of A-Z and a-z with other feature tags, or are we trying to hElP? Er, no. I think the expectation of users, after all this time, is that they will have more power to more easily compose superb typography in a WIDE RANGE OF STYLES, without having to face menus like these... But don't worry, I'll fix it.

Nick Shinn's picture

Right.
"Small Caps" should be accurately termed "Caps with Small Caps" to differentiate it from "All Small Caps".

"Strikethrough" -- now isn't that useful!?

k.l.'s picture

Don't complain, you lucky ones ... In the German version it was just "Kapitälchen" for both "Small Caps" and "All Small Caps".

Rob O. Font's picture

I also realize I did not answer "I'm not sure what you mean by 'an upper default in OT fonts'". John, really, it's that in a good document, in a good environment, OT font or not, the default case composition is set to "mixed U&l.c.", and so the empty blinking cursor is thinkin' uppercase, because that first letter composed should be uppercase. It is not until another character is input that you know for sure it should be uppercase, but, as I hope is a point of no contention, it'll most likely remain uppercase regardless of what is typed next, or for the rest of the document.

This "Uppercase Default" is true from the beginning of composition (pre-type), and remains true even in digital squirt gun fights like these, I did not make it up, readers did, and All's I'm saying, is writers. By now. should have the tools to make this easy at even the zer0th common denominator level we strive so mightily to serve, (and I don't mean in the deadhead way some have mangled to work it out, where e.g. E.g. Happens because En(2)rage e.g. doesn't have an abreviational period and a solid set of tables on its head to play with. :) )

To do it right, I think we need a well-organized and unambiguous series of case features, applications based on how people read and write (not based on old fonts), and some other important parts too few to mention, as opposed to splitting up the user's perception of case into an idiot's paradise of menus. This is not to say, once again, that FB Inc. is not "doing things right, even if we have to copy other people's mistakes", because we're well practiced in this area of the type business, but we are taking precautions in case an intelligent, forward-looking composition system can be made to work, and in the meantime we're using those precautions to make the "right" tables now. :)

John Hudson's picture

Okay, now I understand what you're getting at. Thanks for the examples.

On one hand, everything you've described could be handled at the UI level, largely independent of the font level implementation. This is why, for instance, Microsoft's XAML typographic features are not identical to the OpenType Layout features that they trigger, and a single UI feature might implement a collection of OTL features. [Which is not to say that I think the XAML feature structure is the best thing going, only that it is an example of the distinction between UI experience and underlying architecture.] Similarly, there is no reason why the 'convert to lowercase' function in an application couldn't apply the 'onum' OTL feature at the same time as converting character case.

Of what you are typographically identifying as 'case' -- upper, lower, small cap, inferiors and superiors -- two are considered case for alphabetic characters from a text encoding perspective, and two different ones are considered case for numeric characters from a text encoding perspective. And that isn't going to change any time soon, if ever. So the question is what kind of architecture and what kind of UI should exist to handle a combination of character level processing and glyph level processing in such a way that the user only has to think about what he sees on the page/screen, not how it got there.

Regarding architecture, the question is how, for example, to have oldstyle numerals, which are glyph variants, identified with lowercase text, which is a character class, and how to enable this in a design-specific -- i.e. font-specific -- way. And, at the same time, how not to lose the significant processing speed advantage of character level actions available for alphabetic case mapping. In other words, one wants to be able to say, within a font, 'The following glyphs are associated with lowercase text' without needing to rely on such a statement for glyphs that are already associated with lowercase text by virtue of their character mapping to lowercase character codes. Well, one way to do that is to create new font layout features or tables, in which such statements are made. But another way to do it is to take existing layout features, such as 'onum', and to say that this feature should be applied as part of text transformation functions triggered by the application UI.
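
The existing 'onum' feature is already, in effect, such a statement; what is missing is an application that applies it as part of its case-transformation functions. A sketch, with illustrative .osf glyph names:

feature onum {
    # figures the designer associates with lowercase text
    sub zero by zero.osf;
    sub one by one.osf;
    sub two by two.osf;
    # ... through nine
} onum;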

k.l.'s picture

David Berlow wrote:
This “Uppercase Default” is true from the beginning of composition (pre-type) [...]

I wonder if this is still a problem, except for a few relics -- default numerals, currency & mathematical symbols, which are still "uppercase".

John Hudson wrote:
In other words, one wants to be able to say, within a font, ‘The following glyphs are associated with lowercase text’ without needing to rely on such a statement for glyphs [...] one way to do that is to create new font layout features or tables, in which such statements are made. But another way to do it is to take existing layout features, such as ‘onum’, and to say that this feature should be applied as part of text transformation functions triggered by the application UI.

Maybe it's rather a matter of how one conceives of things. Just consider oldstyle numerals as lowercase, and lining numerals as uppercase, and make oldstyle numerals the default in your fonts. This would make numeral-related UC-to-LC text transformation obsolete.
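
Sketched in feature code, assuming the default figure glyphs are oldstyle and the lining figures are .lf variants (the names are illustrative):

feature lnum {
    # lining figures on explicit request ...
    sub zero by zero.lf;
    sub one by one.lf;
    # ...
} lnum;

feature case {
    # ... and automatically in all-caps setting
    sub zero by zero.lf;
    sub one by one.lf;
    # ...
} case;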

It is banal: "normal" text is made up of lowercase, with some uppercase here and there. "Normal" typing (no 'shift' &c) gets you lowercase. Uppercase asks for 'shift' or the 'All Caps' option ('case', which includes switching to lining numerals), and thus can be regarded as an "exception" already. All caps, small caps, italics -- there's not much difference from a typographic point of view; they are there to make text stand out from "default" lowercase setting. And if someone wants lining numerals in lowercase context too, this again has to be an active decision. Moreover, those who really want lining numerals everywhere could make that the default on application or OS level with future software.

But then, one could just turn the last suggestion around: it should be possible to set *any* numeral style, say 'onum', as default or preferred on application or OS level. That way it becomes irrelevant which numeral style is set as default in fonts.

The real problem is that a normal keyboard doesn't apply 'shift' the same way to numerals as it does to letters.

Nick Shinn's picture

There's some confusion of form with function.

While it's OK to make a set of figures to go with lower case alphabetic characters, that doesn't mean they have to be oldstyle. For instance, I could make "Scotch" lining figures (three-quarter cap height) to go with lower case (and upper and lower case) settings -- but make a full-cap height set of lining figures to go with an all-cap setting.
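
As a sketch -- the same 'case' mechanism as before, just with different defaults, and the glyph names only illustrative:

# default figures: three-quarter-cap-height "Scotch" lining
feature case {
    # full-cap-height lining figures for all-cap settings
    sub zero by zero.cap;
    sub one by one.cap;
    # ...
} case;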

However, while it may make more logical sense to have the OT figures categorized according to the case they're intended to support, users are used to the conventions of oldstyle and lining.

John Hudson's picture

The real problem is that a normal keyboard doesn’t apply ‘shift’ the same way to numerals as it does to letters.

That is just a symptom of the underlying issue that I described: the alphabetic case distinction is a character-level distinction, while the numeric 'case' distinction -- because usually thought of as a style distinction -- is a glyph-level distinction. A keyboard is primarily a character input tool.

k.l.'s picture

John Hudson wrote:
That is just a symptom of the underlying issue that I described:

Now I understand your earlier text better by reading your comment on mine ...

I just wonder if some of these things, say, automatic selection of the contextually appropriate numeral style on application level, are possible. Say I have lowercase text, then some uppercase words for emphasis, followed by punctuation marks, then figures, and again lowercase. Would an application be able to "decide" whether to put oldstyle (lowercase context) or lining (uppercase context) figures? And respect different typographic conventions, even idiosyncrasies ...
But then, maybe it would help a bit if applications' 'All Caps' were used instead of 'Caps Lock'. Or if 'Caps Lock' would invoke applications' 'All Caps'. Then the automatism could be omitted.

Maybe one has to apply a more radical view: uppercase-lowercase as a glyph-level distinction (like smallcaps-lowercase) which accidentally/historically came to be regarded as a character-level distinction. 'Shift'+[any lowercase character]* then does nothing but apply 'All Caps' to a single lowercase character. And special Unicode values for uppercase are a necessary evil, a legacy one has to live with. (Typographic point of view.)

* I am writing on a Mac: 'Caps Lock' applied to numerals does *not* produce punctuation marks. Which is a little step toward a change of paradigm.

Nick Shinn wrote:
While it’s OK to make a set of figures to go with lower case alphabetic characters, that doesn’t mean they have to be oldstyle.

Of course. When applying this broader view (numerals for lowercase & uppercase contexts), a distinction of oldstyle & lining numerals is too narrow, too design-centric -- as opposed to function-centric. (Which becomes obvious when adding small-cap figures ...)

hrant's picture

> users are used to the conventions of oldstyle and lining.

I think this is true only of a small minority of users.

hhp

John Hudson's picture

I think Nick's point is that if users are aware of a distinction in numeral style at all, then they are aware of it in terms of the convention of oldstyle and lining. The idea of 'lowercase and uppercase numerals' is so rare as not to be a convention at all.

Rob O. Font's picture

"Of what you are typographically identifying as ‘case’ — upper, lower, small cap, inferiors and superiors — two are considered case for alphabetic characters from a text encoding perspective, and two different ones are considered case for numeric characters from a text encoding perspective."

:) Which, I guess, in the Hudsonian universe leaves small caps as "not a case", which in turn begs the question of why you'd call lowercase characters drawn to small-cap height "unicase"...

Now for a name for l.c. drawn to the ht. of the caps, I have "fornicase". Sound okay?

Nick Shinn's picture

In typography there are four formal categoric systems.
The first is abstract, and the other three are geometric, i.e. they concern the graphic qualities of shape, size, and position.

1. Character Form. This creates the category of character.

2. Variant Character Forms. This creates categories of majuscule/minuscule, and roman/italic. However, some characters (eg "ess": S,s,S,s) do not have variant forms, while others do (eg "ay": A,a,a).

(Note that Unicode recognizes both majuscule, which it terms "capital", and minuscule, which it terms "small", but not italic variants. That's somewhat illogical.)

3. Relative Size. This creates categories of Big (upper case), Medium (lower case, small caps), and Small (inferiors, superiors, numerators, denominators, ordinals).

4. Relative y-axis Position. This creates categories of Baseline, Superior, and Inferior.

In practice, there are conventional groupings of character parameters from different categories, which exclude many combinations. So on the basic keyboard you get a default roman set of some Medium characters (eg s) with some variant Medium characters (eg a), and some Big characters (eg S) with some variant Big characters (eg A).

In theory, Unicase is not a question of Relative Size (3), because size is "case", and unicase means that there is one "common" case, and therefore no distinction based on size. So unicase is a question of what default set is chosen from the Variant Character Forms (2).

John Hudson's picture

David: I guess in the Hudsonian universe leaves small caps as “not a case”

Smallcaps have (variable) significance in the display of text, but does this make them a case in the same sense that majuscules and minuscules are separate cases? I don't think so, because there are no spelling rules regarding them, no rules at all in fact, other than (variable) typographical style rules about how and where they should be used. The Latin script is an evolved bicameral alphabet, and that is the basis for its encoding. We augment this bicameral alphabet with additional categories of letterforms for both aesthetic and functional purposes, but these categories do not constitute cases in the same way that uppercase and lowercase do. If you write a German noun with an initial lowercase letter, you have spelled it incorrectly. There is no parallel, orthographically required use of smallcaps, or italics, or sans serif. Any of these categories of letterform might be used in an articulatory way, to make clearer the meaning of a text, but they are not cases in the orthographic sense. We might decide to refer to them as cases, but they are 'typographic cases', not orthographic cases.

Nick: (Note that Unicode recognizes both majuscule, which it terms “capital”, and minuscule, which it terms “small”, but not italic variants. That’s somewhat illogical.)

No it isn't. If it looks illogical in terms of your categorisation, then that should alert you that there is something wrong with your categorisation. Orthographically there is no distinction between this and this; ergo, there is no encoding distinction.

hrant's picture

I'm with John on this one.
Smallcaps, italics, etc. are not cases.
And they certainly don't "deserve" encodings!

hhp

Nick Shinn's picture

Orthographically there is no distinction between this and this; ergo, there is no encoding distinction.

I think I've disproved your statement, by copying and pasting the text!

But seriously, orthographic logic is not the same as typographic logic, which is why we are having this discussion.

hrant's picture

> orthographic logic is not the same as typographic logic

Certainly.
But the latter, unlike the former, is not "codified", it's not standardized (for example people disagree about how best to use italics) and there's no way to encode anything that mushy. So you could say that logic and formalism are as different as typography and orthography!

hhp

John Hudson's picture

But seriously, orthographic logic is not the same as typographic logic, which is why we are having this discussion.

Right, but digital typography sits on top of a framework of digital text processing. Unlike bits of lead, digital characters have semantic properties. Text processing and display is multilevel processing: there are the individual encoded characters with their standard properties (e.g. directionality, case folding, canonical composition/decomposition); there are strings of these characters with string level properties (e.g. bidirectional layout); at this level we're still dealing with what is defined as 'plain text'; then there is the first level of display, the direct mapping of characters to default glyphs; then we have what Unicode calls 'higher level protocols' that look after basic rendering requirements beyond straight character-to-glyph mapping (e.g. language shaping for complex scripts, a mix of font and shaping engine responsibilities depending on the technology employed); and finally one gets to the relatively thin layer of typographic refinement that sits on top of the rest of the structure. So however one determines things should be considered in typographic logic, practically one has to consider how these are to be implemented on top of a text processing architecture that begins with a structural model of how individual scripts work at the plain text level.

John Hudson's picture

I think I’ve disproved your statement, by copying and pasting the text!

Actually, you've proved my statement: plain text copying and pasting removed the typographic distinction between roman and italic but, of course, retained the case distinctions between capitals and lowercase letters.

Nick Shinn's picture

...you’ve proved my statement...

Actually, copying and pasting removed the semantic significance which you had encoded by italicization, proving that in some instances italics can have orthographic value.

***

the relatively thin layer of typographic refinement that sits on top of the rest of the structure.

Admittedly, this way of arranging things works far better than what we had before, when it was the other way up. However, I think writing/typography is more fundamental than a veneer.


Cased unicase!?
The issue of unicase highlights the problem the orthographic system has with casing. Casing is convention, enshrined as standard, but from a typographic perspective it's inconsistent to define upper and lower case as separate characters. The German noun convention, the proper name convention, the first letter of the sentence convention, these are all examples of Titlecase, and departing from the convention is not mis-spelling, but mis-casing. Of course, such casing should be kept in the basic document (ie orthographic), but it begs the question of whether Case is a quality of text or layout. Unicode says that case is a text thing; typographically, it's not.

Keyboards recognize that A and a are not different characters by only having one set of keys, modified by the shift key. And all but the most basic of word processors recognize the inadequacy of having the same "letter" as different characters for upper and lower case, by making the connection between 0x41 and 0x61, A and a, etc.

According to the 4-level typographic system I outlined above, these could be coded
1111: 1 (first character) 1 (majuscule) 1 (big size) 1 (baseline), and
1221: 1 (first character) 2 (minuscule) 2 (medium size) 1 (baseline).

In practice, we already have the 'unic' OpenType feature, which, as has been observed, is not size-specific. A 'forc' feature is unlikely.

John Hudson's picture

Actually, copying and pasting removed the semantic significance which you had encoded by italicization, proving that in some instances italics can have orthographic value.

No, that is not orthographic. The semantic significance of italics is what I call articulatory. Orthographic implies something on the level of spelling. My point was that there is no difference, orthographically and hence in terms of text encoding, between roman and italic setting of the same word. The spelling is the same, the character string is the same, and so far as plain text operations are concerned they are identical. If Typophile enabled rich text copy/paste, then the italic styling would have been preserved when you copied and pasted my text. But italic styling is mark-up above the text encoding level.

Keyboards recognize that A and a are not different characters, by only having one set of keys, modified by the shift key.

A shift key isn't modifying the input from a key; it is accessing a totally different input. In a bicameral alphabet, the shift key typically switches between cases, with the upper and lowercase characters associated with the same key in different shift states. But as soon as you start keyboarding non-bicameral scripts you learn to think of the shift key in a completely different way. On my Hebrew keyboard, for instance, the shift state of a key accesses characters that have no notional relationship to the characters accessed by the non-shift state. This keyboard contains four states, with up to four characters associated with a single key, none of them notionally related to one another.

And all but the most basic of word processors recognize the inadequacy of having the same “letter” as different characters for upper and lower case, by making the connection between 41 and 61. A and a, etc.

Inadequacy? It is hardly inadequate if the particular text processing function you are trying to perform relies on case matching. By treating upper and lowercase letters as separate characters, but defining their relationship as a property of the characters, you get the best of both worlds, and you get it in a way that is very quick and easy to process and which corresponds to conventional use of the script as a bicameral writing system. If casing were handled as a higher level protocol like, e.g. italics styling, there would effectively be no such thing as plain text processing for the Latin script. Almost every text processing operation would have to be handled as rich text, because almost every operation is applied to bicameral text: that has been the normal way to write the script for many centuries, and almost all text one encounters includes both upper and lowercase letters. Far from being inadequate, encoding upper and lowercase letters as separate characters is highly sensible.

Unicase does not 'highlight the problem the orthographic system has with casing'. Unicase presents anomalistic problems in text encoding and display. It is precisely because they are anomalies that they are problematic and are not obviously addressed by casing technologies designed to handle normal text.

Anomalies are sometimes important, but they shouldn't be taken as benchmarks of encoding technologies, precisely because they are anomalies: they're not normal. I'm very aware of the importance of textual anomalies, because I work with scholars who are studying and transcribing pre-standard texts, documenting textual ambiguities, recording scribal errors. At the Unicode conference next month, I'll be discussing the problems of addressing textual anomalies in the context of encoding standards designed around normal -- i.e. standardised, majority-convention -- language processing. Even addressing the textual anomalies of a single well-documented text, the Leningrad Codex of the Hebrew Bible, has taken almost three years to pass through the international standards process. Yesterday, I received a copy of Emanuel Tov's Scribal practices and approaches reflected in the texts found in the Judean desert, which contains tables of symbols found in the Dead Sea Scroll fragments, most of which are not encoded in Unicode yet. The very identity of some of these symbols is unknown or ambiguous, so how does one encode them? Similar complex situations pertain to other ancient texts. The encoding of Vedic marks for Sanskrit has been stalled for several years now because the same visual mark may be used to signify different characters in different texts, or the same character may be signified by different marks in different texts, and no one has yet been able to thoroughly document the usage in a way that would enable a useful encoding. Early Arabic texts, including the earliest Qur'an fragments, are written in an archigraphemic script without the dots of later Arabic that distinguish different letters. To encode these texts using standard Arabic characters is to make assumptions about the identity of the letters that are not explicit in the texts themselves. So, yes, anomalies are sometimes important, and we need to be able to handle them within a text encoding and display architecture that is designed around presumptions of normal text, but the importance of anomalies is primarily found in textual analysis operations -- e.g. comparison of multiple texts in an electronic corpus -- not in things like cased unicase display typography.

Nick Shinn's picture

The semantic significance of italics is what I call articulatory.

And so is Titlecase. Your argument that titlecase is like spelling is a justification for assigning casing to plain text. But common sense says that there are some 26 letters in the alphabet, and words are spelled the same way whether in upper, lower, or mixed case.

...as soon as you start keyboarding non-bicameral scripts you learn to think of the shift key in a completely different way.

Well of course! Bicameral means that the same character has two components, not that two (or more) separate characters have been arbitrarily associated.

Almost every text processing operation would have to be handled as rich text,

Now that sounds like progress!
Assigning casing to plain text is a bit of a cheat, to facilitate basic encoding.

the importance of anomalies is primarily found in textual analysis operations ... not to things like cased unicase display typography.

Eunoia is perfectly logical, and only an anomaly in terms of standard practice. (Which may explain why only a handful of people have licensed the font!) I don't really consider it display typography -- it's more of a theoretical experiment.

hrant's picture

> Eunoia is perfectly logical

Just like Van Gogh's ear.

hhp

Nick Shinn's picture

If unicase and small caps are accepted practices, and people understand the logic behind them, then a "cased unicase" font is equally logical -- and Eunoia Unicase adheres strictly to both the unicase and the small cap conventions; there is nothing mutually exclusive that screws up the combination. The fact that the term sounds contradictory is the fault of the terms, not the typographic logic.

hrant's picture

> If unicase and small caps are accepted practices,
> and people understand the logic behind them

But they're not, and they don't. Not even plain old
italics enjoys that. Unicase in particular is looked
upon by many people (including readers) as a bad joke.

And again: even if they are, their usage is still
not codified (or probably even codifiable) so no
encoding is possible.

hhp

Nick Shinn's picture

There is a unicase OpenType feature tag.
All that's lacking is for it to be supported by a layout application.

John Hudson's picture

Your argument that titlecase is like spelling is a justification for assigning casing to plain text. But common sense says that there are some 26 letters in the alphabet, and words are spelled the same way whether in upper, lower, or mixed case.

My specific example was German nouns, in which capitalisation is not optional and an initial lowercase letter constitutes something like a spelling error. The fact is that uppercase letters have grammatical function in a way that italic styling, the presence or absence of ligatures, the use of smallcaps, etc. do not. We expect a grammar checker to pick up on whether the first letter of the first word of a new sentence is a capital, but we don't expect it to pick up on whether that word is set in italics. This is the common sense way in which we use our bicameral alphabet.

Bicameral means that the same character has two components, not that two (or more) separate characters have been arbitrarily associated.

It's begging the question to describe them as 'the same character'. They are the same letter, but that is not the whole basis on which a decision is made regarding how they should be encoded. We must consider how they are used, what kind of functions we expect to be able to perform with encoded text, how best to facilitate those functions. Describing something as a 'cheat' because it 'facilitates basic encoding' seems to me perverse when the goal is basic encoding. It's like saying that kicking the ball in the direction of the goal is cheating at soccer.

Nick Shinn's picture

Describing something as a ‘cheat’ because it ‘facilitates basic encoding’ seems to me perverse when the goal is basic encoding.

The problem is that Titlecase, while it is basic to encoding, and arguably to grammar, is a layout feature. Unicode gets around this by pretending that majuscule and minuscule versions of the same letter are different characters, making basic encoding possible without resorting to the higher level of layout features. Sure, it may get the ball in the goal, but so can handball, and that's a foul.

In terms of grammar, Titlecase is a form of Inflection. Whereas most inflection signifies a grammatical feature by altering the spelling of a word, Titlecase does so by capitalizing the initial letter.

The result of the "cheating" -- which applies case selectively, only to letters, and through "characterization", as David Berlow has so astutely pointed out -- is that the typographic casing of non-A-Z characters is left up in the air: "Is the OT ideal that the user needs to manage all of the case outside of A-Z and a-z with other feature tags? ...an idiot's paradise of menus".

As Karsten has noted, ideally the shift key should activate the Case feature. The way it is now, letters keyed as all-caps by using the shift key (or shift-lock) will not automatically display the appropriate "case" of accompanying glyphs, such as figures and punctuation -- unless the "all caps" button is pushed for text which is already in all caps; from the user's perspective, that's absurd.

So the Unicode system is not perfect. But funnily enough, I don't think that's necessarily bad for good typography. Perfection is hard to come by. OpenType is a powerful tool, and if some expertise is required from typographers to realize all its benefits, then it becomes a focus of interest. If typographers have to think about which figures go with which case, and make a decision, that's good for typography, rather than just having the standard kick in automatically. Nonetheless, there's a lot of room for improvement in the OT menus.

hrant's picture

> Unicode gets around this by pretending majuscule and minuscule
> versions of the same letter are different characters

Pretending? That's exactly what they are: two different characters representing the same letter, each one used depending on certain grammatical rules. The fact that non-alphabetic characters don't have other-case counterparts is just a complexity we have to live with; it doesn't make Unicode's handling of UC/lc incorrect. And typographic distinctions are completely moot to this point, because they are not formal, codified rules - they're more like personal preferences.

Think for example of so-called "upright italics", which aren't
really italics at all (because they're functionally not) but too
many type people claim they are.

> the typographic casing of non A-Z characters is left up in the air

That's a different problem. As is the observation that
there's room for improvement (although there always is,
even in something like the basic alphabetic structures).

My own main point (a minor one, I admit, in this broad thread) is that you cannot -and should not- encode things like smallcaps and italics the way you encode formal grammatical distinctions, at least not at such a base level as Unicode. Also, unicase is not a formal grammatical distinction (thank god).

> the Unicode system is not perfect

True.
Verily, only Eunoia's unicase logic is so.

hhp

Nick Shinn's picture

two different characters representing the same letter, each one used depending on certain grammatical rules.

That's self-fulfilling. Just because grammar determines their usage doesn't make them different characters. This is apparent with those characters which have an identical form in both cases, CcOoSsUuVvWwXxZz in Latin. That's the fundamental way casing works: it's the same character in two different sizes. The complexity we have to live with is that in many instances the character has a different form as well as size.
As I said, Titlecase is a grammatical indicator that works by changing case (a typographic quality), not by changing character, i.e. spelling (a linguistic quality).

The fact that non-alphabetic characters don’t have other-case counterparts is just a complexity we have to live with; it doesn’t make Unicode’s handling of UC/lc incorrect.

The correct casing of characters, both alphabetic and non-alphabetic, is an important part of typography. Beyond typography-lite, Unicode's handling of case poses difficulties; it's incorrect because it's simplistic. Rather like what happened with faux-italic, -small caps, and -bold.

Unicode would accommodate expert typography better if it encoded casing for all characters -- not just those which are grammatically meaningful. That would be something like feature tags for the five cases: Caps, Lower case, Small caps, Superior, and Inferior.

The idea that "oh that's just fancy typography, the icing on the cake, so it doesn't matter" is at the root of the way Unicode handles case.

hrant's picture

> it’s the same character in two different sizes

No, I don't think size is it at all. The essential useful difference
between UC and lc is that they're used grammatically differently.
The size, forms, etc. are all on a different level of usefulness. Maybe
a more useful level to an art director, but clearly not to most people.

Also, by seeing uppercase letters that are structurally different than their lowercase counterparts as somehow flawed, you're making the same mistake as Thompson: vying for destructive simplification.

> The correct casing of characters, both alphabetic and
> non-alphabetic, is an important part of typography.

Not really. It's more of an important part of orthography, and [the casing aspect of] typography just follows that. I'm getting the feeling that perhaps you'd like people to see unicase as a typographic scheme as valid as traditional mixed-case, and you realize that can only happen by simply ignoring [that aspect of] grammar, so you're advocating classing typographic refinement above basic encoding? If so, I'm sure that won't work.

> Unicode would accomodate expert typography
> better if it encoded casing for all characters

No, it would become a mess. Repeat: those are not properly encodable.
The decisions made would be "incorrect" to various segments of users.
Like try to define what an "italic" is, and you won't hear the end of
it from me. :-)

What you're talking about isn't useless, quite the contrary; but that's not the point of unicode, and it shouldn't be. Furthermore: what unicode tries to do is in fact more important, simply because it's more fundamental; billions of people have trouble typing their scripts into computers - they're quite far from worrying about smallcaps and such - not that the most technologically advanced cultures really worry about smallcaps either anyway... :-/

Seeing this distinction is critically important - it has nothing to do with "fancy" or dumb - but with the useful layers of how humans process textual information.

hhp

Nick Shinn's picture

by seeing uppercase letters that are structurally different than their lowercase counterparts as somehow flawed

I didn't say that, I don't see that. The minimum requirement for upper vs. lower casing to be grammatically significant is size. The fact that there are many characters where the glyph shapes are different for the different cases is therefore grammatically redundant. Which isn't to say it's a flaw, it's a complexity (to use your term), and probably serves some other purpose.

I’m getting the feeling that perhaps you’d like people to see unicase as a typographic scheme as valid as traditional mixed-case,

Hrant, it's 2006. Go change some diapers. Stop projecting your feelings onto me and address what I say. Note that in my last post I mentioned five typographic cases, and unicase wasn't one of them.

Repeat: those [expert typographic cases] are not properly encodable. The decisions made would be “incorrect” to various segments of users.

Standards are negotiated. OpenType feature tags are agreed-upon encodings. Typographers need sophisticated layout applications to work efficiently -- at the moment their menus are all over the place -- and it would be a lot easier for application developers to do their part if the five kinds of casing were encoded. The premise of Unicode is to encode only that which is grammatically significant.

simply because it’s more fundamental;

Not so. Coded text can't exist without typography.

There's no reason to blow off typography as the icing on the cake. The system should work for everybody -- art directors using small caps are no less important than ancient linguists studying dead languages.

Small caps are an integral part of the world system -- Microsoft's commitment in the ClearType fonts will see to that: small caps for Latin, Greek and Cyrillic; Regular, Bold, and Italic!

hrant's picture

(Nick, I hope you weren't counting
on soiled diapers to save your ideas
from being trounced. :-)

> The minimum requirement for upper vs. lower
> casing to be grammatically significant is size.

If that were true (beyond the banal absolutist/formalist
sphere) guess what wouldn't work in practice: smallcaps! :-)

> grammatically redundant

It's ironic that you suddenly seem to be missing
what makes it not redundant in the end: typography!

> Stop projecting your feelings onto me

Feelings? "I'm getting no reading, captain."
What I'm doing is trying to figure out where somebody is going based
partly on where he's coming from. In my book that's a smart thing.

And nevermind that I used "perhaps", and a question mark.
After all, it's much harder to feel/act offended while noting tentativeness.

> The premise of Unicode is to encode only
> that which is grammatically significant.

I would friggin hope so, is my whole point.

> Coded text can’t exist without typography.

Sure. But it CAN exist without smallcaps, italics, etc.
What it can't exist without (without changing the language)
is alphabetic casing. Why is this distinction so hard to accept?

> There’s no reason to blow off typography as the icing on the cake.

I agree - and I never do that. But seeing the difference
between orthography and typography, and accepting that the
former can't simply be ignored, is still crucial.

> The system should work for everybody — art directors using small
> caps are no less important than ancient linguists studying dead
> languages.

1) No system can do that. Fortunately.
2) Some occupations are still more important to the world at large than others. Your example of those who study dead languages is a convenient extreme.

> Small caps are an integral part of the world system

Not even close.
Not even italics is.

hhp

John Hudson's picture

The premise of Unicode is to encode only that which is grammatically significant.

I don't think it is a premise, per se, but it is often the way it works out for a particular script. Unicode is guided by a number of conflicting principles, and in given situations one or more of these principles will trump others. For example, the principle of one-to-one compatibility with pre-existing standards has always trumped the character/glyph distinction principle. But Unicode proceeds from a notion of 'plain text', which is defined as 'Computer-encoded text that consists only of a sequence of code points from a given standard, with no other formatting or structural information. Plain text interchange is commonly used between computer systems that do not share higher-level protocols.' Unicode encodes as characters that which is necessary to encode plain text, and one way of looking at this is to determine what is an acceptable loss of information when richly formatted text is interchanged as plain text between systems that do not share higher-level protocols. So the question is: is it acceptable to lose the distinction between upper and lowercase letters in Latin plain text? I would say no.

Positing a 'grammatical' premise for Unicode is inaccurate. Indeed, a criticism of the Unicode encoding of some scripts is that it does not correspond to how users think of their script in grammatical terms, or may reflect only one of a number of competing grammatological analyses of the writing system. The Tamil script is an interesting and controversial example. The Unicode encoding is based on the ISCII encoding (an example of the pre-existing standard principle), which treats Tamil in a way directly analogous to the North Indian Brahmic model. Some Tamils complain bitterly about this, pointing to the existence of a very ancient Tamil grammar that explicitly describes the structure of the script in ways that do not neatly correspond to the Unicode/ISCII encoding. This debate takes place against a political and cultural struggle. But the question that needs to be asked is whether an encoding needs to correspond to a grammatological analysis of a writing system. Isn't it enough that the encoding works and, indeed, shouldn't the practical considerations of computer text processing be the guiding principle of a computer text encoding standard? So the UTC and WG2 committees look at writing systems and make decisions about how to apportion handling between plain text operations and higher level protocols.

The idea that “oh that’s just fancy typography, the icing on the cake, so it doesn’t matter” is at the root of the way Unicode handles case.

I disagree. I don't know anyone at Unicode who dismisses typographic display in this way: they just insist that it sits above the plain text level at which they encode characters. And they're right. It is not their job to enable rich typography at the text encoding level. It is the job of systems and applications to enable rich typography at the higher level. As discussed much earlier in this thread, and as alluded to in your recent mention of menus being 'all over the place', the problems you are describing are essentially user-interface issues. Sure, you can say that it would be easier for applications to solve these user interface issues if all five of the typographic cases you identify were encoded at the character level, but what other text processing functions would become more complicated if one did this? One shouldn't build a text encoding standard around user-interface ease of design. That's putting the cart well before the horse.

In any case, this is a very moot point, because there is no way that Unicode's handling of casing for the Latin script is going to change. This is the architecture we have to work with, and the level at which we should be addressing it is in UI design and UI function interaction with font layout features.

John Hudson's picture

The system should work for everybody — art directors using small caps are no less important than ancient linguists studying dead languages.

Trust me, the system currently works a heck of a lot better for art directors using smallcaps than for scholars studying ancient texts. Again, what we're talking about for the former is essentially a user interface issue: how to present typographic controls in ways that make intuitive sense to a typographer. Further, the needs of the art director are typically quite simple: he needs an individual piece of text to look a particular way, often without needing to worry about the underlying text encoding at all because the result is to be output in a static medium.

Nick Shinn's picture

seeing the difference between orthography and typography

That would be nice, but they overlap.
Unicode ignores the overlap by encoding capitals as separate characters, not by case.
So higher level protocols are left to hack the connections between A and a, etc., in order to perform functions like "all caps".
"All caps" is rich-text typography, but it's as basic to the standard WP software, MS Word, as holding down the shift key on a typewriter -- not exactly art-director stuff.

I don’t know anyone at Unicode who dismisses typographic display in this way: they just insist that it sits above the plain text level at which they encode characters.

Sorry for the colourful figure of speech, but I suspect that the subtleties of expert typography may be perceived as a bit precious, hence the reluctance to incorporate a display function like casing.

Yes, this is all very moot.

hrant's picture

> higher level protocols are left to hack the connections between A and a

Yes, but certainly not "hack".
In fact the main benefit of Unicode is greatly reducing hacks.

> the subtleties of expert typography may be perceived as a bit precious

Which, of course, they are.
We're simply not very useful.

hhp
