Linquistic/phonetic question

charles ellertson's picture

A fair number of the books we set signal a glottal stop with the "apostrophe," or "raised comma." Times past, you simply set an apostrophe (what is now U+2019 in Unicode).

But Unicode allows for options. The U+2019 apostrophe/single close quote is paralleled in the Spacing Modifier range with U+02BC. While occurring less often, this character/glyph difference is also paralleled by U+2018/U+02BB (turned comma) and U+201B/U+02BD (reversed apostrophe).

Is there any consensus amongst linguists which are the correct or preferred characters?

In the same vein, in some orthographies we occasionally encounter an acute used as a spacing character, not an accent. (Cherokee, for example). Do linguists prefer U+02CA and view U+00B4 as a legacy character, or does it matter?

[It matters to us because we use character composition routines in our text file conversion program that parallel the ccmp feature in OpenType fonts. It is always an issue whether or not to treat 0060, 00A8, 00AF, and 00B4 as combining accents or spacing characters. I think of them as "legacy" accents, others may not.]

Curiously, most authors and editors, even at scholarly presses, leave these decsions to the typesetter. They draw a picture of what they want, and in the text files, use a code sequence rather than make any attempt to use the correct Unicode character. From an appearance point of view this doesn't matter -- the old "if it looks right, it is right." However, further uses of final typesetting files are becoming common, so it would be nice to move toward some consensus.

John Hudson's picture

Regarding the apostrophe, I've lost count of how many different ways there are to encode/display glottal stops. On the basis that the spacing modifier range was specifically encoded for the purposes of phonetics, my recommendation is to use those characters where possible.

Regarding the spacing acute, I would recommend using U+02CA, the combining modifier character, rather than U+00B4. I agree with your interpretation of the spacing accents in the 00 range as legacy characters. It should also be noted that these spacing accents have compatibility decompositions to space+mark, so there is the risk, albeit small, that in some normalisation situations these might be decomposed, which would affect spacing and kerning.

Syndicate content Syndicate content