Rock ’n’ Roll: correct apostrophe usage

Richard L

All, what is correct apostrophe usage when abbreviating "and" to ’n’? (for US grammar)
I've seen all kinds of versions out there.

eliason

You have it right: the apostrophe takes the place of both missing letters a and d, just as it does in words like "I'll" and "can't."

Not-that-smart "smart" quotation features assume that a typed apostrophe beginning a word is intended to be an open single quote, but they are wrong in cases like '68 Olympics, 'Tis the season, and rock 'n' roll.
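To see why these cases trip up software, here is a minimal Python sketch of the kind of rule described above. The function name and the exact rule are assumptions for illustration, not taken from any particular word processor: a straight quote at the start of a word is guessed to be an opening quote, and everything else becomes U+2019.

```python
import re

def naive_smart_quotes(text: str) -> str:
    """A deliberately naive 'smart quote' pass (hypothetical, for illustration).

    A straight ' at the start of the text, or after whitespace or an opening
    bracket, is assumed to be an opening quote (U+2018); every other ' is
    turned into U+2019, which doubles as the apostrophe.
    """
    # Opening position: start of string, whitespace, or ( [ before the mark.
    text = re.sub(r"(^|[\s(\[])'", lambda m: m.group(1) + "\u2018", text)
    # Everything left over becomes a right single quote / apostrophe.
    return text.replace("'", "\u2019")

for sample in ["rock 'n' roll", "'Tis the season", "'68 Olympics", "can't"]:
    print(naive_smart_quotes(sample))
# rock ‘n’ roll      (first mark wrong)
# ‘Tis the season    (wrong)
# ‘68 Olympics       (wrong)
# can’t              (right)
```

Word-internal apostrophes come out right; the elisions at the start of a word are exactly where the guess fails.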

Nick Shinn

Isn’t it about time the poor apostrophe had its own Unicode?

hrant

But is such use of two apostrophes grammatically correct to begin with? If it's not, "anything goes" works out fine anyway.

http://www.in-n-out.com/

BTW auto-quotes also mess up the Hawai‘ian ‘okina diacritic.

Nick, the apostrophe does have its own Unicode number; but I'm not sure how much it's actually referenced.

hhp

agisaak

Hrant,

If you're thinking of U+02BC, that's only appropriate where the apostrophe serves either as a phonetic modifier or as a letter in its own right, so it would be appropriate for words like b'ak'tun or Qur'an, but not for words like 'n'.

André

altsan

Strictly speaking, I believe U+0027 is the apostrophe. It's only typists' convention (and the limitations of ASCII) which causes most word processors to use it as a synonym for the right single quote.

I see the Unicode standard describes U+2019/right single quote as also being the 'preferred' apostrophe character (goodness knows why), but it doesn't change that U+0027 is the character officially assigned as the apostrophe...

hrant

André, I didn't know that! Thank you.

Alex, I think 0027 is best seen as a non-directional single quote.

hhp

John Hudson

U+0027 is a deliberately ambiguous character intended to represent a left quote, a right quote and an apostrophe. It reflects the fact that computer keyboards typically follow typewriter keyboards (as opposed to, say, typesetter keyboards), and these had only one ' key. I don't consider U+0027 to be a character at all in the sense of something to be displayed as a text entity; I consider it a (not very good) input mechanism for other characters.

Nick: Isn’t it about time the poor apostrophe had its own Unicode?

It already has two. U+2019 is the right quote character that, following writing practice, is normally indistinguishable from the apostrophe, hence Unicode's annotation that this is the preferred apostrophe character on the grounds that, unlike U+0027, it actually looks like an apostrophe. The other apostrophe character is U+02BC, but as André notes this is properly used only in some forms of transcription where it typically indicates a glottal stop.

Is there a need for an apostrophe character distinct from the right quote character? I don't think so: written and typographic practice is for these two characters to be identical, and I can't think of any behaviour distinctions. The problem is not lack of a separate codepoint for apostrophe, but the general shittiness of keyboard layouts derived from typewriters and the inability of 'smart' quote substitution to handle all the instances in which the input mechanism U+0027 needs to be displayed as U+2019.

hrant

Actually it would be nice to have a proper apostrophe code point, simply because an apostrophe is not a quote! They have a different meaning. Just because writers and typographers have been too lazy to show a difference doesn't mean they never will. Remember, some people like quotes to point in a different direction than an apostrophe (not to mention the size potentially being different). I personally believe the ideal closing quotes point upward (and inward) and have in fact made a font like that (Cristaal). Nobody has to do that, but not having an apostrophe code point obstructs that choice.

hhp

timd

We don’t need no apostrophes.

Tim

John Hudson

Hrant: Actually it would be nice to have a proper apostrophe code point, simply because an apostrophe is not a quote! They have a different meaning.

Unicode does not encode semantics, it encodes text entities. This $ has a different meaning in various countries, despite often having the same name, but is a single character.

Your Cristaal example would be evidence for possible disunification, but idiosyncrasy tends not to be very convincing in what are, after all, systems built on conventions and standardisation. You'd need to present a variety of examples in real-world use and make the case that these merit requiring updates of all software to handle the new apostrophe character.

John Hudson

Bob D'ylan might not need apostrophes but greengrocer's do.

hrant

John, is producing examples of precedents the only way to achieve disunification? What about if I could robustly explain the future benefits?

hhp

quadibloc

Unicode does not encode glyphs, and so each Unicode point actually does represent a "meaning". So f, f, and i can become the ffi ligature, Arabic letters appear only once, and so on.

In ASCII, U+0027 is supposed to be the apostrophe and closing quote, and U+0060 is supposed to be the opening quote. In practice, though, U+0027 was available before U+0060, since that character was only added when lower-case was added to ASCII, so the practice of using U+0027 like the symmetrical symbol on a typewriter was well established.

So there ought to be a different Unicode codepoint for the cases where one actually wants the typewriter-style symbol!

On the other hand, U+0022 is officially a typewriter-like ambiguous double quote. So single and double quotes are handled very differently in the basic 256-character set. Guillemets, by contrast, are handled properly in Unicode.

This is a mess, and the response has been for each operating system and application to handle U+0027 in whatever way fits its own needs. Since the "right" Unicode character is usually one that our keyboards won't let us type, I don't know if this can be "fixed" in a way that won't just make things worse.

John Hudson

...each Unicode point actually does represent a "meaning"

No. Meaning is semantic content of language. Writing systems capture various aspects of language as text, but outside of the Chinese ideographic system semantics are generally the least commonly captured aspect of language (English orthography only captures semantics via some punctuation, notably in the distinction between its and it's, i.e. a distinction of meaning that is phonetically absent).

Unicode encodes text entities for plain text processing, which means, as you note, that it does not usually encode glyph variants (ironically, the two examples you give of ffi ligature and Arabic positional forms are exceptions, since these have encoded presentation forms for backwards compatibility). But not encoding glyph variants does not mean that what Unicode encodes is meaning. Even in the case of East Asian ideographs, what Unicode encodes are the text entities needed for plain text processing, not the meaning of the characters (obviously, since some characters have multiple meanings or have diverged in their meaning in different cultures).

In ASCII, U+0027 is supposed to be the apostrophe and closing quote, and U+0060 is supposed to be the opening quote.

U+0060 is a spacing grave accent, and is clearly identified as such in the Unicode Standard. It is not an opening quote and is not intended to be used as such. [BTW, there is no such thing as an 'opening quote' character outside of the conventions of particular punctuation systems. The 'left quote' character is a closing quote in the German system.]

John Hudson

Hrant: John, is producing examples of precedents the only way to achieve disunification? What about if I could robustly explain the future benefits?

Providing examples of existing use is by far the easiest means to get any character encoded in Unicode. In the case of disunifications, the bar is higher than for new characters, because stability is one of the principal goals of any technical standard. In the case of Unicode, stability can also be a strict requirement due to signed agreements with other standards organisations that rely on Unicode not changing certain things. So, for example, Unicode has agreed with the IETF not to introduce any more characters with canonical decompositions. Some disunifications are likely to be subject to such agreements.

Really, I think your idea is likely a non-starter, because you're talking about disunifying characters that have been part of ASCII or ANSI for a heck of a long time. This is the sort of stuff that software developers consider core library stuff that no one has had to think about for decades. With regard to 'future benefits', pushback is much more likely to focus on future disruptions.

hrant

Sadly I have to agree that it's pretty hopeless.

What about co-opting an existing seldom-used apostrophe-like code point? Like U+02BC.

Or even adding a new one: "Apostrophe We Never Knew We Really Needed". :-)
(And just for the memories: http://typophile.com/node/69010)

hhp

John Hudson

Well, yes, you could take the uppercase eszett approach, and encode a new character without changing any official behaviour for existing characters. The argument in that case is along the lines of 'Some people want to be able to distinguish an apostrophe from a right quote, so without changing the existing dual identity of U+2019, we'd like to encode a distinct apostrophe character'.

And I'm 99.9% certain that the response in that case would be 'Such people should use U+02BC'.

oldnick

Jeez: talk about a tempest in a teapot…

The hell with the rules: we’re talking about Rock ‘n’ Roll here, so what looks cool rules, especially if it rhymes. Sheesh…

John Hudson

Röck ‚n‘ Röll

quadibloc

@John Hudson:
U+0060 is a spacing grave accent

Back in the old days of ASCII, U+0060 was a grave accent only to the same extent as U+0022 was an umlaut and U+0027 was an acute accent. That is, one possible unconventional coding was to overstrike those characters, and have their shape altered on sophisticated systems, or their meaning recognized by humans for output from unsophisticated ones, to attain accents.

That this exotic coding is now claimed as the primary meaning of the character in the Unicode standard... is, I suppose, possible, but if so it does not give me great confidence in the committee responsible.

On the other hand, I will admit that U+005E was changed from an exponentiation symbol or up-arrow to a spacing caret back when ASCII-64 gave way to ASCII-68 (and lowercase was added). So this trend could indeed have continued when Unicode came along.

However, it does appear you are basically right, the story being made plain here:

http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

but as you can see, this situation is a change compared to long-established usage... and not just in the X Window System either, but in countless ASCII terminals.

hrant

Just so I'm clear: Is an apostrophe in text always encoded as a RIGHT SINGLE QUOTE MARK?

What are the chances somebody would write a Word and/or InDesign plug-in that goes through a text and changes the "intended apostrophes" to U+02BC? And/or some custom code point.

hhp

joeclark

I simply cannot believe a British typographer would claim the following:

Is there a need for an apostrophe character distinct from the right-quote character? I don't think so: written and typographic practice is for these two characters to be identical, and I can't think of any behaviour distinctions.

Yet here we are.

Do I have to look up any of several of my photographs of sentences (and, worse, lines) using half-assed British quotation-mark rules that end in ’ and you cannot figure out what it means until the next line starts?

Have you never had to convert British quotes to U.S./Canadian quotes and found it next to impossible without manually inspecting the closing half of every quotation?

I thought Bringhurst was the only British typographer with a tendency to decree that real-world scenarios are impossible by definition, so follow my advice and please stop bothering me.

hrant

Joe, I for one would love to see your collection of examples of -what might be called- "apostrophe envy".

hhp

John Hudson

Joe: Do I have to look up any of several of my photographs of sentences (and, worse, lines) using half-assed British quotation-mark rules that end in ’ and you cannot figure out what it means until the next line starts?

But it is precisely because written and typographic practice is for the right quote and apostrophe to be identical that such ambiguities occur. I'm not saying that there might not be reasons to want to distinguish them, I'm saying that's not how our writing system works. Sure, make the case that they should be distinguished, using the many examples you can surely find: that way lies spelling reform, alphabet reform, and tinfoil hats.

Anyway, who you callin' a typographer?

John Hudson

Hrant: Just so I'm clear: Is an apostrophe in text always encoded as a RIGHT SINGLE QUOTE MARK?

No. Very often it is encoded as the generic U+0027. I'm guessing that's probably the most common encoding simply because that's what the keyboard makes convenient. In order to encode U+2019, whether as apostrophe or quote mark, one either needs to input it directly (ALT+0146 on Windows, custom keyboard, etc.) or rely on 'smart quote' algorithms, which work a lot of the time and provide work for proofreaders the rest of the time.
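A quick way to confirm which character ALT+0146 produces is to decode that code in Python. This sketch assumes a Western-European Windows setup, where the Alt+0nnn codes are taken from the Windows-1252 code page.

```python
# ALT+0146 types code 146 (0x92) from the Windows-1252 ("ANSI") code page
# on a Western-European Windows system (an assumption about the setup).
# Decoding that byte shows which Unicode character is actually entered.
right = bytes([146]).decode("cp1252")
left = bytes([145]).decode("cp1252")   # ALT+0145, the matching left quote

print(f"ALT+0146 -> U+{ord(right):04X}")  # ALT+0146 -> U+2019
print(f"ALT+0145 -> U+{ord(left):04X}")   # ALT+0145 -> U+2018
```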

What are the chances somebody would write a Word and/or InDesign plug-in that goes through a text and changes the "intended apostrophes" to U+02BC?

I think you could expect such a plug-in to be about as accurate as smart quote algorithms. In other words, it would get it right almost all the time, but would get it wrong in some ambiguous circumstances. As Joe rightly points out, it is likely to get it wrong more often in British usage.
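A rough Python sketch of the plug-in logic Hrant asks about. The function name and the rule (a mark with a letter on both sides is an apostrophe) are assumptions for illustration, not an existing Word or InDesign feature; as John says, such a rule settles the easy cases and has to leave the ambiguous ones alone.

```python
import re

MODIFIER_APOSTROPHE = "\u02BC"  # MODIFIER LETTER APOSTROPHE

# A mark counts as word-internal when a letter sits on both sides of it.
# [^\W\d_] means "a word character that is neither a digit nor an underscore",
# i.e. a letter, in Python's Unicode-aware regex engine.
WORD_INTERNAL = re.compile(r"(?<=[^\W\d_])['\u2019](?=[^\W\d_])")

def mark_apostrophes(text: str) -> str:
    """Hypothetical plug-in logic: map word-internal ' or ’ to U+02BC.

    Marks at the edge of a word (dogs', 'tis, the closing quote of
    ‘this’) are left untouched, because they are exactly the ambiguous
    cases a heuristic cannot settle.
    """
    return WORD_INTERNAL.sub(MODIFIER_APOSTROPHE, text)

print(mark_apostrophes("don't touch the dog's bowl"))
print(mark_apostrophes("\u2018Don\u2019t go,\u2019 she said. The dogs' bowls."))
```

Note that rock 'n' roll and dogs' pass through unchanged: those remain the cases that need a proofreader, and in British usage, where ’ also closes quotations, they are more frequent.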

John Hudson

Just noticed this earlier comment:

Hrant: BTW auto-quotes also mess up the Hawai‘ian ‘okina diacritic.

The proper character for this really should be U+02BB MODIFIER LETTER TURNED COMMA, but I'm sure it occurs encoded in many instances as either U+0027 or U+2018, and doubtless also as U+2019, whether as the result of smart quote algorithms or ignorance.

I wouldn't classify it as a diacritic: it represents a glottal stop, which means it is a full consonant and considered a letter in the alphabet. And to be fair to the ignorant, in other orthographies the ’ is much more commonly found as a glottal stop than the ‘ shape.
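If the ʻokina has ended up as U+0027, U+2018 or U+2019, one way to repair it is a word-list pass that accepts any of those look-alikes in the ʻokina position and writes back U+02BB. This is a minimal sketch assuming you supply the (hypothetical) list of words; it does not check word boundaries.

```python
import re
from typing import Iterable

OKINA = "\u02BB"  # MODIFIER LETTER TURNED COMMA
# Marks that turn up in place of the ʻokina: ', ‘, ’ and the ʻokina itself.
LOOKALIKES = "['\u2018\u2019\u02BB]"

def restore_okina(text: str, words: Iterable[str]) -> str:
    """Hypothetical clean-up pass: for each listed word (spelled with U+02BB),
    accept any look-alike mark in the ʻokina position and write back U+02BB."""
    for word in words:
        # Escape the literal parts, and allow any look-alike where the ʻokina goes.
        pattern = LOOKALIKES.join(re.escape(part) for part in word.split(OKINA))
        text = re.sub(pattern, lambda m, w=word: w, text)
    return text

print(restore_okina("Hawai'i and Hawai\u2019i", ["Hawai\u02BBi"]))
```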

hrant

that's not how our writing system works.

In terms of an informal -if pervasive- tradition, I would have to agree. But where does it say "they must look the same"? It's just a lazy fallback (one that can cause confusion) and I don't think making the apostrophe and single right quote look different is any kind of "reform" - it's just a result of believing that's Good Design.

When I made Cristaal's right quote(s) point up, that was not Wrong, but neither was I following some formal system. And others can see it (or hear about it) and possibly follow suit, creating a new tradition. For a while -thanks largely to ATF- mirrored quotes (where the left quotes point down) were quite common (and interestingly the old MS Core Fonts did that too) but that was not some act of sedition.

Thanks for the correction/elaboration on the ‘okina. One thing I value BTW is that it's supposed to point up (because that makes it less confusable with the apostrophe).

hhp

John Hudson

For a while -thanks largely to ATF- mirrored quotes (where the left quotes point down) were quite common (and interestingly the old MS Core Fonts did that too) but that was not some act of sedition.

Unless you're a German whose punctuation system is messed up by such designs.

I don't think it's a 'lazy fallback' that the apostrophe and right quote look the same. It's the outcome of an historical decision that this little mark ’ has more than one usage in text. Yes, it might sometimes result in confusion, but it's not at all uncommon for writing systems to contain such confusables. Heck, we're talking about capturing natural language here: ambiguity, confusion, multiple meanings -- these are the very hallmarks of human communication.

hrant

I remember that German situation being mentioned recently. I'm curious, would my upward (and inward) pointing quotes also not work out? Also, would German-localized versions of the quote glyphs solve the problem, or are language tags not well-supported?

Ambiguity is natural, but so are warts. Let's treat them.

hhp

jcrippen

Letters:
ʹ — U+02B9 Modifier Letter Prime
ʻ — U+02BB Modifier Letter Turned Comma
ʼ — U+02BC Modifier Letter Apostrophe
ˊ — U+02CA Modifier Letter Acute Accent
ˋ — U+02CB Modifier Letter Grave Accent

Punctuation:
' — U+0027 Apostrophe
‘ — U+2018 Left Single Quotation Mark
’ — U+2019 Right Single Quotation Mark
′ — U+2032 Prime
‵ — U+2035 Reversed Prime

Symbols:
` — U+0060 Grave Accent
´ — U+00B4 Acute Accent

The Unicode standard defines these three categories according to their expected behaviours in various writing systems and other forms of written communication (math, etc.). Letters are alphabetic elements, like the use of ʻ U+02BB Modifier Letter Turned Comma in Hawaiian to represent the glottal stop /ʔ/. Punctuation is an element that is paralinguistic, used for indicating textual phenomena that are not necessarily part of the spoken language. (So e.g. commas have an associated intonation contour in English, but commas don’t always occur where this intonation is found in speech and vice versa. The previous sentence is an example.) Symbols are something else entirely, and are kind of hard to define I guess.
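The three groups above correspond to the first letter of each character's Unicode General Category (L, P or S), which Python's standard unicodedata module exposes. A small sketch that prints them; major_class is a helper name made up for this example.

```python
import unicodedata

# The marks listed above: letters, then punctuation, then symbols.
MARKS = "\u02B9\u02BB\u02BC\u02CA\u02CB" "\u0027\u2018\u2019\u2032\u2035" "\u0060\u00B4"

def major_class(ch: str) -> str:
    """Map the first letter of the General Category to the three groups."""
    return {"L": "Letter", "P": "Punctuation", "S": "Symbol"}[unicodedata.category(ch)[0]]

for ch in MARKS:
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  "
          f"{major_class(ch):<11}  {unicodedata.name(ch)}")
```

The finer categories are worth a glance too: U+2018 is Pi (initial punctuation) and U+2019 is Pf (final punctuation), while all the modifier letters are Lm.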

There’s a pretty good argument to be made for the use of ʼ U+02BC Modifier Letter Apostrophe in English where we use apostrophes for contraction and possession: ‹ donʼt › and ‹ dogʼs ›. It is in essence an orthographic element that distinguishes different lexical items, and that’s what we usually think of as a “letter” even though the apostrophe doesn’t have an independent sound of its own. But it’s hard enough to get people to use ’ U+2019 Right Single Quotation Mark instead of ' U+0027 Apostrophe, so that asking people to differentiate the quotation ’ and letter-apostrophe ʼ is just tilting at windmills.
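One concrete behaviour difference behind that argument: software that looks for "word characters" keeps U+02BC inside the word but splits at U+2019. A small sketch using Python's standard re module, whose \w is Unicode-aware for str patterns:

```python
import re

words = re.compile(r"\w+")  # \w matches Unicode letters, digits and _

print(words.findall("don\u2019t"))  # ['don', 't']   U+2019 is punctuation (Pf)
print(words.findall("don\u02BCt"))  # ['donʼt']      U+02BC is a letter (Lm)
```

Whether that is desirable depends on the application (search, spell-checking, line breaking), which is part of why the choice matters beyond appearance.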

As for rock ’n’ roll, I think it’s best with the two apostrophes pointing the same way. Logically they are both apostrophes and not quotation marks, and English doesn’t have an apostrophe that points in the other direction. (Actually I don’t think any LGC orthography does, but I could be wrong). Writing ‹ rock ‘n’ roll › makes me think at first that the ‹ n › is being scare-quoted.

hrant

Nice details and analysis. I think U+02BC is sounding pretty solid indeed.

Just one thing:

But it’s hard enough to get people to use ’ U+2019 Right Single Quotation Mark instead of ' U+0027 Apostrophe, so that asking people to differentiate the quotation ’ and letter-apostrophe ʼ is just tilting at windmills.

It's not "people" that need to worry - they can't type anything more than a "dumb" quote/apostrophe anyway; we need the software to automatically map to U+02BC (as best it can) as needed.

hhp

John Hudson

Hrant: Also, would German-localized versions of the quote glyphs solve the problem, or are language tags not well-supported?

They are unevenly supported. Also, punctuation substitutions are unreliable in OpenType because there is a tendency in some software not to roll punctuation into glyph runs with adjacent text. Remember, OpenType Layout proceeds from script to language system to glyph, but the decision about what constitutes a character in a given script is made by the software, not by the font. Since a lot of punctuation is script-neutral, it can only pick up a script identity by algorithmic analysis of adjacent or surrounding text content. But that doesn't happen everywhere, while some software might simply presume common punctuation characters = Latin, which might help your German quote situation, but is a pain in the neck when trying to e.g. kern tall Thai vowels to preceding quote marks or parentheses!

With regard to the German quote issue, the desirable form of the 'left quote' U+2018, i.e. the German closing quote, is a 180 degree rotated and raised form of the opening baseline quote with which it corresponds.
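To make the German case concrete: the same code point U+2018 opens a quotation in American usage but closes one in German. A minimal Python sketch with a hypothetical lookup table covering one common convention per language (German books also often use »guillemets« instead):

```python
# Hypothetical per-language table of (opening, closing) marks.
QUOTES = {
    "en-US": {"double": ("\u201C", "\u201D"), "single": ("\u2018", "\u2019")},  # “ ” ‘ ’
    "de-DE": {"double": ("\u201E", "\u201C"), "single": ("\u201A", "\u2018")},  # „ “ ‚ ‘
}

def quote(text: str, lang: str, level: str = "double") -> str:
    """Wrap text in the quotation marks the table assigns to lang."""
    opening, closing = QUOTES[lang][level]
    return f"{opening}{text}{closing}"

print(quote("Rock", "en-US", "single"))  # ‘Rock’
print(quote("Rock", "de-DE", "single"))  # ‚Rock‘  (U+2018 closes here)
```

A glyph for U+2018 drawn only to look right as an opening quote will therefore look wrong wherever German text uses it as the closing mark.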

hrant

Great insights - thanks.

BTW, every passing day, I like guillemets more. :-)

hhp

John Hudson

John (Q):

Back in the old days of ASCII, U+0060 was a grave accent only to the same extent as U+0022 was an umlaut and U+0027 was an acute accent. That is, one possible unconventional coding was to overstrike those characters, and have their shape altered on sophisticated systems, or their meaning recognized by humans for output from unsophisticated ones, to attain accents.

That this exotic coding is now claimed as the primary meaning of the character in the Unicode standard... is, I suppose, possible, but if so it does not give me great confidence in the committee responsible.

You are confusing things by referring to U+0060, U+0022, etc. and then talking about ASCII. The prefix U+ indicates a Unicode codepoint, i.e. a character in the Unicode Standard, not some other standard. So it makes no sense to talk about e.g. U+0060 'back in the old days of ASCII'. The Unicode 'C0 Controls and Basic Latin' block provides a one-to-one mapping of Unicode characters to ASCII characters, which is not the same thing as being the ASCII standard. As you say, the ASCII standard deliberately enabled the interpretation of some codes as representing multiple characters. A principal -- and principle -- goal of Unicode's larger character set was to avoid such confusions, so I think the UTC was eminently sensible and responsible in assigning only one meaning to the Unicode character U+0060, allowing the other interpretations of the corresponding ASCII code to have their own unique Unicode assignments. I also think they made the right choice in selecting the spacing grave as the identity of this character given that the same block includes a corresponding spacing acute character, and the single quote character is handled as deliberately direction agnostic in almost all software -- not as a 'right single quote' -- and as a vertical glyph in almost all fonts, and there would be in any case no corresponding 'left double quote' if U+0060 were interpreted as a 'left single quote'.
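The distinction between the ASCII code and the Unicode character can be checked directly in Python: every 7-bit ASCII code decodes to the Unicode code point with the same number, but each of those code points carries exactly one character name.

```python
import unicodedata

# Each 7-bit ASCII code decodes to the Unicode code point with the same number...
assert all(bytes([b]).decode("ascii") == chr(b) for b in range(128))

# ...but each of those Unicode code points has exactly one character identity.
for cp in (0x22, 0x27, 0x60):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
# U+0022 QUOTATION MARK
# U+0027 APOSTROPHE
# U+0060 GRAVE ACCENT
```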

I still occasionally get emails from people that are punctuated like `this'. It is so obviously a mistake, I have to wonder what combination of software and font they might be using that doesn't display it as such, or if they are blind.

Michel Boyer

I have to wonder what combination of software and font they might be using that doesn't display it as such, or if they are blind.

That is how you get correct single quotes in LaTeX: `this' is typeset as ‘this’. Likewise, ``word'' gives correct double quotes.

If you want to get that behaviour in XeLaTeX, you need to specify Mapping=tex-text when setting the font, for instance:

\setromanfont[Mapping=tex-text]{Chaparral Pro}
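The quote part of these TeX input conventions amounts to a simple substitution, sketched below in Python. This is a simplified illustration of the effect, not the actual tex-text mapping, which also handles dashes; the table and function names are made up for the example.

```python
# Quote part of TeX-style input conventions (hypothetical simplified table).
# Longer sequences must be replaced first, or `` would become two single quotes.
TEX_QUOTES = [("``", "\u201C"), ("''", "\u201D"), ("`", "\u2018"), ("'", "\u2019")]

def tex_quotes(text: str) -> str:
    """Replace TeX quote input sequences with the typographic characters."""
    for source, target in TEX_QUOTES:
        text = text.replace(source, target)
    return text

print(tex_quotes("``word'' and `this'"))  # “word” and ‘this’
print(tex_quotes("rock 'n' roll"))        # rock ’n’ roll
```

Because an opening quote has to be typed explicitly as `, a plain ' is never guessed to be one, so rock 'n' roll comes out with both marks pointing the same way.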

hrant

John, that chaps my hide too.

hhp

Nick Shinn

Smart quote software could be made smarter, to include a dictionary of "exceptions" such as the first apostrophe in rock 'n' roll. After all, look at the way the new Blackberry reads people's minds and finishes their sentences for them.

jcrippen

I think the reason that hasn’t caught on – outside of Microsoft Word, perhaps – is that you have to have a different set of exceptions for each language. If all you care about is English and French then it’s not too hard, but once you start including even other big languages like Spanish, German, Dutch, and Italian you’ve got a huge pile of databases to build and maintain.
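A minimal Python sketch of the exception-dictionary idea. The word list, the two-digit-year rule and the function name are assumptions for illustration; the list is passed in as a parameter because, as noted above, every language would need its own. It also ignores tokens with trailing punctuation such as 'tis, which a real tool would have to strip first.

```python
import re

# Hypothetical English-only exception list: tokens whose leading ' stands for
# elided letters rather than an opening quote. Other languages need their own.
ELISIONS_EN = {"'n'", "'tis", "'twas", "'em", "'cause"}
DECADE = re.compile(r"'\d\d(?!\d)")  # '68, '90s (but not '1968)

def smarter_quotes(text: str, elisions=ELISIONS_EN) -> str:
    out = []
    for token in re.split(r"(\s+)", text):  # the capture group keeps whitespace
        if token.startswith("'") and (token.lower() in elisions or DECADE.match(token)):
            token = token.replace("'", "\u2019")  # leading mark is an apostrophe
        elif token.startswith("'"):
            token = "\u2018" + token[1:].replace("'", "\u2019")  # opening quote
        else:
            token = token.replace("'", "\u2019")  # apostrophe or closing quote
        out.append(token)
    return "".join(out)

print(smarter_quotes("rock 'n' roll"))     # rock ’n’ roll
print(smarter_quotes("'Tis the season"))   # ’Tis the season
print(smarter_quotes("'hello' she said"))  # ‘hello’ she said
```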

quadibloc

@John Hudson:
On considering the matter more, I can see that the Unicode Consortium decision probably was quite reasonable.

I didn't want to start referring to Unicode ' as U+0027 and ASCII ' as X'27' as that would confuse people.

Since ' was used as the only quote much more than ` and ' were used as paired quotes - ` was, and is, so little used that I kind of wish that codepoint were used for, say, the degree symbol - I can somewhat see the logic of using ` for a grave accent.

But that just felt wrong, simply because accents were far too exotic to be part of the primary 7-bit ASCII set. I also felt that ^ should never have been changed from the up-arrow, which was so useful as an exponentiation operator.

Thus, when ISO 8859-1 came along, with all those accented letters, but without desperately needed symbols such as ≤, ≥, and ≠, I could only wonder what they were thinking. (On the other hand, placing × and ÷ on codepoints obviously more suitable for Œ and œ simply further compounded the insanity in the opposite direction. After all, * and / were perfectly good for multiplication and division, unless one was writing grade school arithmetic textbooks.)

Of course, the whole world doesn't speak English. So a character set like ISO 8859-1 was indeed a good idea. But it should have been the -2 set, as it were, in my opinion. There already were alternate versions of 7-bit ASCII for the major European languages; what would make sense from my perspective would have been to allow any of those to have a supplementary set of high-bit characters bolted on (with the characters likely to be replaced in those national versions, namely @, [, \, ], ^, _, `, {, |, }, ~, copied or moved to the high-bit side), since, after all, people generally use only one language at a time.

And people who speak different languages clearly aren't able to communicate with each other, and so having the same character coding for areas where different languages are spoken... could wait until we went to 16 bits with Unicode (although, again, a set like 8859-1 for the specialized and exotic purpose of international communication certainly would be of some use).

Of course, strange to relate, in Continental Europe, people don't share the view of people living in, say, North America or Australia that anyone who speaks a different language either lives thousands of miles away or is a poor immigrant who is going to be learning your language instead of the other way around - because of the disparity in the economic value of the effort required.
