apostrophe vs. prime sign

emspace's picture

Me and my fun questions...

Can anyone tell me why the Unicode character named apostrophe almost always contains a prime symbol?

Thank you!

JamesT's picture

I'm sure someone will correct me if I'm wrong but I believe the "apostrophe" is supposed to be completely vertical in Roman faces while the prime is slanted.

My guess as to why the prime symbol is usually placed there is that most people prefer using "smart quotes" but want an easily accessible prime symbol.

emspace's picture

Thank you! The Wikipedia entry is very technical but am I right in assuming that it is a technological heritage and that it's not changed because it would impact too many files and too many people?

Emilie

Nick Shinn's picture

That is correct.

Unicode is a fairly recent idea, and the distinction it makes between characters, glyphs and code points was quite the innovation.
Basically, it means that each linguistic character is assigned a code point (a number, usually written in hexadecimal), and may be represented by a glyph at that code point.

Previous encoding systems, most notably ISO 8859, encoded glyphs rather than characters, so were not so linguistically precise and in many instances there were conflicting code systems—hence the need for one code to rule them all.

And way before that, the idea that one piece of type (i.e. glyph) may encode several different characters was representative of the economic resourcefulness traditionally exercised by printers. In fact, prior to the introduction of hot metal equipment (Linotype) in the 1880s, the left quote marks were set from the same piece of type as the comma, merely rotated 180°.

Many typewriters had no "one" or "zero" characters, the typist being expected to make "l" and "O" do double duty.

The situation with the "dumb" quote marks stems from continuing support for older fonts that use the ISO encoding.
So really, it's a double-edged sword deriving from the longevity of fonts as software.
It may have been possible, when Unicode started c. 1990, for the Unicode Consortium to have moved to obsolesce the dumb quote encoding of ISO 8859, but it was a young organization and such a move was, I assume, not expedient.

Another legacy that Unicode incorporates, although inconsistent with its premise, is the encoding of the glyphs "fi" and "fl" as if they are single characters and not merely typographic ligatures.

Even today, there are no separate Unicode points for the right quote and the apostrophe in the basic encodings, hence the debacle with "smart" layout applications abbreviating numbers with left quote marks. Rather pathetic, if you ask me.
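To make the number-abbreviation problem concrete, here is a minimal sketch of the kind of rule that produces it. The function `naive_smart_quotes` and its rule are hypothetical, written only for illustration; no real layout application is claimed to work exactly this way.

```python
import re

def naive_smart_quotes(text):
    # Hypothetical rule: a straight quote at the start of the text or right
    # after whitespace is treated as an opening quote (U+2018); every other
    # straight quote becomes a closing quote / apostrophe (U+2019).
    text = re.sub(r"(^|\s)'", "\\1\u2018", text)
    return text.replace("'", "\u2019")

# The abbreviation gets an opening quote instead of an apostrophe.
print(naive_smart_quotes("the '90s"))
# An apostrophe inside a word comes out as expected.
print(naive_smart_quotes("don't"))
```

Because the typed character is the same U+0027 in both cases, the rule can only guess from context, and the guess is wrong for a leading apostrophe.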

emspace's picture

Context is everything, thank you :-)

I wish they had done the move, then I wouldn't look dumb when students ask me "why does the prime symbol say apostrophe when you hover over it in the glyphs panel?" Oh well, now I can explain it to them and look even smarter! ;-)

"Another legacy that Unicode incorporates, although inconsistent with its premise, is the encoding of the glyphs "fi" and "fl" as if they are single characters and not merely typographic ligatures."

I'm not sure I understand this fully. Isn't this a good thing? Is this the reason I can find, let's say, "flow", if I'm doing a Find/Replace in InDesign? It's nice to be able to find the word even if it uses a ligature. Or are you just saying they're not consistent because they didn't do the right move for the apostrophe?

Cristobal Henestrosa's picture

Unicode deals with the meaning of characters, not with typographic niceties. You can design a font with all kinds of crazy ligatures (fi, fl, ff, ffi, ffl, fff, fb, ffb, fh, ffh, fk, ffk, ct, st, sp…) but you shouldn’t assign a codepoint to each of them, only to the characters into which the ligature can be decomposed. Therefore, you can find the sequence “fl” because it is composed of an “f” and an “l”, no matter if it is ligated or not.

This is a relatively new idea. Before Unicode, it was thought that treating ligatures as characters was harmless (and probably the only way to go). Now this is considered a bad practice: someone can type ﬂ (uniFB02, as I did here) for the sake of having a nice ligature, so you won’t be able to find your word because you are looking for f (uni0066) and l (uni006C).

Cristobal Henestrosa's picture

However, as Nick said, Unicode incorporated ﬁ (uniFB01), ﬂ (uniFB02) and some other ligatures (ﬀ, uniFB00; ﬃ, uniFB03; ﬄ, uniFB04; ﬅ, uniFB05; ﬆ, uniFB06) for backward compatibility reasons: some texts were written with those characters in the past – and, since it is allowed by Unicode, even now.
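The search problem described above can be reproduced in a few lines of Python. This is only a sketch using the standard `unicodedata` module; the sample sentence is made up, and NFKC normalization is one possible way to fold the compatibility ligature back into its letters, not necessarily what any particular application does.

```python
import unicodedata

# Text typed with the precomposed ligature U+FB02 instead of "f" + "l".
text = "go with the \ufb02ow"

# A plain search for the separate letters misses the ligature.
found_raw = "flow" in text

# NFKC compatibility normalization replaces U+FB02 with "f" + "l".
folded = unicodedata.normalize("NFKC", text)
found_folded = "flow" in folded

print(found_raw, found_folded)
```

The first search fails and the second succeeds, which is why text that stores plain letters and leaves the ligature to the font is easier to search.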

Té Rowan's picture

Don't forget MUFI (and its German subset, UNZ1), which uses the Private Use Area to encode various extra ligatures among all the manuscript oddities. IIRC, the idea is to enable making accurate manuscript renditions for print and display, even in old apps.

guifa's picture

Yes, it's important to remember that by design, Unicode is supposed to be a semantic encoder, not a glyph encoder. For 99.99% (possibly more) of text documents, that is the preferred method. However, for certain specialized topics, mainly scholarly, it is as important (or more important) to record the actual glyph variations as the underlying semantic meaning, for any number of reasons.

In comes the private use area. Because Unicode explicitly doesn't encode anything there, by using those code points, you can ensure that specialized encodings don't tread on the standard encodings. In practice, the PUA is used for three things really:

1. encoding logos/etc. for use in text (e.g. the Apple logo)
2. encoding new Unicode blocks which are still being proposed / in the process of being approved
3. encoding functionally, graphically, or featurally, but not semantically, important differences in glyphs

The second one tends to result in a wholesale move of the block (so, you might use four rows in your proposal, they'll just become four rows in a different block which makes it easier to convert).

For the last one, in order to make the non-standard somewhat standard, several different initiatives have started up, namely MUFI, UNZ1, and CSUR. Those projects agree not to step on each other's areas. Sometimes some of their glyphs will end up in the Unicode standard, but most of them will probably stay in the PUA.
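If you want to check whether a given character sits in the PUA, Python's standard `unicodedata` module reports the general category "Co" for private-use code points. A small sketch; the helper name `is_private_use` is my own, and U+F8FF is used as the example on the assumption that it is the code point Apple's fonts use for the logo.

```python
import unicodedata

def is_private_use(ch):
    # General category "Co" marks private-use code points.
    return unicodedata.category(ch) == "Co"

# U+F8FF: last code point of the BMP Private Use Area (U+E000..U+F8FF).
print(is_private_use("\uf8ff"))
# U+FB01: the standard fi ligature, which is an assigned character.
print(is_private_use("\ufb01"))
```

Unicode itself says nothing about what a private-use code point means, so the interpretation depends entirely on the font or agreement in use (MUFI, CSUR, and so on).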

quadibloc's picture

@guifa:
Yes, it's important to remember that by design, Unicode is supposed to be a semantic encoder, not a glyph encoder.

@Cristobal Henestrosa:
Before Unicode, it was thought that treating ligatures as characters was harmless (and probably the only way to go).

Well, before Unicode, people conceptualized printers as something you hooked up to a computer to print glyphs in response to codes sent from the computer. One code, one glyph. If fancy steps to insert ligatures in text were required, they would be done inside the computer - since, of course, computers are big and expensive, and thus printers are dumb.

And codes like ASCII and EBCDIC are principally communications codes, governing things like the connection to a printer.

We've come a long way from then to now. Unicode indeed is eminently suitable to allowing documents to be interchanged between different computer systems, containing text in different languages, and so on. It facilitates applications, though, that were just about unimaginable in the world of mainframes and punched cards.

@emspace:
To be less technical than Wikipedia:

The first 128 characters in Unicode match those of an older code, ASCII.
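A quick way to see this for yourself, as a sketch in Python (the variable names are just for illustration): decode every 7-bit ASCII byte and compare each result with the Unicode code point of the same number.

```python
# All 128 ASCII byte values, decoded as ASCII text.
ascii_bytes = bytes(range(128))
decoded = ascii_bytes.decode("ascii")

# Each decoded character has the same number as its ASCII byte.
matches = all(ord(ch) == i for i, ch in enumerate(decoded))

# The typewriter apostrophe keeps its ASCII position, 39 (hex 27).
print(matches, ord("'"), hex(ord("'")))
```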

When ASCII was originally invented, the code positions used for lower-case letters were left unassigned. Thus, ASCII 39 (') didn't look like either an apostrophe or a prime, but like the single-quote symbol on a typewriter - it was vertical.

When lower-case was added to ASCII, some terminals left that character as it was, but others sloped the character so it could be either a directed quote or a prime, making a pair with the new reverse-quote, ASCII 96, (`), which had only just been added, along with the lower-case letters. (The characters { | } and ~ were also added to ASCII at that time.)

Before lower-case was added to ASCII, ^ was an up-arrow instead of a caret and _ was a back-arrow instead of an underscore, also.

And | was a broken vertical bar because the EBCDIC vertical bar was mapped to ! instead of it, and the EBCDIC logical not was mapped to ^ instead of ~, because PL/I programmers insisted that all their characters needed to be in the uppercase-only subset, thus preventing ASCII ! from translating to EBCDIC !... a nightmare.

Basically, though, the rule is:

When you think of Unicode, think of ASCII.

When you think of ASCII, think of this.

Té Rowan's picture

The Sinclair ZX Spectrum, which did have 7-bit ASCII, had the up-arrow instead of the caret. It also had a pound sign in position 96 instead of the backtick.

Khaled Hosny's picture

TeX had automatic ligatures in the late 70s: you type f and i and get an fi ligature in the output. I'm sure it was not alone in that, so I don't understand this whole "before Unicode" thing except in the context of certain applications.

Jallanite's picture

On typewriters, in order to keep the number of keys down, it was customary to simplify some of the glyphs so that they could be used for more than one character.

The ' symbol is one such glyph, used for an apostrophe and closing single quotation mark ( ’ ), for an opening quotation mark ( ‘ ), for a prime mark ( ′ ), and for the top half of an exclamation mark, which would be built from ( ' . ).

Computer keyboards and early computer character sets mostly followed typewriter practice in using the ' symbol. See http://www.fileformat.info/info/unicode/char/0027/index.htm . The name APOSTROPHE was applied to it in many standard character sets and Unicode followed suit. It was also used for the prime mark which also stands for “feet” and “minutes”, but in good typography the prime mark slants.

Unicode distinguishes the following symbols which are somewhat alike:
APOSTROPHE: http://www.fileformat.info/info/unicode/char/0027/index.htm [ ' ] (typewriter apostrophe–quotation-mark)
LEFT SINGLE QUOTATION MARK: http://www.fileformat.info/info/unicode/char/2018/index.htm [ ‘ ]
RIGHT SINGLE QUOTATION MARK: http://www.fileformat.info/info/unicode/char/2019/index.htm [ ’ ] (typographical closing quotation mark and apostrophe)
SINGLE HIGH-REVERSED-9 QUOTATION MARK: http://www.fileformat.info/info/unicode/char/201b/index.htm [ ‛ ] (variant opening quotation mark)
PRIME: http://www.fileformat.info/info/unicode/char/2032/index.htm [ ′ ] (genuine prime mark)
MODIFIER LETTER VERTICAL LINE: http://www.fileformat.info/info/unicode/char/02c8/index.htm [ ˈ ] (marks stressed syllable in some phonetic notations)
MODIFIER LETTER TURNED COMMA: http://www.fileformat.info/info/unicode/char/02bb/index.htm [ ʻ ] (when the character represents a sound in some phonetic notations and transliterations)
MODIFIER LETTER APOSTROPHE: http://www.fileformat.info/info/unicode/char/02bc/index.htm [ ʼ ] (when the character represents a sound in some phonetic notations and transliterations)
MODIFIER LETTER REVERSED COMMA: http://www.fileformat.info/info/unicode/char/02bd/index.htm [ ʽ ] (when the character represents a sound in some phonetic notations and transliterations)
LATIN CAPITAL LETTER SALTILLO: http://www.fileformat.info/info/unicode/char/a78b/index.htm [ Ꞌ ] (sometimes used for a sound in Nahuatl and related tongues)
LATIN SMALL LETTER SALTILLO: http://www.fileformat.info/info/unicode/char/a78c/index.htm [ ꞌ ] (sometimes used for a sound in Nahuatl and related tongues)
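
The names in the list above can also be looked up programmatically. A minimal sketch with Python's standard `unicodedata` module; the helper `describe` and the list name `lookalikes` are only illustrative.

```python
import unicodedata

# Code points of the look-alike characters listed above.
lookalikes = [
    0x0027, 0x2018, 0x2019, 0x201B, 0x2032, 0x02C8,
    0x02BB, 0x02BC, 0x02BD, 0xA78B, 0xA78C,
]

def describe(cp):
    # Format a code point as "U+XXXX NAME" using the Unicode character name.
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"

for cp in lookalikes:
    print(describe(cp))
```

This is roughly what a glyphs panel does when it shows APOSTROPHE for U+0027: the name comes from the character's code point, not from the shape the font happens to draw there.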

Té Rowan's picture

@Khaled... Everybody Knows that there was nothing before Macintosh'n'Windows.
