Where to get info on Unicode characters?

agisaak

I'm wondering if anyone can point me to a decent reference on the use of specific Unicode characters. I'm particularly interested in the Spacing Modifier Letters block (U+02B0–U+02FF). The descriptions in the Unicode code charts don't provide examples of use, and this is causing some problems for me.

The code chart itself labels the subrange 0x02D8–0x02DD 'Spacing Clones of Diacritics', but there are a variety of other characters in this block which are also simple spacing clones (e.g. 0x02C6 'modifier letter circumflex accent'), and some which might appear to correspond to a combining mark but are used differently as spacing marks. For example, 0x02CC 'modifier letter low vertical line' can't really be viewed as simply a spacing clone of 0x0329 'combining vertical line below', since the spacing form represents a stress mark, whereas the combining form indicates syllabicity in the IPA or serves as a vowel modifier in, e.g., Yoruba.

Unfortunately, there are many characters in this range that I am unsure of. For example, I don't know whether the half rings (0x02BE–0x02BF) are used as independent forms with usages similar to apostrophes, or whether they are simply spacing versions of the corresponding combining forms. In the former case, I'd assume the spacing versions should be lower and more prominent, but I can't find information on these or on many other characters in this block.
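One thing the Unicode Character Database does record formally is whether a character is a "spacing clone": such characters carry a compatibility decomposition of the form `<compat> 0020 XXXX` (SPACE followed by the combining mark). The sketch below, using Python's standard `unicodedata` module, checks that property and the general category (Lm = modifier letter, Sk = modifier symbol). The helper names `describe` and `spacing_clone_of` are my own, and the results reflect whichever Unicode version your Python ships with. It tells you nothing about actual usage, only what the data files say.

```python
import unicodedata

def describe(cp: int) -> dict:
    """Collect the UCD properties of one code point that bear on 'is this a spacing clone?'."""
    ch = chr(cp)
    return {
        "code": f"U+{cp:04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        # Lm = modifier letter, Sk = modifier symbol
        "category": unicodedata.category(ch),
        # Empty string when the character has no decomposition mapping
        "decomposition": unicodedata.decomposition(ch),
    }

def spacing_clone_of(cp: int):
    """Return the combining mark XXXX if cp decomposes as '<compat> 0020 XXXX', else None."""
    parts = unicodedata.decomposition(chr(cp)).split()
    if len(parts) == 3 and parts[0] == "<compat>" and parts[1] == "0020":
        return int(parts[2], 16)
    return None

if __name__ == "__main__":
    print("UCD version:", unicodedata.unidata_version)
    for cp in (0x02C6, 0x02CC, 0x02BE, 0x02BF, 0x02D8, 0x02DD):
        info = describe(cp)
        clone = spacing_clone_of(cp)
        target = "-" if clone is None else f"U+{clone:04X} {unicodedata.name(chr(clone))}"
        print(info["code"], info["category"], info["name"], "->", target)
```

Of those six, only 02D8 and 02DD report a clone; 02C6, 02CC and the two half rings have no decomposition at all, so the formal data doesn't settle whether they are "clones" in practice.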

Any suggestions would be appreciated.

André

Andreas Stötzner

The short answer to your worthwhile question is: there is no such thing as ‘a decent reference on the use of …’. Least of all Unicode itself. However, there is decodeunicode.org, where you may find some of the information you are looking for. But it is a project relying entirely on voluntary contributions, so it is by no means authoritative or in any way affiliated with Unicode.

To be entirely honest, I myself don’t really have a clue what the spacing accents in the 02Bx range are good for, with a few exceptions. (Maybe they are just a relic of the ASCII era?) At least the half rings 02BE and 02BF are commonly used in Latin transcriptions of Arabic, as stand-ins for the ‘Hamza’ (02BE) and the ‘Ayin’ (02BF). What else? I don’t know.

A comprehensive professional source for information of the kind you ask for would be a blessing. But who cares? Who would fund that sort of work?!!
Never forget: Unicode is a second-hand shop, not a specialist supplier. Don’t expect too much.

R.

Here’s a list I made. You might find it useful.

02B0 (ʰ) – IPA: aspiration
02B1 (ʱ) – IPA: ‘voiced’ aspiration
02B2 (ʲ) – IPA: palatalisation
02B3 (ʳ) – IPA: general ‘r-colouring’
02B4 (ʴ) – IPA: alveolar approximant ‘r-colouring’
02B5 (ʵ) – IPA: retroflex approximant ‘r-colouring’
02B6 (ʶ) – IPA: uvular fricative ‘r-colouring’
02B7 (ʷ) – IPA: labialisation
02B8 (ʸ) – APA: palatalisation
02B9 (ʹ) – AHD: primary stress/UPA: palatalisation
02BA (ʺ) – don’t know, maybe strong primary stress in some system
02BB (ʻ) – ALA-LC: substitute for 02BF
02BC (ʼ) – ALA-LC: substitute for 02BE
02BD (ʽ) – UPA backward compatibility (modifier version of 0314)
02BE (ʾ) – latinisation of Arabic Hamza or Aleph
02BF (ʿ) – latinisation of Arabic Ayin
02C0 (ˀ) – IPA/APA: glottal stop (or glottalisation)
02C1 (ˁ) – don’t know, larger version of 02E4
02C2 (˂) – UPA backward compatibility (modifier version of 0354)
02C3 (˃) – UPA backward compatibility (modifier version of 0355)
02C4 (˄) – don’t know (maybe UPA)
02C5 (˅) – don’t know (maybe UPA)
02C6 (ˆ) – UPA backward compatibility (modifier version of 0302)
02C7 (ˇ) – UPA backward compatibility (modifier version of 030C)
02C8 (ˈ) – IPA: primary stress
02C9 (ˉ) – UPA backward compatibility (modifier version of 0304)
02CA (ˊ) – UPA backward compatibility (modifier version of 0301)
02CB (ˋ) – UPA backward compatibility (modifier version of 0300)
02CC (ˌ) – IPA: secondary stress
02CD (ˍ) – low version of 02C9
02CE (ˎ) – low version of 02CB
02CF (ˏ) – low version of 02CA
02D0 (ː) – IPA: long sound
02D1 (ˑ) – IPA: half-long sound
02D2 (˒) – IPA backward compatibility (modifier version of 0339)
02D3 (˓) – IPA backward compatibility (modifier version of 031C)
02D4 (˔) – IPA backward compatibility (modifier version of 031D)
02D5 (˕) – IPA backward compatibility (modifier version of 031E)
02D6 (˖) – IPA backward compatibility (modifier version of 031F)
02D7 (˗) – IPA backward compatibility (modifier version of 0320)
02D8 (˘) – UPA backward compatibility (modifier version of 0306)
02D9 (˙) – UPA backward compatibility (modifier version of 0307)
02DA (˚) – latinisation of Abkhaz ә
02DB (˛) – backward compatibility (modifier version of 0328)
02DC (˜) – backward compatibility (modifier version of 0303)
02DD (˝) – backward compatibility (modifier version of 030B)
02DE (˞ ) – IPA: rhoticity (‘r-colouring’)
02DF (˟) – IPA backward compatibility (modifier version of 033D)
02E0 (ˠ) – IPA: velarisation
02E1 (ˡ) – IPA: lateral release
02E2 (ˢ) – IPA: alveolar release
02E3 (ˣ) – IPA: velar release
02E4 (ˤ) – IPA: pharyngealisation
02E5 (˥) – IPA: extra high tone
02E6 (˦) – IPA: high tone
02E7 (˧) – IPA: mid tone
02E8 (˨) – IPA: low tone
02E9 (˩) – IPA: extra low tone
02EA (˪) – Departing tone mark for some languages
02EB (˫) – Departing tone mark for some languages
02EC (ˬ) – IPA backward compatibility (modifier version of 032C)
02ED (˭) – IPA: no aspiration
02EE (ˮ) – used in Uralic and African orthographies
02EF (˯) – low version of 02C5
02F0 (˰) – low version of 02C4
02F1 (˱) – low version of 02C2
02F2 (˲) – low version of 02C3
02F3 (˳) – UPA backward compatibility (low version of 02DA)
02F4 (˴) – UPA backward compatibility (middle modifier version of 0300)
02F5 (˵) – UPA backward compatibility (middle modifier version of 030F)
02F6 (˶) – UPA backward compatibility (middle modifier version of 030B)
02F7 (˷) – UPA backward compatibility (low version of 02DC)
02F8 (˸) – maybe UPA backward compatibility
02F9 (˹) – UPA: begin of high tone
02FA (˺) – UPA: end of high tone
02FB (˻) – UPA: begin of low tone
02FC (˼) – UPA: end of low tone
02FD (˽) – don’t know (maybe UPA)
02FE (˾) – don’t know (maybe UPA)
02FF (˿) – don’t know (maybe UPA)
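A hand-made list like this is easy to cross-check against the official character names and general categories, which Python's standard `unicodedata` module exposes. The sketch below prints one line per code point in the block, in the same "code (char)" style as above. The names are only the UCD names, not usage notes, and they depend on the Unicode version bundled with your Python; `spacing_modifier_table` is just an illustrative helper name.

```python
import unicodedata

def spacing_modifier_table(start: int = 0x02B0, end: int = 0x02FF):
    """Yield (hex code, character, general category, official name) for each code point."""
    for cp in range(start, end + 1):
        ch = chr(cp)
        yield f"{cp:04X}", ch, unicodedata.category(ch), unicodedata.name(ch, "<unassigned>")

if __name__ == "__main__":
    for code, ch, cat, name in spacing_modifier_table():
        # e.g. "02CE (ˎ) Lm MODIFIER LETTER LOW GRAVE ACCENT"
        print(f"{code} ({ch}) {cat} {name}")
```

The official names are often enough to settle which spacing or combining character a "low" or "middle" variant pairs with.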

_savage

I found this website helpful. It gives me the Unicode code points, the characters (if my browser font provides them), as well as their UTF-8 encoding. In your case, some information on the "Spacing modifier letters" you mention above can be found here.
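If you just need the UTF-8 bytes for a character, you can compute them locally instead of looking them up. A minimal sketch using Python's built-in `str.encode`; the helper name `utf8_hex` is hypothetical, chosen for illustration:

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of a character as space-separated uppercase hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

if __name__ == "__main__":
    # The two half rings and the secondary-stress mark discussed in this thread
    for cp in (0x02BE, 0x02BF, 0x02CC):
        ch = chr(cp)
        print(f"U+{cp:04X} {ch} {utf8_hex(ch)}")
```

Everything in the Spacing Modifier Letters block falls in the two-byte UTF-8 range, so each character encodes to exactly two bytes (e.g. U+02BE gives CA BE).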

Having said that, I'm cautious with more exotic characters like these, because applications don't necessarily support them correctly, and fonts don't necessarily include glyphs for them. As you say, it's a bit of a fuzzy mess :-)

Michel Boyer

If you find references, please share them with us. For actual uses, I can see no better tool than Google. If I paste the single character ʾ (0x02BE) into the Google search box and search, I get documents (books, a Word document on Syriac) in which it is used. As for the “intended use”, I have no clue.

Michel Boyer

Here are two relevant links for U+02BE and U+02BF

http://en.wikipedia.org/wiki/Modifier_letter_left_half_ring
http://en.wikipedia.org/wiki/Modifier_letter_right_half_ring

Note that both articles make their statements without citing any references.
