Unicode

raphaelb's picture

Unicode, tell me more

What is the best way to find information about the use and function of a specific Unicode character?
For instance, I would like to know why Ƀ (U+0243) exists and when it was created.
Thank you for your help.
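As a starting point, the basic properties of a character can be read from the Unicode Character Database that ships with Python. The sketch below is only illustrative: the helper name `describe` is made up for this example. The `unicodedata` module does not report when a character was added or why; for that, the `DerivedAge.txt` file of the Unicode Character Database and the original encoding proposals are the places to look.

```python
import unicodedata

def describe(ch):
    """Return a few Unicode Character Database properties for one character."""
    return {
        "code_point": "U+%04X" % ord(ch),
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),  # e.g. "Lu" = uppercase letter
        "decomposition": unicodedata.decomposition(ch) or "<none>",
    }

info = describe("\u0243")
for key, value in info.items():
    print(key, value)
```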

Hello dear typophiles,

I'm working on typesetting a document with mixed Latin and Greek text. My problem is that I can't identify the font that was used for the Greek, or even find a suitable replacement.

The "manuscript" that I received is an XML file, so it carries no information about the font used to typeset the previous edition.

javascript's picture

Closed Minded

Copy from http://twitter.com/openvclosed showing weasel word racists
(mostly from British Linux operating system user groups (LUGs)) trying
to suppress the alteration of the 11th letter shape from k/K to
unicode 0915 shape meant for spiritual reasons.

Nix 31 Oct 2010 Gllug says "We're _too short of decent free font
designers. Please stop trying to drive them off"

Tig 1 Nov 2010 Staffslug says "Don't forget the chic?en ?orma"

Andrew Edwards 1 Nov Staffslug says "_sacred meat"

martin rome 1 Nov Staffslug says "_people_"think"_i seriously hope
this is a joke_
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK :P"

Peter Cannon 1 Nov 2010 Staffslug says "_2 X Onion bhaji 1 X
Chicken Phal_ please_"

Hello everyone.

I'm making a PHP application to generate text labels. The PHP libraries don't seem to support OpenType features.
I have no problem using Unicode code points to display the standard characters.
But, as far as I know, the OpenType features are name based.
For example: if I want to display a ligature «my», I have to replace the code points of «m» and «y» with the code point of the «my» ligature.
The problem is that these ligature glyphs have no Unicode code points, and I can't refer to them by their names.

The questions are:

Do Unicode code points already exist for these glyphs, and am I just missing them?
Do I need to re-encode all the glyphs?
What is the best way to re-encode a font and/or generate Unicode names for this purpose?
Is there a standard?
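The substitution described above can be sketched in a few lines (shown in Python rather than PHP, purely for illustration). Unicode encodes only a handful of Latin ligatures, in the Alphabetic Presentation Forms block (U+FB00 to U+FB06); a «my» ligature is not among them, so the sketch assumes the font has been re-encoded to place it at a Private Use Area code point. The value U+E000 and the table names are hypothetical.

```python
# Ligatures that have standard Unicode code points.
STANDARD_LIGATURES = {
    "ffi": "\uFB03",
    "ffl": "\uFB04",
    "ff": "\uFB00",
    "fi": "\uFB01",
    "fl": "\uFB02",
}

# Hypothetical: a custom "my" ligature encoded in the Private Use Area.
# U+E000 is an assumption; the real value depends on how the font is encoded.
CUSTOM_LIGATURES = {"my": "\uE000"}

def apply_ligatures(text, table):
    """Replace character sequences with single ligature code points."""
    # Longest sequences first, so "ffi" wins over "ff" and "fi".
    for sequence in sorted(table, key=len, reverse=True):
        text = text.replace(sequence, table[sequence])
    return text

all_ligatures = {**STANDARD_LIGATURES, **CUSTOM_LIGATURES}
label = apply_ligatures("my office", all_ligatures)
```

A plain search-and-replace like this ignores context (word boundaries, language), which is part of what a real OpenType layout engine handles.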

I'm looking for a font that can display some text my sensei sent me. I'm trying to design a card for him, but for the life of me I cannot find a decent Japanese font for it (one with Kanji; I found a few kana-only ones). I think I might need Unicode?

http://www.wazu.jp/gallery/samples/AoyagiKouzanFont2OTF__Japanese.gif
looks awesome, but it doesn't seem to work on my computer. I've installed about 20 fonts and none seem to display the text well, which makes me think I need a Unicode version?

Found it here: http://www.wazu.jp/gallery/Fonts_Japanese.html

I need to use this text:
山下武道こくさいきょかい

Any help/direction would be amazing.
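One way to see why a kana-only font falls short is to count which scripts the text actually uses. The sketch below classifies each character by its Unicode name; the function name `script_summary` is made up for this example, and the name-prefix test is a simplification that covers only the cases needed here.

```python
import unicodedata
from collections import Counter

def script_summary(text):
    """Count characters per script, judged by their Unicode names."""
    counts = Counter()
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            counts["kanji"] += 1
        elif name.startswith("HIRAGANA"):
            counts["hiragana"] += 1
        elif name.startswith("KATAKANA"):
            counts["katakana"] += 1
        else:
            counts["other"] += 1
    return dict(counts)

summary = script_summary("山下武道こくさいきょかい")
print(summary)
```

The text mixes kanji and hiragana, so a suitable font needs glyphs for both the CJK Unified Ideographs and the Hiragana blocks.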

Ignore this post if you’re reading it on a Mac.

Either I was installing it wrong previously, or Microsoft fixed it in 1.4, and I was using 1.3, but finally I got Microsoft Keyboard Layout Creator to install and run in Windows 7, and I used it to build a custom German/English “typographer’s keyboard” layout. (My physical keyboard has a standard German layout.) This enabled me to add a bunch of extra punctuation and some archaic stuff such as ſ.

I also managed to add two dingbats from the Unicode Zapf Dingbats subset range, U+2766 and U+2767.

❦ ❧

If you can see those, you are seeing them in pure Unicode. Windows is switching fonts somewhere to display them.
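For reference, the characters mentioned above can be identified with Python's `unicodedata` module. This is only a quick check of code points and names; the helper `in_dingbats_block` is made up for this example and hard-codes the Dingbats block range (U+2700 to U+27BF).

```python
import unicodedata

def in_dingbats_block(ch):
    """True if the character lies in the Dingbats block, U+2700..U+27BF."""
    return 0x2700 <= ord(ch) <= 0x27BF

# Long s plus the two dingbats added to the keyboard layout.
for ch in "\u017F\u2766\u2767":
    print("U+%04X" % ord(ch), unicodedata.name(ch), in_dingbats_block(ch))
```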

kfitch's picture

Unicode Conversion Issue

Hello,

Our users are experiencing a very discouraging issue with how MS Word (in Windows) handles non-Unicode characters. This issue is confirmed in both Word 2007 and the Word 2010 Beta using Windows XP SP3; I suspect it works the same way in Word 2003.

Issue:
1) A user creates a document using a non-Unicode font, entering characters to represent scientific notation. For example, he enters a Mu (µ). Note: I pasted in a Unicode-compliant Mu for reference.
2) The user opens his document and attempts to copy and paste this non-Unicode character representing a Mu into a web browser for entry into our system. It pastes as an unrecognized character. This is expected.
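A possible clean-up step on the receiving side is sketched below. It rests on assumptions the post does not confirm: that the font is symbol-encoded in the style of the Symbol font, and that the pasted text arrives as Private Use code points in the range U+F020 to U+F0FF (0xF000 plus the font's byte value). The mapping table is a hypothetical, partial example and would have to be built for the actual font.

```python
import unicodedata

# Hypothetical, partial table for a Symbol-style encoding (byte -> Unicode).
SYMBOL_TO_UNICODE = {
    0x61: "\u03B1",  # alpha
    0x62: "\u03B2",  # beta
    0x6D: "\u03BC",  # mu
}

def remap_symbol_text(text):
    """Map assumed symbol-font Private Use code points to real Unicode."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xF020 <= cp <= 0xF0FF:
            # Unknown bytes are left untouched rather than guessed.
            out.append(SYMBOL_TO_UNICODE.get(cp - 0xF000, ch))
        else:
            out.append(ch)
    return "".join(out)

fixed = remap_symbol_text("5 \uF06Dm")

# Separately: MICRO SIGN (U+00B5) and GREEK SMALL LETTER MU (U+03BC) are
# distinct code points; NFKC normalization folds the former into the latter.
folded = unicodedata.normalize("NFKC", "\u00B5")
```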

Michel Boyer's picture

Python and codepoints above FFFF

Here is a Python script that dumps a UTF-8 input file to the output. The script works fine on Linux, but if the input contains characters above U+FFFF it does not behave as expected on the Mac, whatever version of Python I use (I tried Python 2.5 and 2.6 on OS X 10.5, and Python 2.5, 2.6 and 3.1 on OS X 10.6).
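One plausible cause, which the post itself does not confirm, is a "narrow" Python build: before Python 3.3, some builds stored strings as UTF-16 code units, so a character above U+FFFF appeared as two surrogates when iterating over a string. The sketch below is not the poster's script; it is a hypothetical helper that yields full code points either way by rejoining surrogate pairs.

```python
def code_points(text):
    """Yield integer code points, joining any UTF-16 surrogate pairs."""
    pending = None  # a high surrogate waiting for its low partner
    for unit in text:
        value = ord(unit)
        if pending is not None and 0xDC00 <= value <= 0xDFFF:
            # Combine high and low surrogate into one supplementary code point.
            yield 0x10000 + ((pending - 0xD800) << 10) + (value - 0xDC00)
            pending = None
            continue
        if pending is not None:
            yield pending  # unpaired high surrogate, passed through as-is
            pending = None
        if 0xD800 <= value <= 0xDBFF:
            pending = value
        else:
            yield value
    if pending is not None:
        yield pending

for cp in code_points("a\U0001D11E"):
    print("U+%04X" % cp)
```

On Python 3.3 and later, iterating over a string already yields whole code points, so the helper simply passes them through.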

Hi all,

I tried asking this question over at the FontLab forum, but there doesn’t seem to be very much activity there, so I’m trying here as well. Apologies in advance if this question has been asked before – at any rate I have not been able to find an answer in the archives.

I am developing a font which includes a large number of glyphs in the Private Use Area. For these I would like to use my own names, primarily because many of them have alternate forms accessible through aalt, stylistic sets etc. Coding would get much easier if I could use semantic names rather than “uniExxx”, especially in case I want to change the Unicode index of a glyph (each time I do that, I have to track down every reference to that glyph in the code and change the name).
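One possible workaround, independent of any FontLab feature, is to keep a single table of semantic names and expand them into "uniXXXX" names just before compiling the feature code. Everything in this sketch is hypothetical: the glyph names, the code points, and the `{name}` placeholder syntax are made up for illustration.

```python
import re

# Hypothetical table: semantic glyph name -> Private Use Area code point.
PUA_GLYPHS = {
    "leaf": 0xE000,
    "leaf.alt": 0xE001,
}

def uni_name(code_point):
    """Build a 'uniXXXX' glyph name from a code point."""
    return "uni%04X" % code_point

def expand_feature_code(source, glyphs):
    """Replace {semantic.name} placeholders with uniXXXX glyph names."""
    def substitute(match):
        return uni_name(glyphs[match.group(1)])
    return re.sub(r"\{([^}]+)\}", substitute, source)

expanded = expand_feature_code("sub {leaf} by {leaf.alt};", PUA_GLYPHS)
```

With this arrangement, changing a glyph's code point means editing one table entry instead of every reference in the feature code.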

Igor Freiberger's picture

Unicode Questions

I was analyzing the Unicode tables and some pro fonts to understand how they work. Even after navigating the huge Unicode documentation, some doubts remain:

1. The Unicode tables do not include variations for small caps, petite caps, swashes, beginnings, endings and alternates. So none of these glyphs will have a Unicode value assigned during font development. Correct?

2. When the font is generated, these glyphs without Unicode values are recorded in the Private Use Area and receive a code point assigned by the font generator program. Correct?

3. Glyphs without a Unicode definition work correctly but are identified as NULL in the InDesign glyph palette. Is there a way to replace this NULL name with a descriptive one?
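Whether a given code point falls in a Private Use Area can be checked against the Unicode general category "Co". The sketch below only tests code points; it says nothing about what a particular font generator actually does with unencoded glyphs. The function name is made up for this example.

```python
import unicodedata

def is_private_use(code_point):
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(chr(code_point)) == "Co"

# The BMP Private Use Area runs from U+E000 to U+F8FF.
for cp in (0x0041, 0xE000, 0xF8FF, 0xF0000):
    print("U+%04X" % cp, is_private_use(cp))
```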

Google now has the reading abilities of a teenager and can read f-ligatures: “[T]he characters fi can... be represented as two characters (f and i) or a special display form ﬁ. A Google search for [financials] or [office] used to not see these as equivalent – to the software they would just look like *nancials and of*ce. There are thousands of characters like this, and they occur in surprisingly many pages on the Web, especially generated PDF documents.”
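The equivalence described in the quote can be reproduced with Unicode compatibility normalization. This is a minimal sketch of the general technique, not a description of how Google implements it; the function name is made up for this example.

```python
import unicodedata

def fold_ligatures(text):
    """Apply NFKC, which replaces ligature characters by their letters."""
    return unicodedata.normalize("NFKC", text)

# U+FB01 is the fi ligature, U+FB03 the ffi ligature.
print(fold_ligatures("\uFB01nancials"))
print(fold_ligatures("o\uFB03ce"))
```

NFKC also folds many other compatibility characters (fullwidth forms, superscripts and so on), so it suits search matching better than text that must be preserved exactly.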

I am about to assign a bunch (approx. 500) of Private Use Unicode code points to the following glyphs, which do not have official Unicode code points and are not yet [to my knowledge] covered by the Adobe Glyph List:

Nut fractions
Annotation Superiors
Superior Punctuation
Small Cap Numbers
Small Caps Punctuation
Tabulated Numbers
Titling Capitals (x140ish)
Titling Lowercase (x140ish)
Titling Numbers
Reversed Encapsulated Sansserif Capitals
and perhaps a few other ornaments (not dingbats) etc.

Q. Are there any established Code-blocks for the listed glyphs above *agreed* between foundries?

I am posting here before approaching anyone individually. My next step is to check the future of the Adobe Glyph List but first I want to gauge the reaction to having so many apparently missing UNICODES.

Time and time again, the use of any character in a post subject line or heading beyond the US-ASCII repertoire causes such characters to be incorrectly escaped. The entire site has to be UTF-8-compliant.

Hello everyone,

I am in need of your suggestions.

A little background is needed: the University where I work and study is using the Angel LMS. The biblical language professors are trying to find a font that could be standardized for both Hebrew and Greek.*

What are our options, considering the following requirements:

  • Unicode
  • Must have support for Greek & Hebrew (must point correctly in Hebrew)
  • License: something that would allow us to upload to our own server and distribute to registered students only** / or system font (cross-platform compliant)

*They're open to using separate fonts for Hebrew / Greek if needed.
[EDIT] ** Through a secure LMS.

I'd appreciate your help.

Dan

Haansoft Standard
Symbols 1
Symbols 2
Latin/Numbers
Korean Jamo
Greek
Box Shapes
Unit Symbols
Enclosed Letters
Parenthesized Numbers
Hiragana
Katakana
Russian
Special Languages
Fractions/Superscripts
Phonetic Alphabet
Extra European
Pinyin, Kanji
Arrows
Parentheses
Mathematical Signs
Parenthesized Kanji
Punctuation
Currencies
Letterlike Signs
Superscripts, Others
Numbers
Extra Signs and Dingbats
Science Symbols
Kanji 1
Kanji 2

Korean Wansung Characters
Extra Symbols
Fullwidth ASCII Characters
Korean
Double Korean
Roman Numerals
Greek
Box Drawing
Unit Symbols
Extra Roman Signs and Others
Kana Letters
Cyrillic Letters
Hangul Syllables
Kanji

Unicode 3.2 Characters
Basic Latin

Indices : Technical Info : Haansoft Unicode Blocks and UniPad Features

This article lists which Unicode blocks are supported by the corresponding font, along with all of the features supported by Sharmahd Computing's UniPad.

Supported Platforms
Microsoft Windows 9x, Windows Me
Microsoft Windows NT 4.0, Windows 200X, Windows XP
WINE HQ, for x86-based Unixes, including Linux, FreeBSD, and Solaris
Windows CE ≥
Microsoft Windows 3.x
Linux / X-Windows
Mac OS X
Palm OS ≥

General Features
Full BiDi support (Hebrew, Arabic, Thaana, Syriac)
Rendering of Arabic contextual forms
Separated rendering of non-spacing marks
Combined rendering of non-spacing marks (!)
Normalization (maximal decomposition)
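To illustrate what decomposition and non-spacing marks mean in practice, here is a small sketch using Python's `unicodedata` module. It assumes that "maximal decomposition" refers to something like Unicode's canonical decomposition (NFD); the list above does not say which normalization form UniPad applies.

```python
import unicodedata

composed = "\u00E9"  # é as a single precomposed character
decomposed = unicodedata.normalize("NFD", composed)

# After NFD the accent is a separate non-spacing mark following the base letter.
for ch in decomposed:
    print("U+%04X" % ord(ch), unicodedata.name(ch), unicodedata.category(ch))

recomposed = unicodedata.normalize("NFC", decomposed)
```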

Indices : Technical Info : Language Coverage

This article lists which languages are supported by the corresponding Unicode blocks. The rule used here is to list a language under every script that either has official status or is still in popular use; e.g., Azerbaijani officially uses the Latin script, but many Azerbaijanis still use Cyrillic extensively.

Latin-1:

Albanian
Danish
Dutch
English
Faroese
Finnish
Flemish
German
Icelandic
Indonesian
Irish
Italian
Malay
Norwegian
Portuguese
Scottish Gaelic
Spanish
Swahili
Swedish
Tagalog
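A quick way to test whether a given piece of text stays within Latin-1 (U+0000 to U+00FF) is to try encoding it. This is a sketch only: it checks characters, not whether a language is fully served in every case, and the function name is made up for this example.

```python
def fits_latin1(text):
    """True if every character of the text is encodable in Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

print(fits_latin1("Þjóð"))     # Icelandic sample
print(fits_latin1("Čeština"))  # Czech sample, needs Latin Extended-A
```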

Latin Extended-A:*

Afrikaans
Basque
Bosnian
Breton
Catalan
Chichewa
Cornish
Croatian
Czech
Esperanto
Estonian
Fijian
French
