Convert text to glyph names

Bahman Eslami

Hello
I use FontLab to create digital typefaces, and when I want to test my font in the Metrics window I need to type every glyph name, which is not easy in some scripts (like Arabic).
I need a utility to convert my text (it could be in a PDF) to the glyph names of the font used in the text file. I did a thorough search and couldn't find anything useful. Is there anything out there that could do this? Or maybe something that converts text to Unicode values, so I can convert those to glyph names?

Any suggestions are appreciated.
Thanks.

cerulean

You should be able to paste the text directly into the Metrics window in text mode. The text field at the bottom of the window will then show it with all the characters above Basic Latin converted into glyph names or Unicode numbers depending on what mode your Font window was in when you opened the Metrics window.

Bahman Eslami

Hi Kevin,
Thanks for the reply.
It does work for Latin text, but when I paste Arabic text it is replaced with question marks, no matter which mode my Font window is in. I use FontLab 5.0.4 on Mac OS X; on Windows it pastes mangled text and no glyphs show up. The Arabic situation is more complicated than Latin, because when I copy the text, the OS copies only the base characters, without the initial or medial forms (those are rendered using OpenType features). So I think the only way for me is to convert the text to Unicode values or glyph names. Or maybe there is something I'm missing? Did any glyphs show up when you pasted Arabic text?

John Hudson

I needed to convert Cyrillic text to glyph names once, and ended up creating a Word macro to run through text and convert Unicode characters to glyph name strings.

Bahman Eslami

Hi John,
How did you get the Unicode values in the first place? Does your macro read the Unicode value of each character? Could I have the code?
Thanks

John Hudson

I didn't need the Unicode values, since I was mapping to human-friendly glyph names, not uniXXXX format. So I just had a list of characters and my own corresponding glyph names.

You could use Word's ToggleCharacterCode function, which converts a character to its Unicode hex value or vice versa (manually, you can do this with Alt+X after a character or code). So what you want a macro to do is to step through the text one character at a time and convert each character to its Unicode value preceded by 'uni'. I'm not a good enough programmer to figure out how to do this, but someone here probably is. Note also that you'll want to control the length of the Unicode hex, if that is possible, since Word by default does not include leading zeros; for instance, converting 'A' to Unicode hex produces '41', not '0041'.
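The per-character step described above can be sketched in Python (the language used elsewhere in this thread) instead of as a Word macro. This is a minimal sketch under stated assumptions: `uni_name` and `text_to_uni_names` are hypothetical helper names, the naming follows the Adobe Glyph List convention (`uni` plus exactly four uppercase, zero-padded hex digits for characters up to U+FFFF, `u` plus five or six hex digits above that), and dropping line breaks is a design choice, not part of the original description.

```python
def uni_name(char):
    """Return an AGL-style glyph name such as 'uni0041' for one character."""
    cp = ord(char)
    if cp <= 0xFFFF:
        # Basic Multilingual Plane: 'uni' + four zero-padded uppercase hex digits
        return "uni%04X" % cp
    # Supplementary planes: 'u' + five or six uppercase hex digits
    return "u%X" % cp


def text_to_uni_names(text):
    """Convert each character of text to a uniXXXX name, skipping line breaks."""
    return [uni_name(ch) for ch in text if ch not in "\r\n"]


print(text_to_uni_names("A\u0628\u0633\u0645"))
# → ['uni0041', 'uni0628', 'uni0633', 'uni0645']
```

The `%04X` format handles the padding problem directly: 'A' becomes `uni0041`, not `uni41`.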

Michel Boyer

If you save the following lines

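import sys

# Read the UTF-8 text file named on the command line
# ("utf-8-sig" also drops a leading byte-order mark if there is one)
with open(sys.argv[1], encoding="utf-8-sig") as infile:
    text = infile.read()

# Print one uniXXXX name per character (four uppercase, zero-padded hex digits), skipping line breaks
print(" ".join("uni%04X" % ord(char) for char in text if char not in "\r\n"))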

to a file named dmptxt.py, then

python dmptxt.py file.txt

will read the UTF-8 encoded file file.txt and print the corresponding list of uniXXXX names.

John Hudson

Brilliant. Thanks, Michel.

Bahman Eslami

Awesome, thank you Michel & John.

vanblokland

With a font open in Python, make a dict of unicode: glyphname pairs, in effect a cmap. Then iterate through the string, look up the name for each character, and add it to a new list?
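A minimal sketch of this dictionary approach, assuming the cmap has already been pulled out of the font into a plain dict of code point to glyph name. The three entries below are hypothetical placeholders, not taken from any real font, and `glyph_names` is an illustrative helper name. Characters the font does not map fall back to an AGL-style `uniXXXX` name so the gap stays visible in the output.

```python
def glyph_names(text, cmap):
    """Look up each character of text in cmap (code point -> glyph name)."""
    names = []
    for char in text:
        if char in "\r\n":
            continue  # line breaks have no glyph to test
        cp = ord(char)
        # Fallback for unmapped characters: uniXXXX (BMP) or uXXXXX (above U+FFFF)
        fallback = "uni%04X" % cp if cp <= 0xFFFF else "u%X" % cp
        names.append(cmap.get(cp, fallback))
    return names


# Hypothetical cmap: these glyph names are placeholders for illustration only
cmap = {0x0628: "beh", 0x0633: "seen", 0x0645: "meem"}
print(" ".join(glyph_names("\u0628\u0633\u0645\u0644", cmap)))
# → beh seen meem uni0644
```

A cmap only maps characters to their default glyphs. For Arabic, the initial, medial, and final forms are reached through OpenType features, so they will not appear in this output and have to be added by hand.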

Michel Boyer

Python dictionaries are indeed fantastic for this kind of processing. I have a small Python script, using the FontForge Python module, that takes a font as input and writes out a unicode:glyphname dictionary. That dictionary can then be imported so that the script above outputs glyph names instead of uniXXXX (by printing charname[ord(char)] for each character).

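import sys
import fontforge

# Open the font file named on the command line
fnt = fontforge.open(sys.argv[1])

# Print Python source for a dict mapping code points to glyph names.
# Unencoded glyphs (unicode == -1), control characters, and space (< 0x21) are skipped.
print("charname = {")
for g in fnt.glyphs():
    if g.unicode >= 0x21:
        print("  0x%04X: %r," % (g.unicode, g.glyphname))
print("}")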
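If FontForge is not at hand, a similar `charname` dict can be produced with the fontTools library instead; this swaps in a different library and is not the method described above. A sketch, assuming fontTools is installed and the input is a TrueType/OpenType font (fontTools does not read FontLab .vfb files). `TTFont(path).getBestCmap()` returns a dict of code point to glyph name (or None if the font has no Unicode cmap), and the hypothetical helper `cmap_source` writes it out in the same `charname = {...}` form.

```python
import sys


def cmap_source(cmap):
    """Return Python source defining charname = {code point: glyph name}."""
    lines = ["charname = {"]
    for cp in sorted(cmap):
        if cp >= 0x21:  # same cut-off as the FontForge script: skip controls and space
            lines.append("  0x%04X: %r," % (cp, cmap[cp]))
    lines.append("}")
    return "\n".join(lines)


# Only touch the font (and import fontTools) when a font path is given
if __name__ == "__main__" and len(sys.argv) > 1:
    from fontTools.ttLib import TTFont

    font = TTFont(sys.argv[1])
    print(cmap_source(font.getBestCmap() or {}))
```

Redirecting the output to a file such as charname.py (a hypothetical name) lets another script import `charname` and look up `charname[ord(char)]`, as described above.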