Vietnamese Glyphs

ocsenttdd's picture

Hi all,
I'm a newbie, so maybe this question will seem quite silly to you.

I'm designing a new font for personal use. I've just finished the basic Latin characters and am now moving on to Vietnamese characters (or something like Western Europe, Central Europe...). How can I get access to all of them? The software only shows the basic letters found on the keyboard. (I'm using Fontlab.)
I had the idea of looking up the Unicode code point of each one (on Wikipedia), but that takes a lot of time.

Thanks for your answer.

Duong.

Si_Daniels's picture

You could probably start by encoding these characters... http://en.wikipedia.org/wiki/Windows-1258

Cheers, Si

ocsenttdd's picture

@Si_Daniels:
Thanks for your reply.
Can I display all the characters in this list in Fontlab Studio? That would be faster than finding and editing each character one by one.

Michel Boyer's picture

You can get the pane for that codepage by clicking "page mode" at the bottom and then choose as follows:

If you want to add the Vietnamese characters in the Latin extensions, you choose another pane. You select "Ranges mode" and then "1E00 Latin Extended Additional". The characters that concern you are from 1EA0 to 1EF1.

The pictures are scaled. If you find them too small, open them in a new window or new tab.

.00's picture

I would suggest you create the combined accents as individual combining glyphs. You'll have to scale and redraw the different elements of, say, a circumflexacute so that it will be easier to place and hint. I recommend building separate accent combos for the lowercase and uppercase (and small caps).

Once all of the glyphs are built as components, just copy and paste from one font to the next.

Also, this site should be on your short list:

http://www.unicode.org/charts/

Michel Boyer's picture

Those charts are indeed great for getting an idea of what the characters are expected to look like. For the other information the charts contain, I find the file NamesList.txt much more useful. For instance, to get the relevant characters for Vietnamese, just search for "Vietnamese" in NamesList.txt.

Better still, with just basic knowledge of Python, the unicodedata module (see the reference at http://docs.python.org/2/library/unicodedata.html) gives you fast access to what I find to be the relevant information in http://www.unicode.org/Public/UNIDATA/

I don't know how much that can be useful for font design but here is for instance a script that finds the NFD canonical decomposition of characters in a range specified by two hex numbers and prints all the component characters; no need to search the files on the unicode site:

---- file decomp ---- cut here
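#!/usr/bin/env python
# Python 2 script: lists every character that occurs in the NFD
# decompositions of the code points in a hexadecimal range.

import sys
import unicodedata as ud

# Both the start and the end of the range are required.
if len(sys.argv) < 3:
  print """Usage: %s starthex endhex
        Example: %s 1EA0 1EF1 """ %(sys.argv[0],sys.argv[0])
  sys.exit(1)

start=int(sys.argv[1],16)
end=int(sys.argv[2],16)

def uhexandname(h):
  # ud.name raises ValueError for code points that have no name.
  try:
    nam=ud.name(unichr(h))
  except ValueError:
    nam=''
  return "u%04X  %s" % (h, nam)

# Collect the code points of all components in the range (end included).
schars=set()
for h in range(start,end+1):
  schars |= {ord(c) for c in ud.normalize('NFD',unichr(h))}

for c in sorted(schars):
  print uhexandname(c)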
---- cut here ---

Here is a trace of execution, showing all components of the characters in the range 1EA0 -- 1EF1 (inclusive):

611 % decomp 1EA0 1EF1
u0041  LATIN CAPITAL LETTER A
u0045  LATIN CAPITAL LETTER E
u0049  LATIN CAPITAL LETTER I
u004F  LATIN CAPITAL LETTER O
u0055  LATIN CAPITAL LETTER U
u0061  LATIN SMALL LETTER A
u0065  LATIN SMALL LETTER E
u0069  LATIN SMALL LETTER I
u006F  LATIN SMALL LETTER O
u0075  LATIN SMALL LETTER U
u0300  COMBINING GRAVE ACCENT
u0301  COMBINING ACUTE ACCENT
u0302  COMBINING CIRCUMFLEX ACCENT
u0303  COMBINING TILDE
u0306  COMBINING BREVE
u0309  COMBINING HOOK ABOVE
u031B  COMBINING HORN
u0323  COMBINING DOT BELOW

That was tested with python2.7.2.
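
For readers on Python 3, the same idea can be sketched as follows. This is an adaptation rather than the script above: the function names `components` and `describe` are made up for this sketch, and the result depends on the Unicode data bundled with the interpreter.

```python
import unicodedata


def components(start, end):
    """Return the sorted code points found in the NFD forms of start..end (inclusive)."""
    found = set()
    for cp in range(start, end + 1):
        found.update(ord(c) for c in unicodedata.normalize("NFD", chr(cp)))
    return sorted(found)


def describe(cp):
    """Format a code point like the listing above; unnamed code points get an empty name."""
    return "u%04X  %s" % (cp, unicodedata.name(chr(cp), ""))


if __name__ == "__main__":
    # Same range as in the trace above.
    for cp in components(0x1EA0, 0x1EF1):
        print(describe(cp))
```

Returning a list instead of printing directly makes it easy to reuse the result, for instance to compare two ranges.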

lunde's picture

I am pretty sure that for Vietnamese you also need glyphs for 1EF2 through 1EF9.

John Hudson's picture

Yes, that's correct.
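
As a quick way to see what the extra range 1EF2 through 1EF9 contains, here is a small Python 3 sketch using the same unicodedata module. The helper name `rows` is an assumption of this sketch, not something from the thread.

```python
import unicodedata


def rows(start, end):
    """Return (code point, name, NFD code points) for each character in start..end."""
    result = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        parts = [ord(c) for c in unicodedata.normalize("NFD", ch)]
        result.append((cp, unicodedata.name(ch, ""), parts))
    return result


if __name__ == "__main__":
    for cp, name, parts in rows(0x1EF2, 0x1EF9):
        print("%04X  %-45s %s" % (cp, name, " ".join("%04X" % p for p in parts)))
```

All eight characters in this range decompose to Y or y followed by a combining mark, so they add the base letters Y and y to the list of components printed earlier.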

Michel Boyer's picture

They are indeed in the table on the wiki page http://en.wikipedia.org/wiki/Vietnamese_alphabet#Tone_marks, which shows that searching for "Vietnamese" in NamesList.txt is not enough. I presume that, with such a complete list, you would make a .enc file for Fontlab so as to see all the glyphs you need in one shot (plus at least the variants you need for the composed diacritics). Just curious; I don't use Fontlab.

If I do that with FontForge for Source Sans Pro, here is a possible view of the capitals (the script generating the encoding grouped the "base glyphs" together, "base glyph" being here the first character of the canonical decomposition).


(I never had so much trouble inserting an image...)
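
The grouping by "base glyph" described above could be sketched in Python 3 like this. It is not the script that generated the encoding (that script is not shown in this thread); `group_by_base` is a hypothetical name, and only the 1EA0-1EF9 range is used as input.

```python
import unicodedata
from collections import defaultdict


def group_by_base(codepoints):
    """Map each base character to the characters whose NFD form starts with it."""
    groups = defaultdict(list)
    for cp in codepoints:
        ch = chr(cp)
        base = unicodedata.normalize("NFD", ch)[0]  # first character of the decomposition
        groups[base].append(ch)
    return dict(groups)


if __name__ == "__main__":
    groups = group_by_base(range(0x1EA0, 0x1EF9 + 1))
    for base in sorted(groups):
        print(base, " ".join(groups[base]))
```

Sorting the keys puts the capitals first, which gives one row per base letter, similar in spirit to a view grouped by base glyph.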

When I use the same encoding with FontLab (with the glyph names taken from Source Sans Pro), some characters appear to be missing and others are not placed where expected (Abreve is placed somewhere else and the corresponding uni character appears to be missing, for instance).

Michel Boyer's picture

The following letters appear in the Wikipedia table. They do not figure in the Fontlab win_1258.enc file (at least those hex values do not appear in the comments).

00C3     LATIN CAPITAL LETTER A WITH TILDE
00CC     LATIN CAPITAL LETTER I WITH GRAVE
00D2     LATIN CAPITAL LETTER O WITH GRAVE
00D5     LATIN CAPITAL LETTER O WITH TILDE
00DD     LATIN CAPITAL LETTER Y WITH ACUTE
00E3     LATIN SMALL LETTER A WITH TILDE
00EC     LATIN SMALL LETTER I WITH GRAVE
00F2     LATIN SMALL LETTER O WITH GRAVE
00F5     LATIN SMALL LETTER O WITH TILDE
00FD     LATIN SMALL LETTER Y WITH ACUTE
0128     LATIN CAPITAL LETTER I WITH TILDE
0129     LATIN SMALL LETTER I WITH TILDE
0168     LATIN CAPITAL LETTER U WITH TILDE
0169     LATIN SMALL LETTER U WITH TILDE

Are they also required? Is there no clear and reliable list?
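
One way to double-check the list above without opening the .enc file is to ask Python 3 for the repertoire of its own cp1258 codec. This is only a sketch: it reflects Python's mapping table for Windows-1258, which is assumed (not verified here) to agree with Fontlab's win_1258.enc, and `missing_from_cp1258` is a made-up helper name.

```python
# Decode every byte value; bytes that Python's cp1258 table leaves
# undefined are skipped thanks to errors="ignore".
CP1258 = set(bytes(range(256)).decode("cp1258", errors="ignore"))

# The fourteen code points listed above.
CANDIDATES = [
    0x00C3, 0x00CC, 0x00D2, 0x00D5, 0x00DD,
    0x00E3, 0x00EC, 0x00F2, 0x00F5, 0x00FD,
    0x0128, 0x0129, 0x0168, 0x0169,
]


def missing_from_cp1258(codepoints):
    """Return the code points that have no byte in Python's cp1258 codec."""
    return [cp for cp in codepoints if chr(cp) not in CP1258]


if __name__ == "__main__":
    for cp in missing_from_cp1258(CANDIDATES):
        print("%04X" % cp)
```

The same set also shows that the code page contains the combining marks 0300, 0301, 0303, 0309 and 0323, so text in that code page can still spell these letters as a base letter followed by a mark.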

ocsenttdd's picture

Thank you for your great response, Michel. This is exactly what I was looking for. At first, I also chose MS Windows 1258 Vietnamese for those characters. But I was a little confused because there were some missing words that I couldn't see. (I'm Vietnamese :-)).
By the way, maybe I will expand my font to cover the characters of Western Europe (1252) or Central Europe (1250). So, could you also tell me how to access these full character sets? Or just choose MS Windows 1250/1252 is enough because it contains full words. I thought it was not enough because the link below seems to show more characters than the words list in Fontlab.
http://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing)

Albert Jan Pool's picture

Or just choose MS Windows 1250/1252 is enough because it contains full words.

could it be that you are confusing ‘words’ with ‘names’?

Michel Boyer's picture

I downloaded the small Hunspell Vietnamese spellchecker and looked at the characters used. Aside from the 1EA0 to 1EF9 range and the standard unaccented Latin letters, it uses the following small letters:

00E0 00E1 00E2 00E3 00E8 00E9 00EA 00EC 00ED 
00F2 00F3 00F4 00F5 00F9 00FA 00FD 0103 0111 
0129 0169 01A1 01B0 

The corresponding capitals should also be needed:

00C0 00C1 00C2 00C3 00C8 00C9 00CA 00CC 00CD 
00D2 00D3 00D4 00D5 00D9 00DA 00DD 0102 0110 
0128 0168 01A0 01AF 

That implies that the small letters in my list from the post http://typophile.com/node/105171#comment-562024 (thus neither in Windows 1258 nor in the 1EA0-1EF9 range) all appear in that small dictionary of only 6631 entries.
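
For anyone who wants to repeat that kind of character count, here is a Python 3 sketch. It is not how the figures above were obtained (that method is not described in this thread). It assumes a UTF-8 Hunspell .dic file whose first line is the entry count and whose entries may carry "/flags" suffixes; `char_inventory` and `load_entries` are hypothetical names, and the file path is left to the reader.

```python
import unicodedata


def char_inventory(entries):
    """Return the sorted non-ASCII code points used by the given dictionary entries."""
    used = set()
    for entry in entries:
        word = entry.split("/", 1)[0].strip()      # drop Hunspell affix flags, if any
        word = unicodedata.normalize("NFC", word)  # count precomposed characters
        used.update(ord(c) for c in word if ord(c) > 0x7F)
    return sorted(used)


def load_entries(path):
    """Read a Hunspell .dic file, skipping its first line (assumed to be the entry count)."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()[1:]


if __name__ == "__main__":
    # Two sample words instead of a real dictionary file.
    sample = ["ti\u1ebfng", "Vi\u1ec7t"]
    print(" ".join("%04X" % cp for cp in char_inventory(sample)))
```

Normalizing to NFC first means a dictionary stored with combining marks would still be counted in terms of precomposed characters.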

lunde's picture

Besides ASCII and friends (aka, ISO 8859-1 or U+00[A-F][0-9A-F]), your listing above covers all of the characters that require glyphs for full Vietnamese support.

Michel Boyer's picture

Or just choose MS Windows 1250/1252 is enough

Put together, they are still missing the characters

  0128  LATIN CAPITAL LETTER I WITH TILDE
  0129  LATIN SMALL LETTER I WITH TILDE
  0168  LATIN CAPITAL LETTER U WITH TILDE
  0169  LATIN SMALL LETTER U WITH TILDE
  01A0  LATIN CAPITAL LETTER O WITH HORN
  01A1  LATIN SMALL LETTER O WITH HORN
  01AF  LATIN CAPITAL LETTER U WITH HORN
  01B0  LATIN SMALL LETTER U WITH HORN

on top of all those in the 1EA0-1EF9 range.
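
That claim can be checked with Python 3's cp1250 and cp1252 codecs. The sketch below assumes those codec tables match the Windows code pages in question; `encodable` and `uncovered` are names invented here, and the two lists are the small letters and capitals quoted earlier in the thread.

```python
SMALL = [
    0x00E0, 0x00E1, 0x00E2, 0x00E3, 0x00E8, 0x00E9, 0x00EA, 0x00EC, 0x00ED,
    0x00F2, 0x00F3, 0x00F4, 0x00F5, 0x00F9, 0x00FA, 0x00FD, 0x0103, 0x0111,
    0x0129, 0x0169, 0x01A1, 0x01B0,
]
CAPITAL = [
    0x00C0, 0x00C1, 0x00C2, 0x00C3, 0x00C8, 0x00C9, 0x00CA, 0x00CC, 0x00CD,
    0x00D2, 0x00D3, 0x00D4, 0x00D5, 0x00D9, 0x00DA, 0x00DD, 0x0102, 0x0110,
    0x0128, 0x0168, 0x01A0, 0x01AF,
]


def encodable(ch, codec):
    """True if the character has a byte in the given single-byte codec."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def uncovered(codepoints, codecs=("cp1250", "cp1252")):
    """Return the code points that none of the given codecs can encode."""
    return [cp for cp in codepoints
            if not any(encodable(chr(cp), codec) for codec in codecs)]


if __name__ == "__main__":
    for cp in uncovered(sorted(SMALL + CAPITAL)):
        print("%04X" % cp)
    # Nothing in the 1EA0-1EF9 range is covered either.
    print(len(uncovered(range(0x1EA0, 0x1EF9 + 1))), "uncovered in 1EA0-1EF9")
```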

Michel Boyer's picture

I just ran the following experiment: I typed ằẳẵắặ with the Vietnamese Keyboard http://gate2home.com/Vietnamese-Keyboard, copied the characters in the little box (I was using Chrome on OS X 10.8) and pasted them in vi (and TextEdit); the sequence of characters pasted was

   0103 LATIN SMALL LETTER A WITH BREVE
   0300 COMBINING GRAVE ACCENT
   0103 LATIN SMALL LETTER A WITH BREVE
   0309 COMBINING HOOK ABOVE
   0103 LATIN SMALL LETTER A WITH BREVE
   0303 COMBINING TILDE
   0103 LATIN SMALL LETTER A WITH BREVE
   0301 COMBINING ACUTE ACCENT
   0103 LATIN SMALL LETTER A WITH BREVE
   0323 COMBINING DOT BELOW

(from a dump of the UTF-8 text file). Now, if I copy those characters with option C in vi and paste them with option V (either in vi or TextEdit), the letters that are pasted are

   1EB1 LATIN SMALL LETTER A WITH BREVE AND GRAVE
   1EB3 LATIN SMALL LETTER A WITH BREVE AND HOOK ABOVE
   1EB5 LATIN SMALL LETTER A WITH BREVE AND TILDE
   1EAF LATIN SMALL LETTER A WITH BREVE AND ACUTE
   1EB7 LATIN SMALL LETTER A WITH BREVE AND DOT BELOW

During the copy-paste the string is recoded. That is a behaviour I did not expect. Is that something that is documented and, if so, where?
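
The recoding observed here gives the same result as Unicode normalization to NFC. Whether the applications involved actually apply NFC is not established in this thread; the Python 3 sketch below only shows that NFC maps the first sequences to the second list. The name `recode` is made up for the example.

```python
import unicodedata

# The five sequences from the first dump: a-breve followed by a combining mark.
PASTED = [
    "\u0103\u0300",  # breve + grave
    "\u0103\u0309",  # breve + hook above
    "\u0103\u0303",  # breve + tilde
    "\u0103\u0301",  # breve + acute
    "\u0103\u0323",  # breve + dot below
]


def recode(text):
    """Compose base letters and combining marks into precomposed characters (NFC)."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    for seq in PASTED:
        composed = recode(seq)
        print(" ".join("%04X" % ord(c) for c in seq), "->",
              " ".join("%04X %s" % (ord(c), unicodedata.name(c)) for c in composed))
```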

(In fact, this text was written in TextEdit and pasted with Chrome into the typophile edit window, and the recoding also appears to have occurred on the first line, but this time it may come from some Unicode normalization rule for data interchange. Nevertheless, with the link /files/clavierviet.html, no recoding seems to occur)

(Added: If I view /files/clavierviet.html with Safari, copy the string and paste it, the combining diacritics are kept as with Chrome. If I do the same with Firefox, the recoding occurs, independently of the font used for viewing, even with a font with no ccmp table. Note that I am now on OS X 10.6.8 with Firefox 21.0)

Michel Boyer's picture

Maybe I should add, to clarify, that the Vietnamese keyboard (at least on the Mac) does not behave like the keyboard on the site I referred to above, http://gate2home.com/Vietnamese-Keyboard; indeed the orange keys behave like "dead keys" and after the accent is typed, a unique precomposed character is input in the text.


(open image in new window or new tab to see actual size).
