Vietnamese text problem with InDesign

Andreas Stötzner's picture

A colleague of mine faces a problem with importing Vietnamese text into InDesign: a couple of characters fail to display correctly. A quick test of applying different Pro fonts to the text reveals that with each of them a different selection and number (!) of characters go missing.
On the other hand, the same text performs perfectly well in e.g. Apple's MAIL or TEXTEDIT.
See this view of InDesign (top) and TextEdit (bottom); text file and font *are the same*:

– Has anyone a clue?

Theunis de Jong's picture

The support for mark-to-base (to position a single accent character on its base letter) and mark-to-mark (to position multiple accents relative to each other) differs from one application to another. Pre-CS4, InDesign did not support it (at all, or correctly); CS4 should finally support it in full. That's from hearsay, by the way.

It's also possible Textedit doesn't need the Opentype definitions and can simulate the compositions just by having the right glyphs in the font.

John Hudson's picture

I don't think this is anything to do with mark positioning GPOS. All the Vietnamese diacritics are precomposed Unicode characters, so GPOS isn't necessary. In any case, the InDesign problem displays as .notdef glyphs, indicating that the program is confused about the underlying characters.
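
To illustrate the difference between a precomposed character and its decomposed equivalent, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `describe` is made up for this example, and the sketch says nothing about what InDesign does internally; it only shows that ắ exists as a single code point that can also be written as a base letter plus two combining marks.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# The letter ắ as one precomposed character (no mark positioning needed).
precomposed = "\u1EAF"

# The same letter split into base letter + combining marks (NFD form).
decomposed = unicodedata.normalize("NFD", precomposed)

# Recomposing (NFC form) gives back the single precomposed character.
recomposed = unicodedata.normalize("NFC", decomposed)

print(describe(precomposed))
print(describe(decomposed))
print(recomposed == precomposed)
```

Only text in the decomposed form would depend on the font's mark positioning to look right.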

Andreas, do you know what method your colleague is using to import the text?

Andreas Stötzner's picture

He says (translated from the German): “Text copied from OpenOffice (the free office application, comparable to Word, current version). The file itself comes from the publisher, where it was edited in Word.”

Bert Vanderveen's picture

What happens if you first import the textfile in TextEdit and then copy/paste into InDesign?

. . .
Bert Vanderveen BNO

Miguel Sousa's picture

Andreas, can you confirm that the font contains the characters that InDesign is showing as notdef? (e.g. ắ ễ)

InDesign doesn't do font fallback whereas TextEdit does. So I'm wondering if the problem doesn't show up in TextEdit because it's using a different font to display the missing characters.
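
A check like this can be sketched in Python. The function `missing_characters` and the toy table `toy_cmap` are hypothetical names for this example; the table stands in for a font's real character map (code point to glyph name), which is not available here. With the third-party fontTools library, such a map could presumably be read with `TTFont(path).getBestCmap()`, but that step is left out so the sketch stays self-contained.

```python
def missing_characters(text, cmap):
    """Return the distinct non-space characters of `text` that have no entry in `cmap`.

    `cmap` is assumed to map integer code points to glyph names.
    """
    missing = []
    for ch in text:
        if ch.isspace():
            continue
        if ord(ch) not in cmap and ch not in missing:
            missing.append(ch)
    return missing

# Hypothetical character map fragment: it covers ắ (U+1EAF) but not ễ (U+1EC5).
toy_cmap = {ord(c): f"uni{ord(c):04X}" for c in "Nguynh\u1EAF"}

print(missing_characters("Nguyễn Ngắn", toy_cmap))
```

An empty result would mean the font covers every character in the text, so any boxes shown would have to come from somewhere else.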

Michel Boyer's picture

Andreas, here is an example where the fallback is obvious, but you may have more fonts than I do and get a fallback that is not so clear. To check if the character that is displayed is in the font you think, you select the character in TextEdit, and then select "Format > Font > Show fonts". The name of the font used to display the character is then displayed at the top of the font window, as "Lucida grande" is displayed in my grab. If no font name is displayed, the selection is a mixture of different fonts.

Michel

Michel Boyer's picture

And here is a convincing way to prove that Charis SIL is used to display all the selected text in TextEdit (select the text and display the font name at the top of the font window):

Theunis de Jong's picture

Auto-glyph replacement is high on my wish-list for InDesign :-) (although I'd appreciate a minimum of control -- see Michel's first picture, mixing two rather unlikely fonts).

Miguel Sousa's picture

I think it would be a bad thing if InDesign started to do font fallback. Font fallback is highly desirable in environments where the content (i.e. the data/information) is more important than its form (i.e. the design/typography). So I'd argue that, InDesign being a professional typesetting tool, typographic fidelity is very important, and therefore things like font fallback and faux styles are undesirable*.

* (Even the faux small caps that InDesign already allows are sort of a heresy for Type geeks)

charles ellertson's picture

Even the faux small caps that InDesign already allows are sort of a heresy for Type geeks

Miguel, you have made my day. I've never been called a geek before!

Theunis de Jong's picture

Ah -- but I wouldn't call it a "fallback"! (in the sense that Word, for example, automatically falls back onto faux bold & italic).

Why not call it a Feature, and have the user specify which other font to use if there is a glyph missing? It's what I do now manually: check a font for missing glyphs, find a replacement font, create character style for it and apply that to the missing characters.

charles ellertson's picture

Why not -- as Adobe does -- allow the comp to modify the font, so needed characters can be made up?

Composition is going to get so automated that typesetters will become buttonpushers. The results won't be quite as good generally, but I suppose it will serve. Of course, if you really need a good comp, there won't be one around anymore.

Theunis de Jong's picture

You are worried the ancient typesetters' art of memorizing the fonts in which to find any of several thousand glyphs will be lost for future generations?

I'd rather enter it somewhere once and then have my computer look it up where necessary. It's really good at that sort of thing.

charles ellertson's picture

No, I'm worried that the new, modern "compositor/punchcutter," who makes up the proper character in the correct font, (or does the equivalent in the typesetting file with combining diacriticals/spacing modifier characters) will be replaced by the pushbutton what-the-heller.

Theunis de Jong's picture

Make it a hidden option, activated by a password which you receive together with your Graphic Designer certificate.

I still think it'd be useful. Altering existing fonts is an option, but

- it takes a lot of time and effort
- you might break existing structures in the font, such as kerning and opentype features
- you end up with a unique font you cannot legally distribute -- thus, you can't send any of the original files you used it in to someone else.

How does the virtual font system I imagine compare to that?

- For any character not in font f, you should be able to specify an alternate glyph g from another font. Somewhere in the InDesign Glyph panel -- browse & click.
- The program (InDesign, preferably) handles the replacement of glyphs; the fonts themselves are not touched.
- It's just a data table somewhere in the document; yes, anyone else using your original documents should have the same fonts, but this is already a (reasonable) requirement.
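
As a rough illustration of the "data table" idea in the list above, here is a minimal Python sketch. Everything in it is hypothetical: the class `SubstitutionTable`, the font names "Font A" and "Font B", and the coverage sets are invented for the example, and nothing here corresponds to an actual InDesign feature or API.

```python
from dataclasses import dataclass, field

@dataclass
class SubstitutionTable:
    """Per-document table: (primary font, character) -> replacement font."""
    entries: dict = field(default_factory=dict)

    def add(self, font, char, replacement_font):
        # The user picks the replacement explicitly; nothing is chosen automatically.
        self.entries[(font, char)] = replacement_font

    def resolve(self, font, char, font_has_char):
        """Return the font to set `char` in, or None if it is still missing.

        `font_has_char` is a callable (font, char) -> bool supplied by the caller.
        """
        if font_has_char(font, char):
            return font
        return self.entries.get((font, char))

# Hypothetical glyph coverage for two made-up fonts.
coverage = {"Font A": set("abc"), "Font B": set("abc\u1EAF")}

def has_char(font, char):
    return char in coverage.get(font, set())

table = SubstitutionTable()
table.add("Font A", "\u1EAF", "Font B")

print(table.resolve("Font A", "a", has_char))       # covered by the primary font
print(table.resolve("Font A", "\u1EAF", has_char))  # replaced via the table
print(table.resolve("Font A", "\u1EC5", has_char))  # no entry: still missing
```

Because a character without an entry resolves to nothing rather than to an arbitrary font, a table like this would keep the choice of replacement in the user's hands, which is the difference from automatic fallback.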

Andreas Stötzner's picture

can you confirm that the font contains the characters that InDesign is showing as notdef? (e.g. ắ ễ)
Yes I can. Regarding the font in question the Vietnamese range is definitely complete. So fallback is surely not the issue.
Michel, in the snapshot by which you show TextEdit’s fallback operation you obviously used one of the Andron Freefonts; none of them supports Vietnamese, however. But Andron Mega (which my colleague is setting a book with) does, as does e.g. Lucida Grande.
So I still see no explanation for that mystery.

Michel Boyer's picture

I used Andron Scriptor Web, and in that font the character ắ is defined and it shows in the following InDesign grab (whilst it does not show in yours)

The other characters are not defined and do not show. All is thus normal.

You have not confirmed that, in spite of the characters being defined in Andron Mega, when you select the text in TextEdit and look at the top of the font window, the font name does not show. That would have helped to isolate the problem. So now I can only speculate. One possibility is that, in the font causing problems, the characters are not named uniXXXX (like uni1EAF in Andron Scriptor Web) but are given a name like abreveacute. On the Macintosh, that may cause the type of problem you seem to be experiencing. Look and let us know.
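
For reference, the uniXXXX naming mentioned above can be derived from a character mechanically. A small Python sketch follows; the function name `uni_glyph_name` is made up for the example, and it only handles characters in the Basic Multilingual Plane, where the name is "uni" followed by four uppercase hex digits.

```python
def uni_glyph_name(ch):
    """Return the uniXXXX-style glyph name for a single BMP character."""
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("uniXXXX names cover only the Basic Multilingual Plane")
    return f"uni{code:04X}"

print(uni_glyph_name("\u1EAF"))  # the character ắ
```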

John Hudson's picture

Andreas, back to my earlier question: do you know what method your colleague is using to import the text?

Also, do you know from what source he is importing it? Is the source Unicode, or an 8-bit Vietnamese encoding?

Have you tried copying one of those .notdef box characters out of InDesign and into e.g. Word? On Windows, you can then use alt+X to reveal the character code, and then you'll know what InDesign thinks it is.

Michel Boyer's picture

Andreas, if you have the InDesign text, you could just copy it from "Diese" to "2006" and paste it in your thread and we could know exactly what the characters it contains are.

Michel

Andreas Stötzner's picture

John, Michel, thanks for your suggestions so far. I’ll go on testing, along the lines you propose. Just allow one or two days …

Andreas Stötzner's picture

Michel asked: You have not confirmed that, in spite of the characters being defined in Andron Mega, when you select the text in TextEdit …
I’ve checked this by moving the cursor through the lines. Every single letter comes from Andron Mega, with no fallback to other fonts. Definitely.

… the characters are not named uniXXXX …
I checked this in FontLab. See the image: the characters marked blue have descriptive names (all others are named uniXXXX), but it is the characters marked red that fail in the InDesign sample. There’s no match which would explain the failure.

Andreas Stötzner's picture

John: Andreas, back to my earlier question: do you know what method your colleague is using to import the text?
– See my earlier statement above.
Also, do you know from what source he is importing it? Is the source Unicode, or an 8-bit Vietnamese encoding?
– I’ll go and ask him.

Andreas Stötzner's picture

… copying one of those .notdef box characters out of InDesign and into e.g. Word?

I copied the passage out of InDesign into TextEdit:

The same happens when the text is pasted into MAIL.

Andreas Stötzner's picture

And here comes the ultimate exercise, copied from TEXTEDIT pasted directly into TYPOPHILE (via FIREFOX):

Diese ist: Nguyễn Huy Thiệp: Tuyển Tập Truyện Ngắn, Verlag Văn Hóa Sài Gòn, Ho-Chi-Minh-Stadt 2006. Eine der Geschichten (Lieder) ist dort nicht enthalten. Wir benutzten als Originaltext: Nguyễn Huy Thiệp: Tác Phẩm Và Dư Luận, Hanoi 1990, S. 211–228.

For comparison, this is how I see the preview of this:
[snapshot]


[snapshot]

Miguel Sousa's picture

This is how I see your text above in Safari (Mac):

And copying that text to make a few text frames in InDesign CS2 (Mac), this is what they look like using (from top to bottom) Myriad Pro, Minion Pro and Arno Pro:

In your TextEdit screenshot above, the Last Resort font symbols seem to be Tibetan.

Michel Boyer's picture

If I save the HTML source of this thread, use vim to remove everything above and below the text pasted above, and run a script that outputs the Unicode characters, I find that the Vietnamese extract contains characters in the range 0020-007A, the character 2013 (EN DASH) and the following characters

00E0;à;LATIN SMALL LETTER A WITH GRAVE
00E1;á;LATIN SMALL LETTER A WITH ACUTE
00F2;ò;LATIN SMALL LETTER O WITH GRAVE
00F3;ó;LATIN SMALL LETTER O WITH ACUTE
0103;ă;LATIN SMALL LETTER A WITH BREVE
01B0;ư;LATIN SMALL LETTER U WITH HORN
1EA9;ẩ;LATIN SMALL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE
1EAD;ậ;LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW
1EAF;ắ;LATIN SMALL LETTER A WITH BREVE AND ACUTE
1EC3;ể;LATIN SMALL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE
1EC5;ễ;LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE
1EC7;ệ;LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW

and nothing else.
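
A listing like the one above can be produced with a few lines of Python. This is only a sketch of one way to do it, not the script actually used here; the function name `list_non_ascii` is made up, and the sample string is a short piece of the pasted passage.

```python
import unicodedata

def list_non_ascii(text):
    """Return sorted 'XXXX;char;NAME' lines, one per distinct non-ASCII character."""
    chars = sorted({ch for ch in text if ord(ch) > 0x7F})
    return [
        f"{ord(ch):04X};{ch};{unicodedata.name(ch, '<unnamed>')}"
        for ch in chars
    ]

sample = "Nguyễn Huy Thiệp"
for line in list_non_ascii(sample):
    print(line)
```

Running the same function on text copied out of InDesign would show which code points InDesign actually holds for the characters it draws as boxes.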

One thing that is quite mysterious is that in line four of the first grab,


the letter

  1EC7;ệ;LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW

comes out correctly in the word Thiệp but not in the word Truyện.

Michel

Michel Boyer's picture

Andreas, can you email me the textedit rtf file that you displayed above?

Michel

Andreas Stötzner's picture

Send me an address via the signogrphie.de contact form.

Meanwhile I did further testing. I copied text from serious Vietnamese websites and pasted it into TEXTEDIT and INDESIGN, applying Andron Mega in both instances – everything looks perfect.
Now I assume that the bug lies in the encoding of the original text (John: probably Unicode-encoded).

Michel Boyer's picture

Send me an address via the signogrphie.de contact form.
Done.

386sky's picture

You downloaded a font named Reader Sans Roman to make it work in Scribus, adding the Vietnamese characters etc. and designing many others.

Attached an image:

The following characters
ể ắ
are blank in Reader Sans Roman, but in Opera it appears as Myriad Pro.

VISCII has the ê and â accented letters in French.

It would be good if FontCreator had a restore-session feature so you could recover your sessions!
