Using Unicode values in dlig - Opentype

Matt Grey's picture

Hi, I'm having trouble trying to create a ligature using Unicode character codes, up to this point I haven't tried to use a Unicode value so I have no Idea if I am going about it the right way.

The ligature in question is an asterisk and another asterisk being replaced by a black star (** = ★)

I've tried the full word, the utf-8 value for it and Unicode U+ but nothing seems to work. I've had a look around and cant seem to find what is wrong.

The problems seem to lie with the '+', (Error came up with = 'invalid token (text was "+")'). But that's the only clue I have.

languagesystem latn dflt;

feature dlig { # Discretionary Ligatures
# Latin
sub t o by t_o;
sub U+002A U+002A by U+2605; #this is the problem
sub parenleft parenright by underscore;
} dlig;

Cheers for any help


charles ellertson's picture

Note: I'm assuming FontLab...

The glyph name should be uni####, so it should read

sub uni002A uni002A by uni2605;

These should also be the glyph names, if you're using that method instead of the AGL. If you use a different name, e.g. asterisk, you must use that, not the unicode number.

sub asterisk asterisk by uni2605;

Assuming you've named BLACK SUN as uni2605

Michel Boyer's picture

Assuming you've named BLACK SUN as uni2605

When I look at the names FontForge is expecting for U+2605, I see blackstar, a35, DD130, StarDingbat1, uni2605 or u2605; there is no black SUN around*. And the Fontlab file standard.nam (mine from some demo is dated 2009) contains

0x002A asterisk
0x2605 !a35
0x2605 !blackstar

which, I guess, means Fontlab expects you to use asterisk for U+002A and does not like a35 nor blackstar for U+2605 so that I would guess

sub asterisk asterisk by uni2605;

to be the most likely working substitution.

I see that U+2605 is in very few fonts. I see it in the DejaVuSans fonts (named uni2605) but it is in none of the DejaVuSerif fonts.

* [Added] Well, there is


in the unicode file NamesList.txt, i.e. at U+2600, ☀.

John Hudson's picture

This is a bad idea. The black star character ★ has it's own Unicode codepoint, and should be encoded as that, not as **. The two-character string ** does not mean the same thing as ★. This is why we have a character/glyph distinction, so that semantics can be encoded at the plain text level and not left to the vicissitudes of glyph processing.

Matt Grey's picture

The reason I am doing this is for a dingbat font, enabling users to shortcut to the black star without having to go through alt operations. The font compiled with no hitches which is great, worked perfectly, thanks Michel.

sub asterisk asterisk by uni2605;

John Hudson's picture

enabling users to shortcut to the black star

But they're not really shortcutting to the black star, if what they want in their text string is the black star character, not just the black star glyph in their display with this one particular font. With your method, what they get in the text string is **, and that's what they'll get if they copy and paste the text, change the font, etc.. I dislike this sort of thing because your resulting display is misleading: the user thinks there is a black star in the text, but in fact there isn't. Also, what if someone chooses your font to display some text in which there is a double-asterisk sequence, e.g. as part of a footnote system? It will display incorrectly as a black star.

As a general rule, don't try to solve character input issues at the glyph processing level.

Matt Grey's picture

It being a purely symbol dingbat font designed for a certain use, one symbol is never used next to each other and so it will be no problem at all. I do get your point however, any other time I would back straight away from doing anything like that.

Michel Boyer's picture

John, I just added the ligature to DejaVuSans in dlig, typed ** in TextEdit using the modified font, got the black star, generated the pdf, opened the pdf with acrobat reader and found the black star when searching for it in the search window; I also copied the black star from that pdf in the reader and pasted it right here to get ★. It is certainly true at the level of the input file that there are two characters but the pdf that is generated appears to know nothing of those two asterisks.

charles ellertson's picture

So, you're saying us typesetters are too stupid -- or lazy -- to enter a Unicode number when we want a particular character?

Seems you young English designers have lost respect for comps. Back when it was know as the London College of Printing, during Tom Eckersley's tenure as head of graphic design, we got proper respect.

However, since I'm just a tradesman, I can be blunt. John's right, you're wrong. And with all due deference to Mr. Boyer, I've lived through too many systems & later changes where improper encodings came back to cause trouble.

Michel Boyer's picture

Charles, I assumed the font was not to be for sale, just for internal use. Maybe I was naive.

My statements about the difference between input and output remain valid and I have still not entered your WISIWIG world.

hrant's picture

WYSIWYG is nobody's world - it's a damaging illusion.


Michel Boyer's picture

To be more precise, here is the problem. Lots of people type their texts in word and send the .doc or .docx document as attachment in emails, assuming that the person that receives the file will see exactly what they see on their own screen. That's almost never the case, but there is no need to worsen the situation.

If you send a .doc file that uses a font that replaces ** by ★, and if the person that receives the file does not have that font installed on his computer (or a font that does the same thing, which is quite unlikely), then that person will see ** where you see ★.

I send .doc attachments only when I am required to do so, else I send a pdf.

charles ellertson's picture

Micheal, for years we set type using TeX running in a DOS box. This is pre-Unicode, but I built what we called super fonts that are similar to today's OpenType -- small caps, proportional & tabular figures, lining & oldstyle, etc, etc., all in one font. Since we were running TeX in DOS, we'd just change whatever characters we wanted to include by writing an encoding vector, leaving the first 127 (named) characters as they are in ASCII. Everything remained properly named

So, much of my experience was with author -- or publisher -- supplied files where a character could be font dependent -- e.g., where someone in the world that relied on hard-wired Mac or Windows encodings had gone in & made an 0macron, but put it in the odieresis slot. What a mess. Even more fun with Greek and all the different encodings used...

Nor is it always clear today just which file someone will go back to. We have jobs from when we typeset using TeX -- always ASCII files, of course -- and were making PDFs for the printer to use. Now, when the publisher wants to resurrect the text, press #1 will want to use the ASCII files, but press #2 will want to use the pdfs. And this with our stuff, where character names were in fact correct. Imagine the situation where a Mac/Windows encoding (I think what you're calling WYSIWYG) was used.

Even today, with Unicode, I refuse to give the double-f ligatures a Unicode codepoint. Yes, we may get a "missing character" with old or supplied files, but now is as good a time as any to fix that. Legacy characters should be banished while there is still someone around conversant with how we got that way. To introduce new errors should be a crime/misdemeanor.


I agree that using the pdf format for archiving typeset material is *probably* the smartest thing to do. Treat it as standing type. We can go back to books set using TeX with PostScript fonts where the printer file was pfd, then make what are essentially patch corrections using InDesign with OpenType, and "paste" them into that pdf file. If you don't do this, the only other sane way would be to reset the whole job. Again, if you don't, the 3rd revised edition somewhere down the road could become really problematic. As I said, just treat the pdf as standing type -- but not every publisher sees things this way, and of course, as the customer...

Michel Boyer's picture

WISIWIG stands for What You See Is What You Get and applies to Word, TextEdit, Pages, InDesign and those editors that show you on the screen what you get on the printout. If you send an InDesign file using a font that the receiver does not have installed, I think he will see squares (undefined characters) for the missing font; it is then clear there is a problem, and it is probably clear what measure to take. In non professional applications, a fallback font is used instead; the same holds for Powerpoint. I have seen nice Powerpoint presentations that look really bad when you don't have the fonts used by the designer to view them.

Added: And the receiver is most of the time unaware that what he sees does not look like intended.

John Hudson's picture


...generated the pdf, opened the pdf with acrobat reader and found the black star when searching for it in the search window; I also copied the black star from that pdf in the reader and pasted it right here to get ★.

Presumably because you generated the PDF in a way that does not write the original text strings to the PDF, and hence Acrobat has used the glyph name of the black star in the glyph string come up with an encoding. That's kind of a round-about way of text input. :)

Michel Boyer's picture

Presumably because you generated the PDF in a way that does not write the original text strings to the PDF

Indeed. What the application is expected to do, I just don't know.

I am now on another machine and I did a fast check, putting this time the ligature in liga to get right away the blackstar when typing ** without having to find what to do to activate dlig; I used the font in XeLaTeX (from the 2012 distribution) and InDesign CS3 and then opened the pdf with FontForge

I see that the name given by InDesign to the ligature in the pdf is asterisk_asterisk (even if the ligature in the font is sub asterisk asterisk by uni2605;).
The name given to the same ligature, same font, in the pdf generated by XeLaTeX is uni2605

That implies that if I copy-paste what displays like the black star in the pdf generated by InDesign, I get this: **; if I copy-paste the same looking thing appearing in the pdf generated by XeLaTeX, I get this: ★. Of course, the search will behave accordingly.

When I have the time, I'll check with MacTeX 2013 because the pdf generated by XeLaTeX or luaLaTeX in MacTeX 2012 does not allow searches on stylistic variants. The pdf that is generated is just wrong.

PS. The font I modified is DejaVuLGCSans.ttf.

Michel Boyer's picture

The pdf generated by Word also calls the glyph uni2605; so far, only InDesign generates a pdf containing the glyphs of the blackstar named asterisk_asterisk.

Syndicate content Syndicate content