PDF font extraction

Nick Cooke

I am curious to know whether it is possible to 'extract' fonts from pdf documents. I can't remember where I heard it, but I seem to remember it may be. Sorry, I can't be any vaguer than that!

clauses

I have heard that it is possible to partly reconstruct a font from a PDF. It involves copy-pasting from the PDF to a text editor. The catch is knowing exactly what to copy from the PDF.

Whether this holds true, I don't know. In theory it sounds possible.

malbright

In theory, it is possible. In practice, it's like stealing. At least it seems like that to me.

Because it is entirely possible to pull fonts from pdfs (granted, without kerning pairs and other important features) it is causing more and more foundries to forbid us designers from embedding fonts when we send our files off to printers. One foundry, Letterhead Fonts, has even built in copy protection that prevents such embedding. This creates a world of extra work, and is unfortunate. Understandable, though.

Anyway, in my personal opinion, this is the last place in the world I would look for information on how to extract fonts from pdfs.

Nick Cooke

Hi Michael, I'm not looking for information on how to extract fonts from pdfs. I sincerely hope it can't be done. And yes, of course it is stealing.

Nick Cooke

clauses

> In theory, it is possible. In practice, it’s like stealing. At least it seems like that to me.
But of course. We all hope it's not possible, that goes without saying.

aluminum

"Because it is entirely possible to pull fonts from pdfs (granted, without kerning pairs and other important features) it is causing more and more foundries to forbid us designers from embedding fonts"

Paranoia is causing that. If I want a font without paying for it, I'm not going to spend time extracting it from a PDF...I'll just download it off the internet somewhere.

Nick: It's software. Therefore, yes, it can be done. Pretty much any software can be cracked or reverse-engineered by those willing to figure it out.

malbright

Hi folks,
No harm meant, nor did I mean to imply that anyone here was actually going to steal a font by ripping it from a PDF.

aluminum, I happen to agree that it's kind of a desperate and self-delusional act to think that EULAs that forbid font embedding do any good. For a lively discussion on this subject, check out the lengthy thread here about Letterhead Fonts' copy protection scheme. Then go to Letterhead Fonts and see what Chuck, the proprietor, has to say on the issue. He's written a rather good explanation for his thinking, and one can just feel his pain.

fontplayer

There is a certain breed (I know, I used to be one), somewhat like the Elvis-obsessed meter maid recently busted in England, that has a font addiction. They have to have every font that exists. Right and wrong can be rationalized with only a little effort.

The fonts I have seen extracted were missing so much that I wasn't really attracted. But what I would worry about if I were a fontmaker is that these files aren't always labeled, and stripped copies of commercial fonts floating around might make people question the quality of a designer's work.

But a person can get a pristine version of almost any font that has been out for a while through normal addict channels, if one is dedicated. So I think the main issue with the .pdf extracts is the quality reflection on the artist.

Si_Daniels

>I happen to agree that it’s kind of a desperate and self-delusional act to think that EULAs that forbid font embedding do any good.

Well, there are also embedding permissions in the fonts that most reputable applications will abide by (Acrobat, PowerPoint, Word, etc.; not sure about Flash), so it's not just a EULA issue. Bad people would need to go to some effort to flip the bits to their liking.
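As a rough illustration of those embedding permissions: in TrueType/OpenType fonts they live in the `fsType` field of the OS/2 table, with bit meanings defined in the OpenType specification. The sketch below only decodes an `fsType` value you have already read from a font (for example with fontTools, `TTFont(path)["OS/2"].fsType`); the helper names `embedding_level` and `embedding_flags` are hypothetical, and it is not how any particular application implements the check.

```python
# Bit masks for the OS/2 table's fsType field, per the OpenType
# specification. Bit 0 is reserved and should be zero.
RESTRICTED = 0x0002      # bit 1: Restricted License embedding
PREVIEW_PRINT = 0x0004   # bit 2: Preview & Print embedding
EDITABLE = 0x0008        # bit 3: Editable embedding
NO_SUBSETTING = 0x0100   # bit 8: the font may not be subset before embedding
BITMAP_ONLY = 0x0200     # bit 9: only bitmaps in the font may be embedded


def embedding_level(fs_type: int) -> str:
    """Return the embedding permission level encoded in fs_type.

    Older versions of the spec allowed several of bits 1-3 to be set at
    once; the least restrictive level then takes precedence, so the
    checks run from least to most restrictive.
    """
    if (fs_type & 0x000F) == 0:
        return "installable"
    if fs_type & EDITABLE:
        return "editable"
    if fs_type & PREVIEW_PRINT:
        return "preview & print"
    if fs_type & RESTRICTED:
        return "restricted (no embedding)"
    return "unknown (reserved bit 0 set)"


def embedding_flags(fs_type: int) -> list[str]:
    """Return the permission level plus any additional restriction flags."""
    flags = [embedding_level(fs_type)]
    if fs_type & NO_SUBSETTING:
        flags.append("no subsetting")
    if fs_type & BITMAP_ONLY:
        flags.append("bitmap embedding only")
    return flags


print(embedding_flags(0x0002))  # ['restricted (no embedding)']
print(embedding_flags(0x0208))  # ['editable', 'bitmap embedding only']
```

A font flagged "restricted" is the case where a well-behaved application refuses to embed; "flipping the bits" would mean rewriting this field in the font file.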

I agree there's some paranoia out there, probably based on actual reported font crimes, but also some vendors see offering extended embedding rights as a way of making a few extra dollars, and to that I say best of luck to them.

Also I think LH fonts are on the fringe, and until others follow their lead (I don't see a line forming) I don't think they are really part of the equation.

hrant

The bottom line is that for people who would pay for fonts anyway
the effort/cost of reconstructing anything remotely usable from a
PDF just doesn't make sense. If some guy in a shack is turned on by
doing it, who cares.

> some vendors see offering extended embedding rights

What's typically the premium one has to pay for that?

hhp

Nick Cooke

That's what I thought Hrant – It would cost more in hours than it would to buy the font. It's just not worth the effort trying to reconstruct it.

Nick Cooke

typequake

It's not paranoia, but a mistaken belief that whenever one has an economic interest one has a legal right, and further, that the law must come to one's aid by making an act that deprives one of potential profits a crime.

The extraction of fonts from a PDF (and I suppose that's possible) would not be a crime (or even an infringement); however, some legislatures may decide to make the use of unlicensed fonts a violation of the owner's rights -- most typically a civil matter.

Uli

> I am curious to know whether it is possible to ‘extract’ fonts from pdf documents. I can’t remember where I heard it, but I seem to remember it may be. Sorry, I can’t be any vaguer than that!

Whenever you open a PDF file, the Adobe Acrobat reader (or any other PDF viewer) "extracts" or "copies" the fonts embedded into this PDF file and uses these fonts, as if these fonts had been installed beforehand.

If fonts were protected by "copy"-right and if it were illegal to "copy" fonts, then by opening a PDF file with Adobe Acrobat you would make an illegal "copy" of the fonts embedded into this PDF file.

The Adobe reader may be regarded as a "font extracting hacker tool": Any font embedded into a PDF is automatically extracted -- with no questions asked.

SuperUltraFabulous

The only people on earth that pay an embedding license are corporations. That's it.

There are a few tools to extract fonts from the TUG side of the world... I haven't tried them, but you and I know the results suck.

It's what Hrant says... foundries trying to make a few bucks...

hrant

Actually Simon said that - I'm just trying to find out how much. :-)
Is Tiffany here? She spends a lot at Emigre - she must know.

> by opening a PDF file with Adobe Acrobat
> you would make an illegal “copy”

Who says that's not illegal? In fact if for example you read the Emigre EULA it becomes clear that doing a convert-to-outlines in Illustrator is illegal! But the companies decide when to hunt somebody down and when to look the other way.

hhp

Si_Daniels

>What’s typically the premium one has to pay for that?

Don't know, but I have a feeling the pricing may not always be transparent.

>Is Tiffany here? She spends a lot at Emigre - she must know.

I was going to suggest Tiffany too. ;-)

hrant

Typophile: the Transparenter.

hhp

Linda Cunningham

> I was going to suggest Tiffany too. ;-)

Um, I think she's a little busy right now.... ;-)

hrant

I completely forgot! :-)
Three days ago it was...

hhp

typequake

> Who says that’s not illegal? In fact if for example you read the Emigre EULA it becomes clear that doing a convert-to-outlines in Illustrator is illegal

A breach of contract may be a lot of things, but generally not "illegal". Our civilization depends on it.

clauses

I have had some thoughts about a way to remedy the illegal distribution and use of fonts. Actually it can be summed up in one word: iTunes. Think about it: If it's possible to run a DRM system on music files that only play on authorized computers, then why not font files? Something like Linotype FontExplorer is already half way there. It has the shop, now 'just' add DRM. I could see MyFonts doing something like this. What do you say?

ralf h.

Apple has just started moving away from DRM. And for good reasons. DRM will only hurt the users who have bought the software (music, fonts and so on). DRM didn't stop illegal sharing of music. Why should it be different with fonts?

charles ellertson

Like most anything, you both can & can't "steal" a font from a PDF. Early on, there was no protection. Later, around the time of Acrobat 5, protection became better. I assume it's gotten even better with later versions. I tried it just for the hell of it back in the late 1990s, extracting a font I already owned. Aside from the "no metrics," I probably spent an hour figuring out what to write off. Metrics add a day, right? Maybe a high school kid will do it for fun -- but they'll likely get the metrics wrong. For anybody else, it costs over twice as much to steal it as to buy. BTW, I tried this after Acrobat 5 or so (with protection) & couldn't extract it, but all that probably shows is my level of programming skill, which isn't very high. If you can break into a high-security computer, you can probably steal a font -- in fact, you can probably break into a foundry's computers & steal the whole library.

BTW, not being allowed to embed a font stops its use for bookwork.

aluminum

Clauses:

While DRM is possible, the point is that it is completely breakable. And, typically, always is.

twardoch

Well, every "DRM" system is breakable, in worst case using analog tools. With music, the principle is "if you can hear it, you can copy it". That's obvious. The simplest way is to attach a digital recorder to the Audio Out port of your computer.

With fonts, the same principle applies, by the way of "if you can see it, you can copy it". The most brute-force method is scan and autotrace, a bit more refined method would be convert to outlines and dissect into a font again.

There are some legal implications of that. I hope that I will be able to collect some data and talk about it, perhaps at ATypI TypeTech.

A.

Ricardo Cordoba

Early on, clauses said... I have heard that it is possible to partly reconstruct a font from a PDF. It involves copy-pasting from the PDF to a text editor.

I'm sure it must be more complicated than that. What you are describing is text extraction rather than font extraction. ;-)

Later, fontplayer said... The fonts I have seen extracted, were missing so much...

Sir, yours is the first eyewitness report I have ever read... So it's not just a paranoid theory. Apparently there is also a piece of software out there that can purportedly reconstruct a Flash .fla file from just a .swf file! I can imagine that the results of that are just as paltry...

Ricardo Cordoba

> It’s not paranoia, but a mistaken believe that whenever one has an economic interest one has a legal right, and further, that the law must come to one’s aid by making an act that deprives one of potential profits a crime.

Typequake, do you own a small font foundry that in spite of lots of recognition from designers and the media, has to put up with many, many instances of its fonts (i.e., its livelihood) being pirated? I'm just curious. :-)

Uli

Mr. Cordoba:

"Sir, yours is the first eyewitness report I have ever read… So it’s not just a paranoid theory."

Extraction of fonts from PDF files is not a mare's nest. For example, I extracted fonts from PDF files using my own extraction tools, specifically designed for documenting in-house font forgeries that are not available on the market and were made by big companies (e.g. UPS). However, at Typophile, for reasons of censorship, it is not allowed to describe the techniques of extracting fonts from PDF files.

Mr. Twardoch:

"if you can see it, you can copy it".

Correct. We can go a step ahead and say: Seeing a font proves that Adobe Acrobat extracted the font from the PDF file. What is regarded as illegal by Adobe, namely to extract fonts, is done by Adobe with its own font extracting tool Adobe Acrobat.

For example, by the German copyright act, it is illegal to make a "permanent or temporary reproduction of a computer program by any means and in any form, in part or in whole" (see § 69c UrhG). If fonts were software (Adobe says so. I don't), then the sale of Adobe Acrobat would be illegal, since it is a font extracting tool.

Adobe Acrobat would be entirely useless if it did not extract the fonts from PDF files, because you could not see the embedded fonts. You would have to read PDF files with substitute fonts such as "Courier", and Adobe could no longer sell its big money-making Acrobat cash cow. So Adobe does what it calls illegal and extracts the fonts from PDF files.

typequake

Ricardo,

I'm an academic. I don't own or operate a foundry, I don't design faces, and I have no interest in copying or extracting fonts. It doesn't make my opinion less valid than anyone else's. On the contrary, my point was that a legal right can't be established simply from the fact that one has an economic interest. Therefore, if anything, my opinion is more objective -- but I do sympathize.

Jackie Frant

Nick,

The answer is a simple "Yes, it can be done."
Followed by, "Yes, it is done."

However, the people who extract these fonts do not care that the kerning pairs are not part of the extraction. Some folks do it just because they want to own the font; they will never use it.

BTW, this is one of the reasons that Chuck Davis of Letterhead has his "new-style" of open font which CANNOT be embedded into a pdf.

And P.S. You may have read about it here from me, when I implored someone to "flatten" their pdf so the type would be a graphic and not a font! (And therefore, impossible to extract.)

Co

The fonts are being extracted more and more. It's actually surprisingly easy. Those who extract fonts probably never purchase fonts.

OurType fonts have no embedding restrictions, and if we would apply them, we would most likely punish our real clients.

Extracting fonts for further sampling - is another related issue...

Eastern European countries and Russia are a paradise for illegal and ripped-off fonts. At the moment those countries are literally uncontrollable: there is no legal frame regarding copyright infringement, and above all, there is no culture of purchasing software or fonts.

Co Cotorobai
OurType

hrant

> Those who extract fonts probably never purchase fonts.

This is the key realization, but sadly seems to escape many foundries.

hhp

speter

> I implored someone to “flatten” their pdf so the type would be a graphic and not a font! (And therefore, impossible to extract.)

And impossible to search.

Si_Daniels

>I implored someone to “flatten” their pdf so the type would be a graphic and not a font! (And therefore, impossible to extract.)

>And impossible to search.

Microsoft's XPS actually does this for 'no embedding' fonts - generating a static bitmap for the text to preserve some level of font fidelity. I believe the Unicode codepoints stick around allowing for searching, but I could be wrong.

hrant

Here's a trick: make a PDF where the text is rendered to a bitmap, but have a layer of invisible, searchable text on top, set in Adobe Sans/Serif to match the set widths, and when the searched word is highlighted you [might] even get a box on top of the bitmap rendering.

hhp
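hrant's trick maps onto a real PDF feature: text rendering mode 3 (the `3 Tr` operator) draws glyphs that are neither filled nor stroked, so the text is invisible but still selectable and searchable; OCR'd scans commonly work this way. A minimal sketch of such a page's content stream follows. It assumes the rendered bitmap is available as image resource `/Im1` and the matching-width font (hrant suggests Adobe Sans/Serif) as font resource `/F1`; both names and the helper functions are hypothetical, and the sketch builds only the content stream, not a complete PDF with objects, resources, and a cross-reference table.

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def page_content(width, height, image_name, font_name, runs):
    """Build a page content stream: a full-page bitmap of the text,
    overlaid with an invisible (render mode 3) but searchable text layer.

    runs: iterable of (x, y, font_size, text), in PDF points, where each
    text string uses only characters covered by the font's encoding.
    """
    ops = [
        "q",
        f"{width} 0 0 {height} 0 0 cm",  # stretch the unit-square image over the page
        f"/{image_name} Do",             # paint the rendered bitmap
        "Q",
        "BT",
        "3 Tr",                          # render mode 3: neither fill nor stroke
    ]
    for x, y, size, text in runs:
        ops.append(f"/{font_name} {size} Tf")
        ops.append(f"1 0 0 1 {x} {y} Tm")  # absolute position via the text matrix
        ops.append(f"({escape_pdf_string(text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


stream = page_content(612, 792, "Im1", "F1", [(72, 720, 12, "Typophile (PDF)")])
print(stream)
```

Whether a search highlight lines up with the bitmap depends on the invisible font's advance widths matching the rendered text, which is why hrant picks a font that can match the set widths.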

Si_Daniels

The Adobe funded Octavo CD-ROM books worked this way, if I recall correctly.

hrant

The guys aren't even old and they already took my idea! ;-)

hhp

typequake

That's because ideas aren't subject to copyright...

Uli

> "The fonts are being extracted more and more. It’s actually surprisingly easy."

Mrs. Cotorobai:

What makes you think so? I can't confirm that it is "actually surprisingly easy".

In the good old days, there was only one font format and one PS and one PDF file format, but today, there are different types of font formats and hence different types of PS and PDF formats, which makes things very tricky.

For example, extracting all 22 Akzidenz Grotesk fonts from this "stone-age" file

http://www.sanskritweb.net/fontdocs/ag1992ps.zip

is "actually surprisingly easy" even for "bloody laymen", since no programming skill at all is needed here.

But today's PostScript and PDF files are much trickier so that a high degree of proficiency in computer programming is required.

aluminum

"“flatten” their pdf so the type would be a graphic and not a font! (And therefore, impossible to extract.)"

As stated, that also makes PDFs rather useless as electronic documents, and besides, it's fairly trivial to get the vector outlines and import them back into a font format anyway.

Co

To Uli --

yes, one can get a better or a poorer result after extracting (depending on one's skills). The 'easy' method consists of 5 steps (which I am not going to describe here for obvious reasons). Well, we must not argue about the 'easy' part; purchasing a font online seems much easier to me.

Co Cotorobai
OurType

Uli

> "purchasing a font online seems much easier to me."

Purchasing fonts may seem to be easier. But copying fonts instead of buying them is regarded as easier, especially by font sellers. For instance, the Akzidenz Grotesk copyable from the above file was copied by Linotype without purchasing it and is now sold by Linotype under a new name. Many font sellers regard it as easier to copy fonts instead of buying them.

hrant

Yes, saving $29 was the key to that dastardly deed...
I'm sorry, but you're a dumb-a*s.

hhp

SuperUltraFabulous

Uli: What is the name of the font that Linotype allegedly copied and renamed?

Si_Daniels

Bruno Steinert had a dream of making it easier to license fonts than pirate them hence Font Explorer X. Price probably comes into it too, but quality, service after the sale, upgrades etc., play into this as well.

bieler

"In the good old days, there was only one font format..."

Uli

That would have been Bitstream's, correct? There was a time when only certain fonts could work on certain computers/printers (as someone postulated here as a solution). Proprietary economics is actually how type had been sold since the beginning of the stand alone foundry. And by foundry I mean REAL foundry, ca 16th century.

Adobe's unlicensing of the PS1 format was unprecedented (and killed off traditional protection schemes—and Bitstream's competitive format, which was the idea). The problems of copying, theft, extracting, etc., are a natural extension of the fact that the software tools (Fontographer, FontLab) used for type design rely on unlicensed font formats.

Has any digital type foundry ever paid one cent in tribute to Adobe, Apple, or Microsoft for piggy-backing on their free font formatting? Hell no. And why would they? It's free. Still, when you opt into the system, you have to live by its rules. That, or come up with your own system.

Gerald

Thomas Phinney

I was posting to try to stop the italics, but I guess I didn't figure out what tag was left open (not "i" or "cite").

But also to say, yes it's possible to rip fonts off from a PDF. There are a number of limitations to this, and there are easier ways to steal an intact font. BTW, it's certainly my understanding that stealing fonts from PDFs like this *is* illegal in the USA and most western European countries. Of course, I'm not a lawyer, and you should talk to your own lawyer before doing anything like that.

Cheers,

T

Jackie Frant

“In the good old days, there was only one font format…”

I'm from those good old days, and even in the beginning there was more than one format for the Mac (I cannot speak about IBM and IBM clones [Dell, Gateway, etc.]): from Adobe we had Type 1, and from Bitstream and most others Type 3, or as some called it, Type C. When Adobe allowed others to use their Type 1 method, it was a little easier. Some manufacturers took back their Type 3s and replaced them for us with Type 1s. Others were out of business. Every now and then I do a job where a Type 3 (or C) font emerges and will not RIP; out comes Fontographer to convert it to a Type 1 so it is usable.

Meanwhile, back to "flatten out the type" -- I was asking the manufacturers of their own type to do this for their initial PDFs when showing a new face. It is amazing that when a manufacturer wants to release the new designs, they put out "embedded" fonts in their PDF -- and the Russians jump on this and immediately extract it. Right now there are so many bad versions going around -- in some ways it is quite funny. And I always hope that if anyone picks up one of these and wants to use it -- they will go and buy the original. (Fewer points per letter and kerning pairs worked out!)

fontplayer

> I was posting to try to stop the italics, but I guess I didn’t figure out what tag was left open (not “i” or “cite”).

Typequake left out the / after citing "Who says that’s not illegal? In fact if for example you read the Emigre EULA it becomes clear that doing a convert-to-outlines in Illustrator is illegal"

I think only he or a moderator can fix it.

paul d hunt

> I was posting to try to stop the italics, but I guess I didn’t figure out what tag was left open (not “i” or “cite”).

Thomas, i can never catch these problems unless pointed out because i use Firefox, which doesn't display tags the same way some other browsers do. If anyone ever finds one of these unclosed tags, you can always contact me or one of the moderators to fix this without resorting to code hacks. >^p

typequake

I don't think I left out anything -- the post looked good on my Firefox too.
