TypeSet - better justified web text

Frode Bo Helland's picture

I just came across this. It’s a really interesting idea. He uses a JavaScript algorithm to manipulate the inter-word spacing.

Below: A screen shot from Opera/OSX (Firefox/OSX is not working correctly yet).

blank's picture

The difference is quite noticeable, and in this example, in TypeSet’s favor. And of course, this raises an interesting question about H&J on the web—should it be done by the browser, or should designers have more control as they do in print?

Frode Bo Helland's picture

Thing is, depending on the device it’s displayed on and the user’s settings, the width of a text block will vary. That is the nature of the web, but also what gives it its unique quality.

I’d love to see this as a CSS feature, with desired justification specified as a percentage number or a few level keywords like for example “tight”, “normal” and “loose”.

text-align: justify;
desired-justification: 90%;

It would have to be smart enough to adjust the percentage according to the width, though.

Richard Fink's picture

@frode

By “smart” I’m thinking it should be able to detect the width of the block and adjust the desired percentage accordingly.

Thanks for this. I've done quite a bit of experimentation with H&J in browsers but I never really thought about handling the column width (measure) as a "liquid" variable before now. Just fixed-width. (In pixels at least, which isn't physically fixed like print.) There is probably some ratio that can be determined.

@james puckett
And of course, this raises an interesting question about H&J on the web—should it be done by the browser, or should designers have more control as they do in print?

As far as justification, there's already quite a lot of control using the CSS word-spacing and letter-spacing properties along with either the HTML spacer entities or the Unicode general punctuation spaces. (But the font needs to have them to use them.) These things in combination will give control down to the pixel level, at least. Just tightening or loosening the word-spacing can have a big effect. In a print style sheet, where things don't have to be rounded for the raster grid, I would imagine you get about as much control as you can in any app. (Gotta test that, though. Never checked.)
As far as hyphenation, the soft hyphen is supported now in every major browser and there's a JavaScript implementation of Franklin M. Liang's hyphenation algorithm from LaTeX available that works pretty darned well. I suppose a native in-browser implementation would be a bit more thorough, but not by much. (I can only vouch for how it works for English.)
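For readers who have not seen pattern-based hyphenation, here is a minimal sketch of the idea behind Liang's algorithm. It is not the code of any library mentioned in this thread: the function names are made up for illustration, and the pattern list is a tiny hand-picked set that only covers the single word "hyphenation". Real pattern files contain thousands of entries per language.

```javascript
// Liang-style pattern hyphenation, reduced to a toy pattern list.
// A digit inside a pattern scores the gap at that position. After taking the
// maximum score per gap over all matching patterns, odd scores allow a break.
// PATTERN_LIST is an illustrative assumption, not a complete pattern file.
const PATTERN_LIST = ['hy3ph', 'he2n', 'hena4', 'hen5at', '1na', 'n2at', '1tio', '2io', 'o2n'];

function parsePattern(pattern) {
  const letters = [];
  const levels = [0]; // levels[k] scores the gap before the k-th letter
  for (const ch of pattern) {
    if (ch >= '0' && ch <= '9') {
      levels[levels.length - 1] = Number(ch);
    } else {
      letters.push(ch);
      levels.push(0);
    }
  }
  return [letters.join(''), levels];
}

const PATTERNS = new Map(PATTERN_LIST.map(parsePattern));

function hyphenate(word, patterns = PATTERNS, leftMin = 2, rightMin = 3) {
  const padded = '.' + word.toLowerCase() + '.'; // dots mark the word edges
  const scores = new Array(padded.length + 1).fill(0);
  // Try every substring of the padded word against the pattern table.
  for (let start = 0; start < padded.length; start++) {
    for (let end = start + 1; end <= padded.length; end++) {
      const levels = patterns.get(padded.slice(start, end));
      if (!levels) continue;
      levels.forEach((level, k) => {
        scores[start + k] = Math.max(scores[start + k], level);
      });
    }
  }
  const parts = [];
  let last = 0;
  // scores[j + 1] belongs to the gap before word[j]; leftMin and rightMin
  // keep breaks away from the first and last letters.
  for (let j = leftMin; j <= word.length - rightMin; j++) {
    if (scores[j + 1] % 2 === 1) {
      parts.push(word.slice(last, j));
      last = j;
    }
  }
  parts.push(word.slice(last));
  return parts;
}

// Soft hyphens (U+00AD) stay invisible unless the browser breaks the line there.
const softHyphenated = hyphenate('hyphenation').join('\u00AD');
```

With this pattern set, `hyphenate('hyphenation')` yields the parts "hy", "phen" and "ation". Joining them with U+00AD gives the kind of string a script would write back into the page.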
The biggest problems I've run into stem from how different browsers work with the em unit. Rounding and such. Plus, I've never been able to get rid of at least some reflow on Zoom which runs the risk of text that looks fine at 100% but sprouts widows and orphans and other undesirables at, say, 130%.
And then, because soft hyphens are added at every available breakpoint, there's also a need to remove them via javascript on cut-and-paste. Detecting line endings and placing the soft hyphens more selectively would be more elegant, but tough to do cross-browser right now.
It's a pain. But a lot of control is already there with CSS. And whether or not the browser handles hyphenation, that won't change.

Frode Bo Helland's picture

I made a late edit there, hence the wrong quote.

Bram Stein's picture

Frode told me you were having an interesting discussion here about my line breaking implementation so I thought I would chime in. In my opinion justification should be done by the browser, preferably with some user defined properties. However, this won't be that much of an improvement until browsers also support hyphenation, which in turn requires them to know or detect the language of any given page. Even though those features won't be implemented any time soon, I believe that replacing the current text layout engines with a Knuth and Plass based algorithm would increase the quality of justification on the web. The Knuth and Plass line breaking algorithm guarantees results no worse than the naive implementation browsers currently use.

If I remember correctly, Internet Explorer exposes a CSS property that affects justification. There is also a proposal for a CSS3 text-justify property, but it seems they leave the actual justification algorithm up to the browser.

I ran into similar problems as those mentioned by Richard Fink. Wrapping each line in an element and setting the word spacing is fragile (at best) since there is no way to get reliable font metrics back from the browser. Print stylesheets would still need flexible widths as there are different paper sizes and orientations, all out of your control. That's the beauty of the web :)

Khaled Hosny's picture

@ Bram Stein

Out of curiosity, how much overlap is there between your code and http://code.google.com/p/hyphenator/? Since hyphenation and justification are closely related, is it possible to combine both works?

maxgraphic's picture

Since this uses canvas to render text as a graphic, don't we lose selection/searching/indexing/accessibility? As much as I love decent H&J, that's a high price to pay.

Bram Stein's picture

@Khaled Hosny

Actual code, not so much. It is already possible to manually insert hyphenation points in my implementation, and adding hyphenator would just automate that process (i.e. the algorithm doesn't need to change, just the way the input is generated.) My current implementation just assumes there are no legal break points within a word.

@maxgraphic

Correct, this example uses canvas for rendering. It is however not limited to canvas. I have an implementation lying around that uses the browser's rendering engine, making text selectable, etc. I haven't released it, because I think justification should be a browser feature and not a JavaScript add-on.

Richard Fink's picture

@bram stein

I understand your having an opinion on how and where H&J should be deployed, but holding back code seems an odd way to make a statement! Anyway, could I perhaps get a look at what you've done?
rfink [..] readableweb.com or just post a comment to my blog, readableweb.com.

It's going to be a long time until we see hyphenation dictionaries within browsers. Meanwhile, JavaScript engines have become very fast, and the performance hit is negligible in comparison to network costs - the costs in time of gathering the resources. It takes far longer for the browser to lay out the page and parse the CSS rules than to interpret the JS.
Let me know. Thanks.

Rich

Bram Stein's picture

@Richard Fink

Perhaps my statement about not releasing the code came out a bit weird. I'm not holding back any code for ideological reasons. :) I merely made a proof-of-concept implementation but never actively promoted it because there are some limitations to it. I have now added the proof-of-concept to the Typeset page at:

http://www.bramstein.com/projects/typeset/

The code can be found on my Github page (browser-assist.js):

http://github.com/bramstein/javascript/tree/master/src/typeset/

What it does is split a paragraph into lines according to the line breaks found by the Knuth and Plass algorithm and then wrap each line in a span element with its own CSS word-spacing value.
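The span-per-line idea described above can be sketched roughly as follows. This is not the code from browser-assist.js. `renderJustifiedLines` and the `measureText` callback are hypothetical names, and the sketch assumes the line breaks have already been chosen and that the caller can supply text widths in pixels.

```javascript
// Assumption: measureText(text) returns the width of the text in pixels.
// In a browser it could wrap the canvas 2D context's measureText().width.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// lines is an array of lines, each line an array of words.
function renderJustifiedLines(lines, measure, measureText) {
  return lines.map((words, index) => {
    const isLast = index === lines.length - 1;
    const gaps = words.length - 1;
    const natural = measureText(words.join(' '));
    // Spread the leftover (or excess) width evenly over the gaps.
    // The last line and single-word lines keep the default spacing.
    const extra = isLast || gaps === 0 ? 0 : (measure - natural) / gaps;
    return '<span style="display:block;white-space:nowrap;word-spacing:' +
      extra.toFixed(2) + 'px">' + escapeHtml(words.join(' ')) + '</span>';
  }).join('\n');
}
```

Note that CSS word-spacing is added on top of the normal space width, so a negative value tightens a line and a positive value loosens it.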

The main problem with this approach---and the main reason I think this should be done in the browser---is floating elements (i.e. elements that affect the layout of the text.) A JavaScript implementation would need to measure the "shape" of the text before it can do line breaking. As far as I know there is no easy way to get this text "shape" back from a browser; you can only approximate it by trying to lay out text and seeing where line breaks happen.

I'm still thinking about an efficient way of solving the above problem in JavaScript. If successful it would be possible to create a script that fixes both justification and hyphenation until browsers implement it natively.

As an aside, someone (@lqd on Twitter) combined my Typeset code with hyphenator.js and sent me the following screenshots:
* http://twitpic.com/1zdlh6
* http://twitpic.com/1zdh4a

Richard Fink's picture

@bram stein
thanks. I'll take a long look. I'm working on code (and an accompanying article) about H&J for browsers and it's looking pretty good, frankly. (The H&J, that is.)

I'm not quite sure what you mean by the "shape" of the text but I'll try to figure that one out. Got your email, BTW.

One problem I'd like to solve is exactly where the line breaks occur in all browsers. You could do it in IE7 but that got re-worked and broken in IE8. And I can proceed without it, but it would solve a few remaining problems.
(If anybody has any ideas? Please...)

Thanks again.

rich

Frode Bo Helland's picture

Am I missing something, or is this still canvas and non-selectable/searchable?

Bram Stein's picture

@frode frank

See the "Assisted browser line breaks" section and below.

@Richard Fink

I had a go at a bit more serious implementation today:

http://www.bramstein.com/projects/typeset/flatland/flatland.html

The page contains two texts: the one on the left is justified by your browser, the text on the right is justified by my implementation and hyphenated by Hyphenator.js. Canvas is not used. Both texts use fonts from Google's Font Directory. I have tested it in Firefox, Safari and Chrome, and all seem to work.

There are some (small) problems but I think most of them can be worked out:
* Selection looks ugly (this is to work around a bug in Webkit that ignores subpixel word-spacing)
* Copied and pasted text includes the soft hyphens
* Chrome sometimes executes the H&J before activating the font (refreshing usually fixes it.)
* Floating elements are not yet supported (that is what I meant with the "shape" of the text, the line breaking algorithm needs to take the space that floating elements take up into account.)

Loading may look slow, but that is mostly caused by downloading the fonts from Google's Font Directory. Only then can the H&J algorithm run (as it needs the correct font metrics.)

What exactly do you mean by finding where line-breaks occur in all browsers?

Frode Bo Helland's picture

It breaks when you change text size.

Bram Stein's picture

@frode frank

Don't do that :P

Seriously though, I think this is the result of my "fix" for Chrome (Webkit?) which doesn't support subpixel word-spacing (it rounds it to the nearest integer.) I hardcoded the spacing between words for all browsers to fix it. This will result in strange behaviour when changing the text size (and is also the cause of the odd looking selection.) This probably wouldn't happen if Chrome supported subpixel word-spacing. I have some other ideas on how to attack this particular problem, and hopefully I come up with something that works with text size changes.

Frode Bo Helland's picture

Sorry, Bram! :) Seriously though, when you start running into more problems than you had before you started, it’s usually time to restart the process.

Ever seen the roads these guys patch up? It’s sort of like that.

Richard Fink's picture

frode>It breaks when you change text size.

stein>Don't do that!

hah!
I've run into this one before. This would not be a problem if browsers had a "zoom" event and/or an "ontextsizechange" event. Then you could handle it programmatically. At least refresh the page or let the reader know what's going on.

@bram stein
Copied and pasted text includes the soft hyphens
This was a deal-killer for me until recently. Got this one figured out. (Not my work but I've improved it some.)

I'll take a closer look at your demo and be back.

Rich

Richard Fink's picture

@bram

I took a brief look and am still trying to figure out what your algorithm does that can't be done just by fiddling with the value of word-spacing. In that brief look, I'm having a bit of trouble figuring out exactly what the code is doing. (But I'll get back to you on that.)

>What exactly do you mean by finding where line-breaks occur in all browsers?

As you know, hyphenator.js takes a "shotgun" approach and adds soft-hyphens everywhere. But they are only visible where the browser breaks the word.
Here's what would be helpful: being able to identify only those soft hyphens that the browser displays. (Those breaking words at the end of lines.)
I need to identify them because I need to remove them, yet preserve the amount of space the soft-hyphen takes up horizontally and also preserve the line break. (Sounds odd, I know. Seems I can do the preserving, but it's the identifying that's hanging me up. I have an idea I want to try, but haven't had the time to write some code and test. But I will.)

In the meantime, maybe you're smarter than me, any ideas on how to identify only the soft hyphens that appear at the end of lines?

Also - do you think your algorithm would be useful in preventing widows? (Like I said, I'm still trying to follow the logic of your code. Sooner or later I'll firebug it. Or you'll explain!)

Thanks.

Rich

Bram Stein's picture

To understand what my algorithm (or rather the Knuth and Plass algorithm) does I need to explain how browsers currently implement justification. Most browsers implement justification by employing a first fit strategy (except Internet Explorer which uses the Knuth and Plass algorithm if enabled with "text-justify: newspaper;".) Words are inserted one by one using normal word-spacing. A line break is inserted before the first word that doesn't fit on a given line. The browser then adjusts the word-spacing on that line so that the line is stretched to the maximum width it can have (defined by the width of the text column.) Note that this strategy always increases word-spacing, it never decreases. Rivers are created by large words at the end of the line that do not fit and are moved to the next line. Large word-space values need to make up for that missing word so as to justify that line.
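The first-fit strategy described above can be sketched in a few lines. This is an illustration under simplifying assumptions (fixed word widths in pixels, one normal space width, hypothetical function names), not the code of any browser.

```javascript
// Greedy (first fit) line breaking: keep adding words until one does not fit.
// Returns lines as arrays of word indexes.
function firstFitBreaks(wordWidths, spaceWidth, measure) {
  const lines = [];
  let current = [];
  let width = 0;
  wordWidths.forEach((w, i) => {
    const needed = current.length === 0 ? w : width + spaceWidth + w;
    if (current.length > 0 && needed > measure) {
      lines.push(current); // the word does not fit, so start a new line with it
      current = [i];
      width = w;
    } else {
      current.push(i);
      width = needed;
    }
  });
  if (current.length > 0) lines.push(current);
  return lines;
}

// Extra space each gap needs so the line fills the measure.
// With first fit this is never negative: lines are only stretched.
function stretchPerGap(line, wordWidths, spaceWidth, measure) {
  const gaps = line.length - 1;
  if (gaps === 0) return 0;
  const natural = line.reduce((sum, i) => sum + wordWidths[i], 0) + gaps * spaceWidth;
  return (measure - natural) / gaps;
}
```

For word widths `[30, 30, 26, 60, 20]`, a 10px space and a 100px measure, the third word just misses the first line. The first line then holds only two words and its single gap has to absorb 30px of extra space, which is how the wide gaps behind rivers arise.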

Hyphenation can avoid rivers by creating more break points in a line. So instead of just being able to insert a line break between words, hyphenation makes it possible to also insert line breaks within words (at points defined by a hyphenation algorithm or dictionary.)

My JavaScript implementation of the Knuth and Plass line breaking algorithm takes over line breaking from the browser. The major differences are that instead of just increasing the word-spacing it also allows decreasing it, and instead of basing its line breaking on one line at a time it tries to minimize the word-spacing adjustments over the whole paragraph. For example, it might accept a slightly less than optimal word-spacing in one line to prevent an even worse word-spacing in the next line. It also tries to prevent two consecutive lines from differing too much in word-spacing.
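To make the "whole paragraph" idea concrete, here is a much simplified total-fit line breaker. It is not the Knuth and Plass algorithm itself and not the Typeset code: it uses a plain squared per-gap adjustment as the cost, allows spaces to shrink by at most `maxShrink` pixels, and ignores hyphenation and the consecutive-line check mentioned above. All names are illustrative.

```javascript
// Cost of setting words i..j-1 on one line; Infinity if the line cannot fit.
function lineCost(widths, i, j, spaceWidth, measure, maxShrink, isLast) {
  const gaps = j - i - 1;
  let natural = gaps * spaceWidth;
  for (let k = i; k < j; k++) natural += widths[k];
  const slack = measure - natural;
  if (gaps === 0) {
    if (slack < 0) return Infinity;
    return isLast ? 0 : slack * slack;
  }
  const perGap = slack / gaps;            // positive stretches, negative shrinks
  if (perGap < -maxShrink) return Infinity;
  if (isLast && perGap >= 0) return 0;    // the last line may stay ragged
  return perGap * perGap;
}

// Dynamic programming over all break positions: best[j] is the lowest total
// cost of setting the first j words. Returns lines as arrays of word indexes,
// or null when no feasible set of breaks exists.
function totalFitBreaks(widths, spaceWidth, measure, maxShrink) {
  const n = widths.length;
  const best = new Array(n + 1).fill(Infinity);
  const prev = new Array(n + 1).fill(-1);
  best[0] = 0;
  for (let j = 1; j <= n; j++) {
    for (let i = 0; i < j; i++) {
      if (best[i] === Infinity) continue;
      const cost = best[i] + lineCost(widths, i, j, spaceWidth, measure, maxShrink, j === n);
      if (cost < best[j]) {
        best[j] = cost;
        prev[j] = i;
      }
    }
  }
  if (best[n] === Infinity) return null;
  const lines = [];
  for (let j = n; j > 0; j = prev[j]) {
    const i = prev[j];
    lines.unshift(Array.from({ length: j - i }, (_, k) => i + k));
  }
  return lines;
}
```

For word widths `[30, 30, 26, 60, 20]`, a 10px space, a 100px measure and `maxShrink` of 3, this picks two lines: the first three words with each space tightened by 3px, then the remaining two. A greedy breaker that may only stretch needs three lines for the same input and has to open a 30px gap on the first one.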

In my earlier Typeset examples I did not consider hyphenation and only focused on justification. I have since then added hyphenation using Hyphenator.js. This basically gives the algorithm more break-points to work with. The result is an even better justification (i.e. closer to the global minimum amount of word-spacing per paragraph.)

My implementation returns a list of line-break points that represent the global minimum of word-spacing. I split the original paragraph at the line-break points and calculate the amount of word-spacing required to justify each line. In my first tests I just passed the word-spacing as a CSS property on a line-by-line basis, but I quickly found out that Webkit (Safari & Chrome) does not support word-spacing with values less than 1 pixel. The necessary word-spacing is often less than 1 pixel so I had to resort to some hacks to fake it. I do this by adding fixed-width spaces. It unfortunately does have some side-effects as I mentioned (the strange selections.)
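A small numeric sketch shows why whole-pixel rounding hurts here. It also shows one possible way of staying on whole pixels, which is to hand out the leftover pixels one gap at a time. This is an illustration only, not the fixed-width-space workaround described above, and `integerGapWidths` is a made-up name. It assumes the total space to distribute is a non-negative whole number of pixels.

```javascript
// Split totalSpace pixels over a number of gaps using whole pixels only.
// The first few gaps get one extra pixel so the widths add up exactly.
function integerGapWidths(totalSpace, gaps) {
  const base = Math.floor(totalSpace / gaps);
  const remainder = totalSpace - base * gaps;
  return Array.from({ length: gaps }, (_, k) => base + (k < remainder ? 1 : 0));
}

// Example: 23px of space over 5 gaps. The ideal gap is 4.6px.
const ideal = 23 / 5;
const roundedTotal = Math.round(ideal) * 5;  // every gap rounded to 5px gives 25px
const exactGaps = integerGapWidths(23, 5);   // uneven gaps that still add up to 23px
```

Rounding every gap to the nearest pixel overshoots the line by 2px in this example, while the uneven distribution fills it exactly at the price of slightly unequal gaps.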

As my implementation completely replaces the browser's built-in line-breaking it knows exactly when and where line-breaks occur. As you can see in the Flatland example, I only insert soft hyphens where they are needed.

Prevention of widows and orphans can be added to the Knuth and Plass line-breaking algorithm as an extension. I have not implemented this since I didn't see the need for it on a web page. Apart from printing there is plenty of vertical space. Though, adding it is about 10 lines of code as my implementation already calculates the necessary information---it just ignores it at the moment.

Regarding the detection of line-breaks. I've done something similar on my Typeset page. The basic idea is that you iterate through each word in the paragraph and check its Y position. If the Y position is different from the previous word, then you know the browser has inserted a line break. I'm not sure exactly if that is what you are after, but perhaps it is of help.
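The Y-position check described above might look like the sketch below. It assumes every word has already been wrapped in its own element and its vertical offset collected, for example from `offsetTop` in a browser. The function name and the input shape are illustrative.

```javascript
// words: array of { text, top } in document order, where top is the vertical
// offset of the word's element (for example span.offsetTop in a browser).
// A change in top between neighbouring words means the browser broke the line.
function groupWordsByLine(words) {
  const lines = [];
  let current = [];
  words.forEach((word, i) => {
    if (i > 0 && word.top !== words[i - 1].top) {
      lines.push(current);
      current = [];
    }
    current.push(word.text);
  });
  if (current.length > 0) lines.push(current);
  return lines;
}
```

The last word of each group is where the browser broke the line, so a soft hyphen inside that word is a candidate for one that is actually displayed.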

Theunis de Jong's picture

(i.e. closer to the global minimum amount of word-spacing per paragraph.)

Shouldn't that be the global optimal word-spacing?

As for 'widows and orphans', those are by definition page terms. On a web page, the page never ends vertically, so you'll never have to worry about widows or orphans! I think Richard may mean 'runts' -- a very short last line of a paragraph. Those can be avoided by stopping scanning for feasible breakpoints before the end of the actual paragraph is reached.

Bram Stein's picture

@Theunis de Jong

I wrote "global optimum amount of word-spacing" at first, but then I changed it to "minimum" because of the way CSS word-spacing works (i.e. it is not an absolute value but an additional value added to the default word-spacing, which in the case of CSS should be kept to a minimum.) What I meant is that the word-spacing changes should be as close as possible to the default word-spacing. You are correct in pointing out that is technically a global optimum.

I think runts are related to widows and orphans as well (at least in the algorithm.) The Knuth and Plass algorithm usually finds multiple sets of line-breaks. It would be possible to avoid runts by choosing one that might be (slightly) less optimal but has one line more or less to avoid single word lines. I think early stopping might also work, but I haven't tried that yet.

Theunis de Jong's picture

Ah, yes: from a point of view of adjustment to space it should be as minimal as possible, either positive or negative.

It would be possible to avoid runts by choosing one that might be (slightly) less optimal but has one line more or less to avoid single word lines. I think early stopping might also work, but I haven't tried that yet.

I don't even know why I wrote that down! I was, in fact, thinking of the TeX way: find all breakpoints but assign a penalty for the final line length; huge for a very short line, small for a rather short line, and perhaps a negative value ("I want this!") if it's an exceptionally good last line (by some objective definition ...).

I don't think this will need any adjustment of the rest of your code -- i.e., a paragraph is typeset optimally if it avoids an ugly runt at the end.
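The penalty idea can be sketched as a scoring step applied to candidate sets of line breaks. The thresholds and penalty values below are arbitrary placeholders chosen for illustration, not values taken from TeX, and the function names are made up.

```javascript
// Penalty for the last line of a paragraph, based on how much of the measure
// it fills. All thresholds and values here are illustrative assumptions.
function lastLinePenalty(lastLineWidth, measure) {
  const fill = lastLineWidth / measure;
  if (fill < 0.15) return 1000;              // a runt: heavily discouraged
  if (fill < 0.35) return 100;               // rather short: mildly discouraged
  if (fill >= 0.6 && fill <= 0.9) return -50; // a good last line: rewarded
  return 0;
}

// candidates: array of { cost, lastLineWidth }, where cost is whatever the
// line breaker already computed for that set of breaks.
function pickLayout(candidates, measure) {
  let best = null;
  let bestScore = Infinity;
  for (const candidate of candidates) {
    const score = candidate.cost + lastLinePenalty(candidate.lastLineWidth, measure);
    if (score < bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

A layout with slightly worse spacing but a well-filled last line can then win over one with better spacing that ends in a single short word.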

Bram Stein's picture

@Theunis de Jong

This presentation (given tomorrow) seems relevant:
http://tug.org/tug2010/abstracts/rundell.txt

I'll keep an eye on it and see if I can get my hands on a copy.

Theunis de Jong's picture

Finally! Knuth on this in The TeXbook (1984):

"TeX breaks lists of lines into pages by computing badness ratings and penalties, more or less as it does when breaking paragraphs into lines. But pages are made up one at a time and removed from TeX's memory; there is no looking ahead to see how one page break will affect the next one. In other words, TeX uses a special method to find the optimum breakpoints for the lines in an entire paragraph, but it doesn't attempt to find the optimum breakpoints for the pages in an entire document. The computer doesn't have enough high-speed memory capacity to remember the contents of several pages, so TeX simply chooses each page break as best it can, by a process of "local" rather than "global" optimization."

(my emph.)

(It's possible you already read and memorized every page.)

raph's picture

For the record, I think this is a great proof of concept. I've kicked around the idea of coding up something similar for a while now, and am very glad that someone has finally done it.

Making it really useful and usable is hard. I'm not sure whether the best path for that is to grind through the slog of fixing the problems one by one: (async loading and size-change issues, copy and paste, etc, etc), or to advocate for browsers to implement this natively. All I know is that I want high quality text layout in my browser before I grow old.

Bram Stein's picture

@raph
Thanks for the kind words. I see you were also involved with Google's font API---very nice work.

I'm kind of on the side of advocating the browsers to implement this natively. I have been looking through the Webkit and Gecko source, but I'm not that familiar with either engine internals (that, plus limited time.) As you said, I would love to have high quality text layout on the web as well, preferably sooner than later.

For now I'm trying to see how far I can take it in JavaScript. I have fixed some of the bugs I mentioned before: selections are working, resizing as well, and copy and paste is no worse than a normal paragraph with soft-hyphens in it (these fixes are not yet online though.) The big issue remaining is floating items; I'm planning to measure all floating elements I encounter, create space for them in the text and absolute position them in the appropriate place. The trick is making it generic enough.

Richard Fink's picture

@bram

Thanks for the explanation. It does help some. And glad to know after all these years that text-justify:newspaper in IE is the Knuth-Plass alg. Never knew until you told me. Live and learn.

Theunis is correct - I'm more concerned about runts. (And BTW - IE has a widow/orphan preventing CSS property for print stylesheets if I remember correctly.)

[Note: I'm not totally free to go into this in depth in a public forum because it's part of a larger article and the publisher understandably doesn't want to run the risk of someone popping out of nowhere when the thing is published and saying, "Hey, that was published on my blog! Take it down!" However, any help I get will be duly and happily credited and I'm perfectly free to send or publish links to my experiments and so I will do that. All info will be published free and clear for anyone to use, anyway.]

I'm convinced that decent H&J is now a practical proposition for the web. My instincts tell me that hyphenator.js's shotgun approach (along with text-align:justify) is the more flexible approach. The browser engine does more of the work. I've been keeping an eye on hyphenator.js for a year and a half now and Mathias Nater has done an excellent job.
Want to see a case in point? Check out the browser-as-e-reader approach at ibisreader.com. Check out the adjustable page width feature. Your code would require a complete re-rendering. With hyphenator.js, the text would just reflow with the hyphens appearing where they need to be. Similar to the text size change issue. With that, too, hyphenator.js presents no problems. Floats present no problems, either.

I've put together a variation of hyphenator.js that I'm pretty happy with so far.
1) First, there was the cut and paste problem. (That is, if you select the words, "Helping the homeless", you get "Help-ing the home-less". WTF! But it's fixable as it turns out.)
2) My next step will be to add a generic CSS selector engine. Probably Sizzle from jQuery. This will add a lot more flexibility. So it won't be necessary to add the class "hyphenate" when there are already enough identifiers in the markup to select the elements you want to H&J.
3) After that, there is the problem of the context menu. For some reason, the soft hyphens are showing up as spaces. Help ing the home less. Another WTF. This too, looks fixable, though.
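The cut-and-paste fix in item 1 comes down to removing U+00AD from the copied text. A minimal sketch follows. The helper name is made up, this is not the fix from any library mentioned in this thread, and the browser wiring assumes a modern `copy` event with `clipboardData`, which not every browser offers.

```javascript
const SOFT_HYPHEN = '\u00AD';

// Remove every soft hyphen so pasted text reads "Helping", not "Help-ing".
function stripSoftHyphens(text) {
  return text.split(SOFT_HYPHEN).join('');
}

// Browser wiring (skipped outside a browser): replace the clipboard contents
// with a cleaned copy of the current selection.
if (typeof document !== 'undefined') {
  document.addEventListener('copy', (event) => {
    const selected = window.getSelection().toString();
    event.clipboardData.setData('text/plain', stripSoftHyphens(selected));
    event.preventDefault(); // keep the browser from writing the raw selection
  });
}
```

This only changes what lands on the clipboard. The soft hyphens stay in the page, so the text keeps reflowing and re-hyphenating as before.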

As soon as I've got the selector engine cooking, I'll get some test pages up somewhere for input. Maybe find a way to tackle the runts.

Thanks much. I'll be back.

Bram Stein's picture

@Richard Fink

For the record, I do not know for sure that Internet Explorer uses the Knuth and Plass algorithm for justification when text-justify is set to newspaper. I only compared the justification of my implementation with the one created by Internet Explorer, and found them to be identical for my sample text.

I agree that just hyphenation (using Hyphenator.js or some other script) is the more practical solution right now. I'm hoping to convince browser authors to implement the Knuth and Plass algorithm instead of trying to provide this as a drop-in script. I'm looking forward to your article (is it going to be published at ALA again?) Are you planning on mentioning the sub-optimal justification done by most browsers? I was planning to write a technical article on that, but perhaps I don't need to if you cover it as well.

With regards to re-rendering; almost every change to a page requires re-rendering, the question is where that re-rendering is executed. With Hyphenator.js it is done by the browser, with my implementation it is done by JavaScript. Both are doing essentially the same---calculating font metrics, and re-doing the justification. The greatest slow down in my implementation is not actually the Knuth and Plass algorithm (which runs very fast) but getting font metrics back from the browser. I'm currently not caching any of the measurements I do, so there is room for improvement there. There are however a myriad of other problems as you mentioned. I'm slowly working my way through them, but I doubt it will ever be as good as a native implementation. It is kind of fun to see how far I can push it though :)

jdaggett's picture

One thing to note here, hyphenation controls are included in the Editor's Draft of the CSS3 Paged Media spec:

http://dev.w3.org/csswg/css3-gcpm/#hyphenation

I don't know if there are any implementations of this already (Prince? Antenna House?) but it would be interesting to hear opinions as to whether these properties cover the desired functionality. The hyphenation resource property is interesting; it avoids the problem of how to cover a wide set of languages. Seems like it might be good to include 'auto' in there, to pick up system dictionary resources if available.

Richard Fink's picture

@jdaggett

Thanks very much for the heads up. I'll look over that spec carefully.
I do think Prince supports H&J but not sure how they're implementing it.
Must look.

@bram stein, frode, theunis, et al:

Here's a crude and basic page, but with pretty decent H&J, I think.
Using a font stack of constantia, georgia, serif.
It's the first chapter of Joseph Conrad's Heart Of Darkness.
Looking good in Chrome, FF, IE, Opera, Safari.
You can Page Zoom it, Text Zoom it, and it still holds up. (The font-size is px, so IE's text size menu won't do squat.)

Uses Sizzle as the selector engine and my own fork-in-progress of hyphenator.js.
Many improvements to make, still, but so far, I like the results.
(I would love some font-family/font-size/word-spacing/line-height/column-width widgets on the page to speed up the design process instead of bouncing back and forth from the style sheet to the browser. Only a matter of time, I guess.)

My next move might be to do a print style sheet using points and inches instead of pixel values and see how that pans out. Try some different stuff and print to PDFs to see what's coming out. Break out of the pixel grid a little.

Print stylesheets never got much traction - they remain largely unexplored territory. But if you can include a printable version of a document with really nice typography along with the screen version, that would seem to make a lot of sense to do. No?

Anyway, that seems to be what the folks at Prince and Antenna House that JD mentioned are banking on - with PDF as the intermediary format.

Rich

dezcom's picture

Flatland was one of my favorite books as a child!

Frode Bo Helland's picture

This looks real nice, Richard! An external javascript file is a viable solution until browsers start moving.

Why not ems and/or percentage?
Are there any issues with @font-face?

Theunis de Jong's picture

Your sample looks awesome! Congratulations, it looks just like a PDF from InDesign!

Frode Bo Helland's picture

What does the word-spacing in the CSS have to say for your method?

riccard0's picture

@Richard Fink (3.Jul.2010 2.32pm) It looks good even in Camino (nightly)! :-)

Richard Fink's picture

@frode
>Why not ems and/or percentage?
Ems and percentages will work fine. Remember, the font always gets rounded out to a pixel "computed style". So if ems and percentages are your preference, not a problem.

>Are there any issues with @font-face?
No problem in IE. But I'm not sure what's going on with the other browsers. Last I checked, FF *did not* print with an @font-face font. It's an important question - I have to check that out.

>What does the word-spacing in the CSS have to say for your method?
I just usually prefer a tighter word spacing than the browser usually provides. And, of course, if you tighten it up (or loosen) it will affect where the line breaks occur.
And the font plays a role, too - for word-space values you should always use ems.

@theunis
why thanks. I've read Heart of Darkness I think, twice, in my life. Today I found myself reading it again. Now I'm going to have to set the other two chapters so I'm not frustrated.

@riccard0
Thanks for checking in Camino. I just bought a Mac for the first time in my life about a week ago but believe it or not I haven't turned it on yet. (Been too busy and I want to take my time with it.)
You reminded me that soon I'll be able to test outside of Windows. Yay!

Mathias Nater's picture

Hi

Thanks for all your nice words about hyphenator.js (It makes me feel great ;-)
I just stumbled upon this very interesting discussion and I'd like to share some thoughts:

(a) When I started the hyphenator.js project I wasn't sure if it is right to do such things in JavaScript. In my opinion this definitely should be done by the browser. But the parts of the CSS3 specification treating hyphenation haven't changed for a long time and there doesn't seem to be much work on it currently. So I thought it would be nice to have this 'crutch'. And I hope that it is used and that the demand for a native solution grows with it. I think that Bram Stein's excellent work tends to create the same demand.
And if there are some browsers supporting hyphenation and/or hq linebreaking it's great to have these 'crutches' for the others in the meantime.

(b) Like every JS-library guy, I'm struggling with browser flaws and inconsistencies. It's often hard to decide whether I should wait for the browser to fix it or do a hack to work around it.
The issue with copy and paste is just one example. BTW: it's fixed. Thanks to sweet-justice
@Richard Fink: it's the same approach you made. We did some double work here. Are there more such nice things to come?

(c) Many people are asking for new features. Meanwhile I'm trying to hold the library as small as possible and for most requests there are two answers: 'it's already in there' or 'it shouldn't be in there'.
Having a selector engine to select elements to be hyphenated is one of them. In hyphenator.js one can redefine the 'selectorfunction', so it's not necessary to include an engine that would have to be loaded twice if other libs already use it.

(d) I did a lot of investigation about how to avoid the 'shotgun method' (nice wording, though;-). The problem is that changing the DOM is very expensive and measuring lines and getting hints about where the line is broken means inserting a lot of spans (and re-doing the whole process upon resizings). So the actual method is ugly but turned out to be very effective.

Greetings from Switzerland,
Mathias

John Hudson's picture

Rich, the justified Conrad example looks really good. It even works pretty well on my mobile with, obviously, a narrower column and text a bit too small.

By the way, ‘Justified Conrad’ sounds like an indie rock band.

Richard Fink's picture

@John Hudson
JH>It even works pretty well on my mobile with, obviously, a narrower column and text a bit too small.

Thanks.
Media Queries (CSS3) hold the most promise for getting the text to fit the viewport with the least amount of hassle. But there's stuff that can be done now, today, that will work pretty well, too.
I played around with H&J quite a bit last year and there's tricks I didn't pull out for this demo.
For font sizing - unless you provide a Text Size widget in the page itself - we're stuck with Zoom. But it does Zoom and re-hyphenate without any hiccups.
Makes you wonder WTF is up with iBooks, doesn't it?

@Mathias Nater

You da man! I like the way you constructed hyphenator.js a lot. It's easy to follow what function is doing what. Easy to see where to hook in something like a selector engine, for example. Or the copy-paste fix. (Next problem is dealing with the right-click contextmenu.) The browser makers should be automatically stripping the soft-hyphens. I'm going to file bug reports on it and personally bug the sh-t out of them about that.
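The core of any copy/paste fix is removing the soft hyphens before the text reaches the clipboard. The sketch below shows that step plus one possible way to hook it to the browser's `copy` event. It is an assumption about how a page could do this, not the code used by hyphenator.js or sweet-justice.

```javascript
// Remove every soft hyphen (U+00AD) from a string.
function stripSoftHyphens(text) {
  return text.replace(/\u00AD/g, '');
}

// Browser-only wiring; skipped when no DOM is present.
if (typeof document !== 'undefined') {
  document.addEventListener('copy', function (event) {
    const selected = String(window.getSelection());
    event.clipboardData.setData('text/plain', stripSoftHyphens(selected));
    // Stop the browser from writing the raw selection over our cleaned text.
    event.preventDefault();
  });
}
```

This handler copies plain text only, so any rich formatting in the selection is dropped.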
>the actual method is ugly but turned out to be very effective
It seems wasteful but because the text is sure to reflow due to Zoom or Text Size or the size of the container element or viewport, it's the *necessary* approach.
It works.
>We did some double work here.
I was going to contact you. We should join forces, definitely. Now that the copy/paste issue is pretty well solved, I think H&J can fly. Even if it remains in JavaScript for a long time, I don't necessarily see performance as an issue. The JavaScript engines in modern browsers are incredibly fast and if you look at the stats, script execution time is almost never the bottleneck.
Hyphenator.js as a jQuery plug-in perhaps?

(BTW - my wife's family on her mother's side are Swiss. Lugano. Emigrated to the states after the war. She spent summers there growing up.)

Richard Fink's picture

@Mathias Nater
>http://bitbucket.org/webvariants/jquery-hyphenator/wiki/Home

Thanks! I have to experiment with it. I'll post a page along with some other plugins that add some touchscroll functionality and other fun stuff.
(I don't know about anybody else, but I find it easier to read a scrolling document if I can push it up slowly with my mouse acting like a finger on a touchscreen. Like you can do in a PDF. Makes it easier to keep my place in the text.)

I also found out just today that the epub format can be unpacked, the individual files posted on a web server, and read by a javascript library.

This is all getting more and more interesting.
Being done here with epubjs.

Now, I don't think anybody's incorporated hyphenator.js into *that* yet.

Here's more info, it seems there are some other ePub javascript readers, too:
http://ajaxian.com/by/topic/ebooks

I'm going to start writing my article in earnest now, and shine a light on all this stuff.
I'll email you and come back here, too.

@frode
>Are there any issues with @font-face?
No.
As for printing, I found (testing in IE and Firefox) that as long as the style sheet has the right media attribute or @media rule, "all" or "print", an external font prints fine, too. And in the print style sheet I'm able to get the in-between font sizes, like the difference between 16pt, 16.5pt, and 17pt, which the browser won't distinguish on screen because everything has to snap to the pixel grid. (Values of 12.5 px not being allowed.)
Makes me wonder what's going to happen when screens are high-res enough to do it, too.

Bram Stein's picture

@Mathias Nater,

Good to see you join the discussion! How was BachoTeX? I couldn't make it unfortunately.

I'm not sure if I can take justification to the same level that you've taken hyphenation. Doing proper justification requires a lot of measuring and DOM modifications. What it comes down to is redoing (in JavaScript) most of the work the browser has already done. I'm actually surprised by how far I've gotten.

You said people are asking for more Hyphenator.js features. Here's a new one: I would like to have fewer features. Do you have any plans to make Hyphenator.js more modular? I am mainly interested in the hyphenation core and don't have much need for all the DOM manipulation code. You could of course bundle the modules in a single file for distribution like you do now.

@Richard Fink

I'm actually working on an ePub reader with proper hyphenation and justification, hence my interest and recent work on the subject :)

Dealing with the pixel grid has been interesting: it seems WebKit-based browsers round values before they use them in their calculations, while Gecko-based browsers (Firefox) use the values and then round the result. I of course prefer the latter, because I can then set word-spacing to values of less than one pixel and have the browser distribute the spacing evenly before snapping to the pixel grid. (You can do this manually in WebKit-based browsers, but it is a bit more involved.)
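One possible way to do the manual distribution, sketched under the assumption that each word gap can be given its own whole-pixel width: round the cumulative target instead of each gap, so the rounding error never builds up. The function name `distributeSpacing` is hypothetical, and this is not necessarily how TypeSet handles it.

```javascript
// Spread `extra` pixels over `gaps` word gaps using whole pixels only.
function distributeSpacing(extra, gaps) {
  const perGap = extra / gaps;
  const widths = [];
  let used = 0;
  for (let i = 1; i <= gaps; i++) {
    // Snap the ideal cumulative width to the grid, then give this gap
    // whatever is needed to reach it.
    const target = Math.round(perGap * i);
    widths.push(target - used);
    used = target;
  }
  return widths;
}

// 5 extra pixels over 8 gaps: 0.625 px each in the ideal case.
console.log(distributeSpacing(5, 8)); // [ 1, 0, 1, 1, 0, 1, 0, 1 ]
```

Rounding 0.625 px per gap up front would give 1 px in each of the 8 gaps, 3 px more than the line has room for, while the cumulative version adds up to exactly 5 px.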

Mathias Nater's picture

@Richard Fink
>Now, I don't think anybody's incorporated hyphenator.js into *that* yet.
I think calibre has.

@Bram Stein
BachoTeX was great. Really friendly and very interesting people. The talks (at least the English ones) were great and pointed to the future. I've got a bunch of work to do ;-)

I'll think about making Hyphenator.js modular. It seems to be a logical step now. But you'll have to be patient.

Richard Fink's picture

@mathias nater
>I think calibre has.
thanks for the info. I'll check it out.

I was just quickly looking over the additions to version 3 - absolutely excellent.
Adding DOM storage was a great idea.
BTW - one thing of convenience that's easy to provide is a minified version of hyphenator.js. Can do?
Or would you rather users do it on their own?

Also, one thing I just ran into with a little iPhone demo page I was playing with is the need to add an event to each paragraph. Where in the code might *you* see that fitting in?

Anyway, great, great, great - (you too Bram!) - and I guess I'll start posting questions and suggestions on the Hyphenator.js Google Code site.

Regards,

Rich

Richard Fink's picture

Back already - just got twittered with a link to an interesting page.
Needs hyphenation, though!

http://lamb.cc/typograph/

rich

Mathias Nater's picture

There's a tool called mergeAndPack in the bundle (http://hyphenator.googlecode.com/svn/trunk/mergeAndPack.html)
It merges core patterns and settings in one single file and packs it. It's really easy to handle but on most sites the source version is used. People don't read the how-to's:-)

Those upcoming features (webstorage) are great. I'm planning to use webworkers, too. Already did some work, but it was too early. The time has come now…

Richard Fink's picture

@mn
Yeah, I saw the merge and pack. An excellent idea as well. There's more optimization to be had from cutting down the HTTP requests, by incorporating everything into a single .js file, than from anything else. At least on the first visit, with nothing in the browser cache, that's for sure.

Also, as long as we're rockin' and rollin' in the optimization department, there's a little-used default behavior in IE6, 7, and 8 (don't know about 9 yet) called the userData behavior that gives you expanded local storage. Like a super-large cookie. Sort of a precursor to webstorage.

http://msdn.microsoft.com/en-us/library/ms531424(VS.85).aspx
The userData behavior persists information across sessions by writing to a UserData store. This provides a data structure that is more dynamic and has a greater capacity than cookies. The capacity of the UserData store depends on the security zone of the domain. The following table shows the maximum amount of UserData storage that is available for an individual document and also the total available for an entire domain, based on the security zone.

I've used it and it works. I'll play around with it as time allows and let you know if I think it can be helpful to Hyphenator.

rich

Mathias Nater's picture

@rich
>the userData behavior that gives you expanded local storage.

IE8 does support DOM Storage (http://msdn.microsoft.com/en-us/library/cc197062(VS.85).aspx) and Hyphenator degrades gracefully for older versions. Adding code for non-standard features in outdated browsers isn't the right way (imho) as long as the library degrades gracefully.
So there is no use for that. Thanks anyway.
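Graceful degradation of this kind can be sketched as a small wrapper that uses DOM Storage when it works and silently falls back to memory when it doesn't. The names `createCache` and `safeLocalStorage` are hypothetical, and this is an illustration of the idea rather than Hyphenator's actual code.

```javascript
// Wrap a Storage-like object; fall back to an in-memory map when the
// object is missing or unusable.
function createCache(storage) {
  const memory = new Map();
  let usable = false;
  try {
    if (storage) {
      // Probe with a throwaway key: storage can exist but still refuse writes.
      storage.setItem('__probe__', '1');
      storage.removeItem('__probe__');
      usable = true;
    }
  } catch (err) {
    usable = false;
  }
  return {
    persistent: usable,
    get: function (key) {
      if (usable) return storage.getItem(key);
      return memory.has(key) ? memory.get(key) : null;
    },
    set: function (key, value) {
      if (usable) storage.setItem(key, String(value));
      else memory.set(key, String(value));
    }
  };
}

// Reading window.localStorage can itself throw, so guard the lookup too.
function safeLocalStorage() {
  try {
    return typeof window !== 'undefined' ? window.localStorage : undefined;
  } catch (err) {
    return undefined;
  }
}

const cache = createCache(safeLocalStorage());
cache.set('example-key', 'cached pattern data');
```

Older browsers simply get a cache that lasts for the page view, with no browser-specific code path to maintain.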

Richard Fink's picture

@mathias
Well, it would all depend on how much bang you get for the buck. In this case, I think you're right. Not worth the extra code for IE7 and 6. 'Twas just a thought.

rich

Bram Stein's picture

For those interested, I've fixed some of the bugs that were mentioned here before and created a new example: http://www.bramstein.com/projects/typeset/flatland/

It doesn't use external fonts so it is pretty fast (and in my measurements about twice as expensive as hyphenation.)
