Eye-Tracking Measures Provide a “Metrics of Readability”

enne_son's picture

At the link below is a study of the effects of intraword and interword spacing that uses the eye-tracking techniques standardly used to study eye movements in reading, and presents the eye-tracking measures as providing a “metrics of readability.” This is something of a departure, since most studies using eye-tracking techniques use the measures to draw conclusions about processing mechanisms. Recently the authors, Timothy J. Slattery and Keith Rayner, also did a study on the influence of text legibility on eye movements during reading (specifically in relation to ClearType technology).

The shift in eye-movement studies to talking about readability is noteworthy in relation to the association, in typography and type design circles, of readability with “the ease with which the eye can absorb the message and move along the line” [J. Ben Lieberman, 1967].

According to the authors, their results indicate that the optimality of intra- and interword spacing is font specific; font designers are doing a relatively good job at selecting default intraword spacing values; an optimal amount of interword space will be a balancing act similar to finding an optimal amount of intraword space; and there is an interword versus intraword offsetting effect relating perhaps to the countervailing demands of parafoveal and foveal previewing / viewing, and the benefits to foveal processing of an effective parafoveal preview.

The authors also suspect that, [paraphrasing here] for the purpose of reading, words are more important objects than letters: words are the important processing unit for reading. They argue that [paraphrasing again] it is the properties of words and their recognition that influence eye movements during reading. “While successful letter perception is a necessary step in reading, the bottleneck in reading performance is with word recognition.” Associated with this there is a recognition that disruption of “the integrity of word units” is an issue with increasing intraletter spacing.

How would using eye-tracking measurements to provide a “metrics of readability” work? Probably more needs to be done to formalize this, but in their analysis Slattery and Rayner look at “global measures” and “target-word-dependent measures” to form a global picture of effects, one that looks beyond significance scores to trends. Typophiles interested in pursuing this can buy the paper, or use their institutional access if they have one, by going to:
http://link.springer.com/content/pdf/10.3758/s13414-013-0463-8.pdf

“Effects of intraword and interword spacing on eye movements during reading: Exploring the optimal use of space in a line of text” Timothy J. Slattery, Keith Rayner. Attention, Perception, & Psychophysics, May 2013

Abstract
Two eye movement experiments investigated intraword spacing (the space between letters within words) and interword spacing (the space between words) to explore the influence these variables have on eye movement control during reading. Both variables are important factors in determining the optimal use of space in a line of text, and fonts differ widely in how they employ these spaces. Prior research suggests that the proximity of flanking letters influences the identification of a central letter via lateral inhibition or crowding. If so, decrements in intraword spacing may produce inhibition in word processing. Still other research suggests that increases in intraword spacing can disrupt the integrity of word units. In English, interword spacing has a large influence on word segmentation and is important for saccade target selection. The results indicate an interplay between intra- and interword spacing that influences a font’s readability. Additionally, these studies highlight the importance of word segmentation processes and have implications for the nature of lexical processing (serial vs. parallel).

hrant's picture

{To Follow}

Nick Shinn's picture

So, they have discovered that spacing (“selecting default spacing values”) is a part of typeface design.
Should one applaud or groan?

hrant's picture

Larson's answer might be revealing...

hhp

John Hudson's picture

Prior research suggests that the proximity of flanking letters influences the identification of a central letter via lateral inhibition or crowding. If so, decrements in intraword spacing may produce inhibition in word processing. Still other research suggests that increases in intraword spacing can disrupt the integrity of word units.

In other words, existing research suggests that there is a sweet spot for intraword spacing, which is wide enough to minimise the effects of crowding but not so wide as to disrupt the integrity of the word as a visual unit. That conclusion shouldn't be any kind of surprise to any type designer or typographer. What might be interesting is developing methods to experimentally quantify that sweet spot, and hence determine whether that sweet spot corresponds to conventional text spacing as practiced by type designers and typographers.

hrant's picture

experimentally quantify that sweet spot

Yes.
But you need a field scientist who's willing to give boumas a chance.

hhp

enne_son's picture

[Nick Shinn] So, they have discovered that spacing (“selecting default spacing values”) is a part of typeface design.

No, the discovery, if that’s what you want to call it, is that spacing influences a number of eye-tracking measures.

Sentences like “They shouted at the driver who wildly cut them off” were used. In the first experiment, 10-point Cambria and 10-point Times New Roman with Microsoft ClearType subpixel rendering were used. There were four levels of spacing: reduced by half a pixel, normal, increased by half a pixel, and increased by a full pixel.

The global measures of sentence reading used were: mean fixation duration, number of fixations, total sentence reading time, and comprehension question accuracy. The “target-word-dependent measures” used were: first-fixation duration (the duration of the first fixation on the target word), gaze duration (the sum of all first-pass fixations on the target word), skipping rate, and the length of the critical saccade that landed on (or beyond) the target word. Of the target-word-dependent measures, the first fixation duration is interesting:
Spacing:          –1/2 px    normal     +1/2 px    +1 px
Cambria:          263 (7.7)  244 (7.9)  248 (8.7)  253 (6.5)
Times New Roman:  261 (9.4)  255 (8.0)  236 (8.1)  246 (8.2)
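For anyone curious how such measures are actually computed from raw eye-tracker output, here is a minimal sketch. The `Fixation` record, its field names, and the sample data are all hypothetical illustrations of the measure definitions given above, not Slattery and Rayner's actual analysis pipeline:

```python
# Hedged sketch: computing the global and target-word-dependent measures
# described above from a hypothetical fixation log. The Fixation record and
# field names are assumptions for illustration, not the authors' pipeline.
from dataclasses import dataclass

@dataclass
class Fixation:
    word_index: int   # which word in the sentence the eye landed on
    duration_ms: int  # fixation duration in milliseconds

def global_measures(fixations):
    """Mean fixation duration, number of fixations, total reading time."""
    total = sum(f.duration_ms for f in fixations)
    n = len(fixations)
    return {"mean_fixation_ms": total / n,
            "n_fixations": n,
            "total_reading_ms": total}

def target_word_measures(fixations, target):
    """First-fixation duration and gaze duration for a target word.

    Gaze duration sums only first-pass fixations: the run of consecutive
    fixations on the target before the eye first leaves it."""
    first_fix = None
    gaze = 0
    in_first_pass = False
    for f in fixations:
        if f.word_index == target:
            if first_fix is None:
                first_fix = f.duration_ms
                in_first_pass = True
            if in_first_pass:
                gaze += f.duration_ms
        elif in_first_pass:
            in_first_pass = False  # first pass ended; later revisits don't count
    return {"first_fixation_ms": first_fix,
            "gaze_ms": gaze,
            "skipped": first_fix is None}

# Word 2 is reached only by a regression after the eye has moved on:
fixes = [Fixation(0, 210), Fixation(1, 240), Fixation(1, 180),
         Fixation(3, 250), Fixation(2, 220)]
print(global_measures(fixes))
print(target_word_measures(fixes, target=1))
```

The first-pass distinction matters: a word refixated after a regression contributes to total reading time but not to its own gaze duration.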

For total sentence reading time (a global measure) there is this chart:
www.enneson.com/public_downloads/typophile/2013_slattery+rayner_chart.jpg
[insert image not working for me]

Not all the measures are equally informative, and some show complicating trends, but that’s why a ‘metric’ of readability has to be developed around them.

eliason's picture

In well-functioning text types, interletter spacing should bear a certain balanced relationship to "intraletter spacing," i.e. counterspaces. Changing the spacing of a given typeface "breaks" that relationship, and makes me wonder whether the measured reading inefficiencies reflect that the adjusted spacing is not in the "sweet spot," or just reflect this brokenness instead.

Would it make sense to test well-designed condensed and expanded cuts of a typeface family in place of altered regular fonts?

John Hudson's picture

Craig: In well-functioning text types, interletter spacing should bear a certain balanced relationship to "intraletter spacing" i.e. counterspaces.

That's a given for most typographers. But I believe in terms of empirical testing it would constitute only an hypothesis, i.e. something that has not yet been experimentally confirmed but on which one could base reasonable predictions that could then be tested with a well-designed experiment. Bill Berkson has taken the hypothesis one step further in suggesting a reason why it might be the case: mental alignment of features to a perceptual grid that is aided by regular arrangement of visual features, i.e. your 'certain balanced relationship'.

Nick Shinn's picture

This “metrics of readability” is just a tracking test of a couple fonts.

If the optimality of tracking is font-specific, what conclusions can be drawn, other than that readable typography is well tracked?

Déjà vu!—
http://typophile.com/node/41365

Kevin Larson's picture

The question being addressed in this paper is what happens to the process of reading when you change the amount of space between letters, and when you change the amount of space between words. Eye tracking lets us see if these manipulations change the amount of time a word is fixated, the length of the saccades, and the number of regressions (backward saccades).

While everyone recognizes that there is a sweet spot of optimal letter and word spacing, I don’t think it’s clear where that sweet spot is located, nor is it clear what happens when the spacing is too tight and when it is too loose. If everyone agreed on the best amount of spacing then we wouldn’t see the difference between typefaces that we see today. In this study Times New Roman has a tighter default spacing than Cambria.

My multidisciplinary team (psychologist, typographers, mathematician, and computer scientists) was interested in this question, and funded Slattery & Rayner to find out. We also provided them with the tool that they used to generate their spacing conditions. Before I say anything about the results of their studies, would anyone like to guess what happened to fixation times, saccades, and overall reading time when A) letter spacing changed, B) word spacing changed, and C) if those changes impacted TNR and Cambria differently?

Nick Shinn's picture

It’s a meaningless test unless you isolate the variable in one typeface.

“Cambria” and “TNR” are not scientific values.

I don’t believe there is a sweet spot for a font’s H&J values (“letter spacing and word spacing”)—it varies with document design, textual content, document length, media, reading conditions, and reader demographics.

John Hudson's picture

Kevin: If everyone agreed on the best amount of spacing then we wouldn’t see the difference between typefaces that we see today. In this study Times New Roman has a tighter default spacing than Cambria.

In this regard it is important to understand why different fonts are spaced the way they are, and hence why particular apples and oranges might not be the best subjects for this sort of testing. Times New Roman shows all the signs of inherited unitised advance widths from previous typesetting technology. Take a look at the advance widths of the upper- and lowercase letters: they are a small set of common widths, based on multiples of approximately 115 units (the actual number varies, due to rounding from the original unit metrics to the TrueType 2048 UPM grid). So this is a font that has been spaced not with regard to 'the best amount of spacing', but according to the limitations of obsolete technology.

[Unitised widths will have had an impact on letter proportions as well as interletter spacing. Note, for instance, that the counters of the lowercase m are actually slightly wider than that of the n, which is the opposite of most design practice.]

Thomas Phinney's picture

I expect that in general, changing the tracking (all spacing including intra-word) or the between-word spacing will harm readability.

That being said, Times probably won't get hurt much by just a *little* increased tracking. It might even be an improvement. That would be because it was optimized to balance saving paper against maximizing legibility.

Nick: "It’s a meaningless test unless you isolate the variable in one typeface."

Can you be more specific? My reading of this is that they did the full set of tests on *each* typeface. How would that be meaningless, and what could they have isolated better?

dezcom's picture

It seems clear from experience that spacing is affected by weight and by the absence or presence of serifs. This is the relative interaction of counter space to letter space and proximity attraction/repulsion (also stroke contrast). Generally, the bolder weights need less letterspacing (which makes spacing extrabold weights tougher than light and open weights). It would make sense to assume that a reader is looking for an amount of space where the word holds together as a word without either falling apart into discrete letters or jamming into an unclear mass. The "how much" part is very much dependent on glyph-shape interactions and would be very difficult to generalize about across typefaces. Since they mention pixel units, I assume the whole study refers to screen reading, not print. Units are much cruder on screen than in print, so the sweet spot would appear to be less forgiving than in print. One unit of tracking in print is a far finer change than one pixel in screen reading at a comparable size.

Nick Shinn's picture

Thomas, I think John just did so very nicely!

In this test, what is the point of testing Cambria vs. Times?

Certainly, it confirms that Cambria, designed by Microsoft for digital display, reads faster on screen at 10 pts than Times, designed long ago for letterpress, somewhat out of its element.

So this research validates Cambria according to the readability mantra associated with corporate efficiency.

But what does this reveal about the “metrics of readability”?
It’s been obvious to typographers for 20 years that Times, even the heavily hinted “core TrueType” fonts distributed with IE, has some spacing issues—accidental ligaturing, for instance—at low resolution screen display.

Now let’s see Cambria go up against a WebType font!

enne_son's picture

In exchanges with other researchers doing spacing tests I've tried to argue for a better benchmark than default values.

In my personal thinking about spacing I’m beginning to work with a notion of “cohesive equilibrium.” A metric of cohesive equilibrium was introduced by William Dillard Orbison way back in the 1930s in a little-known paper published in 1939 in The American Journal of Psychology, titled: "Shape as a Function of the Vector-field." The paper uses ideas about dynamic forces present in gestalts which resemble those Chris described two posts above. This notion of cohesive equilibrium links up intuitively with some of the ideas I’ve explored on typophile about “narrow phase alignment” in type, and with Gerrit Noordzij’s notion of a rhythmic cohesion or equivalence of the whites inside and between letters.

My thought is that type designers and typographers shoot for a cohesive equilibrium — the sweet spot — in their design processes, and any changes to optimal spacing introduced by increasing or decreasing between-letter space relative to within-letter space will disturb the cohesive equilibrium and affect readability. So a gauging of cohesive equilibrium for letters in words could provide a spacing benchmark. I don’t know if the Orbison metrics can actually find positions of cohesive equilibrium in type. His study uses geometric shapes superimposed on complex geometric fields. I’ve pointed out before that Fourier Transforms (frequency channel) appear to gauge phase alignment in type, independent of style, weight, size and contrast, so now I’m thinking Fourier Transforms will help gauge cohesive equilibrium quickly and easily.
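The Fourier idea can be sketched in a few lines: project a line of text onto a 1-D horizontal ink-density profile and inspect its spectrum. A sharp peak at the stem frequency indicates a regular rhythm, while jittering stem positions (narrow rather than perfect phase alignment) keeps the peak but spreads some energy into neighbouring bins. The synthetic pulse trains below stand in for scanned lines of type; everything here is an illustrative assumption, not the Orbison metric or any published method:

```python
# Hedged sketch: frequency-domain view of stem rhythm in a line of "type".
# Synthetic pulse trains stand in for scanned text; illustrative only.
import numpy as np

def rhythm_peak(profile):
    """Dominant non-DC spatial frequency bin and its share of spectral energy."""
    spectrum = np.abs(np.fft.rfft(profile - profile.mean()))
    k = int(np.argmax(spectrum))
    share = float(spectrum[k] ** 2 / np.sum(spectrum ** 2))
    return k, share

def pulse_profile(centers, width=5, n=512):
    """Binary 'ink' profile with a stem of the given width at each center."""
    p = np.zeros(n)
    for c in centers:
        p[c:c + width] = 1.0
    return p

rng = np.random.default_rng(0)
grid = np.arange(8, 512, 16)                   # 32 stems on a perfect 16-px grid
exact = pulse_profile(grid)                    # perfect phase alignment
near = pulse_profile(grid + rng.integers(-2, 3, grid.size))  # stems near, not on, the grid

k_exact, share_exact = rhythm_peak(exact)      # peak at 512/16 = 32 cycles
k_near, share_near = rhythm_peak(near)
print(k_exact, share_exact, k_near, share_near)
```

The comparison of the two energy shares is the point: a metric along these lines would quantify how tightly the stem information clusters around its phasal mean, rather than prescribe a spacing formula.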

The question for psychologists would be what happens to the process of reading when you change the amount of space between letters so letters are no longer in positions of cohesive equilibrium, and how does word spacing modulate this.

The mathematical challenge would be to find algorithms that capture the narrow phase alignment typical of well-designed and well-spaced type, and use this alongside of a “metrics of readability.”

The thing about the eye-tracker data is that some measures appear to point in slightly different directions and might lead to conflicting inferences. For example, the first-fixation durations (a target-word-dependent measure) align pretty well with the overall reading time (a global measure) for experiment 1 (see the data and link in my earlier post), but the average fixation duration data (a global measure) for experiment 1 shows a different trend: the +1/2 and +1 spacing perform progressively better. So a key will be to learn how to read the data in such a way that they will indeed provide a reliable metric of readability.

Nick Shinn's picture

Yes, default values are discretionary, determined by the foundry.
Kerning has an effect too.
John Hudson has suggested some kind of “envelope” algorithm for spacing.

This kind of tracking (typographic, not eye) test seems particularly divorced from the spacing considerations type designers address.

While Fourier analysis does reveal some kind of pattern grid, I would think that spacing design acts against the grid in a subtle manner, so how would readability metrics even begin to identify how that happens?

How does fence-posting affect reading speed?
How do stem thickness and contrast affect fence-posting?

How would a readability metrics theory be able to accommodate both the incunabula and neoclassical methods of spacing?—


(Image from the Typophile thread discussing Richler’s spacing: http://typophile.com/node/102875)

enne_son's picture

[Nick] While Fourier analysis does reveal some kind of pattern grid, I would think that spacing design acts against the grid in a subtle manner, so how would readability metrics even begin to identify how that happens?

Yes! that’s why it’s narrow phase alignment. In narrow phase alignment the information coming from the vertical strokes and the vertical means of curved strokes clusters around a phasal mean but rarely sits right on it. It’s emphatically not perfect phase alignment. A metrics of readability is not a spacing formula, but would be a way of weighting and integrating the measures eye-tracking technology provides.

In an earlier thread we were able to see subtle but distinct differences in the kind of phase alignment different spacing methods produce.

Kevin Larson's picture

There are many interesting ideas for further study here. Peter, I too am interested in having a metric for evaluating the evenness of spacing within a typeface, though have not been satisfied with anything I’ve seen yet.

It is not the goal of this test to compare TNR against Cambria. We’ve made no press releases publicizing these findings.

The goal is to learn more about the role of letter and word space in reading: what happens to eye movements when you reduce letter space, when you reduce word space, when you increase letter space, and when you increase word space? These results could be used to inform future typeface designs.

Since only Peter has had the opportunity to read the paper, would anyone else care to predict what happens to eye movements when letter and word space are increased or decreased?

dezcom's picture

The logic of the sweet spot is that any digression from it in either direction (given all other variables remain the same) would cause degradation. The weight/contrast/shape values would determine the degree of degradation. I would assume the heavier weights would suffer more than the lighter ones. I would assume Times would suffer more than Cambria, as well.

Kevin Larson's picture

Chris, by degradations, I assume that you mean that all of the above would happen: fixation times would increase, forward saccades would get shorter, and the number of backward saccades would increase. Do different things happen when the letters are too close together versus too far apart? Do different things happen when words are too close together versus too far apart?

dberlow's picture

I think, e.g. this should never be said idly...
"...the goal is to learn more about the role of letter and word space in reading: what happens to eye movements when you reduce letter space, when you reduce word space, when you increase letter space, and when you increase word space? These results could be used to inform future typeface designs."

...without qualifying at least the ballpark of resolution in which the study is being done.

What is resolution in this study? After all, if everything that is true about reading were resolution-independent, it would be further proof that we know nothing.

Then, most likely...the goal is to learn more about the role of letter and word space in reading at coarse screen resolutions: what happens to eye movements when you reduce letter space, when you reduce word space, when you increase letter space, and when you increase word space in huge increments relative to what readers are used to, and at lower resolutions than they are accustomed to reading?

These kind of results have already been used to inform typeface designs for text at lower resolutions. You see, it's best to leave these types of fonts alone (RE series, Verdana, Poynter Agate, e.g.), as far as spacing goes. It's only fonts that don't work well at low resolutions and small sizes, that "seem to" react well to re-spacing in low resolutions, you see...

dezcom's picture

" by degradations, I assume that you mean that all of the above would happen: "
Kevin,
What I mean is that there may be some correlation between eye movements and reading variables, but I don't buy that we really know cause and effect properly. Therefore, my use of the term "degradation" refers to reading efficiency as a generality, without determining whether forward saccades get shorter or whether there are more regressions. Before we can, with certainty, rule out other factors as contributory (such as awkward sentence construction, unclear wording, or readers' knowledge of the words or context), I don't know what to make of regressions. If I read a word or words that remind me of something else, I am sure to have a regression even if I know the meaning intended by the author. I don't know how you can rule out attentiveness or distraction by other mental processes not in evidence. Someone with ADD may be a prime case. Another just might be me seeing the possibility of a pun unintended by the author, or not even consciously brought on by me.

dezcom's picture

and yes, "different things could happen" that might be interpreted as degradation, but I don't know what they would be. I might guess that too open a spacing would cause a reader to look more closely at individual letters instead of perceiving the word as a unit. My guess with too tight a spacing is that the shapes between letters could be confused with counters and cause confusion like the r-n as m problem, c-l as d, v-v as w. Or just an unclear mashup of forms that become confusing and need to be reread more times in context to decipher meaning.

Chris Dean's picture

[to follow]
