Assume this is a new study...
Next up, students instructed on how to do laundry: Arial vs Old English (a hard-to-read font that looks like it was written with a pen).
"Assume this is a new study..."
Why? It looks like the same study to me, i.e. useless to anyone but the utterly typographically ignorant who also need instructions. ;p
What’s sad is Song & Schwarz getting picked up by every cheesy news site on Earth, but the only people who read the Yaffa story about Clearview HWY are probably smart enough to not demand designers use Arial in the first place.
This is the same study. While I enjoy roasted peanuts (fancy!), does anyone want to propose an improvement to the Song & Schwarz study? If the article was about a painting technique, would people think the technique easier to perform if they were reading about it in Brush script than in Arial? Which fonts would be better than Arial for convincing people that an exercise program is easier?
…does anyone want to propose an improvement to the Song & Schwarz study?
For starters, bring more typefaces into the mix; two just isn’t enough, and the sample needs to be much larger. And before the faces are chosen, serious thought needs to be given to what actually makes a typeface easier to read. I doubt that many people could write a convincing explanation of why Arial is more readable/legible than Brush Script that most undergraduate design students couldn’t shoot full of holes.
"...does anyone want to propose an improvement to the Song & Schwarz study? "
Yes. I would like S&S to publish the composition they used in the study. Any other suggestions would have to follow examination of the composition(s). Thanks!
Hyunjin Song was kind enough to send me the Word document that contained their test material. The study participants saw printed documents, so that eliminates the letterspace rounding errors. This image contains the first four bullet points from the recipe stimulus. Any given participant gets either the page with the recipe in 12 point Arial or the page with the recipe in 12 point Mistral. When printed, the first line of the Arial stimulus is 14.9 cm long from the bullet to the period after tofu.
The easy way to get a significant outcome (and justify your funding) is to start with a preconceived result and make one option "designed to fail".
How can you take this seriously Kevin?
Now, if they had budgeted a typographer into the mix, and said "Arial and Mistral, a recipe with the same copy (text)--do the best you can with each," then maybe you'd have a fair comparison. But still la-la land science, because those just aren't typefaces that professional designers use for recipes.
You should at least set these fonts to the same x-height. What kind of science is this?
(BTW I bet that Mistral would win in that case…)
. . .
Bert Vanderveen BNO
Nick, I think you continue to misunderstand the point of this study. They found that people who read the recipe in Arial thought that it would be faster to complete the steps in the recipe. It’s utterly boring to say that one was designed to fail. It’s 12 point Arial and 12 point Mistral. They are what they are.
Bert, your prediction is more interesting. You are predicting that a larger point size of Mistral would result in people believing that the recipe is easier than with Arial. Could you generalize this prediction? If the recipe seems easier to complete at a larger point size of Mistral than at a smaller one, is it true that all other fonts will also perform better at larger point sizes? Alternatively, is there some point size of Mistral at which people will believe the recipe is easiest?
Sheesh! As designed, the whole rigmarole is utterly ludicrous. Might as well give one group of test subjects recipes printed in R. Crumb's hand-lettering, say from the "Zap Comix" heyday, and the other group the identical recipe printed in Schaftstiefelgrotesk and afterwards ask them which version might demand more culinary discipline. And to think this so-called study was financially underwritten to boot. [shakes head in disbelief; exits stage left]
I'm posting this on behalf of Norbert Schwarz:
Hi Typophiles -- Kevin kindly forwarded the link to this discussion and I have to say it's amusing. If you had ever bothered to read the paper you discuss with so much engagement, you would have noticed that it is NOT about type fonts. It is about inferences from the subjective experience of ease of processing and in other studies we use manipulations like ease of pronunciation, figure-ground contrast, repetition, distraction etc. All we care about is that our difficult condition differs from our easy condition in terms of experienced ease of processing. Arial and Mistral meet this criterion.
No typographer is needed to figure that out and no position has been taken in your turf wars. Relevant papers are at
Best, Norbert Schwarz
it is NOT about type fonts
And yet you suggest that one "type font" is measurably easier to process than another?
That is not true.
You might just as well say that a trumpet is easier to hear than a piano.
It shouldn't be about fonts.
It should be about typography.
Because it is not the choice of typeface, but the way it is set which determines its readability.
And yet you blithely speak of "easy to read type fonts" as if such things exist objectively.
That has more typographic truth than scientific truth.
What scientists don't seem to be able to grasp is something that designers are very familiar with, namely that the metacognitive qualities of typography are as much cultural as physical.
As I said before, one doesn't have to read the text to infer the difficulty of the task.
To test the veracity of that proposition, try the experiment with a language that the participants don't understand.
The results would be similar. Participants would infer that the Arial recipe is easier, and the Mistral recipe harder.
And yet this experiment is predicated on ease of processing?
The typographic component of this experiment demonstrates that if something is made to look as if it's hard to do, people will think it's hard to do. Surely that is axiomatic, and not in need of scientific study?
Please, stop spreading the rumor that the readability of typefaces can be scientifically measured.
OK, it is not about typefaces or typography, fine. Then what IS it about, and what did you learn from your exercise? With utmost certainty, any fool can see that the Arial version is easier to read/follow than the Mistral version. Do you really need to test this? If you had a condition called "Lighting" where one page would be read in normal room light and the other would be read in a room lit only by a single match five feet away from the page, would you need a test to tell you that participants preferred the room-light condition?
The problem is that people read only your conclusions and are totally thrown off by your professional jargon. They take the simple words "Arial was chosen by a great margin to be the favorite to read recipes" and the next thing you know, everyone with a blog or newsletter is hailing Arial as the end-all typeface for everything. These are not scientists, mind you; these are average Joes and Janes who think they are being scientific. Be careful how you word conclusions. It can do damage.
And yet you suggest that one “type font” is measurably easier to process than another?
That is not true.
It was not the goal of Song & Schwarz to pick a great font. They were using fonts to study how outside factors affect the understanding of the content. I find it interesting that the font choice impacts how hard it is to cook a recipe or complete an exercise program. I’m surprised that no one else has found this the least bit remarkable?! It is though clear, based on the results of this scientific study, that 12 point Arial was processed more easily than 12 point Mistral. Many here have said that is obvious, and I agree with them.
You should quit putting science on a pedestal. Everything in the material world can be measured scientifically. That fact just isn’t that remarkable anymore.
Presumably Song and Schwarz could have made the Arial hard to read and the Mistral less of a strain, with the opposite result from a font point of view. But font choice per se shouldn't be considered the independent variable here; rather, it is how much of a strain the text is to read in perceptual processing terms. It doesn't surprise me that the subjective perception of how hard an argument or a set of instructions is to follow can be affected by the objective difficulty of perceptually resolving the text into recognizable word-forms.
Kevin, I like your ‘pedestal’ counterpunch. Yet, I’d amend your “[e]verything in the material world…” sentence to read: everything in the material world that has a quantitative dimension can be measured scientifically. (The qualitative aspects that cling to items in the material world often don’t and can’t be, though behavioural responses to qualitative aspects show strong and identifiable patterns or regularities.)
It is though clear, based on the results of this scientific study, that 12 point Arial was processed more easily than 12 point Mistral.
That is not clear at all, it is hypothetical.
A thorough test would have had the participants actually carry out the instructions, to see, for instance, which group of students actually lost weight or whatever: those who followed the Arial instructions, or those who followed the Mistral text.
There is a distinction between being able to read text, being able to understand what it means, being able to use it as instructions, and being motivated by it to carry out the instructions.
I’m surprised that no one else has found this the least bit remarkable?!
Well, working with the "metacognitive" qualities of type is what graphic designers, art directors, and typographers do for a living every day, so no, it appears to be notable only to those who don't get that.
Everything in the material world can be measured scientifically.
Good old scientific hubris.
Nick, without having read the study, I’m assuming that actually it is probably clear that the 12 point Arial was processed more easily than the 12 point Mistral. But my point was that that’s just circumstantial, and says nothing about the inherent readability or legibility of the fonts. I doubt this was even the question. Certainly it sounds like it wasn’t the focus of the test.
The starting point would probably have been to use a set of stimuli that was hypothetically clearly different on the easy-to-process/difficult-to-process spectrum. Some intuitive or experiential fore-knowledge of how Mistral at 12 point versus Arial at 12 point performs would have had to have been involved. I'm assuming difficulty in processing was gauged before its effect on following instructions, etc., was tested.
I also think you might be reading past Kevin’s important caveat “material” when you make your sweeping disclaimers. My emendation “everything in the material world that has a quantitative dimension can be measured” (let’s drop the “scientifically” for now) is almost tautological. The only important caveat seems to be what's stated in the uncertainty principle.
My world is not so easy to slice up.
it is probably clear that the 12 point Arial was processed more easily than the 12 point Mistral.
Yes, if "processing" means "in one ear, out the other".
We have to ask ourselves what is the utilitarian value of a given typeface--what is its intended use and how well it succeeds in that intended area. Arial and Mistral are not matched to the same functional arena. Mistral is a strictly display face intended to mimic handwritten personal notes. It is supposed to be warm and fuzzy and used for very brief little clumps of text. The atmosphere Mistral lends is what it is all about. Arial is a derivative of Helvetica with very little detail, intended to work reasonably well both onscreen and printed on low-res home printers. It is a default face which everyone on Earth has become quite accustomed to, due to its being bundled with everything computer-related for decades. If I were to set a page in both fonts and ask a group of subjects which font gives a warmer, friendlier look, and it came out to be Arial, then you would have some data that would raise eyebrows and cause a recount.
Another way to look at it is if I ask a group of people which cutting tool they would choose to cut firewood with and showed them a chain saw and a pair of scissors. Would anyone need to bother counting the votes?
If I were to go back and replicate your exact test but substitute Arno for Mistral, I might find out something a bit more usable. Even better, If I were to compare a dozen respectable text faces of different kinds, I might find out even more.
However, this is not about typefaces so why bother testing typefaces at all?
@ChrisL: However, this is not about typefaces so why bother testing typefaces at all?
After having read through the less-than-lucid prose in the studies made available at the link above (yes, all of them), I think the exercise is roughly tantamount to finding a visual corollary to the judgment-under-uncertainty principle and precious little else. In a famous and substantiated finding, Tversky & Kahneman showed that people tend to regard propositions that they do not understand as more risky, regardless of their intrinsic risk, and to regard things they do understand as less risky, again without regard to intrinsic risk.* See the parallel between printouts set in clear-as-a-bell Arial or squint-to-read-it Mistral, content be damned?
In a word, it's boneheaded. Taken as a whole, the results and self-referential citations among these studies end up telling us less about inferences from the subjective experience of ease/difficulty of visual processing and far more about the researchers themselves. The studies lack intellectual rigor, are poorly designed, and serve only to perpetuate careers. How they managed to sail this stuff past peer review amazes — no, wait, take that back: said peers were doubtless typographically clueless as well or the project wouldn't have commenced.
Nick Shinn is right to implore: "Please, stop spreading the rumor that the readability of typefaces can be scientifically measured." It's a pity these canards can be so easily foisted on the masses. Just because the snakeoil salesman has a degree and can show us charts and graphs and statistics doesn't make the product any less snakeoil.
So sounding more like a cop at an accident site, let me assure you "There's nothing here to see, folks, so please move along."
Bye for now.
* Amos Tversky and Daniel Kahneman, "Judgment under Uncertainty: Heuristics and Biases," Science 185 (1974): 1124–1131.
We have to ask ourselves what is the utilitarian value of a given typeface
Chris, you’re making a good point here. Song & Schwarz showed that it’s possible to measure a difference in the perception of content. The particular fonts they used were not particularly interesting, and that was not the point of their study. Their study fits into a larger program of work that is not related to typography. Somebody interested in typography could build on their work by making more interesting typographic choices.
You describe Mistral as warm, fuzzy, and useful for mimicking handwritten personal notes. An interesting hypothesis based on the Song & Schwarz methodology would be to set a personal note in Mistral and some other font that isn’t useful for mimicking handwritten personal notes, and then asking the readers how warm and fuzzy they feel towards the author of the note. This is quite different from asking if the font is warm and fuzzy. It’s measuring that the font is having an impact on the perception of the content. It might be the case that with a change in content, Mistral makes the text easier to process than Arial (or whatever font). Does anyone want to make that prediction?
Song & Schwarz ran one study. We were able to learn something non-obvious from it. There are many possibilities that we didn’t learn from the study. We learn by answering one question at a time because no study can answer every question.
"It’s measuring that the font is having an impact on the perception of the content. It might be the case that with a change in content, Mistral makes the text easier to process than Arial (or whatever font). Does anyone want to make that prediction?"
Now that would be worth doing. This is what I was getting at above. It is also not as predictable. My guess is that there would be differentiating data and some sort of a curve, as opposed to the slam dunk in the original study prompting this discussion.
For what it's worth, I — an informed typophile (if I may say so myself) — find Schwartz's line of research, as elaborated in the abstracts on his site and the snippets of his “Metacognitive experiences in judgment and decision making” that I've read, prima facie both worthwhile and informative. It surprises me that type designers should attack a research program investigating metacognitive impacts on how textual content is perceived or understood, when a good part of their own activity has exactly that as an implicit premise!
We learn by answering one question at a time, though answers of this study's question size come in bunches, or dozens per second in Type 101. But, okay: if it is about inferences from the subjective experience of ease of processing, and the widespread and most-read answer from this study is that Arial is the best font for reading, and yet the study is NOT about type fonts, then I've become bored to confusion. Except that I think it's fair to say that science is not ready for types like us, and types like us are not ready for science like this. Or, if the scientists agree that the fly is dead, go get a smaller hammer or try a bigger fly; then I will agree that progress has been made.
Kevlar: is it true that all other fonts will perform better at larger point sizes?
No, and there is one MS "study", Times vs. Georgia on the undercard, and Helvetica vs. Verdana in the main event. You should read it.
No, and there is one MS “study”
Thank you for providing a straw man response for my straw man question. I’m impressed that you would cite a study as your evidence.
The context for this question was that Bert proposed that a larger size of Mistral would perform better than Arial, i.e. that with a larger point size of Mistral, people would perceive that the above recipe would take less time to complete. I expected that the hypothesis would be some sort of curve, that Mistral would perform more poorly at a size too small or too large, and I was hoping to hear a prediction of the optimal point size. Since this is all known and trivial to you, would you please enlighten me about the optimal point size of Mistral for allowing recipes to be perceived as quick to complete?
"would you please enlighten me about the optimal point size of Mistral for allowing recipes to be perceived as quick to complete?"
I think you misunderstood him. I think he was indicating that the x-height differences between Mistral and Arial are so great that it was not a fair test. It was not to say that there is an optimum size where Mistral would perform better than Arial, but a size at which the two fonts would be perceived to be the same size by the reader. Point size does not tell the whole story. As is always the case with typefaces, you have to consider all the variables, not just the point size.
I would still predict that Arial would be preferred even with Mistral enlarged because a contiguous script like Mistral with numerous irregularities and mimicking handwriting is clearly harder to read than a more regular typeface like Arial. This just takes one variable out of the mix and gives a more fair test--more fair but still pointless.
Bert did say that he thought Mistral would perform better than Arial if the x-heights were matched, but I was asking a different question. What is the size at which Mistral will perform best? There is a lot of data showing that reading speed is fastest for text faces at 11 point, plus or minus a point. But I don’t know if reading speed will correlate perfectly with perception of the content (i.e. will the recipe be perceived as easiest to complete in 8 point, 11 point, or 14 point Arial?). And I don’t know the point size of Mistral that will result in the fastest reading speed, so I don’t have a great starting point for hypothesizing which point size of Mistral will cause the recipe to be perceived as easiest to complete. I agree other factors need to be considered, so it would be fine to specify the leading or other variables when predicting the point size at which Mistral will perform best.
"There is a lot of data that reading speed is fastest for text faces at 11 point plus or minus a point."
You will notice that you said "text faces" above. Mistral is not a text face and never could be, whatever the size. You also said the tests measured "reading speed", but Mistral does not qualify as a font to be used where reading speed is an issue. It is kind of like racing a Formula 1 Ferrari against a paraplegic guy in a wheelchair. This is not a race that would ever happen. Comparing several handwriting script fonts to see which one was more effective in getting the subjects to respond to a call to action might be a fairer and more valuable test.
To answer your question more specifically (even though it does not help any), I would guess that Mistral would need to be about 50% bigger in point size than Arial to give it its best shot at pretending to compete.
Think of it like a golf handicap. If I were playing Arnold Palmer, you could give me a 50-shot handicap and I would still lose very badly, because I don't play golf at all. Or, going back to my scissors and chain-saw example: you could give the scissors user a 3-hour head start over the chain-saw user when cutting a log, to stretch out the test, but the results would not change.
Sorry, I’ve been dancing around the idea I’m trying to convey. Let me try again. I think the Song & Schwarz methodology could be particularly valuable for studying display faces. With display faces we are not trying to get out of the way of the content, but are actively trying to impact it. Mistral is a warm and fuzzy display face trying to mimic handwriting, but it’s not clear to me what the best size is for conveying warm and fuzziness. What would happen if we had three versions of a thank you note in 12 point, 24 point, or 36 point Mistral? Which version would cause the reader to feel maximally warm and fuzzy about the author of the thank you note?
From my point of view, all that’s needed (for the Song and Schwartz test to be interesting and informative) is one thing that clearly promotes and another thing that demonstrably disturbs rapid automatic visual word-form resolution. The version that promotes should sit squarely on the rapid automatic visual word-form resolution affordance plateau, and the version that disturbs should be at or near the visual word-form resolution threshold.
My guess would be that your answer would be something approximating real handwriting size in the eyes of the beholder; however, there is no standard writing size (see the Declaration of Independence). I am not convinced that size matters much in your scenario. I would think other attributes might give more or less credence to "Warm-and-Fuzziness." Perhaps printing color and the texture of the paper (handmade paper with a deckled edge may be better than normal bond). Mistral is one of the first typefaces to attempt to mimic human handwriting, and it predates OpenType. Some of the newer fonts like "Dear Sarah" may fit your scenario better. You might be better served comparing several handwriting fonts in your sample, including Mistral, to see how each fared against a control font like Times New Roman.
What do you see as the likely control factors in Warm-and-Fuzziness? You mentioned size of text, but why? I would think things which further humanize the text would matter more. There may also be a backfire effect in your test. Much of the junk mail we receive is in faux handwriting, to try to trick the reader into thinking a real person has written them a personal letter. It has got to the point where if I see a letter addressed in a faux handwriting font, I toss it without opening it, because it is disingenuous.
Chris, I have no idea what factors control warm and fuzziness. Perhaps size doesn’t matter, but what if we tested that? Either we would find out that your intuition was correct and size doesn’t affect warm and fuzziness, or we would find that one size is more warm and fuzzy than another. If 36 point Mistral were more warm and fuzzy than 12 or 24 point, would that be valuable information that you could use?
Printing color and/or texture of paper may also impact warm and fuzziness. If we test different printing colors and find that one results in more warm and fuzziness, is that valuable information?
All the things I envision as having value for me involve many variables. All the variables interact in a symbiotic way to make a graphic design piece work. I don't see how one variable, without the others in combination, is very helpful for call-to-action communications. The kind of bit-at-a-time testing of variables you are talking about may work in highway sign legibility, where the variables are more predictable--size, speed of vehicle, distance, lighting, brightness contrast, difference ratio, proximity to event, height, day/night usage, font, color, ground color, etc. All of that is complex enough without adding human attitude change, emotion, communication of feelings, and desired action on the item in question.
Graphic designers operate under quick deadlines and small budgets. I can't imagine the kind of research you describe being doable under those constraints, and I can't see what degree of certainty your tests would bring. That is why most testing of graphic design is just focus testing of complete designs, with the participants simply choosing which of the submitted designs works best for them.
Kevin, Mistral has a very different gestural-atmospheric force than Arial. I'm not sure I would describe its g-a force as warm and fuzzy; warm and fuzzy aren’t descriptors that readily come to mind with Mistral. To stay with Song and Schwartz: gestural-atmospheric force is a metacognitive reality. I think gestural-atmospheric force impacts perceptions of content. It might play a role in signalling the modality of content the reader should expect, and this signalling could be important from a strategic formatting point of view.
Arial might lead a reader to expect ordered instructional content; Mistral, spirited call-out snippets or vignettes. Readers may viscerally register a momentary or persistent disconnect with Mistral for instructional content, which might affect the allocation of attention. Exploring content-modality signalling through gestural-atmospheric force characteristics might be worthwhile.
Geeze, someone answer the question, is it Schwartz's "It's NOT about FONTS, (you stupid non-functioning type literati)" or Sung's "Arial is the BEST font to READ! (you frightened font menu-munching type illiterati)" ?
Kevlar: ...would you please enlighten me about the optimal point size of Mistral for allowing recipes to be perceived as quick to complete?
Zero. That is, when serving Mistral in recipes to be perceived as quick to complete, zero is the correct answer, except as Peter suggests, perhaps, when the complete recipe is "Open Here", or "Remove Plastic Cover Before placing in Microwave for 1 minute", which may be stretching it.
Kevlar: With display faces we are not trying to get out of the way of the content, but are actively trying to impact it.
See!? When you stop dancin' around it trying to marginalize the center, you can make sense. Unfortunately (just for your argument), S&S's study was a text problem.
So there are two clear ways to improve S&S now.
I agree with ChrisL: the readability of these fonts, Arial & Mistral, cannot be understood or even compared if they are set at the same body size. A sample with Arial at a small body size next to Mistral at a bigger body size makes both very readable.
An idea: compare Arial & Helvetica at the same body size, then come out and say that Arial is better. Or compare Mistral and Brush Script at the same body size. Just a thought.
I have to agree with Chris and others. The comparison between 12 point Arial and 12 point Mistral is an unfair and unrealistic one. You can't base any conclusions on such a comparison, because point size doesn't actually tell you a whole lot about how large or small the letters are. The only thing you can conclude from the current comparison is that, at the same point size, Arial has a bigger body size than Mistral. But we already knew that.
So, as many people have already said: if you want to compare those typefaces objectively, you have to set them to matching x-heights/body sizes, to 'equalize' them as much as possible; otherwise you are basically forcing the outcome.
Manufacturer's attribution of point size is nominal, and somewhat arbitrary.
For instance, digital URW Futura has a different x-height (measured in units of em) than Adobe Futura.
So to base typeface-comparison research on point size is sloppy science.
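The x-height matching that several posters propose is simple arithmetic: scale one font's point size by the ratio of the two fonts' x-heights. A minimal sketch in Python; the x-height fractions below are illustrative assumptions, not measured values from the actual Arial and Mistral fonts:

```python
def matched_point_size(reference_size, reference_xheight, target_xheight):
    """Return the point size at which a target font's x-height matches a
    reference font's x-height at reference_size.

    x-heights are expressed as fractions of the em square.
    """
    return reference_size * (reference_xheight / target_xheight)

# Illustrative x-heights as fractions of the em (assumed, not measured):
ARIAL_XHEIGHT = 0.52
MISTRAL_XHEIGHT = 0.35

# Point size of Mistral whose x-height matches 12 pt Arial:
size = matched_point_size(12, ARIAL_XHEIGHT, MISTRAL_XHEIGHT)
print(round(size, 1))  # -> 17.8
```

With these assumed proportions, Mistral would need to be set roughly half again as large as Arial to present the same apparent size, in line with the 50% estimate above. The real ratio would have to be read from each font's metrics (e.g. the sxHeight value in the OS/2 table).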
Kevin: They found that people who read the recipe in Arial thought that it would be faster to complete the steps in the recipe.
I think generalisation of the results would require some kind of prefatory statement: ‘Presuming subjects with a poor grasp of causal relationships....’
I’m mystified that typophilers aren’t excited about this study.
As Norbert Schwarz said above, their goal was not to study something specifically about typography, but to study the subjective experience of ease of processing. That Mistral and Arial have different x-heights is irrelevant to them. They could have compared 8 point Mistral to 12 point Arial, or compared 8 point Arial to 12 point Arial.
We may not find the specific comparison made in this study informative, but the methodology shown here is very useful. It demonstrates that typography impacts how we understand text. People reading the exact same text thought that the recipe would take longer to complete when it was written in one typographic treatment versus another. That is fascinating.
Has anyone ever asked you why good typography matters? This study addresses that question.
To be honest, I don't think people can get past Arial, and to a lesser degree Mistral. If they'd used Myriad vs Bickham, or Helvetica vs Zapfino, people here would likely respond differently. Imagine giving a talk wearing a Sarah Palin t-shirt; Arial has the same effect on this study's credibility.
Si is right; they have unknowingly introduced bias into the study. It is like offering the choices "The Big Bad Wolf" and "Little Red Riding Hood" and asking a kid which they would rather walk with in the woods.
That is fascinating.
Yes, isn't it amazing!
What's more, there is a multi-billion dollar industry, that's been around for ages, based on the premise that the design of visual communications (they are known as "advertisements" or sometimes "corporate identities") forms an important part of the message. Awesome!
> I’m mystified that typophilers aren’t excited about this study.
I believe you.
Because it parallels the dismissal of anecdotal typographic evidence...
> It demonstrates that typography impacts how we understand text. People reading the exact same text thought that the recipe would take longer to complete when it was written in one typographic treatment versus another. That is fascinating.
Has anyone ever asked you why good typography matters? This study addresses that question.
Actually, I don't think it gives us answers that we didn't already know, at all. Everyone on this forum already knew from experience that good typography matters and that it makes the message more readable, because that has been the whole idea of typography for at least 600 years, and that is why we are typographers. :)
When you do a study like this, with comparisons, you have to eliminate all outside variables and make that which you are comparing as equal to each other as possible. Otherwise the comparison is biased from the start. When you set it 12 to 12 the conclusion is obvious, because it is evident that a typeface with a tiny body size won't be as readable as the huge-bodied typeface with very open shapes. You don't need people to fill in a form to determine that. And furthermore, you should do several more comparisons with different typefaces to further eliminate bias...
I suppose the other question that I should ask about this study is what degree of cooking experience the subjects had? I can imagine that the same test might produce different results when taken by professional chefs, who are going to be quickly relating the content of the recipe text to a body of experience and knowledge, and by non-cooks who lack that experience and knowledge. In that case, one might conclude that the form of a text provides an interpretative handle for people who don't have any other frame of reference.
x-height, cooking experience &c. are independent variables that can easily be refined, through multidimensional counterbalancing, by conducting follow-up studies in which you simply refine your materials, experimental design, population, and data analysis. Science, like design, is an iterative process.
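The kind of follow-up analysis this implies is cheap to mock up. A toy sketch of the two-condition comparison at the heart of a study like this, using simulated "estimated minutes to complete the recipe" ratings; the means, spreads, and sample sizes below are invented for illustration and are not taken from the Song & Schwarz paper:

```python
import random
import statistics

def simulate_ratings(n, mean, sd, seed):
    """Generate n hypothetical effort ratings from a normal distribution."""
    random.seed(seed)
    return [random.gauss(mean, sd) for _ in range(n)]

# Invented data: two groups of 20 participants each.
arial = simulate_ratings(20, mean=8.2, sd=3.0, seed=1)     # "easy" condition
mistral = simulate_ratings(20, mean=15.1, sd=3.0, seed=2)  # "hard" condition

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)

# A large negative t means the Arial group's estimates were reliably lower.
print(welch_t(arial, mistral))
```

Counterbalancing additional variables (x-height-matched sizes, cooking experience) just means crossing them as factors and comparing the resulting cells the same way.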