Orthographic Connectivity in Arabic Facilitates Word Recognition for Skilled Readers

enne_son's picture

In a recent paper [ “How Does Arabic Orthographic Connectivity Modulate Brain Activity During Visual Word Recognition: An ERP Study” Brain Topography, April 2013 ] Haitham Taha, Raphiq Ibrahim and Asaid Khateb show “that instead of slowing down reading, orthographic connectivity in Arabic skilled readers seems to impact positively the reading process already during the early stages of word recognition.”

The practical relevance of this for typographers and type-designers working on Latin-based scripts might be in the realm of ligaturization and contextual alternatives. It also has a bearing on discussions of how reading works. A good theory of how reading works should explain why or how connectivity has a positive impact on processing. It should explain why ligaturization and devising contextual alternatives might be beneficial.

Abstract One of the unique features of the Arabic orthography that differentiates it from many other alphabetical ones is the fact that most letters connect obligatorily to each other. Hence, these letters change their forms according to the location in the word (i.e. beginning, middle, or end), leading to the suggestion that connectivity adds a visual load which negatively impacts reading in Arabic. In this study, we investigated the effects of the orthographic connectivity on the time course of early brain electric responses during the visual word recognition. For this purpose, we collected event-related potentials (ERPs) from adult skilled readers while performing a lexical decision task using fully connected (Cw), partially connected and nonconnected words (NCw). Reaction times variance was higher and accuracy was lower in NCw compared to Cw words. ERPs analysis revealed significant amplitude and latency differences between Cw and NCw at posterior electrodes during the N170 component which implied the temporo-occipital areas. Our findings show that instead of slowing down reading, orthographic connectivity in Arabic skilled readers seems to impact positively the reading process already during the early stages of word recognition. These results are discussed in relation to previous observations in the literature.

Available for a fee online at: http://dx.doi.org/10.1007/s10548-012-0241-2

While seeming to adopt the ‘parallel letter recognition’ based orthographic processing framework implicit in the “Interactive Activation Model” of Rumelhart and McClelland and in the “Local Combination Detection” model of Stanislas Dehaene as their starting point in their “Introduction,” the authors use ideas that appear to be at odds with such accounts in their “Discussion,” specifically the idea that “N170 ERPs could represent a logographic processing strategy in visual word recognition,” which is the result of more frequent exposure to connected words. The authors draw this explanation from: G. Simon, L. Petit, C. Bernard and M. Rebai, “N170 ERPs could represent a logographic processing strategy in visual word recognition.” [ Behavioral and Brain Functions, 3:21, 2007 ]. For the authors of the Simon et al. paper, a logographic processing strategy is “a more holistic process where words are processed as a global visual pattern [ rather than on an orthographic basis as a string of letters ].”

For my part, I'm not convinced the N170 ERPs actually represent a logographic processing strategy in visual word recognition. For typophiles, conceiving of word recognition in “more holistic” terms might seem attractive, but it’s unclear what “processed as a global visual pattern” means. Currently prominent developmental theories of reading acquisition distinguish three phases, 1) a pre-alphabetic logographic phase when readers recognize words (for example, those appearing in their everyday environments, such as the names of restaurants, brands of candy, their own or friends’ names printed on cubbies at school) on the basis of salient or distinctive visual cues and contextual features in or around the written words; 2) an alphabetic phase when readers use spelling-sound rules to read words; 3) an orthographic phase when words are recognized by larger spelling patterns, especially morphemic units. [ This description is freely adapted from the work of Linnea C. Ehri. ] The idea that in skilled readers the N170 ERPs represent a logographic processing strategy in visual word recognition doesn’t seem to recognize the perceptual learning in early visual cortex which takes place during and after the orthographic phase.

My alternative to the “global visual pattern” idea is an “intrinsic integration” account. I think the facilitating effect of connectivity actually has to do with the fact that connectivity frustrates the channeling of feature (closure, aspect, extendedness) and role-unit or glypheme (stroke and counter) level information into independent letter slots, as the Interactive Activation Model assumes. In a script system characterized by connectivity, feature-analytic processing encourages a holistic or across-the-word gathering of fine-grained, location-specific role-unit level information. A re-coding of words at this more elemental (than letter) level, and more fine-grained than “global visual pattern” level probably represents a 4th phase in reading acquisition, and is the result of perceptual learning at the neural level during phase 3.

So connectivity through ligaturization and contextual alternatives in the late-alphabetic and orthographic phase might actually facilitate the emergence of the rapid automatic “visual word-form resolution” capabilities fundamental to immersive reading of extended text.

Chris Dean's picture

That link takes me to a paid download; however, I can access it for free within the university databases. How are you accessing this so you do not need to pay?

hrant's picture

Good news. Not at all surprising. :-)

The way this fits into my own model of reading is that compound shapes -even though they are harder to extract the constituent letters from- are "closer" to the content being conveyed, hence more efficient. And we can handle many more shapes than the few dozen of most writing systems.

Specifically concerning ligation, I mentioned such potential benefits as part of my "Designers of the World, Ligate!" talk at the Thessaloniki conference of 2004, and on Typophile even previous to that.

hhp

John Hudson's picture

Peter, to clarify one aspect of the abstract, when the authors refer to 'partially connected and nonconnected words', am I correct in thinking that they're referring to normal Arabic words in which right-joining-only letters either occur in combination with dual-joining letters (hence partially connected) or make up all the letters in a word (hence non-connected)?
____

LATER

Ah yes, that is what they meant.

enne_son's picture

Chris, I am also able to access some papers through university databases.

Chris Dean's picture

Yes, but on your original post you say “Available for a fee online at: http://dx.doi.org/10.1007/s10548-012-0241-2”

Is this a typo? It is not a link to a free download.

John Hudson's picture

Not 'free', 'fee'. Not a typo, a misreading. :)

John Hudson's picture

Having now read some of the article, my main concerns are with the form of Arabic used to display the word stimuli to participants. This isn't entirely clear from the article description, and no illustration is provided of the actual stimuli as presented. But this table shows examples of the different classes of stimuli words, and my concern is that this also represents the form of Arabic shown to participants in the experiment.

Now, because the experiment is comparing results for stimuli within a single style of Arabic text, the fact that this is a very flat font using Simplified Arabic forms that fails to represent traditional Arabic letter shaping and joining behaviour isn't necessarily problematic in terms of what the researchers were testing for. More problematic, I think, is the very poor spacing shown in this table, in particular the lack of kerning for non-connecting pairs in the PCw and NCw examples. If this accurately represents the stimuli as shown to participants, then the test may have a significant flaw, because the words as thus displayed are abnormal, and do not reflect normal written or good quality typographic practice. Of course, Arab readers are sometimes presented with text that lacks kerning, just as English readers are, but finding a significant advantage for Cw stimuli in badly spaced text is not the same as finding a general advantage.

[Spacing is also very bad within the Cw stimuli as shown in the table. Basically, this is a very bad font. But since the spacing problems affect connecting and non-connecting sequences in different ways, I don't think they cancel each other out.]

John Hudson's picture

For comparison, here are the stimuli words from the table presented in the naskh script. Note how the nesting of unconnected letter sequences produces more unified word images with more balanced spacing.

These sorts of differences might not alter the overall results of the experiment, and I'm not arguing that orthographic connectivity does not provide the advantage described, only that the experiment might have been better designed to eliminate the possible effect of abnormal letter spacing.

enne_son's picture

I think your concerns about the possible confounding effect of bad spacing are very relevant John. And effectively shown. Probably I'll alert the authors to this thread.

Bendy's picture

I'm not sure I understand the premise of this study. Is it trying to establish that a script that has evolved to be connected is less readable when it's broken? I'd imagine any script is less readable when its orthography is corrupted.

John Hudson's picture

No, Ben, it isn't using a disconnected variant of the Arabic script, it is comparing Arabic words in which all the letters are connected with those in which only some letters are connected and those in which no letters are connected. This is a natural phenomenon within the normal Arabic orthography, arising from the fact that some Arabic letters only join on the right side, so exist only in isolated and final forms. [This is why the common statement one encounters that the shape of an Arabic letter depends on its position in a word is not accurate. The shape of an Arabic letter depends on its immediate adjacency.]
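The adjacency rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the paper: the letter sets are a simplified, hand-picked subset of the basic Arabic letters, the function names are hypothetical, and the input is assumed to be a single unvocalized word (no diacritics, no tatweel).

```python
# Simplified joining model (an assumption for illustration only).
# Letters that join only to the preceding letter (right-joining):
# alef, alef madda, alef hamza above, alef hamza below,
# dal, thal, reh, zain, waw, waw hamza above, teh marbuta.
RIGHT_JOINING = set(
    "\u0627\u0622\u0623\u0625\u062F\u0630\u0631\u0632\u0648\u0624\u0629"
)
# Standalone hamza joins on neither side.
NON_JOINING = {"\u0621"}


def joins_to_next(letter):
    """True if the letter can connect to the letter that follows it."""
    return letter not in RIGHT_JOINING and letter not in NON_JOINING


def joins_to_previous(letter):
    """True if the letter can connect to the letter that precedes it."""
    return letter not in NON_JOINING


def connectivity_class(word):
    """Label a word Cw, PCw or NCw from its adjacent letter pairs."""
    pairs = list(zip(word, word[1:]))
    if not pairs:
        raise ValueError("need a word of at least two letters")
    joined = sum(
        1 for a, b in pairs if joins_to_next(a) and joins_to_previous(b)
    )
    if joined == len(pairs):
        return "Cw"   # every adjacent pair is connected
    if joined == 0:
        return "NCw"  # no adjacent pair is connected
    return "PCw"      # some pairs connect, some do not


# Strings are in logical (reading) order.
print(connectivity_class("\u0643\u062A\u0628"))  # kaf, teh, beh
print(connectivity_class("\u0628\u0627\u0628"))  # beh, alef, beh
print(connectivity_class("\u062F\u0631\u0633"))  # dal, reh, seen
```

The point of the sketch is that the class of a word falls out of pairwise adjacency alone: no notion of "position in the word" is needed.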

hrant's picture

John, assuming that font is in fact the one used in the actual experiments:
- It's actually the sort of thing most Arabic readers see most of the time. Well, not the poor spacing as much but the "flatness". Back when I was still living in Lebanon -typographically the most advanced Arab nation- newspapers, textbooks, etc. mostly used that style, and I think that's still mostly true. So saying the font might invalidate the results I don't think makes sense.
- I would say that the bad typography -especially the spacing- can only reinforce the findings! If as you say "properly" set Arabic is more readable (which I agree in terms of spacing but am less sure in terms of "script grammar") then the idea that boumas bypass letterwise decipherment (where necessary/possible) can only be helped with better typesetting.

BTW was the reading in the experiment foveal only, or what I might call "full-spectrum"? If it was the former then Peter's view that we read boumas even in the fovea becomes stronger (and Larson's view becomes even weaker).

I'd imagine any script is less readable when its orthography is corrupted.

Not necessarily. I think the letterwise-decipherment model says that when the letters are individually decipherable (since people do know them individually) reading should not suffer. Note that any Arabic letter does occur in normal text disconnected on both sides.

Also, if the experiment was foveal-only, we're not really talking about "readable", but more like "decipherable". But if boumas are showing up even in mere decipherment, they must be running rampant in immersive reading.

hhp

John Hudson's picture

I did not say that the style of font might invalidate the results. I explicitly said that the style of font in itself may not be an issue, not for the reasons you give -- commonality of the style --, but because the study is comparing results within a single style. This is what I wrote:

Now, because the experiment is comparing results for stimuli within a single style of Arabic text, the fact that this is a very flat font using Simplified Arabic forms that fails to represent traditional Arabic letter shaping and joining behaviour isn't necessarily problematic in terms of what the researchers were testing for.

hrant's picture

Oh, sorry. So the passage I'm contesting is this:

finding a significant advantage for Cw stimuli in badly spaced text is not the same as finding a general advantage.

It's not the same - it's better. :-) I mean in terms of revealing the mechanics of reading.

hhp

John Hudson's picture

Hrant: I would say that the bad spacing can only reinforce the findings! If as you say "properly" set Arabic is more readable ... then the idea that boumas bypass letterwise decipherment ... can only be helped with better typesetting.

It really should be noted that the term bouma does not occur anywhere in the article, and you're making extrapolations that the researchers do not.

hrant's picture

I would be shocked if they used "bouma". I would also be surprised to see psychology researchers successfully extrapolate the findings to practical type design decisions. That's our job. :-)

If reading connected Arabic is indeed faster* than disconnected Arabic -especially for skilled readers- I feel even more confident that what I call boumas are the key to reading (as opposed to Larson's parallel letterwise decipherment). But "confident" does not mean "dead sure".

* Note the "impact positively the reading process".

BTW maybe Tankard should make a "normal" version of Blue Island* for comparison testing... Although nobody's used to reading much text in it, so it wouldn't make for as good a test as with Arabic.

* http://www.myfonts.com/fonts/adobe/blue-island/

hhp

John Hudson's picture

In the concluding discussion of the paper, the authors note that

Another factor that might explain these results are the differences in the physical shape of the orthographic patterns of the written words. Indeed, the Cw have no spaces between the letters while such spaces exist inside the NCw. These spaces may contribute, at least partly, to the time consumption during the visual processing of the NCw, due for instance to longer gaze duration (Roman and Pavard 1987), higher number of fixation/saccades etc.

In other words, they acknowledge that their results may be affected by the internal spaces in the non-connected sequences, although they do not register that the size of these spaces might be a factor. They go on, rightly, to suggest that this would need to be independently tested using eye tracking methodology. They also suggest that differences in spatial frequency between connected and unconnected sequences might be a factor:

...it is possible to assume that NCw words are of a higher SF than Cw words and thus at least some of these differences might be explained by such physical differences.

And this is the actual conclusion drawn by the authors of the study:

The data presented here show for the first time that words’ connectivity does not impact negatively reading and word recognition processes in skilled readers of Arabic. Our analysis shows that processing connected letter forms, which are standard and inevitable in written Arabic, does not present any particular difficulty. However, further research still needed to investigate how the effects observed here compare really to word frequency and to spatial frequency effects among readers of Arabic.

In other words, their results show that connected script does not have a negative impact on word recognition for experienced readers -- and that is really interesting and contrary to much prediction or assumption --, but the apparent positive advantage cannot be definitely explained as a result of connectivity independent of other factors.

hrant's picture

contrary to much prediction or assumption

Only by people who don't realize how powerful our brains are and/or want Arabic to look more Western.

hhp

John Hudson's picture

...look more Western.

Actually, the example cited in the paper was a study that involved adolescent Israeli Arabs who were learning Hebrew as a second language, and found that the particular visual test used resulted in participants performing better with the non-native, unconnected Hebrew script than with their native, connected Arabic script. The authors of that study explained this result in terms of the complexity of the Arabic orthography. Since they're cognitive psychologists, I doubt if they have any problem realising how powerful our brains are, and they don't appear to have a bias towards 'looking more western': they were comparing two Semitic scripts.

Really, Hrant, you should actually read these articles.

hrant's picture

you should actually read these articles.

Agreed.

However:
- I don't see how any of that counters my statement that many people who assume connected Arabic is hard to read do so because of an -unfounded- assumption that Western writing is superior*. That's the same place Latinized fonts (partly) come from, and we both agree those do exist.
- Cognitive psychology is a job, and you have to show results. Saying "this is more complex than we can handle" (which I believe is still the case) doesn't get you grant money. So the brain is simplified in order to arrive at conclusions. This is also why every single such study ends with a statement that "more research is needed"...

* Reminding me of Panos Vassiliou's statement that descenders are bad for reading.

hhp

John Hudson's picture

Hrant: I don't see how any of that counters my statement that many people...

You didn't say 'many people'. You said that prediction or assumption of a negative impact from connectivity was only made by people who don't realise etc.

In the case of the study that we're talking about, prediction is based on results and conclusions of previous studies, which is how science works: you make predictions based on prior observation and then come up with new ways of testing to see if these predictions are accurate or were based on errors in the prior observation, e.g. lack of adequate controls.

hrant's picture

Just to be clear, I wasn't thinking much about scientists when it comes to predicting/assuming that connectedness is a problem - I was thinking of designers.

hhp

Té Rowan's picture

Best make sure you start off with good data, or you'll certainly come up with something the Sirius Cybernetics Corporation would be proud of.

quadibloc's picture

@John Hudson:
The authors of that study explained this result in terms of the complexity of the Arabic orthography. Since they're cognitive psychologists, I doubt if they have any problem realising how powerful our brains are, and they don't appear to have a bias towards 'looking more western': they were comparing two Semitic scripts.

I find that conclusion highly unwarranted. If Hebrew shares a characteristic with English/Swedish/French and so on that Arabic does not, a bias towards the Latin script as the paradigm for writing systems would also favor those Semitic scripts which more closely resemble it.

And researchers are biased towards easily testable and falsifiable explanations which don't involve attributing mysterious powers to the human brain.

My biases tell me that the bouma is likely to be the source of the observed phenomenon, and they also tell me that a short learning curve is also an important virtue in a script. (The Chinese script does inflate the cost of mass literacy so that it's only achievable by First World countries like Japan; that, at least, is indisputable. Or is it? In some quarters, claims by the People's Republic to have achieved universal literacy are accepted.) Thus, even if the Arabic script is superior for the skilled reader, it's not sufficient reason to switch.

At least, the Turks switched to the Latin script, which, because it has lowercase with ascenders and descenders, also has quite a bit of bouma in its words.

Yet, both the Jewish people and those Asians influenced by Chinese culture have been distinguished by their success, even under adverse conditions. Despite Hebrew being low on bouma and legibility, and Chinese being low on bouma and having a high learning curve. As I've noted, I think the bar mitzvah and the Imperial civil service examinations are the explanation - in both cases, literacy is tied in with cultural identity, and so sitting still for education conflicts less with the emotions of the hot-blooded youth.

William Berkson's picture

I don't know Arabic script, but I think it is possible that a script whose basic form is connected, like Arabic, or in a different way various Indic scripts, is read most easily in a connected form. But it may also be that one whose basic form is unconnected, like Latin or Hebrew, is read most easily unconnected. At least to me all Latin cursive forms are less readable than the separate ones. And I think that ligatures in an unconnected script are a compromise.

The brain is quite capable of developing a different processing structure for a different script. A different issue, but which is illustrative, is that I read that native Hebrew speakers and readers have a 'root form' area in the brain. I would imagine that this is also true in other Semitic languages. In English, which has roots from many different languages, we generally don't have an awareness of roots.

By the way, I don't see why the 'bouma' in Hebrew has to be any worse than English. Or Chinese. At least in Peter's conception, the visual word form is a matter of the pattern of 'role units', salient features of letters, across the word, not simply external envelope.

Chinese children learning a few characters a day through elementary school, in a language they already understand aurally, is not a huge difficulty, though it takes diligence. And they tell me that after the first 2000 it gets easier. Trying to learn both language and characters as an adult is unfortunately extremely time consuming. So my impression is that the literacy barrier is much higher for adult foreigners than natives.

hrant's picture

William, note that the connections in Latin are mostly brute-force, while in Arabic they generally affect the shape of the letter, thus effectively providing more information.

To repeat something I've said now and again: aesthetic ligation is irrelevant to reading, but judicious ligation intended to reinforce/diverge boumas can help. The example I like to give is this: if we -consistently- used an "st" ligature in "quest" but not one in "guest", that would help.

I don't see why the 'bouma' in Hebrew has to be any worse than English.

It's because the farther from the fovea a bouma occurs (noting the/my view that more than half of reading happens in the parafovea) the blurrier it gets, which makes internal information less important than envelope information. This is by far the biggest reason I can think of for the Latin x-height to be too large sometimes, and for all-caps text to be harder to read.

So yes, it's not only external envelopes that matter, but those do play a big role.

hhp

John Hudson's picture

Can we please stop talking about boumas as if they're something that exist in a script or in a typeface. A bouma is a perceptual unit: it exists in the perception, not in the thing perceived. It is not analogous to a letter or a word, which are units in writing. You can't look at text and point to boumas, and there's no guarantee that the boumas I perceive in reading a text are the same as the boumas that you perceive.

John Hudson's picture

Bill: A different issue, but which is illustrative, is that I read that native Hebrew speakers and readers have a 'root form' area in the brain. I would imagine that this is also true in other Semitic languages.

Yes. Arabic, like Hebrew, has a triconsonantal root system for most nouns and verbs. This presumably is one of the reasons why abjad systems work fairly well for Semitic languages: one can easily identify words of a shared root by their consonant sequences independent of intermediary vowels, which may be either absent or written as marks around the consonants.

hrant's picture

Bouma: agreed, on all counts. Where did who mess up?

BTW, although you can never know where a bouma is, you can:
- Take educated guesses, which can help. For example you can tell that "readjust" is much easier to read when it's hyphenated. That's not a type-design thing, but still.
- Conduct rigorous testing to develop smart algorithms that reveal the actual numeric probabilities of where the boumas are. The problem is that would probably take more money/time than even Microsoft could muster; if the shareholders found out it would be hard to explain exactly how that helps them get a bigger yacht.

Hebrew versus Arabic: their shared tri-consonantal foundation is certainly relevant (although less to readability than legibility), but it remains that in terms of glyph structure and word composition Hebrew is much closer to Latin.

hhp

enne_son's picture

[Hrant] […] if we -consistently- used an "st" ligature in "quest" but not one in "guest", that would help […]

[John Hudson] Can we please stop talking about boumas as if they're something that exist in a script or in a typeface. A bouma is a perceptual unit: it exists in the perception, not in the thing perceived.

If we take our cue from the Arabic, a better strategy for diverging the boumas might be to ligature the >gu< bigram in >guest< at the top between the >g< and >u< and to extend to the right the bottom serif on the descender of the >q< in >quest< so it undercuts the u. One could also extend the left-side of the tail of the g to the right. The perturbation at the left end of the word would cue the visual system to where the distinguishing visual information lies.

I do think boumas are real, and that in immersive reading the visual system uses them. A word is a spatially bounded map (bou[nded]ma[p]) of prototypically structured sub-letter shapes or letter-shape primitives. I like to call these letter-shape primitives role-units. Ascenders, descenders, bowls and counters are role-units. The composite shape to the left of the stem in the u is a role-unit. The angular components of the x are role-units. It seems likely to me that the surface form of a role-unit in a letter in a typeface in a script can be modified while its prototypical or deep-structure is kept.

hrant's picture

Great thinking on "guest" vs. "quest". I was just trying to find a functional raison-d'être for the "st" ligature since it has the luxury of already existing in many fonts.

It seems likely to me that the surface form of a role-unit in a letter in a typeface in a script can be modified while its prototypical or deep-structure is kept.

Exactly.
Innovate in readability while appeasing legibility (which BTW is the thrust of my old Alphabet Reform evil plan for world domination).

hhp

enne_son's picture

[John Hudson, re: Arabic] […] one can easily identify words of a shared root by their consonant sequences independent of intermediary vowels […]

Although meaning relations in western languages don't work on a system of shared consonantal roots, readers of the western alphabetic scripts can more easily identify words when vwls are left out than they can when only _o_e_s are written.
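For anyone who wants to try the two conditions on their own text, here is a minimal Python sketch. It is an informal illustration, not a stimulus generator from any study: the function names are hypothetical, only a, e, i, o, u count as vowels, and "y" is treated as a consonant.

```python
# Assumption: only these letters count as vowels; "y" is a consonant.
VOWELS = set("aeiouAEIOU")


def drop_vowels(text):
    """Consonant-only condition: remove every vowel letter."""
    return "".join(ch for ch in text if ch not in VOWELS)


def mask_consonants(text):
    """Vowel-only condition: replace each consonant letter with '_'."""
    return "".join(
        "_" if ch.isalpha() and ch not in VOWELS else ch for ch in text
    )


sample = "readers can identify words"
print(drop_vowels(sample))
print(mask_consonants(sample))
```

Masking rather than deleting the consonants keeps word length and letter positions visible, which if anything gives the vowel-only condition an advantage in an informal comparison.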

John Hudson's picture

Peter: ...prototypically structured sub-letter shapes or letter-shape primitives. I like to call these letter-shape primitives role-units. Ascenders, descenders, bowls and counters are role-units.

Part of the genius of the Perso-Arabic script is that it classifies some sub-letter shapes as preservable and some as dispensable, such that the graphotactics of Arabic writing largely involve replacing the dispensable parts of letters with connecting strokes that join together the preservable parts. A large number of Arabic letters, in their isolated forms, consist of a preservable initial shape and a dispensable tail. When the letters are joined, the tail stroke is replaced with a connecting stroke. This is the general rule; there are, of course, exceptions, some particular to individual styles and mostly involving cursive construction of dual-joining (medial) forms. So for example, the general pattern can be observed in the graphotactics of isolated, initial and final ع, but the medial form represents a cursive construction derived from the preservable part but visually distinct in result.
____

PS. Simplified Arabic can be analysed as applying the general pattern more strictly, such that many of the cursive formations of medial letters are replaced by straight duplications of the initial form preservable part.

John Hudson's picture

Peter: Although meaning relations in western languages don't work on a system of shared consonantal roots, readers of the western alphabetic scripts can more easily identify words when vwls are left out than they can when only _o_e_s are written.

I think that is simply an observation that language is predominantly a consonantal affair, which makes sense because we can produce more clearly distinct consonant phonemes than vowels, and consonant sounds are fairly precise while vowel sounds are fluid. If we had vocal systems that worked the other way round, so favoured vowel sequences as the main means of distinguishing words, we would presumably have developed writing systems with more vowel signs, and would find vowel-only writing to be easier to read than consonant-only writing.

William Berkson's picture

Here is a pretty interesting effort to compare reading speeds in different languages, including those with Latin script and Hebrew, Arabic, Chinese and Japanese. It turns out to be a bear of a problem.

The authors compare rates of reading letters, words, syllables, and whole texts. The problem is that some languages have a lot more letters (Finnish) or syllables. So, for example, English and Spanish seem to be at the top for words per minute, but Chinese, which is slow in words per minute, is faster for reading a whole text, because of the economy of Chinese in terms of words needed to express the same thought—it's not a conjugated language, for a start.

I don't know what to make of this. There do seem to be differences in decoding text, but their influence may be swamped by other factors. It may be that the cognitive constraints are much more significant than visual decoding ones.

ilyaz's picture

> …When the letters are joined, the tail stroke is replaced with a connecting stroke. …

BTW, I could not find any info on how ACE's strategy in typesetting “skeletons” of Arabic words differs from OpenType’s. The PDF explanations of α- and β-stages are not very helpful to me, since I do not read/write Arabic. And the next stage is not even documented…

Chris Dean's picture

The article William Berkson was referring to:

Trauzettel-Klosinski, S., Dietz, K. & the IReST Study Group (2012). Standardized assessment of reading performance: The new International Reading Speed Texts IReST. Investigative Ophthalmology & Visual Science, 53(9), 5452–5461. (PDF, 553 KB)

If possible, best to avoid hyperlinks such as “click here” or “this article” (especially when they are linked to PDFs and/or initiate downloads) as they provide no information regarding their destination, are less searchable, and create accessibility issues for visually impaired readers using text-to-speech software.

John Hudson's picture

It's also a good idea to test your links and remove trailing html br/ tags from within pasted URLs. :)

Chris Dean's picture

And that too ;)

William Berkson's picture

Thanks, Chris, will do this in the future.
