Question regarding Dutch 'ij'

agisaak's picture

Are there any instances where Dutch distinguishes between the digraph 'ij' and 'ij' as a sequence of two distinct graphemes?

And a follow-up question: do Dutch speakers normally type this character as U+0133 or as a sequence of i + j?

TIA

André

neverblink's picture

There already is a topic which answers most of your questions: http://www.typophile.com/node/34111

As a Dutchman myself, I can say we type i + j, but see them as a single character. So they get capitalized together, and also stay together in vertical writing. (They also go into a single square in crossword puzzles ;) )

The only instance I can think of where i+j isn't seen as a single character is when they are the end and beginning of two separate syllables, as in bijoux (bi-joux), which is a word we borrowed from French.

agisaak's picture

Thanks for your reply and the link.

'Bijoux' is the sort of example I was interested in.

The reason I had asked is because I am working on a face where some of the alternate forms of ij differ from those which would be produced by the sequence i + j. Since I assumed most Dutch speakers would enter this character as a sequence, I was thinking of including a localised 'ccmp' feature which substituted the sequence with the digraph, but the existence of words like 'bijoux' demonstrates that that isn't a viable option.

André

neverblink's picture

Fiji (fi-ji) is another word that shouldn't have the ij.

This also brings up another ligature problem: in most cases it should be f+ij (as in fijn) and not fi+j (as in Fiji).

Michel Boyer's picture

André

In French, when using Word, the spelling dictionary automatically substitutes (if automatic correction is activated) sœur for soeur, Œuvre for Oeuvre but leaves moelle and incoercible intact. There is no need to type the letters Œ or œ.

For Dutch, the only words containing ij as two letters in my Thunderbird dictionary are

Anastasija, Beijing, Dimitrij, Fiji, Henrijette, Maija, Marijanne, Mija, Mirijam, Mirijana, Nadija, Naija, Neija, Seija, Taija, Veijo, bijektion, bijektionen, bijektionerne, bijektiv, bijob, bijobbet, bijouteri, bijouterivarer, dijonsennep, fijianer, fijiansk, frijord, hijacker, hijacking, politijagt, rijsttafel

(is "rijsttafel" really with two separate letters?). That means (in my opinion) the words with the letter ij should be automatically corrected by the spell checker. That is part of its job.

Michel

Michel Boyer's picture

Sorry, I conflated the .da dictionary with the Dutch dictionary. The nl Open Office Dutch dictionary contains 13853 entries with ij as two letters. Bad luck! My list was good for Danish I guess. That does not change the fact I think it is the spell checker that should do the substitution.

Michel

Jongseong's picture

There already is a topic which answers most of your questions: http://www.typophile.com/node/34111

Ah, that was a fun thread. To repeat some of the relevant bits:

John Hudson: Actually, Unicode encoded the Dutch IJ/ij digraphs as distinct characters, separate from the I+J and i+j sequences, because they were pre-existing characters inherited from a Dutch telecom standard. For the most part, Unicode does not encode digraphs. But some are encoded for backwards, roundtrip compatibility, as in this case.

For the most part, Dutch users do not use the digraph characters -- although some, like Thomas Milo, certainly advocate its use -- they just type I+J and i+j. So far as I know, the IJ/ij digraph characters are not accessible via the standard Dutch keyboard.

Me: I don't think automatic ligation is the best option because not every 'ij' combination in Dutch is the digraph (even if for 99% of the time it is). The 'i' and 'j' might come from different syllables, or it might be a loanword.

My take: to use special 'ij' and 'IJ' ligatures for Dutch, one should manually search and replace the 'ij' and 'IJ' combinations in the text, taking care to make sure to replace only when the digraph is wanted.

Does anyone know of a dictionary of the kind Michel was looking for, that distinguishes between the digraph 'ij' and the non-digraph 'i'+'j' sequence? I doubt one exists.

Theunis de Jong's picture

Does anyone know of a dictionary of the kind Michel was looking for, that distinguishes between the digraph 'ij' and the non-digraph 'i'+'j' sequence? I doubt one exists.

Not that I know of. It never has been a problem ;-)

(Michel) That means (in my opinion) the words with the letter ij should be automatically corrected by the spell checker. That is part of its job.

Jah. Well. Maybe not. Being Dutch, I'd object against forcing all ij's to be typed as ijs. (Not really visible, is it? I mean the separate 'i' 'j' to be typed as a single character.)

The "ij" has a status aparte in Dutch but it does not need any specialized software handling, where "oe" vs. "œ" does.
The status aparte is especially significant when capitalizing (IJmuiden, IJstijd) and sorting (as a single glyph in the place of "Y" -- and suddenly, I have no idea what happens with native words starting with an "Y"! (before? after? interleaved?)).

Granted, sorting and capitalizing would be easier for programmers if the digraph was a single glyph, but virtually all of our current "smart" software would need to be re-written ;-)

neverblink's picture

Martin Majoor writes in 'Had de Franse koning schoenmaat 49?' (his chapter in 'Letters, een bloemlezing over typografie'):

"De typisch Nederlandse diftong ij wordt in alfabetische lijsten op grond van haar klank vaak tussen de y geplaatst. Dat is net zo onjuist als dat met de letter c op grond van haar klank tussen de k of s rangschikt. En ook de andere diftongen (au, eu etc.) staan gewoon in alfabetische volgorde. Het gebruik om bij een beginhoofdletter de beide tekens van de ij in kapitaal te zetten - zoals in IJmuiden - is formeel gezien niet juist (men maakt er ook geen AUsterlitz van of OUde Pekela). De oorspronkelijke schrijfwijze Ymuiden ligt waarschijnlijk ten grondslag aan dit ingeburgerde gebruik. Bij spatiëren van een kapitaal woord met een lange IJ moeten dus ook de I en de J gespatieerd worden.
I J M U I D E N en niet IJ M U I D E N ."

(I'll try to translate it into English for those who don't read Dutch.)

"In alphabetically ordered lists, the typically Dutch diphthong ij is often placed with the y, based on its pronunciation. That is just as wrong as filing the c, based on its pronunciation, with the k or s. The other diphthongs (au, eu etc.) are also simply kept in alphabetical order. The practice of capitalizing both characters of the ij at the start of a word, as in IJmuiden, is formally incorrect (nobody writes AUsterlitz or OUde Pekela either). The original spelling Ymuiden is probably the basis of this established practice. When letterspacing a word in capitals with a long IJ, the I and the J should therefore be spaced apart as well.
I J M U I D E N and not IJ M U I D E N ."

Theunis de Jong's picture

Good quote, ne'erblink!

The letterspacing is certainly a good issue; what is the French usage, for example with "sœur"?

Michel Boyer's picture

Here is from http://www.fhscm.com/


You often see OE, as two letters (but never at the start of a word in careful editing). For fullcaps or smallcaps, I have no fast access to some relevant documentation.

nina's picture

So if a font includes a «true» (connected) "ij" variant (sort of like a discretionary ligature), would that have to be applied manually? Would anybody want to do that or is it usually too much hassle?

Theunis de Jong's picture

Nina -- personally, the latter option :-)

Oh, unless there is a really good drawn ij ligature in the font, worth replacing (and worth potentially mucking up text search, alphabetizing ... etc.). If it comes as an Opentype option, that'd be even better.

neverblink's picture

Nina, that depends on the text. Like Theunis says; If it was short and the ligature was nice, I'd go through the text and change it.

Although I can't remember ever seeing any text (that wasn't handwritten) with a true ij, other than in logos. You can find old letters written on a typewriter that use the y (or even y-dieresis) instead of a (disconnected) ij.

nina's picture

Ah, I guess it's rarer than I thought. Thanks guys!

riccard0's picture

You can find old letters written on a typewriter that use the y (or even y-dieresis) instead of a (disconnected) ij.

Somewhat related:
http://www.typophile.com/node/60316?page=4#comment-394823

neverblink's picture

The nl Open Office Dutch dictionary contains 13853 entries with ij as two letters. Bad luck!

Michel

---------------

To add to the confusion: What you've found is probably every word in the Dutch language that contains an i-j combination, whether they 'should' be written with a ligated/single glyph ij or not. There is no spelling difference between i followed by j or an ij.

Michel Boyer's picture

What you've found is probably every word in the Dutch language that contains an i-j combination

Indeed! The first line of the file nl_NL.aff is
  SET ISO8859-1

which means the dictionary nl_NL.dic was encoded in ISO8859-1, which has no encoding for the digraph ij.
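That is easy to check. Here is a minimal sketch in Python (my choice of language for illustration, nothing to do with the dictionary tooling itself): U+0133 lies outside the 256 code points of ISO 8859-1, so it cannot be encoded, while the i + j sequence goes through without complaint. The helper name `fits_latin1` is made up for this example.

```python
# U+0133 LATIN SMALL LIGATURE IJ; its uppercase partner is U+0132.
DIGRAPH_LOWER = "\u0133"

def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO 8859-1."""
    try:
        text.encode("latin-1")   # Python's name for ISO 8859-1
        return True
    except UnicodeEncodeError:
        return False

print(fits_latin1("ijs"))                 # i + j as two letters: True
print(fits_latin1(DIGRAPH_LOWER + "s"))   # single digraph character: False
```

So a word list stored in that encoding can only ever contain the two-letter sequence, whatever its authors think of the digraph.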

Michel

Theunis de Jong's picture

So purely a technical reason, Michel ... I've also seen Dutch word lists that use 'ÿ' for the digraph.
(Admittedly, that encoding might have been a shortcut for aforementioned crossword creators/solvers, since the 'ÿ' character does not occur in Dutch, and the 'ij' combo is treated as a single character. I don't have that list at my fingertips, but I might check some time if 'bijou' is written in full -- correctly.)

Michel Boyer's picture

So purely a technical reason, Michel ...

Well, the other OpenOffice dictionaries I have seem to be sorted alphabetically; that Dutch dictionary is not. It seems it was randomized. If it had been sorted, there would have been a way to test whether, according to those responsible for the dictionary, the digraph is the 25th letter of the alphabet or not. The way the dictionary is, there seems to be no way to test any hypothesis concerning the digraph, and I am starting to wonder whether that is intentional.

Michel

Michel Boyer's picture

Concerning spacing, here is what the Unicode Standard, Version 5.2 says (chapter 7, pages 203-204):

Another pair of characters, U+0133 latin small ligature ij and its uppercase version, was provided to support the digraph “ij” in Dutch, often termed a “ligature” in discussions of Dutch orthography. When adding intercharacter spacing for line justification, the “ij” is kept as a unit, and the space between the i and j does not increase. In titlecasing, both the i and the j are uppercased, as in the word “IJsselmeer.” Using a single code point might simplify software support for such features; however, because a vast amount of Dutch data is encoded without this digraph character, under most circumstances one will encounter an <i, j> sequence.
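The titlecasing rule in that passage is simple enough to sketch. The following Python is only an illustration under one assumption of mine: that every word-initial i + j is the Dutch digraph. The function name `dutch_capitalize` is hypothetical, not an existing library call.

```python
def dutch_capitalize(word: str) -> str:
    """Uppercase the first letter of a word, treating a leading 'ij' as one unit."""
    if word[:2].lower() == "ij":
        # Both strokes of the digraph are uppercased together.
        return "IJ" + word[2:]
    # Any other word: uppercase only the first character. This also covers
    # the single-character digraph, since U+0133 uppercases to U+0132.
    return word[:1].upper() + word[1:]

print(dutch_capitalize("ijsselmeer"))   # IJsselmeer
print(dutch_capitalize("amsterdam"))    # Amsterdam
```

Unlike Python's built-in `str.capitalize`, this leaves the rest of the word untouched instead of lowercasing it.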

Thomas Milo's picture

There is a difference betvveen Form and Function.

The FORM of IJ, or better, Dutch "Double I" in analogij to "Double U" (vvhich nobodij vvould seriouslij consider to be a digraph or a ligature), can appear as a digraph looking like a juxtaposition of I+J. But that is just one of its possible forms, another one is ü vvith the tail of j, or a Ü vvith an interrupted left leg. All of this can be encoded vvhichever vvaij ijou vvant - as a single character or a ligature. The realitij is that Dutch users are forced to tijpe vvith the tvvo strokes I+J - and Dutch-ignorant softvvare encodes it as I+J.

The FUNCTION of IJ (Double I) is completelij equivalent to the function of Double U: it's a single character in Dutch orthographij. All the observations above like capitalization, horizontal spacing (where Martin Majoor is prescribing new usage and condemning actual practice), vertical spelling can lead to only one conclusion: IJ is a letter and lacks proper support. Apps like the iPhone vvould do vvell to capitalize IJ correctlij vvhen the Dutch keijboard is active.

A convincing example is the Dutch version of the LINGO game, verij representative of the popular perception of a vvriting sijstem: IJ is treated bij the mass of Dutch speakers as a single letter - as a result it's a daily struggle to maintain it against English-biased software. The attached image also exposes naive layout software, wrestling to squeeze both Double I and Double U into the box.

The suggestion "to search and replace the 'ij' and 'IJ' combinations in the text manuallij, taking care to make sure to replace onlij vvhen the digraph is vvanted" comes from Outer Space. It is in fact denijing Dutch users the benefit of automation. In Dutch, everij I+J is Double I, the exceptions are negligible and, from a Dutch perspective, irrelevant. BTVV, the English vvord Fiji is spelled Fidji, so that's not even a candidate - I knovv, because I vvas UN liaison officer vvith the Fiji Battalion :-).

To give the foreign readers an impression of what Dutch really looks like without proper support for IJ, I have changed in this English text all digraphs W into VV and, in passing, all letter Y into IJ. To correct this, you are advised to manually replace all affected letter groups with the correct English digraphs.

Nick Shinn's picture

Martin Majoor states that, "The use of capitalizing both characters - as with IJmuiden - is formally wrong."
But that's the way it's often spelled, for instance by the town itself, e.g. on its website:
http://www.ijmuidenaanzee.nl/
I've seen old maps with it spelled that way, too, so it may well be traditional.

Fiji is spelled Fidji,

And marijuana marihuana.

Thomas Milo's picture

Nick, you're absolutely right: English words are spelled as such and fall outside the Double I system. In Arabic we would say, there's Ijmāʿ - Consensus.

IJmuiden can only be spelled in one manner: with a single capital IJ. Spelling it with upper case I and lower case j is not an option. It would look just as ridiculous as Vvilliam instead of William.

Here are some examples of horizontal spacing taken from the vvidelij used BOS-ATLAS. For anyone with a Dutch education, these are stock images.

Thomas Milo's picture

The start of this thread, to make a typeface that treats Double I intelligently, is a great project. Here's another Dutch borrowing that needs to be exempted: HI-JACK

On my remark that IJ as I+J is irrelevant from a Dutch perspective: I recently saw a nice example showing that IJ and Y are seen as variants of the same letter (the Dutch memorize the alphabet ending in X, IJ, Z): Byoux for You!

Thomas Milo's picture

Here are some more examples of how IJ is traditionally treated in typography, before the tsunami of Dutch-ignorant software hit the beaches. But also today a respected newspaper like NRC Handelsblad would never break up IJ - in fact nobody does, apparently with the sole exception of Martin Majoor, who has no mainstream followers.

Thomas Milo's picture

Finally, before it gets boring, some examples from Brill Publishers in Leiden, who are adamant that IJ be supported. An elegant way to combine I+J into Double I is to let J extend below the base line, as can be seen in the examples.

Thomas Milo's picture

All caps example with IJ extending fails to upload - will try later

Theunis de Jong's picture

A great set of examples, but just a tiny niggle: even though I have a 21.5 widescreen, the web site is restrained to a measly (hold on--measuring) 800 pixels wide ... lots of horizontal scrolling going on ...

Your Brill examples are hardly contemporary! The regular "J" also hangs below the baseline, so the "IJ" is just conforming to the entire character design.

Otherwise, well, what I said, great. The Lingo tv shot is hilarious! Never noticed that before. It seems the "regular" characters are horizontally centered but "IJ" is not. The Bosatlas (maps) samples show "IJ" does not get broken when spacing out text:

F R A N K R IJ K

Its index shows "IJ-" between "Ii-" and "Ik-" ... which, actually, looks totally logical to me (a born and bred Dutchie), and I'm starting to think sorting's not a problem either way because we just may unwittingly have developed a habit of looking in two places for "IJgenwijs" :-O
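The Bosatlas letterspacing is easy to mimic in code. A rough Python sketch, assuming every IJ or ij pair in the input is the digraph (so a borrowing like BIJOUX would come out wrong); the function name `letterspace` is made up for the example.

```python
import re

def letterspace(word: str, sep: str = " ") -> str:
    """Space out a word, keeping 'IJ' / 'ij' together as a single unit."""
    # The alternation tries the two-letter digraph first, then any single character.
    units = re.findall(r"IJ|ij|.", word)
    return sep.join(units)

print(letterspace("FRANKRIJK"))   # F R A N K R IJ K
print(letterspace("IJMUIDEN"))    # IJ M U I D E N
```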

Florian Hardwig's picture

Still my favourite illustration: Wij eisen ijs!

Nick Shinn's picture

Thomas, the scans are great.
However, it would be better for display here if:
1. Image is set to 72 pixels per inch.
2. Image is 588 pixels maximum width (larger brings up scroll bar)
Image may be either .jpg or .png.

quadibloc's picture

I know that the Dutch "ij" glyph has a separate character in OCR-B, and that at least some manual typewriters sold in the Netherlands did have a key for "ij" even if the Dutch keyboard for Microsoft Windows does not. So it is indeed more than just a ligature, even though the fact that it is treated as a letter of the alphabet by the Dutch is confusing to everyone else.

Of course, the thought that "ij" is really the Dutch way of writing "y" would mean that there is a key for it on every Latin keyboard... and all one has to do is draw the glyph appropriately. That, though, is probably not a realistic option - and, indeed, one of the examples given shows IJ as collating as though it were i + j instead of as Y, even if another example shows it drawn almost as a y-umlaut.

I had remembered about IJ in OCR-B from the article "Inside ASCII" by Bob Bemer in Interface Age: I was able to find two examples on the web:

http://www.fontage.com/pages/ocrb10n.html
http://www.barcode-soft.com/kb/ocrb.aspx

showing this - the face is monospaced, so the IJ characters omit the serif which is present on I and J as individual letters.

Jens Kutilek's picture

I wonder if there's any connection to these y with diaeresis, which I found in the northernmost part of Germany. Traditionally a variety of North Frisian was spoken there, I believe, but these inscriptions are not North Frisian.

Jens R. Knudſen
Keÿtūm
Gebohren 1762 d. 7. Mäÿ
im Eheſtand gelebt.
Geſtorben 1791 d. 30. Jűnÿ
alt 29 Jahr 7 Wochen 4 T.
Ich lieg ūnd ſchlaf, nūn
Gutenacht, Die ihr mich
bis hie her gebracht. Wohl
eūch Wohl eūch, ich rūhe
fein In dieſen, in einen
Kämmerlein

Pietate et Iustitia
1682
Lister Dÿbs Told Cammer

quadibloc's picture

This has piqued my curiosity.

From this page, mirroring some content from Roman Czyborra no longer on the web,

http://www.terena.org/activities/multiling/ml-docs/iso-8859.html

I see that not even any of the ISO 8859 series, let alone the registered national variants of the 7-bit ISO 646 code included it...

The Wikipedia article has much of interest...

http://en.wikipedia.org/wiki/IJ_%28digraph%29

however, looking elsewhere in Robert Bemer's Inside ASCII article, use of national use positions for IJ is described; the variant was apparently just not registered. I had expected it must have been a character, given that it was included in OCR-B.

Jongseong's picture

Thomas, I never questioned the proper typographic practices regarding ij in Dutch. While I appreciate the numerous examples, what they showed—IJ capitalized as a unit, treated as a unit when letterspacing, etc.—were certainly not new to me, and have all come up in the discussions here.

Surely, such appropriate typographic treatment would be much, much easier if the Dutch ij was encoded as a single character (well, you know what I mean), so I see where your argument is coming from. Also, electronic dictionaries with the ij encoded appropriately would be a great aid in automatized text processing.

However, my suggestion was based not on that ideal situation but on the reality that the vast majority of digital texts in Dutch including electronic dictionaries do not encode the ij as a single character but use the sequence i+j. The standard Dutch keyboard has no way of inputting the Dutch ij other than as the sequence. Even if the Dutch adopt a new keyboard enabling them to enter ij, there will be a huge amount of legacy texts to deal with, plus the fact that outsiders will presumably continue to type i+j for any ij in Dutch.

So I am operating under these assumptions when I oppose automatic ligation for ij, even if in Dutch texts 99% of the i+j combination represents the character ij (as you can check, I already acknowledged this). That and the fact that language tagging of existing digital texts is not entirely reliable. Do we trust that texts in Dutch will be consistently tagged as being Dutch, and that portions of non-Dutch text in a predominantly Dutch document will also be appropriately language-tagged? This just isn't realistic.

I said "one should manually search and replace the 'ij' and 'IJ' combinations in the text, taking care to make sure to replace only when the digraph is wanted." Maybe I made it sound cumbersome, but is it really too much to ask that a human editor check to see if the text is (a) in Dutch and (b) doesn't contain one of the very, very rare instances in Dutch where the combination i+j shouldn't be replaced by the ij character? It's still search and replace, which the software automatically does for you. I'm merely warning against blind automatism, where you think you can simply build this functionality into a font, for instance.
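To show that this kind of checked search and replace is not much work, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function name `to_digraph`, and above all the exception list, which only holds a few borrowings mentioned in this thread and matches whole words only. A real list would have to come from a human editor.

```python
import re

# Hypothetical, incomplete exception list: words where i + j is NOT the digraph.
EXCEPTIONS = {"bijou", "bijoux", "fiji", "hijack"}

def to_digraph(text: str, exceptions=EXCEPTIONS) -> str:
    """Replace ij/IJ with U+0133/U+0132, skipping whole words on the exception list."""
    def fix_word(match):
        word = match.group(0)
        if word.lower() in exceptions:
            return word                      # leave borrowings untouched
        return word.replace("IJ", "\u0132").replace("ij", "\u0133")
    # Process the text one word at a time so the exception check sees whole words.
    return re.sub(r"\w+", fix_word, text)

print(to_digraph("IJmuiden is fijn, Fiji ook"))
```

Run it only on text you already know to be Dutch; the code makes no attempt to detect the language.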

quadibloc's picture

I have since found that the normal Dutch practice was to substitute IJ and ij for \ and | respectively in 7-bit ASCII, although this was not registered as a national-use set. My source does not show the $ or some other national-use character being replaced by the modified f currency symbol.

Thomas Milo's picture

Using the proper codes for UC&lc "Double I" would be best, and I agree that without an input method, this will not happen.

A hardware keyboard is a no-brainer. Easiest would be a Dutch keyboard driver or locale that intercepts all I+J sequences and replaces them. Without manual or automatic correction it would of course also affect a minimal number of borrowings. But it would be ridiculous to consider minimal damage to a limited number of foreign words a problem, while real damage to Dutch words was never a concern. After all, we now have a legacy of tons of miscoded text - a fact that the Unicode Standard has the chutzpah to use as an argument to perpetuate this bizarre situation :-)

Alternatively, smart font technology could be deployed to display I+J sequences as Double I. This makes sense, after all for the end-user it makes no difference how it's done, as long as it works. But that doesn't solve the problem of sorting. "Double I" really belongs at the end of the alphabet, because words so spelled often correspond with words that have Y. This is especially the case with names. Some branches of the same family spell their names as Meijdrecht, others as Meydrecht. When correctly sorted, these names would be mixed into the same positions, but when sorted with I+J they are split up. Very unpractical and counter-intuitive.

In either case, but especially when using font technology instead of proper coding, some kind of language tagging is helpful, but not mandatory. Not to have language tagging is harmless when compared with the present automatism, that blindly - and wrongly - assumes that for Dutch, like for English, any combination of I+J is not a Double I. For Dutch it's really more pragmatic to assume that all cases of I+J are Double I, and accept some marginal collateral damage. At least then you get the casing and sorting, and about everything else, right.

Thomas Milo's picture

The North German tombstone doesn't seem to be in Frisian, it's more like slightly antiquated German. The place name Keijtum looks indeed Frisian. There are many Frisian place names ending in -um ("heem", "Heim", "home"?) in a broad Frisian belt that starts just North of Amsterdam (West Friesland), and that follows the coastline (Friesland, East Friesland) - all the way to the Danish border.

Michel Boyer's picture

What problem would it cause to temporarily encode ij as ÿ and IJ as Ÿ? The letters Y and y with dieresis do not occur in Dutch, so far as I know, and should be as easy to type as ë for instance. The names Meÿdrecht and Meydrecht would be properly sorted. After sorting, globally substituting ij for ÿ and IJ for Ÿ would produce a file ready for final processing. In fact, it should also be quite easy to modify a keyboard layout so that the combination normally giving ÿ would produce ij and the one normally giving Ÿ would produce IJ. Again, global substitution between ÿ and ij can be applied for sorting and backwards for producing the final text.
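A rough Python sketch of that round trip, to make the idea concrete. Two things here are my assumptions rather than part of the proposal: the function name `sort_dutch`, and the sort key, which simply folds ÿ to y as a stand-in for a locale-aware collation (a plain code-point sort would put ÿ after z).

```python
def sort_dutch(words):
    """Sort names so that the ij digraph files together with y."""
    # Step 1: temporarily encode the digraph as y-dieresis.
    encoded = [w.replace("IJ", "\u0178").replace("ij", "\u00ff") for w in words]
    # Step 2: sort, treating y-dieresis like y (stand-in for real collation).
    encoded.sort(key=lambda w: w.lower().replace("\u00ff", "y"))
    # Step 3: substitute back. Limitation: a genuine y-dieresis in the
    # input would also come back as ij.
    return [w.replace("\u0178", "IJ").replace("\u00ff", "ij") for w in encoded]

names = ["Meydrecht", "Meijer", "Meijdrecht", "Mol", "Maas"]
print(sort_dutch(names))
# ['Maas', 'Meydrecht', 'Meijdrecht', 'Meijer', 'Mol']
```

With this key the two spellings of the family name end up next to each other, which is the point of the exercise.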

quadibloc's picture

I now found a manual for the DEC VT340 terminal which gives a "national replacement character set". This includes a y-umlaut, which presumably is a misprint for ij.

The Dutch 7-bit modified ISO 646 or CCITT Telegraph Alphabet No. 5 which I found there has the following changes from ASCII:

# -> British pound sign
@ -> 3/4
[ -> ij
\ -> 1/2
] -> | (!!!)
^ as is
_ as is
` as is
{ -> diaeresis combining form
| -> florin sign
} -> 1/4
~ -> acute accent combining form
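For anyone who wants to play with it, the table above can be written as a translation map. This Python sketch is only an illustration: the names `DUTCH_7BIT` and `decode_dutch_7bit` are invented, and I have assumed the spacing forms of the diaeresis and acute accent (U+00A8, U+00B4), since the manual's "combining form" could be read either way.

```python
# ASCII position -> character in the Dutch 7-bit set, as listed above.
DUTCH_7BIT = str.maketrans({
    "#": "\u00a3",   # pound sign
    "@": "\u00be",   # 3/4
    "[": "\u0133",   # ij digraph
    "\\": "\u00bd",  # 1/2
    "]": "|",        # vertical bar
    "{": "\u00a8",   # diaeresis (spacing form assumed)
    "|": "\u0192",   # florin sign
    "}": "\u00bc",   # 1/4
    "~": "\u00b4",   # acute accent (spacing form assumed)
})

def decode_dutch_7bit(data: bytes) -> str:
    """Interpret 7-bit bytes using the Dutch replacements instead of plain ASCII."""
    # translate() maps each character once, so ']' -> '|' is not remapped to the florin.
    return data.decode("ascii").translate(DUTCH_7BIT)

print(decode_dutch_7bit(b"[s"))   # the digraph followed by s
```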

DTY's picture

The letters Y and y with dieresis do not occur in Dutch, so far as I know

However, Dutch text-processing has to work in Belgium too, and ydieresis does occur in French place and personal names from Belgium and northeastern France (Croÿ, for example).

Michel Boyer's picture

ydieresis does occur in French place and personal names from Belgium and northeastern France (Croÿ, for example).

Are there many such names? What I have in mind is certainly not ideal: if ÿ is substituted for ij for sorting and ij is then substituted back for ÿ, we end up with Croij, which is wrong. It is then possible to globally replace Croij with Croÿ, and if the list of such names is small, that may still be worth considering for batch processing (unless there are two names that are identical, except that one uses the ij digraph and the other ÿ).

John Hudson's picture

Michel: What problem would it cause to temporarily encode ij as ÿ and IJ as Ÿ.

Why would you want to do that?

Michel Boyer's picture

Why would you want to do that?

For sorting without having to write a sorting procedure and without having to figure out how to use localedef to define a new locale with a collating sequence where the digraph ij would be sorted as y or ÿ (which I don't know how to do on my Mac; I guess I could do it on Linux).

Michel

quadibloc's picture

Of course what the Dutch should really have done is encode A as @, B as A, C as B... X as W, IJ as X, Y as Y and Z as Z so as to have no problems with sorting. Thus, their computers would be perfectly adapted to their language, and in the extremely rare cases where data is transferred across national boundaries, translation can always be applied.

Michel Boyer's picture

X as W, IJ as X, Y as Y and Z as Z so as to have no problems with sorting

No problem to sort Phone directories or Yellow pages but that would be the wrong order for dictionaries according to the IJ Digraph wiki, as well as the collating sequence wiki.

PS For more serious stuff on collating sequences, the Unicode collation algorithm is worth having a look at.
