Test Paragraph that Includes all Western, Central and South Eastern European Diacritics

indyfont's picture

Hello! I've designed a font that includes diacritics for Western, Central and South Eastern European languages. I'm looking for a paragraph of text or multiple paragraphs that include these diacritics so I can evaluate how all the letters work together. I've looked in all the usual places and can't find something suitable.

Thanks!

indyfont's picture

Thanks, that's a nice resource to know about. I'm looking for one paragraph with all the diacritics. Something like this, but grammatically correct:

Té Rowan's picture

Might want to delve into the 'Pedia's page o' pangrams, then.

indyfont's picture

You're right. I think that could work. Thanks!

PabloImpallari's picture

Or the Drag and Drop Testing Page, on the last tabs: http://www.impallari.com/testing/

quadibloc's picture

I see that in section 5.2, the Wikipedia pangrams page does address the issue raised.

I also made an edit: "Pack my box with five dozen liquor jugs" was used for type specimens before the Beagle Brothers - so I mentioned the Kelsey Press Company of Meriden, Connecticut.

Richard Fink's picture

Pablo's test page is excellent and I'm planning on contributing to the code for it on Github shortly.

(Among other things, I'll be adding a single-symbol fallback font, implemented as a Data URI internal to the page, that will make it easy to see characters that are specified in the test page's source but missing from the font the browser is currently displaying. A "notdef" fallback font, I guess you could call it, similar to the one built into this pangram testing page. The font's name is declared as "backdrop" in the @font-face rule. It has too limited a character set and I'm not fond of the symbol it's using to indicate a notdef character. Soon to be upgraded.)

But I'm digressing before I even begin....

What I wanted to say was that, in my experience, the sample text for webfonts in HTML test pages should ideally be written as Unicode points (numeric character references), not keyed-in characters.

Why? Well, here is what happens when the browser tries to interpret the page with the Central European character set:

Menu selection:

Messed up output:

Just flagging a potential problem.
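
The screenshots did not survive in this thread, so here is a rough sketch of the kind of thing that happens. It assumes the page was saved as UTF-8 and the browser decodes it as windows-1250 (the usual Central European code page); the sample string is made up for illustration.

```python
# Simulate a browser decoding a UTF-8 page with the Central European
# code page (windows-1250) instead of UTF-8.
text = "š č ž"  # keyed-in characters (illustrative sample, not from the thread)

utf8_bytes = text.encode("utf-8")      # what the file on the server contains
garbled = utf8_bytes.decode("cp1250")  # what a wrong charset guess produces

print(text)     # the intended characters
print(garbled)  # each accented letter turns into two unrelated characters
```

Each accented letter is two bytes in UTF-8, so it comes out as two wrong characters in a single-byte code page.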

And also, talkin' 'bout diacritics: take into account the special handling required for S/s and T/t with cedilla versus commaaccent. Uni points: 015E, 015F (S/s cedilla); 0218, 0219 (S/s commaaccent); 0162, 0163 (T/t cedilla); 021A, 021B (T/t commaaccent).
There's a thread about that here on Typophile I believe.
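
For anyone who wants to double-check which of those eight points is which, a quick sketch using Python's standard `unicodedata` module prints the official character names. The helper name `describe` is just something made up for this example.

```python
import unicodedata

# The eight code points listed above: S/s and T/t with cedilla and with comma below.
CODE_POINTS = [0x015E, 0x015F, 0x0218, 0x0219, 0x0162, 0x0163, 0x021A, 0x021B]

def describe(cp):
    """Return 'U+XXXX <char> <official Unicode name>' for one code point."""
    return f"U+{cp:04X} {chr(cp)} {unicodedata.name(chr(cp))}"

for cp in CODE_POINTS:
    print(describe(cp))
```

The names make the split obvious: the 015x/016x points are the cedilla forms and the 021x points are the comma-below forms, so a test text should include both sets.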

rich

PabloImpallari's picture

Just use UTF8 :)

indyfont's picture

What a pleasure to be on a board with such knowledgeable, helpful people. Thanks for all the input!

indyfont's picture

Honestly, *this* is exactly what I was looking for!

PabloImpallari's picture

Keep in mind that while the Urtd project is truly awesome research, it is also limited to a narrow set of 26 Latin languages only (out of 105 Latin languages).

Richard Fink's picture

>Just use UTF8 :)

Absolutely. No page is complete without specifying charset UTF-8 in the head.
But....
My understanding is that regardless of the charset specified, if it just doesn't seem to fit the content, the browser will take a guess at the codepage to use based on that content. I've seen it happen with my own eyes.
(I've actually seen an alternate code page triggered by a web font. Don't ask me how, but I've got the screen shots to prove it. It was a Google font titled Atomic Age, which was recently re-done, so the problem is gone.)

The only way I know to prevent either human error or an unwanted intervention by the browser is to use Unicode points.

But hey, that's just me. I'm picky that way.

Thanks for the additional links. Nice.

rich

PabloImpallari's picture

The thing with UTF-8 is that:
a) While editing your webpage file in your text editor of choice, it must be configured to save the file as utf-8.
b) Some web-hosting (in particular cheap shared webhosting) can fail to deliver your page as utf-8 to the browser, so you must force them to do it by including a PHP header function.
c) You must specify UTF-8 in the HEAD Content Type of your HTML.

As long as you do all those 3 things, the browser will get it right, and you will have no problem.
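
The three steps above can be sketched in a few lines. This is Python rather than the PHP header function mentioned in (b), purely to keep the example self-contained; `build_utf8_response` is a hypothetical helper, not part of any framework.

```python
def build_utf8_response(body_text):
    """Hypothetical helper: build headers and payload that agree on UTF-8."""
    page = (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        # (c) declare UTF-8 in the HEAD of the HTML
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "<title>Test page</title>\n"
        "</head>\n"
        f"<body><p>{body_text}</p></body></html>\n"
    )
    # (a) the bytes themselves must really be UTF-8
    payload = page.encode("utf-8")
    # (b) the HTTP header must say UTF-8 too
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(payload)),
    }
    return headers, payload

headers, payload = build_utf8_response("š č ž")
print(headers["Content-Type"])
```

The point is only that all three places (file bytes, HTTP header, HEAD) name the same encoding.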

Té Rowan's picture

One more trap: The server must not claim the contrary in the header. Due to a massive bout of perversion, the HTTP header charset overrides the document charset.

Richard Fink's picture

From Pablo:

>a) While editing your webpage file in your text editor of choice,
>it must be configured to save the file as utf-8.
>b) Some web-hosting (in particular cheap shared webhosting)
>can fail to deliver your page as utf-8 to the browser, so you must
>force them to do it by including a PHP header function.
>c) You must specify UTF-8 in the HEAD
>Content Type of your HTML

and from Té:

>One more trap: The server must not claim contrary in the
>header. Due to a massive bout of perversion, the HTTP header
>charset overrides the document charset.

All great tips. (Although Té, isn't what you are describing the same problem as Pablo's 'b' listing, or no?)

One downside I see with taking this approach - what if you are distributing the page to others who don't have that kind of control over their servers and/or the technical expertise?

Anyway, to sum up:

you can take, if possible, all the steps listed above,

or.......

you can write the test text as HTML Unicode points using either the decimal or hex syntax. (Example: &#21512; versus &#x5408;, both of which display as 合)

For test pages where all browsers need to process exactly the same characters and testers all need to see exactly the same text, I think, in this instance, the principle of "defensive coding" favors the Unicode points. You only have to remember one thing - convert the test text to Uni. Take that one simple precaution and after that, it can't go wrong - the browser can be set to use any old codepage and the test text will still render predictably.
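
Converting the test text does not have to be done by hand. Here is a minimal sketch of a converter; `to_ncr` is a made-up name, and it assumes the input is plain text (it leaves ASCII alone, so any `&` or `<` in the text would still need escaping separately).

```python
import html

def to_ncr(text, hex_form=True):
    """Replace every non-ASCII character with an HTML numeric character reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)               # plain ASCII passes through unchanged
        elif hex_form:
            out.append(f"&#x{cp:X};")    # hex syntax
        else:
            out.append(f"&#{cp};")       # decimal syntax
    return "".join(out)

sample = "Țară și șef"                   # illustrative sample text
encoded = to_ncr(sample)
print(encoded)
print(html.unescape(encoded) == sample)  # decoding the references restores the text
```

Because the result is pure ASCII, it reads the same under any ASCII-compatible code page, which is the whole argument for doing it this way.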

I rest my case.

(BTW - I've had this kind of argument before and it's really rare that I change anybody's mind.* It seems to be very much philosophical. Everybody's got their own inner sense of what's important and even if they acknowledge the logic behind what you say, they still do what they've always done. Perhaps that's out of a feeling of "the way I usually do it is good enough", I don't know. Not an end-of-the-world event.)

* However, I did convince web designer and author Zoe Mickley Gillenwater - with words written right here on Typophile - that using the "local" descriptor in a @font-face declaration is poor practice!

I take solace in the little things.....

Té Rowan's picture

Same or similar; I'm not sure. Remains a fact, though, that if the document header says "charset=utf-8" but the HTTP header says "charset=iso-8859-1", the browser will render the document as iso-8859-1 (aka Latin-1) unless somehow spiked to do otherwise. Caused me quite some scratching of head to find out why so many things looked wrong all of a sudden coming from the bedroom LAN server.
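
A simplified sketch of that precedence, for illustration only. `effective_charset` is a hypothetical function, the fallback default is an assumption (real browsers vary by locale), and real browsers also consider things like a byte order mark and content sniffing, which are ignored here.

```python
from email.message import Message

def effective_charset(http_content_type, meta_charset, default="windows-1252"):
    """Hypothetical model: the HTTP header charset wins over the document's own."""
    if http_content_type:
        msg = Message()
        msg["Content-Type"] = http_content_type
        declared = msg.get_content_charset()  # lowercased charset, or None
        if declared:
            return declared
    if meta_charset:
        return meta_charset.lower()
    return default  # assumed fallback, not a universal browser default

# Server claims Latin-1, document claims UTF-8: the server's claim wins.
print(effective_charset("text/html; charset=ISO-8859-1", "utf-8"))
# Server sends no charset: the document's declaration is used.
print(effective_charset("text/html", "UTF-8"))
```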
