Ligature Counter

eolson's picture

For those interested, my good friend Justin Bakse has just released a handy application named Ligature Counter. In short, you can paste text into the application and get results for both letter frequency and letter-combination frequency. It's very nice for analyzing text, thinking about frequent letterforms or working out some new ligatures. The application is free and web-based.
Take it for a spin here:
http://www.volcanokit.com/volcanokit2/ligCounter/
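
For anyone who would rather run this kind of count offline, here is a minimal sketch of the same idea: tally single letters and adjacent letter pairs in a block of text. This is not Justin's code. The function names and the sample sentence are made up for illustration, and ignoring case and skipping pairs that span spaces or punctuation are assumptions about what a ligature hunter would want.

```python
from collections import Counter


def letter_counts(text):
    """Count each letter, ignoring case and skipping non-letters."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def pair_counts(text):
    """Count adjacent letter pairs; pairs broken by a space or punctuation are skipped."""
    lowered = text.lower()
    pairs = Counter()
    for first, second in zip(lowered, lowered[1:]):
        if first.isalpha() and second.isalpha():
            pairs[first + second] += 1
    return pairs


# Hypothetical sample text, chosen only because it is rich in f-combinations.
sample = "The office staff fixed five flat files."
print(letter_counts(sample).most_common(5))
print(pair_counts(sample).most_common(5))
```

Sorting the pair counts from most to least common gives a quick shortlist of combinations worth considering as ligatures.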


Many thanks to Justin for making this for me (and then making it free for everyone) and to Hrant for planting the initial idea for such a thing in my head.

Best,
Eric O.

hrant's picture

Hey, this is really nice! And it can handle very large pastes (I tried one with about 10K words).

Thanks Justin and Eric!

hhp

John Hudson's picture

This is a very handy little thing. Thank you. Is there any chance of modifying it to analyse Unicode text? I have extensive corpora of ancient Greek and Biblical Hebrew, and would love to develop some frequency data.

Here's something derived from the current implementation that might be useful for people who want to compare the relative space efficiency of different fonts. The string of letters below represents the frequency of each letter in English as sampled from a text of 34,643 words (most of the books of Genesis, Exodus and Numbers in the New International Version, UK spelling). I also sampled from shorter versions of the text to confirm that the frequency was similar or identical in texts of any significant length (10,000+ words). The benefit of comparing fonts using this string, rather than simply comparing alphabet length, should be obvious.

Below the string are the relative letter frequencies: the actual number of occurrences in the sampled text divided by 100 and rounded to the nearest whole number (except q, which is rounded up to 1 from .3). After pasting the string into your app of choice, remove the line breaks for best results.

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbcccccccc
ccccccccccccccccdddddddddddddddddddddddddddddddddd
ddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeffffffffffffffffffffffffffffff
fffffffgggggggggggggggggggggggggggghhhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhiiiiiiiiiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
iiiiiiiiiiiiiiiiiiiiiiijjkkkkkkkkkklllllllllllllll
lllllllllllllllllllllllllllllllllllllllllllllmmmmm
mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnooooooooooooooooooooooooo
oooooooooooooooooooooooooooooooooooooooooooooooooo
oooooooooooooooooooooooooooooooooooooooooopppppppp
pppppppppppqrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrssss
ssssssssssssssssssssssssssssssssssssssssssssssssss
sssssssssssssssssssssssssssssssssttttttttttttttttt
tttttttttttttttttttttttttttttttttttttttttttttttttt
tttttttttttttttttttttttttttttttttttttttttttttttttt
ttttttttttttttuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuvv
vvvvvvvvvvvvwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwxyyyyyy
yyyyyyyyyyyyyz

a 121
b 21
c 24
d 68
e 186
f 37
g 28
h 101
i 87
j 2
k 10
l 60
m 39
n 91
o 117
p 19
q 1
r 84
s 87
t 131
u 34
v 14
w 31
x 1
y 29
z 1
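
The derivation described above (count each letter, divide by 100, round, and bump q up to 1) can be sketched roughly as follows. This is only an illustration under a few assumptions: the function name and the divisor parameter are invented here, only the 26 unaccented letters are counted, halves are rounded up, and the special case for q is generalised so that any letter that occurs at all gets at least one repeat.

```python
import math
import string
from collections import Counter


def proportional_string(text, divisor=100):
    """Repeat each letter a-z in proportion to how often it occurs in text."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    parts = []
    for letter in string.ascii_lowercase:
        occurrences = counts[letter]
        if occurrences == 0:
            continue  # letter never appears in the sample, so leave it out
        # Round to the nearest whole number (halves round up), but never below 1.
        # The floor of 1 is an assumption that generalises the q exception.
        repeats = max(1, math.floor(occurrences / divisor + 0.5))
        parts.append(letter * repeats)
    return "".join(parts)


# Tiny artificial input: 250 a's, 30 b's and 149 c's.
print(proportional_string("a" * 250 + "b" * 30 + "c" * 149))
```

The result comes out as a single line with no breaks, so it can be pasted straight into a layout application and set in each font being compared.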

hrant's picture

Yeah, there could be a number of improvements to Justin's great script, including Unicode support (or at least Upper-ASCII), but also the ability to load the sample text and save out the results. Let's motivate him! :-)

> useful for people who want to compare the relative
> space efficiency of different fonts

That's a nice trick! You can also do this by having a straight string of the 26 letters where each is scaled horizontally by its [relative] frequency.
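
Numerically, the scaled-alphabet idea amounts to multiplying each letter's advance width by its frequency and adding up the results, which is also the length of the repeated-letter string when kerning is ignored. Here is a minimal sketch. The three frequencies are taken from John's table, but the function name, the two fonts and all of the advance widths are hypothetical values made up for the example.

```python
def weighted_length(widths, freqs):
    """Sum of advance width times frequency over every letter in freqs."""
    missing = sorted(set(freqs) - set(widths))
    if missing:
        raise KeyError(f"no advance width given for: {', '.join(missing)}")
    return sum(widths[letter] * count for letter, count in freqs.items())


# Frequencies for three letters from the table above (a full run would use all 26).
freqs = {"e": 186, "t": 131, "a": 121}

# Hypothetical advance widths in font units, invented for illustration only.
font_one = {"e": 500, "t": 300, "a": 480}
font_two = {"e": 520, "t": 280, "a": 470}

print(weighted_length(font_one, freqs))
print(weighted_length(font_two, freqs))
```

The font with the smaller total is the more economical one for text with this letter distribution, provided both are compared at the same size and units per em.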

hhp

Joe Pemberton's picture

[ This thread moved to "Build" ]
