InDesign find & replace: Hebrew with nikud

Bentzi Binder

Is there a fast way in InDesign to convert biblical text with nikud, imported from the internet, into the appropriate Unicode signs?

The imported text is ordered: letter, nikud, dagesh
I would like to change it to: letter+dagesh glyph, nikud

Bentzi

Michel Boyer

I have never coded for InDesign, and I can't answer the question you are asking.

In any case, I would just recode the input text before feeding it to InDesign. But first, are you sure your input never contains a sequence like "shin shindot qamats dagesh" (for instance in הַשָּׁמַיִם) that you would want to replace with "shin dagesh shindot qamats"? If it does, the substitution "[letter][nikkud][dagesh]" to "[letter][dagesh][nikkud]" is not general enough: you need to accept [nikkud]+, i.e. one or more nikkud marks between the letter and the dagesh.
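To see why the + matters, here is a small Python 3 check on the sequence shin, shin dot, qamats, dagesh. The helper name move_dagesh and the two pattern names are only for illustration; the point class is the same one used in reorddag.py. A pattern that allows exactly one point between the letter and the dagesh finds no match, while the one-or-more version moves the dagesh.

```python
import re

DAGESH = '\u05BC'
LETTER = '[\u05D0-\u05EA]'  # alef..tav
# Points that may sit between a letter and its dagesh (same class as reorddag.py)
POINTS = '[\u05B0-\u05BB\u05BD\u05BF\u05C1\u05C2\u05C7]'

one_point = re.compile('(' + LETTER + ')(' + POINTS + ')' + DAGESH)
many_points = re.compile('(' + LETTER + ')(' + POINTS + '+)' + DAGESH)


def move_dagesh(pattern, text):
    """Rewrite every match as letter + dagesh + points."""
    return pattern.sub(lambda m: m.group(1) + DAGESH + m.group(2), text)


sample = '\u05E9\u05C1\u05B8\u05BC'  # shin, shin dot, qamats, dagesh
print(move_dagesh(one_point, sample) == sample)                      # True: no match
print(move_dagesh(many_points, sample) == '\u05E9\u05BC\u05C1\u05B8')  # True: dagesh moved
```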

If your input file is already UTF-8 encoded, the following Python 2 code should do it (provided there is no cantillation mark between the letter and the dagesh).

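---- file reorddag.py ---- cut line
#!/usr/bin/env python
# Python 2: move a dagesh (U+05BC) that follows nikud back next to its letter.

import re, sys

# A Hebrew letter (alef..tav), one or more points (vowels, meteg, rafe,
# shin/sin dots, qamats qatan), then a dagesh.
reord = re.compile(ur'([\u05D0-\u05EA])([\u05B0-\u05BB\u05BD\u05BF\u05C1\u05C2\u05C7]+)\u05BC')

# Read the file named on the command line, or stdin if none is given.
if len(sys.argv) > 1:
    f = open(sys.argv[1])
else:
    f = sys.stdin

line = f.readline().decode('utf-8')
while line:
    # Rewrite as letter + dagesh + points; the trailing comma keeps print
    # from adding a second newline after the one already in the line.
    print reord.sub(ur'\1\u05BC\2', line).encode('utf-8'),
    line = f.readline().decode('utf-8')
---- cut line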

If you save those lines to reorddag.py, then

python reorddag.py input.txt > output.txt

should perform the desired changes. If you are on a Mac or Linux, you can also name the file reorddag and make it executable (chmod +x reorddag, with the file in a directory on your PATH); then reorddag input.txt writes the result to stdout, and the script can also be used in a pipe.
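The script above is Python 2 (print statement, ur'' literals), so it stops with a syntax error under Python 3. Here is a Python 3 sketch of the same substitution, using the same pattern. The file name reorddag3.py and the "-" argument for reading stdin are my own assumptions, not part of the original; it also reads the whole file at once and handles UTF-8 bytes explicitly so the result does not depend on the system locale.

```python
#!/usr/bin/env python3
# reorddag3.py (hypothetical name): Python 3 sketch of reorddag.py.
import re
import sys

# Same pattern as the Python 2 script: a Hebrew letter (alef..tav), one or
# more points (vowels, meteg, rafe, shin/sin dots, qamats qatan), then dagesh.
REORD = re.compile(r'([\u05D0-\u05EA])([\u05B0-\u05BB\u05BD\u05BF\u05C1\u05C2\u05C7]+)\u05BC')


def reorder_dagesh(text):
    """Rewrite each letter + points + dagesh as letter + dagesh + points."""
    return REORD.sub(lambda m: m.group(1) + '\u05BC' + m.group(2), text)


def main(path):
    # Assumed convention: "-" means read from stdin, so the script can sit in a pipe.
    if path == '-':
        data = sys.stdin.buffer.read()
    else:
        with open(path, 'rb') as f:
            data = f.read()
    # Decode/encode UTF-8 explicitly; line endings pass through unchanged.
    out = reorder_dagesh(data.decode('utf-8'))
    sys.stdout.buffer.write(out.encode('utf-8'))


if __name__ == '__main__':
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        print('usage: reorddag3.py FILE  (or - to read stdin)', file=sys.stderr)
```

Usage would be python3 reorddag3.py input.txt > output.txt, or python3 reorddag3.py - < input.txt > output.txt in a pipe. Like the original, it does not handle cantillation marks between the letter and the dagesh.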

-- 16 Jul 2013 — 11:51am Added # that was missing before ! in the copy, on the first line
-- Something weird is happening: starting on my Mac with shin shindot qamats dagesh in הַשָּׁמַיִם, copying it to Typophile and then copying it back from Typophile to my Mac, I end up with the sequence shin qamats dagesh shindot.
-- 17 Jul 2013 — 3:35pm Added a missing comma at the end of the print line (was giving two CR for each line)
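One plausible explanation for the shin qamats dagesh shindot sequence noted above (an assumption; I can't confirm what Typophile or the clipboard actually does) is Unicode canonical normalization. Each Hebrew point has a fixed canonical combining class (qamats 18, dagesh 21, shin dot 24), and NFC/NFD sort adjacent marks by that class, which is likely also where the letter, nikud, dagesh order of the internet text comes from. This Python 3 sketch uses the standard unicodedata module to show that both the typed order and the reorddag.py output normalize to the same sequence.

```python
import unicodedata


def describe(s):
    """Return (code point, canonical combining class, name) for each character."""
    return [('U+%04X' % ord(c), unicodedata.combining(c), unicodedata.name(c)) for c in s]


typed = '\u05E9\u05C1\u05B8\u05BC'      # shin, shin dot, qamats, dagesh
reordered = '\u05E9\u05BC\u05C1\u05B8'  # what reorddag.py produces from it

for label, s in (('typed', typed), ('reordered', reordered)):
    nfc = unicodedata.normalize('NFC', s)
    print(label)
    for row in describe(nfc):
        print('   ', *row)
```

If this is what happens, a reordered file will be put back into letter, nikud, dagesh order by any tool that normalizes text, so the output of reorddag.py should go into InDesign without passing through such a step.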

Bentzi Binder

Thank you for your help! It took me a little time to figure out how to run a Python script, but it works!
