Bash: How to Insert a Space Before and After English Words in Japanese Text

Submitted by admin on Wed, 09/28/2016 - 13:38

Suppose you want to insert a single space before and after each English word that appears in Japanese text. Run the following script after changing the file name 'index.rst' to your own. The script copies the original file to index.rst.org (the file name with .org appended) and then overwrites the original file with a version in which a space is inserted wherever an English word directly touches a Japanese character. This is useful for text in multi-byte languages such as CJK (Chinese, Japanese and Korean).
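# Assumes GNU sed and a UTF-8 locale. Change FILENAME to the file you want to process.
FILENAME='index.rst'
# Keep the original as ${FILENAME}.org and rewrite ${FILENAME} from it.
cp "${FILENAME}" "${FILENAME}.org"
# Convert full-width digits to ASCII digits, then insert a space before and after each English word.
sed 'y/０１２３４５６７８９/0123456789/' "${FILENAME}.org" | \
sed -E "s/([^ \t:\*0-9\.,<>\-~a-zA-Z()ー[:punct:]]|\" \w+ )([a-zA-Z])/\1 \2/g" | \
sed -E "s/([a-zA-Z])([^ \t:\*0-9\.,<>\-~a-zA-Z()ー[:punct:]]| \w+ \"$)/\1 \2/g" > "${FILENAME}"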
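To check the effect before touching a real file, the two spacing commands from the script can be wrapped in a shell function and fed a couple of sample lines. This is a minimal sketch, assuming GNU sed and a UTF-8 locale named C.UTF-8 (substitute e.g. en_US.UTF-8 if that is what your system provides); the function name add_spaces and the sample sentences are made up for illustration, and the digit-conversion step is left out.

```shell
#!/usr/bin/env bash
# Assumption: GNU sed and a UTF-8 locale called C.UTF-8 are available.
# Without a UTF-8 locale, sed works byte by byte and misses most Japanese characters.
export LC_ALL=C.UTF-8

# add_spaces (hypothetical name): reads text on stdin and writes it to stdout
# with a space inserted between an English letter and an adjacent character that
# is not whitespace, an ASCII letter or digit, punctuation, or 'ー'.
add_spaces() {
  # Space between such a character (typically Japanese) and a following English letter.
  sed -E "s/([^ \t:\*0-9\.,<>\-~a-zA-Z()ー[:punct:]]|\" \w+ )([a-zA-Z])/\1 \2/g" |
    # Space between an English letter and such a character that follows it.
    sed -E "s/([a-zA-Z])([^ \t:\*0-9\.,<>\-~a-zA-Z()ー[:punct:]]| \w+ \"$)/\1 \2/g"
}

printf '%s\n' 'これはLinuxの本です' '今日はGitHubでコードを読む' | add_spaces
```

In a UTF-8 locale the last command prints "これは Linux の本です" and "今日は GitHub でコードを読む". Running the commands as a filter on stdin like this makes it easy to try a few lines before overwriting a file.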

If you are processing an HTML document, extend the pipeline with two more sed commands that remove any whitespace right before <span> and right after </span>:
# Same pipeline as above, plus two commands that remove whitespace before <span> and after </span>.
FILENAME='index.html'
cp "${FILENAME}" "${FILENAME}.org"
sed 'y/０１２３４５６７８９/0123456789/' "${FILENAME}.org" | \
sed -E "s/([^ \t:\*0-9\.,<>\-~a-zA-Z()ー[:punct:]]|\" \w+ )([a-zA-Z])/\1 \2/g" | \
sed -E "s/([a-zA-Z])([^ \t:\*0-9\.,<>\-~a-zA-Z()ー[:punct:]]| \w+ \"$)/\1 \2/g" | \
sed -E 's/\s+(<span>)/\1/g' | \
sed -E 's/(<\/span>)\s+/\1/g' > "${FILENAME}"
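The two span-related commands can also be tried on their own. The sketch below, assuming GNU sed (which accepts \s in extended regular expressions), wraps them in a function with the made-up name strip_span_spaces and runs it on a made-up sample line.

```shell
#!/usr/bin/env bash
# Assumption: GNU sed, for the \s (whitespace) escape.

# strip_span_spaces (hypothetical name): reads stdin, writes stdout.
strip_span_spaces() {
  # Drop any run of whitespace directly before an opening <span> tag.
  sed -E 's/\s+(<span>)/\1/g' |
    # Drop any run of whitespace directly after a closing </span> tag.
    sed -E 's/(<\/span>)\s+/\1/g'
}

printf '%s\n' '日本語 <span>English</span> テキスト' | strip_span_spaces
```

This prints "日本語<span>English</span>テキスト". Because <span> is matched literally, a tag with attributes such as <span class="en"> is left as it is.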
