Wednesday 7 October 2009

Sed as a handy developer tool

According to EFL coding standards, static functions start with '_', so I had renamed ertf_stylesheet_add to _ertf_stylesheet_add. While going through the source looking for the style-string code, I noticed that the error messages did not reflect this change, so I decided to fix them. I wasn't in the mood for hunting down every occurrence of ertf_stylesheet_add and changing it by hand, so I decided to try sed.

Starting with the man page and experimenting a little, I wrote the following script:
sed -e s/ertf_stylesheet_add/_ertf_stylesheet_add/ ertf_stylesheet.c
Checking the output, I saw that it worked, and I could redirect the output to a file to save it. The only problem was that it also changed existing occurrences of _ertf_stylesheet_add to __ertf_stylesheet_add, which was not what I wanted. Digging deeper, I found that I could avoid this by specifying addresses, i.e. restricting the substitution to certain lines (by line number, in my case). I wasn't satisfied with this, because I would have had to go back and look up the line numbers to skip. So I kept looking and finally found my solution in regular expressions. I typed the following at the command line and it seemed to work just fine.
sed -e 's/[^_]ertf_stylesheet_add/_ertf_stylesheet_add/g' ertf_stylesheet.c
By now I had also learnt that adding the 'g' flag was safe. Again, though, I had missed an essential subtlety: [^_] matches the character just before the name, so that character gets replaced along with it. In particular, this deleted the opening double quote of the error messages, which looked like:
"ertf_stylesheet_add: ..."
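To see the subtlety concretely, here is a small check on two made-up sample lines (the printf call and the declaration are hypothetical, not taken from ertf_stylesheet.c). Because [^_] is part of the match, whatever character precedes the name is overwritten: the quote in the message, and also the space in the declaration.

```shell
#!/bin/sh
set -e
# Hypothetical input: an error message and a declaration.
input='printf("ertf_stylesheet_add: bad input");
static void ertf_stylesheet_add(void);'

# The first attempt: the character matched by [^_] is replaced too.
out=$(printf '%s\n' "$input" |
      sed -e 's/[^_]ertf_stylesheet_add/_ertf_stylesheet_add/g')
printf '%s\n' "$out"
```

The first line loses its double quote and the second becomes static void_ertf_stylesheet_add(void);, which no longer compiles.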

So, I edited my command as follows.

sed -e 's/\([^_]\)\(ertf_stylesheet_add\)/\1_ertf_stylesheet_add/g' ertf_stylesheet.c
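A quick check of the captured-group version on hypothetical sample lines shows the preceding character being put back through \1, and an already-renamed call being left alone. It also shows one gap: \([^_]\) needs some character before the name, so a name at the very start of a line (for example a function definition with the return type on the previous line) is not renamed. The second command is a variant that covers that case; it assumes GNU sed, whose basic regexps accept \| alternation and treat ^ after \( as an anchor.

```shell
#!/bin/sh
set -e
# Hypothetical input: a message, an already-renamed call, and a
# definition whose name starts the line.
input='printf("ertf_stylesheet_add: bad input");
_ertf_stylesheet_add(sheet, style);
ertf_stylesheet_add(Ertf_Stylesheet *sheet)'

# Captured-group version: the character before the name is restored via \1.
out=$(printf '%s\n' "$input" |
      sed -e 's/\([^_]\)\(ertf_stylesheet_add\)/\1_ertf_stylesheet_add/g')
printf '%s\n' "$out"

# GNU sed only (assumption): also match when the name begins the line.
out_gnu=$(printf '%s\n' "$input" |
      sed -e 's/\(^\|[^_]\)ertf_stylesheet_add/\1_ertf_stylesheet_add/g')
printf '%s\n' "$out_gnu"
```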

This fixed the double-quote problem. Now that things were rolling, I wanted to do more with sed. In the error messages, _ertf_stylesheet_add was usually followed by a colon, but a few typos had a semicolon instead of a colon. So I extended the sed line as follows.

sed -e 's/\([^_]\)\(ertf_stylesheet_add\)/\1_ertf_stylesheet_add/g' -e 's/;\(.\)/:\1/g' ertf_stylesheet.c

This did the job, but it also replaced

case ';':
with

case ':':

which was undesirable. What I really needed was a conditional replacement: a semicolon should become a colon only when it directly follows the function name. So I changed the line again.
sed -e 's/\([^_]\)\(ertf_stylesheet_add\)/\1_ertf_stylesheet_add/g' -e 's/stylesheet_add;/stylesheet_add:/g' ertf_stylesheet.c
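A small check on hypothetical sample lines shows why anchoring the second expression on the function name helps: the semicolon typo after the name becomes a colon, while a parser line such as case ';': is untouched because no function name precedes its semicolon. This sketch uses stylesheet_add: as the replacement so that the full function name is kept.

```shell
#!/bin/sh
set -e
# Hypothetical input: a message with the ';' typo and a case label.
input=$(cat <<'EOF'
printf("ertf_stylesheet_add; bad input");
    case ';':
EOF
)

# Rename first, then fix ';' only where it follows the function name.
out=$(printf '%s\n' "$input" |
      sed -e 's/\([^_]\)\(ertf_stylesheet_add\)/\1_ertf_stylesheet_add/g' \
          -e 's/stylesheet_add;/stylesheet_add:/g')
printf '%s\n' "$out"
```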

I used two s/// expressions because there seemed to be no other way. When I asked on ##sed on freenode, I was told that the semantics of s/// do not allow a conditional replacement. I brought up the idea of adding this to sed, but that would change sed's regexp flavour. In perl, however, it can be done easily in a single expression:

perl -pe 's/(ertf_stylesheet_add)(;?)/"$1".($2?":":"")/eg' ertf_stylesheet.c
Admittedly, doing this task with sed took longer than doing it by hand would have. However, since such tasks are common in code refactoring, I expected it to save a lot of time and effort in the future.

In fact, the method worked so well that the "future" arrived within seconds, and I was soon writing multi-line sed scripts to format the error messages across the whole lib folder. I quickly wrote the following lines, along with a few others.

sed -e 's/\([^_]\)\(ertf_font_add\)/\1_ertf_font_add/g' -e 's/font_add;/font_add:/g' ertf_font.c

sed -e 's/\([^_]\)\(ertf_color_add\)/\1_ertf_color_add/g' -e 's/colortbl:/ertf_color_table:/'  -e 's/color_add;/color_add:/g' ertf_color.c

sed -e 's/readloop:/ertf_document_parse:/g' ertf_document.c
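When the same edits have to be applied to many files, two sed features can replace the redirect-to-a-file step: -f reads the expressions from a script file, one per line, and -i edits the file in place. The sketch below uses the ertf_font.c expressions (keeping the full name in the replacement) on a hypothetical sample file created in a temporary directory. It assumes a sed with the common -i extension, which is not part of POSIX; -i.bak keeps the original as ertf_font.c.bak.

```shell
#!/bin/sh
set -e
# Hypothetical stand-in for a file from the lib folder.
dir=$(mktemp -d)
cat > "$dir/ertf_font.c" <<'EOF'
static void _ertf_font_add(void);
   printf("ertf_font_add; bad font");
EOF

# The per-file expressions, kept in a sed script file.
cat > "$dir/font.sed" <<'EOF'
s/\([^_]\)\(ertf_font_add\)/\1_ertf_font_add/g
s/font_add;/font_add:/g
EOF

# Edit in place, keeping a backup with the .bak suffix.
sed -i.bak -f "$dir/font.sed" "$dir/ertf_font.c"
cat "$dir/ertf_font.c"
# Remove "$dir" when done inspecting the result.
```

Keeping one small script file per source file makes the edits easy to rerun or review before touching the real sources.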
