Format Text at the Command Line






Combine basic Unix tools to become a formatting expert.

Don't let the syntax of the sed command scare you off. sed is a powerful utility capable of handling most of your formatting needs. For example, have you ever needed to add or remove comments from a source file? Perhaps you need to shuffle some text from one section to another.

In this hack, I'll demonstrate how to do that. I'll also show some handy formatting tricks using two other built-in Unix commands, tr and col.

1 Adding Comments to Source Code

sed allows you to specify an address range using a pattern, so let's put this to use. Suppose we want to comment out a block of text in a source file by adding // to the start of each line we wish to comment out. We might use a text editor to mark the block with bc-start and bc-end, each on a line of its own starting in the first column:

% cat source.c
  if (tTd(27, 1))
    sm_dprintf("%s (%s, %s) aliased to %s\n",
        a->q_paddr, a->q_host, a->q_user, p);
bc-start
  if (bitset(EF_VRFYONLY, e->e_flags))
  {
    a->q_state = QS_VERIFIED;
    return;
  }
bc-end
  message("aliased to %s", shortenstring(p, MAXSHORTSTR));

and then apply a sed script such as:

% sed '/bc-start/,/bc-end/s/^/\/\//' source.c

to get:

  if (tTd(27, 1))
    sm_dprintf("%s (%s, %s) aliased to %s\n",
        a->q_paddr, a->q_host, a->q_user, p);
//bc-start
//  if (bitset(EF_VRFYONLY, e->e_flags))
//  {
//    a->q_state = QS_VERIFIED;
//    return;
//  }
//bc-end
  message("aliased to %s", shortenstring(p, MAXSHORTSTR));

The script used search and replace to add // to the start of all lines (s/^/\/\//) that lie between the two markers (/bc-start/,/bc-end/). This will apply to every block in the file between the marker pairs. Note that in the sed script, the / character has to be escaped as \/ so it is not mistaken for a delimiter.
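If the escaped slashes are hard to read, one possible alternative is to pick a different delimiter for the s function, since sed accepts characters other than / there. The following is only a sketch: it assumes a POSIX shell (sh or bash, not tcsh) and uses a made-up five-line input in place of source.c.

```shell
# Hypothetical stand-in for source.c, fed to sed through a pipe.
commented=$(printf '%s\n' 'keep();' 'bc-start' 'old();' 'bc-end' 'keep_too();' |
  sed '/bc-start/,/bc-end/s|^|//|')   # | delimits s, so // needs no escaping

printf '%s\n' "$commented"
```

The address patterns still use / as their delimiter; only the s function changes.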

2 Removing Comments

When we need to delete the comments and the two bc- lines (let's assume that the edited contents were copied back to source.c), we can use a script such as:

% sed '/bc-start/d;/bc-end/d;/bc-start/,/bc-end/s/^\/\///' source.c

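Oops! My first attempt won't work. The d functions delete the marker lines before the range address ever sees them, so the range never starts. The bc- lines must be deleted after they have been used as addresses. Trying again, we get: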

% sed '/bc-start/,/bc-end/s/^\/\///;/bc-start/d;/bc-end/d' source.c
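To see the difference between the two orderings, here is a small round trip you could try. It is a sketch under a few assumptions: a POSIX shell rather than tcsh, markers that start in the first column, and a made-up five-line input instead of source.c.

```shell
# Hypothetical stand-in for source.c.
original=$(printf '%s\n' 'before();' 'bc-start' 'inside();' 'bc-end' 'after();')

# Comment out the marked block, markers included.
commented=$(printf '%s\n' "$original" | sed '/bc-start/,/bc-end/s/^/\/\//')

# First attempt: the markers are deleted before the range can start,
# so the line between them keeps its //.
broken=$(printf '%s\n' "$commented" |
  sed '/bc-start/d;/bc-end/d;/bc-start/,/bc-end/s/^\/\///')

# Second attempt: strip the // inside the range, then delete the markers.
restored=$(printf '%s\n' "$commented" |
  sed '/bc-start/,/bc-end/s/^\/\///;/bc-start/d;/bc-end/d')

printf '%s\n' "$broken" --- "$restored"
```

The first attempt leaves //inside(); commented, while the second returns the three original code lines without the markers.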

If you want to leave the two bc- marker lines in place, still commented out, use this piece of trickery (shown with tcsh quoting):

% sed '/bc-start/,/bc-end/{/^\/\/bc-/\!s/\/\///;}' source.c

to get:

  if (tTd(27, 1))
    sm_dprintf("%s (%s, %s) aliased to %s\n",
        a->q_paddr, a->q_host, a->q_user, p);
//bc-start
  if (bitset(EF_VRFYONLY, e->e_flags))
  {
    a->q_state = QS_VERIFIED;
    return;
  }
//bc-end
  message("aliased to %s", shortenstring(p, MAXSHORTSTR));

Note that in the bash shell you must use:

% sed '/bc-start/,/bc-end/{/^\/\/bc-/!s/\/\///;}' source.c

because, unlike in tcsh, the bang character (!) does not need to be escaped inside single quotes. bash would pass the backslash through to sed, which would reject it.

What's with the curly braces? They prevent a common mistake. You may imagine that this example:

% sed -n '/$USER/p;p' *

prints each line containing $USER twice because of the p;p commands. It doesn't, though, because the second p is not restricted by the /$USER/ line address and therefore applies to every line. To print twice just those lines containing $USER, use:

% sed -n '/$USER/p;/$USER/p' *

or:

% sed -n '/$USER/{p;p;}' *

The construct {...} introduces a function list that applies to the preceding line address or range.

A line address or range followed by ! (or \! in the tcsh shell) negates the address, so the function (or function list) that follows is applied to all lines that do not match. The net effect in the uncommenting script is to remove // from all lines that don't start with //bc- but that do lie within the bc- markers.
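The following sketch puts the three forms side by side so you can compare what each one prints. It assumes a POSIX shell (tcsh users would need \! in the last command) and a made-up three-line input.

```shell
# Hypothetical three-line sample; only the middle line matches /two/.
sample=$(printf '%s\n' one two three)

# Ungrouped: the second p has no address, so every line prints once
# and the matching line prints twice.
ungrouped=$(printf '%s\n' "$sample" | sed -n '/two/p;p')

# Grouped: both p functions are bound to the /two/ address.
grouped=$(printf '%s\n' "$sample" | sed -n '/two/{p;p;}')

# Negated: ! applies p to every line that does not match /two/.
negated=$(printf '%s\n' "$sample" | sed -n '/two/!p')

printf '%s\n' "$ungrouped" --- "$grouped" --- "$negated"
```

The ungrouped form prints four lines, the grouped form prints two twice, and the negated form prints one and three.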

3 Using the Hold Space to Mark Text

sed reads input into the pattern space, but it also provides a second buffer (called the hold space) and functions to move text from one space to the other. All other functions (such as s and d) operate on the pattern space, not the hold space.

Check out this sed script:

% cat case.script
# Sed script for case insensitive search
#
# copy pattern space to hold space to preserve it
h
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
# use a regular expression address to search for lines containing:
/test/ {
i\
vvvv
a\
^^^^
}
# restore the original pattern space from the hold space
x;p

First, I have written the script to a file instead of typing it in on the command line. Lines starting with # are comments and are ignored. Other lines specify a sed command, and commands are separated by either a newline or ; character. sed reads one line of input at a time and applies the whole script file to each line. The following functions are applied to each line as it is read:


h

Copies the pattern space (the line just read) into the hold space.


y/ABC/abc/

Operates on the pattern space, translating A to a, B to b, and C to c and so on, ensuring the line is all lowercase.


/test/ {...}

Matches the line just read if it includes the text test (whatever the original case, because the line is now all lowercase) and then applies the list of functions that follow. This example inserts text before (i\) and appends text after (a\) the matched line to highlight it.


x

Exchanges the pattern and hold space, thus restoring the original contents of the pattern space.


p

Prints the pattern space.

Here is the test file:

% cat case
This contains text         Hello
that we want to            TeSt
search for, but in         test
a case insensitive         XXXX
manner using the sed       TEST
editor.                    Bye bye.
%

Here are the results of running our sed script on it:

% sed -n -f case.script case
This contains text         Hello
vvvv
that we want to            TeSt
^^^^
vvvv
search for, but in         test
^^^^
a case insensitive         XXXX
vvvv
manner using the sed       TEST
^^^^
editor.                    Bye bye.

Notice the vvvv and ^^^^ markers around lines that contain test in any mix of upper- and lowercase.
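The same hold-space idea can be squeezed into a one-liner that prints only the matching lines, in their original case. This is a sketch rather than part of the script above; it assumes a POSIX shell and uses made-up input with one word per line.

```shell
# Made-up sample lines, one word per line.
matches=$(printf '%s\n' Hello TeSt test XXXX TEST |
  sed -n 'h;y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;/test/{x;p;}')
#   h      saves the original line in the hold space
#   y/../  lowercases the working copy in the pattern space
#   x;p    on a match, swaps the original back in and prints it

printf '%s\n' "$matches"
```

Because -n suppresses the automatic print, only TeSt, test, and TEST come out.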

4 Translating Case

The tr command can translate one character to another. To change the contents of case into all lowercase and write the results to file lower-case, we could use:

% tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' \
  < case > lower-case

tr works with standard input and output only, so to read and write files we must use redirection.
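Typing out both alphabets gets tedious, so you may prefer the POSIX character classes that tr understands. A minimal sketch, assuming a POSIX shell and two made-up input lines:

```shell
# [:upper:] and [:lower:] stand in for the two spelled-out alphabets.
lowered=$(printf '%s\n' 'TeSt' 'Bye bye.' | tr '[:upper:]' '[:lower:]')

printf '%s\n' "$lowered"
```

The quotes keep the shell from treating the square brackets as a filename pattern.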

5 Translating Characters

To translate carriage return characters into newline characters, we could use:

% tr \\r \\n < cr > lf

where cr is the original file and lf is a new file containing line feeds in place of carriage returns. \n represents a line feed character, but we must escape the backslash character in the shell, so we use \\n instead. Similarly, a carriage return is specified as \\r.
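Quoting the escapes is another way to protect the backslash from the shell. The sketch below assumes a POSIX shell and builds its carriage-return input with printf instead of reading a real cr file.

```shell
# printf expands \r itself, producing three CR-terminated words.
converted=$(printf 'one\rtwo\rthree\r' | tr '\r' '\n')

printf '%s\n' "$converted"
```

Each word now sits on its own line, because every carriage return has become a line feed.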

6 Removing Duplicate Line Feeds

tr can also squeeze multiple consecutive occurrences of a particular character into a single occurrence. For example, to remove duplicate line feeds from the lines file:

% tr -s \\n < lines > tmp ; mv tmp lines

Here we use the tmp file trick again because tr, like grep and sed, will trash the input file if it is also the output file: the shell truncates the output file before the command reads any input.
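A slightly more defensive variant of the tmp file trick replaces the original only if tr succeeded. This is a sketch, assuming a POSIX shell with the common mktemp utility; the scratch directory and the trdemo prefix are made up for the demonstration.

```shell
# Work in a scratch directory so the demonstration touches no real files.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/trdemo.XXXXXX")
printf 'one\n\n\ntwo\n\nthree\n' > "$workdir/lines"

# Write to a temporary file first; mv runs only if tr exits successfully.
tr -s '\n' < "$workdir/lines" > "$workdir/tmp" && mv "$workdir/tmp" "$workdir/lines"

squeezed=$(cat "$workdir/lines")
printf '%s\n' "$squeezed"
rm -r "$workdir"
```

Using && instead of ; means a failed tr leaves the original lines file untouched.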

7 Deleting Characters

tr can also delete selected characters. If, for instance, you hate vowels, run your documents through this:

% tr -d aeiou < file

8 Translating Tabs to Spaces

To translate tabs into multiple spaces, use col with the -x flag:

% cat tabs
col     col     col

% od -x tabs
0000000     636f    6c09    636f    6c09    636f    6c0a    0a00
0000015

% col -x < tabs > spaces
% cat spaces
col     col     col

% od -x spaces
0000000     636f    6c20    2020    2020    636f    6c20    2020    2020
0000020     636f    6c0a    0a00
0000025

In this example, I have used od -x to dump the contents of the before and after files in hexadecimal, which shows more clearly that the translation has worked. (09 is the code for Tab and 20 is the code for Space.)

9 See Also

  • man sed

  • man tr

  • man col

  • man od

