
SED (Stream Editor)

Using the sed utility

The sed utility performs basic text substitutions and other transformations on an input stream, which can come from a file, standard input, or a pipe. The text to act on is selected with regular expressions. New lines can be inserted before or after lines that match a pattern, lines that contain a pattern can be deleted, and a pattern can be searched for and replaced throughout the file (global search and replace). sed is in fact a programming language, but its syntax is quite archaic, and today it is mainly used for text manipulation. It is most often run at the command prompt with simple parameters; however, scripts that perform a sequence of transformations can also be written in sed. Note that, by default, the input file is not changed: sed writes the modified text to standard output, which can be redirected to a file if necessary.
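Inserting lines is the one transformation mentioned above that the examples below do not show. A minimal sketch, using a hypothetical file notes.txt, applies the i\ (insert before) and a\ (append after) commands in their portable multi-line form to lines matching a pattern, and then shows that the input file itself is left unchanged:

```shell
#!/bin/sh
# Hypothetical sample file created for this illustration
printf 'first\nsecond\n' > notes.txt

# i\ inserts a line before each line matching /second/;
# a\ appends a line after each line matching /first/
sed '/second/i\
before second
/first/a\
after first' notes.txt
# prints: first, after first, before second, second (one per line)

# The input file is unchanged: sed only wrote to standard output
cat notes.txt
# prints: first, second (one per line)
```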

    sed [options] "command1" [files]
    sed [options] -e "command1" [-e "command2" ...] [files]
    sed [options] -f sed_script [files]
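The three invocation forms can be compared side by side. This is a small sketch; the file names greeting.txt and fix.sed are hypothetical and created by the script itself:

```shell
#!/bin/sh
# Hypothetical input file for this illustration
printf 'hello world\n' > greeting.txt

# Form 1: a single command given directly as an argument
sed "s/hello/goodbye/" greeting.txt
# -> goodbye world

# Form 2: several commands, each introduced with -e, applied in order
sed -e "s/hello/goodbye/" -e "s/world/moon/" greeting.txt
# -> goodbye moon

# Form 3: commands read from a script file, one command per line
printf 's/hello/goodbye/\ns/world/moon/\n' > fix.sed
sed -f fix.sed greeting.txt
# -> goodbye moon
```

The -f form is convenient once a transformation grows beyond one or two commands, since the script file can be kept and reused.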

sed Examples

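  • Substitutes the first occurrence of sitex.com on each line with mysite.com in the file 'sampleconfig' and writes the output to a file named 'config'. The dot is escaped in the pattern because an unescaped . matches any character; in the replacement it is ordinary text and needs no backslash.
    [ LinuxUser ] ~$ sed "s/sitex\.com/mysite.com/" sampleconfig > config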
  • Substitutes every occurrence (the g flag) of ftp.sitex.com on each line with ftp.mysite.com in the file 'sampleconfig' and writes the output to a file named 'config'.
    [ LinuxUser ] ~$ sed "s/ftp\.sitex\.com/ftp.mysite.com/g" sampleconfig > config
  • Deletes every line that contains any character other than letters, digits, or spaces:
    [ LinuxUser ] ~$ sed "/[^0-9a-zA-Z ]/d" somefile > onlyAlphaNumeric
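  • Transliterates characters with the y command: each character in the first list is replaced by the character at the same position in the second list. Here C becomes c (a and t map to themselves), so every uppercase C in the file is lowercased, not only the one in 'Cat'. To replace just the word 'Cat', use s/Cat/cat/g instead.
    [ LinuxUser ] ~$ sed "y/Cat/cat/" filea > fileb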
  • Deletes lines 2 through 27 of mydata.txt and prints the remaining lines:
    [ LinuxUser ] ~$ sed '2,27d' mydata.txt
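  • Processes more than one command in a single invocation. The -e commands are applied in order to each line, so the second one sees the result of the first. Note that sed replaces text literally and does not adjust the article 'An':
    [ LinuxUser ] ~$ cat myfile.txt
    An apple is a fruit, so is an orange
    [ LinuxUser ] ~$ sed -e 's/apple/orange/g' -e 's/orange/pear/g' myfile.txt
    An pear is a fruit, so is an pear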
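The examples above read from and write to files. The sketch below runs the same kinds of commands on short, made-up input piped in with printf, so the effect of each one is visible without creating any files; the sample text is invented for illustration.

```shell
#!/bin/sh
# s without g: only the first match on each line is replaced
printf 'sitex.com and sitex.com\n' | sed 's/sitex\.com/mysite.com/'
# -> mysite.com and sitex.com

# s with g: every match on each line is replaced
printf 'sitex.com and sitex.com\n' | sed 's/sitex\.com/mysite.com/g'
# -> mysite.com and mysite.com

# y maps characters one to one (C->c, a->a, t->t) anywhere on the line
printf 'Cat Cow\n' | sed 'y/Cat/cat/'
# -> cat cow

# /regex/d deletes lines containing a character outside [0-9a-zA-Z ]
printf 'abc 123\nabc-123\n' | sed '/[^0-9a-zA-Z ]/d'
# -> abc 123

# 2,4d deletes lines 2 through 4 by line number
printf '1\n2\n3\n4\n5\n' | sed '2,4d'
# -> 1
#    5
```

The y example shows why y is not a word substitution: "Cow" is changed as well.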

A script using sed

The following simple script searches a file for any word from a list and prints the first matching word it finds. It uses $1 to refer to its first argument (the input file). If you save it as sedScript.sh, make it executable with 'chmod ugo+x sedScript.sh' and run it as follows:

    [ LinuxUser ] ~$ ./sedScript.sh inputFile.txt

The script follows. As always, it starts with a line (#!/bin/sh) that tells the system to run the script with the Bourne shell. Our list of words is apple, orange, grape, banana, and pear. The program quits as soon as it finds one of these words in the input file.

    #!/bin/sh

    # Words to search for, joined with \| (GNU sed alternation); \< and \> mark word boundaries
    List='\<apple\>\|\<orange\>\|\<grape\>\|\<banana\>\|\<pear\>'

    sed -e "
    /$List/!d
    /$List/{
            s/\($List\).*/\1/
            s/.*\($List\)/\1/
            q
            }" "$1"
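  1. The variable List holds a regular expression in which the words are joined by the alternation operator \|, so sed matches 'apple' or 'orange' or 'grape' or 'banana' or 'pear'. Each word is wrapped in \< and \>, which match the start and end of a word, so 'pear' does not match inside 'spear'. (\|, \< and \> are GNU sed extensions to basic regular expressions.)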
  2. The line '/$List/!d' deletes every line that does not contain the pattern $List: '/$List/d' alone would delete the lines that do contain it, and the ! negates the address. Deleting here only means the line is dropped from sed's output; the input file itself is not changed.
  3. For a line that does contain a listed word, the commands inside the braces run. The first substitution, s/\($List\).*/\1/, matches the leftmost listed word together with everything after it (.* is greedy and runs to the end of the line) and replaces that text with the word alone, so 'I like apple, orange and banana' becomes 'I like apple'. The second substitution, s/.*\($List\)/\1/, then removes the text before the word, leaving 'apple'.
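  4. The escaped parentheses \( and \) form a group that saves the matched word, and \1 in the replacement refers back to it, so each s command replaces the text it matched with the word alone. (The braces { } in the script do something different: they group the commands that run only on matching lines.)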
  5. The q command prints the pattern space, which now holds just the found word, and quits, so sed stops at the first matching line in the file.
  6. Two substitutions are used because the word may occur at the beginning, middle, or end of the line. Shortening them to a single s/.*\($List\).*/\1/ does not work when a line contains more than one listed word: the leading greedy .* swallows as much of the line as it can, so \1 captures the last listed word rather than the first (for 'orange pear banana' it would return 'banana').