Bitgolia


Awk Introduction

June 03, 2023.






Many of us know awk only as a command line tool for quickly editing output from a previous command.

Few are aware that it is a fully fledged programming language. The fact that awk comes already installed on most Linux and Unix systems should be more than enough to highlight the importance of knowing its capabilities.

A system administrator benefits from being able to quickly put together an awk script that formats input into a report or performs calculations on a file's contents.

In this article, we give a brief introduction to the awk programming language and walk through an example use case.








The anatomy of an awk program

Suppose we have a file prog.awk that contains an awk program. An awk program is composed of the following sections:

    
#!/usr/bin/awk -f

# Here we can define functions
function helloWorld(){
    print "Hello world!"
}
BEGIN{
    # This code is executed only at the start of the program
    myvar=0
}
/run/{
    # This code is executed on whatever matches the regex /run/
    helloWorld()
    print "My var is " myvar
}
END{
    # This code is executed only at the end of the program
    print "Goodbye cruel world!"
}
  



Running awk interactively

We can run this program as follows. The pattern /run/ matches any input line that contains the string run, so helloWorld and print are called only for those lines. To exit, press Ctrl+C at any time.

    
msoto$ awk -f prog.awk
asdf
hello
ran
ron

run
Hello world!
My var is 0

I run
Hello world!
My var is 0

Program, run!
Hello world!
My var is 0
^C
msoto$
  

But wait, you may ask, why did the code inside the END block not run after I pressed Ctrl+C?

Ctrl+C sends an interrupt signal called SIGINT, which terminates awk immediately, whereas the code inside the END block runs only after the end of input.
So we need to signal end-of-file (EOF) by pressing Ctrl+D on an empty line in order for this code to run:

    
msoto$ awk -f prog.awk
asdf
runn
Hello world!
My var is 0
^D
Goodbye cruel world!
msoto$
  

You may not see the ^D characters printed; I added them to show where I pressed the keys. Now we see the last piece of code running.
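
When the input comes from a pipe or a file instead of the keyboard, end of input arrives on its own and the END block runs without any key presses. A minimal sketch, assuming the same logic as prog.awk written inline (the three input lines are made up for illustration):

```shell
# Inline version of the prog.awk logic; the input lines are illustrative only.
out=$(printf 'asdf\nI run\nron\n' | awk '
function helloWorld() { print "Hello world!" }
BEGIN { myvar = 0 }
/run/ { helloWorld(); print "My var is " myvar }
END   { print "Goodbye cruel world!" }
')
printf '%s\n' "$out"
```

Only the line containing run triggers the middle block, and the goodbye message appears once the pipe is exhausted.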



Running awk on the command line

Not all of the blocks are needed. In fact, the BEGIN and END blocks are completely optional and can be omitted. The regular expression can also be omitted, in which case the action runs on every input line, so we can have a much simpler program that fits on a single line.

For example: {print "Word counter: " NF} is a valid awk program.

And you do not even need to save it in a file. From the command line simply run:

    
msoto$ awk '{print "Word counter: " NF}'
hello my name is moises
Word counter: 5
one twoo
Word counter: 2
goodbye
Word counter: 1
^C
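
The opposite shortcut also exists: a pattern with no action prints every matching line, and a program made up only of a BEGIN block runs without reading any input. A small sketch with made-up input:

```shell
# Pattern without an action: awk prints each line that matches /run/.
matches=$(printf 'ran\nI run\nron\nProgram, run!\n' | awk '/run/')
printf '%s\n' "$matches"

# BEGIN-only program: no input is read, so no Ctrl+C or Ctrl+D is needed.
sum=$(awk 'BEGIN { print 2 + 3 }')
printf '%s\n' "$sum"
```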
  



Built-In Variables

In the last example we used NF, a built-in variable that holds the number of fields in the current line. By default, the field separator, given by the built-in variable FS, is a single space, which awk treats as "split on any run of spaces or tabs".

The last program used this to count words. There are several built-in variables:

    
Variable   Description                                  Default
ARGC       Number of command-line arguments             (none)
ARGV       Array of command-line arguments              (none)
FILENAME   Name of current input file                   (none)
FNR        Record (line) number in current file         (none)
FS         Input field separator                        " "
NF         Number of fields in current record           (none)
NR         Number of records (lines) read so far        (none)
OFMT       Output format for numbers                    "%.6g"
OFS        Output field separator                       " "
ORS        Output record (line) separator               "\n"
RLENGTH    Length of string matched by match function   (none)
RS         Input record (line) separator                "\n"
RSTART     Start of string matched by match function    (none)
SUBSEP     Subscript separator                          "\034"
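
To see a few of these variables working together, here is a small sketch. The colon-separated input is invented for illustration; FS and OFS are set in BEGIN so they apply from the first line onward.

```shell
# Illustrative passwd-style input; FS splits on ":" and OFS joins the printed fields.
out=$(printf 'root:x:0:0\ndaemon:x:1:1\n' | awk '
BEGIN { FS = ":"; OFS = " | " }
{ print NR, $1, NF }
END { print "records read: " NR }
')
printf '%s\n' "$out"
```

Each output line shows the record number, the first field, and the field count; in the END block, NR still holds the total number of records read.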
  



Arithmetic functions

Awk is also good for simple mathematics. The built-in arithmetic functions are listed below; for more detailed information, read the awk manual in the terminal with man awk.

    
Function      Description
atan2(y,x)    Return the arctangent of y/x in radians
cos(x)        Return the cosine of x, where x is in radians
exp(x)        Return the exponential of x
int(x)        Return x truncated to an integer value
log(x)        Return the natural logarithm of x
rand()        Return a random number n such that 0<=n<1
sin(x)        Return the sine of x, where x is in radians
sqrt(x)       Return the square root of x
srand(expr)   Set the seed for rand to expr, returning the previous seed
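
As a quick illustration of calculating from input, the sketch below accumulates values line by line and reports the results in the END block. The two input numbers are made up.

```shell
# Illustrative input: one number per line.
out=$(printf '3\n4\n' | awk '
{ sum += $1; sumsq += $1 * $1 }
END {
    print "sum: " sum
    print "mean: " sum / NR
    print "root of sum of squares: " sqrt(sumsq)
    print "truncated: " int(7.9)
}
')
printf '%s\n' "$out"
```

Variables such as sum start out empty and behave as 0 in arithmetic, so no BEGIN block is needed to initialise them.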
  



String functions

These are only a few of the built-in string functions in awk. Read the manual for the full list and more detailed descriptions.

    
Function        Description
gsub(r,s)       Substitute s for every match of r in $0
gsub(r,s,t)     Substitute s for every match of r in t
index(s,t)      Return first position of string t in s, or 0 if absent
length(s)       Return number of characters in string s
match(s,r)      Return position where regex r matches in s, or 0; sets RSTART and RLENGTH
split(s,a)      Split s into an array a using FS
split(s,a,fs)   Split s into an array a using fs as field separator
sub(r,s)        Substitute s for the leftmost longest match of r in $0
sub(r,s,t)      Substitute s for the leftmost longest match of r in t
substr(s,p)     Return suffix of s starting at p
substr(s,p,n)   Return substring of s of length n starting at p
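
The following sketch combines several of these functions on a single made-up input line. The array name kv and the host and port values are hypothetical.

```shell
# Illustrative input line with two key=value fields.
out=$(printf 'host=web01.example.com port=8080\n' | awk '
{
    n = split($1, kv, "=")            # kv[1] = "host", kv[2] = "web01.example.com"
    print "parts: " n
    print "name: " substr(kv[2], 1, index(kv[2], ".") - 1)
    print "length: " length(kv[2])
    if (match($2, /[0-9]+/))          # sets RSTART and RLENGTH on success
        print "port: " substr($2, RSTART, RLENGTH)
    gsub(/=/, ": ")                   # replace every "=" in $0
    print
}
')
printf '%s\n' "$out"
```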
  

Other built-in functions

Awk includes I/O functions as well, and some implementations, such as GNU awk, add time and bit-operation functions. For more information on these, check the awk manual page in your terminal.



Running awk using a file as input

Suppose we are working with a file addrs.txt containing IP address information. Some entries are written in CIDR notation, while others give the range explicitly. The formatting also varies, with more or fewer spaces on certain lines:

    
192.168.1.64/24
    172.10.1.100/14
192.168.1.64   - 192.168.1.127  
 192.168.0.1/18  
172.8.0.0      - 172.8.1.255
172.8.1.127/23
10.0.0.0/8
 10.0.0.0 -  10.255.255.255
10.0.0.0  - 10.127.255.255
190.1.0.0-190.64.255.255
  

We will use a small awk program that converts CIDR notation to a range and, if given a range, calculates the CIDR notation. You can find the source code here: cidrtr.awk

    
msoto$ awk -f cidrtr.awk addrs.txt
CIDR               Range
192.168.1.64/24    192.168.1.0 - 192.168.1.255
172.10.1.100/14    172.8.0.0 - 172.11.255.255
192.168.1.64/26    192.168.1.64 - 192.168.1.127
192.168.0.1/18     192.168.0.0 - 192.168.63.255
172.8.0.0/23       172.8.0.0 - 172.8.1.255
172.8.1.127/23     172.8.0.0 - 172.8.1.255
10.0.0.0/8         10.0.0.0 - 10.255.255.255
10.0.0.0/8         10.0.0.0 - 10.255.255.255
10.0.0.0/9         10.0.0.0 - 10.127.255.255
190.1.0.0/10       190.1.0.0 - 190.64.255.255
msoto$
  

That's much better.
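
To give a feel for what such a program involves, here is a minimal sketch of one direction only, CIDR to range. It is not the cidrtr.awk source, just an illustration under the assumption that every input line is a well-formed CIDR entry; the function name dotted and the variable names are hypothetical. It uses plain arithmetic instead of bit operations so that it stays within standard awk.

```shell
# Sketch only: converts CIDR notation to a range, with no input validation.
out=$(printf '192.168.1.64/24\n172.10.1.100/14\n10.0.0.0/8\n' | awk '
# Convert a 32-bit number back to dotted-quad notation.
function dotted(n) {
    return sprintf("%d.%d.%d.%d", int(n / 16777216), int(n / 65536) % 256, int(n / 256) % 256, n % 256)
}
{
    split($1, part, "/")               # part[1] = address, part[2] = prefix length
    split(part[1], o, ".")             # o[1]..o[4] = the four octets
    addr  = ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    size  = 2 ^ (32 - part[2])         # number of addresses in the block
    first = addr - addr % size         # clear the host part of the address
    print $1, dotted(first), "-", dotted(first + size - 1)
}
')
printf '%s\n' "$out"
```

Because the program reads $1 rather than the raw line, leading and trailing spaces in the input are ignored without extra work.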



Redirecting output

Having this printed to the terminal may not be as useful as having it in a file. If we want, we can redirect the output to a new file:

    
msoto$ awk -f cidrtr.awk addrs.txt 1>addrs_nice.txt
msoto$ cat addrs_nice.txt
... 
  

NB: You cannot redirect output to the same file you are using as input. The shell truncates the output file before awk starts reading it, so trying to do so will actually empty the file.
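
If the goal really is to overwrite the input, a common workaround is to write to a temporary file and rename it afterwards. The sketch below assumes mktemp is available and uses a scratch directory, a throwaway addrs.txt, and a trivial awk program purely for illustration.

```shell
# Work in a scratch directory so the example touches no real files.
dir=$(mktemp -d)
printf '  10.0.0.0/8  \n 192.168.0.1/18\n' > "$dir/addrs.txt"

# Write to a temporary file first, then replace the original only if awk succeeded.
awk '{ print $1 }' "$dir/addrs.txt" > "$dir/addrs.tmp" && mv "$dir/addrs.tmp" "$dir/addrs.txt"

result=$(cat "$dir/addrs.txt")
printf '%s\n' "$result"
rm -r "$dir"
```

The && ensures the original file is replaced only when awk exits successfully.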

Closing thoughts

While most of what awk does can probably be achieved with a combination of other utilities such as cut, tr, and col taped together in a shell script, the awk programming language is useful for creating reports using a single tool.

Moreover, for more complex tasks, an awk script may be more readable than a skillful combination of those utilities in a shell script.





Tags: awk unix linux
