• Awk is used to process text files/spreadsheets/flat files.
  • It is a Linux command-line program that reads a file and processes it, for example to output cleaned-up data.

Basics

  • The simplest awk program just prints out a file.
    • Note, replace <name of file> with the file you are processing.
awk '{print}' <name of file>
  • Each 'word' on a line is a field. By default, fields are separated by runs of whitespace (spaces or tabs); this can be changed.

    • To access each field, you use the dollar sign and the field number (from left to right).
    • If you use 0 ($0), you access the whole line.
  • Therefore the program above is the same as the following.

awk '{print $0}' <name of file>
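  • As a small sketch of field access, the following pipes two made-up lines into awk instead of reading a file. The sample text and the variable names are hypothetical, chosen only for illustration.

```shell
# Hypothetical sample input: two lines, three space-separated fields each.
sample='alpha beta gamma
one two three'

# $2 is the second field of each line.
second=$(printf '%s\n' "$sample" | awk '{print $2}')
printf '%s\n' "$second"

# $0 is the entire line, so this reproduces the input unchanged.
whole=$(printf '%s\n' "$sample" | awk '{print $0}')
printf '%s\n' "$whole"
```

  • The first command prints beta and two; the second prints both input lines as they were.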
  • Inside the curly brackets you define your program (the action). Outside them, before the opening bracket, you can add a pattern that decides which lines the action runs on.
    • For example, in /<pattern>/ the forward slashes denote a regular expression, so the action is only fed the lines that match it.
    • Note, the regular expression is tested against the whole line, and the test happens before the action runs.
awk '/etc/ {print $0}' <name of file>
  • To print multiple fields with a space between them, separate the fields with commas in the print statement.
    • If you don’t want a space between fields, leave out the commas.
# with space between outputted fields
awk '/etc/ {print $1, $2}' <name of file>

# Without spaces between output
awk '/etc/ {print $1 $2}' <name of file>
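  • To make the difference concrete, here is a minimal sketch run against a single made-up line (the input foo bar is hypothetical). The comma inserts awk's output field separator, which is a space by default. Without the comma, the two fields are simply concatenated.

```shell
# Hypothetical one-line input with two fields.
line='foo bar'

# Comma between fields: output is "foo bar".
with_comma=$(printf '%s\n' "$line" | awk '{print $1, $2}')
printf '%s\n' "$with_comma"

# No comma: the fields are joined directly, output is "foobar".
without=$(printf '%s\n' "$line" | awk '{print $1 $2}')
printf '%s\n' "$without"
```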
  • awk converts fields to numbers when you use them in arithmetic, so if a field is numeric, you can do mathematical operations against it.
    • Things such as division, multiplication, addition, etc.
    • If the field is not numeric, it is treated as 0, so the operation still runs, but against 0.
    • If the field starts with a number and has trailing characters, only the leading number is used. For example, 1042k is treated as 1042.
# if you know that a field is numeric
awk '{print $1/1024}' <name of file>
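  • The conversion rules above can be checked with a short sketch. The three input values are made up for illustration, and adding 0 is used here only as a way to force each field to be read as a number.

```shell
# Hypothetical input: a plain number, a number with a trailing letter,
# and a field with no leading number at all.
results=$(printf '2048\n1042k\nabc\n' | awk '{print $1 + 0}')
printf '%s\n' "$results"
```

  • This prints 2048, 1042 and 0, one per line: the trailing k is dropped, and the non-numeric field counts as 0.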
  • To append text to a field, simply add quotes plus the text " something"
awk '{print $1 " something"}' <name of file>
  • Before the program, you can also use conditions against fields.
  • You can chain multiple conditions and regular expressions using the && operator.
  • Note, if a field contains text as well as a number (for example 1042k), a check such as $1 > 100 compares it as a string rather than as a number, so lines can match unexpectedly. Multiply the field by one (or add zero) to strip the text and compare only the number, for example $1 * 1 > 100.
# Regular expression to filter, and an amount check
awk '/etc/ && $1 > 100 {print $0}' <name of file>

# Chaining two regular expressions
awk '/etc/ && /linux/ {print $0}' <name of file>
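  • The following sketch shows why a field with trailing text can slip through a greater-than check. The two input lines are invented for the example. In the first command, 99k is not a pure number, so it is compared with 100 as text, and "9" sorts after "1". In the second, multiplying by one forces a numeric comparison.

```shell
# Hypothetical input: sizes with a trailing "k".
sample='99k small
2048k large'

# String comparison: both lines match, even though 99 is less than 100.
string_cmp=$(printf '%s\n' "$sample" | awk '$1 > 100 {print $0}')
printf '%s\n' "$string_cmp"

# Numeric comparison: only the 2048k line matches.
numeric_cmp=$(printf '%s\n' "$sample" | awk '$1 * 1 > 100 {print $0}')
printf '%s\n' "$numeric_cmp"
```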

Usage

  • Start with awk.
  • Define your program using '{program}'.
  • Point to your file.

Scripting

  • When the awk program starts to get a bit verbose, you can write a script and pass that to awk.
  • Create a file called <something>.awk
  • A one-line awk script is the same as the one-liner above, but without the awk command name, the single quotes, and the name of the file you are processing.
/etc/ && /linux/ {print $0}
  • Then to run the script against a file, use the following.
awk -f <scriptName> <name of file>
  • A script can contain multiple lines (rules), each doing a different action. For every line of input, awk tries each rule in order, from top to bottom, so the output of the rules is interleaved line by line.
/etc/ {print $0}
/etc/ && $5 > 100 {print $0, $2}
  • When you run the above, the first rule prints every line matching the regular expression /etc/. For each input line, once the first rule has been tried, the second rule is tried, and it prints the line followed by its second field if the line matches the regular expression and its fifth field is greater than 100. A line that satisfies both rules is therefore printed twice in a row.
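  • Here is a self-contained sketch of a two-rule script, to show the order in which output appears. The file names rules.awk and input.txt, the sample lines, and the rule1:/rule2: labels are all made up for the example, and the size check uses the second field because the sample lines only have two fields.

```shell
# Work in a throwaway directory so nothing is left behind.
tmpdir=$(mktemp -d)

# A hypothetical two-rule script.
cat > "$tmpdir/rules.awk" <<'EOF'
/etc/ {print "rule1:", $0}
/etc/ && $2 > 100 {print "rule2:", $0}
EOF

# Hypothetical input: a path and a size on each line.
printf '/etc/passwd 200\n/etc/hosts 50\n/usr/bin 300\n' > "$tmpdir/input.txt"

out=$(awk -f "$tmpdir/rules.awk" "$tmpdir/input.txt")
printf '%s\n' "$out"

rm -rf "$tmpdir"
```

  • The output is rule1: /etc/passwd 200, then rule2: /etc/passwd 200, then rule1: /etc/hosts 50. Both rules are tried on the first line before awk moves on to the second line.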

More advanced features of awk

  • There are built-in functions in awk.
  • For example, if you want to convert a field into an integer, you can use the int() function. Note that int() truncates toward zero (it drops the decimal places) rather than rounding, so int(3.7) is 3.
# Truncates the fifth field to an integer
/etc/ {print $0, int($5)}
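  • A quick sketch of what int() does with decimal places, using made-up values. Adding 0.5 before truncating is one common way to round to the nearest integer, shown here as an assumption that the values are not negative.

```shell
# int() drops the decimal places: 3.7 becomes 3 and -3.7 becomes -3.
truncated=$(printf '3.7\n-3.7\n' | awk '{print int($1)}')
printf '%s\n' "$truncated"

# Rounding a non-negative value to the nearest integer: 3.7 becomes 4.
rounded=$(printf '3.7\n' | awk '{print int($1 + 0.5)}')
printf '%s\n' "$rounded"
```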
  • For a list of built-in functions, refer to the awk documentation on built-in functions.

  • User defined functions

    • Functions can be written to handle more specific use cases, such as generating a random integer (a very basic example).
    • You define a function with the keyword function. Some implementations, such as gawk, also accept func, but function is the portable spelling.
function randomInt(n)
{
    # The rand built in function generates a number from 0 to 1, inclusive of
    # 0, but never 1.
    return int(n * rand())
}
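  • A minimal sketch of calling randomInt from the command line. It relies on two things these notes do not cover: a BEGIN block, which runs once before any input is read, and srand(), which seeds the generator. Without srand(), rand() usually produces the same sequence on every run.

```shell
# Print one random integer from 0 to 9.
value=$(awk '
function randomInt(n)
{
    return int(n * rand())
}
BEGIN {
    srand()              # seed from the time of day
    print randomInt(10)
}')
printf '%s\n' "$value"
```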
  • awk has general programming language elements and is Turing complete. As a brief example, you can write a for loop.
    • Note, the syntax is similar to C.
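# The extra parameter i acts as a local variable; without it, i would be global.
function printList(n,    i)
{
    # Print the integers 1 through n, separated by spaces.
    for (i = 1; i <= n; i++)
    {
        printf("%d ", i)
    }
}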
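  • To try the loop without an input file, the function can be called from a BEGIN block, which runs before any input is read (an awk feature not covered in these notes). This is only a sketch. The extra parameter i is the usual awk idiom for keeping the loop counter local to the function.

```shell
# Print the numbers 1 to 5 on one line.
list=$(awk '
function printList(n,    i)
{
    for (i = 1; i <= n; i++)
    {
        printf("%d ", i)
    }
}
BEGIN {
    printList(5)
    printf("\n")
}')
printf '%s\n' "$list"
```

  • The output is 1 2 3 4 5, with a trailing space after the last number.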

Separators

  • By default, awk separates fields on runs of whitespace (spaces or tabs).
  • To use a different separator, such as a comma, pass it with the -F option. The -f option is still needed to name the script:
awk -F"," -f <script name> <file name>
  • To work with an xls file, convert it to CSV first, then process the CSV with a comma separator as above.
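  • As a sketch of processing comma-separated data, the following swaps the two columns of a small made-up CSV. The column names and values are hypothetical. Note that a plain -F"," splits on every comma, so it does not cope with quoted fields that contain commas themselves.

```shell
# Hypothetical CSV with a header row and two data rows.
csv='name,size
hosts,50
passwd,200'

# Split on commas and print the second column before the first.
swapped=$(printf '%s\n' "$csv" | awk -F"," '{print $2, $1}')
printf '%s\n' "$swapped"
```

  • This prints size name, 50 hosts and 200 passwd, one per line.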