• Awk is used to process text files, spreadsheets, and flat files.
• It is a Linux command-line program that reads a file and processes it to output cleaned-up data, or whatever else you might need.

Basics

• The simplest awk program just prints out a file.
• Note: substitute <name of file> with the file you are processing.
awk '{print}' <name of file>

• Each ‘thing’ on a line is a field. By default, fields are separated by whitespace (runs of spaces or tabs); this can be changed.

• To access each field, you use the dollar sign and the field number (from left to right).
• If you use 0, you access the whole line.
• Therefore the above program is the same as the following.

awk '{print $0}' <name of file>

• Inside the curly brackets you define your program, but outside, you can do things such as pattern matching.
• For example, in /<pattern>/ the forward slashes denote a regular expression, and you can filter on it so that your program is only fed the lines that match.
• Note, the regular expression is applied against the whole line. This is a process that happens before your program runs.
awk '/etc/ {print $0}' <name of file>

• If you print multiple fields, separating them with a comma in the print puts a space between them in the output.
• If you don’t want a space between fields, don’t comma separate them.
# With a space between output fields
awk '/etc/ {print $1,$2}' <name of file>

# Without spaces between output
awk '/etc/ {print $1$2}' <name of file>
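
• As a rough sketch of the two print styles above, the following runs both against a small made-up file. The sample lines and the variable names (sample, with_space, no_space) are assumptions for illustration only.

```shell
# Hypothetical sample data: permissions, size, path
sample=$(mktemp)
printf '%s\n' 'rw 2048 /etc/hosts' 'rw 512 /home/user/notes' > "$sample"

# Comma in print: the fields are joined with a space
with_space=$(awk '/etc/ {print $1,$3}' "$sample")

# No comma: the fields are concatenated directly
no_space=$(awk '/etc/ {print $1$3}' "$sample")

printf '%s\n' "$with_space" "$no_space"
rm -f "$sample"
```

• Only the first sample line contains etc, so each command prints one line.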

• awk parses numeric fields for you, so if a field is numeric, you can do mathematical operations against it.
• Things such as division, multiplication, addition, etc.
• If the field is not numeric, it is treated as 0, so the operation is carried out against 0.
• If there is a trailing character, the leading number is used: for example, 1042k parses to 1042 and the operation completes.
# If you know that a field is numeric
awk '{print $1/1024}' <name of file>

• To append text to a field, simply add quotes plus the text " something"
awk '{print $1 " something"}' <name of file>
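
• To illustrate how fields are turned into numbers, here is a minimal sketch that doubles three made-up values and appends text to one of them. The input values and the variable names (doubled, labelled) are assumptions, not part of the notes above.

```shell
# Hypothetical input: a plain number, a number with a trailing unit, plain text
doubled=$(printf '%s\n' '2048' '1042k' 'abc' | awk '{print $1, $1*2}')

# Appending text: a quoted string placed next to a field is concatenated to it
labelled=$(printf '%s\n' '2048' | awk '{print $1 " bytes"}')

printf '%s\n' "$doubled" "$labelled"
```

• The value 1042k is doubled as 1042, while abc counts as 0.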

• Before the program, you can also make use of conditions against fields.
• You can also chain multiple conditions/regular expressions using the logical AND operator &&.
• Note, if the field contains text as well as a number (1042k, for example), it is compared as a string, so it might pass a greater-than check when it should not. You can multiply the field by one to strip off trailing text and compare only the number; a field with preceding text evaluates to 0.
# Regular expression to filter, and an amount check
awk '/etc/ && $1 > 100 {print$0}' <name of file>
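
• The note about text inside a numeric field can be sketched like this. The input values and the variable names (input, plain, forced) are made up for the example.

```shell
# Hypothetical input: two plain numbers and one value with a trailing unit
input=$(printf '%s\n' '250' '9k' '50')

# Plain check: 9k is not numeric, so it is compared as a string and slips through
plain=$(printf '%s\n' "$input" | awk '$1 > 100 {print $1}')

# Multiplying by one forces a numeric comparison (9k becomes 9)
forced=$(printf '%s\n' "$input" | awk '$1*1 > 100 {print $1}')

printf 'plain:\n%s\n' "$plain"
printf 'forced:\n%s\n' "$forced"
```

• The plain check lets 9k through because the text "9k" sorts after the text "100"; the forced check keeps only 250.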

# Chaining two regular expressions
awk '/etc/ && /linux/ {print $0}' <name of file>

Usage

• Start with awk.
• Define your program using '{program}'.
• Point to your file.

Scripting

• When the awk program starts to get a bit verbose, you can write a script and pass that to awk.
• Create a file called <something>.awk
• A one-line awk script is the same as the command above, but omitting the awk program name, the single quotes, and the name of the file you’re processing.
/etc/ && /linux/ {print $0}

• Then to run the script against a file, use the following.
awk -f <scriptName> <name of file>

• When you start using scripting, you can add multiple lines to do different actions. For each line of input, awk tries each action in the order written, so the results are interleaved line by line.
/etc/ {print $0}
/etc/ && $5 > 100 {print $0,$2}

• When you run the above, each input line is checked against the first rule, which prints lines matching the regular expression /etc/. The same input line is then checked against the second rule, which prints lines that match the regular expression and where the fifth field is greater than 100, before awk moves on to the next input line.
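
• To see the order in which a multi-line script produces output, the sketch below writes a two-rule script and a two-line sample file, then runs one against the other. The file names (rules.awk, sample.txt), the sample data, and the "rule 1:" / "rule 2:" labels are assumptions added for illustration.

```shell
workdir=$(mktemp -d)

# Hypothetical sample data: six fields, with a number in the fifth
printf '%s\n' 'x y z w 150 /etc/one' 'x y z w 20 /etc/two' > "$workdir/sample.txt"

# Two rules, one per line, labelled so the output shows which rule fired
cat > "$workdir/rules.awk" <<'EOF'
/etc/ {print "rule 1:", $6}
/etc/ && $5 > 100 {print "rule 2:", $6, $2}
EOF

output=$(awk -f "$workdir/rules.awk" "$workdir/sample.txt")
printf '%s\n' "$output"
rm -rf "$workdir"
```

• The first input line triggers both rules before the second input line is read, so the output order is rule 1, rule 2, rule 1.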

• There are built in functions in awk.
• For example, if you wanted to convert a field into an int, say to remove decimal places, you could use the int() function. Note that int() truncates rather than rounds.
# Would convert the fifth field to an integer
/etc/ {print $0, int($5)}
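
• A small sketch of what int() does to decimal values. It uses a BEGIN block, which is not covered in these notes: it runs before any input is read, so no file is needed. Adding 0.5 before truncating is one common way to round positive numbers, shown here as an assumption about what you might want.

```shell
# int() drops the decimal part (truncates toward zero)
truncated=$(awk 'BEGIN {print int(3.9), int(-3.9), int(3.9 + 0.5)}')
printf '%s\n' "$truncated"
# prints: 3 -3 4
```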

• For a list of built-in functions, refer to the awk documentation on built-in functions.

User defined functions

• Functions can be written to handle more distinct use cases, such as generating a random number (a very basic example).
• You can define a function with the keyword function (some implementations, such as gawk, also accept func).
function randomInt(n)
{
    # The rand built-in function generates a number from 0 to 1, inclusive of
    # 0, but never 1.
    return int(n * rand())
}
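
• One way to try the function out is to call it from a BEGIN block, which runs before any input is read. This is only a sketch: the srand() call, the loop, and the variable name rolls are additions for the example. In some implementations (gawk, for example) rand() returns the same sequence on every run unless srand() is called first.

```shell
rolls=$(awk '
function randomInt(n)
{
    return int(n * rand())
}
BEGIN {
    srand()  # seed from the current time of day
    for (i = 0; i < 5; i++)
        printf("%d ", randomInt(6))
    print ""
}')
printf '%s\n' "$rolls"
```

• Each of the five numbers printed is a whole number from 0 to 5.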

• There are programming language elements, and awk is Turing complete. As a brief example, you can write a for loop.
• Note, the syntax is similar to C.
function printList(n)
{
    for (i=1; i <= n; i++)
    {
        printf("%d ", i)
    }
}
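
• A sketch of calling the loop function from a BEGIN block. The extra parameter i is a common awk idiom, added here as an assumption: awk variables are global by default, and listing i as an unused parameter keeps it local to the function.

```shell
list=$(awk '
function printList(n,    i)
{
    for (i = 1; i <= n; i++)
    {
        printf("%d ", i)
    }
}
BEGIN {
    printList(5)
    print ""
}')
printf '%s\n' "$list"
```

• This prints the numbers 1 to 5 on one line, each followed by a space.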


Separators

• By default, awk splits fields on whitespace.
• To use a different separator, such as a comma, pass it with the -F option. To run an awk script against a file with that separator, you can use the following:
awk -F"," -f <script name> <file name>

• To work with an xls file, you could convert it to csv and process it like that.
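
• As a minimal sketch of processing a CSV export, the following filters a small made-up file with a comma separator. The file names (data.csv, filter.awk), the column layout, and the use of the built-in NR (the current line number, used here to skip the header row) are assumptions for the example.

```shell
workdir=$(mktemp -d)

# Hypothetical CSV export: a header row followed by name,size rows
printf '%s\n' 'name,size' 'hosts,2048' 'notes,512' > "$workdir/data.csv"

# One-line script: skip the header, keep names whose size is over 1000
printf '%s\n' 'NR > 1 && $2 > 1000 {print $1}' > "$workdir/filter.awk"

big=$(awk -F"," -f "$workdir/filter.awk" "$workdir/data.csv")
printf '%s\n' "$big"
rm -rf "$workdir"
```

• Note that a plain comma separator does not cope with quoted values that themselves contain commas, so this approach suits simple exports only.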