
Overview

Contrary to what my friends and family think I do as a software engineer, which is type code all day in front of a computer, I actually spend a lot of time browsing code bases, reading code, debugging issues, and designing systems. When triaging a bug, I usually browse the surrounding code to understand how the system is supposed to behave under good conditions (i.e. normal operation without the bug), and then try to figure out what is going wrong. For browsing code, I know of no better tools than grep and git-grep, cscope (for C/C++ projects), and OpenGrok. These tools are real time savers: one grep can save hours of manual searching. Let me explain and show the usefulness of these tools.

Grep

Grep is one of the most powerful open source tools out there. It is installed by default on almost every Linux, BSD, and Unix system, regardless of the distribution. Simply stated, grep is a command-line utility that quickly finds plain text in files or in the output of other commands. It searches the provided input files, or standard input if no files are given, for lines containing a match to a given pattern. The pattern is a regular expression (regex).
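To make the two input modes concrete, here is a minimal sketch that feeds the same text to grep once through a pipe (standard input) and once through a file. The sample lines and variable names are made up for the demo.

```shell
#!/bin/sh
# Made-up sample text used only for this demonstration.
sample='alpha
beta
alphabet'

# 1) No file argument: grep reads standard input.
from_stdin=$(printf '%s\n' "$sample" | grep 'alpha')

# 2) File argument: grep reads the named file.
tmpfile=$(mktemp)
printf '%s\n' "$sample" > "$tmpfile"
from_file=$(grep 'alpha' "$tmpfile")
rm -f "$tmpfile"

# Both variables hold the same two matching lines: alpha and alphabet.
printf '%s\n' "$from_stdin"
```

Either way, grep prints every line that contains the pattern, which is why alphabet shows up alongside alpha.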

Examples

Search the Working Directory

$ grep [options] search_string

Suppose you want to search for all instances of the word “License” in your working directory and its subdirectories (with -R and no file argument, GNU grep searches the current directory):

# Output truncated for sake of brevity
[andy@kubuntu:~/dev/demo/plotly.py]$ grep -R "License"
packages/python/chart-studio/LICENSE.txt:The MIT License (MIT)
packages/python/plotly/plotly/package_data/plotly.min.js:* Licensed under the MIT license
packages/python/plotly/plotly/package_data/plotly.min.js: * Licensed under the MIT license
packages/python/plotly/plotly/package_data/plotly.min.js: * Licensed under the MIT License.
packages/python/plotly/versioneer.py:* License: Public Domain
packages/python/plotly/versioneer.py:## License
packages/python/plotly/README.md:        <td>License</td>
README.md:        <td>License</td>
README.md:## Copyright and Licenses
LICENSE.txt:The MIT License (MIT)
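A recursive search like the one above can get noisy in a large tree. As a hedged sketch, the script below builds a tiny throwaway directory (the file names and contents are invented for the demo) and narrows the search with --include, an option available in GNU and BSD grep but not required by POSIX. It also uses -l to list only the names of the matching files.

```shell
#!/bin/sh
# Build a small throwaway tree; names and contents are made up.
demo=$(mktemp -d)
mkdir -p "$demo/pkg"
printf 'The MIT License (MIT)\n' > "$demo/LICENSE.txt"
printf '## Copyright and Licenses\n' > "$demo/README.md"
printf '# License: Public Domain\n' > "$demo/pkg/versioneer.py"

# Recursive search restricted to Markdown files.
md_hits=$(cd "$demo" && grep -R --include='*.md' 'License' .)

# List only the names of files that contain a match (sorted for stable output).
files_with_match=$(cd "$demo" && grep -R -l 'License' . | LC_ALL=C sort)

printf '%s\n' "$md_hits"           # ./README.md:## Copyright and Licenses
printf '%s\n' "$files_with_match"  # three paths, one per line
rm -rf "$demo"
```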

Search String in a File

$ grep [options] search_string path/to/file

Suppose you have a requirements.txt file listing all the packages a library requires, and you would like to know the required version of the pandas library. You could open the file, search for pandas, read the version, and close the file again, but that takes too long, and life is too short. Why not just grep for pandas, as in the example below?

[andy@kubuntu:~/dev/demo/plotly.py/doc]$ grep pandas requirements.txt
pandas==1.0.3
geopandas==0.8.1
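Notice that geopandas matched too, because grep looks for the pattern anywhere in the line. If you only want the pandas entry, two common ways to tighten the search are anchoring the pattern to the start of the line or asking for a whole-word match. The sketch below recreates the two lines from the output above in a temporary file.

```shell
#!/bin/sh
# Recreate the two relevant lines from requirements.txt in a temp file.
reqs=$(mktemp)
printf 'pandas==1.0.3\ngeopandas==0.8.1\n' > "$reqs"

# Anchor the pattern to the start of the line.
anchored=$(grep '^pandas==' "$reqs")

# Or require "pandas" to be a whole word.
whole_word=$(grep -w 'pandas' "$reqs")

printf '%s\n' "$anchored"    # pandas==1.0.3
printf '%s\n' "$whole_word"  # pandas==1.0.3
rm -f "$reqs"
```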

Find if a File Contains a String

$ grep [options] search_string path/to/file

What if you want to find out whether a specific library (for example opengl) is part of the requirements.txt file from the last example? Again, you could open the file and search for the library name. If it is found, the library is required; otherwise it is not. But that would take more time than just grepping for the library name.

[andy@kubuntu:~/dev/demo/plotly.py/doc]$ grep opengl requirements.txt
[andy@kubuntu:~/dev/demo/plotly.py/doc]$
[andy@kubuntu:~/dev/demo/plotly.py/doc]$ grep wget requirements.txt
wget

grep printed nothing for opengl, so opengl is not in the requirements list. It did print a line for wget, so wget is.
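In a script you usually do not want to eyeball the output. grep also reports the result through its exit status: 0 when at least one line matched, 1 when nothing matched. Here is a small sketch using -q (quiet) and -F (treat the pattern as a fixed string). The has_package helper and the file contents are hypothetical, invented for this example.

```shell
#!/bin/sh
# Throwaway requirements file with made-up contents.
reqs=$(mktemp)
printf 'pandas==1.0.3\nwget\n' > "$reqs"

# Hypothetical helper: succeeds if the name ($1) appears in the file ($2).
has_package() {
    # -q: print nothing, report only via exit status.
    # -F: treat the name as a fixed string, not a regex.
    grep -q -F -- "$1" "$2"
}

if has_package wget "$reqs"; then wget_status=required; else wget_status=absent; fi
if has_package opengl "$reqs"; then opengl_status=required; else opengl_status=absent; fi

echo "wget: $wget_status"      # wget: required
echo "opengl: $opengl_status"  # opengl: absent
rm -f "$reqs"
```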

Grep Options Explained

-i ignores case (upper and lower case letters match each other)

[andy@kubuntu:~/dev/demo/plotly.py]$ ifconfig | grep -i ':4F:AB'
        ether 50:7b:9d:30:4f:ab  txqueuelen 1000  (Ethernet)

-w searches for the full word

[andy@kubuntu:~/dev/demo/plotly.py]$ grep -w connect /var/log/syslog
Aug 17 10:23:35 kubuntu redshift[945]: Could not connect to wayland display, exiting.
Aug 17 10:23:35 kubuntu redshift[977]: Could not connect to wayland display, exiting.

As an exercise for the reader, try running grep connect /var/log/syslog (i.e. without the -w option). Are you surprised to see connection, connected, disconnection, and disconnected in the results? Each of them contains the substring connect.
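If you do not have a syslog handy, the same effect can be reproduced on a few made-up lines. This sketch counts matching lines with and without -w.

```shell
#!/bin/sh
# Made-up log lines for the demonstration.
log='Could not connect to display
connection established
client disconnected'

# Substring match: all three lines contain "connect".
plain_count=$(printf '%s\n' "$log" | grep -c 'connect')

# Whole-word match: only the first line has "connect" as a word.
word_count=$(printf '%s\n' "$log" | grep -c -w 'connect')

echo "without -w: $plain_count"  # without -w: 3
echo "with -w: $word_count"      # with -w: 1
```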

-B [N] and -A [N] display N lines before and after each matching line, respectively

[andy@kubuntu:~/dev/demo/plotly.py]$ ifconfig | grep -i -B2 ':4F:AB'
enp0s25: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 50:7b:9d:30:4f:ab  txqueuelen 1000  (Ethernet)
[andy@kubuntu:~/dev/demo/plotly.py]$ ifconfig | grep -i -A2 ':4F:AB'
        ether 50:7b:9d:30:4f:ab  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
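The context options are easier to see on numbered input. The sketch below uses five made-up lines and also shows -C, which prints context on both sides of the match.

```shell
#!/bin/sh
# Five made-up lines with the match in the middle.
lines='line1
line2
MATCH
line4
line5'

before=$(printf '%s\n' "$lines" | grep -B 1 'MATCH')  # line2, MATCH
after=$(printf '%s\n' "$lines" | grep -A 1 'MATCH')   # MATCH, line4
around=$(printf '%s\n' "$lines" | grep -C 1 'MATCH')  # line2, MATCH, line4

printf '%s\n' "$around"
```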

-r will do a recursive search within subdirectories

-c will count the number of matching lines

-n will show the line number of each match

There are many more useful options, which you can read all about in grep’s manual page.
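Here is a short sketch of -n and -c on made-up input. One detail worth knowing: -c counts matching lines, not individual occurrences, so a line containing the pattern twice still counts once. Counting occurrences takes -o (print each match on its own line) plus wc -l.

```shell
#!/bin/sh
# Made-up input; the last line contains the pattern twice.
text='error: disk full
ok
error: retry after error'

numbered=$(printf '%s\n' "$text" | grep -n 'error')
line_count=$(printf '%s\n' "$text" | grep -c 'error')
# tr strips the padding that some wc implementations add.
occurrences=$(printf '%s\n' "$text" | grep -o 'error' | wc -l | tr -d ' ')

printf '%s\n' "$numbered"          # 1:error: disk full
                                   # 3:error: retry after error
echo "lines: $line_count"          # lines: 2
echo "occurrences: $occurrences"   # occurrences: 3
```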

Regular Expressions (RegEx)

The examples above introduce the grep utility for absolute beginners, but they do not do justice to grep’s power, which comes from combining it with regular expressions. Regular expressions are a compact way of describing complex patterns in text. You can use regex to search for all kinds of expressions, such as phone numbers, email addresses, or lines that start or end with a specific word, letter, or pattern.

Let’s go over a couple of basic examples of grep + regex. Suppose you have a file called space_oddity.txt that contains the lyrics to the famous song Space Oddity. You can search for all lines starting with the word “Can”:

[andy@kubuntu:~/dev/demo]$ grep -n "^Can" space_oddity.txt
26:Can you hear me, Major Tom?
27:Can you hear me, Major Tom?
28:Can you hear me, Major Tom?
29:Can you "Here am I floating 'round my tin can

Search for all lines ending with “Tom”

[andy@kubuntu:~/dev/demo]$ grep "Tom$" space_oddity.txt
Ground Control to Major Tom
Ground Control to Major Tom
This is Ground Control to Major Tom
Ground Control to Major Tom

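Search for all lines ending with “Tom” followed by zero or one character. The . matches any single character, and {0,1} means at least zero and at most one occurrence of the preceding item. The example uses grep -E, which is equivalent to the older egrep command that recent GNU grep releases mark as obsolescent.

[andy@kubuntu:~/dev/demo]$ grep -E "Tom.{0,1}$" space_oddity.txt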
Ground Control to Major Tom
Ground Control to Major Tom
This is Ground Control to Major Tom
Ground Control to Major Tom
Can you hear me, Major Tom?
Can you hear me, Major Tom?
Can you hear me, Major Tom?
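In extended regex syntax, ? is shorthand for {0,1}. The sketch below runs both spellings over three made-up lines to show that they select the same ones, and that a line with two trailing characters is rejected.

```shell
#!/bin/sh
# Made-up lines: zero, one, and two characters after "Tom".
text='Major Tom
Major Tom?
Major Tom!!'

with_braces=$(printf '%s\n' "$text" | grep -E 'Tom.{0,1}$')
with_qmark=$(printf '%s\n' "$text" | grep -E 'Tom.?$')

# Both print the first two lines; "Major Tom!!" has one character too many.
printf '%s\n' "$with_braces"
```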

Grep for an IPv4 Address

[andy@kubuntu:~/dev/demo]$ ifconfig | grep  -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
        inet 127.0.0.1  netmask 255.0.0.0
        inet 192.168.1.21  netmask 255.255.255.0  broadcast 192.168.1.255
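The command above prints whole lines. To pull out just the addresses, add -o (supported by GNU and BSD grep). The sketch below also shows a limitation of this simple pattern: it only checks for the shape of four dot-separated groups of one to three digits, so it happily accepts things that are not valid addresses. The sample text is made up.

```shell
#!/bin/sh
# Made-up sample; the last line is a version string, not an address.
sample='inet 192.168.1.21  netmask 255.255.255.0
inet6 fe80::1  prefixlen 64
version 999.1.2.3456'

pattern='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

# -o prints each match on its own line instead of the whole line.
addrs=$(printf '%s\n' "$sample" | grep -E -o "$pattern")

printf '%s\n' "$addrs"
# 192.168.1.21
# 255.255.255.0
# 999.1.2.345   <- matches the shape, but is not a valid IPv4 address
```

For quick exploration that is usually good enough. If you need strict validation, each octet has to be checked against the 0 to 255 range, which this pattern does not attempt.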

For a detailed explanation of grep and regex, check out the GNU grep documentation.

Grep Summary

Option examples

Option  Example                                Operation
-i      grep -i ':4F:AB' net_interfaces.txt    Ignore case
-w      grep -w "connect" /var/log/syslog      Match whole words only
-A      grep -A 3 'Exception' error.log        Display 3 lines of context after each match
-B      grep -B 4 'Exception' error.log        Display 4 lines of context before each match
-C      grep -C 5 'Exception' error.log        Display 5 lines of context before and after each match
-r      grep -r 'quickref.me' /var/log/nginx/  Search recursively within subdirectories
-v      grep -v 'warning' /var/log/syslog      Print all non-matching lines
-e      grep -e '^Can' space_oddity.txt        Specify the pattern explicitly (here, lines starting with 'Can')
-E      grep -E 'ja(s|cks)on' filename         Extended regex (lines containing jason or jackson)
-c      grep -c 'error' /var/log/syslog        Count the matching lines
-l      grep -l 'reboot' /var/log/*            Print only the names of files that contain a match
-o      grep -o search_string filename         Print only the matching part of each line
-n      grep -n "start" demo.txt               Show the line numbers of the matches
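Two of these options deserve a quick demonstration. -v inverts the match, and -e can be repeated to search for several patterns at once (a line matches if any of them does). The log lines below are made up.

```shell
#!/bin/sh
# Made-up log lines.
log='info: started
warning: low disk
error: crashed'

# Everything except warnings.
no_warnings=$(printf '%s\n' "$log" | grep -v 'warning')

# Lines matching either pattern.
problems=$(printf '%s\n' "$log" | grep -e 'warning' -e 'error')

printf '%s\n' "$no_warnings"  # info: started
                              # error: crashed
printf '%s\n' "$problems"     # warning: low disk
                              # error: crashed
```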

Grep Regular Expressions

^              Beginning of line.
$              End of line.
^$             Empty line.
\<             Start of word.
\>             End of word.
.              Any single character.
?              Previous item is optional (zero or one occurrence).
*              Previous item occurs zero or more times.
+              Previous item occurs one or more times.
{n}            Previous item appears exactly n times.
{n,}           Previous item appears n times or more.
{,m}           Previous item appears at most m times.
{n,m}          Previous item appears between n and m times.
[[:alpha:]]    Any lower or upper case letter.
[[:digit:]]    Any digit.
[[:alnum:]]    Any lower or upper case letter or digit.
[[:space:]]    Any whitespace character.
[A-Za-z]       Any lower or upper case letter.
[0-9]          Any digit.
[0-9A-Za-z]    Any lower or upper case letter or digit.

Note: ?, +, and the {} forms work as shown with grep -E. In basic regex (plain grep), GNU grep expects them with a backslash, as in \?, \+, and \{n,m\}. The named classes such as [:alpha:] only work inside a bracket expression, which is why they are written with double brackets.
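As a quick check of the named character classes, the sketch below runs three of them over made-up input. Note the double brackets: the outer pair is the bracket expression, the inner pair belongs to the class name.

```shell
#!/bin/sh
# Made-up input lines.
text='abc
123
a1 b2'

# Lines containing at least one digit.
with_digit=$(printf '%s\n' "$text" | grep '[[:digit:]]')

# Lines made up of letters only (+ needs -E).
letters_only=$(printf '%s\n' "$text" | grep -E '^[[:alpha:]]+$')

# Lines containing whitespace.
with_space=$(printf '%s\n' "$text" | grep '[[:space:]]')

printf '%s\n' "$with_digit"    # 123
                               # a1 b2
printf '%s\n' "$letters_only"  # abc
printf '%s\n' "$with_space"    # a1 b2
```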
