Effective Code Browsing - Part 1
Overview
Contrary to what my friends and family think I do as a software engineer (type code all day in front of a computer), I actually spend a lot of time browsing code bases, reading code, debugging issues, and designing systems. When triaging a bug, I usually browse the surrounding code to understand how the system is supposed to behave under normal conditions, and then try to figure out what is going wrong. For browsing code, I know of no better tools than grep and git-grep, cscope (for C/C++ projects), and opengrok. These tools are lifesavers: a single grep can save hours of manual searching. Let me explain and show the usefulness of these tools.
Grep
Grep is one of the most powerful open source tools out there. It is installed by default on almost every Linux, BSD, and Unix system, regardless of the distribution. Simply stated, grep is a command-line utility that quickly finds plain text in files or in the output of other commands. It searches the given input files, or standard input if no files are given, for lines that match a given pattern. The pattern is a regular expression (regex).
Examples
Search Standard Output
$ grep [options] search_string
Suppose you want to search for all instances of the word “License” in your working directory. With -R and no file argument, GNU grep searches the current directory recursively:
# Output truncated for sake of brevity
[andy@kubuntu:~/dev/demo/plotly.py]$ grep -R "License"
packages/python/chart-studio/LICENSE.txt:The MIT License (MIT)
packages/python/plotly/plotly/package_data/plotly.min.js:* Licensed under the MIT license
packages/python/plotly/plotly/package_data/plotly.min.js: * Licensed under the MIT license
packages/python/plotly/plotly/package_data/plotly.min.js: * Licensed under the MIT License.
packages/python/plotly/versioneer.py:* License: Public Domain
packages/python/plotly/versioneer.py:## License
packages/python/plotly/README.md: <td>License</td>
README.md: <td>License</td>
README.md:## Copyright and Licenses
LICENSE.txt:The MIT License (MIT)
Search String in a File
$ grep [options] search_string path/to/file
Suppose you have a requirements.txt file listing all the packages a library requires, and you would like to know the required version of the pandas library. You could open the file, search for pandas, read the version, and close the file again, but life is too short for that. Why not just grep for pandas, as in the example below? Note that geopandas shows up as well, because grep matches substrings by default.
[andy@kubuntu:~/dev/demo/plotly.py/doc]$ grep pandas requirements.txt
pandas==1.0.3
geopandas==0.8.1
Find if a File Contains a String
$ grep [options] search_string path/to/file
What if you want to find out whether a specific library (for example, opengl) is part of the requirements.txt file from the last example? Again, you could open the file and search for the library name: if it is found, the library is required; otherwise it is not. But that takes more time than just grepping for the library name.
[andy@kubuntu:~/dev/demo/plotly.py/doc]$ grep opengl requirements.txt
[andy@kubuntu:~/dev/demo/plotly.py/doc]$
[andy@kubuntu:~/dev/demo/plotly.py/doc]$ grep wget requirements.txt
wget
grep did not find opengl inside requirements.txt, but it did find wget. So wget is in the requirements list and opengl is not.
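The same check can be scripted, because grep reports its result through its exit status: 0 when at least one line matched, 1 when none did. Below is a minimal sketch, assuming a throwaway file that stands in for requirements.txt; the helper name has_requirement is made up for illustration and is not part of grep.

```shell
# Throwaway stand-in for requirements.txt (contents copied from the examples above).
reqs=$(mktemp)
printf 'pandas==1.0.3\ngeopandas==0.8.1\nwget\n' > "$reqs"

# has_requirement is a hypothetical helper, not a grep feature.
# -q prints nothing; only the exit status (0 = match, 1 = no match) is used.
has_requirement() {
    grep -q "$1" "$reqs"
}

if has_requirement wget; then wget_status="required"; else wget_status="not required"; fi
if has_requirement opengl; then opengl_status="required"; else opengl_status="not required"; fi

echo "wget: $wget_status"
echo "opengl: $opengl_status"

rm -f "$reqs"
```

Using -q keeps the script quiet and lets grep stop at the first match, which is handy in if statements and CI checks.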
Grep Options Explained
-i ignores case (the search becomes case-insensitive)
[andy@kubuntu:~/dev/demo/plotly.py]$ ifconfig | grep -i ':4F:AB'
ether 50:7b:9d:30:4f:ab txqueuelen 1000 (Ethernet)
-w matches whole words only
[andy@kubuntu:~/dev/demo/plotly.py]$ grep -w connect /var/log/syslog
Aug 17 10:23:35 kubuntu redshift[945]: Could not connect to wayland display, exiting.
Aug 17 10:23:35 kubuntu redshift[977]: Could not connect to wayland display, exiting.
As an exercise for the reader, try running grep connect /var/log/syslog (i.e. without the -w option). Are you surprised to see connection, connected, disconnection, and disconnected in the results?
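To try the exercise without depending on what happens to be in your own syslog, here is a small sketch on a throwaway file. The three log lines are invented for illustration; only the difference between the two counts matters.

```shell
# Made-up log lines, written to a temporary file.
log=$(mktemp)
printf '%s\n' \
    'Could not connect to display' \
    'connection established' \
    'client disconnected' > "$log"

# Without -w, "connect" matches anywhere inside a line, even inside longer words.
substring_hits=$(grep -c connect "$log")
# With -w, "connect" must appear as a whole word.
word_hits=$(grep -c -w connect "$log")

echo "substring: $substring_hits, whole word: $word_hits"

rm -f "$log"
```

On this sample, the plain search counts all three lines, while -w counts only the first one.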
-B [N] and -A [N] display N lines before and after each match, respectively
[andy@kubuntu:~/dev/demo/plotly.py]$ ifconfig | grep -i -B2 ':4F:AB'
enp0s25: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 50:7b:9d:30:4f:ab txqueuelen 1000 (Ethernet)
[andy@kubuntu:~/dev/demo/plotly.py]$ ifconfig | grep -i -A2 ':4F:AB'
ether 50:7b:9d:30:4f:ab txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
-r searches recursively within subdirectories
-c counts the number of matching lines
-n shows the line number of each match
There are many more useful options, which you can read about in grep’s manual page (man grep).
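These options combine well when browsing a source tree. The sketch below builds a tiny throwaway directory (the file names and contents are invented for illustration) and searches it for TODO markers.

```shell
# Hypothetical two-file project in a temporary directory.
dir=$(mktemp -d)
mkdir -p "$dir/src/util"
printf 'import os\n# TODO: remove\n' > "$dir/src/main.py"
printf 'def helper():\n    pass  # TODO: implement\n' > "$dir/src/util/helper.py"

# -r descends into subdirectories; -n prefixes each match with its line number.
# sort only makes the order of the results predictable.
matches=$(cd "$dir" && grep -rn TODO src | sort)
echo "$matches"

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c TODO "$dir/src/main.py")
echo "TODO lines in main.py: $count"

rm -rf "$dir"
```

Each result has the form file:line:text, which most editors can jump to directly.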
Regular Expressions (RegEx)
The examples above introduce the grep utility for absolute beginners, but they do not do justice to grep’s power. That power comes from combining grep with regular expressions.
Regular expressions are a compact way of describing complex patterns in text. You can use regex to search for all kinds of things, such as phone numbers, email addresses, or lines that start or end with a specific word, letter, or pattern. Let’s go over a couple of basic examples of grep + regex.
Suppose you have a file called space_oddity.txt that contains the lyrics to the famous song Space Oddity. You can search for all lines starting with the word “Can”:
[andy@kubuntu:~/dev/demo]$ grep "^Ground" space_oddity.txt
[andy@kubuntu:~/dev/demo]$ grep -n "^Can" space_oddity.txt
26:Can you hear me, Major Tom?
27:Can you hear me, Major Tom?
28:Can you hear me, Major Tom?
29:Can you "Here am I floating 'round my tin can
Search for all lines ending with “Tom”
[andy@kubuntu:~/dev/demo]$ grep "Tom$" space_oddity.txt
Ground Control to Major Tom
Ground Control to Major Tom
This is Ground Control to Major Tom
Ground Control to Major Tom
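Search for all lines ending with “Tom” followed by at most one more character. The . metacharacter matches any single character, and the interval {0,1} means at least zero and at most one occurrence of it. Intervals written this way need extended regular expressions, hence egrep (the older name for grep -E):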
[andy@kubuntu:~/dev/demo]$ egrep "Tom.{0,1}$" space_oddity.txt
Ground Control to Major Tom
Ground Control to Major Tom
This is Ground Control to Major Tom
Ground Control to Major Tom
Can you hear me, Major Tom?
Can you hear me, Major Tom?
Can you hear me, Major Tom?
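To see what the interval changes, the sketch below counts matches for three patterns on a small sample. The first two lines are taken from the output above; the third is made up to show a line that contains “Tom” but does not end with it.

```shell
# Small sample file; the third line is invented for contrast.
lyrics=$(mktemp)
printf '%s\n' \
    'Ground Control to Major Tom' \
    'Can you hear me, Major Tom?' \
    'Tom, can you hear?' > "$lyrics"

# "Tom" must be the very last thing on the line.
strict=$(grep -c 'Tom$' "$lyrics")
# "Tom" may be followed by at most one extra character.
loose=$(grep -E -c 'Tom.{0,1}$' "$lyrics")
# .? is shorthand for .{0,1} in extended regular expressions.
shorthand=$(grep -E -c 'Tom.?$' "$lyrics")

echo "strict=$strict loose=$loose shorthand=$shorthand"

rm -f "$lyrics"
```

The strict pattern matches one line, while both interval forms match two, because the trailing question mark counts as the one optional character.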
Grep for an IPv4 Address
[andy@kubuntu:~/dev/demo]$ ifconfig | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
inet 127.0.0.1 netmask 255.0.0.0
inet 192.168.1.21 netmask 255.255.255.0 broadcast 192.168.1.255
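If you want only the addresses rather than the whole lines, -o can be combined with the same idea. This is a minimal sketch on a fixed sample line modeled on the ifconfig output above, so it does not depend on your network setup. The pattern is a loose approximation: it also accepts out-of-range values such as 999.999.999.999.

```shell
# Sample line modeled on the ifconfig output above.
sample='inet 192.168.1.21  netmask 255.255.255.0  broadcast 192.168.1.255'

# -o prints each match on its own line; (\.[0-9]{1,3}){3} repeats ".N" three times.
addresses=$(printf '%s\n' "$sample" | grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}')

echo "$addresses"
```

This prints the address, the netmask, and the broadcast address, one per line.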
For a detailed explanation of grep and regex, check out the GNU grep documentation.
Grep Summary
Option examples
Option | Example | Operation |
---|---|---|
-i | grep -i ':4F:AB' net_interfaces.txt | Ignore case |
-w | grep -w "connect" /var/log/syslog | Match whole words only |
-A | grep -A 3 'Exception' error.log | Display 3 lines of context after each match |
-B | grep -B 4 'Exception' error.log | Display 4 lines of context before each match |
-C | grep -C 5 'Exception' error.log | Display 5 lines of context before and after each match |
-r | grep -r 'quickref.me' /var/log/nginx/ | Search recursively within subdirectories |
-v | grep -v 'warning' /var/log/syslog | Print all non-matching lines |
-e | grep -e '^Can' space_oddity.txt | Specify the pattern explicitly (here: lines starting with 'Can') |
-E | grep -E 'ja(s|cks)on' filename | Extended regex (lines containing jason or jackson) |
-c | grep -c 'error' /var/log/syslog | Count the matching lines |
-l | grep -l 'reboot' /var/log/* | Print only the names of files that contain a match |
-o | grep -o search_string filename | Print only the matching part of each line |
-n | grep -n "start" demo.txt | Show the line number of each match |
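Two options from the table that were not demonstrated earlier are -v and -l. The sketch below uses two throwaway log files whose names and contents are invented for illustration.

```shell
# Two hypothetical log files in a temporary directory.
dir=$(mktemp -d)
printf 'error: disk full\nwarning: low memory\ninfo: started\n' > "$dir/a.log"
printf 'info: ok\n' > "$dir/b.log"

# -v inverts the match: keep every line that does NOT contain "warning".
non_warnings=$(grep -v warning "$dir/a.log")
echo "$non_warnings"

# -l lists only the names of the files that contain at least one match.
files_with_error=$(cd "$dir" && grep -l error a.log b.log)
echo "files with errors: $files_with_error"

rm -rf "$dir"
```

Here -v drops the warning line from a.log, and -l reports only a.log because b.log contains no error.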
Grep Regular Expressions
Pattern | Meaning |
---|---|
^ | Beginning of line. |
$ | End of line. |
^$ | Empty line. |
\< | Start of word. |
\> | End of word. |
. | Any single character. |
? | Previous item is optional and occurs at most once. |
* | Previous item occurs zero or more times. |
+ | Previous item occurs one or more times. |
{n} | Previous item appears exactly n times. |
{n,} | Previous item appears n times or more. |
{,m} | Previous item appears at most m times. |
{n,m} | Previous item appears between n and m times. |
[:alpha:] | Any lower or upper case letter. |
[:digit:] | Any digit. |
[:alnum:] | Any lower or upper case letter or digit. |
[:space:] | Any whitespace character. |
[A-Za-z] | Any lower or upper case letter. |
[0-9] | Any digit. |
[0-9A-Za-z] | Any lower or upper case letter or digit. |
The named classes such as [:alpha:] only work inside a bracket expression, for example [[:alpha:]]. With plain GNU grep (basic regular expressions), ?, +, and {} need a backslash (\?, \+, \{n\}); with grep -E they work as written.
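As a sketch of how these pieces fit together, the pattern below picks out pinned requirements of the form name==version. The input lines are the ones from the requirements.txt example earlier, and the pattern itself is just one possible way to write this.

```shell
# ^ and $ anchor the whole line; [[:alpha:]]+ is the package name;
# [[:digit:]]+(\.[[:digit:]]+)* is a dotted version number such as 1.0.3.
pinned=$(printf '%s\n' 'pandas==1.0.3' 'geopandas==0.8.1' 'wget' \
    | grep -E '^[[:alpha:]]+==[[:digit:]]+(\.[[:digit:]]+)*$')

echo "$pinned"
```

The two pinned packages are printed, and wget is skipped because it has no version.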