Extracting numbers from a text string with grep

The grep command-line utility, found on Unix, Linux, and OS X systems, searches files or other input for lines that match a pattern. It can also be used to extract just the matching text from those lines.

On an OS X system, if I want to determine the version of an application, I can look for that information in the version.plist file for the application. For Microsoft Excel for Mac 2011, I can find that file in /Applications/Microsoft Office 2011/Microsoft Excel.app/Contents.

$ cat "/Applications/Microsoft Office 2011/Microsoft Excel.app/Contents/version.plist"
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>BuildVersion</key>
	<string>0</string>
	<key>CFBundleShortVersionString</key>
	<string>14.6.0</string>
	<key>CFBundleVersion</key>
	<string>14.6.0</string>
	<key>SourceVersion</key>
	<string>151221</string>
</dict>
</plist>

Looking at the file, I can see that Excel's version number is on the line immediately after the one containing CFBundleVersion. The grep command supports "before" and "after" context options. -B num prints num lines of context before each line that matches the search pattern, and -A num prints num lines of context after it. The long forms are --before-context=num and --after-context=num. So, I can display the CFBundleVersion line together with the line that follows it using the grep command below:

$ cat "/Applications/Microsoft Office 2011/Microsoft Excel.app/Contents/version.plist" | grep -A 1 CFBundleVersion
	<key>CFBundleVersion</key>
	<string>14.6.0</string>
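The context options can be tried on any system with a copy of the file. The sketch below writes the version.plist contents shown above (with the indentation dropped) to a temporary file created with mktemp; the temporary file is an assumption of this example, and on a Mac you would point grep at the real path instead. It prints one line of context after the CFBundleVersion match with both the short and long option names, then one line of context before the SourceVersion value.

```shell
# Sketch only: a temporary copy of the version.plist contents shown above
# (indentation dropped) so the commands can run anywhere. On a Mac, point
# grep at the real version.plist path instead.
plist=$(mktemp)
trap 'rm -f "$plist"' EXIT

cat > "$plist" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>BuildVersion</key>
<string>0</string>
<key>CFBundleShortVersionString</key>
<string>14.6.0</string>
<key>CFBundleVersion</key>
<string>14.6.0</string>
<key>SourceVersion</key>
<string>151221</string>
</dict>
</plist>
EOF

# -A 1: the matching line plus one line of context after it.
grep -A 1 CFBundleVersion "$plist"

# The long option name does the same thing.
grep --after-context=1 CFBundleVersion "$plist"

# -B 1: the matching line plus one line of context before it,
# here the <key> line that names the 151221 value.
grep -B 1 151221 "$plist"
```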

But I just want the version number. To get it, I can pipe the output of the above grep command into a second grep command that uses the -o (or --only-matching) option, which displays only the part of each line that matches the pattern rather than the whole line. The version number is a sequence of digits separated by periods. To match any one of a specific set of characters, I can enclose those characters in brackets, i.e., "[" and "]". The digits 0 through 9 can be specified with [0-9]; since I also want to include periods, I will use [0-9.].

I also need a quantifier to tell grep how many occurrences of those characters to match. Some quantifiers grep recognizes are the question mark, "?", which matches zero or one occurrence of the preceding element; the asterisk, "*", which matches zero or more occurrences; and the plus sign, "+", which matches one or more occurrences. In this case, I want the plus sign, so that grep matches each unbroken run of digits and periods in the line as a single string. The resulting command is grep -o '[0-9.]\+'.

Why is there a backslash, "\", before the plus sign? It is not there for the Bash shell: the single quotes around the pattern already keep the shell from interpreting any of its characters. The backslash is for grep itself. By default, grep interprets a pattern as a basic regular expression, in which "+" and "?" are ordinary characters that match a literal plus sign or question mark; GNU grep treats "\+" and "\?" as the one-or-more and zero-or-one quantifiers. Alternatively, grep -E, which uses extended regular expressions, accepts an unescaped plus sign, as in grep -oE '[0-9.]+'.

$ cat "/Applications/Microsoft Office 2011/Microsoft Excel.app/Contents/version.plist" | grep -A 1 CFBundleVersion | grep -o '[0-9.]\+'
14.6.0
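The backslash only matters to grep's default (basic) regular-expression syntax. The sketch below runs several equivalent forms of the pattern against a stand-in for the <string> line that grep -A 1 returns. It assumes GNU grep, since \+ is a GNU extension to basic regular expressions. The last command shows that an unescaped plus sign in a basic regular expression matches a literal "+".

```shell
# Stand-in for the <string> line selected by grep -A 1 CFBundleVersion.
line='<string>14.6.0</string>'

# Basic regular expression (grep's default): GNU grep reads \+ as "one or more".
printf '%s\n' "$line" | grep -o '[0-9.]\+'

# Extended regular expression (-E): a bare + is the "one or more" quantifier.
printf '%s\n' "$line" | grep -oE '[0-9.]+'

# Plain POSIX basic regex has no \+; "one character, then zero or more" is equivalent.
printf '%s\n' "$line" | grep -o '[0-9.][0-9.]*'

# In a basic regular expression an unescaped + is an ordinary character,
# so this matches a digit followed by a literal plus sign and prints "1+".
printf '%s\n' '1+2' | grep -o '[0-9]+'
```

The first three commands each print 14.6.0. The [0-9.][0-9.]* form avoids both the GNU-specific \+ and the -E option, at the cost of repeating the bracket expression.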

If you leave out the "\+" at the end of the pattern, the bracket expression matches only a single character, so grep outputs each digit and each period on a separate line, as shown below.

$ cat "/Applications/Microsoft Office 2011/Microsoft Excel.app/Contents/version.plist" | grep -A 1 CFBundleVersion | grep -o '[0-9.]'
1
4
.
6
.
0

The above grep command will work on OS X and Linux systems.
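If you check versions often, the pipeline could be wrapped in a small shell function. The function name bundle_version and the two-line sample file below are hypothetical, for illustration only. The function also passes the file name straight to grep, which can read a file itself, so the leading cat is not needed.

```shell
# Hypothetical helper: print the CFBundleVersion value from the version.plist
# whose path is given as the first argument. grep reads the file directly,
# so no "cat file |" is needed at the front of the pipeline.
bundle_version() {
    grep -A 1 CFBundleVersion "$1" | grep -o '[0-9.]\+'
}

# Hypothetical two-line sample standing in for a real version.plist.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' '<key>CFBundleVersion</key>' '<string>14.6.0</string>' > "$sample"

bundle_version "$sample"
```

On a Mac, the same function could be called with the real path, e.g. bundle_version "/Applications/Microsoft Office 2011/Microsoft Excel.app/Contents/version.plist".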
