E.g., I have a comma-separated values (CSV) file that contains information on pending work requests. Each line in the file contains a "Project" field. I want to count the number of pending requests per project. I can easily do so with a Bash script using utilities commonly found on Linux, Unix, and Apple's OS X systems.
# If no arguments appear on the command line, display usage information
if [ $# -eq 0 ]
then
   echo "Usage: ./count_requests filename.csv"
   exit
else
   fn=$1
fi

grep -v "ProjectName" $fn | cut -d"," -f8 | cut -d'"' -f 2 | sort | uniq -c | sort -n

# Provide a total number for all requests in the pending removal state
echo "-----"
grep -v "ProjectName" $fn | wc -l | cut -c 5-8
I can run the script, count_requests, from a Bash shell prompt by
specifying the name of the CSV file containing the data with
./count_requests filename, where filename is the location and name of
the CSV file. The script checks that the filename argument was included
on the command line and prints a usage message if it was omitted.
Otherwise, the variable "fn" is set to the first argument on the
command line.
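The argument check described above can also be sketched as a small function. This is only an illustration, not part of the original script: the function name require_filename is made up here, and sending the usage message to standard error with a non-zero return status is a common convention that I am assuming is wanted, not something the original does.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): sets fn when a file
# name was supplied; otherwise prints the usage message to stderr and
# returns a non-zero status so the caller can tell the check failed.
require_filename() {
    if [ "$#" -eq 0 ]; then
        echo "Usage: ./count_requests filename.csv" >&2
        return 1
    fi
    fn=$1
}

# Example calls: one with an argument, one without.
require_filename "requests.csv" && echo "Using $fn"
require_filename 2>/dev/null || echo "No file name given"
```

In a real script the function would be called as require_filename "$@" || exit 1 near the top.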
Since the first line of the CSV file is a header and one of its fields
is "ProjectName", I filter out that line with the grep command, using
the -v option to exclude any line containing "ProjectName". Since it is
a CSV file with fields separated by commas, I can pipe the file's
contents, minus the header line, into the cut utility with the comma
character set as the delimiter with -d",". Since the project name field
is the 8th comma-separated field on a line, I can extract just that
field with -f8. I can then remove the double quotes that are around the
project name in the CSV file by piping the results into another cut
command that specifies the double quote character as the delimiter with
-d'"' and outputs the 2nd field on the line, which is just the project
name without the surrounding double quotes. I can then use the sort
utility to sort all of the lines alphabetically and pipe its output into
the uniq command, which, with the -c option, collapses the duplicates of
each project name and prefixes each unique name with the number of times
it occurred. That will give me output like the following:
   4 Wind
   5 IPAM
   5 SDO
   8 MMOC
  15 MMS
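To see each stage of that pipeline in isolation, here is a minimal sketch that runs it against a tiny made-up CSV file. The sample rows, the column names other than ProjectName, and the temporary file are all assumptions for illustration; only the pipeline itself comes from the script. Note that splitting on commas with cut only works as long as no quoted field contains a comma of its own.

```bash
#!/usr/bin/env bash
# Made-up sample data: 8 quoted fields per line, project name in field 8.
sample=$(mktemp "${TMPDIR:-/tmp}/requests.XXXXXX")
cat > "$sample" <<'EOF'
"ID","A","B","C","D","E","F","ProjectName"
"1","a","b","c","d","e","f","MMS"
"2","a","b","c","d","e","f","Wind"
"3","a","b","c","d","e","f","MMS"
"4","a","b","c","d","e","f","SDO"
"5","a","b","c","d","e","f","MMS"
EOF

# grep -v    : drop the header line
# cut -d","  : keep the 8th comma-separated field, e.g. "MMS"
# cut -d'"'  : keep the text between the double quotes, e.g. MMS
# sort       : group identical names together
# uniq -c    : collapse each group and prefix it with its count
counts=$(grep -v "ProjectName" "$sample" |
    cut -d"," -f8 |
    cut -d'"' -f2 |
    sort |
    uniq -c)
echo "$counts"

rm -f "$sample"
```

With this sample, MMS is counted three times and SDO and Wind once each; the amount of padding in front of each count depends on the uniq implementation.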
Finally, I can sort the output from uniq numerically with sort -n, so
that a line with a count of 15 will be listed after any line with a
count of 2. After the count for each project is output, I count the
total number of requests by piping the output of the grep command that
excludes the header line into the wc command, which with the -l option
counts the number of lines. The final cut -c 5-8 keeps only character
columns 5 through 8 of the wc output, which trims the leading padding on
systems where wc right-aligns the count.
So, I'll then have output similar to the following, which shows there
are 330 pending work requests:
   1 CD Manager
   1 DSN
   1 EO1
   1 Enterprise Services
<text snipped>
   5 SD1
   8 MMOC
   9 MAVE
   9 SLAC
  11 IO PM
  12 DISCOVER
  15 MMS
  18 LADE
  21 TRMM
  26 MMOC
  54 GLASS
  86 IPnoc
-----
 330
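The wc -l | cut -c 5-8 step depends on how wc pads its output, and that padding is not the same on every system. As a hedged alternative, grep can count the non-matching lines itself with -c combined with -v, which prints a bare number with no padding to trim. The sample file and the variable name total_requests below are made up for illustration.

```bash
#!/usr/bin/env bash
# Made-up sample data: one header line plus three requests.
sample=$(mktemp "${TMPDIR:-/tmp}/requests.XXXXXX")
printf '%s\n' \
    '"ID","ProjectName"' \
    '"1","MMS"' \
    '"2","Wind"' \
    '"3","MMS"' > "$sample"

# -v selects the lines that do NOT contain "ProjectName";
# -c prints how many lines were selected instead of the lines themselves.
total_requests=$(grep -vc "ProjectName" "$sample")

echo "-----"
echo "$total_requests"

rm -f "$sample"
```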
Most of those pending requests are for internal projects rather than for external projects, however, and I'd like to know how many of the 330 requests are for internal projects. I can determine that number using Bash's "array" and "for loop" capabilities by adding the following code to the bottom of the above script:
# Internal requests
intrequests=( "CD Manager" "Enterprise Services" "IO PM" "IPnoc" "SECTION1" )

echo ""
echo "Internal Requests"
echo ""
for i in "${intrequests[@]}"
do
   :
   grep "$i" $fn | cut -d"," -f8 | cut -d'"' -f 2 | sort | uniq -c
done
The array is created by setting a variable, intrequests, equal to the
list of values between the parentheses. I can then loop through that
array with for i in "${intrequests[@]}", putting the steps that I want
performed within the loop between do and done. The : on the line after
do is a no-op placeholder and can be omitted. The result will be
additional lines of output where the counts are displayed just for the
internal projects, i.e., I will see something like the following
displayed:

Internal Requests

   1 CD Manager
   1 Enterprise Services
   5 IPAM
  86 IPnoc
   8 Section1
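The double quotes around "${intrequests[@]}" matter because several of the project names contain spaces. The following sketch, which uses a shortened copy of the array and two counter variables introduced only for this illustration, shows the difference between the quoted and unquoted forms.

```bash
#!/usr/bin/env bash
# Shortened copy of the array from the script, for illustration only.
intrequests=( "CD Manager" "Enterprise Services" "IPnoc" )

echo "Number of elements: ${#intrequests[@]}"

# Quoted expansion: each array element stays one word, spaces included.
quoted=0
for i in "${intrequests[@]}"; do
    quoted=$((quoted + 1))
    echo "quoted: [$i]"
done

# Unquoted expansion: the shell splits elements on whitespace, so
# "CD Manager" becomes two separate words, "CD" and "Manager".
unquoted=0
for i in ${intrequests[@]}; do
    unquoted=$((unquoted + 1))
done

echo "quoted loop ran $quoted times, unquoted loop ran $unquoted times"
```

The quoted loop runs once per project, while the unquoted loop runs once per word, which would make grep search for fragments such as "CD" and "Manager" separately.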
But I'd also like the script to add up the number of internal requests
for me, though in this example I can add them fairly quickly in my head.
To have the script perform the calculation, I'll read the file again and
loop through the array again, but this time add the count for each
internal project to a variable, total. Since the file will be at most a
few hundred lines, the extra passes through it finish almost instantly,
so I'm not concerned about optimizing the performance. Since I know the
number on each line of uniq -c output will be in columns 1 to 4, I use
cut -c 1-4 as the last operation in the pipeline to extract just the
count for each element in the array intrequests, without the project
name that follows it.
# Display total count for internal requests
echo "-----"
total=0
for i in "${intrequests[@]}"
do
   :
   count=`grep "$i" $fn | cut -d"," -f8 | cut -d'"' -f 2 | sort | uniq -c | cut -c 1-4`
   let "total = $total + count"
done
echo $total
The output for the internal requests will then look like the following:

Internal Requests

   1 CD Manager
   1 Enterprise Services
   5 IPAM
  86 IPnoc
   8 Section1
-----
101
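Extracting the count with cut -c 1-4 assumes that uniq -c always puts the number in the first four columns, and that each array entry matches exactly one project name. Neither is guaranteed on every system or data set, so here is a hedged variation of the totaling loop that lets awk add up the first whitespace-separated field instead. The sample file and its rows are made up for illustration, and awk is a substitution of mine rather than something the original script uses.

```bash
#!/usr/bin/env bash
# Made-up sample data: 8 quoted fields per line, project name in field 8.
fn=$(mktemp "${TMPDIR:-/tmp}/requests.XXXXXX")
cat > "$fn" <<'EOF'
"ID","A","B","C","D","E","F","ProjectName"
"1","a","b","c","d","e","f","IPnoc"
"2","a","b","c","d","e","f","MMS"
"3","a","b","c","d","e","f","CD Manager"
"4","a","b","c","d","e","f","IPnoc"
EOF

intrequests=( "CD Manager" "IPnoc" )

total=0
for i in "${intrequests[@]}"; do
    # awk sums the count column regardless of how wide the padding is,
    # and prints 0 when grep finds no matching lines at all.
    count=$(grep "$i" "$fn" | cut -d"," -f8 | cut -d'"' -f2 | sort | uniq -c |
        awk '{ sum += $1 } END { print sum + 0 }')
    total=$((total + count))
done

echo "-----"
echo "$total"

rm -f "$fn"
```

With this sample the loop adds one request for CD Manager and two for IPnoc, so the script prints 3 under the dashed line.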
References: