I needed to search a text file for a block of text starting with a specified word or phrase and ending with a specified word or phrase, so I created the Python script below:
#!/usr/bin/python

# Name: findTextBlock.py
# Version: 0.2
# Created: 2018-02-03
# Last modified: 2018-02-03

# Usage: findTextBlock.py start_text end_text filename

# Description: The script will display all of the text found in filename
# from start_text to end_text, including start_text and end_text. If fewer
# than 3 arguments are provided on the command line, the script will prompt
# for the values for start_text, end_text, and filename. A text search string
# that includes a space or spaces should be enclosed in double quotes if
# provided on the command line. If a double quote is part of the search text,
# precede it with the backslash escape character on the command line.

import os.path, sys

def parse_file(infile,start_text,end_text):
    with open(infile) as f:
        alltxt = f.read()
        found_position = alltxt.find(start_text)
        if found_position == -1:
            print "Starting text", start_text, "not found in", infile
            exit()
        # Need to eliminate all text before the starting point; otherwise,
        # if the end text occurs somewhere before the start text, nothing will
        # be printed even though there may be another occurrence of end text
        # later in the file.
        alltxt = alltxt[alltxt.find(start_text):]
        found_position = alltxt.find(end_text)
        if found_position == -1:
            print "Ending text", end_text, "not found after", start_text, "in", infile
            exit()
        return alltxt[alltxt.find(start_text):alltxt.find(end_text)+len(end_text)]

if len(sys.argv) == 4:
    start_text = sys.argv[1]
    end_text = sys.argv[2]
    path_to_file = sys.argv[3]
else:
    start_text = raw_input("Enter starting text: ")
    end_text = raw_input("Enter ending text: ")
    path_to_file = raw_input("Enter file name: ")

if not os.path.isfile(path_to_file):
    exit("Input file not accessible")

print parse_file(path_to_file,start_text,end_text)
The script expects three parameters: the starting text to search for, the ending text, and the path and name of the file to be searched. If all three arguments are not provided on the command line, the script will prompt for them. If the file provided as the last parameter can't be found or accessed, the error message "Input file not accessible" will be displayed.

If the starting text or ending text contains one or more spaces, enclose the phrase in double quotes when entering it on the command line. The quotes aren't needed when the values are entered at the script's prompts. If a search phrase itself includes a double quote, precede that double quote with a backslash, which is an escape character. E.g., if the text file contains the text below:
Its text differs, however, from the written versions prepared by Lincoln before and after his speech. It is the only version to which Lincoln affixed his signature, and the last he is known to have written. Lincoln wrote "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal..." In Lincoln at Gettysburg, Garry Wills notes the parallels between Lincoln's speech and Pericles's Funeral Oration during the Peloponnesian War as described by Thucydides.
If I didn't want to include the double quotes surrounding the passage from Lincoln's speech, I could use the following command:
$ ./findTextBlock.py "Four score" "equal..." Lincoln.txt
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal...
$
If I wanted the double quotes surrounding the text block displayed, I could use the following command:
$ ./findTextBlock.py "\"Four score" "equal...\"" Lincoln.txt
"Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal..."
$
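To see what the script actually receives in sys.argv for the two commands above, the quoting can be approximated with Python's standard shlex module, which splits a string using POSIX-style shell rules. This is only a sketch: it assumes a Bourne-style shell such as bash, and the variable names are mine rather than anything in findTextBlock.py.

```python
import shlex

# The two command lines from the examples above, exactly as typed.
plain = r'./findTextBlock.py "Four score" "equal..." Lincoln.txt'
escaped = r'./findTextBlock.py "\"Four score" "equal...\"" Lincoln.txt'

# shlex.split() applies POSIX-style quoting rules, so each result
# approximates the argument list a Bourne-style shell hands to the script.
plain_argv = shlex.split(plain)
escaped_argv = shlex.split(escaped)

print(plain_argv)
print(escaped_argv)
```

In the first list the enclosing double quotes are gone and "Four score" arrives as a single argument. In the second, each backslash-escaped double quote survives as a literal character in the search text.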
If the search parameters are entered when prompted for them, an escape character isn't needed to have double quotes included in the search terms.
$ ./findTextBlock.py
Enter starting text: "Four score
Enter ending text: equal..."
Enter file name: Lincoln.txt
"Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal..."
$
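The script is written for Python 2 (it uses the print statement and raw_input). For anyone wanting to reuse the search logic elsewhere, here is a minimal sketch of the same idea as a function that runs on Python 3. The name find_block and the sample string are hypothetical, and unlike the script it returns None instead of printing a message and exiting.

```python
def find_block(text, start_text, end_text):
    """Return the first block from start_text through end_text, or None."""
    start = text.find(start_text)
    if start == -1:
        return None  # starting text does not occur at all
    # Search for the ending text only from the starting point onward, so an
    # occurrence of end_text earlier in the text is ignored.
    end = text.find(end_text, start)
    if end == -1:
        return None  # no ending text at or after the starting text
    return text[start:end + len(end_text)]


# Made-up sample in which the ending text also appears before the start.
sample = "END one. BEGIN two three END four."
print(find_block(sample, "BEGIN", "END"))
```

Passing the start position as the second argument to str.find() has the same effect as the script's approach of slicing off everything before the starting text.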
Script download: findTextBlock.py