For a weekly status report, I need to determine the number of work requests that are approved and awaiting implementation. The list of requests in that state is on a webpage that also contains other information, including requests in various other states, such as those awaiting approval. I normally download that webpage so I can run scripts against it to extract other information, so I decided to create a PHP script that would display just the list of requests awaiting implementation and produce a count of them. On the source webpage, the line that marks the start of the section listing the approved requests awaiting implementation contains the text "Requests Waiting Implementation". The HTML that marks the end of that section contains an ending div tag. So I created the two PHP variables below to hold the two strings I need to search for within the file.
$startString = "Requests Waiting Implementation";
$endString = "</div>";
Since I want to process the HTML file I've downloaded to obtain the data, I need to open it and read it line by line. To do so, I can use the following PHP code:
$requestsFile = "data/Requests.html";
$file_handle = fopen($requestsFile, "r");
while (!feof($file_handle)) {
    $line = fgets($file_handle);
    // <other stuff to do>
}
fclose($file_handle);
The variable $requestsFile stores the location of the file I want to process. The next line stores a reference to that file in the variable $file_handle. The fopen function creates the connection to the file. Its first two parameters are the file name and the type of access desired. Because I just want to read the contents of the file and not write to it, I can use "r", since it is a text file. If it were a binary file, I would use "rb".
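The snippet above assumes fopen succeeds. As a minimal sketch, and not part of my script, the check below shows one way to catch a missing or unreadable file, since fopen returns false on failure. The helper name openForReading and the temporary file are assumptions made only for this illustration.

```php
<?php
// Hypothetical helper (not part of the original script): open a file for
// reading and fail loudly if it cannot be opened.
function openForReading(string $path)
{
    // The @ suppresses PHP's warning so the failure is handled explicitly.
    $handle = @fopen($path, "r");
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for reading");
    }
    return $handle;
}

// Demonstration using a throwaway temporary file instead of data/Requests.html.
$tmpPath = tempnam(sys_get_temp_dir(), "req");
file_put_contents($tmpPath, "first line\nsecond line\n");

$handle = openForReading($tmpPath);
$firstLine = fgets($handle);   // reads up to and including the first newline
fclose($handle);
unlink($tmpPath);

// Opening a path that does not exist should raise the exception.
$missingFailed = false;
try {
    openForReading($tmpPath . ".does-not-exist");
} catch (RuntimeException $e) {
    $missingFailed = true;
}
```

Throwing an exception is just one choice here; printing a message and exiting would work equally well for a small reporting script.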
I can then read the file line by line until the end-of-file (EOF) is encountered, using a while loop that checks for the end of the file with feof($file_handle). The feof function returns TRUE if the file pointer is at the end of the file or an error occurs; otherwise it returns FALSE. Since I want to continue while not at the end of the file, I can use !feof($file_handle) as the condition for the while loop. The fgets function reads a line from the file whose handle is passed as an argument to the function. In this case, I assign the string returned by the function to the variable $line. After all lines in the file are processed, i.e., the EOF is reached for the HTML file containing the input to be processed by the PHP script, I can close the file with the fclose function.
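An alternative to the feof loop, shown here only as a sketch, is to use the return value of fgets itself as the loop condition. fgets returns false when there is nothing left to read, so $line is always a string inside the loop. The function name readLines and the temporary file are hypothetical and exist only for this example.

```php
<?php
// Sketch: collect every line of a file using the return value of fgets()
// as the loop condition. readLines() is a hypothetical helper name.
function readLines(string $path): array
{
    $lines = [];
    $handle = fopen($path, "r");
    if ($handle === false) {
        return $lines;            // nothing to read
    }
    // fgets() returns false at end-of-file (or on error), ending the loop.
    while (($line = fgets($handle)) !== false) {
        $lines[] = $line;
    }
    fclose($handle);
    return $lines;
}

// Demonstration with a temporary file standing in for the downloaded page.
$tmpPath = tempnam(sys_get_temp_dir(), "req");
file_put_contents($tmpPath, "alpha\nbeta\ngamma\n");
$lines = readLines($tmpPath);
unlink($tmpPath);
```

Each element of $lines keeps its trailing newline, just as $line does in my script.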
To determine if a string, e.g., a word or phrase, specifically in this case "Requests Waiting Implementation", is present in a line, I can use the strpos function, e.g., strpos($line, $startString). The function will return the numeric position of the first occurrence of the string for which I want it to search, i.e., $startString, in the variable $line. If the substring I'm searching for doesn't occur in the string contained in the variable $line, the function will return the Boolean value false. So in this case, I want to set the variable $found_startLine to true if the function doesn't return the value false, as shown below. I don't check for the value true, because if the search string is present a numeric value is returned. That value can be 0 when the match is at the very start of the line, and a loose comparison would treat 0 as false, so I use the strict !== operator.
if (strpos($line, $startString) !== false) {
    $found_startLine = true;
}
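To illustrate why the strict comparison matters, here is a small sketch with a made-up line in which the search text sits at position 0. The sample line is an assumption for the demonstration, not text taken from the real page.

```php
<?php
// Made-up line where the search text begins at the very first character.
$line = "Requests Waiting Implementation</h2>";
$startString = "Requests Waiting Implementation";

$position = strpos($line, $startString);      // int 0: match at the start

$looseCheck  = ($position != false);   // false, because 0 == false in PHP
$strictCheck = ($position !== false);  // true, because int 0 is not boolean false

// When the text is absent, strpos() returns the boolean false.
$missing = strpos("no match here", $startString);
```

With the loose != comparison the match at position 0 would be missed, which is why the script uses !==.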
Once I've found the starting text I'm looking for in the file, I'll set that variable to true. If I've found the starting line and I've reached the line in the file containing the ending string, I'll break out of the loop using the break statement. After the script has found the line with the starting text, but before it has found the line with the ending text, I want it to echo, i.e., display, every line it reads to its output, in this case to a webpage it is generating. Each line with a request number on it in the input file has a link, i.e., a URL, while no other lines do, so I want to count every line between the starting and ending lines in the file that contains <a href= which will give me a count of the number of requests. So the entire PHP code is as follows:
<?php
$requestsFile = "data/Requests.html";
$startString = "Requests Waiting Implementation";
$endString = "</div>";
$found_startLine = false;
$count = 0;

$file_handle = fopen($requestsFile, "r");
while (!feof($file_handle)) {
    $line = fgets($file_handle);
    if (!$found_startLine) {
        // Still looking for the start of the section
        if (strpos($line, $startString) !== false) {
            $found_startLine = true;
        }
    } else {
        if (strpos($line, $endString) !== false) {
            // Reached the end of the section
            break;
        } else {
            echo $line;
            // Lines with a link are request lines
            if (strpos($line, "<a href=") !== false) {
                $count = $count + 1;
            }
        }
    }
}
fclose($file_handle);

echo "<p>";
echo "Number of requests awaiting implementation: ", $count;
echo "</p>\n";
?>
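To check the start/end/count logic without the downloaded page, the same steps can be wrapped in a function and run against a few lines held in memory. This is only a sketch: the function name countLinksInSection and the sample markup are invented for the test and are not taken from the real page.

```php
<?php
// Hypothetical wrapper around the same start/end/count logic so it can be
// exercised on an in-memory array of lines instead of a downloaded file.
function countLinksInSection(array $lines, string $startString, string $endString): int
{
    $foundStartLine = false;
    $count = 0;
    foreach ($lines as $line) {
        if (!$foundStartLine) {
            if (strpos($line, $startString) !== false) {
                $foundStartLine = true;
            }
        } elseif (strpos($line, $endString) !== false) {
            break;                       // end of the section
        } elseif (strpos($line, "<a href=") !== false) {
            $count++;                    // one link per request line
        }
    }
    return $count;
}

// Invented sample markup; the real page's structure may differ.
$sampleLines = [
    "<div><h2>Requests Waiting Approval</h2>\n",
    "<p><a href=\"req/100\">100</a></p>\n",
    "</div>\n",
    "<div><h2>Requests Waiting Implementation</h2>\n",
    "<p><a href=\"req/101\">101</a></p>\n",
    "<p>No link on this line</p>\n",
    "<p><a href=\"req/102\">102</a></p>\n",
    "</div>\n",
    "<p><a href=\"req/103\">103</a></p>\n",
];

$count = countLinksInSection($sampleLines, "Requests Waiting Implementation", "</div>");
```

Only the two links between the "Requests Waiting Implementation" line and the next </div> are counted; links in the other sections are ignored. Note that this approach depends on the first </div> after the start line really being the end of the section, so nested div elements inside the section would cut the count short.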
The code for the .php file that produces the page, which I run on my MacBook Pro laptop, is in the file weekly_status.php.