Excluding certain directories when using the find command

To find all files with an .html extension beneath the current directory and its subdirectories on a Linux system while skipping one directory, I can perform a recursive search with the following command. It excludes the contents of the directory named "private", which is directly below the current directory, and places the results in a file named htmlfiles.txt.

$ find . -path ./private -prune -o -name '*.html' -print > htmlfiles.txt

The period immediately after find, i.e., find . tells find to start its search in the current directory, the one from which the command is executed. I could use something like find /somedir to start the search in a different directory, in which case the -path pattern would need to begin with /somedir as well.

The -path ./private -prune -o part tells find how to treat the directory whose path is ./private. The dot (.) represents the current directory, so the path refers to the private directory below the current directory. The -path ./private test matches that directory, and -prune indicates that, if the file is a directory, find should not descend into it, i.e., the directory is "pruned" from the search. The "dash o", i.e., -o, then ensures that the expression to its right, including -print, is not applied to the pruned directory itself.

The -o is an operator (see Combining Primaries With Operators) which means "or"; you can use -or as a substitute for it:

expr1 -o expr2
expr1 -or expr2
Or; expr2 is not evaluated if expr1 is true.

So, in this case, putting it after the -path ./private -prune means that if the expression to the left of the -o is true, i.e., the path matches ./private, then the expression to the right of the -o is not evaluated at all. The expression to the right of the -o, i.e., -name '*.html' -print, indicates that if the name matches *.html, i.e., anything followed by .html, then find should print it. So, when the path is ./private, the -print is never acted upon, and because -prune stops find from descending into that directory, nothing beneath it is examined either.
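To see the effect of the -prune -o combination, the command can be tried on a small throwaway directory tree. The sketch below is only an illustration; the temporary directory created with mktemp and the file names index.html, page.html, and secret.html are made up for the test.

```shell
# Build a throwaway tree; every name below is hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/private" "$demo/docs"
touch "$demo/index.html" "$demo/docs/page.html" "$demo/private/secret.html"

# Run the search from inside the tree so that ./private matches -path.
found=$(cd "$demo" && find . -path ./private -prune -o -name '*.html' -print | LC_ALL=C sort)
echo "$found"

# Remove the throwaway tree again.
rm -rf "$demo"
```

Only ./docs/page.html and ./index.html should be listed; ./private/secret.html is skipped because find never descends into ./private.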

The other parameters above have the following meaning:

-name: Find those files that match the specified file names, in this case any file that has a .html extension on the filename
-print: Print, i.e., display the results found

But what if I want to exclude multiple directories from the results? Then I can use a find statement in the form below:

find . \( -path ./dir1 -o -path ./dir2 -o -path ./dir3 \) -prune -o -print

E.g., I could use a find statement like the one below:

$ find . \( -path ./private -o -path ./photos -o -path ./ellen/restricted -o -path ./ellen/keepout \) -prune -o -name '*.html' -print > htmlfiles.txt

In the above example, I don't want to include any files in the private and photos directories immediately below the directory from which the find command is run. I also don't want to include the restricted and keepout directories, which are inside the ellen subdirectory of the current directory.
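The multi-directory form can be checked the same way on a scratch tree. This is a rough sketch under assumed names: the directory layout mirrors the example above, but ellen/public and all of the .html file names are invented for the test.

```shell
# Build a throwaway tree; the file names and ellen/public are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/private" "$demo/photos" \
         "$demo/ellen/restricted" "$demo/ellen/keepout" "$demo/ellen/public"
touch "$demo/home.html" "$demo/private/a.html" "$demo/photos/b.html" \
      "$demo/ellen/restricted/c.html" "$demo/ellen/keepout/d.html" \
      "$demo/ellen/public/e.html"

# Prune all four directories, then print the remaining .html files.
kept=$(cd "$demo" && find . \( -path ./private -o -path ./photos \
        -o -path ./ellen/restricted -o -path ./ellen/keepout \) -prune \
        -o -name '*.html' -print | LC_ALL=C sort)
echo "$kept"

# Remove the throwaway tree again.
rm -rf "$demo"
```

Only ./ellen/public/e.html and ./home.html should remain in the output, since the rest of the ellen directory is still searched while its restricted and keepout subdirectories are pruned.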

You need to put an escape character, i.e., a backslash, "\", before the opening and closing parenthesis characters so that the shell passes them to find instead of interpreting them itself. You also need to include a space after the opening parenthesis character, "(", and before the closing parenthesis character, ")", because find only recognizes a parenthesis when it is a separate argument. Without the spaces, you would get the following error message:

$ find . \(-path ./private -o -path ./photos -o -path ./ellen/restricted -o -path ./ellen/keepout\) -prune -o -name '*.html' -print > htmlfiles.txt
find: invalid expression; you have used a binary operator '-o' with nothing before it.

If you omit just the first space after the \(, you would get the error message above. But if you omit just the space before the \), you would receive the error message "find: invalid expression; I was expecting to find a ')' somewhere but did not see one." So be sure to include both spaces.
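A quick way to confirm that the spaces matter is to compare the exit status of the two forms. The sketch below assumes a scratch directory with made-up contents and discards find's output, since the exact wording of the error message can differ between find implementations.

```shell
# Build a throwaway tree; every name below is hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/private" "$demo/photos"
touch "$demo/index.html"

# Spaces around the escaped parentheses: find accepts the expression.
good=0
(cd "$demo" && find . \( -path ./private -o -path ./photos \) -prune \
    -o -name '*.html' -print >/dev/null 2>&1) || good=$?

# No spaces: the parentheses are glued to their neighbors and find rejects it.
bad=0
(cd "$demo" && find . \(-path ./private -o -path ./photos\) -prune \
    -o -name '*.html' -print >/dev/null 2>&1) || bad=$?

echo "with spaces: exit status $good"
echo "without spaces: exit status $bad"

# Remove the throwaway tree again.
rm -rf "$demo"
```

The first form should exit with status 0, while the second should report a non-zero status.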


 
