Introduction
Tree surgeon is a tool for filtering file trees. A "filter expression" is used to filter a file tree:
endsWith ".cpp" (basename file)
Will filter out all files (on all levels of the directory tree) except those ending in ".cpp". Filtering expressions are defined via a mini-language.
-
Show a
tree
-style diff to illustrate what happens when applying a filter to a directory:$ tree-surgeon tree-diff \ -f 'endsWith ".c" (basename file)' \ -s ./my-project
-
Remove all
*.tmp
files in sub-directories named.cache
from any level of a directory, by piping tree-surgeon’s output to bash:$ tree-surgeon to-bash \ -f 'endsWith ".tmp" (basename file) & elem ".cache" (parents file)' \ -s ./my-project | xargs rm
-
Find and replace the string "Apple" with "Orange" in all files with "fruit" in the name, or in the "./fruit" directory:
$ tree-surgeon to-bash \ -f 'occursIn "fruit" (basename file) | parents file == [ "fruit" ]' \ -s ./my-project | xargs sed -i 's/Apple/Orange/g'
-
Count the number of lines of code in all files in your project that are not either Haskell or Rust:
$ tree-surgeon to-bash \ -f 'let fName = basename file in !(endsWith ".hs" fName | endsWith ".rs" fName)' \ -s ./my-project | xargs wc -l
Filter expressions
Tree surgeon uses a mini-language to define its filters, that can then be applied to directory trees. A filter expression can be thought of as a function that takes a single argument - the file that is being examined for inclusion/exclusion - and returns a boolean value, where True
indicates that the file should be included.
Note that the term "file" is used in the unix sense here, and may be a directory or a regular file. You can distinguish between the two in filter expressions by the isFile
or isDir
functions.
. └─── └─── myMain.rs └─── └─── └─── tests.rs├─── │ └─── docs.md ├─── │
How does filtering work?
-
A file or directory that has been included by the filter expression will always be included in the resulting tree, from which it flows that:
-
A directory will be included in the resulting tree if it contains any files that have been included, even if it was not directly included by the filter expression.
An example of a very simple filter expression is:
basename file == "includeMe.txt"
Which tells tree surgeon to include any file named includeMe.txt
. Tree surgeon will traverse the whole of the directory tree you specify, and for all files, test whether they match this expression. So in this case, any file named includeMe.txt
, at any depth of the directory tree, will be included; and so will any directories that (directly or indirectly) contain these files.
Language tour
Expressions are like functions
Tree surgeon uses a mini-language to define its filters. A filter expression can be thought of as a function that takes a single argument - the file that is being examined for inclusion/exclusion - and returns a boolean value, where True indicates that the file should be included. The file that is being examined for inclusion/exclusion is a special pre-defined variable in the expression, called file
.
The filter language is very similar to functional languages such as Haskell and ML, with just a few alterations for convenience.
For example:
basename file == "myMain.rs"
Tells tree surgeon to include any file named myMain.rs
. Tree surgeon will traverse the whole of the directory tree you specify, and for each file, assess whether this expression evaluates to True
. So in this case, any file named myMain.rs
, at any depth of the directory tree, will be included:
. └─── docs │ └─── docs.md ├─── │ └─── myMain.rs └─── tests └─── integration_tests └─── tests.rs├───
(note that files in grey were part of the original tree, but have been excluded)
Sub-expressions can be linked
This expression can be composed of sub-expressions, linked via &
(and) and |
(or) operators. For example:
basename file == "docs.md" | basename file == "tests.rs"
. └─── docs.md ├─── mySrc │ └─── myMain.rs └─── └─── └─── tests.rs├─── │ └───
Note that the filter expression is evaluated for every file in the tree, and as long as the expression resolves to True
, then a file will be included. The special file
variable need not actually be in the expression (though, it almost always should be). So this expression will include all files:
1 == 1
. └─── docs.md ├─── │ └─── myMain.rs └─── └─── └─── tests.rs├─── │ └───
whereas this expression will include no files:
False
Expressions can be defined, negated
Expressions can be parenthesized, compared, negated (via !
), or defined (via let .. in …
). Here we want to include all files except those which have "tests" in the name:
!(occursIn "tests" (basename file))
. └─── docs.md ├─── │ └─── myMain.rs └─── tests └─── integration_tests └─── tests.rs├─── │ └───
Multiple definitions can be put within the same let … in
statement, and separated by colons:
let isDocs = basename file == "docs.md"; isTests = basename file == "tests.rs" in isDocs | isTests
. └─── docs.md ├─── mySrc │ └─── myMain.rs └─── └─── └─── tests.rs├─── │ └───
Functions on the special file
value
As a reminder, the file
variable means "file" in the unix sense: it includes both directories and regular files.
The file
variable can be interacted with via a few functions:
-
basename
returns the name of the file as a string. This is the name of the file only, not its directory structure; so formyDir/mySrc/myMain.rs
,basename file
returnsmyMain.rs
. -
parents
returns the parent directories of the file as a list of strings. So for the filemyDir/mySrc/myMain.rs
,parents file
will return[ "myDir", "mySrc" ]
. -
isDir
andisFile
returnTrue
if thefile
object is a directory or a regular file, respectively.
String-related functions
startsWith
, occursIn
and endsWith
do what you’d imagine them to do:
endsWith ".rs" (basename file)
. └─── myDir ├─── docs │ └─── docs.md ├─── │ └─── myMain.rs └─── └─── └─── tests.rs
startsWith "docs" (basename file) | occursIn "Main" (basename file)
. └─── docs.md ├─── │ └─── myMain.rs └─── tests └─── integration_tests └─── tests.rs├─── │ └───
Functions can be partially applied
Functions can be partially applied. Here the endsWith
function normally takes two arguments, but we have passed it only one, creating a new isRust
function, that expects a single additional argument:
let isRust = endsWith ".rs" in isRust (basename file)
. └─── myDir ├─── docs │ └─── docs.md ├─── │ └─── myMain.rs └─── └─── └─── tests.rs
There are lists, no tuples
Lists are defined in the usual way, with square brackets:
parents file == [ "myDir", "docs" ]
. └─── docs.md ├─── mySrc │ └─── myMain.rs └─── tests └─── integration_tests └─── tests.rs├─── │ └───
all
, any
, and map
operate on lists
all
takes a function of type a → True
applies it to all members of a list of type [a]
, and checks if all elements of the resulting list are True
:
all (startsWith "my") (parents file)
. └─── myDir ├─── docs │ └─── docs.md ├─── │ └─── myMain.rs └─── tests └─── integration_tests └─── tests.rs
any
is similar, but checks if any element of the resulting list is True
:
any ((==) "tests") (parents file)
Note that putting parentheses around an infix function like ==
or +
allows it to be used prefix. Here ((==) "tests")
resolves to a (partially-applied function) that takes a single argument - a string - and returns True
if it equals "tests"
, otherwise False
.
. └─── myDir ├─── docs │ └─── docs.md ├─── mySrc │ └─── myMain.rs └─── └─── └─── tests.rs
map
is used to convert elements of a list from one type to another:
any ((==) 4) (map length (parents file))
. └─── docs.md ├─── mySrc │ └─── myMain.rs └─── tests └─── integration_tests └─── tests.rs├─── │ └───
any ((==) "tests") (parents file)
Note that putting parentheses around an infix function like ==
or +
allows it to be used prefix. Here ((==) "tests")
resolves to a (partially-applied function) that takes a single argument - a string - and returns True
if it equals "tests"
, otherwise False
.
. └─── docs │ └─── docs.md ├─── mySrc │ └─── myMain.rs └─── └─── └─── tests.rs├───
Occasionally Asked Questions
Why filter both files and directories?
Similar utilities to tree-surgeon will filter on files only, and will filter directories by automatically eliminating any empty directories from the resulting tree. Tree surgeon does not use this system. Directories are included/excluded by the filter expression in exactly the same way as files, with one additional rule: if a directory has been excluded, but it contains files that have been included, it will be included. The fact that directories are assessed by the filter expression in the same way as files means that empty directories can be included. Although in most use cases it is rare to want to include an empty directory, it is not out the question and so this situation is provided for. If you do not want empty directories to be included, appending & !(isDir file)
to an expression will cause empty directories to be excluded; for example:
elem "mySrc" (parents file) & !(isDir file)
Will include any file that is in a directory named mySrc
(directly or indirectly), but exclude any empty directories.
Why include, not exclude?
Applying the filter basename file == "test-file"
will include only files named test-file. This might be unintuitive to those used to filters excluding, such as .gitignore
. Versus excluding, including is a much more robust way to capture what you want from a directory tree, but tends to be more verbose, as there’s usually less stuff that needs excluding, than including. Tree surgeon aims to have the robustness of including files, but without this verbosity. Note that any filter can be inverted by using the !
operator, eg !(basename file == "test-file")
.
Also, when using the to-bash
command, you can use the -e
/--excluded
to invert the filter; this will show all files that do not meet the filter; so:
$ tree-surgeon to-bash \
-f 'endsWith ".cpp" (basename file) & elem "src" (parents file)' \
-s ./my-project -e | xargs rm
Will remove all files that are not *.cpp
files in the src
directory.