RegFind

Recently, I needed a list of the names of all structures defined in a VB.NET project. Immediately I searched my hard disk for grep, the blindingly fast command-line utility to search for strings in files, that - once upon a time - came with Borland C++. Using grep, I could have typed:

grep "(Public|Private) Structure (?<name>\w+)" *.vb

and all public and private structure declarations in my VB.NET source files would have been displayed.

Of course, my PC has been recycled several times since those days, so I was out of luck. I looked around on the web for a 'grep for Windows', and indeed there were several. That's not very surprising, since grep has been around as part of Unix for ages - in fact, the Borland version was a DOS port of it. I tried a few, but I soon discovered that none of them used the .NET framework's regular expression engine. That's a shame, since a) I'm starting to get to know that one quite intimately and b) it's powerful, versatile and fast.

RegFind

Having .NET regular expressions available means that it shouldn't be too hard to write a grep-like tool that would do exactly what I needed, so I started off. There were a few snags, of course, but here's the result: RegFind, a fast and flexible command-line utility to search files and directories for regular expressions - and more.

When you type

RegFind /?

the following help screen is displayed:

[Screenshot: the RegFind help screen]

Basically, RegFind expects a regular expression search pattern as its first argument, followed by one or more 'file masks'. A file mask is a file name with or without wildcards, and with or without a directory part. When no file masks are specified, the default is '*', which is to say: all files. In that respect, RegFind behaves a lot like the DIR command, including the ability to add a /s switch to search subdirectories:

RegFind "Structure" *.vb

searches for the text "Structure" in all files with a '.vb' extension in the current directory;

RegFind "Structure" *.vb /S

searches in all files with a '.vb' extension in the current directory and its subdirectories; and

RegFind "Structure" *.vb c:\sources\*.txt /S

searches in all files with a '.vb' or a '.txt' extension in the current directory and its subdirectories, and also in the directory 'c:\sources' and its subdirectories.

You get the idea. [Snag #1: writing the intuitive, seemingly simple DIR-like command handling turned out to be much more complex than I thought - almost more lines of code than the actual searching logic! Oh well.]
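To give a feel for what the DIR-like mask handling involves, here is a minimal sketch in Python. It is not RegFind's actual code (which is .NET); the function name expand_mask and its recurse parameter are made up for illustration, and it assumes a mask is simply an optional directory part followed by a wildcard file name.

```python
import fnmatch
import os


def expand_mask(mask, recurse=False):
    """Yield the files matching a DIR-style mask (hypothetical helper).

    'mask' is split with the path rules of the platform the script runs on,
    so a mask like c:\\sources\\*.txt is only split correctly on Windows.
    """
    directory, pattern = os.path.split(mask)
    directory = directory or "."  # no directory part: current directory
    pattern = pattern or "*"      # no file part: all files
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(root, name)
        if not recurse:
            break  # without /s, stop after the top-level directory
```

Calling expand_mask("*.vb", recurse=True) would then correspond roughly to the mask '*.vb' combined with the /s switch. The real complexity sits in the edge cases this sketch ignores, such as a mask that names a directory rather than a file.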

Searching for regular expressions

Of course, searching for a simple string is not what this was all about. RegFind displays its true power when searching for regular expressions, as in:

RegFind "(Public|Private) Structure (?<name>\w+)" *.vb

This will display a list of all matching lines in the files with a '.vb' extension in the current directory, like so:

file1.vb (10): Private Structure SomeStruct
file1.vb (25): Public Structure SomeStruct2
file2.vb (125): ...

The numbers in parentheses are the line numbers.
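The line-by-line search can be sketched in a few lines of Python. This is only an illustration of the idea, not RegFind's implementation: the function name search_lines is hypothetical, printing the whole trimmed line is an assumption based on the sample output above, and Python's re module spells the named group (?P<name>...) where .NET uses (?<name>...).

```python
import re


def search_lines(text, pattern, filename="<input>", ignore_case=True):
    """Return 'file (line): text' entries for every line matching pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    results = []
    # Line numbers start at 1, as in the sample output.
    for number, line in enumerate(text.splitlines(), start=1):
        if regex.search(line):
            results.append(f"{filename} ({number}): {line.strip()}")
    return results


sample = (
    "Imports System\n"
    "Private Structure SomeStruct\n"
    "End Structure\n"
    "Public Structure SomeStruct2\n"
)
for entry in search_lines(
    sample, r"(Public|Private) Structure (?P<name>\w+)", "file1.vb"
):
    print(entry)
```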

RegFind options

You can suppress the file names and line numbers by using the /n (for 'nofiles') switch. And if you add the /b (for 'bare') switch, all output is printed on a single line. If you want to know how many matches were found, add the /v (for 'verbose') switch. To make the search case sensitive, use the /c (for 'case') switch - by default, searches are case insensitive. No surprises there.

To leave out lines like

' This is not a private structure test2! 

you could demand that the match comes first on the line, allowing only whitespace before it:

RegFind "^\s*(Public|Private) Structure (?<name>\w+)" *.vb

The special character ^ matches the start of the line.
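The effect of the anchor is easy to check with a small Python sketch. The helper name matching is made up for this example, the named group uses Python's (?P<name>...) spelling, and the search is case insensitive to mirror RegFind's default.

```python
import re

# Unanchored and anchored versions of the pattern from the text.
PATTERN = r"(Public|Private) Structure (?P<name>\w+)"
ANCHORED = r"^\s*" + PATTERN

LINES = [
    "' This is not a private structure test2!",
    "    Private Structure SomeStruct",
]


def matching(pattern, lines):
    """Return the lines in which the pattern is found, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if regex.search(line)]


print(matching(PATTERN, LINES))   # the comment line is reported too
print(matching(ANCHORED, LINES))  # only the real declaration remains
```

The comment line matches the unanchored pattern because, ignoring case, 'private structure test2' fits it. The anchored pattern rejects that line because it starts with an apostrophe instead of whitespace.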

File mode

Sometimes, searching on a line-by-line basis is not enough. Suppose you wanted to find all Dim statements inside a structure. You'd search for a regular expression like:

(?<=(Public|Private) Structure (\w+)).*Dim \w+ As \w+.*(?=End Structure)

This searches for 'Dim ... As ...', but demanding that there's 'Public/Private Structure ...' somewhere before it and 'End Structure' somewhere after it. (This is called lookbehind and lookahead and it's one of the areas where .NET's Regex engine excels!)

This obviously can't be accomplished by attempting to match the pattern on each line of the file, so we need to activate file mode with the /f (for 'file') switch. This causes the regular expression to become both Multiline and Singleline, i.e. the dot character now matches across lines and ^ and $ stand for 'start/end of each line' rather than 'start/end of file'. In effect, this makes the regular expression match on the whole file, rather than the individual lines.

When using file mode, the output of RegFind is a little different: the file names are displayed on a line of their own, and line numbers are omitted. Of course, the /b(are) and /n(ofiles) switches still operate as before.
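The difference between line mode and file mode can be illustrated with Python's re module, where re.MULTILINE and re.DOTALL play the roles of .NET's Multiline and Singleline options. This is a sketch under assumptions: the helper names are hypothetical, and the pattern is a simpler one than the lookbehind example above, because Python only accepts fixed-width lookbehind whereas .NET allows variable-width.

```python
import re

SOURCE = """\
Public Structure Point
    Dim X As Integer
    Dim Y As Integer
End Structure
"""

# Matches a whole structure, from its declaration to 'End Structure'.
PATTERN = r"^Public Structure (?P<name>\w+).*?^End Structure$"


def find_per_line(text, pattern):
    """Line mode: try the pattern on each line separately."""
    return [line for line in text.splitlines() if re.search(pattern, line)]


def find_in_file(text, pattern):
    """File mode: one search over the whole text.

    MULTILINE makes ^ and $ match at every line boundary;
    DOTALL lets the dot match newline characters as well.
    """
    regex = re.compile(pattern, re.MULTILINE | re.DOTALL)
    return [m.group(0) for m in regex.finditer(text)]


print(find_per_line(SOURCE, PATTERN))  # nothing: no single line holds it all
print(find_in_file(SOURCE, PATTERN))   # one match spanning four lines
```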

Replacement

The last RegFind option is a complex, but extremely powerful one, that makes full use of the regular expression replacement capabilities of the .NET framework. Basically, it performs a replacement on the match before it is displayed. Suppose you wanted a list of only the names of the structures found, enclosed in parentheses. If your regular expression is

(Public|Private) Structure (?<name>\w+)

you would specify the /r switch (for 'replace') as follows:

/r:(${name})

The colon after the 'r' is mandatory, by the way.

The regular expression pattern 'captures' the word 'Private' or 'Public' as group number 1, and the text matched by '\w+' (meaning 'one or more letters, digits or underscores') as group number 2, also known as the group with the name 'name'. So in the replacement pattern, you could use '$1', '$2', and '${name}'. In the above example, the replacement pattern '(${name})' causes the name of the structure to be displayed, surrounded by parentheses.
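The same idea can be tried out in Python, with the caveat that the syntax differs from .NET: the named group is written (?P<name>...), and the replacement template uses \1, \2 and \g<name> where .NET uses $1, $2 and ${name}. The function name replace_matches is made up for this sketch.

```python
import re


def replace_matches(pattern, template, text):
    """Return each match rewritten through the replacement template."""
    regex = re.compile(pattern, re.IGNORECASE)
    # Match.expand substitutes group references in the template.
    return [m.expand(template) for m in regex.finditer(text)]


PATTERN = r"(Public|Private) Structure (?P<name>\w+)"
LINE = "Private Structure SomeStruct"

print(replace_matches(PATTERN, r"(\g<name>)", LINE))  # name in parentheses
print(replace_matches(PATTERN, r"\1 \2", LINE))       # groups by number
```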

Piping input to RegFind

In order to be able to pipe the output from a command to RegFind, you can use the /i (for 'interactive') switch. This instructs RegFind to read from standard input instead of files or directories. When you use /i, you can't specify any file masks on the rest of the command line.

For example, you could use RegFind to search the output of the DIR command for directory names only:

DIR | RegFind /I "^(.*)<DIR>\s*(\S*)$" /r:$2

Confused? Don't be! The pattern matches the start of every line (because of the ^ at the start of the search pattern), then some characters (match #1), followed by '<DIR>' (literally), then some spaces, then a list of non-whitespace characters (\S) as match #2. This 'catches' the characters at the start of the line up to the word <DIR> in $1, and the word at the end of the line in $2. By using the replacement '/r:$2' we display the latter only, which is, of course, the name of the directory.

Neat, huh?
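If you want to check the pattern without a Windows prompt at hand, the following Python sketch applies it to a few lines shaped like DIR output. The sample lines are invented for illustration, the function name directory_names is hypothetical, and \2 is Python's spelling of the $2 replacement.

```python
import re

DIR_PATTERN = re.compile(r"^(.*)<DIR>\s*(\S*)$")

# Invented lines in the general shape of DIR output.
SAMPLE = [
    "04/25/2011  10:15 AM    <DIR>          .",
    "04/25/2011  10:15 AM    <DIR>          ..",
    "04/25/2011  10:15 AM    <DIR>          src",
    "04/25/2011  10:16 AM             1,024 readme.txt",
]


def directory_names(lines):
    """Return group 2 (the directory name) for every matching line."""
    names = []
    for line in lines:
        match = DIR_PATTERN.search(line.rstrip("\r\n"))
        if match:
            names.append(match.expand(r"\2"))
    return names


print(directory_names(SAMPLE))
```

One consequence of using \S for the name: a directory whose name contains a space does not match the pattern at all, so it is silently left out.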

Summary

Option       Meaning
/v           Verbose. Display number of matches and number of matched files at end of output.
/s           Subdirs. Search in files specified, and in all subdirectories below them.
/c           Case sensitive.
/n           Nofiles. Do not display file names or line numbers.
/b           Bare. Display all output on a single line. Implies /n.
/f           File mode. Search in whole file, not in each line individually.
/r:pattern   Replace. Use replacement pattern on match.
/i           Interactive. Read from standard input instead of file masks.

Snag #2: Double quotes in command line arguments

I've commented before on the apparently special handling of backslashes preceding quotes in command line arguments. When writing RegFind, I came upon a symptom that may explain the bug: the combination \" in a command line argument escapes the double quote and makes it part of the command line argument. That is very necessary sometimes, in this case when a double quote is part of the regular expression search (or replacement) pattern. To avoid having to escape double quotes (and to avoid the command line handling bug I found earlier) I decided to write custom argument handling code that allows single quotes, double quotes or square brackets as delimiters. Using the new command line handling code, the following are all valid command arguments for RegFind:

RegFind [Dim \w+ As String = "] "c:\src\*.vb" 'd:\data\*.txt' [/r:$1 = ""]
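How such argument splitting might work is sketched below in Python. This is an assumption about the approach, not a port of RegFind's code: the name split_args is hypothetical, and the sketch simply takes everything up to the next closing delimiter, with no escaping at all.

```python
# Opening delimiter -> closing delimiter.
CLOSERS = {'"': '"', "'": "'", "[": "]"}


def split_args(command_line):
    """Split a raw command line into arguments (hypothetical helper).

    An argument is either delimited by "...", '...' or [...], or it is
    a run of non-whitespace characters such as /S.
    """
    args = []
    i, n = 0, len(command_line)
    while i < n:
        ch = command_line[i]
        if ch.isspace():
            i += 1
        elif ch in CLOSERS:
            end = command_line.find(CLOSERS[ch], i + 1)
            if end == -1:
                end = n  # unterminated delimiter: take the rest of the line
            args.append(command_line[i + 1:end])
            i = end + 1
        else:
            start = i
            while i < n and not command_line[i].isspace():
                i += 1
            args.append(command_line[start:i])
    return args


raw = r'''[Dim \w+ As String = "] "c:\src\*.vb" 'd:\data\*.txt' [/r:$1 = ""]'''
for arg in split_args(raw):
    print(arg)
```

Because each delimiter style can carry the other two unescaped, a pattern containing double quotes can go in square brackets, as in the example above. The obvious limitation of this naive version is a bracket-delimited pattern that itself contains ']', such as a character class, which would need one of the quote styles instead.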

The source code of RegFind is now hosted on CodePlex.

Download RegFind

Download RegFind 1.0.2 (April 25, 2011) from CodePlex or browse source code

Both 32-bit and 64-bit installers available.

Comments? Bugs? Suggestions? Feature requests? Let us know! Alternatively, you can create a work item on CodePlex.