Searching Through Files







You can search for particular words or phrases in text files by loading the file into less or vi (see Figure). The maneuverability offered by both programs lets you leap from point to point in the text, and their use is generally user-friendly.

However, opening a file in vi or less just to search it can take precious seconds. There's a quicker command-line option that will search through a file in double-quick time: grep.

Using grep to Find Text

grep stands for Global Regular Expression Print. grep is an extremely powerful tool that can use pattern-based searching techniques to find text in files. Pattern-based searching means that grep offers various options to loosen the search so that more results are returned.

The simplest way of using grep is to specify some brief text, followed by the name of the file you want to search. Here's an example:

grep 'helloworld' myfile

This will search for the phrase helloworld within myfile. If it's found, the entire line that helloworld is on will be displayed on screen.
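As a minimal sketch of this behavior, the following creates a small throwaway file and searches it. The file contents are invented for illustration, and mktemp -d is used on the assumption that you'd rather not create test files in your working directory.

```shell
# Create a scratch directory and a sample file (contents are hypothetical).
workdir=$(mktemp -d)
printf '%s\n' 'first line' 'say helloworld here' 'last line' > "$workdir/myfile"

# grep prints every line that contains the search text.
result=$(grep 'helloworld' "$workdir/myfile")
echo "$result"   # prints: say helloworld here

# Remove the scratch directory again.
rm -r "$workdir"
```

Only the middle line is shown, because it's the only one containing helloworld.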

If you specify the * wildcard instead of a filename, grep will search every file in the current directory for the text (the shell expands * to the names of all the non-hidden files there). Adding the -r command option will cause grep to search all the files, and also search through any directories that are present:

grep -r 'helloworld' *

Another handy command option is -i, which tells grep to ignore the difference between uppercase and lowercase letters when it's searching. Figure shows an example of using grep.

Figure. grep is a powerful tool that can search for text within files.
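The -r and -i options can be tried out safely on a scratch directory. The sketch below assumes a grep that supports -r (GNU and BSD grep both do); the directory layout and file contents are made up for the example.

```shell
# Build a scratch directory with one file at the top level and one in a subdirectory.
workdir=$(mktemp -d)
mkdir "$workdir/sub"
printf '%s\n' 'HelloWorld at the top level' > "$workdir/top.txt"
printf '%s\n' 'helloworld in a subdirectory' > "$workdir/sub/nested.txt"

# -r descends into sub/. The search is case-sensitive, so only nested.txt matches.
recursive=$(cd "$workdir" && grep -r 'helloworld' *)
echo "$recursive"

# Adding -i also matches "HelloWorld" in top.txt (sorted so the order is predictable).
insensitive=$(cd "$workdir" && grep -r -i 'helloworld' * | sort)
echo "$insensitive"

# Remove the scratch directory again.
rm -r "$workdir"
```

When more than one file is searched, grep prefixes each matching line with the name of the file it came from, which is how you can tell the two results apart.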
Tip 

You might never choose to use grep for searching for text within files, but it can prove very handy when used to search through the output of other commands. This is done by "piping" the output from one command to another, as explained in Chapter 18.
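As a small illustration of the idea, here is a sketch that pipes the output of ls into grep. The filenames are hypothetical and exist only in a scratch directory.

```shell
# Create a scratch directory containing three empty files (names are hypothetical).
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/report_draft.doc" "$workdir/final.doc"

# ls writes one name per line into the pipe; grep keeps only the lines containing "draft".
matches=$(ls "$workdir" | grep 'draft')
echo "$matches"   # prints: report_draft.doc

# Remove the scratch directory again.
rm -r "$workdir"
```

Here grep is given no filename at all, so it reads whatever the previous command sends down the pipe.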

Using Regular Expressions

The true power of grep is achieved by the use of search patterns known as regular expressions, or regexes for short. Put simply, regexes allow you to be vague rather than specific when searching, meaning that grep (and many similar tools that use the system of regexes, such as the find tool discussed in Chapter 15) will return more results.

For example, you can specify a selection or series of characters (called a string in regex terminology) that might appear in a word or phrase you're searching for. This can be useful if you're looking for a word that might be spelled differently from how you anticipate, for example.

The most basic form of regex is the bracket expression. This is where a set of alternative characters is enclosed in square brackets within a search string. For example, suppose you want to find a file that refers to several drafts of a document you've been working on. The files are called myfile_1draft.doc, myfile_2draft.doc, and so on. To find any document that mentions these files, you could type:

grep 'myfile_[1-9]draft\.doc' *

The use of square brackets tells grep to accept, at that position in the search string, any one of the characters listed inside the brackets. In this case, 1-9 means that any single digit from 1 through 9 is acceptable. It's as if you've told grep to search for myfile_1draft.doc, and then told it to search for myfile_2draft.doc, and so on. Notice that the example has a backslash before the period separating the file extension from the filename. This indicates to grep that it should interpret the period as an element of the string to be searched for, rather than as a wildcard that matches any single character, which is how grep usually interprets periods.
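The effect of the range and of the backslash can be checked with a quick experiment. This is only a sketch: the sample lines are invented, and grep -c is used so that grep reports the number of matching lines rather than the lines themselves.

```shell
# Four invented lines: two valid draft names, one with a 0, one with X instead of a period.
workdir=$(mktemp -d)
printf '%s\n' \
  'see myfile_1draft.doc for details' \
  'see myfile_7draft.doc for details' \
  'see myfile_0draft.doc for details' \
  'see myfile_3draftXdoc for details' > "$workdir/notes.txt"

# [1-9] accepts one digit from 1 to 9, and \. demands a literal period.
escaped=$(grep -c 'myfile_[1-9]draft\.doc' "$workdir/notes.txt")
echo "$escaped"     # prints: 2

# Without the backslash the period matches any character, so "draftXdoc" matches too.
unescaped=$(grep -c 'myfile_[1-9]draft.doc' "$workdir/notes.txt")
echo "$unescaped"   # prints: 3

# Remove the scratch directory again.
rm -r "$workdir"
```

The line containing myfile_0draft.doc is never counted, because 0 falls outside the 1-9 range.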

You don't need to specify a range of characters in this way. You can simply enter whatever selection of characters you want to substitute into the search string. Here's an example:

grep 'myfile[12345]\.doc' *

This will attempt to find any mention of myfile1.doc, myfile2.doc, and so on through myfile5.doc, in any file within the directory.

Here's another example:

grep '[KCkc]onqueror' *

This will let you search for the word Konqueror within files but takes into account any possible misspelling of the word with a C, and any use of uppercase or lowercase.
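To see how this differs from simply ignoring case with -i, consider the following sketch, which runs both searches against an invented word list.

```shell
# Five invented spellings, one per line; only the last is not a K/C variant.
workdir=$(mktemp -d)
printf '%s\n' 'Konqueror' 'conqueror' 'Conqueror' 'konqueror' 'Xonqueror' > "$workdir/words.txt"

# The bracket expression accepts K, C, k, or c as the first letter.
bracket=$(grep -c '[KCkc]onqueror' "$workdir/words.txt")
echo "$bracket"       # prints: 4

# -i ignores case but still insists on the letter k, so the C spellings are missed.
ignore_case=$(grep -c -i 'konqueror' "$workdir/words.txt")
echo "$ignore_case"   # prints: 2

# Remove the scratch directory again.
rm -r "$workdir"
```

The bracket expression handles the misspelling and the capitalization in one step, which -i alone can't do.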

This is only scratching the surface of what regexes can do. For example, many regexes can be combined together into one long search string, which can provide astonishing accuracy when searching. Figure contains some simple examples that should give you an idea of the power and flexibility of regexes.

Figure Some Examples of Regular Expressions

Search String            Description
'document[a-z]'          Returns any lines containing the string "document" followed by any single lowercase letter from the range a through z.
'document[A-Za-z]'       Returns any lines containing the string "document" followed by any single letter, A through Z or a through z. Note that no comma or other character is needed to separate possibilities within square brackets.
'document.'              Returns any lines containing the string "document" followed by any other character. The period is used as a wildcard signifying any single character.
'document[[:digit:]]'    Returns any lines containing the string "document" followed by any single digit.
'document[[:alpha:]]'    Returns any lines containing the string "document" followed by any single letter.
'^document'              Returns any lines that have the string "document" at the beginning. The caret symbol (^) tells grep to look only at the beginning of each line.
'document$'              Returns any line that has the string "document" at the end of the line. The dollar sign symbol ($) tells grep to look for the string only at the end of lines.
'document[^1-6]'         Returns lines in which the string "document" is followed by any single character other than the digits 1 through 6. When used as the first character inside square brackets, the caret (^) produces a nonmatching list: that position matches any character not in the list.
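One way to get a feel for these search strings is to run each of them against the same small file and compare how many lines match. The sketch below does that with five invented sample lines. LC_ALL=C is set on the assumption that you want ranges such as a-z to mean plain ASCII letters regardless of your system's locale.

```shell
# Use the C locale so that ranges like [a-z] cover ASCII lowercase letters only.
LC_ALL=C
export LC_ALL

# Five invented sample lines.
workdir=$(mktemp -d)
printf '%s\n' \
  'document1 draft' \
  'documents filed' \
  'see documentA' \
  'the final document' \
  'see document9' > "$workdir/sample.txt"

# Print each search string followed by the number of lines it matches.
summary=$(for pattern in 'document[a-z]' 'document[A-Za-z]' 'document.' \
    'document[[:digit:]]' 'document[[:alpha:]]' '^document' 'document$' \
    'document[^1-6]'; do
  printf '%s %s\n' "$pattern" "$(grep -c -e "$pattern" "$workdir/sample.txt")"
done)
echo "$summary"

# Remove the scratch directory again.
rm -r "$workdir"
```

Note that 'document[^1-6]' doesn't match the line ending in "document": the nonmatching list still requires some character to follow, just not one of the digits 1 through 6.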

grep is very powerful. It can be complicated to master, but it offers a lot of scope for performing extremely precise searches that ensure you find only what you're looking for. It's well worth reading through its man pages. You can also refer to books on the subject, of which there are many. A good example is Regular Expression Recipes: A Problem-Solution Approach, by Nathan A. Good (Apress, 2004; ISBN: 1-59059-441-X).


