[Chapter 17] 17.5 Variable-Length (Text) Databases

Some system databases (and quite a few user-created databases) are a series of human-readable text lines, with one record per line. For example, the TCP/IP hosts file contains one line per hostname.

Most often, these databases are updated with simple text editors. Updating such a database consists of reading it all into a temporary area (either memory or another disk file), making the necessary changes, and then either writing the result back to the original file or creating a new file with the same name (after deleting or renaming the old version). You can think of this process as a copy pass: the data is copied from the original database to a new version of the database, and changes are made during the copy.
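Written out by hand, a copy pass might look like the following sketch. The file name hosts.txt, its two sample records, and the .new and .bak suffixes are all made up for illustration; the program builds its own sample file in a scratch directory so that it can be run as-is.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sample data: a small hosts-style file in a scratch directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/hosts.txt";
open(my $seed, '>', $file) or die "Cannot create $file: $!";
print $seed "127.0.0.1 localhost\n10.0.0.5 fileserver\n";
close($seed) or die "Cannot close $file: $!";

# The copy pass: read the old version, write a changed copy alongside it.
open(my $in,  '<', $file)       or die "Cannot read $file: $!";
open(my $out, '>', "$file.new") or die "Cannot write $file.new: $!";
while (my $line = <$in>) {
    # Change the address of one record; copy every other line unchanged.
    $line =~ s/^\S+(\s+fileserver\b)/10.0.0.9$1/;
    print $out $line;
}
close($in);
close($out) or die "Cannot close $file.new: $!";

# Keep the old version as a backup, then move the new version into place.
rename($file, "$file.bak") or die "Cannot rename $file: $!";
rename("$file.new", $file) or die "Cannot rename $file.new: $!";
```

Both filehandles are closed before the renames, because some systems refuse to rename a file that is still open.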

Perl supports a copy-pass-style edit on line-oriented databases using inplace editing. Inplace editing is a modification of the way the diamond operator (<>) reads data from the list of files specified on the command line. Most often, this editing mode is turned on with the -i command-line switch, but we can also trigger inplace editing mode from within a program, as shown in the examples that follow.

To trigger the inplace editing mode, assign a value to the $^I scalar variable. The value of this variable is important and will be discussed in a moment.

When the <> construct is used and $^I has a value other than undef, the steps marked ## INPLACE ## in the following code are added to the list of implicit actions the diamond operator takes:

$ARGV = shift @ARGV;
open(ARGV,"<$ARGV");
rename($ARGV,"$ARGV$^I"); ## INPLACE ##
unlink($ARGV);            ## INPLACE ##
open(ARGVOUT,">$ARGV");   ## INPLACE ##
select(ARGVOUT);          ## INPLACE ##

The effect is that reads from the diamond operator come from the old file, and writes to the default filehandle go to a new copy of the file. The old file remains in a backup file, which is the filename with a suffix equal to the value of the $^I variable. (A bit of magic is also used to copy the attributes from the old file to the new file.) These steps are repeated each time a new file is taken from the @ARGV array.
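To watch these steps happen, here is a small sketch that edits two files in one pass. The file names a.txt and b.txt, their contents, and the .orig suffix are invented for the example, and the files live in a scratch directory that the program creates for itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sample data: two small files in a scratch directory.
my $dir   = tempdir(CLEANUP => 1);
my @files = ("$dir/a.txt", "$dir/b.txt");
for my $file (@files) {
    open(my $fh, '>', $file) or die "Cannot create $file: $!";
    print $fh "Alpha\nBeta\n";
    close($fh) or die "Cannot close $file: $!";
}

my @seen;    # the names that $ARGV takes on, in order
{
    # local() confines inplace mode and the file list to this block.
    local $^I   = ".orig";
    local @ARGV = @files;
    while (<>) {
        # $ARGV changes each time the diamond operator moves to a new file.
        push @seen, $ARGV unless @seen && $seen[-1] eq $ARGV;
        print uc($_);    # goes to the new copy of the current file
    }
}

# Once the last file is finished, print goes to STDOUT again.
for my $name (@seen) {
    print "rewrote $name (old version kept as $name.orig)\n";
}
```

Using local for $^I and @ARGV keeps the inplace mode from leaking into any later use of the diamond operator in the same program.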

Typical values for $^I are suffixes like .bak or ~, which create backup files much like the ones many text editors leave behind. A strange and useful value for $^I is the empty string, "", which causes the old file to be neatly eliminated after the edit is complete. Unfortunately, if the system or program crashes during the execution of your program, you lose all of your old data, so this method is recommended only for brave, foolish, or trusting souls.

Here's a way to change everyone's login name to lowercase in some file that contains a list of user logins, one per line:

@ARGV = ("userlist.txt"); # prime the diamond operator
$^I = ".bak";             # keep the old version as userlist.txt.bak for safety
while (<>) {              # main loop, once for each line
    tr/A-Z/a-z/;          # change everything to lower case
    print;                # send output to ARGVOUT: the new userlist.txt
}

As you can see, this program is pretty simple. In fact, the same program can be generated entirely with a few command-line arguments:

perl -p -i.bak -e 'tr/A-Z/a-z/' userlist.txt

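The -p switch wraps your code in a while (<>) loop that prints $_ after each pass. The -i switch sets the value of the $^I variable (here, .bak). The -e switch says that the next argument is a piece of Perl code, which becomes the loop body. The final argument gives an initial value to @ARGV. If you type this command at a Windows command prompt, enclose the code in double quotes rather than single quotes, because that shell does not treat single quotes as quoting characters.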
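Spelled out as a program, the one-liner behaves roughly like the sketch below. The loop shape follows the description of -p in the perlrun documentation; the sample file and its two login names are made up so that the program can be run on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sample data standing in for userlist.txt.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/userlist.txt";
open(my $fh, '>', $file) or die "Cannot create $file: $!";
print $fh "Fred\nBARNEY\n";
close($fh) or die "Cannot close $file: $!";

$^I   = ".bak";     # what -i.bak does
@ARGV = ($file);    # what the trailing filename argument does

LINE:
while (<>) {        # loop supplied by -p
    tr/A-Z/a-z/;    # the code given to -e
} continue {
    print or die "-p destination: $!\n";    # print supplied by -p
}
```

Because the print sits in a continue block, a line is still written out even if the code given to -e skips ahead with next LINE.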

Command-line arguments are discussed in greater detail in Programming Perl or the perlrun documentation.

