Perl Cookbook

Chapter 20: Web Automation
 

20.13. Processing Server Logs

Problem

You need to summarize your server logs, but you don't have a customizable program to do it.

Solution

Parse the access log yourself with regular expressions, or use the Logfile modules from CPAN.

Discussion

Example 20.9 is a sample report generator for an Apache access log in Common Log Format (CLF).

Example 20.9: sumwww

#!/usr/bin/perl -w
# sumwww - summarize web server log activity

$lastdate = "";
daily_logs();
summary();
exit;

# read CLF files and tally hits from the host and to the URL
sub daily_logs {
    while (<>) {
        ($type, $what) = /"(GET|POST)\s+(\S+?) \S+"/ or next;
        ($host, undef, undef, $datetime) = split;
        ($bytes) = /\s(\d+)\s*$/ or next;
        ($date)  = ($datetime =~ /\[([^:]*)/);
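        # when the date changes, report the finished day before tallying this hit
        if ($date ne $lastdate) {
            if ($lastdate) { write_report()     }
            else           { $lastdate = $date  }
        }
        $count++;
        $posts++ if $type eq "POST";    # quoted; a bareword fails under strict
        $home++  if $what eq "/";       # request for the home page
        $hosts{$host}++;
        $what{$what}++;
        $bytesum += $bytes;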
    }
    write_report() if $count;
}

# use *typeglob aliasing of global variables for cheap copy
sub summary  {
    $lastdate = "Grand Total";
    *count   = *sumcount;
    *bytesum = *bytesumsum;
    *hosts   = *allhosts;
    *posts   = *allposts;
    *what    = *allwhat;
    *home    = *allhome;
    write;
}

# display the tallies of hosts and URLs, using formats
sub write_report {
    write;

    # add to summary data
    $lastdate    = $date;
    $sumcount   += $count;
    $bytesumsum += $bytesum;
    $allposts   += $posts;
    $allhome    += $home;

    # reset daily data
    $posts = $count = $bytesum = $home = 0;
    @allwhat{keys %what}   = keys %what;
    @allhosts{keys %hosts} = keys %hosts;
    %hosts = %what = ();
}

format STDOUT_TOP =
@|||||||||| @|||||| @||||||| @||||||| @|||||| @|||||| @|||||||||||||
"Date",     "Hosts", "Accesses", "Unidocs", "POST", "Home", "Bytes"
----------- ------- -------- -------- ------- ------- --------------
.

format STDOUT =
@>>>>>>>>>> @>>>>>> @>>>>>>> @>>>>>>> @>>>>>> @>>>>>> @>>>>>>>>>>>>>
$lastdate,  scalar(keys %hosts),
            $count, scalar(keys %what),
                             $posts,  $home,  $bytesum  
.

Here's sample output from that program:

    Date      Hosts  Accesses Unidocs   POST    Home       Bytes
----------- ------- -------- -------- ------- ------- --------------
19/May/1998     353     6447     3074     352      51       16058246
20/May/1998    1938    23868     4288     972     350       61879643
21/May/1998    1775    27872     6596    1064     376       64613798
22/May/1998    1680    21402     4467     735     285       52437374
23/May/1998    1128    21260     4944     592     186       55623059
Grand Total    6050   100849    10090    3715    1248      250612120
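
To see what daily_logs pulls out of each record, here is a minimal sketch that applies the same three patterns to single lines. The helper name parse_clf and the two log lines are invented for illustration; they are not part of sumwww.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_clf is a hypothetical helper (not part of sumwww). It applies the
# same patterns as daily_logs to one Common Log Format record and returns
# a hash reference, or nothing if sumwww would skip the record.
sub parse_clf {
    my ($line) = @_;
    my ($type, $what) = $line =~ /"(GET|POST)\s+(\S+?) \S+"/ or return;
    my ($host, undef, undef, $datetime) = split ' ', $line;
    my ($bytes) = $line =~ /\s(\d+)\s*$/ or return;
    my ($date)  = $datetime =~ /\[([^:]*)/;
    return { host => $host, date => $date, type => $type,
             what => $what, bytes => $bytes };
}

# Made-up records: the host, paths, and sizes are assumptions.
my @lines = (
    'www.example.com - - [19/May/1998:12:34:56 -0600] "GET /index.html HTTP/1.0" 200 2326',
    'www.example.com - - [19/May/1998:12:35:01 -0600] "GET /logo.gif HTTP/1.0" 304 -',
);

for my $line (@lines) {
    my $r = parse_clf($line);
    if ($r) {
        print join(" ", map { "$_=$r->{$_}" } qw(host date type what bytes)), "\n";
    } else {
        print "skipped: $line\n";
    }
}
```

The second record is skipped because its size field is "-" rather than a number, which is also what happens in daily_logs: such hits are not counted at all.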

Use the Logfile::Apache module from CPAN, shown in Example 20.10, to write a similar, but less specific, program. This module is distributed with other Logfile modules in a single Logfile distribution (Logfile-0.115.tar.gz at the time of writing).

Example 20.10: aprept

#!/usr/bin/perl -w
# aprept - report on Apache logs

use Logfile::Apache;

$l = Logfile::Apache->new(
    File  => "-",                   # STDIN
    Group => [ "Domain", "File" ]);

$l->report(Group => "Domain", Sort => "Records");
$l->report(Group => "File",   List => [ "Bytes", "Records" ]);

The new constructor reads a log file and builds indices internally. Supply a filename with the File parameter ("-" means standard input) and the fields to index with the Group parameter. The possible fields are Date (date of the request), Hour (hour of the day the request was received), File (file requested), User (username parsed from the request), Host (hostname requesting the document), and Domain (Host translated into "France", "Germany", etc.).

To produce a report on STDOUT, call the report method. Give it the index to use with the Group parameter. Optionally, use Sort to say how to order the rows (Records sorts by number of hits, Bytes by number of bytes transferred), or List to say which columns to show (bytes, records, or both).

Here's some sample output:

Domain                  Records 
===============================
US Commercial        222 38.47% 
US Educational       115 19.93% 
Network               93 16.12% 
Unresolved            54  9.36% 
Australia             48  8.32% 
Canada                20  3.47% 
Mexico                 8  1.39% 
United Kingdom         6  1.04% 

File                               Bytes          Records 
=========================================================
/                           13008  0.89%         6  1.04% 
/cgi-bin/MxScreen           11870  0.81%         2  0.35% 
/cgi-bin/pickcards          39431  2.70%        48  8.32% 
/deckmaster                143793  9.83%        21  3.64% 
/deckmaster/admin           54447  3.72%         3  0.52% 
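
Each percentage in these reports appears to be that group's share of the overall total. The following sketch reproduces that kind of grouped tally in core Perl, without the module. The helpers tally, share, and report and the sample records are hypothetical; this is not how Logfile::Apache is implemented, only an illustration of the idea.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# tally, share, and report are hypothetical helpers for illustration;
# they are not part of the Logfile::Apache API.

# Group records by file, counting hits and summing bytes.
sub tally {
    my @records = @_;                   # each record is [ $file, $bytes ]
    my (%hits, %bytes);
    for my $rec (@records) {
        my ($file, $size) = @$rec;
        $hits{$file}++;
        $bytes{$file} += $size;
    }
    return (\%hits, \%bytes);
}

# Percentage of $total represented by $part (0 when $total is 0).
sub share {
    my ($part, $total) = @_;
    return $total ? 100 * $part / $total : 0;
}

# Build the report text, most-requested file first.
sub report {
    my ($hits, $bytes) = @_;
    my ($total_hits, $total_bytes) = (0, 0);
    $total_hits  += $_ for values %$hits;
    $total_bytes += $_ for values %$bytes;

    my $out = sprintf "%-20s %16s %16s\n", "File", "Bytes", "Records";
    $out .= "=" x 54 . "\n";
    for my $file (sort { $hits->{$b} <=> $hits->{$a} or $a cmp $b } keys %$hits) {
        $out .= sprintf "%-20s %8d %6.2f%% %8d %6.2f%%\n",
            $file,
            $bytes->{$file}, share($bytes->{$file}, $total_bytes),
            $hits->{$file},  share($hits->{$file},  $total_hits);
    }
    return $out;
}

# Made-up data: four hits totalling 1000 bytes.
my ($hits, $bytes) = tally(["/", 100], ["/a", 300], ["/", 100], ["/b", 500]);
print report($hits, $bytes);
```

Keeping one hash per measure, keyed by the grouping field, is the same approach sumwww takes with %hosts and %what; adding another grouping such as host or date only needs another pair of hashes.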

See Also

The documentation for the CPAN module Logfile::Apache; perlform(1) and the section on "Formats" in Chapter 2 of Programming Perl.

