Perl Cookbook

Chapter 6: Pattern Matching

6.21. Program: urlify

This program puts HTML links around URLs in files. It doesn't work on all possible URLs, but does hit the most common ones. It tries hard to avoid including end-of-sentence punctuation in the marked-up URL.

It is a typical Perl filter, so it can be used by feeding it input on standard input:

% gunzip -c ~/mail/archive.gz | urlify > archive.urlified

or by supplying files on the command line:

% urlify ~/mail/*.inbox > ~/allmail.urlified

The program is shown in Example 6.13.

Example 6.13: urlify

#!/usr/bin/perl
# urlify - wrap HTML links around URL-like constructs

$urls = '(http|telnet|gopher|file|wais|ftp)';
$ltrs = '\w';
$gunk = '/#~:.?+=&%@!\-';
$punc = '.:?\-';
$any  = "${ltrs}${gunk}${punc}";

while (<>) {
    s{
      \b                    # start at word boundary
      (                     # begin $1  {
       $urls     :          # need resource and a colon
       [$any] +?            # followed by one or more
                            #  of any valid character, but
                            #  be conservative and take only
                            #  what you need to....
      )                     # end   $1  }
      (?=                   # look-ahead non-consumptive assertion
       [$punc]*             # either 0 or more punctuation
       [^$any]              #   followed by a non-url char
       |                    # or else
       $                    #   then end of the string
      )
     }{<A HREF="$1">$1</A>}igox;
    print;
}
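
To see how the lazy quantifier and the look-ahead cooperate, it can help to run the same substitution on a few sample strings. The following is only a minimal sketch: the helper name urlify_line and the example.com and example.org addresses are made up for illustration and are not part of the original program. The character classes are copied from Example 6.13, and the /o flag is left out because it is not needed for a short test.

```perl
#!/usr/bin/perl
# urlify-demo - hypothetical test harness for the pattern in Example 6.13
use strict;
use warnings;

# Same building blocks as the original program.
my $urls = '(http|telnet|gopher|file|wais|ftp)';
my $ltrs = '\w';
my $gunk = '/#~:.?+=&%@!\-';
my $punc = '.:?\-';
my $any  = "${ltrs}${gunk}${punc}";

# urlify_line is an illustrative helper (an assumption, not in the book):
# it applies the substitution to one string and returns the result.
sub urlify_line {
    my ($line) = @_;
    $line =~ s{
        \b                  # start at a word boundary
        (                   # capture the whole URL in $1
          $urls :           # scheme and a colon
          [$any] +?         # as few URL characters as possible
        )
        (?=                 # stop where what follows is ...
          [$punc]*          #   optional trailing punctuation
          [^$any]           #   and then a non-URL character
          |                 # or
          $                 #   the end of the string
        )
    }{<A HREF="$1">$1</A>}igx;
    return $line;
}

# The sentence-ending period stays outside the link.
print urlify_line("See http://www.example.com/docs/. More text.\n");

# A trailing question mark is also left out.
print urlify_line("Is it at ftp://ftp.example.org/pub/file.txt?\n");

# "https" is not in the scheme list, so this line comes back unchanged.
print urlify_line("Try https://www.example.com/ instead.\n");
```

The third call shows one of the URLs the program does not handle: the pattern requires a colon immediately after one of the listed schemes, so a scheme missing from $urls is never marked up. Adding more alternatives to $urls would be the obvious way to extend it.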

