Perl in a Nutshell

Chapter 4
The Perl Language

4.2 Data Types and Variables

Perl has three basic data types: scalars, arrays, and hashes.

Scalars are essentially simple variables. They are preceded by a dollar sign ($). A scalar is either a number, a string, or a reference. (A reference is a scalar that points to another piece of data. References are discussed later in this chapter.) If you provide a string where a number is expected or vice versa, Perl automatically converts the operand using fairly intuitive rules.

Arrays are ordered lists of scalars that you access with a numeric subscript (subscripts start at 0). They are preceded by an "at" sign (@).

Hashes are unordered sets of key/value pairs that you access using the keys as subscripts. They are preceded by a percent sign (%).

4.2.1 Numbers

Perl stores numbers internally as either signed integers or double-precision floating-point values. Numeric literals are specified in any of the following floating-point or integer formats:

12345               # integer
-54321              # negative integer
12345.67            # floating point
6.02E23             # scientific notation
0xffff              # hexadecimal
0377                # octal
4_294_967_296       # underscore for legibility
Since Perl uses the comma as a list separator, you cannot use a comma to improve the legibility of a large number; Perl lets you use an underscore character instead. The underscore works only within numeric literals specified in your program, not in strings functioning as numbers or in data read from somewhere else. Similarly, the leading 0x for hex and 0 for octal work only for literals. The automatic conversion of a string to a number does not recognize these prefixes; you must do an explicit conversion with the hex or oct function.
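
The difference between literals and strings used as numbers can be seen in a short sketch. The variable names are illustrative only, and the numeric warning is switched off locally so that the failed conversion can be shown without noise:

```perl
use strict;
use warnings;

# Literals may use underscores and the 0x / 0 prefixes.
my $literal_hex   = 0xff;          # 255
my $literal_octal = 0377;          # 255
my $big           = 1_000_000;     # 1000000

# A string used as a number is read as leading decimal digits only,
# so the 0x prefix is not recognized and "0xff" converts to 0.
my $from_string;
{
    no warnings 'numeric';         # "0xff" would otherwise trigger a warning
    $from_string = "0xff" + 0;
}

# Explicit conversion with the built-in hex and oct functions.
my $hex_value  = hex("ff");        # 255
my $oct_value  = oct("377");       # 255
my $oct_of_hex = oct("0xff");      # 255: oct also accepts a leading 0x

print "$literal_hex $literal_octal $big $from_string ",
      "$hex_value $oct_value $oct_of_hex\n";
```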

4.2.2 String Interpolation

Strings are sequences of characters. String literals are usually delimited by either single (') or double quotes ("). Double-quoted string literals are subject to backslash and variable interpolation, and single-quoted strings are not (except for \' and \\, used to put single quotes and backslashes into single-quoted strings). You can embed newlines directly in your strings.
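
As a minimal illustration of the two quoting styles (the variable names here are made up for the example):

```perl
use strict;
use warnings;

my $name = "world";

my $double = "Hello, $name!\n";         # $name and \n are interpolated
my $single = 'Hello, $name!\n';         # taken literally: no interpolation
my $quoted = 'It\'s a backslash: \\';   # \' and \\ are the only escapes here

print $double;
print $single, "\n";
print $quoted, "\n";
```

The single-quoted version keeps the dollar sign, the variable name, and the two characters \ and n exactly as typed.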

Table 4.1 lists the backslash escapes that can be used in double-quoted strings.


Table 4.1: Double-Quoted String Representations
Code    Meaning
\n      Newline
\r      Carriage return
\t      Horizontal tab
\f      Form feed
\b      Backspace
\a      Alert (bell)
\e      ESC character
\033    ESC in octal
\x7f    DEL in hexadecimal
\cC     CTRL-C
\\      Backslash
\"      Double quote
\u      Force next character to uppercase
\l      Force next character to lowercase
\U      Force all following characters to uppercase
\L      Force all following characters to lowercase
\Q      Backslash all following non-alphanumeric characters
\E      End \U, \L, or \Q

Table 4.2 lists alternative quoting schemes that can be used in Perl. They reduce the number of commas and quotes you have to type, and they spare you from escaping characters such as backslashes when your data contains many of them. The generic forms allow you to use any non-alphanumeric, non-whitespace character as a delimiter in place of the slash (/). If the delimiters are single quotes, no variable interpolation is done on the pattern. Parentheses, brackets, braces, and angle brackets can be used as delimiters in their standard opening and closing pairs.


Table 4.2: Quoting Syntax in Perl
Customary  Generic  Meaning        Interpolation
''         q//      Literal        No
""         qq//     Literal        Yes
``         qx//     Command        Yes
()         qw//     Word list      No
//         m//      Pattern match  Yes
s///       s///     Substitution   Yes
y///       tr///    Translation    No
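
A few of these forms in use, as a rough sketch. The data and variable names are invented for illustration, and qx// is left out because its result depends on the commands available on the system:

```perl
use strict;
use warnings;

my $user = "larry";

my $literal = q{Cost: $5 for 'one' "item"};   # like '': nothing is interpolated
my $interp  = qq{User "$user" logged in};     # like "": $user is interpolated
my @words   = qw(snap crackle pop);           # list of whitespace-separated words

(my $upper = $user) =~ tr/a-z/A-Z/;               # translation on a copy
(my $fixed = $interp) =~ s/logged in/signed in/;  # substitution on a copy

print "$literal\n$interp\n@words\n$upper\n$fixed\n";
```

Choosing braces as delimiters means the embedded single and double quotes need no escaping.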

4.2.3 Lists

A list is an ordered group of scalar values. A literal list can be composed as a comma-separated list of values contained in parentheses, for example:

(1,2,3)                  # list of three values 1, 2, and 3
("one","two","three")    # list of three values "one", "two", and "three"
The generic form of list creation uses the quoting operator qw// to contain a list of values separated by white space:
qw/snap crackle pop/

4.2.4 Variables

A variable always begins with the character that identifies its type: $, @, or %. Most of the variable names you create can begin with a letter or underscore, followed by any combination of letters, digits, or underscores, up to 255 characters in length. Upper- and lowercase letters are distinct. Variable names that begin with a digit can only contain digits, and variable names that begin with a character other than an alphanumeric or underscore can contain only that character. The latter forms are usually predefined variables in Perl, so it is best to name your variables beginning with a letter or underscore.

Variables have the undef value before they are first assigned or when they become "empty." For scalar variables, undef evaluates to zero when used as a number, and a zero-length, empty string ("") when used as a string.
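
The behavior of undef can be checked directly. In this sketch (with hypothetical variable names) the "uninitialized" warning is disabled inside one block, because using undef as a number or string otherwise produces a warning under use warnings:

```perl
use strict;
use warnings;

my $count;                        # declared but never assigned, so it is undef
my $is_defined = defined($count) ? "yes" : "no";

my ($as_number, $as_string);
{
    no warnings 'uninitialized';  # silence the warnings for this demonstration
    $as_number = $count + 10;           # undef acts as 0
    $as_string = "[" . $count . "]";    # undef acts as the empty string
}

print "$is_defined $as_number $as_string\n";
```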

Simple variable assignment uses the assignment operator (=) with the appropriate data. For example:

$age = 26;                              # assigns 26 to $age
@date = (8, 24, 70);                    # assigns the three-element list to @date
%fruit = ('apples', 3, 'oranges', 6);   # assigns the list elements to %fruit in key/value pairs
Scalar variables are always named with an initial $, even when referring to a scalar value that is part of an array or hash.

Every variable type has its own namespace. You can, without fear of conflict, use the same name for a scalar variable, an array, or a hash (or, for that matter, a filehandle, a subroutine name, or a label). This means that $foo and @foo are two different variables. It also means that $foo[1] is an element of @foo, not a part of $foo.
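
A small sketch of the separate namespaces, reusing the name foo from the paragraph above (the values are arbitrary):

```perl
use strict;
use warnings;

my $foo = "scalar";               # the scalar $foo
my @foo = ("first", "second");    # the array @foo, unrelated to $foo
my %foo = (key => "value");       # the hash %foo, unrelated to both

print "$foo\n";         # the scalar
print "$foo[1]\n";      # element 1 of @foo
print "$foo{key}\n";    # value stored under "key" in %foo
```

The brackets or braces after the name tell Perl which variable is meant; the leading $ only says that the result is a single scalar value.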

4.2.4.1 Arrays

An array is a variable that stores an ordered list of scalar values. Arrays are preceded by an "at" (@) sign.

@numbers = (1,2,3);	# Set the array @numbers to (1,2,3)
To refer to a single element of an array, use the dollar sign ($) with the variable name (it's a scalar), followed by the index of the element in square brackets (the subscript operator). Array elements are numbered starting at 0. Negative indexes count backwards from the last element in the list (i.e., -1 refers to the last element of the list). For example, in this list:
@date = (8, 24, 70);
$date[2] is the value of the third element, 70.
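
Positive and negative subscripts on the same array can be compared in a short sketch (the extra variable names are illustrative):

```perl
use strict;
use warnings;

my @date = (8, 24, 70);

my $first       = $date[0];     # 8: subscripts start at 0
my $third       = $date[2];     # 70
my $last        = $date[-1];    # 70: -1 is the last element
my $second_last = $date[-2];    # 24

$date[2] = 71;                  # elements can be assigned individually

print "$first $third $last $second_last @date\n";
```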

4.2.4.2 Hashes

A hash is a set of key/value pairs. Hashes are preceded by a percent (%) sign. To refer to a single element of a hash, you use a dollar sign ($) with the hash variable name (the element is a scalar), followed by the "key" associated with the value in curly brackets. For example, the hash:

%fruit = ('apples', 3, 'oranges', 6);
has two values (in two key/value pairs). If you want to get the value associated with the key apples, you use $fruit{'apples'}.

It is often more readable to use the => operator in defining key/value pairs. The => operator is similar to a comma, but it's more visually distinctive, and it also quotes any bare identifiers to the left of it:

%fruit = (
    apples  => 3,
    oranges => 6
);
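
Reading, adding, and listing hash elements might look like the following sketch. The pears entry and the helper variables are additions for illustration; keys are sorted because a hash has no defined order:

```perl
use strict;
use warnings;

my %fruit = (
    apples  => 3,
    oranges => 6,
);

my $apples = $fruit{apples};      # a bare identifier inside {} is quoted for you
$fruit{pears} = 2;                # assigning to a new key adds a pair

my $has_bananas = exists $fruit{bananas} ? "yes" : "no";
my @names       = sort keys %fruit;

for my $name (@names) {
    printf "%-8s %d\n", $name, $fruit{$name};
}
print "bananas present: $has_bananas\n";
```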

4.2.5 Scalar and List Contexts

Every operation that you invoke in a Perl script is evaluated in a specific context, and how that operation behaves may depend on which context it is being called in. There are two major contexts: scalar and list. All operators know which context they are in, and some return lists in contexts wanting a list, and scalars in contexts wanting a scalar. For example, the localtime function returns a nine-element list in list context:

($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime();
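But in a scalar context, localtime returns a single string containing the formatted date and time, such as "Thu Sep  9 12:00:00 1999" (use the time function to get the number of seconds since January 1, 1970):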
$now = localtime();
Statements that look confusing are easy to evaluate by identifying the proper context. For example, assigning what is commonly a list literal to a scalar variable:
$a = (2, 4, 6, 8);
gives $a the value 8. The context forces the right side to evaluate to a scalar, and the action of the comma operator in the expression (in the scalar context) returns the value farthest to the right.

Another type of statement that might be confusing is the evaluation of an array or hash variable as a scalar, for example:

$b = @c;
When an array variable is evaluated as a scalar, the number of elements in the array is returned. This type of evaluation is useful for finding the number of elements in an array. The special $#array form of an array value returns the index of the last member of the list (one less than the number of elements).
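
The effect of context on an array can be sketched as follows; the array contents and variable names are arbitrary:

```perl
use strict;
use warnings;

my @c = (10, 20, 30, 40);

my $count      = @c;     # scalar context: number of elements
my ($first)    = @c;     # list context (note the parentheses): first element
my $last_index = $#c;    # index of the last element

print "$count $first $last_index\n";
```

The parentheses around $first make the left side a list, so the assignment copies elements instead of counting them.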

If necessary, you can force a scalar context in the middle of a list by using the scalar function.
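
For instance, inside a list an array would normally be flattened into its elements; wrapping it in scalar yields its element count instead. A minimal sketch with made-up data:

```perl
use strict;
use warnings;

my @items = qw(a b c);

my @flat   = ("items:", @items);          # list context: the array is flattened
my @report = ("count:", scalar(@items));  # scalar forces the element count

print "@flat\n";
print "@report\n";
```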

4.2.6 Declarations and Scope

In Perl, only subroutines and formats require explicit declaration. Variables (and similar constructs) are automatically created when they are first assigned.

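Variable declaration comes into play when you need to limit the scope of a variable's use. You can do this in two ways:

my declares a new variable that exists only within the enclosing block.

local saves the current value of a global variable and gives it a temporary value that lasts until the enclosing block exits.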

Therefore, we can say that a local variable is dynamically scoped, whereas a my variable is lexically scoped. Dynamically scoped variables are visible to functions called from within the block in which they are declared. Lexically scoped variables, on the other hand, are totally hidden from the outside world, including any called subroutines unless they are declared within the same scope.
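
The difference shows up when a subroutine is called from inside the block. In this sketch the subroutine and variable names are hypothetical, and our is used to declare the global so that the example runs under use strict:

```perl
use strict;
use warnings;

our $level = "global";              # package (global) variable

sub show_level { return $level }    # always reads the package variable

sub with_local {
    local $level = "local";         # temporary value, visible to called subs
    return show_level();
}

sub with_my {
    my $level = "lexical";          # separate variable, private to this block
    return show_level();            # the called sub still sees the global
}

my $dynamic = with_local();         # "local"
my $lexical = with_my();            # "global"
my $after   = show_level();         # "global": local's value was restored

print "$dynamic $lexical $after\n";
```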

See Section 4.7, "Subroutines" later in this chapter for further discussion.

