Perl Basics¶
Table of Contents¶
- Getting Help
- Running Perl
- Variables
- Scalar Context vs List Context with Arrays
- Perl File Structure
- Pragmas
- Subroutines
- Using Arrays in Perl
- Using Data::Dumper to Print Data
- Accessing Command Line Arguments
- Command Line Options
- BEGIN and END Blocks
- File Operations
- Resources
Getting Help¶
man perl
is available, but it is not as thorough as some man pages.
The man page recommends perldoc. On some distributions this is a separate package
that needs to be installed (e.g., perl-doc on Debian/Ubuntu).
For runtime options/flags:
perldoc perlrun
To get help with a Perl function, use perldoc -f func.
For Perl variables, use perldoc -v var.
For Perl modules, use perldoc Module::Name (perldoc -m Module::Name shows the module's source code instead).
Importing Modules¶
To import a module, use the use
keyword.
We can then call the Dumper subroutine from this module by its name alone, since
use imports it into our namespace.
Its fully qualified name is Data::Dumper::Dumper(), but because use imports
Dumper into our namespace, we don't have to spell it out.
If we want to load the module without polluting our namespace with its exported
symbols, we can add an empty import list (use Data::Dumper ();).
We then have to call its subroutines by their fully qualified names
(e.g., Data::Dumper::Dumper()).
This keeps the namespace cleaner.
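A minimal sketch of the empty-import-list style, using Data::Dumper (which exports Dumper by default). The sample hash is made up for illustration:

```perl
use strict;
use warnings;

# Empty import list: load Data::Dumper but import nothing into main::.
use Data::Dumper ();

my $data = { name => 'perl', tags => ['fast', 'flexible'] };

# Dumper was not imported, so it must be called by its fully qualified name.
my $dumped = Data::Dumper::Dumper($data);
print $dumped;

# Confirm that the short name Dumper was never defined in this package.
print defined(&main::Dumper) ? "Dumper imported\n" : "Dumper not imported\n";
```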
Importing as an Alias¶
You can alias specific subroutines that are available from a module by assigning
a code reference to a typeglob.
*dumper = \&Data::Dumper::Dumper; # Alias a specific subroutine
This works for subroutines, not whole modules: \&Data::Dumper would refer to a
subroutine named Dumper in a package named Data, which doesn't exist.
- *dumper = ...: Assigns a reference to the typeglob *dumper.
  - This creates a new symbol in the current package (main::dumper).
  - Because the reference is a code reference (\&...), it fills the subroutine slot of the glob, so dumper() becomes another name for Data::Dumper::Dumper().
- A typeglob is a special kind of variable that can hold multiple values of multiple types (incl. scalars, arrays, hashes, subroutines). It lets you access all the variables associated with a particular name through a single symbol.
- \&Data::Dumper::Dumper: A reference to the subroutine Dumper in the module Data::Dumper.
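A small sketch of aliasing one subroutine through a typeglob. The alias name dumper is just an illustrative choice:

```perl
use strict;
use warnings;
use Data::Dumper ();    # load the module without importing Dumper

# Assign a code reference to the typeglob *dumper: this installs a second
# name, main::dumper, for the existing subroutine Data::Dumper::Dumper.
*dumper = \&Data::Dumper::Dumper;

# The alias is created at run time, so call it with parentheses.
my $out = dumper([1, 2]);
print $out;

# Both names refer to the very same subroutine.
print "same sub\n" if \&dumper == \&Data::Dumper::Dumper;
```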
Running Perl¶
perldoc perlrun
Running Perl Scripts¶
From the command line, you can run a Perl script like a script in any other
language: type perl, then the name of the script.
If the script starts with a shebang line (e.g., #!/usr/local/bin/perl or
#!/usr/bin/env perl) and is executable (chmod +x), then you can just execute it directly.
Running Perl One-Liners¶
To run perl commands from the command line, use the -e
flag.
If you use double quotes around the -e expression, the shell expands variables
(such as $_) before Perl ever sees them. Single quotes are preferred for that reason.
Use the -E flag to enable optional features such as say. (It does not enable the
strict pragma.)
The -E option behaves just like -e but enables all optional features.
say automatically adds a newline at the end of a string.
It does not work with -e, because features are not enabled with -e (unless you
enable it yourself with use feature 'say';).
- while(<>) reads lines from STDIN (or from files named on the command line).
  - This could also be done with the -n option.
- say uc $_ prints (say) the uppercase (uc) version of the current line ($_).
This will wait for user input and print it back in uppercase.
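A sketch of the same say/uc loop as a script. To keep it self-contained, it reads from an in-memory filehandle with made-up text instead of STDIN (an assumption for the demo). Chomping first avoids a blank line after each output line, since say adds its own newline on top of the one the line already has:

```perl
use strict;
use warnings;
use feature 'say';    # what -E turns on for you; -e does not

# Hypothetical input standing in for STDIN.
my $text = "hello\nworld\n";
open(my $in, '<', \$text) or die "Cannot open in-memory handle: $!";

my @upper;
while (<$in>) {       # each line lands in $_
    chomp;            # drop the line's own newline; say adds one back
    push @upper, uc $_;
    say uc $_;
}
close $in;
```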
Piping to Perl¶
If you're piping input to Perl, use either -p
or -n
to loop over the input.
The -p option is usually what you want for basic substitutions. It wraps your
code in a printing loop.
This means that it will print each line after it has been processed.
Basically, it is equivalent to this perl program:
Whatever code you have in -e
will be inside this while loop.
Note: The =~ operator binds a string to a pattern operation. With a match (m//),
it tests whether the string matches; with a substitution (s///), it modifies the
string in place.
If you use -n
(e.g., perl -ne
), the input will be wrapped in a non-printing
loop.
So doing this:
This will not print the lines by default, unless you explicitly print the $_
variable.
This is what that command is doing:
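A minimal sketch of the loops that -p and -n wrap around your code, modeled on the equivalents documented in perldoc perlrun. The in-memory input and the s/foo/baz/ and /foo/ snippets are hypothetical stand-ins for what you'd pass with -e, and output is collected into variables instead of printed so it can be inspected:

```perl
use strict;
use warnings;

my $text = "foo one\nbar two\nfoo three\n";

# Roughly what `perl -pe 's/foo/baz/'` does: run the code, then print every line.
my $p_out = '';
open(my $in, '<', \$text) or die "Cannot open input: $!";
while (<$in>) {
    s/foo/baz/;              # the -e code goes here
} continue {
    $p_out .= $_;            # -p prints $_ at this point
}
close $in;

# Roughly what `perl -ne 'print if /foo/'` does: nothing is printed automatically.
my $n_out = '';
open($in, '<', \$text) or die "Cannot open input: $!";
while (<$in>) {
    $n_out .= $_ if /foo/;   # only explicit output
}
close $in;

print "-p result:\n$p_out";
print "-n result:\n$n_out";
```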
Setting the IRS from the CLI¶
The IRS (Input Record Separator) is a special variable ($/
) which determines how
Perl reads in lines. By default, $/
is set to newline (\n
), which means it reads
input line-by-line.
Use the -0 flag to set the input record separator when running Perl.
The -0 option accepts an octal value (a character code) to use as the IRS.
If you're specifying a hexadecimal value instead, add an x after the 0
(-0xHHH, where HHH are hex digits).
Using -0 without any digits sets the IRS ($/) to NUL (\0).
This is good for working with tools that output NUL-delimited text, for example
find -print0 (the same format that xargs -0 reads).
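Inside a script, the same effect comes from setting $/ directly. A sketch, assuming NUL-delimited input like find -print0 would produce (the file names below are made up and held in an in-memory filehandle):

```perl
use strict;
use warnings;

# Hypothetical NUL-delimited file names, like the output of `find -print0`.
my $text = "a.txt\0b c.txt\0d.txt\0";
open(my $in, '<', \$text) or die "Cannot open input: $!";

my @names;
{
    local $/ = "\0";      # same effect as running perl with -0
    while (my $name = <$in>) {
        chomp $name;      # chomp removes the current $/ (here, the NUL byte)
        push @names, $name;
    }
}
close $in;

print "$_\n" for @names;   # names with spaces survive intact
```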
Paragraph Mode¶
The -00 option is special: it puts Perl in paragraph mode by setting the IRS to an
empty string.
Perl then reads in paragraphs separated by one or more blank lines, i.e., two or more
consecutive newline characters (\n\n): one newline ends the paragraph's last line,
and another forms the blank line.
(This isn't "slurping"; to slurp a whole file at once, use -0777.)
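A sketch of paragraph mode from inside a script, using made-up text in an in-memory filehandle. Note the extra blank line between paragraphs: it is treated like a single blank line:

```perl
use strict;
use warnings;

my $text = "first para\nline two\n\n\nsecond para\n";
open(my $in, '<', \$text) or die "Cannot open input: $!";

my @paras;
{
    local $/ = "";        # paragraph mode, same as the -00 switch
    while (my $para = <$in>) {
        chomp $para;      # in paragraph mode, chomp removes all trailing newlines
        push @paras, $para;
    }
}
close $in;

print "[$_]\n" for @paras;
```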
Variables¶
See variables for more in-depth explanations.
Types of variables in perl are:
- Scalar: A single value.
- Array: An ordered list of values.
- Hash: An unordered collection of key/value pairs.
Each variable type has its own namespace, as do some non-variable identifiers.
This means you can use the same name for a scalar variable and an array
variable and they won't conflict.
my @var = (1, 2, 3);
my $var = "Hello, world.";
print "@var\n";
# 1 2 3
print "$var\n";
# Hello, world.
Not "technically" variables, but the same rule applies to these:
- Handles
- File handles
- Directory handles
- Subroutine names
- Format names
- Labels
Just because you can doesn't mean you should.
As a best practice, you should always give your variables and functions unique names
for clarity.
Scalar Variables¶
In perl, anything that is a single unit of data is a scalar
value.
That unit of data could be a string, number, or reference.
Scalars are represented with the dollar $
sign prefix.
A scalar always holds one value at a time.
Perl doesn't need separate declared types for these, since it is dynamically typed.
So, scalar is its own data type in Perl. Numbers and strings still exist, but Perl
converts between them automatically based on the context they're used in.
Examples of Scalars in Perl¶
Dynamically typed scalar:
Strings:
References:
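A few illustrative scalars (the values are arbitrary), including one string that behaves as a number or a string depending on the operator applied to it:

```perl
use strict;
use warnings;

my $answer   = 42;                 # a number
my $pi       = 3.14;               # a floating-point number
my $greeting = "Hello, world.";    # a string
my @list     = (1, 2, 3);
my $list_ref = \@list;             # a reference (points at @list)

# The same scalar is used as a number or a string depending on context.
my $n      = "10";
my $sum    = $n + 5;       # numeric context: 15
my $joined = $n . "5";     # string context: "105"

print "$answer $pi $greeting\n";
print "sum=$sum joined=$joined first=$list_ref->[0]\n";
```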
Scalar Operations¶
- Scalars can hold numbers and perform mathematical operations.
- Scalars can also hold strings, and you can perform concatenation. Use . to concatenate strings in perl.
- Scalars can be evaluated as true or false for boolean operations.
  - Non-zero numbers and non-empty strings are true.
  - Zero (0), the string "0", the empty string (""), and undef are false.
- Evaluating an array in scalar context is the way to get its length.
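A short sketch of these operations with arbitrary values. The grep keeps only the values Perl treats as true:

```perl
use strict;
use warnings;

my $x = 7;
my $y = 2;
my $product = $x * $y;                   # arithmetic: 14
my $full    = "Hello" . ", " . "Perl";   # concatenation: "Hello, Perl"

# Truthiness: keep only the values Perl treats as true.
my @true_values = grep { $_ } (1, -1, "abc", "0.0", 0, "0", "");
# -> (1, -1, "abc", "0.0"): "0.0" is a non-empty string other than "0", so it is true

my @days  = qw(Mon Tue Wed);
my $count = @days;                       # array in scalar context: 3

print "$product | $full | @true_values | $count\n";
```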
Accessing Variables¶
From perldoc perldata:
$days # the simple scalar value "days"
$days[28] # the 29th element of array @days
$days{'Feb'} # the 'Feb' value from hash %days
$#days # the last index of array @days
$days[$#days] # the last element of array @days (same as $days[-1])
@days # ($days[0], $days[1],... $days[n])
@days[3,4,5] # same as ($days[3],$days[4],$days[5])
@days{'a','c'} # same as ($days{'a'},$days{'c'})
%days # (key1, val1, key2, val2 ...)
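The access forms above, run against a small made-up @days array and %days hash. Both share the name days without conflict, since each variable type has its own namespace:

```perl
use strict;
use warnings;

my @days = qw(Sun Mon Tue Wed Thu Fri Sat);
my %days = (Feb => 28, Mar => 31);

my $second   = $days[1];              # one element: "Mon"
my $last_idx = $#days;                # last index: 6
my $last     = $days[$#days];         # last element: "Sat" (same as $days[-1])
my @slice    = @days[3, 4, 5];        # array slice: ("Wed", "Thu", "Fri")
my $feb      = $days{'Feb'};          # one hash value: 28
my @vals     = @days{'Feb', 'Mar'};   # hash slice: (28, 31)

print "$second $last_idx $last @slice $feb @vals\n";
```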
Scalar Context vs List Context with Arrays¶
In perl, context determines how expressions are evaluated.
Scalars are always evaluated in scalar context, but arrays/hashes are a little
different.
In an assignment, the sigil of the variable on the left-hand side ($ or @) sets the
context (scalar or list, respectively) for the right-hand side.
If we wanted to force array context for a scalar variable:
Scalar Context¶
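If an operation or function is expected to return a single value,
it operates in scalar context.
An example of this:
my @arr = (1, 2, 3);
my $count = @arr;   # in scalar context, @arr returns its number of elements
print $count;       # outputs: 3
Using $count as the variable, with $, sets the context as scalar.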
List Context¶
If an operation or function is expected to return a list of values, it operates in list context.
Example:
my @arr = (1, 2, 3);
my @copy = @arr; # in list context, @arr returns all its elements
print @copy; # outputs: 123 (flattened)
Using @copy as the variable, with @ to specify an array, sets the context as list.
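A side-by-side sketch of the same array evaluated in different contexts. The parenthesized ($first) form is a list assignment, which is how you get list context into a scalar variable:

```perl
use strict;
use warnings;

my @arr = (10, 20, 30);

my $count   = @arr;           # scalar assignment -> scalar context: 3
my ($first) = @arr;           # list assignment -> list context: 10
my @copy    = @arr;           # list context: (10, 20, 30)
my $n       = scalar(@arr);   # scalar() forces scalar context: 3

# Interpolating "@copy" in a string joins the elements with spaces.
print "count=$count first=$first copy=@copy n=$n\n";
```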
Lowercase Input¶
This turns all input to lowercase. It doesn't actually use any regular expressions;
it uses the "default" variable $_ (which holds the current line) and the
lc (lowercase) Perl function.
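A sketch of what a one-liner like perl -pe '$_ = lc' would do (that exact one-liner is an assumption), written as a script over made-up in-memory input:

```perl
use strict;
use warnings;

my $text = "Hello World\nPERL Rocks\n";
open(my $in, '<', \$text) or die "Cannot open input: $!";

my $out = '';
while (<$in>) {       # each line lands in the default variable $_
    $_ = lc;          # lc with no argument lowercases $_
    $out .= $_;       # a -p loop would print $_ here
}
close $in;

print $out;
```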
Perl File Structure¶
Each statement in a Perl script should end with a semicolon ;.
A Perl script typically starts with a shebang line (#!/usr/bin/perl
or #!/usr/bin/env perl).
The lines starting with use are imports. Modules with all-lowercase names are
pragmas (lowercase module names are reserved for Perl itself). Imports with
uppercase letters in their names are regular modules.
There are a lot of pragmas that can be used to enable or disable certain features.
The return code can be stated at the bottom (e.g., exit 0;).
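A minimal skeleton following that structure. File::Basename is a core module, used here only as an example of a regular (mixed-case) import, and the path is hypothetical:

```perl
#!/usr/bin/env perl
use strict;                          # pragma (all-lowercase name)
use warnings;                        # pragma
use File::Basename qw(basename);     # regular module (mixed-case name)

# Each statement ends with a semicolon.
my $name = basename('/tmp/example/script.pl');   # hypothetical path
print "Running as: $name\n";
```

Adding exit 0; as the last statement states the return code explicitly; a script that runs to completion without it also exits with status 0.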
Pragmas¶
Pragmas (pragmatic modules) in perl change the way the code behaves. These are written in all lowercase, because all lowercase module names are reserved for Perl itself.
They're compiler directives: instructions that modify the behavior of Perl during compilation.
They're implemented as modules, but rather than providing functions to call, they act
as flags that control the compilation and execution of the script.
They're included using the use
keyword.
Some common pragmas:
- strict: Enforces stricter programming rules, like declaring variables before using them.
  - Helps catch typos and errors early.
- warnings: Outputs warnings for potentially problematic code.
  - For example, if you're using an uninitialized variable.
- utf8: Tells Perl that the script's source code is written in UTF-8.
- autodie: Makes built-ins such as open and close throw exceptions on failure instead of returning false.
You can specify what to load from a pragma by giving it an argument.
":full": This is a pragma argument, or "tag."
  - It's not special Perl syntax; it's a string in the import list, and certain pragmas (open, charnames, strict, warnings) use such arguments to configure what they load or activate.
View the perldoc page for the pragma to see what tags you can specify and what they do.
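A sketch combining a tagged pragma with autodie. The ':full' tag lets \N{...} use full Unicode character names, and autodie turns a failed open into an exception; the missing path is hypothetical:

```perl
use strict;
use warnings;
use charnames ':full';   # ':full' tag: full Unicode names in \N{...}
use autodie;             # failed open/close/etc. now throw exceptions

my $alpha = "\N{GREEK SMALL LETTER ALPHA}";
printf "U+%04X\n", ord $alpha;

# With autodie, a failed open dies, so it can be caught with eval.
# The path below is a hypothetical file that should not exist.
my $ok = eval { open(my $fh, '<', '/nonexistent/dir/file.txt'); 1 };
print "open failed: ", ($ok ? "no" : "yes"), "\n";
```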
Subroutines¶
Functions in perl are called subroutines.
Subroutines are reusable blocks of code that perform a specific task.
Use the sub
keyword to define a subroutine.
use warnings;
use strict;
sub say_hello {
print "Hello, world.\n";
}
say_hello(); # Call the subroutine.
Passing arguments to subroutines¶
You can access any arguments passed to a subroutine using the @_
array.
use warnings;
use strict;
sub say_hello {
my ($name) = @_;
print "Hello, $name.\n";
}
say_hello("Kolkhis"); # Call the subroutine.
# Outputs: Hello, Kolkhis.
($name) says you want to assign the first value from the list @_ to the variable.
Without the parentheses, Perl would not treat the assignment as a list assignment.
It would assign the number of elements in @_ to $name, since the right-hand side
is evaluated in scalar context when the left-hand side is a single scalar.
So, putting parentheses around the scalar variable makes it a list assignment,
which evaluates the right-hand side in list context.
It's like tuple unpacking in Python: it takes the first argument from @_
and assigns it to the scalar variable $name. (Unlike Python, any extra values are
silently discarded.)
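A sketch comparing the two assignment forms. The sub names are hypothetical, chosen to describe what each one does:

```perl
use strict;
use warnings;

sub scalar_assign { my $name   = @_; return $name; }   # scalar context: count of args
sub list_assign   { my ($name) = @_; return $name; }   # list context: first arg

my $count = scalar_assign("Kolkhis", "extra");
my $first = list_assign("Kolkhis", "extra");   # "extra" is silently discarded

print "scalar assignment: $count\n";   # 2
print "list assignment: $first\n";     # Kolkhis
```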
Returning values from subroutines¶
Subroutines return values using the return
keyword.
Or, they implicitly return the last evaluated expression.
Example subroutine: Check if a file exists¶
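One possible version of such a subroutine, shown with both an explicit return and an implicit one. The names file_exists and describe are illustrative, and File::Temp (a core module) is used only to create a real file to test against:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Explicit return: 1 if the path exists, 0 otherwise.
sub file_exists {
    my ($path) = @_;
    return -e $path ? 1 : 0;
}

# Implicit return: the value of the last evaluated expression.
sub describe {
    my ($path) = @_;
    file_exists($path) ? "$path exists" : "$path is missing";
}

my (undef, $tmp) = tempfile(UNLINK => 1);   # a real file, removed at exit
print describe($tmp), "\n";
print describe('/no/such/file.txt'), "\n";  # hypothetical missing path
```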
Passing @_
to Subroutines¶
There are two ways to call a subroutine.
- The first one is the most obvious way: calling it by name with parentheses, e.g., foo().
- Then, we can use & to call the subroutine as well, but this can have side effects. Without parentheses, it passes along the current @_ (default array). It also bypasses prototypes.
  - Subroutine prototypes define the expected number and types of arguments for a subroutine.

Syntax | Action |
---|---|
foo() | Calls sub foo |
&foo | Calls sub foo with the current @_ |
\&foo | Reference to sub foo (for callbacks) |
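A sketch showing how each row of the table behaves. show_args and caller_sub are hypothetical helpers that report which arguments were received:

```perl
use strict;
use warnings;

sub show_args { return join(',', @_); }

sub caller_sub {
    my @results;
    push @results, show_args();     # normal call with no args: empty @_
    push @results, &show_args;      # & without parens: reuses the caller's @_
    push @results, &show_args();    # & with parens: passes an empty list
    return @results;
}

my @r = caller_sub('a', 'b');
print "$_\n" for map { "[$_]" } @r;   # [], [a,b], []

my $cb = \&show_args;                 # reference, e.g. for a callback
my $via_ref = $cb->('x', 'y');        # "x,y"
```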
Subroutine Prototypes¶
Subroutine prototypes define the expected number and types of arguments for a subroutine.
In the subroutine definition, you specify one sigil per argument that is
expected within the parentheses after the sub name.
For example, one $
per scalar, one @
per array, etc.
- This subroutine expects two scalar arguments.
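A sketch of a ($$) prototype, assuming the signatures feature is not enabled (with it on, the parentheses would be parsed as a signature instead). The sub name add is illustrative; a string eval is used only so the compile-time error can be observed without stopping the script:

```perl
use strict;
use warnings;

# Prototype ($$): the sub expects exactly two scalar arguments.
sub add($$) {
    my ($x, $y) = @_;
    return $x + $y;
}

my $sum = add(2, 3);   # 5

# A wrong argument count is a compile-time error; string eval lets us see it.
my $ok  = eval 'add(1, 2, 3); 1';
my $err = $@;
print "sum=$sum, three-arg call compiled: ", ($ok ? "yes" : "no"), "\n";

# The & form bypasses the prototype check entirely.
my $bypassed = &add(1, 2, 3);   # the sub body ignores the extra argument: 3
```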
Using Arrays in Perl¶
Also see arrays.md.
Arrays are generally accessed with @ (the whole array, in list context) or
with $ to get a single element (a scalar), e.g., $array_name[0].
A scalar can also hold a reference to an array rather than an array itself,
using the syntax \@array_name. This creates a reference to the array
@array_name.
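A short sketch of creating and using an array reference (the color names are arbitrary). Changes made through the reference affect the original array:

```perl
use strict;
use warnings;

my @colors = ('red', 'green', 'blue');
my $ref = \@colors;          # a scalar holding a reference to @colors

my $first = $ref->[0];       # arrow syntax for one element: "red"
my @all   = @$ref;           # dereference to copy the whole array
my $len   = scalar @$ref;    # length through the reference: 3

push @$ref, 'yellow';        # modifies @colors itself, not a copy
print "first=$first len=$len colors=@colors\n";
```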
Using Data::Dumper
to Print Data¶
Normal print statements will flatten any sort of data structure.
Arrays, hashes, and nested combinations won't be displayed correctly with print.
The Dumper sub from the Data::Dumper module is used to format these data structures
into human-readable strings.
use Data::Dumper;
# Option 1: index into @ARGV (leaves @ARGV unchanged)
my $input = $ARGV[0];
print "First argument: $input\n";
print "All arguments: ", Dumper(\@ARGV);
# Option 2: shift removes the first argument from @ARGV
my $first = shift;
print "First argument: $first\n";
print "Remaining arguments: ", Dumper(\@ARGV);
- Data::Dumper: A Perl module that converts complex data structures (arrays, hashes, etc.) into a human-readable string.
  - Regular print statements won't show the structure; they flatten the data.
- \@ARGV: The \ is used to pass a reference to the array @ARGV.
  - This creates a reference so that Dumper knows you're passing a whole array, not the contents of the array.
- The difference between $ARGV[n] and @ARGV comes from how variables are accessed in Perl:
  - @ARGV: Refers to the entire array, i.e., all the command-line arguments.
  - $ARGV[0]: Accesses a single element (scalar) from the array @ARGV.
  - $ARGV (without []): A separate scalar variable that holds the name of the file currently being read by <> (or - when reading from STDIN).
    - If there are multiple files, it holds the name of the file that is currently being processed.
Accessing elements in arrays:
- $ = Single value (scalar).
- @ = Full array.
To output an array:
If we pass an array to Data::Dumper
without a reference (\@
), then the output will look different:
If we use print
to output a list, it will flatten all the elements into one string.
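A sketch comparing the three approaches on a tiny array:

```perl
use strict;
use warnings;
use Data::Dumper;

my @arr = (1, 2);

my $as_ref  = Dumper(\@arr);   # one $VAR1 holding the whole array
my $as_list = Dumper(@arr);    # each element becomes its own $VARn

print $as_ref;
print $as_list;
print "print flattens: ", @arr, "\n";   # prints: print flattens: 12
```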
Accessing Command Line Arguments¶
You can access CLI arguments from a script in a couple of different ways.
- @ARGV: An array that holds all the CLI arguments.
  - Stands for "Argument Vector."
  - Using $ARGV[0] will not modify the @ARGV array.
- shift: A function that removes and returns the first element from @ARGV.
  - If called inside a subroutine (function), it pulls from the default array @_ instead.
  - Just like shift in bash.
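A sketch that simulates command-line arguments by assigning to @ARGV directly (the argument values and the first_arg helper are hypothetical), so the behavior of indexing versus shift can be seen without running from a shell:

```perl
use strict;
use warnings;

# Simulated command-line arguments for this demo.
@ARGV = ('input.txt', '--verbose', 'extra');

my $peek  = $ARGV[0];     # reads the first argument; @ARGV is unchanged
my $first = shift;        # outside a sub, shift removes it from @ARGV

sub first_arg { return shift; }   # inside a sub, shift uses @_
my $from_sub = first_arg('x', 'y');

print "peek=$peek first=$first remaining=@ARGV from_sub=$from_sub\n";
```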
Command Line Options¶
Some CLI arguments for perl:
- -p: Places a printing loop around your command so that it acts on each line of standard input.
  - Use to loop over the contents of a file line by line and output every line after being processed.
  - This is similar to what sed does by default.
- -n: Places a non-printing loop around your command.
  - Use to loop over the contents of a file line by line and NOT output anything other than what you specify.
- -e: Allows you to provide the perl script as an argument rather than in a file.
  - Equivalent to -c in Python or Bash. (Perl's own -c only checks the script's syntax without running it.)
- -E: Same as -e but also enables all optional features.
- -i: Edit the file in place. Give it an extension (e.g., -i.bak) to keep a backup of the original.
  - Allows you to modify files without manually doing {copy, delete-original, rename}.
- -w: Activates some warnings.
  - Someone said "Any good Perl coder will use this."
- -d: Run the command under the Perl debugger.
- -t: Like -T, but taint violations produce warnings instead of fatal errors.
- -T: Taint mode for the whole script.
  - It treats all external input (e.g., CLI args, environment variables, file input) as tainted until it's sanitized, and refuses to use tainted data in operations that affect the outside world (like running shell commands).
  - This is used to prevent bad actors from performing destructive operations.
  - Use to beef up Perl security, e.g., when running setuid scripts, or in scripts that need to check their input.
BEGIN and END Blocks¶
Like awk
, perl
has a BEGIN
and END
block.
Anything inside the BEGIN
block will run once, before the main code block starts execution.
Likewise, anything in the END
block runs once, after the main code block finishes execution.
This is most useful when doing one-liners from the command line, where there's no other place to put setup or summary code.
Example: print the total word count of a file in the END
block
- -n: Loop over the file, line by line.
  - Same as while(<>).
- -e: Allows execution of the code provided directly as a string.
  - Similar to -c in other tools (e.g., Python or Bash).
- END { print $t }
  - The END block is executed once after all lines of the file have been processed.
  - It prints the value of $t, which is the total word count.
- @w = /(\w+)/g:
  - @w is an array.
  - /(\w+)/g: Regex that matches every word in the current line.
    - \w+: Matches one or more word characters (letters, digits, or underscores).
    - g: Global modifier. Ensures all matches in the line are captured.
  - For each line, @w contains all words found in that line.
- $t += @w:
  - $t: Scalar variable. It starts out undefined, which Perl treats as 0 in numeric context.
  - @w: In scalar context, gives the number of elements in the array (words).
  - $t holds the total number of words across all lines.
- file.txt: The input file.
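The same word count written as a script, with explicit BEGIN and END blocks. This is a sketch: an in-memory string (made-up text) stands in for file.txt, and the while loop plays the role that -n adds:

```perl
use strict;
use warnings;

my $t;
BEGIN { print "Counting words...\n" }   # runs once, before the main code

my $text = "one two three\nfour five\n";   # stand-in for file.txt
open(my $in, '<', \$text) or die "Cannot open input: $!";
while (<$in>) {                 # the loop that -n would add
    my @w = /(\w+)/g;           # all words on this line
    $t += @w;                   # @w in scalar context: number of words
}
close $in;

END { print "$t\n" }            # runs once, after the main code finishes
```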
Using Subshells in Perl¶
Subshells are a thing in Perl: you can run shell commands and capture their output.
In order to achieve the same result as $(...) (bash) in Perl, you can do one of two
things:
- Wrap a shell command in backticks (`cmd`)
- Use the qx operator (qx/cmd/ or qx(cmd))
Then save the output to a variable.
This is idiomatic, core Perl and doesn't rely on external packages.
Example:
This will save the literal output of the command hostname.
Since the output is literal, it will contain the newline at the end.
To get rid of the trailing newline, use chomp():
This will modify it in-place to get rid of the trailing newline.
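A sketch of capturing command output and removing the trailing newline. echo stands in for hostname so the result is predictable; it assumes a POSIX shell is available:

```perl
use strict;
use warnings;

# Capture a command's output, like $(...) in bash.
my $out  = `echo hello`;     # backticks
my $same = qx(echo hello);   # qx() form, identical behavior

print "before chomp: [$out]\n";   # includes the trailing newline
chomp $out;                        # removes the trailing newline in place
print "after chomp: [$out]\n";
print "exit status: ", $? >> 8, "\n";   # $? holds the last command's status
```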
Error Handling¶
In perl, we can use the or operator along with the die function to handle errors.
- This attempts to open the file file.txt in read-only mode.
- If it fails, open returns a false value, which triggers the right-hand side of the or.
- die exits with an error message.
- $! holds the error message from the last failed system call (e.g., "No such file or directory").
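A sketch of the open ... or die pattern. The path is hypothetical and should not exist, and the eval is there only so the script can report the error instead of exiting:

```perl
use strict;
use warnings;

my $file = '/no/such/dir/file.txt';    # hypothetical path that should not exist

# open returns false on failure, so `or die` runs; $! explains why.
my $ok = eval {
    open(my $fh, '<', $file) or die "Could not open '$file': $!\n";
    close $fh;
    1;
};

print "Caught: $@" unless $ok;   # e.g. "Could not open '...': No such file or directory"
```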