Chemistry::Tutorial - PerlMol Quick Tutorial


The modules in the PerlMol toolkit are designed to simplify the handling of molecules from Perl programs in a general and extensible way. These modules are object-oriented; however, this tutorial assumes little or no knowledge of object-oriented programming in Perl. For a general introduction to using object-oriented modules, see HTML::Tree::AboutObjects.

This document shows some of the more common methods included in the PerlMol toolkit, in a reasonable order for a quick introduction. For more details, see the perldoc pages for each module.

How to read a molecule from a file

The following code will read a PDB file:

    use Chemistry::Mol;
    use Chemistry::File::PDB;
    my $mol = Chemistry::Mol->read("test.pdb");

The first two lines (which only need to appear once in a given program) tell Perl that you want to use the specified modules. The third line reads the file and returns a molecule object.

To read other formats, such as MDL molfiles, you need to load the corresponding module (Chemistry::File::MDLMol in the case of molfiles). Readers for several formats are under development.
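Other readers are used in the same way. The sketch below assumes that Chemistry::File::SMILES is installed. It uses the parse class method, which takes a string instead of a file name, so no input file is needed. Whether hydrogens show up as explicit atoms depends on the reader, so only heavy atoms are counted.

```perl
use strict;
use warnings;
use Chemistry::Mol;
use Chemistry::File::SMILES;   # assumed installed; registers the 'smiles' format

# Parse ethanol from a SMILES string instead of reading a file.
my $mol = Chemistry::Mol->parse("CCO", format => "smiles");

# Count heavy (non-hydrogen) atoms by element symbol.
my %count;
$count{ $_->symbol }++ for grep { $_->symbol ne 'H' } $mol->atoms;
printf "%s: %d\n", $_, $count{$_} for sort keys %count;
```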

The molecule object

Chemistry::Mol->read returns a Chemistry::Mol object. An object is a data structure of a given class that has methods (i.e., subroutines) associated with it. To access or modify an object's properties, you call the methods on the object using "arrow syntax":

    my $name = $mol->name; # return the name of the molecule
    $mol->name("water");   # set the name of the molecule to "water"

Note that these so-called accessor methods return the molecule object when they are used to set a property. As a consequence, you can "chain" several method calls to set several properties in a single statement.
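For instance, the following sketch sets the name and the type of a new, empty molecule in one statement. The type value "solvent" is arbitrary, because PerlMol attaches no fixed meaning to type.

```perl
use strict;
use warnings;
use Chemistry::Mol;

my $mol = Chemistry::Mol->new;

# Each setter returns the molecule itself, so the calls can be chained.
$mol->name("water")->type("solvent");

print $mol->name, " (", $mol->type, ")\n";   # water (solvent)
```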


A Chemistry::Mol object contains essentially a list of atoms, a list of bonds, and a few generic properties such as name, type, and id. The atoms and bonds themselves are also objects.
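Molecules do not have to come from files. As a sketch, the following builds water by hand with the new_atom and new_bond methods of Chemistry::Mol. The coordinates are rough illustrative values in Angstroms, not an optimized geometry.

```perl
use strict;
use warnings;
use Chemistry::Mol;

my $mol = Chemistry::Mol->new(name => "water");

# new_atom creates a Chemistry::Atom, adds it to the molecule, and returns it.
my $o  = $mol->new_atom(symbol => "O", coords => [ 0.000, 0.000, 0.000]);
my $h1 = $mol->new_atom(symbol => "H", coords => [ 0.757, 0.586, 0.000]);
my $h2 = $mol->new_atom(symbol => "H", coords => [-0.757, 0.586, 0.000]);

# new_bond creates a Chemistry::Bond between the given atoms.
$mol->new_bond(atoms => [$o, $h1], order => 1);
$mol->new_bond(atoms => [$o, $h2], order => 1);

my @atoms = $mol->atoms;
my @bonds = $mol->bonds;
printf "%s: %d atoms, %d bonds\n", $mol->name, scalar @atoms, scalar @bonds;
```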

Writing a molecule file

To write a molecule to a file, use the write method with the name of the output file, as in $mol->write("test.pdb").
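The following is a minimal round trip, assuming Chemistry::File::MDLMol is installed. It builds carbon monoxide in memory, writes it as an MDL molfile, and reads it back. The file name co.mol is arbitrary, and the sketch relies on the .mol extension to select the MDL writer and reader.

```perl
use strict;
use warnings;
use Chemistry::Mol;
use Chemistry::File::MDLMol;   # MDL molfile reader and writer

# Build carbon monoxide: two atoms joined by a triple bond.
my $mol = Chemistry::Mol->new(name => "carbon monoxide");
my $c = $mol->new_atom(symbol => "C", coords => [0.000, 0, 0]);
my $o = $mol->new_atom(symbol => "O", coords => [1.128, 0, 0]);
$mol->new_bond(atoms => [$c, $o], order => 3);

# Write the molecule; the .mol extension selects the MDL writer.
$mol->write("co.mol");

# Read it back and inspect what survived the round trip.
my $copy  = Chemistry::Mol->read("co.mol");
my @atoms = $copy->atoms;
my @bonds = $copy->bonds;
printf "%d atoms, %d bond(s), bond order %d\n",
    scalar @atoms, scalar @bonds, $bonds[0]->order;
```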


Make sure that you have loaded the right file I/O module for the output format. If you want to load all the available file I/O modules, you can do it with:

    use Chemistry::File ':auto';

Selecting atoms in a molecule

You can get an array of all the atoms by calling the atoms method without parameters, or a specific atom by giving its index:

    my @all_atoms = $mol->atoms;
    my $atom3     = $mol->atoms(3);

Note: Atom and bond indices are counted from 1, not from 0. This deviation from common Perl usage was made to be consistent with the way atoms are numbered in most common file formats.
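To make the off-by-one difference concrete, the sketch below compares $mol->atoms(1) with element 0 of the array returned by $mol->atoms. The three-atom molecule exists only for illustration, and the atoms are compared through their ids.

```perl
use strict;
use warnings;
use Chemistry::Mol;

my $mol = Chemistry::Mol->new;
$mol->new_atom(symbol => $_, coords => [0, 0, 0]) for qw(N C O);

my @all   = $mol->atoms;      # ordinary Perl array, indexed from 0
my $first = $mol->atoms(1);   # PerlMol index, counted from 1

print $first->symbol, "\n";                                    # N
print $all[0]->id eq $first->id ? "same\n" : "different\n";   # same
```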

You can select atoms that match an arbitrary expression by using Perl's built-in grep function:

    # get all oxygen atoms within 3.0 Angstroms of atom 37
    my @close_oxygens = grep {
        $_->symbol eq 'O'
        and $_->distance($mol->atoms(37)) < 3.0
    } $mol->atoms;

The grep function loops through all the atoms returned by $mol->atoms, aliasing each to $_ at each iteration, and returns only those for which the expression in braces is true.
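Here is a self-contained version of the grep example. It uses a made-up four-atom molecule so it can run without a PDB file, and collects the oxygen atoms within 3.0 Angstroms of atom 1.

```perl
use strict;
use warnings;
use Chemistry::Mol;

my $mol = Chemistry::Mol->new;
$mol->new_atom(symbol => "N", coords => [0.0, 0.0, 0.0]);   # reference atom
$mol->new_atom(symbol => "O", coords => [2.5, 0.0, 0.0]);   # O, 2.5 A away
$mol->new_atom(symbol => "O", coords => [0.0, 4.0, 0.0]);   # O, 4.0 A away
$mol->new_atom(symbol => "C", coords => [1.5, 0.0, 0.0]);   # close, but not O

my $ref = $mol->atoms(1);

# '<' puts distance in scalar context, so it returns just the number.
my @close_oxygens = grep {
    $_->symbol eq 'O'
    and $_->distance($ref) < 3.0
} $mol->atoms;

print scalar(@close_oxygens), "\n";   # 1
```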

Using grep is a general way of finding atoms; however, since finding atoms by name is common, a convenience method is available for that purpose.

    my $HB1     = $mol->atoms_by_name('HB1');
    my @H_atoms = $mol->atoms_by_name('H.*'); # name treated as a regex

Since atom names are not necessarily unique, even the first example above might match more than one atom. In that case, because the result is assigned to a scalar, only the first matching atom is returned. In the second example, since you are assigning to an array, all matching atoms are returned.
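The difference between scalar and list context can be seen with a few named atoms. The names imitate PDB atom names but are invented for the example. They are passed to new_atom like any other property.

```perl
use strict;
use warnings;
use Chemistry::Mol;

my $mol = Chemistry::Mol->new;
$mol->new_atom(symbol => "C", name => "CA");
$mol->new_atom(symbol => "H", name => "HB1");
$mol->new_atom(symbol => "H", name => "HB2");

my $HB1     = $mol->atoms_by_name('HB1');   # scalar context: first match only
my @H_atoms = $mol->atoms_by_name('H.*');   # list context: every match

print $HB1->name, "\n";                             # HB1
print join(" ", map { $_->name } @H_atoms), "\n";   # HB1 HB2
```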

The atom object

Atoms are usually the most interesting objects in a molecule. Some of their main properties are Z, symbol, and coords.

    $atom->Z(8); # set atomic number to 8
    my $symbol = $atom->symbol;
    my $coords = $atom->coords;
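This standalone sketch exercises these accessors on an atom created directly with Chemistry::Atom->new, outside any molecule. It assumes that setting Z, as in the example above, also updates the symbol to match.

```perl
use strict;
use warnings;
use Chemistry::Atom;

my $atom = Chemistry::Atom->new(symbol => "C", coords => [1, 2, 3]);

$atom->Z(8);                    # assumed to turn the carbon into an oxygen
my $symbol = $atom->symbol;     # follows the new atomic number
my $coords = $atom->coords;     # a Math::VectorReal object

my ($x, $y, $z) = $coords->array;
print "$symbol at ($x, $y, $z)\n";
```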

Atom coordinates

The value returned by $atom->coords is a Math::VectorReal object. You can print these objects and use them to do vector algebra:

    my $c1            = $atom1->coords;
    my $c2            = $atom2->coords;
    my $dot_product   = $c1 . $c2;       # returns a scalar
    my $cross_product = $c1 x $c2;       # returns a vector
    my $delta         = $c2 - $c1;       # returns a vector
    my $distance      = $delta->length;  # returns a scalar
    my ($x, $y, $z)   = $c1->array;      # get the components of $c1
    print $c1;     # prints something like "[ 1.0E0  2.0E0  3.0E0 ]"
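As a worked example of this vector algebra, the angle between two bonds that share an atom follows from the dot product: cos(theta) = (v1 . v2) / (|v1| |v2|). The sketch uses Math::VectorReal directly with made-up coordinates for an H-O-H fragment whose two bond vectors are perpendicular, so the expected answer is 90 degrees. It only illustrates the arithmetic and is not a PerlMol method.

```perl
use strict;
use warnings;
use Math::VectorReal;            # exports vector()
use POSIX qw(acos);

my $pi = 4 * atan2(1, 1);

# Made-up coordinates: O at the origin, two H atoms on the x and y axes.
my $o  = vector(0, 0, 0);
my $h1 = vector(1, 0, 0);
my $h2 = vector(0, 1, 0);

my $v1 = $h1 - $o;               # O->H1 bond vector
my $v2 = $h2 - $o;               # O->H2 bond vector

# '.' between two vectors is the dot product.
my $cos   = ($v1 . $v2) / ($v1->length * $v2->length);
my $angle = acos($cos) * 180 / $pi;

printf "H-O-H angle: %.1f degrees\n", $angle;   # H-O-H angle: 90.0 degrees
```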

Since one is very often interested in calculating the distance between atoms, Atom objects provide a distance method to save some typing:

    my $d  = $atom1->distance($atom2);
    my $d2 = $atom1->distance($molecule2);

In the second case, the value obtained is the minimum distance between the atom and any atom of the molecule. This can be useful for tasks such as finding the water molecules closest to a given atom.
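The minimum-distance behavior can be checked against an explicit loop over the molecule's atoms. The coordinates are invented, and the probe atom is deliberately not part of the molecule. Each distance call is forced into scalar context so that it returns just the number.

```perl
use strict;
use warnings;
use Chemistry::Mol;
use Chemistry::Atom;
use List::Util qw(min);

my $mol = Chemistry::Mol->new;
$mol->new_atom(symbol => "O", coords => [3, 0, 0]);
$mol->new_atom(symbol => "O", coords => [0, 4, 0]);

my $probe = Chemistry::Atom->new(symbol => "Na", coords => [0, 0, 0]);

# Built-in: minimum distance from the probe to any atom of $mol.
my $d = $probe->distance($mol);

# The same quantity computed by hand; map would otherwise use list context.
my $d_manual = min map { scalar $probe->distance($_) } $mol->atoms;

print "$d $d_manual\n";   # 3 3
```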

Atoms may also have internal coordinates, which define the position of an atom relative to the positions of other atoms by means of a distance, an angle, and a dihedral angle. Those coordinates can be accessed through the $atom->internal_coords method, which uses Chemistry::InternalCoords objects.

The Bond object

A Chemistry::Bond object is a list of atoms with an associated bond order. In most cases a bond has exactly two atoms, but we don't want to exclude possibilities such as three-center bonds. You can get the list of atoms in a bond by using the atoms method; the bond order is accessed through the order method:

    my @atoms_in_bond = $bond->atoms;
    my $bond_order    = $bond->order;

The other interesting method for Bond objects is length, which returns the distance between the two atoms in a bond (this method requires that the bond have two atoms).

    my $bondlength = $bond->length;
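This sketch creates a single bond between two hand-placed atoms and queries it. The 0.96 Angstrom separation is an illustrative O-H distance, and the order is given explicitly rather than relying on a default.

```perl
use strict;
use warnings;
use Chemistry::Mol;

my $mol = Chemistry::Mol->new;
my $o = $mol->new_atom(symbol => "O", coords => [0.00, 0, 0]);
my $h = $mol->new_atom(symbol => "H", coords => [0.96, 0, 0]);

my $bond = $mol->new_bond(atoms => [$o, $h], order => 1);

my @atoms_in_bond = $bond->atoms;
my $order         = $bond->order;
my $length        = $bond->length;   # distance between the two atoms

printf "%s-%s, order %d, length %.2f\n",
    (map { $_->symbol } @atoms_in_bond), $order, $length;
```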

In addition to these properties, Bond objects have the generic properties described below. The most important of these, as far as bonds are concerned, is type.

Generic properties

There are three generic properties that all PerlMol objects have:

id

Each object must have a unique ID. In most cases you don't have to worry about it, because it is assigned automatically unless you specify it. You can use the by_id method to select an object contained in a molecule:

    my $atom = $mol->by_id("a42");

In general, ids are preferable to indices because they don't change if you delete or move atoms or other objects.
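This sketch shows id-based lookup. The id "O1" is an arbitrary value assigned explicitly when the atom is created, while the first atom gets an id automatically.

```perl
use strict;
use warnings;
use Chemistry::Mol;

my $mol = Chemistry::Mol->new;
$mol->new_atom(symbol => "H", coords => [0, 0, 0]);   # id assigned automatically
$mol->new_atom(id => "O1", symbol => "O", name => "OW", coords => [1, 0, 0]);

# Look the oxygen up by its id rather than by its position in the molecule.
my $atom = $mol->by_id("O1");
print $atom->symbol, " ", $atom->name, "\n";   # O OW

# Print whatever id was generated for the first atom.
print $mol->atoms(1)->id, "\n";
```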


name

The name of the object does not have any meaning from the point of view of the core modules, but most file types have the concept of molecule name, and some (such as PDB) have the concept of atom names.


type

Again, the meaning of type is not universally defined, but it is likely to be used to specify atom types and bond orders.

Besides these, the user can specify arbitrary attributes, as discussed in the next section.

User-specified attributes

The core PerlMol classes define very few, very generic properties for atoms and molecules. This was chosen as a "minimum common denominator" because every file format and program has different ideas about the names, values and meaning of these properties. For example, some programs only allow bond orders of 1, 2, and 3; some also have "aromatic" bonds; some use calculated non-integer bond orders. PerlMol tries not to commit to any particular convention, but it allows you to specify whatever attributes you want for any object (be it a molecule, an atom, or a bond). This is done through the attr method.

    $mol->attr("melting point", "273.15"); # set m.p.
    my $color = $atom->attr("color"); # get atom color

The core modules store these values, but they don't know what they mean and they don't care about them. Attributes can have whatever name you want, and their values can be of any type. However, by convention, non-core modules that need additional attributes should prefix the attribute name with a namespace, followed by a slash. (This is done to avoid modules fighting over the same attribute name.) For example, atoms created by the PDB reader module (Chemistry::File::PDB) have the "pdb/residue_name" attribute.

    my $mol  = Chemistry::Mol->read("test.pdb");
    my $atom = $mol->atoms(1234);
    print $atom->attr("pdb/residue_name"); # prints "ALA123"
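This sketch sets user-defined attributes on a hand-built molecule. The "myprog/" prefix stands for a hypothetical program's namespace, following the convention above. "melting point" is the plain attribute from the earlier example.

```perl
use strict;
use warnings;
use Chemistry::Mol;

my $mol  = Chemistry::Mol->new(name => "water");
my $atom = $mol->new_atom(symbol => "O", coords => [0, 0, 0]);

$mol->attr("melting point", "273.15");     # plain attribute on the molecule
$atom->attr("myprog/charge", -0.834);      # namespaced attribute (hypothetical)
$atom->attr("myprog/color", [255, 0, 0]);  # values can be references, too

print $mol->attr("melting point"), "\n";   # 273.15
print $atom->attr("myprog/charge"), "\n";  # -0.834

my $unset = $atom->attr("myprog/radius");  # never set, so undef
print defined $unset ? "set\n" : "unset\n";   # unset
```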

Molecule subclasses

You can do lots of interesting things with plain molecules. However, for some applications you may want to extend the features of the main Chemistry::Mol class. There are several subclasses of Chemistry::Mol available already:

Chemistry::MacroMol

Used for macromolecules.

Chemistry::Pattern

Used for substructure matching.

Chemistry::Ring

Used for representing rings (cycles) in molecules.

Chemistry::Reaction

Used for representing and applying chemical transformations.

As an example, we'll discuss macromolecules. Future versions of this tutorial may also include a discussion of patterns and rings.

Macromolecules

So far we have assumed that we are dealing with molecules of the Chemistry::Mol class. However, one of the interesting things about object-oriented programming is that classes can be extended. For dealing with macromolecules, we have the Chemistry::MacroMol class, which extends the Chemistry::Mol class. This means that in practice you can use a Chemistry::MacroMol object exactly as you would use a Chemistry::Mol object, but with some added functionality. In fact, you can make the PDB reader return Chemistry::MacroMol objects instead of Chemistry::Mol objects just by changing the first example like this:

    use Chemistry::MacroMol;
    use Chemistry::File::PDB;
    my $macromol = Chemistry::MacroMol->read("test.pdb");

Now the question is, what is the "added functionality" that MacroMol objects have on top of the original Chemistry::Mol object?

The MacroMol object

For the purposes of this module, a macromolecule is considered to be a big molecule whose atoms are divided into domains. A domain is just a subset of the atoms in the molecule; in a protein, a domain would be a residue.

You can select domains in a molecule in a way similar to that used for atoms and bonds, in this case through the domains method:

    my @all_domains = $macromol->domains;
    my $domain      = $macromol->domains(57);
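Macromolecules can also be assembled by hand, which lets you try these methods without a PDB file. The sketch assumes a new_domain method on Chemistry::MacroMol that creates a Chemistry::Domain with the macromolecule as its parent. It also assumes that domains, like atoms, are numbered from 1. The residue names are invented.

```perl
use strict;
use warnings;
use Chemistry::MacroMol;

my $macromol = Chemistry::MacroMol->new(name => "tiny peptide");

# Assumed API: new_domain creates a Chemistry::Domain whose parent is
# $macromol and appends it to the list of domains.
$macromol->new_domain(name => $_) for qw(ALA1 GLY2);

# A domain is a molecule, so atoms are added to it with new_atom.
my $ala = $macromol->domains(1);
$ala->new_atom(symbol => "N", name => "N",  coords => [0.0, 0, 0]);
$ala->new_atom(symbol => "C", name => "CA", coords => [1.5, 0, 0]);
$macromol->domains(2)->new_atom(symbol => "N", name => "N", coords => [3.8, 0, 0]);

my @all_domains = $macromol->domains;
my $second      = $macromol->domains(2);
my $ca          = $macromol->domains(1)->atoms_by_name('CA');

printf "%d domains; second is %s; found atom %s\n",
    scalar @all_domains, $second->name, $ca->name;
```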

The Domain object

A domain is a substructure of a larger molecule. Other than having a parent molecule, a domain is just like a molecule. In other words, the Domain class extends the Chemistry::Mol class; it is basically a collection of atoms and bonds.

    my @atoms_in_domain = $domain->atoms;
    my $atom5_in_domain = $domain->atoms(5);

If you want to get at a given atom in a given domain in a macromolecule, you can "chain" the method calls without having to save the Domain object in a temporary variable:

    my $domain57_atom5 = $macromol->domains(57)->atoms(5);
    my $res233_HA = $macromol->domains(233)->atoms_by_name('HA');

The second example is a good way of selecting an atom from a PDB file when you know the residue number and atom name.

See also

Chemistry::Mol, Chemistry::Atom, Chemistry::Bond, Chemistry::File, Chemistry::MacroMol, Chemistry::Domain.

Author

Ivan Tubert-Brohman <>

Copyright

Copyright (c) 2005 Ivan Tubert-Brohman. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.