How to Align Structures in PyMOL: align, super, and cealign Commands
Aligning protein structures in PyMOL lets you superpose two conformations, compare homologous proteins, or track how a ligand moves across MD simulation frames. PyMOL provides three commands for this — align, super, and cealign — each suited to a different scenario. This article covers the align command in detail, with practical examples for proteins, chains, backbones, and ligands, and ends with a guide on when to use each of the three commands.
Example: aligning structures is useful when you want to compare two ligands after a docking experiment.
The PyMOL align command
The way this command works is quite intuitive.
Let’s say that you have two structures (object_1, object_2) and you want to align them. You only need to type the following in the PyMOL console.
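A minimal sketch of that command, assuming both structures are already loaded as objects named object_1 and object_2:

```pymol
# object_1 is moved (mobile); object_2 stays fixed (reference)
align object_1, object_2
```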
Note that the order of the arguments matters: the first object is the “mobile” one, which is moved onto the second object, the reference, which stays in place.
PyMOL uses a two-step approach for aligning structures: first, it performs a sequence alignment, and then it minimizes the Root Mean Square Deviation (RMSD) between the aligned residues, refining the fit over a few cycles that discard poorly matching atoms. As a result, the structures are superposed in the display, and a short report is printed to the console that ends with the final RMSD and the number of atoms used in the fit.
Practical Alignment Examples
With PyMOL’s alignment commands you can compare the structural characteristics of biomolecules and their interactions. Now let’s look at some use cases you may encounter during your research. Most of them assume you are reasonably familiar with PyMOL selections, so I suggest you read this article if you want to get an idea of how they work.
How to Align Two Proteins in PyMOL
Let’s say you have two protein structures in the PDB format, protein_A.pdb and protein_B.pdb, and you want to align them to analyze their structural similarities and differences. You can follow these two steps:
- Load the protein structures into PyMOL: You can use the load command in PyMOL to load the protein structures into separate objects. For example:
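A sketch of the load step, assuming protein_A.pdb and protein_B.pdb sit in PyMOL's current working directory (otherwise give the full path):

```pymol
# load <file>, <object name>
load protein_A.pdb, mobile
load protein_B.pdb, reference
```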
This will load the protein structures from the PDB files protein_A.pdb and protein_B.pdb into two separate objects named mobile and reference in the PyMOL GUI.
- Proceed to align them as previously shown using the align command:
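With the object names used in this example, the alignment step would look roughly like this:

```pymol
# Superpose the object "mobile" onto the object "reference"
align mobile, reference
```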
You can also use the fetch command to retrieve a structure directly from the Protein Data Bank by its PDB ID.
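As an illustration, the sketch below fetches two haemoglobin entries and aligns them. The PDB IDs (1hho and 2hhb) are only examples chosen for this sketch, not structures from the original article; substitute the IDs you are interested in.

```pymol
# fetch <PDB ID>, <object name>  (requires an internet connection)
fetch 1hho, mobile
fetch 2hhb, reference
align mobile, reference
```

If you omit the object name, the object is simply named after the PDB ID.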
How to Align Specific Chains or Residue Ranges
Sometimes you may notice that the standard alignment is not great. If that is the case, you can improve it by restricting the alignment to a subset of residues or atoms.
Let’s take as an example a situation where you have two proteins and you want to align one protein to a specific chain or to certain residues of the other protein. You can use PyMOL’s align command along with the select command to specify the chains or residues of interest. Here’s an example:
- Load the two proteins
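For instance, reusing the file names from the previous example (an assumption; any two structure files will do) and the object names object1 and object2:

```pymol
load protein_A.pdb, object1
load protein_B.pdb, object2
```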
- Select the chain or residues of interest: You can use the select command to create a named selection containing the chain or residues that you want to align.
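A sketch of the two selections; the selection names chain_A and res_100_150 are hypothetical and can be anything you like:

```pymol
# Named selection for chain A of object1
select chain_A, object1 and chain A
# Named selection for residues 100 to 150 of object2
select res_100_150, object2 and resi 100-150
```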
This will select chain A from object1 as the chain of interest, or residues 100 to 150 from object2, respectively.
- Align the selected chain or residues: You can use the align command to align the selected chain or residues to the other protein structure.
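One way to write this step is to pass the selection expressions to align directly, as sketched below. A named selection created with select can be used in place of either expression.

```pymol
# Fit chain A of object1 onto the whole of object2
align object1 and chain A, object2
# Or: fit object1 onto residues 100-150 of object2 only
align object1, object2 and resi 100-150
```

To my knowledge, the fit is computed on the selected atoms only, but the resulting transformation is applied to the whole mobile object, so the rest of the protein moves along with the selection.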
How to Align by Backbone (Alpha Carbons Only)
Similarly, we can align two proteins based on their backbones by first selecting the alpha carbons of each protein with the select command.
Then align the two alpha-carbon selections:
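Both steps can be sketched as follows, assuming two loaded objects named mobile and reference; the selection names ca_mobile and ca_reference are hypothetical:

```pymol
# Select the alpha carbons (atom name CA) of each protein
select ca_mobile, mobile and name CA
select ca_reference, reference and name CA
# Fit using only the alpha carbons
align ca_mobile, ca_reference
```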
How to Align Two Ligands for Docking Comparison
Let’s say that you ran an MD simulation using GROMACS or any other software and you want to observe how the position of the ligand changes over the course of the simulation.
One thing you could do is load both the initial and final frames of the simulation (you can extract them with gmx trjconv) and align the two ligands based on their residue names (resn) using the select command in PyMOL.
Then align the two ligand selections:
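A sketch of the whole procedure follows. The file names frame_initial.pdb and frame_final.pdb, the object names, and the selection names are all hypothetical, and the ligand is assumed to have the residue name LIG in both frames.

```pymol
# Load the first and last frames as separate objects
load frame_initial.pdb, initial
load frame_final.pdb, final
# Select the ligand in each frame by residue name
select lig_initial, initial and resn LIG
select lig_final, final and resn LIG
# Fit the final-frame ligand onto the initial-frame ligand
align lig_final, lig_initial
```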
This will align the two ligands based on their residue name “LIG”, minimizing the RMSD between the ligand atoms. After aligning the ligands, you can visually compare the superimposed structures in PyMOL to analyze whether something changed.
How to Align Multiple PDB Files to a Reference
PyMOL also allows you to align multiple structures. To do that, you have to load all the structures and then perform consecutive alignments. A little cumbersome but it gets the job done.
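The consecutive alignments could look like this, assuming the three PDB files are in the current working directory:

```pymol
# The first structure serves as the common reference
load protein1.pdb, reference
load protein2.pdb, object2
load protein3.pdb, object3
# Align every other structure onto the same reference
align object2, reference
align object3, reference
```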
In this example, we have loaded three PDB files (protein1.pdb, protein2.pdb, and protein3.pdb) into three separate objects (reference, object2, and object3).
We then used the align command to align object2 and object3 to the same reference.
align vs super vs cealign: Which Command to Use
PyMOL has three alignment commands and the right choice depends on how similar your structures are.
align performs a sequence alignment first, then minimizes RMSD. It works well when your proteins share reasonable sequence similarity (roughly >30%). This is the command covered in this article and the one you will reach for most often.
super skips the sequence alignment step and works purely on structural similarity. Use it when your proteins are distantly related or when align produces a poor result because the sequences are too divergent.
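Its basic usage mirrors align. A minimal sketch, assuming two loaded objects named mobile and reference:

```pymol
# Same argument order as align: mobile first, reference second
super mobile, reference
```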
cealign uses the Combinatorial Extension (CE) algorithm. It is the most robust option for structural homologs with very low sequence identity, where both align and super may struggle. It is slower, but it handles difficult cases reliably.
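A minimal sketch with the same two hypothetical objects, mobile and reference:

```pymol
# cealign <target>, <mobile>: the fixed structure comes first
cealign reference, mobile
```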
Note that cealign takes arguments in the opposite order to align and super — the reference comes first.