PyMOL selection tool

How to select atoms and residues in PyMOL

 

PyMOL is a cross-platform molecular graphics system that is designed to provide an interactive, visual environment for exploring and analyzing 3D structures of proteins, nucleic acids, and small molecules.

PyMOL is great for observing and analyzing the overall structure of a biomolecule, but that is rarely sufficient: most of the time you will also want to act on a specific subset of atoms only (e.g., you may want to show a certain residue or the binding site of a protein in a different representation or color).

To this end, you can use the selection tool, which lets you pick specific parts of the molecule and then perform actions on exactly that group of atoms.

This is one of the most powerful tools in PyMOL, as it allows you to create several named selections, each containing a different part of your system, that you can then customize as you wish. This is useful for tasks such as analyzing protein-ligand interactions, identifying binding sites, or visualizing structural changes.

The selection tool can also be used to create visual representations of selected atoms or residues, which helps you communicate results and insights in your publications and presentations.

As is often the case in PyMOL, we have two possible approaches to selecting regions of your system.

  1. The first one relies on the graphical user interface (GUI) where you can select by clicking on the regions of interest.

  2. The second one relies on the select command, which you type at the command line.

Each approach has its strengths and weaknesses and is useful in different situations, so I decided to discuss both of them in this article.

 

 

The first way to select a specific atom or group of atoms is by physically clicking on them in the viewer window. Once you do that, you will notice that:

  • The selected residue will be highlighted with pink boxes.
  • A new selection entry, (sele), will appear in the object menu panel.
  • The command line panel will print some information, such as the chain and residue number of the selected residue.

 

 The interface of the molecular visualization software PyMOL when you initially load the program.

 

From here, you can grow your selection by clicking on additional residues, and you can unselect a residue by clicking on it a second time. To clear the whole selection, click on an area of the viewer window that does not contain any atoms.

To avoid confusion and remember which atoms a selection contains, you can rename it by clicking on A $\rightarrow$ rename selection, then typing the new name and pressing Enter.

Then you can use the five pop-up menus available for the selection (A, S, H, L, and C) to modify the properties of the atoms within the group.

Example

Let’s say that I want to display residue number 83, selected in the previous image, as red sticks.

  1. First of all, change the name of the selection to resid_83: A $\rightarrow$ rename selection $\rightarrow$ resid_83
  2. To change the representation for the selection to sticks: S $\rightarrow$ as $\rightarrow$ sticks
  3. Finally, change the color of the selected residues to red: C $\rightarrow$ reds $\rightarrow$ red
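
The same result can also be reached by typing commands instead of clicking. The following is a minimal sketch, assuming that residue 83 exists in the loaded structure (if several chains contain a residue 83, all of them end up in the selection):

```pymol
# Create a named selection containing residue 83
select resid_83, resi 83
# Show the selection as sticks only (its other representations are hidden)
show_as sticks, resid_83
# Color the selection red
color red, resid_83
```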

 

Tip

By default, PyMOL selects whole residues. You can change the selection mode by left-clicking on the Selecting entry in the bottom right panel, which lets you switch to chain or atom selection, among others.

You can also change the selection mode from the menu bar: Mouse $\rightarrow$ Selection Mode $\rightarrow$ choose the selection mode
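
If you prefer typing, the selection mode can also be changed with a setting. This sketch assumes the mouse_selection_mode setting with its usual numbering (0 for atoms, 1 for residues, 2 for chains), so it is worth checking the values against your PyMOL version:

```pymol
# Clicking selects individual atoms
set mouse_selection_mode, 0
# Clicking selects whole chains
set mouse_selection_mode, 2
# Back to the default: clicking selects whole residues
set mouse_selection_mode, 1
```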

 

 

The previous approach is extremely convenient when you want to quickly select a residue during a visual inspection but it also comes with some problems. Each time you want to select a residue you first need to find it in the structure.

As you may imagine, it can be quite challenging to interpret complicated structures and find the residue you need. So what if you want to select a specific residue number and you don’t know how to locate it in the protein?

A more practical alternative in such cases is to directly select the residues you need by using the sequence display feature in the GUI window.

This will show the sequence of residues in the protein starting at the N-terminus and ending at the C-terminus. You can then use the scroll bar and click on the residues to select them by number, even if you are not sure of their location in the structure.

To turn on the sequence viewer in PyMOL, you can either click the “S” button below the mouse mode table or navigate to the upper control window and click on Display $\rightarrow$ Sequence.
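
The sequence viewer can also be toggled from the command line through the seq_view setting, which is convenient in scripts. A minimal sketch:

```pymol
# Turn the sequence viewer on
set seq_view, 1
# Turn it off again
set seq_view, 0
```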

 

 

Selecting regions of your protein by clicking on residues in the viewer or in the sequence display is useful when you want to quickly pick atoms of interest. However, sometimes you need more control over the selection, for example by filtering atoms according to a series of criteria.

If that is the case, you can use the built-in selection command implemented in PyMOL.

The select command is a powerful and flexible feature of PyMOL that allows users to define complex selections based on a wide range of criteria, including atomic properties, spatial relationships, and chemical properties.

Selection criteria can be refined or broadened by using a selection algebra that combines specific keywords with logical operators (“and”, “or”, “not”).

 

The general syntax for a selection in PyMOL is quite intuitive. You only need to call the select command followed by the rules specifying the selection criteria (selection_rule).

select selection_rule

 

For instance, to select the whole system you can just type the select command followed by the keyword that selects everything (all).

select all

This will create a new selection containing the whole system under the default name (sele), which you can then use to perform your analysis.

 

If you want to create a selection with a custom name you can simply call the command followed by the name you want, and then input the selection rules after the comma:

select my_selection, all

In this way, you will generate a selection named my_selection containing all the atoms.
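
A quick way to check what a new selection contains is to count its atoms. The sketch below reuses the my_selection name from the example above; count_atoms, deselect, and delete are standard PyMOL commands, but they are not covered elsewhere in this article:

```pymol
select my_selection, all
# Print the number of atoms in the selection
count_atoms my_selection
# Hide the pink selection markers without deleting the selection
deselect
# Remove the selection once it is no longer needed (the atoms are untouched)
delete my_selection
```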

 

 

The selection algebra in PyMOL is very powerful and allows you to get very creative with your selection operations.

It would be impossible to go through every keyword, so here I will give you a few examples of the most useful ones. More specifically, I will focus on three categories of selections:

  1. Selection by name
  2. Selection by proximity
  3. Selection by properties

 

I will give some examples for each category and then explain how you can combine them using logical operators to create more complex selections.

You can find more info about the selection algebra in the PyMOL wiki.

 

 

The first few examples we are going to see are selections by name. PyMOL gives you different options to select according to different identifiers (e.g., atom name and number, residue name and number, chain, …) as specified in the PDB file or any other file format you are using.

 

If you want to select by atom name you can do that by calling the select command followed by the name keyword and the atom name (<atom_name>).

select <selection_name>, name <atom_name>
Example

Let’s say you want to group all the $C_{\alpha}$ atoms in your protein in a selection named CA:

select CA, name CA

You can also create a selection that contains several atom names, such as $C_{\alpha}$ and $C_{\beta}$, by joining them with the + sign:

select CA_CB, name CA+CB

 

To select by atom number we need to use the id identifier.

select <selection_name>, id <atom_number>
Example

You can select a single atom number (e.g., 10) as well as a range of atoms (1-100):

select atom, id 10
select atoms, id 1-100

 

To select by residue name use the resn keyword followed by the name of the residue you want to select (<residue_name>).

select <selection_name>, resn <residue_name>
Example

You can group all the alanines in your protein in a selection named ALA:

select ALA, resn ALA

And add Glycine to the selection using the + sign:

select ALA_GLY, resn ALA+GLY

 

To select by residue number we need to use the resi identifier.

select <selection_name>, resi <residue_number>
Example

You can select a certain residue number (e.g., 10) or a range of residues (1-100):

select residue, resi 10
select residues, resi 1-100

 

Some proteins may be arranged in different chains. If that is the case, PyMOL also gives you the possibility to select by chain using the chain keyword followed by the chain identifier (<chain_id>).

select <selection_name>, chain <chain_id>
Example

To select chain A of your protein:

select chain_A, chain A
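
The identifiers shown above also accept lists that mix single values and ranges, joined with the + sign. Here is a short sketch; the residue numbers and selection names are hypothetical and only serve as an illustration:

```pymol
# Residues 10 and 15 plus the range 20-25
select picked_residues, resi 10+15+20-25
# The four main backbone atoms, selected by atom name
select bb_atoms, name N+CA+C+O
# Two chains at once
select chains_AB, chain A+B
```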

 

 

Another possible option is to select by proximity, namely depending on the surrounding environment of a given atom. Here we will consider two use cases.

 

1. The first one will be the within keyword that allows you to select atoms in a selection (<sele_1>) that are within a certain distance (<distance> in Å) of another atom or group of atoms (<sele_2>):

select <selection_name>, <sele_1> within <distance> of <sele_2>
Example

This statement selects all the $C_{\alpha}$ atoms within 10 Å of a residue range (1-10) and stores them in a selection named CAs.

select CAs, name CA within 10 of resi 1-10

 

2. The second example uses the bound_to keyword, which lets you select all the atoms directly bonded to a certain selection (<my_selection>).

select <selection_name>, bound_to <my_selection>
Example

The following command selects all the atoms bonded to residue number 70 and stores them in a selection named bonded_70.

select bonded_70, bound_to resi 70
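
A common use of proximity selections is to pick the residues lining a ligand. The sketch below goes slightly beyond the two keywords discussed above: it assumes that the structure contains a ligand matched by the organic keyword, and it uses the around and byres operators, which select atoms near (but not part of) a selection and expand a selection to complete residues, respectively. The 5 Å cutoff is arbitrary.

```pymol
# All atoms within 5 Å of the ligand (the ligand atoms themselves are excluded)
select near_ligand, organic around 5
# Same neighborhood, expanded to whole residues (may also include waters)
select pocket, byres (organic around 5)
# Display the pocket residues as sticks
show sticks, pocket
```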

 

 

PyMOL also lets you select atoms according to their properties. Here we will briefly consider some of them, including secondary structure, B-factor, and chemical class.

 

To select by secondary structure you can use the ss keyword and then specify the secondary structure you want to select (<ss>).

select secondary_structure, ss <ss>
Example

The value of <ss> can be h (helix), s (sheet), or l (loop). You can also combine several of them with the + sign.

select helix, ss h
select sheet, ss s
select helix_sheet, ss h+s
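
Since many commands accept a selection expression directly, the ss keyword is a quick way to color a structure by secondary structure. This sketch assumes that the secondary structure has been assigned (read from the file or computed by PyMOL) and that loops are labeled either l or left blank; the colors are arbitrary:

```pymol
# Helices in red
color red, ss h
# Sheets in yellow
color yellow, ss s
# Loops and unassigned residues in green
color green, ss l+''
```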

 

The B-factor (temperature factor) reflects how much an atom is displaced around its average position, so high values point to the more mobile regions of a protein. If you want to select atoms based on their B-factor, you can use the b keyword followed by one of the “<”, “>”, or “=” operators and a cutoff value (<cutoff>):

select b_factor, b < <cutoff>
Example

To include all the atoms having b-factor less than 10:

select low_bfactor, b < 10
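
The opposite cutoff works in the same way and is useful to highlight flexible regions. The following sketch uses an arbitrary cutoff of 50 and, as an addition to the select command, the spectrum command to color the whole structure by B-factor with the blue_white_red palette:

```pymol
# Atoms with a B-factor above 50 (arbitrary cutoff, adapt it to your structure)
select high_bfactor, b > 50
show sticks, high_bfactor
# Color every atom by B-factor, from blue (low) to red (high)
spectrum b, blue_white_red, all
```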

 

PyMOL also provides a series of specific keywords that you can use to group atoms based on the chemical class they belong to. Some of them are listed in the table below.

| Keyword | Class | Command |
|---|---|---|
| organic | organic compounds (e.g., ligands) | select <selection_name>, organic |
| solvent | water molecules | select <selection_name>, solvent |
| hydrogens | hydrogen atoms | select <selection_name>, hydrogens |
| backbone | backbone atoms | select <selection_name>, backbone |
| sidechain | sidechain atoms | select <selection_name>, sidechain |
| metals | metal atoms | select <selection_name>, metals |
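
These keywords can be passed directly to display commands to clean up a freshly loaded structure. A minimal sketch, assuming a structure with waters, a ligand, and possibly metal ions is already loaded (the choice of representations and colors is only a suggestion):

```pymol
# Hide the water molecules
hide everything, solvent
# Hide the hydrogen atoms, if any are present
hide everything, hydrogens
# Show ligands as sticks and metal ions as spheres
show sticks, organic
show spheres, metals
# Give the backbone a neutral color
color gray, backbone
```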

 

 

The main advantage of the select command is that you can create complex selection expressions by combining all of the rules that I previously showed you using logical operators. The three logical operators available in PyMOL are and, or, and not.

Here are some examples of how you can use these operators to improve the selection in PyMOL:

 

1. The and operator is used to specify that the selection should include only atoms that meet both of the specified criteria.

Example

Let’s say that you want to group all the $C_{\alpha}$ atoms in the residue range 50-60 of your protein in a selection named CA_50_60. You can use the and operator to combine the keyword selecting the residue range (resi 50-60) with the one selecting the $C_{\alpha}$ atoms (name CA).

select CA_50_60, resi 50-60 and name CA

 

2. The or operator is used to specify that at least one of multiple selection criteria must be met.

Example

Let’s say that you want to select every residue in the range 10-20 or 30-40 of your protein. You can use the or operator to join the keywords selecting the two residue ranges (resi 10-20, resi 30-40).

select two_ranges, resi 10-20 or resi 30-40

 

3. The not logical operator reverses the expression that immediately follows.

Example

For instance, if you want to select everything except for $C_{\alpha}$ you can use the keyword needed to select the alpha carbons (name CA) preceded by the not operator.

select not_CA, not name CA

 

You can also create a selection with multiple logical operators at the same time.

For example, you can select every $C_{\alpha}$ in chain A or every $C_{\beta}$ in chain B. Just make sure to use parentheses to group the criteria, so that there is no ambiguity and the operations are performed in the right order.

select CA_A_or_CB_B, (chain A and name CA) or (chain B and name CB)
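
Here are a few more combinations, written as a sketch with hypothetical selection names. The last two commands are intended to select the same atoms, once with an explicit or and once with the + shorthand:

```pymol
# Alpha carbons of chain A, excluding glycines
select CA_A_noGLY, chain A and name CA and not resn GLY
# Alanines and glycines of chain A, with explicit grouping
select ALA_GLY_A, chain A and (resn ALA or resn GLY)
# Same atoms, using the + shorthand for the residue names
select ALA_GLY_A_short, chain A and resn ALA+GLY
```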

 

 

Once you have selected the atoms of interest, you can use the selection to perform operations such as displaying, hiding, changing the representation of, or coloring those atoms. Alternatively, many commands accept a selection expression directly as the argument after the comma.

1. Visualize the selection: you can use the show and hide commands to display or hide the atoms of a selection (my_sele) in the PyMOL viewer using your favorite representation. For example:

show spheres, my_sele
hide spheres, my_sele
# a selection expression can be passed directly instead of a named selection
show spheres, name CA

 

2. Change the color of the selection: You can use the color command to change the color of the selection.

color red, my_sele
color green, name CA
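
To put everything together, here is a short script that goes from loading a structure to a customized view. It is only a sketch: my_protein.pdb is a hypothetical file name that you should replace with your own structure, and the representations and colors are arbitrary choices.

```pymol
# Hypothetical input file; replace it with your own structure
load my_protein.pdb
# Start from a clean cartoon view
hide everything
show cartoon
# Select the alpha carbons that belong to helices
select helix_CA, ss h and name CA
# Display and color the selection
show spheres, helix_CA
color red, helix_CA
# Center the view on the selection
zoom helix_CA
```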

 

Find more info on the coloring and customization process in PyMOL here