How to manage a trajectory file in GROMACS

The gmx trjconv command

 

In the intricate world of GROMACS simulations, the proper management of trajectory files is particularly important.

These files serve as the backbone of molecular dynamics investigations, containing the dynamic behavior of atoms and molecules over time.

In this article, I will start by providing a brief overview of what a trajectory file is and which formats are available in GROMACS.

Then, I will focus on two fundamental commands to handle them:

  1. The gmx trjconv command
  2. The gmx trjcat command

 

 

Let’s start by explaining what trajectory files are and which formats you will encounter in GROMACS.

By now, you should know that molecular simulations are obtained by solving the equations of motion for your system.

At each time step, the equations are solved and you will receive new coordinates, velocities, and forces for each atom. This information is conveniently stored in trajectory files.

Therefore, a trajectory file is just a collection of snapshots capturing the coordinates of the atoms (and, optionally, their velocities and forces) at regular intervals during your simulation.

When it comes to GROMACS simulations, trajectories are stored in two different formats, each serving specific purposes. Understanding these formats is crucial for efficient data management and subsequent analysis.

The two file formats I am referring to are:

  1. The xtc file: this is probably the most commonly used one. Here the coordinates are stored with reduced precision, which makes the files much smaller on disk and therefore more portable.

  2. The trr file: here the files include coordinates, velocities, forces, and energies at higher precision. This higher level of detail makes them larger, and they are only required for some very specific analyses.

 

You will receive these files as outputs from your simulations (namely from the gmx mdrun command), and you can customize them as you wish by setting up the correct parameters in the mdp file.
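
As a rough illustration of the mdp settings involved, the sketch below writes a hypothetical output-control fragment to a temporary file and works out the spacing between saved frames. The option names (nstxout, nstvout, nstfout, nstxout-compressed) are standard mdp output-control options. The values, the 2 fs time step, and the helper name frame_interval_ps are assumptions chosen for this example.

```shell
# Hypothetical output-control fragment; all values are illustrative assumptions.
mdp_fragment="$(mktemp)"
cat > "$mdp_fragment" <<'EOF'
dt                  = 0.002   ; time step in ps (assumed 2 fs)
nstxout             = 0       ; no coordinates written to the trr file
nstvout             = 0       ; no velocities written to the trr file
nstfout             = 0       ; no forces written to the trr file
nstxout-compressed  = 5000    ; write xtc coordinates every 5000 steps
EOF

# Time between saved frames in ps = time step (ps) * output interval (steps).
frame_interval_ps() {
    awk -v dt="$1" -v nst="$2" 'BEGIN { print dt * nst }'
}

echo "xtc frame spacing: $(frame_interval_ps 0.002 5000) ps"
```

With these assumed values only an xtc file is produced, with one frame every 10 ps.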

Furthermore, if you want to visualize them, you should read this article where I explained how to do that using VMD.

Now let’s see how you can play with these files.

 

Description
This command is used to work with trajectory files.

 

The first command we are going to see is gmx trjconv, which will most likely be one of your best friends during your GROMACS journey.

There are countless ways in which you can use this command. Here I just provide you with a list of the ones I use most often.

 

 

The first application we are going to see for this command is reducing the size of an xtc trajectory file.

Sometimes you may generate a trajectory file that is too heavy (on the order of hundreds of GB).

You can imagine that handling these types of files is not really practical. From time to time you may want to transfer the trajectories from one workstation to another, or to your local laptop.

In such cases, it is useful to “lighten” the trajectory so that it takes up less disk space and is easier to transfer.

What we can do is create an additional xtc file with a lower number of frames.

 

So we can use the command:

gmx trjconv -f system.xtc -s system.tpr -dt 10 -o system_reduced.xtc (-n ../index.ndx)
Syntax
  1. Call the program with gmx
  2. Select the trjconv command
  3. Select the -f flag and provide the starting trajectory (system.xtc)
  4. Choose the -s flag and enter the .tpr file
  5. The -dt flag allows us to reduce the number of frames in the output. In our case, we will write one frame every 10 ps in our output file.
  6. Call the -o flag and decide how you want to name the output file (system_reduced.xtc)
Optional
  1. You may also want to include an index file you previously created (index.ndx) via the -n flag. In this way, you can thin out the trajectory and, at the same time, keep only the specific part of the system you are interested in. If you omit the -n flag, GROMACS will still offer you its default index groups to choose from.
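
To get a feeling for how much the -dt flag thins out a trajectory, here is a minimal sketch that counts the frames before and after. It assumes a trajectory that starts at 0 ps, is 100 ns long, and was saved every 2 ps. These numbers and the helper name count_frames are made up for the example.

```shell
# Frames in a trajectory that starts at t = 0 ps:
# one frame per interval, plus the frame at t = 0.
count_frames() {
    echo $(( $1 / $2 + 1 ))
}

total_ps=100000   # assumed trajectory length: 100 ns
saved_every=2     # assumed original spacing between frames, in ps
new_dt=10         # value passed to the -dt flag, in ps

echo "frames before: $(count_frames "$total_ps" "$saved_every")"
echo "frames after:  $(count_frames "$total_ps" "$new_dt")"
```

Under these assumptions the output file holds roughly five times fewer frames than the input.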

 

 

Trajectory files in GROMACS come in two different formats:

  • A “lighter” format named xtc that stores the trajectory with the coordinates of our system in low precision
  • A more memory consuming format named trr with the higher precision trajectory of positions, velocities, and forces during the simulation

You can specify the format you prefer through the mdp file of your simulation.

However, you can also use the gmx trjconv module to convert a trajectory file from one format to another.

 

For instance, if we are interested in converting a trr file into an xtc file we can simply use this command:

gmx trjconv -f system.trr -s system.tpr -o system.xtc (-dt 100 -n index.ndx)
Syntax
  1. Call the program with gmx
  2. Select the trjconv command
  3. Select the -f flag and provide the starting trajectory in the trr format (system.trr)
  4. Choose the -s flag and enter the .tpr file
  5. Call the -o flag and decide how you want to name the resulting trajectory file in the xtc format (system.xtc)
Optional
  1. You can also use the -dt flag to reduce the number of frames in the output as already explained.
  2. Also in this case, you may be interested in including an index file you previously created (index.ndx) via the -n flag.
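
If you have several trr files to convert, a small loop can save some typing. The sketch below is a dry run: it only prints the commands it would execute. The file names (run_1.trr, run_2.trr, and the matching tpr files) and the helper name xtc_name are placeholders I made up for the example.

```shell
# Derive the xtc file name from a trr file name by swapping the extension.
xtc_name() {
    echo "${1%.trr}.xtc"
}

# Dry run: print one conversion command per trr file (placeholder names).
for trr in run_1.trr run_2.trr; do
    echo "gmx trjconv -f $trr -s ${trr%.trr}.tpr -o $(xtc_name "$trr")"
done
```

Once the printed commands look right, you can remove the echo and the surrounding quotes to actually run them.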

 

 

We can use the gmx trjconv command to cut the trajectory and obtain a new one that spans the interval between two selected times.

 

Here is the command:

gmx trjconv -f system.xtc -s system.tpr -b 0 -e 100 -o system_cut.xtc (-n ../index.ndx)
Syntax
  1. Call the program with gmx
  2. Select the trjconv command
  3. Select the -f flag and provide the starting trajectory in the preferred format (system.xtc)
  4. Choose the -s flag and enter the .tpr file (system.tpr)
  5. The -b flag sets the time of the first frame of the new trajectory (0 ps)
  6. The -e flag sets the time of the last frame (100 ps)
  7. Call the -o flag and decide the name of the output file in the xtc format (system_cut.xtc)
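
The same -b and -e flags can be used to split a long trajectory into consecutive pieces. Here is a minimal dry-run sketch that prints one command per time window. The helper name print_window_commands, the 100 ps window, and the output file names are my own choices for the example, not something GROMACS provides.

```shell
# Print one trjconv command per time window (dry run, nothing is executed).
# Arguments: start time, end time, and window length, all in ps.
print_window_commands() {
    b=$1
    while [ "$b" -lt "$2" ]; do
        e=$(( b + $3 ))
        # Do not go past the requested end time in the last window.
        if [ "$e" -gt "$2" ]; then e=$2; fi
        echo "gmx trjconv -f system.xtc -s system.tpr -b $b -e $e -o system_${b}_${e}.xtc"
        b=$e
    done
}

print_window_commands 0 300 100
```

Keep in mind that, since both -b and -e refer to frames that are kept, a frame sitting exactly on a shared boundary will most likely show up in both neighbouring pieces.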

 

 

You can also use the gmx trjconv command to extract a frame from your trajectory and create the corresponding gro file with the structure of the system in that specific timestep.

The main idea is the one we just saw in the previous example. We just need to use the -dump flag and specify a time value. By doing so, we get the single frame that is closest to the time we selected, and we can save it as a gro file.

 

Here is an example of how to extract the frame closest to time $t = 10$ ps.

gmx trjconv -f system.xtc -s system.tpr -dump 10 -o system.gro (-n ../index.ndx)
Syntax
  1. Call the program with gmx
  2. Select the trjconv command
  3. Select the -f flag and provide the starting trajectory in the preferred format (system.xtc)
  4. Choose the -s flag and enter the .tpr file
  5. The -dump flag is followed by the time value (10 ps)
  6. Call the -o flag and decide the name of the output file in the gro format (system.gro)
Optional
  1. Also in this case, you may be interested in including an index file you previously created (index.ndx) via the -n flag. In this way, you can extract the structure of just a specific part of the system.

 

This approach might be a little slow when you want to extract a frame from a long trajectory, as GROMACS will need to scan through all the frames before getting to the one you need. Here I give you a little bonus trick.

If you want to extract the frame at 100 ns (that is, 100000 ps) you can play with the -b and -e flags that I showed you in the previous example:

gmx trjconv -f system.xtc -s system.tpr -b 100000 -e 100000 -o system.gro (-n ../index.ndx)

 

GROMACS will start scanning the trajectory from the time you specified with the -b flag and will immediately stop since the time is the same as the one specified in the -e option. This will give you the frame you need in a matter of seconds.
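
Since the time flags expect picoseconds by default, it is easy to slip on the number of zeros. A tiny helper can do the conversion for you. This is just a sketch: the helper name ns_to_ps is made up, it only handles whole nanoseconds, and the command is printed rather than executed.

```shell
# Convert a whole number of ns to ps (the default time unit of the time flags).
ns_to_ps() {
    echo $(( $1 * 1000 ))
}

t_ps=$(ns_to_ps 100)

# Dry run: print the command that would extract the frame at 100 ns.
echo "gmx trjconv -f system.xtc -s system.tpr -b $t_ps -e $t_ps -o system.gro"
```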

 

 

During an MD simulation you can encounter one of the following “problems” when you load the resulting system in your favourite molecular visualization software (e.g., PyMOL):

  • The molecule/protein is not centered in the simulation box
  • The molecule/protein you are simulating is broken into different pieces
  • You see strangely elongated bonds when visualizing the structure of your resulting simulation

You shouldn’t worry about this. It is completely normal and is just a result of the implementation of Periodic Boundary Conditions (PBC). Reading the article where we discussed PBC should clarify the situation.

If you want to switch everything back to normal to have a proper visualization of your system you can use the gmx trjconv command.

Through this module, you can center a specific part of your system, such as a molecule or a protein, in your simulation box.

 

The command is as follows:

gmx trjconv -f system.gro -s system.tpr -pbc mol -center -ur compact -o centered.gro (-n index.ndx)
Syntax
  1. Call the program with gmx
  2. Select the trjconv command
  3. Select the -f flag and provide the starting structure in the gro format (system.gro)
  4. Choose the -s flag and enter the .tpr file
  5. The -pbc mol option puts the center of mass of each molecule in the box, while the -center option centers the group you select in the box
  6. Call the -ur compact to put all atoms at the closest distance from the center of the box.
  7. As always, the -o flag is used to name the output file (centered.gro)
Optional
  1. You will need a special index file if you want to center a “non-standard” part of your system. You can specify it via the -n flag.

Finally, you will be asked to:

1. Select the group that you want to center in the box.

2. Select the group that you want in the output file

Example

To center a protein in the simulation box, and output a gro file containing the entire system you have to select:

  1. “Protein” as group 1
  2. “System” as group 2
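
If you run this command in a script, you can answer the two prompts in advance by piping the group names to gmx trjconv. The sketch below assumes that your system has the default “Protein” and “System” groups and that the prompts accept group names on standard input. The helper name group_answers is made up, and the real command only runs if gmx and the input files are found.

```shell
# Build the answers for the prompts: one group per line,
# first the group to center, then the group to write to the output.
group_answers() {
    printf '%s\n' "$@"
}

# Only run the real command when gmx and the input files are available.
if command -v gmx >/dev/null 2>&1 && [ -f system.gro ] && [ -f system.tpr ]; then
    group_answers Protein System |
        gmx trjconv -f system.gro -s system.tpr -pbc mol -center -ur compact -o centered.gro
else
    echo "gmx or input files not found; these answers would be piped:"
    group_answers Protein System
fi
```

Using the group names instead of the numbers makes the script a bit more robust if the numbering of your groups changes.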

 

 

Sometimes you may want to convert a gro file into a pdb file. GROMACS allows you to do that using the gmx trjconv module in this way:

gmx trjconv -s system.tpr -f system.gro -o system.pdb -pbc whole -conect (-n index.ndx)
Syntax
  1. Call the program with gmx
  2. Select the trjconv command
  3. Choose the -s flag and enter the tpr file
  4. Select the -f flag and provide the gro file you want to convert (system.gro)
  5. Call the -o flag and decide the name of the output file in the pdb format (system.pdb)
  6. The -pbc flag specifies the Periodic Boundary Conditions (PBC) treatment. Through the whole option we make all broken molecules whole.
  7. The -conect flag is needed to write the CONECT records in the output pdb file
Optional
  1. Also in this case, you may be interested in including an index file you previously created (index.ndx) via the -n flag. In this way, you can extract the structure of just a specific part of the system.

 

GROMACS will ask you to provide the group you want in your pdb file. You can select the overall system or any part of the system you desire. For instance, if you simulated a protein in water but you only want the pdb file of the protein without the solvent you can select Group 1 "Protein".

 

Description
This command is used to concatenate different trajectory files.

 

The second command we are going to see is gmx trjcat. Through this command, we can join two or more different trajectory files.

This module has far fewer applications than the previous one, but it may still be useful in a few cases.

The command is this one:

gmx trjcat -f traj_1.xtc traj_2.xtc -o final_traj.xtc (-settime -cat)
Syntax
  1. Call the program with gmx
  2. Select the trjcat command
  3. Select the -f flag and provide two trajectories in the preferred format (traj_1.xtc, traj_2.xtc).
  4. Call the -o flag and decide the name of the output file in the xtc format (final_traj.xtc)
Optional
  1. The -settime option allows you to interactively select the starting time for all the trajectories you want to concatenate.
  2. By default, GROMACS will overwrite frames having the same timestamps. If you want to retain all of them you can use the -cat option.

 

If you have multiple xtc files in your directory you can simply use *xtc instead of explicitly passing all the trajectories. GROMACS will automatically order all the files depending on the time value and then proceed to join them.
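
One detail about the wildcard: the shell expands it in alphabetical order, so traj_10.xtc ends up before traj_2.xtc on the command line. If you prefer the command to list the files in run order (for instance, to recognise them more easily when using -settime), you can sort them yourself. The sketch below uses empty placeholder files with made-up names in a scratch directory, relies on the -V option of sort (available in GNU and recent BSD versions), and only prints the final command.

```shell
# Scratch directory with empty placeholder files (hypothetical names).
workdir="$(mktemp -d)"
touch "$workdir/traj_1.xtc" "$workdir/traj_2.xtc" "$workdir/traj_10.xtc"

# Plain wildcard expansion is alphabetical: traj_10 sorts before traj_2.
glob_order="$(cd "$workdir" && echo *xtc)"

# Version sort puts the numbered files in their natural order.
natural_order="$(cd "$workdir" && ls *xtc | sort -V | tr '\n' ' ' | sed 's/ $//')"

echo "wildcard order: $glob_order"
echo "natural order:  $natural_order"

# Dry run: print the concatenation command with the files in natural order.
echo "gmx trjcat -f $natural_order -o final_traj.xtc"

rm -r "$workdir"
```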