Please consult the forum for additional answers or to ask a new question.
You need to obtain a license to receive a username and password that will let you download PyRosetta.
To run PyRosetta, your computer needs to run either Mac OS X (v10.5), Windows, or Linux and must have Python v2.5 already installed. Please see the installation page for more information.
PyRosetta now supports 64-bit Linux and Mac OS X platforms along with 32-bit Linux and Windows.
Once you have obtained a license and downloaded (and unzipped) PyRosetta, the easiest way to get started is by going through the PyRosetta step-by-step tutorial. For a more interactive exposure to PyRosetta, please consult the sample scripts.
We have recently published an Applications Note on PyRosetta in the journal Bioinformatics (Advance Access available):
PyRosetta is a Python-script-based front-end to the Rosetta molecular modeling suite. Rosetta, a collaborative project among more than 15 labs worldwide, requires users to have substantial experience in C++ and Rosetta software development to write custom algorithms. Through Python bindings to the Rosetta C++ source code, PyRosetta gives the end user access to the same Rosetta functions available to Rosetta developers through an easy-to-use, Python-script-based interface. Robetta is a server available online for non-commercial use of Rosetta applications. Rosetta algorithms cited in the literature are the same as (or similar to) the code implemented by Robetta and PyRosetta.
The PyRosetta Toolkit GUI add-on to PyRosetta is written in Python using the Tkinter GUI package, which is included in the standard Python distribution. The program is a set of modules that rely on importing PyRosetta for interactive protein modelling, design, and analysis. It can be extended and customized like many of the scripts included in PyRosetta. PyMOL is a protein visualization package, independent of PyRosetta. The Toolkit GUI has functions to send information (structures, energies, hydrogen bonds, etc.) to an open PyMOL window through the PyMOL-PyRosetta server link.
Yes, PyRosetta algorithms, like Rosetta, are easily scaled up to a large number of parallel processes. The
If you have questions about how to install or use PyRosetta please use RosettaCommons forums. For other issues please contact support.
Generally, no! Rosetta is a suite of algorithms and protocols, but the underlying algorithm for many protocols is Markov Chain Monte Carlo (MCMC), which is stochastic (random). Abstractly, MCMC algorithms allow users to draw samples from a distribution without explicit knowledge of the entire distribution (though some knowledge about constraints on the distribution is required). Rosetta was constructed to perform structural changes and scoring efficiently so that individual trajectories are computationally inexpensive. Thus, an individual trajectory, such as a single ab initio job, is not expected to yield a very good prediction. With enough trials (adequate sampling), one or more trials will yield very good (low-scoring) estimates of the unknown structure.
Because the algorithm is stochastic, the real output (with adequate sampling) is a distribution of structures. However, the search space for nearly all protein conformation questions is too large to extract statistical characteristics easily. Many statistical techniques can be used to analyze the set of structures (decoys) output by a Rosetta protocol, but none are universally applicable. Put simply, each new trajectory has a chance of producing a low-scoring (most likely native-like) structure, so more trajectories yield a greater chance of producing a realistic structure.
Usually 800 or more trajectories are enough to produce useful results, although this depends on the application. When developing new algorithms, construct a benchmarking set of structures to compare against. As is typical with bioinformatics software, some parameters of any algorithm will be tuned on a trial set. This does not mean individual trials in PyRosetta are useless. At the very least, their output is indicative of the sampling performed by the algorithm.
Several tools in PyRosetta are deterministic (such as minimization). When testing new protocols or Movers, try to learn whether the application has a lot of variability or performs only small random changes.
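The claim that many cheap, independent trajectories beat a single one can be illustrated without PyRosetta. The sketch below is a toy, not Rosetta code: a rugged one-dimensional function stands in for a score function, and each trajectory is a short Metropolis Monte Carlo run (accept every downhill move, accept uphill moves with probability exp(-dE/kT)). The energy function, temperature, step size, and trajectory counts are arbitrary assumptions chosen for illustration.

```python
import math
import random


def toy_energy(x):
    """Rugged 1-D landscape standing in for a Rosetta score (lower is better)."""
    return 0.1 * x * x + 2.0 * math.sin(3.0 * x)


def metropolis_trajectory(rng, steps=200, kt=1.0, step_size=0.5):
    """Run one short Metropolis Monte Carlo trajectory; return its lowest energy."""
    x = rng.uniform(-10.0, 10.0)              # random starting "structure"
    energy = toy_energy(x)
    best = energy
    for _ in range(steps):
        trial = x + rng.gauss(0.0, step_size)  # small random perturbation
        trial_energy = toy_energy(trial)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / kT).
        if delta <= 0 or rng.random() < math.exp(-delta / kt):
            x, energy = trial, trial_energy
            best = min(best, energy)
    return best


def best_of(n_trajectories, seed=0):
    """Lowest energy found across n independent trajectories ("decoys")."""
    rng = random.Random(seed)
    return min(metropolis_trajectory(rng) for _ in range(n_trajectories))


if __name__ == "__main__":
    for n in (1, 10, 100, 800):
        print(f"{n:4d} trajectories -> best energy {best_of(n):.3f}")
```

Because every call seeds the generator the same way, the first n trajectories are identical across calls, so the best energy can only stay the same or drop as n grows. With a real protocol, the same reasoning is why hundreds of decoys are generated and the lowest-scoring ones inspected.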
Help is accessed in PyRosetta with several similar looking commands:
These help messages are generated from comments inside the Rosetta source code. Some effort has been made to standardize and improve them; however, only a small portion of the essential objects and methods have in-depth help messages. The text output is a summary of an object's purpose or a method's function. IPython also supports tab-completion, which is extremely useful for exploring object methods. Help can be accessed from the object name itself or from instances of the object.
PyRosetta contains nearly all of Rosetta. Various restrictions prevent some templated methods from becoming part of PyRosetta, but in almost every case there is a workaround. Many Rosetta objects may appear not to work; however, they almost always work properly, though they may have nuances to their usage or make inherent assumptions that the user does not want. Unfortunately, we cannot easily provide a list of methods that are working or not working. Fortunately, it is easy to test object functionality in PyRosetta. If the object requires a lot of setup (depends on other data structures), I recommend writing a short script to test out the object. Many errors in PyRosetta cause segmentation faults, which end the IPython (or Python) interpreter session and hamper object testing. (This typically results from poor error-checking in the C++ code, so please feel free to submit a bug report when this happens.) From experience, typing "from rosetta import *" gets tedious fast, and you must know what an object or method does before using it.
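Because PyRosetta objects are Python objects as far as the interpreter is concerned, the standard help(), dir(), and inspect tools also work outside IPython. The helper names below (public_members, first_doc_line) are hypothetical; the sketch runs them on the built-in list type so it works without PyRosetta, but you could pass a Pose instance instead to get a quick, tab-completion-style overview of its methods.

```python
import inspect


def public_members(obj):
    """Names a tab-completing shell would offer, minus private/dunder names."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


def first_doc_line(obj, name):
    """First line of a member's docstring, or '' if it has none."""
    doc = inspect.getdoc(getattr(obj, name)) or ""
    return doc.splitlines()[0] if doc else ""


if __name__ == "__main__":
    # Demonstrated on a built-in type; with PyRosetta loaded you would pass
    # a Pose instance (or the Pose class) instead.
    for name in public_members(list):
        print(f"{name:10s} {first_doc_line(list, name)}")
```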
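One way to keep a segmentation fault from ending your interactive session is to try the object in a separate interpreter. The sketch below, a hypothetical probe() helper built on the standard subprocess module, runs a snippet in a fresh Python process and reports whether it finished, raised an error, or was killed by a signal (on Linux and macOS a crashed child gets a negative return code, -11 for SIGSEGV). The PyRosetta snippet in the usage example is an assumption about what you might want to test.

```python
import subprocess
import sys
import textwrap


def probe(snippet, timeout=60.0):
    """Run a Python snippet in a fresh interpreter and report how it ended.

    Returns (status, output): status is "ok", "error" (an exception or a
    non-zero exit), or "signal N" (the child was killed by signal N).
    """
    result = subprocess.run(
        [sys.executable, "-c", textwrap.dedent(snippet)],
        capture_output=True, text=True, timeout=timeout,
    )
    output = result.stdout + result.stderr
    if result.returncode == 0:
        return "ok", output
    if result.returncode < 0:          # POSIX: negative code means killed by a signal
        return f"signal {-result.returncode}", output
    return "error", output


if __name__ == "__main__":
    # Hypothetical test of a PyRosetta object; a crash here cannot kill this session.
    status, log = probe("""
        import rosetta
        rosetta.init()
        print("init finished")
    """)
    print(status)
```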
The tutorials and sample scripts demonstrate what commonly used objects and methods work in PyRosetta and how to use them. For more information, consult the documentation.
Please consult the documentation.
The easiest way to find what is hidden in PyRosetta is to use tab-completion with the
Within Rosetta, several simple objects are used for basic data structures. If these are seen within PyRosetta help, they can be replaced by their appropriate Python data type.
Within Rosetta, Vector objects are used for various list structures. The common Vector objects are found in various locations. Please consult the question below for more information.
Rosetta has its roots in FORTRAN, so counting is "1-indexed" (the first element is numbered 1). Python, on the other hand, is "0-indexed" (the first element is numbered 0). The documentation discusses this in a little more depth.
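A small pair of conversion helpers keeps the two conventions from mixing. The helper names are hypothetical; the underlying point is that, for example, the string returned by a Pose's sequence() method is an ordinary 0-indexed Python string, while residue numbers passed to pose.residue() start at 1. The sequence below is made up so the sketch runs without PyRosetta.

```python
def rosetta_to_python(resnum):
    """Convert a 1-indexed Rosetta position to a 0-indexed Python index."""
    if resnum < 1:
        raise ValueError("Rosetta numbering starts at 1")
    return resnum - 1


def python_to_rosetta(index):
    """Convert a 0-indexed Python index to a 1-indexed Rosetta position."""
    if index < 0:
        raise ValueError("negative Python indices have no Rosetta equivalent")
    return index + 1


sequence = "MKTAYIAK"   # hypothetical sequence, e.g. copied out of a Pose
# Residue 1 in Rosetta numbering is sequence[0] in Python.
assert sequence[rosetta_to_python(1)] == "M"
# Loop over every residue using Rosetta-style numbering (1..n inclusive).
for resnum in range(1, len(sequence) + 1):
    print(resnum, sequence[rosetta_to_python(resnum)])
```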
Yes! Very well! One advantage of having Rosetta accessible in Python is the ease of using other Bioinformatics software. Biological software is usually made public as a commandline executable or even within Python. Python serves as a wonderful language to "glue" other programs and processes together. Combining PyRosetta with other tools can greatly enhance analysis and only requires one (okay two, you should be able to get around in bash) language. Some Python programs or libraries frequently used with PyRosetta are:
matplotlib (and pylab)
Yes, and there are a lot of ways to do this. Many biological tools are separate programs or executables and Python is a perfect place to stitch together programs that share Python or require system calls from the commandline. Biopython even has a set of Python objects for handling these system calls. The most basic method uses the
Example system call:
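As a minimal sketch, the standard library's subprocess module can run an external program and capture its output. run_tool() is a hypothetical helper name, and a second Python interpreter stands in for whatever command-line tool (a secondary-structure or alignment program, for instance) you actually want to call.

```python
import subprocess
import sys


def run_tool(args, input_text=None):
    """Run an external program and return its standard output as text.

    Raises subprocess.CalledProcessError if the program exits with an error.
    """
    result = subprocess.run(
        args,
        input=input_text,
        capture_output=True,   # collect stdout/stderr instead of printing them
        text=True,             # decode bytes to str
        check=True,            # raise on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command: replace the argument list with the executable and
    # flags of the tool you actually want to run.
    print(run_tool([sys.executable, "-c", "print('hello from a subprocess')"]))
```

Passing the command as a list rather than a single shell string avoids quoting problems with file names that contain spaces.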
The Pose object represents a single molecule within PyRosetta. These molecular objects can be created (see the questions below) or constructed from a PDB file. PyRosetta is intended to work with proteins but can successfully load other compounds (with additional work). Rosetta has numerous naming preferences but is capable of loading many (if not all) protein structures. The simplest method for creating a Pose object is to load it from a PDB file using the method
The Pose is a complex data structure composed of various objects. An abstract summary of the Pose structure is provided in the documentation. A more detailed and accurate description of the Pose data structure is found in the paper A. Leaver-Fay et al., "ROSETTA3: An object-oriented software suite for the simulation and design of macromolecules," Methods in Enzymology 487, 548-574 (2011).
Missing atoms and other little errors are gracefully handled by PyRosetta. However, small clashes or other discrepancies can cause problems with some Rosetta protocols. Methods for converting a raw crystal structure into a Rosetta-solution-state structure are varied. For more information, please consult the documentation and the Rosetta user guide.
Rosetta is currently very verbose. This is handy when debugging problems but can be jarring at first due to the sheer volume of seemingly useless output. When loading a pose from a PDB file, the output will indicate which atoms are missing (and thus idealized) and notify you of other problems. Generally, you can ignore this output even if it reports a problem. If the PDB file is read successfully, you may want to check its sequence or view it using the PyMOL_Mover, but the protein has most likely been loaded without any trouble.
Yes. The Pose object has methods for deleting and appending (inserting) residues. In addition, there are more functions and classes in the
The information, however, is not lost. It is simply tagged by Rosetta as obsolete. There are two ways to approach this if your original PDB information is needed. For deletions, simply use the function:
If using this method to produce a pose with non-protein residues (molecules), you can access these ResidueTypes using a single letter with its other identifier code in brackets. The single letter code and other identifiers can be found in the .params files within PyRosetta
Different ResidueTypeSets than fullatom( "
There are several ways to change a protein at the sequence level using PyRosetta. You can (a) setup a PackerTask to redesign the protein with the desired sequence changes, (b) use the method
a) Please consult the sample script packer_task.py for syntax on setting up a PackerTask to perform design manually or using a resfile. Since this option uses a Mover and allows multiple changes, it is the most efficient method of changing a protein's sequence.
b) PyRosetta 1.0 and 2.0 have an exposed method
from toolbox import mutate_residue
This method is easy to use and best suited for investigations using the interpreter or sequence changes performed outside of a protocol. Additionally, a packing shell can be created around the residue to mutate.
c) The Mover MutateResidue performs the same change as the old mutate_residue method, and thus does not allow direct repacking of the sidechains near the mutant. Since this Mover's target residue and mutant identity must be set each time, it is the least efficient option.
PyRosetta knows the typical deoxynucleic acids as the ResidueTypes
Remember, for docking applications the downstream partner (later chain or chains) is docked to the upstream partner (first chain or chains), which is held in a fixed position. Thus, for any DNA-protein docking application, it is most intuitive to dock the DNA to the protein, which requires the DNA chains to come after the protein chains in the PDB file.
The sample script dna_interface.py also outlines and explains this process.
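If a PDB file lists the DNA before the protein, the chains can be reordered before loading. The sketch below is a hypothetical helper, not part of PyRosetta: it assumes a plain-text PDB file in which DNA residues use the standard names DA, DC, DG, and DT and in which waters and other ligands have already been removed (a chain containing any other residue name is treated as protein). Only ATOM/HETATM records are kept, with a TER record after each chain.

```python
DNA_RESNAMES = {"DA", "DC", "DG", "DT"}   # PDB residue names for deoxynucleotides


def reorder_dna_last(pdb_text):
    """Return PDB text with protein chains first and DNA-only chains last.

    Only ATOM/HETATM records are kept; the relative order of chains within
    each group (protein, DNA) is preserved.
    """
    chains = {}   # chain ID -> list of records, in order of first appearance
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")) and len(line) > 21:
            chains.setdefault(line[21], []).append(line)   # chain ID, column 22

    def is_dna(records):
        # Residue name occupies columns 18-20.
        return all(rec[17:20].strip() in DNA_RESNAMES for rec in records)

    protein = [cid for cid, recs in chains.items() if not is_dna(recs)]
    dna = [cid for cid, recs in chains.items() if is_dna(recs)]

    out = []
    for cid in protein + dna:
        out.extend(chains[cid])
        out.append("TER")        # terminate each chain
    out.append("END")
    return "\n".join(out) + "\n"
```

The result can be written back to a new file and loaded as usual; header records such as REMARK or CONECT are not carried over.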
Typically, you simply want to remove the water, since PyRosetta does not use water molecules in any application and they can cause problems. If you want to load water molecules into PyRosetta, you must activate them and edit the PDB file. PyRosetta knows water as the ResidueTypes TP3 and TP5; however, these are "turned off" by default. To edit PyRosetta so that it will always know water HETATM lines, edit the file
This file is the master list of fullatom ResidueTypes. As you probably guessed, the water .params files can be found in
To properly load the PDB file, you must also edit its water HETATM lines to have TP3 (or TP5) in the PDB resName column (characters 18-20). Usually a PDB file will have "
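Both options for water can be sketched as plain text processing on the PDB file. The helpers below are hypothetical, and the set of water residue names they look for (HOH and WAT) is an assumption about your input; renaming only helps once the TP3 (or TP5) ResidueType has been turned on as described above. Because PDB records are fixed-width, the rename replaces exactly columns 18-20 and leaves every other column where it was.

```python
WATER_NAMES = {"HOH", "WAT"}   # assumed water residue names; adjust for your files


def is_water(line):
    """True for ATOM/HETATM records whose residue name (columns 18-20) is water."""
    return (line.startswith(("ATOM  ", "HETATM"))
            and line[17:20].strip() in WATER_NAMES)


def remove_waters(pdb_text):
    """Drop every water record (the usual choice for PyRosetta)."""
    return "".join(line for line in pdb_text.splitlines(keepends=True)
                   if not is_water(line))


def rename_waters(pdb_text, new_name="TP3"):
    """Rewrite the residue name of water records, keeping fixed-width columns."""
    if len(new_name) != 3:
        raise ValueError("PDB residue names occupy exactly three columns")
    out = []
    for line in pdb_text.splitlines(keepends=True):
        if is_water(line):
            line = line[:17] + new_name + line[20:]   # replace columns 18-20 only
        out.append(line)
    return "".join(out)
```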
10. How do I load in a PDB file containing other molecules?
The Pose object has a
Similarly for setting a pose based on a similar list structure:
There are many ways of extracting coordinate information. Please consult Workshop #2, sample script ala_scan.py, or tool script extract_coords_pose.py for more information.
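For a quick check outside PyRosetta, coordinates can also be read straight from the fixed columns of ATOM/HETATM records (atom name in columns 13-16, chain in 22, residue number in 23-26, x/y/z in 31-38, 39-46, and 47-54). The atom_coordinates() helper is hypothetical and deliberately simple: it ignores insertion codes and keeps only the last alternate location of each atom. The two sample records are made up for illustration.

```python
import math


def atom_coordinates(pdb_text):
    """Map (chain, residue number, atom name) -> (x, y, z) for ATOM/HETATM records."""
    coords = {}
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        key = (
            line[21],              # chain ID, column 22
            int(line[22:26]),      # residue number, columns 23-26
            line[12:16].strip(),   # atom name, columns 13-16
        )
        # x, y, z occupy columns 31-38, 39-46 and 47-54 (8 characters each).
        # A later altLoc record for the same atom overwrites an earlier one.
        coords[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


sample = (
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504\n"
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147\n"
)
coords = atom_coordinates(sample)
# Distance (in Angstroms) between the N and CA atoms of residue A1.
print(round(math.dist(coords[("A", 1, "N")], coords[("A", 1, "CA")]), 2))  # prints 1.46
```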
The method varies depending on the accuracy you desire. Rosetta is equipped with an
For more accurate inquiries, Rosetta is an ideal tool for extracting atomic coordinates (in the example below, a Python list is produced from the
13. How do I align two Pose objects?
14. How do I calculate the RMSD between two Pose objects?
15. How do I extract secondary structure information?
16. How do I write PDB files?
17. Can I send structures directly to PyMOL without writing to files?
18. What is a ResidueTypeSet?
19. What is a FoldTree (or for that matter an Edge or a Jump object)?
20. What is an "AtomID" object?
1. How do I create a ScoreFunction?
2. What are the scoring units?
3. How do I extract individual residue scores?
4. How do I extract individual atom scores?
5. What are the different ScoreFunctions?
6. How do I investigate ONLY the score between two residues?
7. How do I extract hydrogen bond information?
8. How do I know what a score term calculates?
9. How do I find new score terms?
10. Can I see scores in PyMOL?
1. What is a Mover?
2. What Mover classes are available in PyRosetta?
3. What is the best way to create a sequence of Movers?
4. How do I know what structural changes are actually performed by a Mover?
5. Minimization increased the score (or moved docking partners far apart)?
6. Packing doesn't always yield the same score (or rotamers)?
1. Which Rosetta protocols for modeling membrane proteins are available in PyRosetta?
- RosettaMP is available in PyRosetta
- the ddG application is also accessible from PyRosetta, and there is a sample Python script that you can adapt.
- RosettaMPDock is now only available in the Rosetta C++
2. Is the Rosetta ab initio protocol available in PyRosetta?
3. Is Rosetta fragment selection/generation available in PyRosetta?
4. What is docking and what protocols best match my problem?
5. What is wrong with my docking?
6. How many trajectories should I run?
The Rosetta commandline options are accessed during initialization of PyRosetta(
To set PyRosetta initialization options:
1. import rosetta (
2. While calling rosetta.init(), pass all of the options as a string to the init() function.
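When the option list grows, it can help to build the string programmatically. build_init_options() below is a hypothetical helper; the flags in the example (-ex1, -ex2, -ignore_unrecognized_res, and -mute all, the last of which silences Rosetta's verbose tracer output) are standard Rosetta options, but check how your PyRosetta version's init() expects to receive the string before passing it in.

```python
def build_init_options(flags):
    """Turn {"flag": value} pairs into a Rosetta-style option string.

    True emits the bare flag, lists/tuples are space-joined,
    None or False skips the flag, and any other value is appended as text.
    """
    parts = []
    for flag, value in flags.items():
        if value is None or value is False:
            continue
        name = flag if flag.startswith("-") else "-" + flag
        if value is True:
            parts.append(name)
        elif isinstance(value, (list, tuple)):
            parts.append(name + " " + " ".join(str(v) for v in value))
        else:
            parts.append(f"{name} {value}")
    return " ".join(parts)


options = build_init_options({
    "ex1": True,                      # extra chi1 rotamers when packing
    "ex2": True,                      # extra chi2 rotamers
    "ignore_unrecognized_res": True,  # skip residues with no known ResidueType
    "mute": "all",                    # silence the verbose tracer output
})
print(options)   # -ex1 -ex2 -ignore_unrecognized_res -mute all
# With PyRosetta installed, the string is then passed to init() as in step 2,
# e.g. rosetta.init(options); the exact keyword accepted depends on the version.
```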
If you want to get or set global Rosetta options within PyRosetta after initialization, you can either call init again or use getter and setter functions for setting options.
Generally, the syntax of the getter methods is:
The full list of getters (in newer versions of PyRosetta) is:
Generally, the syntax of the setter methods is:
The full list of setters (in newer versions of PyRosetta) is:
If these methods do not work, you may need to import them (sorry, the code has changed versions and this is not the place to explain these decisions). Try:
This may cause errors if you have altered your environment path variables. For more information, see
Use the "
For example, to create an "ALA" residue: