Please consult the forum for additional answers or to ask a new question.

1. How do I obtain and install PyRosetta?
You need to obtain a license to receive a username and password that will let you download PyRosetta. To run PyRosetta, your computer needs to run Mac OS X (v10.5), Windows, or Linux and must have Python v2.5 already installed. Please see the installation page for more information.

2. What if I have questions?
If you cannot find the answers to your questions here, please check the RosettaCommons forums for solutions or post your own query. For more information on Rosetta's architecture, data structures, and custom file syntax, consult the user guide.

3. Is PyRosetta available for 64-bit Linux or Windows platforms?
PyRosetta v1.1 now supports 64-bit Linux platforms along with 32-bit Linux and Mac OS X. A beta version of PyRosetta v1.1 is now available for Windows.

4. How do I get started?
Once you have obtained a license and downloaded (and unzipped) PyRosetta, the easiest way to get started is to work through the PyRosetta step-by-step tutorial and the PyRosetta manual. For a more hands-on introduction to PyRosetta, please consult the sample scripts.

5. How do I cite PyRosetta?
We have recently published an Applications Note on PyRosetta in the journal Bioinformatics (advance access available).

6. What is the relationship between PyRosetta, Rosetta, and Robetta?
PyRosetta is a Python-based front end to the Rosetta molecular modeling suite. Rosetta, a collaborative project among more than 15 labs worldwide, requires users to have substantial experience in C++ and Rosetta software development to write custom algorithms. Through Python bindings to the Rosetta C++ source code, PyRosetta gives the end user access to the same Rosetta functions available to Rosetta developers through an easy-to-use Python scripting interface. Robetta is a server available online for non-commercial use of Rosetta applications.
Rosetta algorithms cited in the literature are the same as (or similar to) the code implemented by Robetta and PyRosetta.

7. Does PyRosetta allow for parallel processing and high performance computing?
Yes. PyRosetta algorithms, like Rosetta's, are easily scaled up to a large number of parallel processes. The JobDistributor object in PyRosetta makes it simple to scale up simulations across multiple processes. Please see Section 4.4 of the PyRosetta Manual for more information.

If you have questions about how to install or use PyRosetta, please use the RosettaCommons forums. For other issues, please contact support.

Basic Applications Questions

1. Are PyRosetta algorithms deterministic?
Generally, no! Rosetta is a suite of algorithms and protocols, but the underlying algorithm for many protocols is Markov Chain Monte Carlo (MCMC), which is stochastic (random). Abstractly, MCMC algorithms allow users to draw samples from a distribution without explicit knowledge of the entire distribution (though some knowledge about constraints on the distribution is required). Rosetta was constructed to perform structural changes and scoring efficiently so that individual trajectories are computationally inexpensive. Thus, an individual trajectory, such as a single ab initio job, is not expected to yield a very good prediction. With enough trials (adequate sampling), one or more trials will yield very good (low-scoring) estimates of the unknown structure. Because the algorithm is stochastic, the real output (with adequate sampling) is a distribution of structures. However, the search space for nearly all protein conformation questions is too large to extract statistical characteristics easily. Many statistical techniques can be used to analyze the set of structures (decoys) output by a Rosetta protocol, but none are universally applicable.
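As a rough illustration of the stochastic behavior described above, the sketch below runs a toy Metropolis Monte Carlo search on a made-up one-dimensional score. It does not use PyRosetta: toy_score, run_trajectory, and best_of are hypothetical names invented for this example, and the landscape, step size, kT value, and trajectory counts are arbitrary assumptions.

```python
import math
import random


def toy_score(x):
    """Rugged 1-D stand-in for an energy landscape (NOT a Rosetta score)."""
    return 0.1 * x * x + math.sin(5.0 * x)


def run_trajectory(seed, n_steps=200, kT=1.0):
    """Run one Metropolis Monte Carlo trajectory; return the lowest score seen."""
    rng = random.Random(seed)              # each trajectory gets its own seed
    x = rng.uniform(-10.0, 10.0)           # random starting "conformation"
    score = toy_score(x)
    best = score
    for _ in range(n_steps):
        trial = x + rng.gauss(0.0, 0.5)    # small random perturbation
        trial_score = toy_score(trial)
        delta = trial_score - score
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / kT)
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            x, score = trial, trial_score
            best = min(best, score)
    return best


def best_of(n_trajectories):
    """Lowest score found across independent trajectories (seeds 0..n-1)."""
    return min(run_trajectory(seed) for seed in range(n_trajectories))


few = best_of(5)     # lowest score from 5 trajectories
many = best_of(50)   # lowest score from 50 trajectories (includes the same 5)
```

Because the 50 trajectories include the first 5, the lowest score from the larger set can never be worse than the lowest score from the smaller set. Any single trajectory may still end far from the minimum, which is why the distribution of results matters more than one run.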
Simplistically, the stochastic nature of these protocols means that each new trajectory has a chance of producing a low-scoring (and most likely native-like) structure, so more trajectories give a greater chance of producing a realistic structure. Usually 800 or more trajectories are enough to produce useful results, although this depends on the application. When developing new algorithms, construct a benchmark set of structures to compare against. As is typical with bioinformatics software, some parameters of any algorithm will be tuned based on a trial set. This does not mean individual trials in PyRosetta are useless. At the very least, the output is indicative of the sampling performed by the algorithm. Several tools in PyRosetta are deterministic (such as minimization). When testing new protocols or Movers, try to learn whether the application has a lot of variability or performs small random changes.

2. How do I access the help system?
Help is accessed in PyRosetta with several similar-looking commands:
    help <object>
    help( <object> )
    <object>?
For example:
    help Pose
    help( MonteCarlo )
    Residue?
    p = Pose()
    p?
These help messages are generated from comments inside the Rosetta source code. Some effort has been made to standardize and improve these help messages; however, only a small portion of the essential objects and methods have in-depth help messages. The text output is a summary of an object's purpose or a method's function. IPython also supports tab-completion, which is extremely useful for exploring object methods. Help can be accessed from the object name itself or from instances of the object (as in the example above).

3. What objects work?
PyRosetta does not contain all of Rosetta. Various restrictions (currently) prevent some objects and protocols from becoming part of PyRosetta. Many Rosetta objects may appear not to work; however, they almost always work properly, though they may have nuances to their usage or make inherent assumptions that the user does not want.
Unfortunately, we cannot easily provide a list of methods that are working or not working. Fortunately, it is easy to test object functionality in PyRosetta. If the object requires a lot of setup (depends on other data structures), I recommend writing a short script to test the object. Many errors in PyRosetta cause segmentation faults, which end the IPython (or Python) interpreter session and hamper object testing. From experience, retyping "from rosetta import *" after each crash gets tedious fast, and you must know what an object or method does before using it. The tutorials and sample scripts demonstrate which commonly used objects and methods work in PyRosetta and how to use them. For more information, consult the documentation.

4. How do I know if a method or object constructor is overloaded?
Please consult the documentation.

5. How do I search for new objects or methods?
The easiest way to find what is hidden in PyRosetta is to use tab-completion on the rosetta namespace. Reading and searching PyRosetta is the main topic of the documentation.

6. What is a "Size" or a "Real"?
Within Rosetta, several simple objects are used for basic data structures. If these appear in PyRosetta help, they can be replaced by the appropriate Python data type:
    Size is an int
    Real is a double or float (use float in Python)
    Vector or Vector1 often serves the purpose of a Python list

7. Where are Vector objects?
Within Rosetta, Vector objects are used for various list structures. The common Vector objects are found in various locations. Please consult the question "How do I construct Vector1 objects?" below for more information.
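To make the type mapping above concrete, here is a minimal sketch that copies a 1-indexed container into an ordinary Python list. It does not import PyRosetta: OneIndexedList is a hypothetical pure-Python stand-in written only for this example (it is not the real Vector1 class), and to_python_list is likewise an invented helper name.

```python
class OneIndexedList(object):
    """Hypothetical stand-in for a Rosetta Vector1; NOT the PyRosetta class."""

    def __init__(self, items=()):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # Valid indices run from 1 to len(self), as in Rosetta
        if not 1 <= index <= len(self._items):
            raise IndexError("index %d outside 1..%d" % (index, len(self._items)))
        return self._items[index - 1]

    def append(self, item):
        self._items.append(item)


def to_python_list(vec):
    """Copy a 1-indexed container into an ordinary 0-indexed Python list."""
    return [vec[i] for i in range(1, len(vec) + 1)]


sizes = OneIndexedList([10, 20, 30])   # a "Size" maps to a Python int
reals = OneIndexedList([1.5, 2.5])     # a "Real" maps to a Python float

as_list = to_python_list(sizes)        # element 1 of sizes becomes as_list[0]
```

The range( 1 , len( vec ) + 1 ) pattern is the same one used when looping over residues or atoms in a Pose, since those are also numbered from 1.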
8. Why are Rosetta objects 1-indexed?
Rosetta has its roots in FORTRAN, so counting is "1-indexed" (the first element is numbered 1). Python, on the other hand, is "0-indexed" (the first element is numbered 0). The documentation discusses this in a little more depth.

9. Do other biological tools interact nicely with PyRosetta?
Yes! Very well! One advantage of having Rosetta accessible in Python is the ease of using other bioinformatics software. Biological software is usually made public as a command-line executable or even as a Python package. Python serves as a wonderful language to "glue" other programs and processes together. Combining PyRosetta with other tools can greatly enhance analysis and only requires one language (okay, two: you should be able to get around in bash). Some Python programs or libraries frequently used with PyRosetta are:
    PyMOL
    Biopython
    openbabel
    numeric
    matplotlib (and pylab)

10. Can I make a system call within the Python interpreter?
Yes, and there are a lot of ways to do this. Many biological tools are separate programs or executables, and Python is a perfect place to stitch together programs that either have Python interfaces or require system calls from the command line. Biopython even has a set of Python objects for handling these system calls. The most basic method uses the os module and simply executes an input string in a subshell it creates. The Python module subprocess has better tools for managing system calls. If you are interested in combining system calls with PyRosetta methods, I suggest becoming intimately familiar with the os module.

Example system call:
    import os
    os.system( 'mkdir pdb' )

Structure, PDB file, and Pose Questions

1. How do I create a molecule (Pose)?
The Pose object represents a single molecule within PyRosetta. These molecular objects can be created (see the questions below) or constructed from a PDB file. PyRosetta is intended to work with proteins but can successfully load other compounds (with additional work).
Rosetta has numerous naming preferences but is capable of loading many (if not all) protein structures. The simplest method for creating a Pose object is to load it from a PDB file using the method pose_from_pdb.
    pose = Pose()
    pose_from_pdb( pose , 'your_favorite_protein.pdb' )
    pose2 = pose_from_pdb( 'my_favorite_protein.pdb' )    # this method is overloaded in newer versions of Rosetta and returns a Pose

2. What is a Pose and why does it own so many other objects?
The Pose is a complex data structure composed of various objects. An abstract summary of the Pose structure is provided in the documentation. A more detailed and accurate description of the Pose data structure is found in the paper A. Leaver-Fay et al., "ROSETTA3: An object-oriented software suite for the simulation and design of macromolecules," Methods in Enzymology 487, 548-574 (2011).

3. How do I make a crystal structure suitable for PyRosetta?
Missing atoms and other little errors are gracefully handled by PyRosetta. However, small clashes or other discrepancies can cause problems with some Rosetta protocols. Methods for converting a raw crystal structure into a Rosetta-solution-state structure are varied. For more information, please consult the documentation and the Rosetta user guide.

4. When I create a Pose, what does all the output mean?
Rosetta is currently very verbose. This is handy when debugging problems but can be jarring at first due to the sheer volume of seemingly useless output. When loading a pose from a PDB file, the output will indicate which atoms are missing (and thus idealized) and notify you of other problems. Generally, you can ignore this output even if it says there is a problem. If the PDB file is successfully read, you may want to check its sequence or view it using the PyMOL_Mover, but the protein has most likely been loaded without any trouble.

5. Can I split a Pose, delete residues, or insert residues?
Not easily.
The Pose object has methods for deleting and appending (inserting) residues, but they currently don't appear to work. There may be other tricks around this issue, particularly using the Pose object's Conformation object. For now, it is easiest to manually manipulate a PDB file and load this into PyRosetta. Simple text editing can even be performed from within Python. PyMOL and Biopython have numerous tools for editing PDB files that are very useful alongside PyRosetta.

6. How do I create a Pose from a novel sequence?
The method make_pose_from_sequence can fill a Pose object with a single protein chain constructed from an input sequence of single-letter amino acid codes.
    pose = Pose()
    make_pose_from_sequence( pose , 'THANKSEVAN' , 'fa_standard' )
Unfortunately, this method sets all torsion angles to 0 degrees (not 180 degrees; make sure you fix this for many applications) and does not construct its own PDBInfo object. The tool script mutants.py provides a method pose_from_sequence which handles these problems.

If you use these methods to produce a pose with non-protein residues (molecules), you can access these ResidueTypes using a single letter followed by the other identifier code in brackets. The single-letter code and other identifiers can be found in the .params files within PyRosetta under minirosetta_database/chemical/residue_type_sets/. DNA residues are named this way (adenine is "A[ADE]", guanine is "G[GUA]", cytosine is "C[CYT]", thymine is "T[THY]"). Metals and other compounds that are exposed by default are named the same way.
    pose = Pose()
    make_pose_from_sequence( pose , 'A[ADE]G[GUA]C[CYT]T[THY]Z[MN]' , 'fa_standard' )    # Z[MN] is manganese
ResidueTypeSets other than fullatom ("fa_standard") are available as long as the sequence is appropriate.

7. How do I change amino acids in a Pose?
There are several ways to change a protein at the sequence level using PyRosetta.
You can (a) set up a PackerTask to redesign the protein with the desired sequence changes, (b) use the method mutate_residue, or (c) use the MutateResidue Mover class.

a) Please consult the sample script packer_task.py for syntax on setting up a PackerTask to perform design manually or using a resfile. Since this option uses a Mover and allows multiple changes, it is the most efficient method of changing a protein's sequence.

b) PyRosetta 1.0 and 2.0 have an exposed method mutate_residue which accepts a pose, a residue number, and a single-letter representation of the mutant amino acid. Please consult the sample script ala_scan.py or the tool script mutants.py for an updated version of the mutate_residue method. In future releases, this method will be exposed through the tool script mutants.py. This method is easy to use and best suited for investigations using the interpreter or sequence changes performed outside of a protocol.

c) The Mover MutateResidue performs the same change as the old mutate_residue method, and thus does not allow direct repacking of the sidechains near the mutant. Since this Mover's target residue and mutant identity must be set each time, it is the least efficient option.

8. How do I load in a PDB file containing DNA?
PyRosetta knows the standard DNA nucleotides as the ResidueTypes ADE, GUA, CYT, and THY and can infer these from the single letters A, G, C, and T respectively if they are in the PDB resName column (characters 18-20). Please edit the PDB file to ensure that the nucleic acids are not represented with DA, DG, DC, or DT in the PDB resName column. Use PyMOL, grep, awk, Python, Biopython, or whatever technique you prefer. Soon we will provide a tool script for performing this edit.

Remember, for docking applications the downstream partner (later chain or chains) is docked to the upstream partner (first chain or chains), which is in a fixed position.
Thus, for any DNA-protein docking application, it is most intuitive to dock the DNA to the protein, which requires the DNA chains to come after the protein chains in the PDB file. The sample script dna_interface.py also outlines and explains this process.

9. How do I load in a PDB file containing water?
Typically, you simply want to remove the water, since PyRosetta does not use water molecules for any application and they can cause problems. If you want to load water molecules into PyRosetta, you must activate them and edit the PDB file. PyRosetta knows water as the ResidueTypes TP3 and TP5; however, these are "turned off" by default. To edit PyRosetta so that it will always recognize water HETATM lines, edit the file /minirosetta_database/chemical/residue_type_sets/fa_standard/residue_types.txt in the main PyRosetta directory by uncommenting (removing the "#" character) the lines near line 75, which should look something like:
    ## Water Types
    residue_types/water/TP3.params
    residue_types/water/TP5.params
This file is the master list of fullatom ResidueTypes. As you probably guessed, the water .params files can be found in /minirosetta_database/chemical/residue_type_sets/fa_standard/residue_types/water.

To properly load the PDB file, you must also edit its water HETATM lines to have TP3 (or TP5) in the PDB resName column (characters 18-20). Usually a PDB file will have "HOH", "WAT", or something else crazy here.

10. How do I load in a PDB file containing other molecules?

11. How do I change Pose coordinates?
The Pose object has a .xyz method for extracting coordinates as xyzVector objects given the residue number and the atom number. Residue objects also have a .xyz method for extracting coordinates as xyzVector objects given an atom number, and a .set_xyz method for setting the coordinates of an input atom number to an input xyzVector.
This example extracts all of a pose's coordinates as a per-residue list of lists of atom xyzVector objects:
    coords = [ [ pose.residue( r ).xyz( a ) for a in range( 1 , pose.residue( r ).natoms() + 1 ) ] for r in range( 1 , pose.total_residue() + 1 ) ]
Similarly, for setting a pose based on a similar list structure:
    # coords is a 0-indexed Python list; residues and atoms are 1-indexed
    for r in range( len( coords ) ):
        for a in range( len( coords[r] ) ):
            pose.residue( r + 1 ).set_xyz( a + 1 , coords[r][a] )
There are many ways of extracting coordinate information. Please consult Workshop #2, the sample script ala_scan.py, or the tool script extract_coords_pose.py for more information.

12. Can I calculate a protein's Radius of Gyration from a Pose?
The method varies depending on the accuracy you desire. Rosetta is equipped with an rg ScoreType which rapidly approximates the Radius of Gyration using only the neighbor atoms (one representative atom per residue).
    scorefxn = ScoreFunction()
    scorefxn.set_weight( rg , 1 )
    rad_g = scorefxn( pose )
For more accurate inquiries, PyRosetta is an ideal tool for extracting atomic coordinates (in the example below, a Python list is produced from the Pose xyzVector objects). Remember that Pose objects contain hydrogens by default, and scanning over the Pose will extract these coordinates too. Full code for the calculation is not shown.
    coordinates = []
    for r in range( 1 , pose.total_residue() + 1 ):
        res = pose.residue( r )
        for a in range( 1 , res.natoms() + 1 ):    # + 1 so the last atom is included
            xyz = res.xyz( a )
            coordinates.append( [ xyz[0] , xyz[1] , xyz[2] ] )

13. How do I align two Pose objects?
14. How do I calculate the RMSD between two Pose objects?
15. How do I extract secondary structure information?
16. How do I write PDB files?
17. Can I send structures directly to PyMOL without writing to files?
18. What is a ResidueTypeSet?
19. What is a FoldTree (or for that matter an Edge or a Jump object)?
20. What is an "AtomID" object?

Scoring Questions

1. How do I create a ScoreFunction?
2. What are the scoring units?
3. How do I extract individual residue scores?
4. How do I extract individual atom scores?
5. What are the different ScoreFunctions?
6. How do I investigate ONLY the score between two residues?
7. How do I extract hydrogen bond information?
8. How do I know what a score term calculates?
9. How do I find new score terms?
10. Can I see scores in PyMOL?

Mover Questions

1. What is a Mover?
2. What Mover classes are available in PyRosetta?
3. What is the best way to create a sequence of Movers?
4. How do I know what structural changes are actually performed by a Mover?
5. Minimization increased the score (or moved docking partners far apart)?
6. Packing doesn't always yield the same score (or rotamers)?

Protocol Questions

1. Is the Rosetta ab initio protocol available in PyRosetta?
2. Is Rosetta fragment selection/generation available in PyRosetta?
3. What is docking and what protocols best match my problem?
4. What is wrong with my docking?
5. How many trajectories should I run?

Rosetta to PyRosetta Transition Questions

1. How do I interact with the Rosetta Options system using PyRosetta?
PyRosetta uses its own getter and setter functions for reading and setting options within PyRosetta.
Generally, the getter method syntax is:
    rosetta.core.get_(data type)_option( <option name> )
The full list of getters (in newer versions of PyRosetta) is:
    get_boolean_option
    get_boolean_vector_option
    get_file_option
    get_file_option_option
    get_integer_option
    get_integer_vector_option
    get_real_option
    get_real_vector_option
    get_string_option
    get_string_vector_option
Generally, the setter method syntax is:
    void rosetta.core.set_(data type)_option( <option name> , <value> )
The full list of setters (in newer versions of PyRosetta) is:
    set_boolean_option
    set_boolean_vector_option
    set_file_option
    set_file_option_option
    set_integer_option
    set_integer_vector_option
    set_real_option
    set_real_vector_option
    set_string_option
    set_string_vector_option
If these methods do not work, you may need to import them (sorry, the code has changed between versions and this is not the place to explain these decisions). Try:
    rosetta.basic.options.get_(data type)_option( <option name> )
    rosetta.basic.options.set_(data type)_option( <option name> , <value> )
or
    rosetta.core.options.get_(data type)_option( <option name> )
    rosetta.core.options.set_(data type)_option( <option name> , <value> )
or
    import rosetta.basic.options
For example:
    from rosetta import *
    init()
    rosetta.core.set_string_option( 'in:file:frag3' , <3-mer fragment filename> )
    print rosetta.core.get_string_option( 'in:file:frag3' )
    rosetta.core.set_string_option( 'in:file:s' , rosetta.Vector1( [ 'a' , 'b' ] ) )
    print rosetta.core.get_string_option( 'in:file:s' )
The Rosetta command-line options are accessed during initialization of PyRosetta (init()). PyRosetta has default settings, some necessary and others recommended.

Examples of setting options

To set PyRosetta initialization options:
1. import rosetta (from rosetta import *)
2. create a Python list of string arguments; make sure the first three are:
    a. "app"
    b. "-database"
    c. the path to your PyRosetta database
   followed by any additional options you desire. By default, PyRosetta loads with:
    -ex1 (extra chi 1 rotamers)
    -ex2aro (extra chi 2 rotamers for aromatic amino acids)
3. create a rosetta.utility.vector1_string()
4. add the Python list to the vector1_string (using the .extend method)
5. call rosetta.core.init with the vector1_string as an input
For example:
    import os
    import rosetta
    opts = [ 'app' , '-database' , os.path.abspath( os.environ['PYROSETTA_DATABASE'] ) , '-ex1' , '-ex2aro' , '-constant_seed' ]
    args = rosetta.utility.vector1_string()
    args.extend( opts )
    rosetta.core.init( args )
This may cause errors if you have altered your environment path variables. For more information, see rosetta/__init__.py in the main PyRosetta directory.

2. How do I construct Vector1 objects?
Vector1 is exposed in newer versions of PyRosetta and lives in rosetta.Vector1. Specific Vector1 objects live in rosetta.utility.vector1_(data type).

For example:
    print rosetta.Vector1( [ 1 , 2 , 3 ] )
    print rosetta.Vector1( [ 1.0 , 2.0 , 3.0 ] )
    print rosetta.Vector1( [ True , False , True ] )
    print rosetta.Vector1( [ 'a' , 'b' , 'c' ] )
    v = rosetta.utility.vector1_SSize()
    v.append( 1 )
    print v

3. How do I convert "AP" or "CAP" objects to regular class objects?
Use the "get" method (rosetta.utility.utility___getCAP in older releases). This is an involved issue which does not come up in common usage of PyRosetta.

For example, to create an "ALA" residue:

(new way)
    chm = rosetta.core.chemical.ChemicalManager.get_instance()
    rts = chm.residue_type_set( 'fa_standard' ).get()
    ala = rosetta.core.conformation.ResidueFactory.create_residue( rts.name_map( 'ALA' ) )
    print ala

(old way)
    chm = rosetta.core.chemical.ChemicalManager.get_instance()
    rts_AP = chm.residue_type_set( 'fa_standard' )
    rts = rosetta.utility.utility___getCAP( rts_AP )    # converts a CAP object to a ResidueTypeSet object
    ala = rosetta.core.conformation.ResidueFactory.create_residue( rts.name_map( 'ALA' ) )
    print ala