Please consult the forum for additional answers or to ask a new question.
Vector1 | rosetta.Vector1
xyzVector | rosetta.numeric.xyzVector
vector1_ (data type) | rosetta.utility.vector1_ (data type)
8. Why are Rosetta objects 1-indexed?
Rosetta has its roots in FORTRAN, so counting within Rosetta is "1-indexed" (the first element is numbered 1). Python, on the other hand, is "0-indexed" (the first element is numbered 0). The documentation discusses this in a little more depth.
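As a small illustration of the difference, the sketch below pairs each letter of a Python string with the residue number Rosetta would use for it. The helper names rosetta_numbering and python_index are hypothetical and are not part of PyRosetta.

```python
def rosetta_numbering(sequence):
    """Pair each letter with its 1-indexed (Rosetta-style) residue number."""
    return [(number, letter) for number, letter in enumerate(sequence, start=1)]


def python_index(residue_number):
    """Convert a 1-indexed residue number to a 0-indexed Python position."""
    if residue_number < 1:
        raise ValueError("Rosetta residue numbers start at 1")
    return residue_number - 1


sequence = "THANKSEVAN"
pairs = rosetta_numbering(sequence)

# Residue 1 in Rosetta corresponds to position 0 in the Python string.
first_letter = sequence[python_index(1)]
```

The same offset applies whenever a Python list is filled from, or written back to, a Pose.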
9. Do other biological tools interact nicely with PyRosetta?
Yes! Very well! One advantage of having Rosetta accessible in Python is the ease of using other Bioinformatics software. Biological software is usually made public as a commandline executable or even within Python. Python serves as a wonderful language to "glue" other programs and processes together. Combining PyRosetta with other tools can greatly enhance analysis and only requires one (okay two, you should be able to get around in bash) language. Some Python programs or libraries frequently used with PyRosetta are:
PyMOL
Biopython
openbabel
numeric
matplotlib (and pylab)
10. Can I make a system call within the Python interpreter?
Yes, and there are a lot of ways to do this. Many biological tools are separate programs or executables, and Python is a perfect place to stitch together programs that share Python or require system calls from the commandline. Biopython even has a set of Python objects for handling these system calls. The most basic method uses the os module and simply executes an input string from a subprocess it creates. The Python module subprocess has better tools for managing system calls. If you are interested in combining system calls with PyRosetta methods, I suggest becoming intimately familiar with the os module.
Example system call:
import os
os.system( 'mkdir pdb' )
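For comparison, here is a minimal sketch of the same idea using the subprocess module, which also captures the exit code and the output of the command. The helper name run_command is hypothetical, and the current Python interpreter is used only as a stand-in for an external tool.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of arguments; return (exit code, stdout text)."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the output as text instead of bytes
    )
    return completed.returncode, completed.stdout


# Stand-in for an external executable: ask Python itself to print a message.
code, output = run_command([sys.executable, "-c", "print('hello from a child process')"])
```

Passing the command as a list avoids shell quoting problems, and checking the exit code makes it easier to notice when an external tool fails.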
Structure, PDB file, and Pose Questions
1. How do I create a molecule (Pose)?
The Pose object represents a single molecule within PyRosetta. These molecular objects can be created (see the questions below) or constructed from a PDB file. PyRosetta is intended to work with proteins but can successfully load other compounds (with additional work). Rosetta has numerous naming preferences but is capable of loading many (if not all) protein structures. The simplest method for creating a Pose object is to load it from a PDB file using the method pose_from_pdb.
pose = Pose()
pose_from_pdb( pose , 'your_favorite_protein.pdb' )
pose2 = pose_from_pdb( 'my_favorite_protein.pdb' ) # this method is overloaded in newer versions of Rosetta and returns a Pose
2. What is a Pose and why does it own so many other objects?
The Pose is a complex data structure with various objects. An abstract summary of the Pose structure is provided in the documentation. A more detailed and accurate description of the Pose data structure is found in the paper A. Leaver-Fay et al., "ROSETTA3: An object-oriented software suite for the simulation and design of macromolecules," Methods in Enzymology 487, 548-574 (2011).
3. How do I make a crystal structure suitable for PyRosetta?
Missing atoms and other little errors are gracefully handled by PyRosetta. However, small clashes or other discrepancies can cause problems with some Rosetta protocols. Methods for converting a raw crystal structure into a Rosetta-solution-state structure are varied. For more information, please consult the documentation and the Rosetta user guide.
4. When I create a Pose, what does all the output mean?
Rosetta is currently very verbose. This is handy when debugging problems but can be jarring at first due to the sheer volume of seemingly useless output. When loading a pose from a PDB file, output will indicate what atoms are missing (and thus idealized) and notify you of other problems. Generally, you can ignore this output even if it says there is a problem. If the PDB file is successfully read, you may want to check its sequence or view it using the PyMOL_Mover, but the protein has most likely been loaded without any trouble.
5. Can I split a Pose, delete residues, or insert residues?
Yes. The Pose object has methods for deleting and appending (inserting) residues. In addition, there are more functions and classes in the
grafting namespace that are not currently exposed. To use these, take a look at rosetta.protocols.grafting. These will need to be imported. Functions include delete_region, replace_region, return_region, and insert_pose_into_pose, and there is also the AnchoredGraftMover class. All have descriptions that can be accessed through IPython.
Note that, for many of these functions and the functions in the Pose object, the Pose's PDBInfo object will go out of date once insertions and deletions are made. This means that you will lose access to stored PDB information (pdb_info().pdb2pose, etc.), and when you dump the PDB it will have numbering starting from 1. The information, however, is not lost. It is simply tagged by Rosetta as obsolete. There are two ways to approach this if your original PDB information is needed. For deletions, simply use the function pose.pdb_info().obsolete(False). For insertions, you will want to fix the PDBInfo object directly. There are a few ways to do this, and you can use IPython tab completion and documentation to get some ideas. The function pose.pdb_info().copy will work if your insertion has the numbering you need. If not, you can use the function pose.pdb_info().set_resinfo to manually set a particular residue's PDB information. Before accessing the PDBInfo object or dumping a pose, you will still want to un-obsolete the PDBInfo: pose.pdb_info().obsolete(False).
In addition to PyRosetta, you can perform many of these tasks through simple text editing software or many molecular visualization tools. PyMOL and Biopython have numerous tools for editing PDB files that are very useful for aiding PyRosetta.
6. How do I create a Pose from a novel sequence?
The method pose_from_sequence can fill Pose objects with a single protein chain constructed from an input sequence of single letter amino acids.
pose = pose_from_sequence( pose , 'THANKSEVAN' , 'fa_standard' )
If using this method to produce a pose with non-protein residues (molecules), you can access these ResidueTypes using a single letter with its other identifier code in brackets. The single letter code and other identifiers can be found in the .params files within PyRosetta in rosetta_database/chemical/residue_type_sets/. DNA residues are named this way (adenine is "A[ADE]", guanine is "G[GUA]", cytosine is "C[CYT]", thymine is "T[THY]"). Metals and other compounds are exposed by default.
pose = pose_from_sequence( pose , 'A[ADE]G[GUA]C[CYT]T[THY]Z[MN]' , 'fa_standard' ) # Z[MN] is Manganese
ResidueTypeSets other than fullatom ("fa_standard") are available as long as the sequence is appropriate.
7. How do I change amino acids in a Pose?
There are several ways to change a protein at the sequence level using PyRosetta. You can (a) set up a PackerTask to redesign the protein with the desired sequence changes, (b) use the method mutate_residue, or (c) use the MutateResidue Mover class.
a) Please consult the sample script packer_task.py for syntax on setting up a PackerTask to perform design manually or using a resfile. Since this option uses a Mover and allows multiple changes, it is the most efficient method of changing a protein's sequence.
b) PyRosetta 1.0 and 2.0 have an exposed method mutate_residue which accepts a pose, a residue number, and a single letter representation of the mutant amino acid. Please consult the sample script ala_scan.py or the tool script mutants.py for an updated version of the mutate_residue method. In current releases, this method is exposed by importing from the toolbox:
from toolbox import mutate_residue
This method is easy to use and best suited for investigations using the interpreter or sequence changes performed outside of a protocol. Additionally, a packing shell can be created around the residue to mutate.
c) The Mover MutateResidue performs the same change as the old mutate_residue method, and thus does not allow direct repacking of the sidechains near the mutant. Since this Mover's target residue and mutant identity must be set each time, it is the least efficient option.
8. How do I load in a PDB file containing DNA?
PyRosetta knows the typical deoxynucleic acids as the ResidueTypes ADE, GUA, CYT, and THY and can infer this from the single letters A, G, C, and T respectively if they are in the PDB resName column (characters 18-20). Please edit the PDB file to ensure that the nucleic acids are not represented with DA, DG, DC, or DT in the PDB resName column. Use PyMOL, grep, awk, Python, Biopython, or whatever technique you prefer. Soon we will provide a tool script for performing this edit.
Remember, for docking applications the downstream partner (later chain or chains) is docked to the upstream partner (first chain or chains) which is in a fixed position. Thus, for any DNA-protein docking application, it is most intuitive to dock the DNA to the protein, requiring the DNA chains to be after the protein chains in the PDB file.
The sample script dna_interface.py also outlines and explains this process.
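Until such a tool script is available, the resName edit can be sketched in plain Python. This is not the official tool: the function name rename_dna_residues is hypothetical, and the sketch assumes that the single letter should be right-justified within characters 18-20, as residue names usually are in PDB files.

```python
# PDB-style deoxynucleotide names mapped to the single letters described above.
DNA_RENAMES = {"DA": "A", "DG": "G", "DC": "C", "DT": "T"}


def rename_dna_residues(pdb_lines):
    """Return a copy of pdb_lines with DA/DG/DC/DT resNames replaced by A/G/C/T."""
    fixed = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 20:
            res_name = line[17:20].strip()  # characters 18-20 (1-indexed)
            if res_name in DNA_RENAMES:
                # Assumption: keep the name right-justified in its 3-character field.
                line = line[:17] + DNA_RENAMES[res_name].rjust(3) + line[20:]
        fixed.append(line)
    return fixed


example = [
    "ATOM      1  P    DA A   1      10.000  10.000  10.000  1.00  0.00           P",
    "ATOM      2  CA  ALA B   1      11.000  11.000  11.000  1.00  0.00           C",
]
cleaned = rename_dna_residues(example)
```

To process a real file, read its lines, pass them through the function, and write the result to a new file so the original PDB file is left untouched.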
9. How do I load in a PDB file containing water?
Typically, you simply want to remove the water since PyRosetta does not use these for any application and they can cause problems. If you want to load water molecules into PyRosetta, you must activate them and edit the PDB file. PyRosetta knows water as the ResidueTypes TP3 and TP5; however, these are "turned off" by default. To edit PyRosetta so that it will always know water HETATM lines, edit the file /minirosetta_database/chemical/residue_type_sets/fa_standard/residue_types.txt in the main PyRosetta directory by uncommenting (removing the "#" character) the lines near line 75, which should look something like:
## Water Types
residue_types/water/TP3.params
residue_types/water/TP5.params
This file is the master list of fullatom ResidueTypes. As you probably guessed, the water .params files can be found in /minirosetta_database/chemical/residue_type_sets/fa_standard/residue_types/water.
To properly load the PDB file, you must also edit its water HETATM lines to have TP3 (or TP5) in the PDB resName column (characters 18-20). Usually a PDB file will have "HOH", "WAT", or something else crazy here.
10. How do I load in a PDB file containing other molecules?
11. How do I change Pose coordinates?
The Pose object has a .xyz method for extracting coordinates as xyzVector objects given the residue number and the atom number. The Residue objects also contain a .xyz method for extracting coordinates as xyzVector objects given an atom number. Residue objects also support a .set_xyz method for setting coordinates of an input atom number to an input xyzVector. This example extracts all of a pose's coordinates as a per-residue list of lists of atom xyzVector objects:
coords = [ [ pose.residue( r ).xyz( a ) for a in range( 1 , pose.residue( r ).natoms() + 1 ) ] for r in range( 1 , pose.total_residue() + 1 ) ]
Similarly for setting a pose based on a similar list structure:
for r in range( len( coords ) ):
    for a in range( len( coords[r] ) ):
        # the Python lists are 0-indexed, but Rosetta residues and atoms are 1-indexed
        pose.residue( r + 1 ).set_xyz( a + 1 , coords[r][a] )
There are many ways of extracting coordinate information. Please consult Workshop #2, sample script ala_scan.py, or tool script extract_coords_pose.py for more information.
12. Can I calculate a protein's Radius of Gyration from a Pose?
The method varies depending on the accuracy you desire. Rosetta is equipped with an rg ScoreType which rapidly approximates the Radius of Gyration using only the neighbor atoms (one representative atom per residue).
scorefxn = ScoreFunction()
scorefxn.set_weight( rg , 1 )
rad_g = scorefxn( pose )
For more accurate inquiries, Rosetta is an ideal tool for extracting atomic coordinates (in the example below, a Python list is produced from the Pose xyzVector objects). Remember that Pose objects contain hydrogen by default and scanning over the Pose will extract these coordinates. Full code for the calculation is not shown.
coordinates = []
for r in range( 1 , pose.total_residue() + 1 ):
    residue = pose.residue( r )
    for a in range( 1 , residue.natoms() + 1 ): # atoms are 1-indexed, so include natoms()
        xyz = residue.xyz( a )
        coordinates.append( [ xyz[0] , xyz[1] , xyz[2] ] )
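The remaining step, turning a list of [x, y, z] coordinates into a radius of gyration, could be sketched as follows. This is only one possible version: it assumes every atom is weighted equally (no mass weighting), and the function name radius_of_gyration is hypothetical rather than a PyRosetta method.

```python
import math


def radius_of_gyration(coordinates):
    """Root-mean-square distance of the points from their geometric center."""
    n = len(coordinates)
    if n == 0:
        raise ValueError("at least one coordinate is required")
    # Geometric center (all atoms weighted equally; this is an assumption).
    center = [sum(point[i] for point in coordinates) / n for i in range(3)]
    squared_distances = sum(
        sum((point[i] - center[i]) ** 2 for i in range(3)) for point in coordinates
    )
    return math.sqrt(squared_distances / n)


# Four points at distance 1 from the origin, standing in for atomic coordinates.
example_coordinates = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
example_rg = radius_of_gyration(example_coordinates)
```

A mass-weighted version would multiply each squared distance by the atom's mass and divide by the total mass instead of the atom count.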
13. How do I align two Pose objects?
14. How do I calculate the RMSD between two Pose objects?
15. How do I extract secondary structure information?
16. How do I write PDB files?
17. Can I send structure directly to PyMOL without writing to files?
18. What is a ResidueTypeSet?
19. What is a FoldTree (or for that matter an Edge or a Jump object) ?
20. What is an "AtomID" object?
Scoring Questions
1. How do I create a ScoreFunction?
2. What are the scoring units?
3. How do I extract individual residue scores?
4. How do I extract individual atom scores?
5. What are the different ScoreFunctions?
6. How do I investigate ONLY the score between two residues?
7. How do I extract hydrogen bond information?
8. How do I know what a score term calculates?
9. How do I find new score terms?
10. Can I see scores in PyMOL?
Mover Questions
1. What is a Mover?
2. What Mover classes are available in PyRosetta?
3. What is the best way to create a sequence of Movers?
4. How do I know what structural changes are actually performed by a Mover?
5. Minimization increased the score (or moved docking partners far apart) ?
6. Packing doesn't always yield the same score (or rotamers) ?
Protocol Questions
1. Which Rosetta protocols for modeling membrane proteins are available in PyRosetta?
- RosettaMP is available in PyRosetta
- the ddG application is also accessible from PyRosetta, and there is a sample Python script that you can adapt.
- RosettaMPDock is now only available in the Rosetta C++
2. Is the Rosetta ab initio protocol available in PyRosetta?
3. Is Rosetta fragment selection/generation available in PyRosetta?
4. What is docking and what protocols best match my problem?
5. What is wrong with my docking?
6. How many trajectories should I run?
Rosetta to PyRosetta Transition Questions
1. How do I interact with the Rosetta Options system using PyRosetta?
The Rosetta commandline options are accessed during initialization of PyRosetta (init()). PyRosetta has default settings, some necessary and others recommended.
Examples of setting options
To set PyRosetta initialization options:
1. import rosetta (from rosetta import *)
2. While calling rosetta.init(), pass all of the options as a string to the init() function.
For example:
rosetta.init( "-ex1 -ex2 -include_sugars -write_pdb_link_records" )
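When many options are involved, the string passed to init() can be assembled in Python first. The helper below is a hypothetical convenience, not part of PyRosetta; it only builds the string and does not call init() itself.

```python
def build_init_flags(options):
    """Join (name, value) pairs into one commandline-style option string.

    Use None as the value for flags that take no argument.
    """
    parts = []
    for name, value in options:
        parts.append("-" + name)
        if value is not None:
            parts.append(str(value))
    return " ".join(parts)


flags = build_init_flags([
    ("ex1", None),
    ("ex2", None),
    ("include_sugars", None),
    ("write_pdb_link_records", None),
])
# The resulting string can then be passed on, e.g. rosetta.init( flags )
```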
If you want to get or set global Rosetta options within PyRosetta after initialization, you can either call init again or use getter and setter functions for setting options.
Generally, the getter methods syntax is:
void rosetta.core.get_(data type)_option( <option name> )
The full list of getters (in newer versions of PyRosetta) are:
get_boolean_option
get_boolean_vector_option
get_file_option
get_file_vector_option
get_integer_option
get_integer_vector_option
get_real_option
get_real_vector_option
get_string_option
get_string_vector_option
Generally, the setter methods syntax is:
void rosetta.core.set_(data type)_option( <option name> , <value> )
The full list of setters (in newer versions of PyRosetta) are:
set_boolean_option
set_boolean_vector_option
set_file_option
set_file_vector_option
set_integer_option
set_integer_vector_option
set_real_option
set_real_vector_option
set_string_option
set_string_vector_option
If these methods do not work, you may need to import them (sorry, the code has changed versions and this is not the place to explain these decisions). Try:
rosetta.basic.options.get_(data type)_option( <option name> )
rosetta.basic.options.set_(data type)_option( <option name> , <value> )
or
rosetta.core.options.get_(data type)_option( <option name> )
rosetta.core.options.set_(data type)_option( <option name> , <value> )
or
import rosetta.basic.options
For example:
from rosetta import *
init()
rosetta.core.set_string_option( 'in:file:frag3' , <3-mer fragment filename> )
print( rosetta.core.get_string_option( 'in:file:frag3' ) )
# a Vector1 of values needs the vector form of the setter and getter
rosetta.core.set_string_vector_option( 'in:file:s' , rosetta.Vector1( [ 'a' , 'b' ] ) )
print( rosetta.core.get_string_vector_option( 'in:file:s' ) )
This may cause errors if you have altered your environment path variables. For more information, see rosetta/__init__.py in the main PyRosetta directory.
2. How do I construct Rosetta Vector0/Vector1 objects?
Vector0/1 is exposed in newer versions of PyRosetta and lives in pyrosetta.rosetta.utility.vector{0/1}_*. Specific Vector1 objects live in rosetta.utility.vector1_type.
There is also a pyrosetta.Vector1 helper function that constructs the most common types using a Python list as input. For example:
print( rosetta.Vector1( [ 1 , 2 , 3 ] ) )
print( rosetta.Vector1( [ 1.0 , 2.0 , 3.0 ] ) )
print( rosetta.Vector1( [ True , False , True ] ) )
print( rosetta.Vector1( [ 'a' , 'b' , 'c' ] ) )
v = rosetta.utility.vector1_SSize()
v.append( 1 )
print( v )
3. How do I construct various C++ std objects, like std::map?
C++ std:: types are exposed in PyRosetta in the pyrosetta.rosetta.std module. For example, all map types can be accessed as pyrosetta.rosetta.std.map_type1_type2.
For example:
m = pyrosetta.rosetta.std.map_string_Real()
m['aaa'] = 1.0
m['bb'] = 3.0
print( m )
4. How do I construct std::set objects?
std::set templates are exposed in PyRosetta and live in pyrosetta.rosetta.std_*. There is also a helper function Set that converts a Python list/set object into a PyRosetta set; it can be found in the pyrosetta namespace:
print( pyrosetta.Set( [ 1 , 2 , 3 ] ) )
print( pyrosetta.Set( [ 1.0 , 2.0 , 3.0 ] ) )
print( pyrosetta.Set( [ 'a' , 'b' , 'c' ] ) )
s = pyrosetta.utility.Set_SSize()
s.add( 1 )
s.add( 2 )
s.add( 1 )
s.erase( 2 )
print( s )
5. How do I convert "AP" or "CAP" objects to regular class objects?
Use the "get" function (rosetta.utility.utility___getCAP in older releases). This is an involved issue which does not come up in common usage of PyRosetta.
For example, to create an "ALA" residue:
(new way)
chm = rosetta.core.chemical.ChemicalManager.get_instance()
rts = chm.residue_type_set( 'fa_standard' ).get()
ala = rosetta.core.conformation.ResidueFactory.create_residue( rts.name_map( 'ALA' ) )
print( ala )
(old way)
chm = rosetta.core.chemical.ChemicalManager.get_instance()
rts_AP = chm.residue_type_set( 'fa_standard' )
rts = rosetta.utility.utility___getCAP( rts_AP ) # converts a CAP object to a ResidueTypeSet object
ala = rosetta.core.conformation.ResidueFactory.create_residue( rts.name_map( 'ALA' ) )
print( ala )
6. How do I use std::ostream or std::istream for methods that require it?
std::ostream and std::istream are bound in PyRosetta as pyrosetta.rosetta.std.ostream and pyrosetta.rosetta.std.istream, respectively. Generally, methods that require these objects will also accept pyrosetta.rosetta.std.ostringstream and pyrosetta.rosetta.std.istringstream objects. Use these types of objects instead.