Copyright © 2004,2005
Copyright (c) 2004 - Steven Ness - Leiden University
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".
2004
Table of Contents
Crank is a program that automates protein crystal structure determination, starting from good-quality reflection data. It does not do the heavy lifting itself; instead, it drives many high-quality crystallographic programs, giving them properly formatted input files and interpreting their output. It is designed to be a transparent box: it automates the structure determination process in easier cases, and lets the user run and cross-compare results obtained from different techniques in difficult cases.
Crank currently has interfaces for CRUNCH2, BP3, the DREAR suite, SHELX(C/D/E), DM and SOLOMON. In addition, Crank has internal interfaces to many programs in the CCP4 suite, including the CCP4i user interface, SFTOOLS, TRUNCATE, WILSON, MTZ2VARIOUS, CAD, MTZMADMOD, F2MTZ and others. Support for many other programs is currently being added to Crank.
If you are a developer and would like Crank to interface to your program, please contact us. It's a simple and quick process, especially with the help of the developer.
Currently, Crank has support for SAD, SIR and SIRAS experiments. Support for MAD and MIR(AS) is being added.
Crank takes merged intensity data from a CCP4 MTZ file. It is important for the direct methods programs to be given intensity data, because a small but sometimes crucial bit of information is lost in the conversion from intensities (I's) to amplitudes (F's). Support for importing amplitude data, intensity data, or both is planned.
From these data, Crank first determines either the anomalously diffracting or the heavy-atom substructure, depending on the experiment type. This task can be performed by either CRUNCH2 or SHELXC/D. Crank can also take user-entered coordinates if these are already known. It can then refine these coordinates with BP3 or MLPHARE and produce phase estimates. These phase estimates can then be density modified by SOLOMON, DM or SHELXE.
1) Download the crank-XXXX-0.8.tgz file for your required platform(s) from: http://www.bfsc.leidenuniv.nl/software/crank
2) Untar the crank tar file
tar xvzf crank-XXXXX-0.8.0.tgz
3) Add the crank bin directory to your path
csh or tcsh:
setenv PATH $PATH\:/directory/to/crank/bin
bash:
export PATH="$PATH:/directory/to/crank/bin"
4) Set the environment variable CRANK to the main crank directory
csh or tcsh:
setenv CRANK "/directory/to/crank"
bash:
export CRANK="/directory/to/crank"
5) Install the CCP4i interface from the file "crank.tar.gz" in the crank/ccp4i directory with the following steps:
(a) Start ccp4i (as the CCP4 system administrator).
(b) From the "System Administration" menu, choose the "Install Tasks" option.
(c) From the "Task archive" panel, select the "crank.tar.gz" file located in the crank/ccp4i directory.
(d) Click on "Apply" and restart ccp4i. The Crank task should appear at the end of the "Program List" and "Experimental Phasing" modules.
6) Install the BP3 CCP4i interface. Repeat the steps listed in step 5, using "bp3.tar.gz" in place of "crank.tar.gz".
7) Test Crank
To fully test your Crank engine installation, type the following commands. The Crank job will take from 15 minutes to over an hour to complete, depending on your processor.
cd $CRANK/test/dna360
crank dna360.input.xml > & log
tail -f log
If the job completes without error, your Crank engine install is complete.
8) Run Crank engine
If you are already running the CCP4i interface, stop all running instances, then type the command:
ccp4i
Then, using the main CCP4i menu on the far left hand side of the interface, select "Program List", then scroll down to "Crank" and click it.
9) Tutorials
We have prepared a SAD tutorial for users. All input files have been uploaded to the Crank website; please visit it for the latest versions: http://www.bfsc.leidenuniv.nl/software/crank
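The environment settings from steps 3 and 4 can be sanity-checked before running the test job. The sketch below is not part of Crank: the function name check_crank_env is hypothetical, and it relies only on what the steps above state, namely that CRANK names the main crank directory and that its bin subdirectory should be on PATH.

```python
import os


def check_crank_env(environ):
    """Return a list of problems with the CRANK and PATH settings (empty if none).

    `environ` is any mapping of environment variables, e.g. os.environ.
    This helper is hypothetical and is not shipped with Crank.
    """
    problems = []
    crank_dir = environ.get("CRANK", "")
    if not crank_dir:
        problems.append("CRANK is not set")
        return problems
    # Step 3 puts <crank directory>/bin on the search path.
    expected_bin = os.path.normpath(os.path.join(crank_dir, "bin"))
    path_entries = [
        os.path.normpath(entry)
        for entry in environ.get("PATH", "").split(os.pathsep)
        if entry
    ]
    if expected_bin not in path_entries:
        problems.append(expected_bin + " is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_crank_env(os.environ):
        print("warning:", problem)
```

Passing the environment in as a mapping keeps the check easy to try out with made-up values before touching a real shell configuration.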
Crank is most easily run through its CCP4i interface. To start the Crank CCP4i interface type the command:
ccp4i
Then, using the main CCP4i menu on the far left hand side of the interface, select "Program List", then scroll down to "Crank" and click it.
The main fields in the Crank interface are described below. Optional fields in subtasks are not yet included.
In the Crank CCP4i interface, you specify the Crank output MTZ file; it is created in the directory you specify. If you have trouble finding it, look in your CCP4 project directories.
This .mtz file contains all the input columns that Crank knows about, copied from your input .mtz file. In addition, it contains the final phase estimates, figures of merit and Hendrickson-Lattman parameters, as estimated by the various programs in the crystallographic pipeline.
The Crank log file contains output not only from Crank but also from all the programs in the pipeline, so this file can grow very large. Although we apologize for the size, we think it is important for the user to read and understand the information in the logfile.
To this end, we are currently developing tools in the Crank suite to parse the logfile information from all the various crystallographic tools and to convert it into a format amenable both for users to examine and for programs to use to make decisions on the direction of the pipeline, depending on program data.
Until then, some hints about what to look for in the logfile:
The DREAR suite outputs many useful logfiles as it runs. These files end with the suffix .lp and are in the crank/1-crunch2/mtz2drear directory. For more information, please visit the DREAR website. http://www.hwi.buffalo.edu/SnB/DREARhelp/DREAR.htm
For output, CRUNCH2 produces an atomic coordinates file for each trial specified by the user. These files have a simple four-column format, with the first column specifying the relative occupancy of the site, and the next three columns specifying the X, Y and Z coordinates of this atom. This format is shown below:
15.0 0.3040 0.0730 0.3246
13.8 0.6066 0.0658 0.5039
11.6 0.7966 0.0535 0.2389
10.8 0.6472 0.0615 0.5083
0.1621
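The four-column layout is simple to read programmatically. The following is a minimal sketch, not part of CRUNCH2 or Crank: the names Site and parse_crunch2_sites are hypothetical, and it assumes whitespace-separated fields with one site per line, as described above. Lines that do not hold exactly four numeric fields are skipped rather than guessed at.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    occupancy: float  # relative occupancy (first column)
    x: float
    y: float
    z: float


def parse_crunch2_sites(text: str) -> List[Site]:
    """Parse whitespace-separated 'occupancy x y z' lines into Site records."""
    sites = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # not a site line under the assumed layout
        try:
            values = [float(field) for field in fields]
        except ValueError:
            continue  # non-numeric content, e.g. a header line
        sites.append(Site(*values))
    return sites


sample = """\
15.0 0.3040 0.0730 0.3246
13.8 0.6066 0.0658 0.5039
11.6 0.7966 0.0535 0.2389
"""
sites = parse_crunch2_sites(sample)
# Pick the site with the highest relative occupancy.
strongest = max(sites, key=lambda site: site.occupancy)
print(len(sites), strongest)
```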
In addition, CRUNCH2 has three main output files: "short.out", which is included in the Crank output file; "output", which contains much more detailed information from the CRUNCH2 run; and "hits", which exhaustively details the progress of each trial. When problems occur, it is often useful to examine the "output" file.
For all information about SHELXC/D/E output, please visit the SHELX website at:
http://shelx.uni-ac.gwdg.de/SHELX/
BP3 produces a single unified log file, which is included in the Crank output. First, BP3 prints its header, followed by its interpretation of the contents of the command file. In case of problems, it is advisable to check that the command file and the commands as interpreted by BP3 are as similar as possible. In default mode, BP3 then proceeds with two refinement cycles, one refining only occupancies and one refining all parameters.
After refinement of the parameters, BP3 then prints out the coordinates, occupancies and B-factors of the input atoms and outputs its best phase estimates and phase probability statistics to the output MTZ file.
When running SAD or SIR(AS) experiments, we do not know which enantiomorph is the correct one, so we run both: one as "hand1" and its enantiomorph as "hand2". The DM output from these two hands is appended to the Crank logfile. An important thing to check is whether the sections displayed through the macromolecule are contiguous.
For more information, please consult the DM manual directly.
Crank uses XML for communication both inside and outside the suite. Inside the suite, Crank uses XML to construct command files for the various programs in Crank. It also uses XML to transmit coordinate data, program output information, and timing results.
XML (eXtensible Markup Language) is very similar to the HTML language (Hypertext Markup Language) used to construct web pages. Like HTML it uses tags to annotate, or mark-up text files, adding semantic knowledge to the flat text file. An example of XML is shown below:
<dataset_info>
  <num_atoms>3</num_atoms>
</dataset_info>
In the XML shown above, you can see some essential features of XML. First, XML uses tags to mark up the document. These tags are enclosed in angle brackets, "<" and ">". Second, there is always a start tag (beginning with "<") and a matching end tag (starting with "</").
Crank uses this structured markup language to facilitate the passing of information from one crystallographic program to another.
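Because the markup is regular, any XML library can read it back. As an illustration only (how Crank itself processes XML is not described here), Python's standard xml.etree.ElementTree module can extract the value from the dataset_info example:

```python
import xml.etree.ElementTree as ET

xml_text = """
<dataset_info>
  <num_atoms>3</num_atoms>
</dataset_info>
""".strip()

root = ET.fromstring(xml_text)
# findtext returns the text inside the first matching child element.
num_atoms = int(root.findtext("num_atoms"))
print(root.tag, num_atoms)
```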
Crank uses a hierarchically structured directory system for its run directory, in order to keep the different runs of the various crystallographic programs organized. There are two types of directories in the Crank directory hierarchy: directories where crystallographic programs are run, and directories where information is collected.
The names of directories where programs are run always start with a number and a dash, as in "1-crunch2" or "2-bp3". The number gives the position of the program in the pipeline's run order, and the identifier after the dash shows which program was run.
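The naming convention makes it possible to recover the pipeline order from directory names alone. The sketch below is an illustration, not Crank code: the function names are hypothetical, and the names "results" and "10-dm" in the usage line are made up to show non-matching names and numeric (rather than alphabetical) sorting.

```python
import re
from typing import List, Optional, Tuple

# A run directory is "<number>-<program>", e.g. "1-crunch2".
RUN_DIR = re.compile(r"^(\d+)-(.+)$")


def parse_run_dir(name: str) -> Optional[Tuple[int, str]]:
    """Split '1-crunch2' into (1, 'crunch2'); return None for other names."""
    match = RUN_DIR.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def pipeline_order(names: List[str]) -> List[Tuple[int, str]]:
    """Return the program-run directories sorted by their step number."""
    parsed = [parse_run_dir(name) for name in names]
    return sorted(item for item in parsed if item is not None)


print(pipeline_order(["2-bp3", "results", "1-crunch2", "10-dm"]))
```

Sorting on the parsed integer matters once a pipeline has ten or more steps, because a plain string sort would place "10-dm" before "2-bp3".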
Each of these directories holds all the files produced by one run of the given crystallographic program. The directories are constructed by Crank, which first builds a shell script to run the program. This shell script is designed to be as close as possible to the example scripts written by the program author. Crank then copies all the requisite data files for the program into the run directory.
Then, Crank simply executes the shell script that it has built, timing the run and collecting results. The results are then converted into XML format and the reflection file data is converted into MTZ format.
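The run-time-collect cycle described above can be pictured with a short sketch. This is not Crank's implementation: the function name run_and_record and the XML element and attribute names are invented for illustration, and a trivial Python one-liner stands in for a generated shell script.

```python
import subprocess
import sys
import time
import xml.etree.ElementTree as ET


def run_and_record(command, step_name):
    """Run `command`, time it, and return an XML element describing the run.

    The element and attribute names are made up for this sketch.
    """
    start = time.perf_counter()
    completed = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.perf_counter() - start

    result = ET.Element("run", name=step_name)
    ET.SubElement(result, "exit_code").text = str(completed.returncode)
    ET.SubElement(result, "seconds").text = "%.3f" % elapsed
    ET.SubElement(result, "output").text = completed.stdout
    return result


# Stand-in for a generated shell script: a trivial Python one-liner.
element = run_and_record([sys.executable, "-c", "print('done')"], "1-example")
print(ET.tostring(element, encoding="unicode"))
```

Keeping the exit code and timing next to the captured output in one element means a later step can decide how to proceed without re-reading the raw log.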
More information about Crank can be found both on the Crank website and in the Crank distribution. In addition, a paper discussing Crank and its results has been published.
The Crank website can be found at http://www.bfsc.leidenuniv.nl/software/crank. On it, you can find lots of information about Crank, from how to run Crank to how to interpret results. There is also a SAD Tutorial on the Crank website that has detailed steps of how to perform a SAD experiment with Crank.
In the Crank distribution, there are documents, test directories and PDF versions of some of the papers published about Crank and about some of the programs that make up Crank. All documents are found in the "doc" directory of the Crank distribution and include README files and .pdf versions of papers. The following papers are included:
wd5000r.pdf: Navraj S. Pannu, Airlie J. McCoy and Randy J. Read. Application of the complex multivariate normal distribution to crystallographic methods with insights into multiple isomorphous replacement phasing. Acta Cryst. (2003). D59, 1801-1808.
wd5005r.pdf: Navraj S. Pannu and Randy J. Read. The application of multivariate statistical techniques improves single-wavelength anomalous diffraction phasing. Acta Cryst. (2004). D60, 22-27.
gc0004.pdf: J.L. van der Plas, R.A.G. de Graaff and H. Schenk. On the Use of Eigenvalues and Eigenvectors in the Phase Problem. Acta Cryst. (1998). A54, 262-266.
gc0005.pdf: J.L. van der Plas, R.A.G. de Graaff and H. Schenk. Karle-Hauptman Matrices and Eigenvalues: a Practical Approach. Acta Cryst. (1998). A54, 267-272.
jn0094.pdf: Rudolf A.G. de Graaff, Mark Hilge, Jaco L. van der Plas and Jan Pieter Abrahams. Matrix methods for solving protein substructures of chlorine and sulfur from anomalous data. Acta Cryst. (2001). D57, 1857-1862.
There is more information about CRUNCH2 at the CRUNCH2 website:
http://www.bfsc.leidenuniv.nl/software/crunch2
In addition, there are three papers included in the "doc" directory of the Crank distribution that relate to CRUNCH2:
gc0004.pdf: J.L. van der Plas, R.A.G. de Graaff and H. Schenk. On the Use of Eigenvalues and Eigenvectors in the Phase Problem. Acta Cryst. (1998). A54, 262-266.
gc0005.pdf: J.L. van der Plas, R.A.G. de Graaff and H. Schenk. Karle-Hauptman Matrices and Eigenvalues: a Practical Approach. Acta Cryst. (1998). A54, 267-272.
jn0094.pdf: Rudolf A.G. de Graaff, Mark Hilge, Jaco L. van der Plas and Jan Pieter Abrahams. Matrix methods for solving protein substructures of chlorine and sulfur from anomalous data. Acta Cryst. (2001). D57, 1857-1862.
There is more information about BP3 at the BP3 website:
http://www.bfsc.leidenuniv.nl/software/bp3
In addition, there are two papers included in the Crank distribution that refer to the algorithms used in BP3:
wd5000r.pdf: Navraj S. Pannu, Airlie J. McCoy and Randy J. Read. Application of the complex multivariate normal distribution to crystallographic methods with insights into multiple isomorphous replacement phasing. Acta Cryst. (2003). D59, 1801-1808.
wd5005r.pdf: Navraj S. Pannu and Randy J. Read. The application of multivariate statistical techniques improves single-wavelength anomalous diffraction phasing. Acta Cryst. (2004). D60, 22-27.
Although Crank has an interface to SHELX, SHELX is not part of the Crank suite. To obtain the SHELX suite, and for more information, please go to the SHELX website at http://shelx.uni-ac.gwdg.de/SHELX/
Crank uses libraries from the CCP4 project extensively. In fact, for its interface, Crank uses the CCP4i interface and libraries. In addition, Crank uses many programs from the CCP4 suite, including SFTOOLS, TRUNCATE, WILSON, MTZ2VARIOUS, CAD, MTZMADMOD, F2MTZ and others.
In addition, for density modification, Crank uses the DM program which is included in the CCP4 distribution. For more information on DM, please consult the CCP4 program documentation or the following paper:
Cowtan, K. (1994). Joint CCP4 and ESF-EACBM Newsletter on Protein Crystallography, 31, p34-38.
For more information on CCP4, please visit the CCP4 website.
“Crank: New methods for automated macromolecular crystal structure solution.” Structure, 1753-1761 (October 2004).
“Methods used in the structure determination of bovine mitochondrial F1 ATPase.” Acta Cryst., 30-42 (1996).
“Difference Structure Factor Normalization for Determining Heavy-Atom or Anomalous Scattering Substructures.” J. Appl. Cryst., 664-670 (1999).
“Automated crystallographic system for high-throughput protein structure determination.” Acta Cryst., 1138-1144 (2003).
“SAD or MAD phasing: location of the anomalous scatterers.” Acta Cryst., 662-669 (2003).
“Density Modification.” Joint CCP4 and ESF-EACBM Newsletter on Protein Crystallography, p34-38 (1994).
“Can anomalous signal of sulfur become a tool for solving protein crystal structures?” J. Mol. Biol., 83-92 (1999).
“Anomalous signal of phosphorus used for phasing DNA oligomer: importance of data redundancy.” Acta Cryst., 990-995 (2001).
“Jolly SAD.” Acta Cryst., 494-50 (2002).
“Is it jolly SAD?” Acta Cryst., 1958-1965 (2003).
“Datasets to Maps: A Wrapper for SHELXS and the CCP4 Program Suite.” CCP4 Newsletter on Protein Crystallography (1999).
“Matrix methods for solving protein substructures of chlorine and sulfur from anomalous data.” Acta Cryst., 1857-1862 (2001).
“Substructure search procedures for macromolecular structures.” Acta Cryst., 1966-1973 (2003).
“An Open Source Multi-purpose Programming Environment for Macromolecular Crystallography.” CCP4 Newsletter on Protein Crystallography (2001).
“Development of PDBj-ML.” Acta Cryst., C367 (2002).
“The phases and magnitudes of the structure factors.” Acta Cryst., 181-187 (1950).
Methods Enzymol., 472-494 (1997).
“Automation of the collection and processing of X-ray diffraction data - a generic approach.” Acta Cryst., 1924-1928 (2002).
“ARP/wARP's model-building algorithms. I. The main chain.” Acta Cryst., 968-975 (2002).
“Refinement of Macromolecular Structures by the Maximum-Likelihood Method.” Acta Cryst., 240-255 (1997).
“Application of the complex multivariate normal distribution to crystallographic methods with insights into multiple isomorphous replacement phasing.” Acta Cryst., 1801-1808 (2003).
“The application of multivariate statistical techniques improves single-wavelength anomalous diffraction phasing.” Acta Cryst., 22-27 (2004).
“Automated protein model building combined with iterative structure refinement.” Nature Struct. Biol., 458-463 (1999).
“On the Use of Eigenvalues and Eigenvectors in the Phase Problem.” Acta Cryst., 262-266 (1998).
“Karle-Hauptman Matrices and Eigenvalues: a Practical Approach.” Acta Cryst., 267-272 (1998).
“A graphical user interface to the CCP4 program suite.” Acta Cryst., 1131-1137 (2003).
“Substructure solution with SHELXD.” Acta Cryst., 1772-1779 (2002).
“Matching selenium-atom peak positions with a different hand or origin.” J. Appl. Cryst., 368-370 (2002).
“Automated MAD and MIR structure solution.” Acta Cryst., 849-861 (1999).
“The design and implementation of SnB v2.0.” J. Appl. Cryst., 120-124 (1999).
“Ongoing developments in CCP4 for high-throughput structure determination.” Acta Cryst., 1929-1936 (2002).