Sieve - a tool for 3D protein structure description, comparison
and classification
Description: Sieve reads
through a directory and, for every chain of every protein pdb-file it
finds, calculates the average occurrences of patterns of one, two and
three crossings of the carbon alpha path.
Download: The Sieve software may be downloaded on these terms.
Compile: Sieve is compiled by entering
>cc Sieve.c -lm -O3 -o Sieve
Run: To run Sieve, enter
>Sieve 20 100 25 /path/to/pdb_file_directory/ output.file 0.000000001
The program takes five or six arguments.
The first two arguments are limits for the length of proteins it should
process. For example, 20 100 would indicate that the program should only
treat proteins for which the number of carbon alphas is between 20 and 100.
The third argument, 25, is a triangulation parameter.
The fourth argument is a directory name (which must end with a "/")
in which the program will look for protein data files.
The fifth argument is the name of a file into which output data is to
be written. Note that this file is only appended to, never overwritten.
The optional sixth argument, 0.000000001, is the amplitude of random noise
to be added to atomic coordinates. If omitted, this amplitude is set to zero.
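The argument rules above can be sketched in C as follows. This is only an illustration and is not taken from Sieve.c: the struct sieve_args and the functions parse_long and parse_sieve_args are hypothetical names, and treating the triangulation parameter as an integer is an assumption based on the example value 25.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the command-line settings described above. */
struct sieve_args {
    long min_len;        /* lower limit on the number of C-alpha atoms */
    long max_len;        /* upper limit on the number of C-alpha atoms */
    long triangulation;  /* triangulation parameter (assumed to be an integer) */
    const char *dir;     /* directory to scan; must end with '/' */
    const char *outfile; /* output file, which is only appended to */
    double noise;        /* noise amplitude; 0.0 if the sixth argument is absent */
};

/* Parse a whole string as a base-10 integer; returns 0 on success. */
static int parse_long(const char *s, long *out)
{
    char *end;
    long v = strtol(s, &end, 10);

    if (end == s || *end != '\0')
        return -1;
    *out = v;
    return 0;
}

/* Returns 0 on success, -1 if the arguments do not match the description. */
int parse_sieve_args(int argc, char **argv, struct sieve_args *a)
{
    size_t n;
    char *end;

    /* Program name plus five or six arguments. */
    if (argc != 6 && argc != 7)
        return -1;

    if (parse_long(argv[1], &a->min_len) != 0 ||
        parse_long(argv[2], &a->max_len) != 0 ||
        parse_long(argv[3], &a->triangulation) != 0)
        return -1;
    if (a->min_len > a->max_len)
        return -1;

    /* The directory name must end with a slash. */
    n = strlen(argv[4]);
    if (n == 0 || argv[4][n - 1] != '/')
        return -1;
    a->dir = argv[4];
    a->outfile = argv[5];

    /* The sixth argument is optional; the amplitude defaults to zero. */
    a->noise = 0.0;
    if (argc == 7) {
        a->noise = strtod(argv[6], &end);
        if (end == argv[6] || *end != '\0')
            return -1;
    }
    return 0;
}
```

With the example command line shown under Run, this sketch would yield limits 20 and 100, triangulation parameter 25 and a noise amplitude of 0.000000001; a directory name without the trailing slash would be rejected.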
The program: Sieve searches through the given
directory for files ending in ".pdb". For each such file, it reads through
its output file (which is not overwritten, but only appended to)
to see if there already is an entry for that protein. If so, it moves on
to the next file. If not, it computes the measures for this new protein
and, if the computation succeeds, appends a line to the output file.
The output file is opened only while it is being read or appended to, never
during a computation. Once a line is appended to the output file, the output
stream is flushed (any buffered but unwritten data is written). This means
that the program can be aborted and restarted without losing more than the
computation in progress (i.e. one single protein). It also means that one
can first set the program to treat a set of proteins without any perturbation
of atomic coordinates (i.e. no sixth argument). It will compute the measures
of those it can, but not produce an output line for those which caused numerical
problems. One can then start again with a small perturbation to treat the
remainder.
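The check-then-append behaviour described above can be sketched in C as follows. This is a minimal illustration, not the code in Sieve.c: the function names has_entry and append_entry are hypothetical, and the sketch assumes that an entry is recognised by the file name in the first column and that output lines are shorter than 4096 characters.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if some line of the output file starts with pdb_name, else 0.
   The file is opened only for the duration of the scan. */
int has_entry(const char *outfile, const char *pdb_name)
{
    FILE *fp = fopen(outfile, "r");
    char line[4096];   /* assumption: output lines fit in this buffer */
    char first[256];
    int found = 0;

    if (fp == NULL)
        return 0;      /* no output file yet, so nothing has been processed */

    while (!found && fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "%255s", first) == 1 && strcmp(first, pdb_name) == 0)
            found = 1;
    }
    fclose(fp);
    return found;
}

/* Appends one result line and flushes it before closing.
   Returns 0 on success, -1 on failure. */
int append_entry(const char *outfile, const char *line)
{
    FILE *fp = fopen(outfile, "a");   /* append mode never truncates the file */

    if (fp == NULL)
        return -1;
    if (fputs(line, fp) == EOF || fputc('\n', fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fflush(fp) != 0) {            /* push buffered data out immediately */
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

Under these assumptions, a driver loop would call has_entry before starting a computation and append_entry only after it succeeds, so an aborted run leaves the output file with complete lines only, and a second run with a noise amplitude picks up just the proteins that have no line yet.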
Output: The columns of output.file are
pdb.file chainID #C-alphas_missing #C-alphas
followed by 29 structural measures, ordered as in Table 3 in our paper below,
for example
1cd1C2.pdb C 0 95 -2.2006067934 23.21.....
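For post-processing, a line of this layout could be read back with a small C routine such as the one below. It is a sketch under assumptions, not part of Sieve: the names sieve_record and parse_sieve_line are hypothetical, columns are assumed to be separated by whitespace, and the chain ID is assumed to be a single non-blank character.

```c
#include <stdio.h>
#include <stdlib.h>

#define SIEVE_N_MEASURES 29   /* number of structural measures per line */

/* Hypothetical in-memory form of one output line. */
struct sieve_record {
    char   pdb_file[256];
    char   chain_id;
    int    n_missing;     /* #C-alphas_missing */
    int    n_calpha;      /* #C-alphas */
    double measure[SIEVE_N_MEASURES];
};

/* Returns the number of measures read (0 to 29), or -1 if the four
   leading columns cannot be parsed. */
int parse_sieve_line(const char *line, struct sieve_record *r)
{
    int consumed = 0;
    int n = 0;
    const char *p;
    char *end;

    /* %n records how many characters the four leading columns used. */
    if (sscanf(line, "%255s %c %d %d%n", r->pdb_file, &r->chain_id,
               &r->n_missing, &r->n_calpha, &consumed) != 4)
        return -1;

    /* Read the remaining columns as doubles until none are left. */
    p = line + consumed;
    while (n < SIEVE_N_MEASURES) {
        double v = strtod(p, &end);
        if (end == p)
            break;
        r->measure[n++] = v;
        p = end;
    }
    return n;
}
```

A caller would normally treat any return value other than 29 as an incomplete or malformed line.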
Note: We have not considered backbones in which more than 3 C-alpha atoms
are missing. This is because Sieve connects the carbon alpha atoms it finds,
so big gaps in the backbone may give a "backbone" that is very different from
the true one. To compute the number #C-alphas_missing, Sieve simply counts
the carbon alpha atoms and compares this count with the starting and ending
residue numbers. For pdb-files with non-consecutive numbering, this may give
strange results.
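One plausible reading of this count, written out in C, is shown below. The exact formula used in Sieve.c is not given here, so the arithmetic is an assumption, and the function names calpha_missing and backbone_is_considered are hypothetical.

```c
/* Number of residues implied by the first and last residue numbers,
   minus the number of C-alpha atoms actually found in the chain.
   Assumed formula; the result can be misleading (even negative) when
   the residue numbering is not consecutive. */
int calpha_missing(int first_resnum, int last_resnum, int n_found)
{
    return (last_resnum - first_resnum + 1) - n_found;
}

/* Backbones with more than 3 missing C-alpha atoms are not considered. */
int backbone_is_considered(int n_missing)
{
    return n_missing <= 3;
}
```

Under this reading, a chain numbered 1 to 50 and then 101 to 150 with all 100 C-alpha atoms present would be reported as having 50 missing atoms and would be skipped, which is the kind of strange result mentioned above.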