ProteinCut

by Nat Laughlin on 2011-10-19

ProteinCut is a utility for visualizing how proteolytic enzymes cut an amino acid sequence according to certain well-defined rules. It was a final project for a biochemistry class in 2005.

Enzymes have many functions, including synthesis and deconstruction. Proteolytic enzymes are of the latter sort: they break certain bonds between amino acids, splitting the protein into fragments.

Consider the following reactions, where A is alanine, K is lysine, and P is proline. They describe trypsin acting on a 7-amino-acid chain that is part of some larger protein.

AAAKAAA + trypsin → AAAK + AAA (1)

AAAKPAA + trypsin → AAAKPAA (2)

Trypsin, a common digestive enzyme made in the pancreas, cleaves on the carboxyl side of lysine or arginine unless that residue is followed by proline. Other enzymes follow similarly simple rules. The table below summarizes which residues each enzyme cleaves after, and under what conditions.

Enzyme             Cleaves (C-terminus)                   Rule
Cyanogen bromide   Methionine                             Always
Trypsin            Lysine, Arginine                       Unless proline follows
Chymotrypsin       Phenylalanine, Tryptophan, Tyrosine    Unless proline follows
Phosphatase        Aspartic acid, Glutamic acid           Always
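The rules in the table can be expressed compactly in code. What follows is a minimal sketch, not the ProteinCut source: the names CleavageRule and cleave are hypothetical, and it assumes the sequence is already written in one-letter codes (K, R, P and so on).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical rule record (not taken from ProteinCut): the residues an
// enzyme cuts after, and whether a following proline blocks the cut.
struct CleavageRule {
    std::string cutAfter;   // one-letter codes, e.g. "KR" for trypsin
    bool blockedByProline;  // true for trypsin and chymotrypsin in the table
};

// Split a one-letter sequence on the C-terminal side of every residue
// listed in rule.cutAfter, honouring the proline exception if it applies.
std::vector<std::string> cleave(const std::string& seq, const CleavageRule& rule) {
    std::vector<std::string> fragments;
    std::string current;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        current += seq[i];
        const bool matches = rule.cutAfter.find(seq[i]) != std::string::npos;
        const bool prolineNext = i + 1 < seq.size() && seq[i + 1] == 'P';
        if (matches && !(rule.blockedByProline && prolineNext)) {
            fragments.push_back(current);  // cut after this residue
            current.clear();
        }
    }
    if (!current.empty()) fragments.push_back(current);  // trailing fragment
    return fragments;
}
```

With a trypsin rule written as CleavageRule{"KR", true}, cleave("AAAKAAA", rule) gives AAAK and AAA as in reaction (1), while cleave("AAAKPAA", rule) returns the chain uncut as in reaction (2).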

Breaking long polypeptide chains into smaller fragments makes it possible to deduce the original amino acid sequence. This technique is called peptide mass fingerprinting (PMF).

Design

ProteinCut does not simulate the actual dynamics of the cleavage process; it simply parses sequences.

I decided to use the PDB (Protein Data Bank) file format because it seems to be the standard for protein structure files.

PDB 2.3 Documentation

The program, written in C++, reads the SEQRES and ATOM entries in a PDB file and determines the abbreviated (one-letter) polypeptide sequence. Next, it cuts this sequence into subchains according to the chosen enzyme method and assigns new chain names to the SEQRES entries. Finally, it outputs a modified PDB file.
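The first step, turning SEQRES records into a one-letter sequence, might look roughly like the sketch below. This is an illustration rather than the ProteinCut code: oneLetter and readSeqres are hypothetical names, the sketch ignores ATOM records and chain boundaries (all chains are concatenated), and it relies on the fixed-column SEQRES layout, where up to 13 three-letter residue names start at column 20 and are spaced 4 columns apart.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Map a three-letter residue name to its one-letter code; 'X' if unknown.
char oneLetter(const std::string& name) {
    static const std::map<std::string, char> codes = {
        {"ALA", 'A'}, {"ARG", 'R'}, {"ASN", 'N'}, {"ASP", 'D'}, {"CYS", 'C'},
        {"GLN", 'Q'}, {"GLU", 'E'}, {"GLY", 'G'}, {"HIS", 'H'}, {"ILE", 'I'},
        {"LEU", 'L'}, {"LYS", 'K'}, {"MET", 'M'}, {"PHE", 'F'}, {"PRO", 'P'},
        {"SER", 'S'}, {"THR", 'T'}, {"TRP", 'W'}, {"TYR", 'Y'}, {"VAL", 'V'}};
    const auto it = codes.find(name);
    return it == codes.end() ? 'X' : it->second;
}

// Collect the one-letter sequence from every SEQRES record in a PDB stream.
// Assumption: chains are simply concatenated in file order.
std::string readSeqres(std::istream& in) {
    std::string line, seq;
    while (std::getline(in, line)) {
        if (line.compare(0, 6, "SEQRES") != 0) continue;  // skip other records
        for (int slot = 0; slot < 13; ++slot) {           // 13 names per record
            const std::size_t col = 19 + 4 * static_cast<std::size_t>(slot);
            if (col + 3 > line.size()) break;             // short final record
            const std::string name = line.substr(col, 3);
            if (name == "   ") continue;                  // empty slot
            seq += oneLetter(name);
        }
    }
    return seq;
}
```

The function takes any std::istream, so it can be fed a std::ifstream opened on a PDB file or a std::istringstream holding a few records for testing.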

ProteinCut currently supports the following enzymes:

Enzyme             Method
Cyanogen bromide   cnbr
Trypsin            trypsin
Chymotrypsin       chymotrypsin
Phosphatase        aphos

Usage

To experiment with the code, you can visit the RCSB Protein Data Bank and search for interesting molecules, like 'green fluorescent jellyfish'. This will return a list of candidate structures, which can be downloaded as a PDB file. In the source package, I include several test PDB files.

First, you need to compile the source code. Under Linux, the command is

$ g++ proteincut.cpp -o proteincut

To apply trypsin to the protein 1FK5 (a phospholipid transfer protein in maize):

$ proteincut 1FK5.pdb 1FK5_trypsin.pdb trypsin
Residues: 93
Chains:1
Disulfide bonds:4

Sequence:
AISCGQVASAIAPCISYARGQGSGPSAGCCSGVRSLNNAARTTADRRAACNCL
KNAAAGVSGLNAGNAASIPSKCGVSIPYTISTSTDCSRVN

Cut Sequence:
AISCGQVASAIAPCISYAR
GQGSGPSAGCCSGVR
SLNNAAR
TTADR
R
AACNCLK
NAAAGVSGLNAGNAASIPSK
CGVSIPYTISTSTDCSR
VN

The results are written to a new file, 1FK5_trypsin.pdb. You can view the changes in Protein Explorer or Swiss PDB Viewer, which show that the protein now has several extra amino acid chains where the enzyme cut. The image above shows the results; point the cursor at the image to see the modified sequence of this example protein.
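The extra chains appear because each fragment gets its own SEQRES records under a new chain name. The sketch below shows one way that output step could be written. It is not the ProteinCut implementation: threeLetter, seqresRecords and fragmentsToSeqres are hypothetical names, and lettering the chains A, B, C, ... in fragment order (wrapping after Z) is an assumption, since the naming scheme the program uses is not described here.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Three-letter residue name for a one-letter code; "UNK" if unrecognised.
std::string threeLetter(char code) {
    static const std::string letters = "ARNDCQEGHILKMFPSTWYV";
    static const char* names[] = {
        "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
        "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"};
    const std::size_t i = letters.find(code);
    return i == std::string::npos ? "UNK" : names[i];
}

// Format one fragment as SEQRES records, 13 residues per record.
std::vector<std::string> seqresRecords(const std::string& fragment, char chainId) {
    std::vector<std::string> records;
    for (std::size_t start = 0; start < fragment.size(); start += 13) {
        char head[32];
        // Columns: record name, serial number, chain ID, residue count.
        std::snprintf(head, sizeof head, "SEQRES %3d %c %4d ",
                      static_cast<int>(start / 13 + 1), chainId,
                      static_cast<int>(fragment.size()));
        std::string record = head;
        for (std::size_t i = start; i < start + 13 && i < fragment.size(); ++i)
            record += " " + threeLetter(fragment[i]);
        records.push_back(record);
    }
    return records;
}

// Assumed naming scheme: chains are lettered A, B, C, ... in fragment order.
std::vector<std::string> fragmentsToSeqres(const std::vector<std::string>& fragments) {
    std::vector<std::string> all;
    for (std::size_t i = 0; i < fragments.size(); ++i) {
        const char chainId = static_cast<char>('A' + i % 26);
        for (const std::string& r : seqresRecords(fragments[i], chainId))
            all.push_back(r);
    }
    return all;
}
```

For the fourth fragment of the example, seqresRecords("TTADR", 'D') yields a single record listing THR THR ALA ASP ARG under chain D. A complete tool would also need to keep the chain IDs in the ATOM records consistent, which this sketch leaves out.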

Files

This package includes C++ source code and 3 example proteins, including the one modified in the example. 101M is sperm whale myoglobin and 1APH is cow insulin.


Download
Compressed CPP & PDB
Source code and examples
87,952 bytes
MD5: FF5310C2E31C0CAB17B15BD67A81401E