ProteinCut
by Nat Laughlin on 2011-10-19
ProteinCut is a utility for visualizing how proteolytic enzymes cut an amino acid sequence according to certain well-defined rules. It was a final project for a biochemistry class in 2005.
Enzymes have many functions, including synthesis and deconstruction. Proteolytic enzymes are the latter sort: they break certain bonds between amino acids, splitting the protein into fragments.
Consider the following reactions, where A is alanine, K is lysine, and P is proline. They describe trypsin acting on a chain of seven amino acids that is part of some protein.
AAAKAAA + trypsin → AAAK + AAA (1)
AAAKPAA + trypsin → AAAKPAA (2)
Trypsin, a common digestive enzyme made in your pancreas, cleaves on the carboxyl side of lysine or arginine unless that residue is followed by proline, which is why the chain in reaction (2) stays intact. Other enzymes have similar, fairly simple rules. The following table summarizes which residues each enzyme cuts after, and under which conditions.
| Enzyme | Cleaves (C-terminus) | Rule |
| Cyanogen bromide | Methionine | Always |
| Trypsin | Lysine, Arginine | Unless Proline follows |
| Chymotrypsin | Phenylalanine | Unless Proline follows |
| Phosphatase | Aspartic acid, Glutamic acid | Always |
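The cutting rules above are simple enough to express in a few lines. The following is a minimal sketch, not the ProteinCut source: the `CleavageRule` struct and the `cleave` function are hypothetical names introduced for illustration, and the sketch assumes the sequence is already written in one-letter codes.

```cpp
#include <string>
#include <vector>

// Hypothetical rule description (not taken from the ProteinCut source).
struct CleavageRule {
    std::string targets;    // one-letter codes that are cut on the C-terminal side
    bool blockedByProline;  // true for rules of the form "Unless Proline follows"
};

// Split a one-letter sequence into fragments according to the rule.
std::vector<std::string> cleave(const std::string& seq, const CleavageRule& rule) {
    std::vector<std::string> fragments;
    std::string current;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        current += seq[i];
        const bool isTarget = rule.targets.find(seq[i]) != std::string::npos;
        const bool hasNext = i + 1 < seq.size();
        const bool prolineNext = hasNext && seq[i + 1] == 'P';
        // Cut after a target residue unless the rule is blocked by a following proline.
        // A target at the very end of the chain has no bond after it, so nothing is cut.
        if (isTarget && hasNext && !(rule.blockedByProline && prolineNext)) {
            fragments.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) fragments.push_back(current);
    return fragments;
}
```

With a trypsin rule written as `CleavageRule{"KR", true}`, calling `cleave("AAAKAAA", rule)` yields the two fragments of reaction (1), while `cleave("AAAKPAA", rule)` returns the chain unchanged, as in reaction (2).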
Breaking long polypeptide chains into smaller fragments makes it possible to deduce the original amino acid sequence. This technique is called peptide mass fingerprinting (PMF).
Design
ProteinCut does not simulate the actual dynamics of the cleavage process; it simply parses sequences.
I decided to use the PDB (protein data bank) format because it seems to be the standard.
The program, written in C++, reads the SEQRES and ATOM entries in a PDB file and determines the abbreviated polypeptide sequence. Next, it cuts this sequence into subchains according to the enzyme method and assigns new chain names to the SEQRES entries. Finally, it outputs a modified PDB file.
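As an illustration of the first step, here is a rough sketch of how SEQRES records could be turned into an abbreviated one-letter sequence. It is not the ProteinCut code: `toOneLetter` and `readSeqres` are hypothetical names, the sketch handles only the 20 standard amino acids, and it relies on the fixed-column PDB layout in which the chain identifier sits in column 12 and residue names start at column 20.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Convert a three-letter residue name to its one-letter code ('X' if unknown).
char toOneLetter(const std::string& name) {
    static const std::map<std::string, char> table = {
        {"ALA", 'A'}, {"ARG", 'R'}, {"ASN", 'N'}, {"ASP", 'D'}, {"CYS", 'C'},
        {"GLN", 'Q'}, {"GLU", 'E'}, {"GLY", 'G'}, {"HIS", 'H'}, {"ILE", 'I'},
        {"LEU", 'L'}, {"LYS", 'K'}, {"MET", 'M'}, {"PHE", 'F'}, {"PRO", 'P'},
        {"SER", 'S'}, {"THR", 'T'}, {"TRP", 'W'}, {"TYR", 'Y'}, {"VAL", 'V'}};
    const auto it = table.find(name);
    return it == table.end() ? 'X' : it->second;
}

// Collect the one-letter sequence of each chain from the lines of a PDB file.
// Lines that are not SEQRES records are ignored.
std::map<char, std::string> readSeqres(const std::vector<std::string>& lines) {
    std::map<char, std::string> chains;
    for (const std::string& line : lines) {
        if (line.compare(0, 6, "SEQRES") != 0 || line.size() < 20) continue;
        const char chainId = line[11];                 // column 12
        std::istringstream residues(line.substr(19));  // names begin at column 20
        std::string name;
        while (residues >> name) chains[chainId] += toOneLetter(name);
    }
    return chains;
}
```

Taking the file as a vector of lines keeps the sketch easy to test; a real tool would more likely read from a `std::ifstream` with `std::getline`.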
ProteinCut currently supports the following enzymes:
| Enzyme | Method |
| Cyanogen bromide | cnbr |
| Trypsin | trypsin |
| Chymotrypsin | chymotrypsin |
| Phosphatase | aphos |
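One way to connect these method names to the cutting rules is a small lookup table. The sketch below is an assumption about how that could be organized, not a description of the actual source: `MethodRule` and `lookupMethod` are hypothetical, and the residue sets are taken from the rules described earlier on this page.

```cpp
#include <map>
#include <string>

// Hypothetical pairing of a method name with its cutting rule.
struct MethodRule {
    std::string targets;    // one-letter codes cut on the C-terminal side
    bool blockedByProline;  // true when a following proline prevents the cut
};

// Fill `rule` and return true when the method name is one of the four
// listed above; return false for anything else.
bool lookupMethod(const std::string& method, MethodRule& rule) {
    static const std::map<std::string, MethodRule> methods = {
        {"cnbr", {"M", false}},         // methionine, always
        {"trypsin", {"KR", true}},      // lysine or arginine, unless proline follows
        {"chymotrypsin", {"F", true}},  // phenylalanine, unless proline follows
        {"aphos", {"DE", false}}};      // aspartic or glutamic acid, always
    const auto it = methods.find(method);
    if (it == methods.end()) return false;
    rule = it->second;
    return true;
}
```

Keeping the rules in data rather than in branching code means a new method only needs one more table entry.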
Usage
To experiment with the code, you can visit the RCSB Protein Data Bank and search for interesting molecules, like 'green fluorescent jellyfish'. This will return a list of candidate structures, which can be downloaded as a PDB file. In the source package, I include several test PDB files.
First, you need to compile the source code. Under Linux, the command is:
$ g++ proteincut.cpp -o proteincut
To apply trypsin to the protein 1FK5 (a phospholipid transfer protein in maize):
$ proteincut 1FK5.pdb 1FK5_trypsin.pdb trypsin
Residues: 93
Chains:1
Disulfide bonds:4
Sequence:
AISCGQVASAIAPCISYARGQGSGPSAGCCSGVRSLNNAARTTADRRAACNCL
KNAAAGVSGLNAGNAASIPSKCGVSIPYTISTSTDCSRVN
Cut Sequence:
AISCGQVASAIAPCISYAR
GQGSGPSAGCCSGVR
SLNNAAR
TTADR
R
AACNCLK
NAAAGVSGLNAGNAASIPSK
CGVSIPYTISTSTDCSR
VN

A file called 1FK5_trypsin.pdb has been generated with the results. You can see the changes in Protein Explorer or Swiss PDB viewer, which show that the protein now has several extra amino acid chains where the enzyme cut. The image above shows the results; point the cursor at it to see the modified sequence of this example protein.
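To give an idea of what "extra chains" means at the file level, the sketch below writes one set of SEQRES records per fragment, each under its own chain identifier. It is only an illustration under stated assumptions: `toThreeLetter` and `seqresRecords` are hypothetical names, chain identifiers are simply handed out as A, B, C and so on (wrapping around after Z), and the ATOM entries that a complete PDB file also needs are not touched.

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Three-letter residue name for a one-letter code ("UNK" if unrecognized).
std::string toThreeLetter(char code) {
    static const std::map<char, std::string> names = {
        {'A', "ALA"}, {'R', "ARG"}, {'N', "ASN"}, {'D', "ASP"}, {'C', "CYS"},
        {'Q', "GLN"}, {'E', "GLU"}, {'G', "GLY"}, {'H', "HIS"}, {'I', "ILE"},
        {'L', "LEU"}, {'K', "LYS"}, {'M', "MET"}, {'F', "PHE"}, {'P', "PRO"},
        {'S', "SER"}, {'T', "THR"}, {'W', "TRP"}, {'Y', "TYR"}, {'V', "VAL"}};
    const auto it = names.find(code);
    return it == names.end() ? std::string("UNK") : it->second;
}

// Build SEQRES records for each fragment, 13 residues per record,
// giving every fragment its own chain identifier.
std::vector<std::string> seqresRecords(const std::vector<std::string>& fragments) {
    static const std::string chainIds = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::vector<std::string> records;
    for (std::size_t f = 0; f < fragments.size(); ++f) {
        const std::string& frag = fragments[f];
        const char chainId = chainIds[f % chainIds.size()];  // assumption: wrap after Z
        int serial = 1;
        for (std::size_t start = 0; start < frag.size(); start += 13, ++serial) {
            char header[32];
            // Columns: record name, serial number, chain ID, residue count.
            std::snprintf(header, sizeof header, "SEQRES %3d %c %4d  ",
                          serial, chainId, static_cast<int>(frag.size()));
            std::string line = header;
            const std::size_t end = std::min(start + 13, frag.size());
            for (std::size_t i = start; i < end; ++i) {
                if (i > start) line += ' ';
                line += toThreeLetter(frag[i]);
            }
            records.push_back(line);
        }
    }
    return records;
}
```

Passing the nine fragments from the trypsin run above would give chains A through I, with the 20-residue fragment spread over two records.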
Files
This package includes the C++ source code and three example proteins, including the one modified in the example above. 101M is sperm whale myoglobin and 1APH is cow insulin.
Download
Compressed CPP & PDB
Source code and examples
87,952 bytes
MD5: FF5310C2E31C0CAB17B15BD67A81401E