A worldwide e-Infrastructure for NMR and structural biology

FCC Clustering

Structure prediction methods generate a large number of models, of which only a fraction match the biologically relevant structure. To identify this (near-)native model, we often employ clustering algorithms, based on the assumption that, in the energy landscape of every biomolecule, its native state lies in a wide basin neighboring other structurally similar states. We developed a novel clustering strategy based on a very efficient similarity measure: the fraction of common contacts (FCC).
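To make the similarity measure concrete, here is a minimal sketch, assuming each model is reduced to a set of contacts and that the score is the share of one model's contacts that also occur in the other. The function name fcc, the tuple layout of a contact and the example data are illustrative assumptions, not taken from the distributed scripts.

```python
def fcc(contacts_a, contacts_b):
    """Fraction of the contacts in model A that are also present in model B."""
    a, b = set(contacts_a), set(contacts_b)
    if not a:
        # No contacts in A: define the score as 0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / float(len(a))


# Hypothetical residue-residue contacts: (chain, residue, chain, residue).
model_1 = {("A", 10, "B", 55), ("A", 11, "B", 55),
           ("A", 12, "B", 60), ("A", 15, "B", 61)}
model_2 = {("A", 10, "B", 55), ("A", 11, "B", 55), ("A", 12, "B", 60)}

print(fcc(model_1, model_2))  # 3 of 4 contacts shared -> 0.75
print(fcc(model_2, model_1))  # 3 of 3 contacts shared -> 1.0
```

Under this definition the score depends on the order of the two models, which is why both directions are printed. Only set intersections are needed, so no structural superposition is involved.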

You can read more and download the necessary scripts to perform FCC clustering here.

Advantages of FCC clustering vs. RMSD-based clustering:

  • About 100 times faster on average.
  • Handles symmetry by considering complexes as entities instead of collections of chains.
  • Does not require atom equivalence (clusters mutants, missing loops, etc.).
  • Handles any molecule type (protein, DNA, RNA, carbohydrates, lipids, ligands, etc).
  • Allows multiple levels of "resolution": chain-chain contacts, residue-residue contacts, residue-atom contacts, etc.
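The idea of several "resolution" levels can be pictured as projecting the same contact onto a coarser description. The sketch below is only an illustration under assumed names (coarsen, the level strings and the tuple layout are hypothetical); it does not reproduce how the contact scripts store their data.

```python
def coarsen(contact, level):
    """Project a residue-residue contact onto the requested level of detail."""
    chain_a, res_a, chain_b, res_b = contact
    if level == "residue-residue":
        return (chain_a, res_a, chain_b, res_b)
    if level == "chain-chain":
        # Drop residue numbers: only which chains touch is kept.
        return (chain_a, chain_b)
    raise ValueError("unknown level: %s" % level)


# Hypothetical contacts in a three-chain complex.
contacts = [("A", 10, "B", 55), ("A", 11, "B", 55), ("A", 12, "C", 7)]

chain_level = sorted(set(coarsen(c, "chain-chain") for c in contacts))
print(chain_level)  # [('A', 'B'), ('A', 'C')]
```

A coarser level yields fewer, more tolerant contacts, while a finer level distinguishes models that differ only in interface details.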

Requirements & Usage

  • Python 2.6 or greater
  • C/C++ compiler (for the contact scripts)
  1. Create a contact list for each structure using make_contacts.py

    ./make_contacts.py a.pdb b.pdb ...

    You can also provide a text file containing one structure per line:

    ./make_contacts.py -f pdb_list.txt

  2. Generate the similarity matrix using calc_fcc_matrix.py. For symmetrical complexes, use the -i option.

    ./calc_fcc_matrix.py [-i] a.contacts b.contacts -o fcc_matrix.out

    You can also provide a text file containing one contact file per line:

    ./calc_fcc_matrix.py -f contact_list.txt -o fcc_matrix.out

  3. Calculate the clusters with cluster_fcc.py.

    ./cluster_fcc.py fcc_matrix.out 0.75 -o clusters.txt
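As a rough picture of what step 3 does with the matrix, the sketch below implements a simple greedy neighbor-based clustering. It is a simplified stand-in, not necessarily the algorithm in cluster_fcc.py. It assumes that the 0.75 argument is a similarity cutoff and that two models count as neighbors only when the score reaches the cutoff in both directions; the function name and the input layout are hypothetical.

```python
def greedy_cluster(similarity, cutoff=0.75):
    """Greedy clustering of models from pairwise similarity scores.

    similarity maps (i, j) to the score of model i with respect to model j.
    Returns a list of (center, sorted_members) tuples.
    """
    models = set()
    for i, j in similarity:
        models.update((i, j))

    # Every model is its own neighbor; add pairs above the cutoff both ways.
    neighbors = dict((m, set([m])) for m in models)
    for (i, j), value in similarity.items():
        if i != j and value >= cutoff and similarity.get((j, i), 0.0) >= cutoff:
            neighbors[i].add(j)
            neighbors[j].add(i)

    clusters = []
    remaining = set(models)
    while remaining:
        # Center = model with most unassigned neighbors (lowest id wins ties).
        center = max(sorted(remaining),
                     key=lambda m: len(neighbors[m] & remaining))
        members = neighbors[center] & remaining
        clusters.append((center, sorted(members)))
        remaining -= members
    return clusters


# Hypothetical scores for four models.
sim = {(1, 2): 0.9, (2, 1): 0.8, (1, 3): 0.8, (3, 1): 0.85,
       (2, 3): 0.5, (3, 2): 0.6, (1, 4): 0.2, (4, 1): 0.1}
print(greedy_cluster(sim))  # [(1, [1, 2, 3]), (4, [4])]
```

Raising the cutoff in this sketch gives smaller, tighter clusters; lowering it merges more models into each cluster.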

Help on the usage and options of any of the above-mentioned scripts can be obtained by running them with the -h option.
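The three steps can be chained from a small driver. The sketch below only assembles the command lines shown above and optionally runs them; the helper names are hypothetical, combining -i with -f is an assumption, and contact_list.txt is assumed to have been prepared by the user with one contact file per line.

```python
import subprocess


def pipeline_commands(pdb_list, contact_list, cutoff=0.75, symmetric=False):
    """Build the argument lists for the three steps, in order."""
    make = ["./make_contacts.py", "-f", pdb_list]

    matrix = ["./calc_fcc_matrix.py"]
    if symmetric:
        # Assumption: -i may be combined with the -f input form.
        matrix.append("-i")
    matrix += ["-f", contact_list, "-o", "fcc_matrix.out"]

    cluster = ["./cluster_fcc.py", "fcc_matrix.out", str(cutoff),
               "-o", "clusters.txt"]
    return [make, matrix, cluster]


def run_pipeline(commands):
    """Run each step and stop at the first one that fails."""
    for cmd in commands:
        subprocess.check_call(cmd)


commands = pipeline_commands("pdb_list.txt", "contact_list.txt", symmetric=True)
for cmd in commands:
    print(" ".join(cmd))
# run_pipeline(commands)  # uncomment in a directory containing the scripts
```

Keeping command construction separate from execution makes it easy to inspect the exact calls before running them on a large set of models.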

You can email suggestions to Alexandre M.J.J. Bonvin.

Reference

Rodrigues JPGLM et al. (2012) Clustering biomolecular complexes by residue contacts similarity. Proteins 80:1810–1817


Cite WeNMR/WestLife

 
Usage of the WeNMR/WestLife portals should be acknowledged in any publication:
 
"The FP7 WeNMR (project# 261572) and H2020 West-Life (project# 675858) European e-Infrastructure projects are acknowledged for the use of their web portals, which make use of the EGI infrastructure and DIRAC4EGI service with the dedicated support of CESNET-MetaCloud, INFN-PADOVA, NCG-INGRID-PT, RAL-LCG2, TW-NCHC, SURFsara and NIKHEF, and the additional support of the national GRID Initiatives of Belgium, France, Italy, Germany, the Netherlands, Poland, Portugal, Spain, UK, South Africa, Malaysia, Taiwan and the US Open Science Grid."
 
And the following article describing the WeNMR portals should be cited:
Wassenaar et al. (2012). WeNMR: Structural Biology on the Grid. J. Grid. Comp., 10:743-767.

EGI-approved

The WeNMR Virtual Research Community has been the first to be officially recognized by the EGI.

European Union

WeNMR is an e-Infrastructure project funded under the 7th Framework Programme of the EU. Contract no. 261572.

West-Life, the follow-up project of WeNMR, is a Virtual Research Environment e-Infrastructure project funded under Horizon 2020. Contract no. 675858.
