A worldwide e-Infrastructure for NMR and structural biology

Introduction to Biomolecular NMR spectroscopy and WeNMR

NMR spectroscopy is one of two techniques that allow the three-dimensional (3D) structures of biomacromolecules, such as proteins, RNA, DNA, and their complexes, to be determined at atomic resolution. Knowledge of these 3D structures is vital for understanding the functions and mechanisms of action of macromolecules, and for rationalizing the effect of mutations. 3D structures are also important as guides for the design of new experimental studies and as a starting point for rational drug design. An advantage of NMR over X-ray crystallography is that it also allows investigation of time-dependent chemical and conformational phenomena, including reaction and folding kinetics and intramolecular dynamics. For these reasons, NMR plays an important role within the life sciences.

NMR rests on two principles: perturbing the natural magnetic moment of atomic nuclei, and measuring how the system relaxes back to its initial state (Bloch, 1946; Purcell et al., 1946). The signal thus obtained is a decaying wave consisting of many individual frequency contributions: the Free Induction Decay (FID). Typically, up to 27000 different frequencies can be resolved at the highest magnetic fields that are currently available. Because the signal-to-noise ratio is low, such measurements have to be repeated many times before the frequency contributions and their decays can be investigated. Obtaining structural information from NMR data requires many more, and more complex, measurements, yielding substantial amounts of data that need processing.
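The reason for repeating a measurement many times can be illustrated with a small simulation. The sketch below is not taken from any WeNMR tool: it builds a synthetic single-frequency FID, adds Gaussian noise to each simulated scan, and averages the scans. The signal parameters, the noise level, and all function names are assumptions chosen for illustration. For uncorrelated noise, averaging N scans is expected to reduce the residual noise by roughly the square root of N.

```python
import math
import random


def synthetic_fid(n_points, freq_hz, decay_s, dwell_s):
    """Noise-free FID: one cosine with an exponentially decaying envelope."""
    return [
        math.cos(2 * math.pi * freq_hz * k * dwell_s)
        * math.exp(-k * dwell_s / decay_s)
        for k in range(n_points)
    ]


def average_scans(clean, n_scans, noise_sd, rng):
    """Average n_scans noisy copies of the same clean signal."""
    total = [0.0] * len(clean)
    for _ in range(n_scans):
        for k, value in enumerate(clean):
            total[k] += value + rng.gauss(0.0, noise_sd)
    return [t / n_scans for t in total]


def rms_error(measured, reference):
    """Root-mean-square deviation between two equally long signals."""
    squared = sum((m - r) ** 2 for m, r in zip(measured, reference))
    return math.sqrt(squared / len(reference))


rng = random.Random(0)  # fixed seed so the run is reproducible
clean = synthetic_fid(n_points=512, freq_hz=50.0, decay_s=0.1, dwell_s=0.001)

noise_1 = rms_error(average_scans(clean, 1, 1.0, rng), clean)
noise_64 = rms_error(average_scans(clean, 64, 1.0, rng), clean)

print(f"residual noise after 1 scan:   {noise_1:.3f}")
print(f"residual noise after 64 scans: {noise_64:.3f}")
print(f"improvement: {noise_1 / noise_64:.1f}x (ideal: {math.sqrt(64):.1f}x)")
```

Because the gain grows only with the square root of the number of scans, halving the noise costs four times the measurement time, which is one reason NMR experiments produce so much raw data.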

Processing data from NMR to obtain a 3D structure typically involves the following steps, summarized graphically in Figure 1. First, the raw data have to be processed, more specifically Fourier-transformed, to obtain spectra revealing the different frequency contributions and their relations. These frequencies are the resonances of the atoms measured, but to infer structural information from them, the resonances subsequently have to be assigned to individual contributors (atoms/residues). If the assignment is sufficiently complete, structural restraints can be determined from the spectra, including inter-atomic distance restraints, dihedral angle restraints, and orientation restraints. These structural restraints are then used to calculate a number of structures using a variety of molecular modeling approaches, after which structure validation checks are performed to assess the quality of the results.
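The first step, Fourier transformation of the FID, can be sketched in a few lines. This is a toy example under stated assumptions, not the processing performed by any specific NMR package: the FID is synthetic, contains two resonances at 20 Hz and 75 Hz chosen for illustration, and is transformed with a naive discrete Fourier transform written out explicitly (real software uses a fast Fourier transform plus apodization, phasing, and baseline correction). All function names are hypothetical.

```python
import cmath
import math


def synthetic_fid(freqs_hz, n_points, dwell_s, decay_s):
    """Complex FID: a sum of decaying complex exponentials."""
    return [
        sum(cmath.exp(2j * math.pi * f * k * dwell_s) for f in freqs_hz)
        * math.exp(-k * dwell_s / decay_s)
        for k in range(n_points)
    ]


def dft(signal):
    """Naive O(n^2) discrete Fourier transform, written out for clarity."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def pick_peaks(magnitudes, threshold):
    """Indices of local maxima in the spectrum that exceed the threshold."""
    n = len(magnitudes)
    return [
        m
        for m in range(n)
        if magnitudes[m] > threshold
        and magnitudes[m] > magnitudes[m - 1]
        and magnitudes[m] > magnitudes[(m + 1) % n]
    ]


n_points = 256
dwell_s = 1.0 / 256  # sampling interval; spectral width is 256 Hz

fid = synthetic_fid([20.0, 75.0], n_points, dwell_s, decay_s=0.2)
magnitudes = [abs(value) for value in dft(fid)]

# Convert spectrum indices to frequencies: one bin spans 1 / (n * dwell) Hz.
peak_bins = pick_peaks(magnitudes, threshold=0.5 * max(magnitudes))
peak_freqs = [m / (n_points * dwell_s) for m in peak_bins]
print("resonances found (Hz):", peak_freqs)
```

With these parameters the two input frequencies fall exactly on spectrum bins, so the peak picker should recover 20 Hz and 75 Hz. The exponential decay of the FID is what gives each peak a finite width in the spectrum.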

Fig. 1 NMR data processing from signal to 3D structure. After acquisition of the primary NMR data, these are Fourier transformed to obtain spectra in which the individual frequency contributions or resonances of spin systems, and their relations, are revealed. The resonances subsequently have to be assigned to individual atoms. If sufficient resonances have been assigned, restraints can be inferred from the data, pertaining to distances between atoms, dihedral angles, domain orientations, etc. When an adequate number of restraints is available, these can be used to calculate a set of three-dimensional structures optimally satisfying these restraints. The resulting structures represent the structure of the protein in solution, which is validated against the available experimental data. Although the process is here depicted linearly, intermediate stages may involve iterative cycles of refinement.
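The last stages of this pipeline, checking calculated structures against the restraints, can be illustrated with a minimal sketch. It assumes the simplest possible case: each restraint is an upper bound on the distance between two atoms, and a model is a mapping from atom label to Cartesian coordinates in angstrom. The atom labels, coordinates, bounds, and the tolerance value are all made up for illustration, and the function is hypothetical rather than the API of any validation package.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DistanceRestraint:
    atom_a: str
    atom_b: str
    upper_bound: float  # maximum allowed distance, in angstrom


def violations(coords, restraints, tolerance=0.0):
    """Return (restraint, excess) pairs where the distance exceeds
    the upper bound by more than the tolerance."""
    found = []
    for restraint in restraints:
        distance = math.dist(coords[restraint.atom_a], coords[restraint.atom_b])
        excess = distance - restraint.upper_bound
        if excess > tolerance:
            found.append((restraint, excess))
    return found


# Hypothetical restraints between three protons.
restraints = [
    DistanceRestraint("res5.HN", "res9.HA", 5.0),
    DistanceRestraint("res9.HA", "res12.HB", 4.0),
    DistanceRestraint("res5.HN", "res12.HB", 4.5),
]

# Two hypothetical models from a calculated ensemble.
model_1 = {
    "res5.HN": (0.0, 0.0, 0.0),
    "res9.HA": (3.0, 0.0, 0.0),
    "res12.HB": (0.0, 4.0, 0.0),
}
model_2 = {
    "res5.HN": (0.0, 0.0, 0.0),
    "res9.HA": (3.0, 0.0, 0.0),
    "res12.HB": (0.0, 3.0, 0.0),
}

for name, model in (("model 1", model_1), ("model 2", model_2)):
    found = violations(model, restraints, tolerance=0.5)
    print(f"{name}: {len(found)} violation(s)")
    for restraint, excess in found:
        print(f"  {restraint.atom_a} - {restraint.atom_b}: {excess:.2f} A over bound")
```

Counting violations per model is only one of many possible quality indicators; it is shown here because it follows directly from the distance restraints described in the text.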

For each of the steps involved, specialized computer programs are available, each with its own characteristics and often with its own data format. Processing of NMR data has thus become a task for specialists who understand the data and their formats, as well as the programs, with their installation requirements and usage details. Furthermore, NMR data processing requires considerable data storage and computational resources. Together, these factors currently represent a barrier for groups in the life sciences to employ the full power of NMR. Against this background, the eNMR project was run as a European initiative funded under the Framework 7 e-Infrastructure programme to considerably facilitate this process. Since November 2010 it has been carried on by the WeNMR project. It aims to allow groups lacking the resources to add NMR to their toolbox, and to allow dedicated NMR groups to raise their standard from basic practice towards cutting-edge research.

The main objectives of the WeNMR project are:

  • to provide integrated protocols for NMR data processing
  • to provide access to end users through user-friendly web interfaces
  • to exploit Grid technology for computationally demanding tasks in structural biology
  • to lower the barriers for access to Grid resources in life sciences, notably in structural biology
  • to build a virtual research community around a web portal
  • to initiate SAXS (Small-angle X-ray scattering) integration into the WeNMR project

Considering the background sketched, these objectives set the challenges to be met within the project. The first of these has been the implementation of a new NMR Grid infrastructure. Historically, due to the requirements for processing large amounts of data, NMR spectroscopy has always been intimately linked with high-performance computing. Therefore, sites with high-end facilities for performing NMR measurements commonly also have considerable computational resources. For the WeNMR partners it thus came as a natural first step to integrate the existing resources into a Grid, offering a single standard for deployment and use of applications across the contributing sites, as well as a natural mechanism to share resources. Currently, the WeNMR project involves an operational Grid running gLite 3.1 and 3.2 middleware, and the individual sites are part of the EGI, which is provided by National Grid Initiatives (NGIs) and their infrastructures in Europe and elsewhere.

With an operational Grid in place, the programs involved in the different steps, which often require direct user interaction, have to be interfaced in such a way that they can be run automatically. Focus was initially placed on the CPU-intensive programs, which have to be operated remotely as Grid-enabled applications. This has to be done in such a way that they can be combined in automated workflows for protocolized processing of data, which raises the issue of interoperability. In addition, web interfaces should be easy to use, yet sufficiently flexible for expert users. At the same time, a mechanism is required to handle job traffic to and from the Grid.
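The interoperability issue can be made concrete with a small sketch of an automated workflow. This is an illustration under assumptions, not the WeNMR implementation: each step declares which data items it needs and which item it produces, and a runner refuses to start a step whose inputs are missing. The step names mirror the stages described earlier; the `Step` class, `run_workflow`, and the placeholder functions standing in for real programs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    requires: Tuple[str, ...]  # keys that must already be present
    produces: str              # key under which the result is stored
    run: Callable[[Dict[str, object]], object]


def run_workflow(steps: List[Step], data: Dict[str, object]) -> Dict[str, object]:
    """Run the steps in order, checking each step's inputs beforehand."""
    data = dict(data)  # do not modify the caller's dictionary
    for step in steps:
        missing = [key for key in step.requires if key not in data]
        if missing:
            raise KeyError(f"step {step.name!r} is missing inputs: {missing}")
        data[step.produces] = step.run(data)
    return data


def placeholder(label: str, source: str) -> Callable[[Dict[str, object]], object]:
    """Stand-in for a real program: wraps its input in a label."""
    def run(data: Dict[str, object]) -> object:
        return f"{label}({data[source]})"
    return run


steps = [
    Step("process", ("fid",), "spectrum", placeholder("spectrum", "fid")),
    Step("assign", ("spectrum",), "assignment", placeholder("assignment", "spectrum")),
    Step("restrain", ("spectrum", "assignment"), "restraints",
         placeholder("restraints", "assignment")),
    Step("calculate", ("restraints",), "structures",
         placeholder("structures", "restraints")),
]

result = run_workflow(steps, {"fid": "raw"})
print(result["structures"])
```

Declaring inputs and outputs explicitly is one possible way to let independently developed programs be chained without manual intervention; a failed input check surfaces a format or ordering problem before any compute time is spent.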

 

Cite WeNMR/WestLife

 
Usage of the WeNMR/WestLife portals should be acknowledged in any publication:
 
"The FP7 WeNMR (project# 261572) and H2020 West-Life (project# 675858) European e-Infrastructure projects are acknowledged for the use of their web portals, which make use of the EGI infrastructure and DIRAC4EGI service with the dedicated support of CESNET-MetaCloud, INFN-PADOVA, NCG-INGRID-PT, RAL-LCG2, TW-NCHC, SURFsara and NIKHEF, and the additional support of the national GRID Initiatives of Belgium, France, Italy, Germany, the Netherlands, Poland, Portugal, Spain, UK, South Africa, Malaysia, Taiwan and the US Open Science Grid."
 
And the following article describing the WeNMR portals should be cited:
Wassenaar et al. (2012). WeNMR: Structural Biology on the Grid. J. Grid Comp., 10:743-767.

EGI-approved

The WeNMR Virtual Research Community has been the first to be officially recognized by the EGI.

European Union

WeNMR is an e-Infrastructure project funded under the 7th framework of the EU. Contract no. 261572

West-Life, the follow-up project of WeNMR, is a Virtual Research Environment e-Infrastructure project funded under Horizon 2020. Contract no. 675858

West-Life