**2. NMR spectroscopy**

NMR is a technique that detects electrical currents induced by precessing nuclear magnetic moments within a uniform static magnetic field.[11] Nuclei with non-zero spin are MR active and are, in principle, detectable. Each type of spin-active nucleus has a unique precessional frequency that depends upon the strength of the static magnetic field, the magnetic properties of the isotope and the local electronic environment of the nucleus. The general precessional frequency depends upon the type of nucleus, and thus NMR can readily distinguish among, for example, 13C, 1H, 2H or 3H. The applications and significance of NMR have exploded because the exact precessional frequency (*i.e.* the chemical shift) within a group of the same nuclei is influenced by the local electronic environment of the nuclei. Thus, NMR can readily distinguish (for example) 1H nuclei that are chemically bound to different nuclei (*e.g.* carbon vs. nitrogen, *etc.*), are chemically bound to different oxidation states of the same nucleus (*e.g.* methyl vs. methylene carbon), and/or are in identical bonding environments (*e.g.* methyl 1H nuclei) but in different electronic environments induced by surrounding functional groups (*e.g.* aromatic vs. carbonyl groups). In addition, if a nucleus is influenced by another spin-active nucleus, either through a bond connection or through spatial proximity, a correlation exists that may be NMR detectable. In this way atomistic properties (such as the 3D spatial arrangement of nuclei and dynamics) can be determined.[12]
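The dependence of the precessional frequency on the isotope and on the static field can be illustrated with a short calculation. The sketch below assumes the standard Larmor relation (frequency = γ/2π × B0) and uses rounded literature values for γ/2π; the function names and the nominal "700 MHz" field are illustrative choices, not part of the original text.

```python
# Approximate gyromagnetic ratios divided by 2*pi, in MHz per tesla.
# Values are rounded literature constants, included here only for illustration.
GAMMA_BAR_MHZ_PER_T = {"1H": 42.577, "2H": 6.536, "3H": 45.415, "13C": 10.708}


def larmor_mhz(isotope: str, b0_tesla: float) -> float:
    """Precessional frequency in MHz of a bare isotope (chemical shift ignored)."""
    return GAMMA_BAR_MHZ_PER_T[isotope] * b0_tesla


def shift_offset_hz(isotope: str, b0_tesla: float, delta_ppm: float) -> float:
    """Frequency offset in Hz corresponding to a chemical shift given in ppm."""
    # MHz multiplied by parts-per-million gives Hz directly.
    return larmor_mhz(isotope, b0_tesla) * delta_ppm


if __name__ == "__main__":
    b0 = 700.0 / GAMMA_BAR_MHZ_PER_T["1H"]  # field of a nominal "700 MHz" magnet
    for isotope in GAMMA_BAR_MHZ_PER_T:
        print(f"{isotope}: {larmor_mhz(isotope, b0):.1f} MHz")
    print(f"1 ppm for 1H: {shift_offset_hz('1H', b0, 1.0):.0f} Hz")
```

The isotopes are separated by tens to hundreds of MHz, whereas chemical shift differences within one isotope amount to only hundreds or thousands of Hz, which is why both the nucleus type and its local environment can be resolved.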

In addition to the distinct nuclear chemical shift, data from MR can be further separated based upon relaxation and/or diffusion properties of a nucleus or molecule.[12,13] Thus, MR technologies can discriminate among large molecules like peptides, proteins and macromolecular assemblies, and small molecules like metabolites or synthetic organic molecules. The relaxation time (influenced by the rotational correlation time and molecular fluctuations) of a molecule plays an important role in distinguishing between small drug molecules and large proteins, or between a single lipid molecule that behaves as a small molecule and an assembly of lipid molecules that, as a collective, behaves as a large molecule.

One drawback is that under typical conditions MR techniques are sample intensive, requiring µM to mM concentrations that translate to µg to g quantities of material. For natural product discovery, this requirement can be an issue as extracts may only contain nanogram to microgram quantities of material.[14] A number of methods have been proposed to overcome the mass demand. The most significant for general applications is the invention of cryogenically helium-cooled detection systems that substantially reduce thermal noise, ultimately improving the signal-to-noise ratio by up to 10-fold and reducing data acquisition times by up to 100 times.[15,16] A next-generation improvement is cryogenically cooled probes that require smaller sample volumes. Currently the combination of the 700 MHz NMR spectrometer and new detection technologies requiring only 35 µl of sample affords the Biomolecular Magnetic Resonance Facility at NRC-Halifax one of the world's most sensitive instruments for mass-limited samples, reducing the typical quantities by up to 50 times.[17] The limits of detection for this instrument can be as low as 10 nanograms for small molecules (IWB, NM, TK and RTS unpublished).
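The link between a 10-fold gain in signal-to-noise ratio and a 100-fold reduction in acquisition time follows from signal averaging. The sketch below assumes that the signal adds coherently while uncorrelated noise grows with the square root of the number of scans; the function names are illustrative.

```python
import math


def time_reduction_factor(snr_gain: float) -> float:
    """Factor by which acquisition time drops for the same final S/N.

    Assumes S/N grows with the square root of the number of scans, so a
    probe that is `snr_gain` times more sensitive needs snr_gain**2 fewer scans.
    """
    return snr_gain ** 2


def scans_needed(target_snr: float, snr_per_scan: float) -> int:
    """Number of co-added scans required to reach a target S/N."""
    return math.ceil((target_snr / snr_per_scan) ** 2)


if __name__ == "__main__":
    print(time_reduction_factor(10.0))  # 10-fold S/N gain -> 100-fold time saving
    print(scans_needed(50.0, 5.0))      # hypothetical target and per-scan S/N
```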

Although the sample-intensive nature of NMR can be addressed to some extent, a second drawback for larger proteins and macromolecular assemblies (>40 kDa) is the loss of peak resolution due to spectral overlap, broad line-widths, reduced signal-to-noise ratios and increased spectral complexity.[12] There have been efforts to address this issue; however, these efforts are limited in scope and application.[18-20]

#### **2.1. Structure elucidation**


64 Using Old Solutions to New Problems - Natural Drug Discovery in the 21st Century


After a natural product or extract has been verified to be biologically active, an essential component within the discovery pipeline is to identify the compound(s) and determine the structure(s). Structural elucidation is essential if chemical modifications are to be made, if the product is for human consumption and/or if a patent application is to be filed, as it will distinguish the uniqueness of the compound as well as help identify relationships with pre-existing compounds. Structural characterization is somewhat different between small and large molecules; the distinction between the two regimes is defined by the Nuclear Overhauser Effect (nOe) cross-relaxation rate, which is positive or negative depending upon the spectrometer frequency and the overall molecular tumbling time.[12] Generally, "small molecules" are regarded as molecules that do not aggregate and have a molecular mass of <1000 atomic mass units.

#### *2.1.1. Small molecules*

The advent of nD experiments propelled NMR to be a leading tool for natural product characterization. Previously natural products were degraded into fragments, chemically derivatized and/or completely synthesised to confirm the structure. For structure elucidation using NMR it is still valuable to obtain information from the aforementioned techniques as well as from other techniques such as mass spectrometry (MS; for exact mass, functional groups and connectivities), infra-red spectroscopy (for functional groups), and separation techniques (for classes of compound *e.g.* phenolic, steroid, protein, *etc.*).

The initial NMR spectroscopic assessment, as outlined in Scheme 1, typically begins with 1-dimensional (1D) 1H spectra to determine purity, confirm the compound class, and examine the general appearance of the peaks. A spectrum with sharp, well-resolved peaks and the anticipated ratio of integrated intensities is indicative of a pure sample dissolved in an appropriate solvent. Broad peaks or peaks of fractional ratio could indicate an impure sample; however, they could also be an indication of chemical exchange or limited solubility. The preliminary information gained from other techniques is important when ascertaining whether the spectral appearance is appropriate. From 1D data the splitting patterns from J-(*i.e.* scalar)-couplings provide information on the pattern of covalent bonding as well as the torsional angle distributions between spin-active nuclei 3 bonds apart.[21]

1H-detected 2-dimensional (2D) experiments, in which magnetic coherence is propagated through J-couplings or magnetization is transferred through dipole-dipole cross-relaxation interactions, reduce the overlap complexity of 1D spectra and provide correlations to other 1H nuclei or heteronuclei, most commonly 13C or 15N. Common homonuclear 1H-1H 2D experiments based on J-couplings are TOtal Correlation SpectroscopY (TOCSY)[22] and COrrelation SpectroscopY (COSY)[23]. Both of these experiments provide information on individual spin systems and chemical bonding. COSY experiments are used to connect 1H nuclei that are within 3 bonds of each other, whereas TOCSY experiments can connect all spins belonging to a J-coupled network, *e.g.* the entire spin system connected through 3-bond correlations. Analysis of COSY data can provide J-coupling constants, which can be related *via* the Karplus curve[24] to torsional angle restraints.[21]
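The relation between a three-bond J-coupling and a torsion angle can be sketched with the general Karplus form, J(θ) = A cos²θ + B cos θ + C. The coefficients below are illustrative values of the magnitude used for peptide couplings and are an assumption of this sketch; a parameterisation appropriate to the bond type under study should be substituted. The helper `candidate_angles` is hypothetical and only shows that one measured coupling is generally compatible with several angles.

```python
import math


def karplus_j(theta_deg: float, a: float = 6.4, b: float = -1.4, c: float = 1.9) -> float:
    """Three-bond coupling constant (Hz) for a torsion angle in degrees.

    The default A, B, C are illustrative assumptions, not universal constants.
    """
    cos_t = math.cos(math.radians(theta_deg))
    return a * cos_t ** 2 + b * cos_t + c


def candidate_angles(j_obs_hz: float, tol_hz: float = 0.5) -> list:
    """Integer torsion angles (degrees) whose predicted J lies within tol of j_obs."""
    return [t for t in range(-180, 181) if abs(karplus_j(t) - j_obs_hz) <= tol_hz]


if __name__ == "__main__":
    for theta in (0, 60, 90, 180):
        print(theta, round(karplus_j(theta), 2))
    print(candidate_angles(4.0))
```

Because the curve is not single valued, J-couplings are normally used as torsional restraints alongside nOe data and not as the sole determinant of an angle.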

Homonuclear 2D 1H-1H nOe SpectroscopY (NOESY)[25] and Rotating frame Overhauser Effect SpectroscopY (ROESY)[26] experiments are based on dipolar cross-relaxation interactions, providing distance information between nuclei that are physically close (up to ≈5 Å) in space. It is noteworthy that for NOESY and ROESY spectra to show correlations, the nuclei do not have to be on the same molecule. This aspect of the nOe provides the basis for determining ligand/receptor interaction characteristics (see Section 2.2). The sign and intensity of NOESY cross-peaks depend upon the Larmor frequency set by the main static magnetic field (ω0 = γB0) and the rotational correlation time of the molecule (τc); for small molecules (<1 kDa) the nOe cross-relaxation rate is positive, whereas for larger molecules (>2 kDa) the nOe is negative. ROESY experiments are best suited for medium-sized molecules of ~1 kDa, for which the NOESY nOe becomes zero (*i.e.* ω0τc ≈ 1.12)[26]. Analysis of the NOESY and/or ROESY data is important for determining the configuration/conformation of the compound and for connecting individual spin systems determined from the TOCSY and/or COSY data. Another aspect of nD NMR techniques is the addition of 13C editing to the spectra. These heteronuclear experiments are 1H detected, which increases the sensitivity and indirectly provides 13C shifts, especially important for mass-limited samples. Standard heteronuclear experiments are 1H-13C-HSQC[27-29], 1H-13C-HMBC[30], and 1H-13C-H2BC[31]. Strategies for selecting the proper pulse sequences and the acquisition and processing parameters for natural product elucidation have been previously reviewed.[32] Implementing higher-dimensional experiments, for example HSQC-TOCSY and HSQC-NOESY, provides valuable information for complex natural products on the through-bond or through-space connections by exploiting the heteronuclear chemical shift for further separation.
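The change of sign of the nOe with molecular size, and the zero crossing near ω0τc ≈ 1.12, can be reproduced with a simple model. The sketch below assumes an isolated pair of like spins in a rigid, isotropically tumbling molecule, for which the cross-relaxation rate is proportional to 6J(2ω0) − J(0) with a Lorentzian spectral density; prefactors are omitted and all names are illustrative.

```python
import math


def spectral_density(omega: float, tau_c: float) -> float:
    """Lorentzian spectral density for isotropic tumbling (arbitrary units)."""
    return tau_c / (1.0 + (omega * tau_c) ** 2)


def noe_cross_relaxation(omega0: float, tau_c: float) -> float:
    """Relative homonuclear cross-relaxation rate, 6*J(2*omega0) - J(0)."""
    return 6.0 * spectral_density(2.0 * omega0, tau_c) - spectral_density(0.0, tau_c)


def zero_crossing_tau_c(field_mhz: float) -> float:
    """Correlation time (s) at which the NOESY nOe vanishes at a given 1H frequency."""
    omega0 = 2.0 * math.pi * field_mhz * 1e6
    # 6 / (1 + 4 x^2) = 1  ->  x = omega0 * tau_c = sqrt(5) / 2 (about 1.12)
    return math.sqrt(5.0) / 2.0 / omega0


if __name__ == "__main__":
    omega0 = 2.0 * math.pi * 700e6  # nominal 700 MHz spectrometer
    for tau_c in (1e-11, zero_crossing_tau_c(700.0), 5e-9):
        rate = noe_cross_relaxation(omega0, tau_c)
        print(f"tau_c = {tau_c:.2e} s, omega0*tau_c = {omega0 * tau_c:.2f}, sign = {rate:+.1e}")
```

Within this model, a higher spectrometer frequency moves the zero crossing to shorter correlation times, which is why the size range best served by ROESY depends on the field strength.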


When assessing the structure of a chemically modified molecule or a molecule for which minor structural changes are suspected, acquiring a complete structural suite of experiments may not be necessary. For such circumstances, a series of edited 1D and 2D experiments have been developed that can isolate the chemical modification of interest and express only correlations to the modification. Isolating a particular peak of interest reduces the time required for data acquisition, simplifies analysis and can help to quickly confirm modifications; valuable tools for isolating information from complex molecules are reviewed in Ref. 33.

A standard approach for small molecule structure elucidation involves identification of the individual fragments or spin systems followed by their assembly.[34] This approach, outlined in Scheme 1, uses 1D data to assess purity, classify the compound type and compare with NMR chemical shift databases. Analysis of the 2D homonuclear data (COSY and TOCSY) identifies the individual short spin systems in the compound. Heteronuclear HSQC data provide 13C chemical shift information and direct H-C links. HMBC data connect distant H-C spin systems, which helps to link molecular fragments. Data from the NOESY and ROESY spectra also aid in linking spin systems and in determining relative configuration and conformation, for example relative stereochemistry, ring junctions, and double bond regiochemistry. The final step is confirming that the proposed shift assignments and structural characteristics agree with coupling constants and splitting patterns among spectra, along with other data collected. There are numerous books detailing the specifics of analyzing NMR data of small molecules; see for example Refs. 35-38.

**Scheme 1.** The flowchart can be utilized as a general scheme for small natural product structural elucidation. Typically, a series of 1H and 13C NMR experiments are required in order to fully confirm the structure.

#### *2.1.2. Proteins & peptides*


MR allows for structural characterization of moderately sized proteins or peptides.[12,39] Since the line width of the NMR signal depends upon the rotational correlation time τc, the resultant signal intensity reduction and decreased spectral resolution typically preclude detailed analysis of large proteins. To date the largest protein to be structurally characterized by NMR is the 82 kDa (723 amino acid) malate synthase G.[40] In contrast, X-ray techniques in principle have no size limit; however, not all proteins are amenable to crystallization, especially membrane-associated proteins, and crystallization can alter the protein structure,[41] making NMR attractive for elucidating structures in "native" solution environments. "Non-native" conditions can also be applicable for studying protein folding/unfolding or temperature stability. Initial investigation into biomolecular structure elucidation typically requires information from other techniques such as MS, circular dichroism and micro-array for initial determination of the amino acid sequence. A general scheme for the elucidation of a protein 3D structure (Scheme 2) involves initial production of the protein, either by synthesis and refolding or by recombinant expression, followed by data acquisition and analysis. From the analysis, through-bond backbone and side-chain connections, through-space nOe connections and additional constraints are utilized as restraints within simulated annealing protocols that calculate superimposable ensembles of lowest-energy structures.[42]

NMR is a sample-intensive technique requiring milligrams of the biomolecule. For small peptides, synthesis (mg quantities) is typical and offers the possibility of selective isotopic labelling with the preferred spin-active nuclei 13C (over the 98% naturally abundant, spin-inactive 12C) and/or 15N (over the 99% naturally abundant, quadrupolar and difficult-to-observe 14N). However, for large proteins synthesis is too onerous and costly. Therefore development of a recombinant expression protocol capable of producing bacterial, mammalian or other proteins with the correct folding and linkages (in the case of lipoproteins or polysaccharide proteins) is required. In addition, expression systems allow for point mutations. For proteins of >50 amino acids it is advantageous to label the protein with the isotopes 13C and 15N. Alternatively, structure elucidation of small monomeric peptides of <50 amino acids does not necessarily require labelling, and the experiments and strategies outlined in Section 2.1.1 can be utilized. Isotopically labelled proteins can be obtained by growing *E. coli* carrying the particular expression gene on minimal media supplemented with 13C6-glucose and/or 15N ammonium chloride. Although *E. coli* is widely used for its low cost and high productivity, yields depend upon the plasmid, host and tags.[43] In addition, with prokaryotic systems protein re-folding can be a potential bottleneck, along with failure to express toxic proteins, degradation and the absence of post-translational modification. Other non-*E. coli* prokaryotic and eukaryotic cell-lines that have been used for labelling, such as insect cells[44], are reviewed in Ref. 45. Eukaryotic *Pichia pastoris* has been widely and successfully used for labelling using minimal media,[46] and cell-free expression allows for high yields with relatively small reaction volumes.[47] With cell-free expression, any combination of labelled and unlabelled amino acids can be incorporated into the protein without isotopic scrambling.[47] An advantage of the cell-free system is that reagents that stabilize expression can be added, for example protease inhibitors, detergents or membrane mimetics for insoluble or membrane-associated proteins.[48] However, each amino acid must be added to the medium, which can become costly for uniform 15N and/or 13C labelling. Overall, the particular expression system used depends upon such factors as post-translational modifications, the labelling scheme, membrane association, total cost and expression efficiency.

Over the past decade specialized isotope labelling strategies have been developed for biomolecular NMR, including full or partial deuteration, specific amino acid labelling and regiospecific labelling.[49] Labelling proteins with deuterium is practical for simplifying complex NMR spectra or for studying protein-substrate complexes. Furthermore, as the molecular weight of the biomolecule increases, the spin-spin relaxation time (T2) decreases considerably, inhibiting coherence transfer along amino acid side-chains. Substitution of 2H for 1H nuclei increases T2, enhancing the coherence transfer. Perdeuteration of the backbone prevents connection to the side-chain 1H nuclei, whereas a random, uniform deuteration level between 50% and 90% is most desirable.[50,51] Nevertheless, perdeuteration is beneficial for studying protein-substrate or protein-protein complexes in which one portion of the complex is "invisible", reducing the overlap and spectral complexity and allowing information on conformational changes to be more readily identified. In the pursuit of NMR information on larger proteins, labelling strategies have been developed for selective methyl labelling of alanine, leucine, valine and isoleucine (Hγ1) residues with perdeuteration of the backbone. This labelling scheme allowed for the analysis of relaxation dynamics of a 1 MDa protein complex.[52,53] Site-specific information on protein conformational changes upon substrate binding can be obtained from selective amino acid labelling of the backbone. Within the protocol, the medium is supplemented with isotopically labelled amino acids; isotopic scrambling may result in instances where the supplemented amino acids are precursors to other amino acids. Scrambling of the isotope labels can be overcome by using *E. coli* strains with lesions in their biosynthetic pathways,[54] and this has recently been demonstrated with a prototrophic strain.[55] Contrary to selective amino acid labelling, it has been proposed that unlabelling specific amino acids against a uniformly 13C/15N labelled background is beneficial.[56] Selective unlabelling of the protein still allows for sequential assignment of regions of the protein. Segmental labelling of individual domains or portions of large proteins has been established, in which labelled segments of the protein are ligated together.[57,58] Segmental labelling of a large protein is useful for studying domain-domain interactions[59], conformational changes and substrate binding.

Preparation of protein samples is straightforward for soluble monomeric proteins at concentrations >1 mM. A wide range of deuterated buffers are available for controlling the pH of the sample. A deuterated or inorganic buffer is desirable to minimize interference within 1H spectra. The pH requires consideration, as at pH > 8.0 labile amide 1H nuclei can rapidly exchange with water and become invisible. This can be utilized to reduce spectral complexity; however, it may also affect residues of interest. Approximately one-third of human genes code for membrane-associated proteins, and utilizing NMR to study the structure/function relationship of these proteins requires the protein to be folded in a membrane environment.[60] A range of membrane mimetics are available for solution and solid state NMR.[61]
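To see why a concentration above 1 mM translates into milligram quantities of protein, the required mass can be estimated from the concentration, sample volume and molecular mass. The 20 kDa protein and 500 µl volume below are hypothetical values chosen only for illustration; the 35 µl case corresponds to the small-volume probe mentioned earlier.

```python
def sample_mass_mg(conc_mm: float, volume_ul: float, mass_kda: float) -> float:
    """Mass of solute in mg for a concentration (mM), volume (µl) and mass (kDa).

    mM x µl x kDa gives micrograms directly; divide by 1000 for milligrams.
    """
    return conc_mm * volume_ul * mass_kda / 1000.0


if __name__ == "__main__":
    # Hypothetical 20 kDa protein at 1 mM in a conventional 500 µl sample.
    print(sample_mass_mg(1.0, 500.0, 20.0), "mg")
    # The same protein and concentration in a 35 µl small-volume probe.
    print(sample_mass_mg(1.0, 35.0, 20.0), "mg")
```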


**Scheme 2.** The flowchart can be utilized as a general scheme for protein structural elucidation. Successful structural elucidation for proteins relies on elaborate but well established 1H, 15N and 13C NMR experiments.

With the advent of 2D NMR experiments, a strategy for the 3D structure determination of small proteins utilizing homonuclear NMR spectra was established.[39] Extension of the 2D NMR experiments to nD experiments, along with isotopic labelling, allowed for the 3D structure determination of much larger proteins including membrane proteins.[12,62] 3D and 4D NMR techniques allow NMR data to be filtered by the 13C or 15N nuclei, thus reducing spectral overlap, which is especially important for large proteins. In the past decade the collection of data on larger proteins has been facilitated by the development of Transverse Relaxation Optimized SpectroscopY (TROSY) experiments.[63,64] NMR structural and dynamic analysis of a supramolecular system of 1 MDa has been achieved in combination with selective methyl labelling.[65,66]

Typical NMR structure determination involves manual assignment of the backbone and side-chain chemical shifts. Extensive assignment of the 1H-1H nOe from NOESY experiments yields distance restraints, as the volumes of the NOESY peaks are proportional to the average of 1/r<sup>6</sup>, where r is the distance between the 1H nuclei. More recently, routines for automatic resonance and nOe assignments have been developed.[67,68] However, these methods require labelled proteins and high-quality data to be effective. Regardless, protein structures are calculated using a molecular dynamics computer simulation program with predefined *a priori* bond connectivities, lengths and angles, and NMR-derived restraints.
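The 1/r<sup>6</sup> dependence allows a cross-peak volume to be converted into an approximate distance by calibration against a pair of nuclei at a known separation. The sketch below assumes the isolated spin-pair approximation (no spin diffusion, uniform mobility); the 1.8 Å reference distance and the volumes are hypothetical calibration values, not data from the text.

```python
def noe_distance(volume: float, ref_volume: float, ref_distance: float = 1.8) -> float:
    """Estimate an interproton distance (Å) from a NOESY cross-peak volume.

    Uses V proportional to r**-6, so r = r_ref * (V_ref / V) ** (1/6).
    `ref_distance` is an assumed calibration distance for a known proton pair.
    """
    if volume <= 0 or ref_volume <= 0:
        raise ValueError("peak volumes must be positive")
    return ref_distance * (ref_volume / volume) ** (1.0 / 6.0)


if __name__ == "__main__":
    # A peak 64 times weaker than the reference corresponds to twice the distance.
    print(round(noe_distance(volume=1.0, ref_volume=64.0), 2))
```

Because of the sixth-root relationship, even large errors in peak volume produce modest errors in distance, which is one reason nOe restraints are usually applied as distance ranges.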

NMR restraints are most commonly derived from nOe experiments but may also include dihedral angles, hydrogen bonding information and residual dipolar couplings.[69] Dihedral angles can be calculated from measured/fitted[70] J-couplings or predicted from backbone chemical shifts of the 1Hα, 13Cα, 13Cβ, 13CO and 15N resonances.[42,71] 1H exchange rates are reduced in protein domains that are structured, and can also be used to identify binding domains. Residual dipolar couplings are valuable for identifying angular constraints for large domains and structural changes upon substrate binding.[72,73]

An ensemble of lowest energy structures that satisfy the NMR-derived restraints is calculated. Quality of the calculated structures is based on the consistency of the experimental data compared to the inputted restraints. Agreement of the structures among each other is evaluated with the RMSD to the lowest energy structure. In addition, NMR quality assessment scores, recall, precision and F-measures (RPF scores) have been developed to directly measure the quality of structures compared to the NOESY peak list.[74] Structural calculations are an iterative process, as not all restraints will be satisfied during the first simulated annealing calculation. It is typical for misassignments to occur due to spectral overlap or poor volume calculations. In principle, the number of correctly assigned and integrated restraints should outweigh the incorrectly assigned restraints. Thus, during the calculations, a set of restraints can be identified as being violated regularly. From this identification, the NMR data are re-examined and the restraints corrected. Calculations are re-run and violations checked iteratively until well defined structures with minimal restraint violations are obtained.
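The two checks described above, RMSD to a reference model and identification of regularly violated restraints, can be sketched as follows. This is a simplified illustration and not the procedure of any particular structure calculation package: the models are assumed to be superimposed already, only upper-bound distance restraints are considered, and all coordinates and bounds are invented.

```python
import numpy as np


def rmsd_to_reference(ensemble, ref_index=0):
    """RMSD of every model to one reference model.

    `ensemble` has shape (models, atoms, 3); the models are assumed to be
    superimposed already, so no fitting is performed here.
    """
    diff = ensemble - ensemble[ref_index]
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))


def violation_frequency(ensemble, restraints):
    """Fraction of models violating each upper-bound distance restraint."""
    freqs = []
    for i, j, upper in restraints:
        dist = np.linalg.norm(ensemble[:, i] - ensemble[:, j], axis=1)
        freqs.append(float((dist > upper).mean()))
    return freqs


# Toy ensemble: 3 models of 3 atoms (made-up coordinates, in angstrom).
ensemble = np.array([
    [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]],
    [[0.0, 0.0, 0.0], [3.2, 0.0, 0.0], [0.0, 4.1, 0.0]],
    [[0.0, 0.0, 0.0], [2.9, 0.0, 0.0], [0.0, 6.5, 0.0]],
])
# (atom i, atom j, upper bound in angstrom); all values hypothetical.
restraints = [(0, 1, 3.5), (0, 2, 5.0), (1, 2, 4.0)]

rmsd = rmsd_to_reference(ensemble)
freqs = violation_frequency(ensemble, restraints)
# Restraints violated in more than half of the models are candidates for re-examination.
suspect = [r for r, f in zip(restraints, freqs) if f > 0.5]
print("RMSD to model 0:", np.round(rmsd, 3))
print("restraints violated in most models:", suspect)
```

A restraint that is violated in nearly every model is more likely to reflect a misassignment or a poorly integrated peak than a genuine structural feature, which is why it would be flagged for re-inspection of the spectra.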

Tertiary/quaternary structural aspects can be confirmed with NMR through diffusion experiments. NMR is one of the most accurate and precise methods for determining diffusion constants.[13,75] Diffusion constants are related to the hydrodynamic radius through the Stokes-Einstein equation and thus can be used indirectly to determine the mass of the diffusing species.[13,76] Of particular importance for proteins is determining the aggregation number.[77]
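As a rough illustration of how a measured diffusion constant maps to size, the sketch below applies the Stokes-Einstein relation, D = k_B·T / (6π·η·r_H), for a spherical particle. The default viscosity is that of water near 25 °C, and the mass-ratio helper additionally assumes compact spheres of equal density (so that mass scales with the cube of the radius); both are simplifying assumptions, and the function names and input values are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def hydrodynamic_radius(diffusion_coeff, temperature=298.15, viscosity=8.9e-4):
    """Hydrodynamic radius in metres from D (m^2/s), T (K) and viscosity (Pa s).

    Stokes-Einstein relation for a sphere: r_H = k_B * T / (6 * pi * eta * D).
    """
    return K_B * temperature / (6.0 * math.pi * viscosity * diffusion_coeff)


def relative_mass(d_monomer, d_observed):
    """Approximate mass ratio, assuming compact spheres where mass scales as r_H**3."""
    return (d_monomer / d_observed) ** 3


# Made-up example: D = 1.0e-10 m^2/s in water at 25 degrees C.
radius = hydrodynamic_radius(1.0e-10)
print(f"r_H = {radius * 1e9:.2f} nm")

# Made-up example: the observed species diffuses at half the monomer rate.
print(f"apparent mass ratio = {relative_mass(1.0e-10, 0.5e-10):.1f}")
```

The cubic dependence means that the apparent aggregation number is very sensitive to errors in D and to deviations from a spherical shape, so such estimates are best treated as approximate.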

#### **2.2. Pharmacophore identification & binding characterization**


72 Using Old Solutions to New Problems - Natural Drug Discovery in the 21st Century


Natural products have a diverse range of mechanisms for eliciting a biological response. Some natural products act as free-radical scavengers, never directly interacting with the organism, whereas other compounds bind to molecular targets, triggering a signaling cascade and altering the physiological state. Determining the mode of action of a small natural product molecule and, where necessary, the biological target requires extensive micro-biological investigations. NMR can play a role within these investigations, in particular by identifying the pharmacophore of the natural product. The pharmacophore is the constituent of the molecule that binds to a biological receptor to modify its biological response.[78] Identifying the pharmacophore is an important aspect of drug discovery and of understanding the mechanism of action, as it assists with "intelligent" design of drugs through modifications that change binding characteristics (*e.g*. modifying the pharmacophore region) or solubility/permeability properties (*e.g*. modifying sites distant from the pharmacophore).[79,80]

The difference in the NMR nOe response between a small molecule rapidly tumbling in solution and a small molecule that is bound to a slowly tumbling large protein (see Section 2.1.1) is exploited to isolate and identify the pharmacophore (see chapter 14 of Ref. 81 and Refs. 82,83). To clearly define the pharmacophore, complete structural analysis of the molecule is required (see Section 2.1.1); to identify the active site within the receptor sequence, complete structural analysis of the protein (preferably including 15N and 13C chemical shifts and connectivities) is required (see Section 2.1.2). A number of these techniques require mg quantities of purified ligand, receptor or both. Purifying compounds can be a serious drawback, especially if the receptor is a membrane-bound protein that is difficult to express and purify. Nevertheless, if the receptor is highly over-expressed within a cell (*e.g*. a cancer cell that over-expresses a particular protein), the possibility exists for the experiment to be performed *in vivo*.[84] Essentially six fundamental methods are available for pharmacophore/binding characterization:[81,82] chemical-shift perturbations[85], saturation transfer difference (STD)[86], water-LOGSY (wLogsy)[87,88], transfer-NOESY (tr-NOESY)[89], selective relaxation[90] and diffusion editing[90]. Selective relaxation, diffusion editing and tr-NOESY experiments in principle can be used for nM to mM binding constants (KD), whereas chemical shift perturbations, STD and wLogsy are valuable for pM to mM KD ranges with the concentration of receptor in the nM range. It is well beyond the scope of this chapter to describe these experiments in detail, especially since these experiments can be combined to provide further characterization, such as combining diffusion editing with STD to simultaneously determine the pharmacophore and binding constant.[91] Numerous reviews provide explicit details (see Refs. 82,92-94).

With these tools both the ligand and receptor can be characterized. Typically, chemical shift perturbation or mapping methods help to characterize the active site of the receptor. A series of 1H-15N or 1H-13C HSQC spectra of the labelled receptor are collected as the ligand is titrated. Changes in chemical shifts of the receptor are indicative of 1H nuclei that are perturbed during binding, although care must be taken not to over-interpret the data, as such changes could also be indicative of structural alterations distant from the binding site. In cases where the receptor is large (*i.e*. > 30 kDa), extensive resonance overlap may preclude unambiguous interpretation of the HSQC data. Expression techniques to isolate particular amino acids or regions of the receptor are valuable for these experiments (see Section 2.1.2). One disadvantage of this technique is the necessity of a complete resonance assignment of the target or at least the active site.
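A chemical shift perturbation analysis of 1H-15N HSQC data can be sketched as below. The combined-shift formula with a 15N scaling factor and the "mean plus one standard deviation" cutoff are common conventions rather than anything prescribed in this chapter; the scaling value of 0.14, the residue labels and all chemical shifts are assumptions made for illustration.

```python
import math
import statistics


def combined_csp(delta_h, delta_n, n_scale=0.14):
    """Combined 1H/15N shift change in ppm; n_scale down-weights the wider 15N range."""
    return math.sqrt(delta_h ** 2 + (n_scale * delta_n) ** 2)


def perturbed_residues(free, bound, n_sd=1.0):
    """Residues whose combined shift change exceeds mean + n_sd * standard deviation."""
    csp = {
        res: combined_csp(bound[res][0] - free[res][0], bound[res][1] - free[res][1])
        for res in free
    }
    values = list(csp.values())
    cutoff = statistics.mean(values) + n_sd * statistics.pstdev(values)
    return [res for res, value in csp.items() if value > cutoff]


# Made-up (1H ppm, 15N ppm) peak positions without and with ligand.
free = {"G10": (8.30, 108.0), "A11": (8.10, 122.5), "L12": (7.95, 120.0), "K13": (8.45, 119.0)}
bound = {"G10": (8.31, 108.1), "A11": (8.10, 122.5), "L12": (8.15, 121.5), "K13": (8.46, 119.0)}

print("perturbed residues:", perturbed_residues(free, bound))
```

As the text cautions, a flagged residue only indicates a change in its environment; it may lie at the binding site or reflect a structural change elsewhere.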

Diffusion editing is a technique that can be used to determine the KD by examining the change in diffusion properties of the ligand (typically < 2 kDa) upon titration with the receptor (typically > 100 kDa). Although of limited use for pharmacophore and binding pocket identification, it is nevertheless a valuable tool to identify binding events from a mixture of possible small ligands or to combine with other techniques.
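One way to picture how a KD could be extracted from such a titration is the sketch below. It assumes 1:1 binding and fast exchange on the diffusion timescale, so that the observed ligand diffusion constant is the population-weighted average of the free and bound values; the simple grid search, the function names and all numerical values are illustrative assumptions rather than a published protocol.

```python
import numpy as np


def bound_fraction(ligand_total, receptor_total, kd):
    """Fraction of ligand bound for 1:1 binding (all concentrations in mol/L)."""
    s = ligand_total + receptor_total + kd
    complex_conc = (s - np.sqrt(s ** 2 - 4.0 * ligand_total * receptor_total)) / 2.0
    return complex_conc / ligand_total


def observed_diffusion(ligand_total, receptor_total, kd, d_free, d_bound):
    """Population-weighted ligand diffusion constant (fast-exchange assumption)."""
    fb = bound_fraction(ligand_total, receptor_total, kd)
    return (1.0 - fb) * d_free + fb * d_bound


def fit_kd(ligand_total, receptor_totals, d_obs, d_free, d_bound, kd_grid):
    """Pick the KD on a grid that minimises the squared error to the observed data."""
    errors = [
        np.sum((observed_diffusion(ligand_total, receptor_totals, kd, d_free, d_bound) - d_obs) ** 2)
        for kd in kd_grid
    ]
    return float(kd_grid[int(np.argmin(errors))])


# Simulated, noise-free titration with made-up parameters (true KD = 50 uM).
ligand = 100e-6
receptor = np.array([0.0, 25.0, 50.0, 100.0, 200.0, 400.0]) * 1e-6
d_free, d_bound = 5.0e-10, 0.6e-10  # m^2/s, hypothetical
d_obs = observed_diffusion(ligand, receptor, 50e-6, d_free, d_bound)

kd_fit = fit_kd(ligand, receptor, d_obs, d_free, d_bound, np.logspace(-7, -2, 501))
print(f"fitted KD = {kd_fit * 1e6:.1f} uM")
```

With real data, noise and any deviation from fast exchange or 1:1 stoichiometry would need to be considered before trusting the fitted value.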

The most often utilized techniques are the STD, wLogsy, tr-NOESY and selective relaxation.[95] These techniques are used when the target is too large for chemical shift perturbations, is not available with the desired isotopic labelling scheme, or aggregates/precipitates at high concentrations. For these techniques, non-specific binding events in the nM-µM range may be difficult to rule out unless compared with a known binder or used with a competitive binder.[96] Regardless, these techniques are invaluable for specifically observing resonances of low-affinity ligands that bind to a receptor. With selective relaxation experiments, differences in relaxation properties of the ligand between a free and bound state help identify interacting 1H nuclei that are in close proximity to the receptor. The relaxation properties of the free ligand (*i.e.* no receptor) are compared to the relaxation properties for the ligand at various receptor titers and mixing times. Changes within the relaxation values can distinguish 1H nuclei that are in direct contact with the receptor from 1H nuclei that show magnetization relay or 1H nuclei distant from the receptor. For wLogsy experiments, 1H resonances arising from 1H nuclei in close proximity to the receptor (*i.e.* that are part of the pharmacophore) are opposite in sign to 1H resonances arising from 1H nuclei that are distant from the receptor or are on a ligand that does not bind to the receptor (Fig. 1A). The wLogsy also has the advantage of identifying 1H nuclei that are part of salt bridges between the ligand and receptor. For STD experiments, small molecules that do not bind to the receptor show a zero response (Fig. 1B), whereas the 1H nuclei of the pharmacophore show a response. Because the wLogsy is a direct observation technique whereas the STD is generated from a difference of spectra, the wLogsy tends to have fewer artifacts and can be more sensitive. The tr-NOESY experiments can provide structural information about the ligand in the bound state as well as potentially information on the type of amino acids on the receptor involved in binding; typically no information regarding the sequence specificity of the amino acids is gleaned. The tr-NOESY is advantageous if the ligand changes conformation upon binding, as it can be utilized to determine the bound state structure of the ligand, which is a valuable asset for understanding the mode of action. The tr-NOESY is a 2D technique and as such requires mg quantities of ligand and substantially more time to acquire the data. For rapid screening of natural products the STD and wLogsy are the experiments of choice.
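Since the STD response is obtained from a difference of spectra, a simple way to compare ligand 1H signals is to express each difference as a fraction of the reference intensity and scale to the strongest responder. The sketch below follows that common practice; the peak labels and integrals are invented, and "off-resonance" and "on-resonance" refer to the reference and saturated spectra respectively.

```python
def std_epitope_map(off_resonance, on_resonance):
    """Relative STD response (%) per 1H signal, scaled to the strongest responder."""
    fractional = {
        peak: (off_resonance[peak] - on_resonance[peak]) / off_resonance[peak]
        for peak in off_resonance
    }
    strongest = max(fractional.values())
    if strongest <= 0:
        # No signal shows a positive difference: treat as a non-binder.
        return {peak: 0.0 for peak in fractional}
    return {peak: 100.0 * value / strongest for peak, value in fractional.items()}


# Made-up peak integrals for three ligand 1H signals.
off = {"H2": 100.0, "H5": 80.0, "H7": 120.0}  # reference (off-resonance) spectrum
on = {"H2": 90.0, "H5": 78.0, "H7": 96.0}     # receptor-saturated (on-resonance) spectrum

for peak, percent in std_epitope_map(off, on).items():
    print(f"{peak}: {percent:.1f} % relative STD")
```

Signals with the largest relative response would be interpreted as the 1H nuclei in closest contact with the receptor, while a compound whose signals all show a zero difference would be classified as a non-binder.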

Beyond the individual experiments, combinations of these various experiments are possible. For example, to investigate the folding/unfolding properties of a peptide, the combination of a 1H-15N HSQC and wLogsy can be used to monitor amide 1H exchange rates, which are valuable for identifying H-bonding and buried residues.[77,97] This experimental combination can also be utilized with/without ligand to identify amide 1H nuclei that are involved with ligand binding. Using the wLogsy to saturate the water signal avoids the challenges of the typical methods of monitoring exchange by addition of D2O, such as protein precipitation, conformational changes induced by concentrating/diluting, or complete loss of signal due to rapid deuterium exchange.[77,98]

**Figure 1.** STD and wLogsy experiments can be utilized for identifying the pharmacophore of a small natural product molecule. Panel **A** represents a wLogsy experiment (blue spectrum) of a compound that binds (positive peaks) to an enzyme and a compound that does not bind (negative peaks). The positive peaks are of 1H nuclei that are adjacent or in proximity to a salt bridge involving water. The negative peaks are from either 1H nuclei of the binding compound that are not within the binding pocket, or from the control non-binder compound. The red spectrum is the reference spectrum. Panel **B** represents the STD experiment (blue) of human serum albumin, a binder tryptophan and a non-binder sucrose. The positive peaks are from 1H nuclei on the tryptophan that are within the binding pocket. Peaks that are not observed compared to the reference spectrum (red) represent either the control compound, or 1H nuclei on the tryptophan that are distant to the binding pocket.

#### **2.3. Quantitative analysis & QA/QC**


Nuclear MR technologies have traditionally been associated with molecular characterization. The quantitative nature has been acknowledged from integration of signals to distinguish between, for example, methyl and methylene 1H nuclei; however, it has rarely been exploited for absolute quantitative analysis (qNMR) or quality assurance/quality control (QA/QC).[10] MR techniques are capable of accurately and precisely determining the concentration of molecules within a purified sample or complex mixture without the need for elaborate calibrations.[99] In addition, samples can be in solution or semi-solid states.[100] Relatively simple protocols have been developed that use a single certified external standard to calibrate the instrument. From the calibrated system, other samples can be rapidly quantitated.[10,99]

MR technologies have the unique distinction of having a uniform molar response for all nuclei of the same type, *i.e*., the NMR signals are proportional to the molar concentration of the nuclei, allowing for a direct comparison of the concentration of all compounds within a mixture. Thus, for example, for all organic molecules regardless of the concentration, the intensity of each signal within the NMR spectrum is a direct measure of the number of 1H nuclei that contribute to that signal. Furthermore, spectra can be recorded in such a manner as to allow for accurate comparison between different samples within different sample tubes and different solvents; the implications are far reaching for qNMR and QA/QC. For example, concentrations of natural products or impurities can be determined for samples within sealed tubes, reducing the handling requirements of toxic or precious samples, and crude or refined products can be rapidly profiled to ensure purity, integrity and consistency, with applications to fractionation or end-product QA/QC.[99] Fully automated protocols have been developed that have been coupled with metabolomics investigations, providing absolute scaling for temporal data.
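The uniform molar response means that, once an external standard of known concentration has been measured, an analyte concentration follows from a ratio of integrals normalized by the number of contributing nuclei. The sketch below assumes both spectra were recorded under identical, fully relaxed acquisition conditions (same receiver gain, pulse angle and number of scans); the function name and all numbers are hypothetical.

```python
def qnmr_concentration(integral_analyte, n_nuclei_analyte,
                       integral_standard, n_nuclei_standard, conc_standard):
    """Analyte concentration from an external standard of known concentration.

    Each integral is divided by the number of equivalent nuclei giving rise to
    the signal, so the ratio compares molar response per nucleus.
    """
    per_nucleus_analyte = integral_analyte / n_nuclei_analyte
    per_nucleus_standard = integral_standard / n_nuclei_standard
    return conc_standard * per_nucleus_analyte / per_nucleus_standard


# Made-up example: a 10 mM standard whose 9-proton signal integrates to 90,
# and an analyte methyl signal (3 protons) that integrates to 15.
conc = qnmr_concentration(15.0, 3, 90.0, 9, conc_standard=10.0)
print(f"analyte concentration = {conc:.1f} mM")
```

In practice, differences in sample volume, probe tuning or pulse calibration between tubes would also have to be accounted for; they are ignored here for brevity.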

#### **2.4. Metabolomics analysis**

Analysis of metabolites and metabolic flux can help ascertain the effects of a particular natural product or extract on an organism. Metabolic analysis can be utilized while identifying biological activity *in vitro*, or during *in vivo* investigations. Perhaps one of the best practical definitions of metabolomics was offered by Oliver, describing it as an approach for simultaneously measuring the complete set of metabolites (low molecular weight intermediates) that are context dependent and which vary according to the physiological, developmental or pathological state of an organism.[101] From the perspective of natural products, such a definition fits in the framework of extracts from plants, fungi and secretions from microorganisms such as bacteria. When metabolomics is viewed from context dependence, only those metabolites that vary according to environmental, biochemical, and/or physiological fluctuations are important. In this regard, there are a vast number of metabolites present in botanical extracts and secretions from organisms that are useful for inducing biological responses in organisms including humans. For instance, ginseng has been argued to induce biochemical changes in humans that lead to anti-tumor, antioxidant, anti-fatigue and anti-stress activities, while *Streptomyces coelicolor* secretes therapeutic natural products during its quiescent growth phase.[102,103] The discussion within this section follows the aforementioned definition and is restricted to metabolomics applied to botanical natural products, since it is one of the fastest growing subsections, established as "plant metabolomics."[104-106] NMR is a suitable method for such analyses since it allows simultaneous detection of a diverse group of both primary metabolites (sugars, organic acids, amino acids *etc*.) and secondary ones (flavonoids, alkaloids, triterpenes *etc*.). Nevertheless, the applications are extensible to "animal kingdom" metabolomics, since the underlying sample composition is similar and characterized by large heterogeneity exhibiting a vast dynamic range in concentration.

The numerous advantages of NMR have been a major driver for developing NMR-based metabolomics technologies. NMR is ideal for resolving the complexity of metabolomics samples, given that methods exist for compounds with nuclei such as 1H, 13C, 15N and 31P to provide spectral fingerprints with compound specificity and quantitative accuracy even within complex matrices. For example, from the un-annotated 1D 1H NMR spectra of seaweed extracts, visible differences in chemical composition are observed (Fig. 2). NMR signals have a uniform molar response (see Section 2.3). This property has been particularly important in propelling NMR metabolomics as a technology. NMR is a non-destructive technique, which makes it ideally suited because many of these samples are difficult to obtain and may be precious. The stability of NMR instrumentation allows for repeated measures (often years apart) of a sample with accurate reproducibility. This advantage also lends the technology to inter-laboratory studies that are important for establishing the robustness of a given measurement and technique.[52] Advances in technology now allow for high throughput analysis with automated, temperature controlled sample changers such as Bruker's SampleJet®.


**Figure 2.** NMR spectra of aqueous extracts from various strains of brown seaweeds showing distinct differences within the spectral features.

The most commonly cited disadvantage of NMR for metabolomics is the lack of sensitivity.[106] This is a major hindrance given that, in typical sample matrices, the concentration of constituent metabolites often exhibits a large dynamic range. Low abundance metabolites will invariably be overlapped by highly abundant ones. Furthermore, in some applications such low level metabolites may be of high value. For instance, a phytochemical preparation may exhibit activity in a biological readout, but such activity may be induced by low level metabolites which are undetectable *via* NMR. Approaches for simplifying sample matrices have been developed in order to separate, for instance, lipophilic from hydrophilic metabolites.[107] In addition, recent advances in NMR hardware have significantly improved sensitivity, especially with the advent of cryogenic probes and microprobes.

NMR-based metabolomics applications have been in the literature for over two decades,[108,109] but the technology only realized widespread acceptance and application in the later part of the last decade.[105,110-112] This shift is attributed to the utility of pattern recognition methods for analysis of multiple spectra, allowing the visualization of patterns corresponding to differences among samples and identification of chemical shifts responsible for eliciting such differences.[113] Specifically, principal component analysis (PCA) has been a significant driver, as it allows patterns associated with the variability in the relative concentrations of metabolites to be assessed by the human eye, often in two dimensions. PCA assisted with the classification of different extracts of the seaweed *Ascophyllum nodosum* (Fig. 3).[114] In recent years, many other pattern recognition methods (classified either as supervised or unsupervised) have been developed and applied to data. Unsupervised methods do not include *a priori* knowledge of the class memberships of a given sample. Such methods include PCA, SIMCA, independent component analysis (ICA) and the so-called machine learning methods such as neural networks and self-organizing maps (SOM). On the other hand, supervised methods use information about the samples in order to build models that can later be used to predict the class to which an unknown sample belongs. Such methods include partial least squares discriminant analysis (PLS-DA).
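To make the PCA step concrete, the following sketch computes scores and loadings for a small matrix of binned spectra using a singular value decomposition. The total-intensity normalization, the choice of two components and the synthetic spectra are assumptions for illustration; real studies typically add steps such as alignment, binning and scaling that are omitted here.

```python
import numpy as np


def pca_scores(spectra, n_components=2):
    """PCA of binned spectra (rows = samples, columns = chemical shift bins)."""
    x = np.asarray(spectra, dtype=float)
    x = x / x.sum(axis=1, keepdims=True)  # normalise each spectrum to total intensity
    centred = x - x.mean(axis=0)          # mean-centre every bin
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components]          # bins responsible for each component
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Synthetic data: six "spectra" of 50 bins, two groups with different marker peaks.
rng = np.random.default_rng(0)
spectra = 0.1 + rng.normal(0.0, 0.005, size=(6, 50))
spectra[:3, 10] += 1.0  # hypothetical marker peak for extract type A
spectra[3:, 30] += 1.0  # hypothetical marker peak for extract type B

scores, loadings, explained = pca_scores(spectra)
print("PC1 scores:", np.round(scores[:, 0], 3))
print("variance explained by PC1:", round(float(explained[0]), 3))
```

In this toy case the two groups separate along the first component, and the largest loadings fall on the two marker bins, mirroring how the chemical shifts responsible for sample differences are identified in practice.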

**Figure 3.** A principal component scores plot of a product characterization study showing similar samples from one manufacturer tightly clustered in the center (solid red squares); the ellipse encloses these samples plus two others that are known to incorporate that manufacturer's samples (solid blue squares and circle). Competitors' samples are distributed throughout the plot with each symbol representing products from a single manufacturer. Unfilled symbols represent samples recorded under acidic conditions.

Ultimately, the impetus for natural product metabolomics analyses is the need for high throughput screening to determine biological activity and the requirements from the nutraceutical industry to purport therapeutic health benefits from unprocessed foods. Traditional approaches to the development of novel drugs are difficult, expensive and time consuming. It is estimated that over \$800 million and an average of 14.2 years are spent before a novel drug application is approved. Because natural products are a rich source of lead compounds in the drug discovery pipeline, the metabolomics approach provides an avenue for a systematic characterization of complex mixtures such as phytochemical extracts, linking observations made via biological assays without the need for isolation. This promises to complement high throughput screening of compounds in order to shorten the drug discovery process. Indirectly, NMR has been used in metabolomics to measure the fate of consumed natural products and their effects on human physiology. One study has shown that the consumption of dark chocolate, for instance, affects energy homeostasis in humans.[115] In another study, differences in metabolic profiles were observed in human urine following consumption of black compared to green tea, specifically increases in urinary hippuric acid and 1,3-dihydroxyphenyl-2-O-sulfate, which are end products of tea flavonoid degradation.[116] Several other studies exist in the literature and have been reviewed, for instance, within Ref. 117.

Perhaps one of the biggest gaps with metabolomics developments for natural products is the lack of certified reference materials for quantification of analytes. Such reference materials would enhance product characterization and validation of biological observations, especially with studies of bioactivity assessment. Those studies are often inconsistent due to inadequate chemical characterization of complex botanical mixtures, making comparison of results across studies difficult. Fortunately, it is possible to determine the concentration of 'active' compounds using external standards, *via* NMR, as long as both the external standard and the compound of interest are of the same nucleus type (see Section 2.3).[10]

#### **2.5. Semi-solid state & macromolecular assemblies**

NMR-based metabolomics applications have been in the literature for over two decades,[108,109] but the technology gained widespread acceptance and application only in the latter part of the last decade.[105,110-112] This shift is attributed to the utility of pattern recognition methods for the analysis of multiple spectra, which allow patterns corresponding to differences among samples to be visualized and the chemical shifts responsible for those differences to be identified.[113] Specifically, principal component analysis (PCA) has been a significant driver because it allows patterns associated with the variability in the relative concentrations of metabolites to be assessed by the human eye, often in two dimensions. PCA assisted with the classification of different extracts of the seaweed *Ascophyllum nodosum* (Fig. 3).[114] In recent years, many other pattern recognition methods (classified as either supervised or unsupervised) have been developed and applied to data. Unsupervised methods do not include *a priori* knowledge of the class membership of a given sample. Such methods include PCA, SIMCA, independent component analysis (ICA) and the so-called machine learning methods such as neural networks and self-organizing maps (SOM). Supervised methods, on the other hand, use information about the samples to build models that can later be used to predict the class to which an unknown sample belongs. Such methods include partial least squares discriminant analysis (PLS-DA).
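To make the idea of a PCA scores plot concrete, the sketch below computes the first two principal component scores for a small set of synthetic "binned spectra". It is an illustration only: the data are randomly generated rather than taken from any study cited here, the helper names are hypothetical, and the components are found by simple power iteration with deflation so that the example needs nothing beyond the Python standard library. In practice a numerical library would normally be used instead.

```python
import math
import random


def mean_center(rows):
    """Subtract each column mean (one row per spectrum, one column per bin)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def project(rows, loading):
    """Scores of every spectrum along one loading vector."""
    return [sum(x * w for x, w in zip(row, loading)) for row in rows]


def leading_loading(rows, iterations=200, seed=1):
    """Power iteration on X^T X; returns a unit-length loading vector."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in rows[0]]
    for _ in range(iterations):
        t = project(rows, v)  # X v
        v = [sum(ti * row[j] for ti, row in zip(t, rows)) for j in range(len(v))]  # X^T (X v)
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return v


def pca_scores(spectra, n_components=2):
    """Return one list of scores per spectrum, e.g. [[pc1, pc2], ...]."""
    x = mean_center(spectra)
    columns = []
    for _ in range(n_components):
        v = leading_loading(x)
        t = project(x, v)
        columns.append(t)
        # Deflate: remove the variance already explained by this component.
        x = [[a - ti * w for a, w in zip(row, v)] for row, ti in zip(x, t)]
    return [list(pair) for pair in zip(*columns)]


# Synthetic data: two groups of five spectra with 12 bins each; group B has
# extra intensity in two bins (a stand-in for two elevated metabolites).
rng = random.Random(0)


def synthetic_spectrum(extra):
    spectrum = [1.0 + rng.gauss(0.0, 0.05) for _ in range(12)]
    spectrum[3] += extra
    spectrum[7] += extra
    return spectrum


spectra = [synthetic_spectrum(0.0) for _ in range(5)]
spectra += [synthetic_spectrum(2.0) for _ in range(5)]

scores = pca_scores(spectra)
for label, (pc1, pc2) in zip("AAAAABBBBB", scores):
    print(f"group {label}: PC1 = {pc1:+.2f}, PC2 = {pc2:+.2f}")
```

No class labels enter the calculation, which is what makes the method unsupervised; the labels are used only when printing. Power iteration converges slowly when two components explain similar amounts of variance, so the second component here should be read as approximate.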

**Figure 3.** A principal component scores plot of a product characterization study showing similar samples from one manufacturer tightly clustered in the center (solid red squares); the ellipse encloses these samples plus two others that are known to incorporate that manufacturer's samples (solid blue squares and circle). Competitors' samples are distributed throughout the plot with each symbol representing products from a single manufacturer. Unfilled symbols represent samples recorded under acidic conditions.

In addition to studying soluble molecules or extracts in the solution state, it can be advantageous or even necessary to study semi-solids such as intact tissues, cells, raw materials or product formulations. In such samples the metabolites or components remain in their native environment; *i.e.* potentially time-consuming and disruptive or degrading extractions, or chemical modifications, are not required in order to obtain valuable information. Semi-solid materials require specialized NMR probes to overcome the severe spectral line broadening that results from restricted molecular mobility. To acquire high-resolution spectra of semi-solids, High Resolution Magic Angle Spinning (HR-MAS) was developed in the late 1990s as a hybrid between solid and solution state NMR.[118] As in solid state NMR, spinning the sample at the "magic angle" (54.7°) to the applied magnetic field reduces line broadening effects. Spinning speeds are typically between 3 and 5 kHz, and cells remain intact and viable.[119] HR-MAS requires <100 µL of the semi-solid. It does not require the high-powered pulses that solid state NMR requires, and many of the nD experiments used in structural biology can be applied, although stable isotope labelling is preferred for the less abundant 13C and 15N nuclei. HR-MAS can be applied in many natural product studies[118,120] and is commonly used to study metabolic changes in diseased and treated tissues,[121,122] metabolism,[123] combinatorial chemistry[124] and whole cells.[125] Intensities in HR-MAS NMR spectra depend on the environment of the analyte; as such, molecules that are in rigid environments with completely restricted mobility are not detected. Macromolecular assemblies, on the other hand, can be examined for profiling and quantitation of algae lipid content and polysaccharides or metabolites.[100]
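The value of 54.7° is not arbitrary. The anisotropic interactions that dominate the broadening (dipolar coupling and chemical shift anisotropy) scale with the factor (3cos²θ − 1)/2, where θ is the angle between the spinning axis and the static field, and this factor vanishes at θ = arccos(1/√3). The short check below is a sketch of that relationship only; the function name is illustrative and nothing in it models an actual spectrum.

```python
import math


def orientation_factor(theta_deg):
    """Second Legendre polynomial P2(cos theta) = (3 cos^2(theta) - 1) / 2."""
    c = math.cos(math.radians(theta_deg))
    return 0.5 * (3.0 * c * c - 1.0)


# The magic angle is the root of 3 cos^2(theta) - 1 = 0.
magic_angle_deg = math.degrees(math.acos(1.0 / math.sqrt(3.0)))

print(f"magic angle: {magic_angle_deg:.2f} degrees")
for theta in (0.0, 30.0, magic_angle_deg, 90.0):
    print(f"theta = {theta:6.2f} deg -> scaling factor {orientation_factor(theta):+.3f}")
```

Away from the magic angle the factor is non-zero, so the orientation-dependent contributions are only partially averaged by spinning.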

#### **3. MRI**

MRI is a nuclear MR technique applicable to natural product development, in particular during the *in vivo* testing and diagnostic stages. MRI is capable of producing 3D images that can be used to monitor changes in brain activity in response to application of a natural product *via* fMRI techniques,[126-128] to indirectly monitor the effects of a natural product on tumours,[129] or to directly monitor the bio-distribution and bio-accumulation of a natural product by tagging the compound with an MRI contrast agent.[130-133]

Direct monitoring of the natural product is one of the most common approaches in drug development. For direct monitoring, however, the molecule must be tagged with an MRI contrast agent that contains a paramagnetic centre, which causes 1H nuclei on water molecules in close proximity to "relax" much faster than 1H nuclei distant from the paramagnetic centre.[134] This rapid relaxation is exploited in MRI and appears as "dark spots" within the image, whereas unaffected water molecules remain as bright areas. In order to "tag" a natural product with an MRI contrast agent such as super-paramagnetic iron oxide (SPIO), knowledge of the complete structure and identification of the pharmacophore are valuable (see Section 2.2), as they allow one to pick a functional group distant from the active site that can be chemically coupled to the contrast agent. Many SPIO contrast agents are commercially available with a variety of functional groups amenable to chemical coupling under aqueous conditions.[134]
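Why faster relaxation shows up as a dark region can be illustrated with a simple signal model. The sketch below assumes a mono-exponential spin-echo decay, S = S0·exp(−TE·R2), and the commonly used linear relaxivity relation R2 = R2(tissue) + r2·[agent]. These are textbook simplifications introduced here for illustration, and all numerical values (tissue T2, relaxivity, local agent concentration, echo time) are hypothetical rather than taken from the studies cited.

```python
import math


def transverse_rate(r2_tissue, relaxivity, agent_conc):
    """Relaxation rate (1/s) of water 1H: tissue rate plus the agent's contribution."""
    return r2_tissue + relaxivity * agent_conc


def spin_echo_signal(s0, echo_time, r2):
    """Mono-exponential transverse decay of the signal at the echo time."""
    return s0 * math.exp(-echo_time * r2)


# Hypothetical values chosen only to show the trend.
r2_tissue = 1.0 / 0.080   # tissue T2 of 80 ms, expressed as a rate in 1/s
relaxivity = 100.0        # assumed r2 of the SPIO agent, 1/(s mM)
agent_conc = 0.1          # assumed local agent concentration, mM
echo_time = 0.050         # echo time, s

background = spin_echo_signal(1.0, echo_time, transverse_rate(r2_tissue, relaxivity, 0.0))
tagged = spin_echo_signal(1.0, echo_time, transverse_rate(r2_tissue, relaxivity, agent_conc))

print(f"relative signal without agent: {background:.3f}")
print(f"relative signal near agent:    {tagged:.3f}")
```

Under these assumptions the voxel containing the agent returns less signal at the echo time than its surroundings, which is the contrast described above as a "dark spot".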

Once the natural product is conjugated to the paramagnetic particle, it can be administered to an organism and imaged. The bio-distribution and bio-accumulation of the tagged molecule are monitored and compared against a control particle that does not carry the active natural product.[133,135-137] A difference in clearance time is taken as confirmation that the molecule is associated with the tissue being examined. As an example, the peptide SOR-C27, a 27-amino-acid fragment of the paralytic natural product peptide SOR-54 from the northern short-tailed shrew (*Blarina brevicauda*), was found to bind the calcium ion channel TRPV6, which is highly over-expressed by breast, prostate and ovarian cancers.[138,139] The SOR-C27 peptide was chemically bonded to a maleimide-functionalized SPIO particle through the sulfur centre of the Cys-14 residue. In MRI investigations on an ovarian cancer xenograft mouse model, the SPIO-peptide particle persisted at the tumour site 24 hours post injection, whereas the control SPIO particle cleared rapidly. The persistence of the SPIO-peptide particle at the tumour site is confirmation that the peptide is associated with the tumour.[139]
