**2. Small angle X-Ray scattering**

When a collimated beam of X-ray photons (treated as a set of parallel plane waves) strikes a sample, a fraction of the incident beam interacts with the electron clouds of the molecules. One possible process is the absorption of a photon by an atom, which excites its electrons to higher energy levels. When the excited electrons decay to the ground state, another X-ray photon is re-emitted as a spherical wave. This process can therefore be viewed as the scattering of the incident photon by the electron cloud. Depending on the energy of the incident photon several processes can happen: Rayleigh scattering, resonance scattering, the Compton effect, Thomson scattering, pair production, etc. It is beyond the scope of this chapter to investigate all these processes. However, within a certain energy range (~7-12 keV), the scattering is well described by the so-called first Born approximation, where the photon interacts with only one atom and the resulting scattered photon has the same energy as the incident photon (elastic scattering). This effect is mostly related to the so-called Thomson scattering. The complete solution for the scattered beam is the sum of a plane wave and a spherical wave (Jackson, 1988). Since the information on the scattering process is carried by the spherical wave, only this part is considered when investigating the structure of the particle (Guinier and Fournet, 1955; Glatter and Kratky, 1982; Feigin and Svergun, 1987). One possible way to understand the scattering process is to start from the scattering from a single particle fixed in space. This is sketched in Fig. 1, where an incident beam of wave vector $\vec{k}_0$ strikes the particle at the points *O* and *P*, separated by the vector $\vec{r}$.

Investigating Macromolecular Complexes in Solution by Small Angle X-Ray Scattering 369

Fig. 1. Representation of the scattering process for a fixed particle.

Since the scattering is assumed to be elastic, the scattered wave, with wave vector $\vec{k}$, has the same modulus as the incident wave, so the difference between the scattered and the incident wave vectors is given by:

$$\left.\begin{aligned} \vec{q} &= \vec{k} - \vec{k}_0 \\ q &= 2k\sin\theta \\ k &= \frac{2\pi}{\lambda} \end{aligned}\right\} \Rightarrow q = \frac{4\pi}{\lambda}\sin\theta \tag{1}$$

which leads to the definition of the reciprocal space momentum transfer vector $\vec{q}$. The scattering amplitude $f(\vec{q})$ is given by the Fourier transformation of the particle electron density $\rho(\vec{r})$:

$$f\left(\vec{q}\right) = \frac{1}{4\pi} \int_{V} \rho\left(\vec{r}\right) \exp\left(-i\vec{q}\cdot\vec{r}\right) d\vec{r} \tag{2}$$
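Equation (1) is easy to evaluate numerically. The sketch below is only an illustration: the function names are hypothetical, and the conversion from photon energy to wavelength uses the constant *hc* ≈ 12.398 keV·Å, which is not quoted in the text. The angle passed in is the full scattering angle *2θ*.

```python
import math

# h*c in keV*Angstrom (assumed constant, not taken from the chapter).
HC_KEV_ANGSTROM = 12.398419843


def wavelength_angstrom(energy_kev: float) -> float:
    """Photon wavelength in Angstrom for a given energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def momentum_transfer(two_theta_deg: float, wavelength: float) -> float:
    """Modulus of q from equation (1): q = (4*pi/lambda) * sin(theta).

    `two_theta_deg` is the full scattering angle 2*theta in degrees;
    q is returned in inverse units of `wavelength`.
    """
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi / wavelength * math.sin(theta)


if __name__ == "__main__":
    lam = wavelength_angstrom(8.0)  # 8 keV lies in the 7-12 keV range mentioned above
    for two_theta in (0.1, 0.5, 1.0, 3.0):
        q = momentum_transfer(two_theta, lam)
        print(f"2theta = {two_theta:4.1f} deg -> q = {q:.4f} 1/Angstrom")
```

For small angles *q* grows almost linearly with *2θ*, which is why the low-angle region of the detector maps onto the largest real-space distances.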

The measurable quantity is the scattering intensity $I_1(\vec{q})$, which is the square modulus of the scattering amplitude:

$$\begin{aligned} I\_1(\vec{q}) &= \left| f\left(\vec{q}\right) \right|^2 = f\left(\vec{q}\right) \bullet f\left(\vec{q}\right)^\* \\ I\_1(\vec{q}) &= \int\_V \int \rho\left(\vec{r}'\right) \rho\left(\vec{r}' - \vec{r}\right) \exp\left(-i\vec{q}\cdot\vec{r}\right) d\vec{r} d\vec{r}' \end{aligned} \tag{3}$$

The index "1" indicates that, so far, this intensity refers to a single particle with fixed orientation. A usual mathematical procedure is to carry out the convolution integral over $\vec{r}'$ and define the result as the so-called self-correlation function $\gamma(\vec{r})$:

$$\gamma\left(\vec{r}\right) = \int_{V} \rho\left(\vec{r}'\right) \rho\left(\vec{r}' - \vec{r}\right) d\vec{r}' \tag{4}$$

Now the scattering from a single fixed particle can be rewritten as,

368 Current Trends in X-Ray Crystallography


$$I_1(\vec{q}) = \int_V \gamma(\vec{r}) \exp\left(-i\vec{q}\cdot\vec{r}\right) d\vec{r} \tag{5}$$

The self-correlation function $\gamma(\vec{r})$ has several properties and asymptotic limits that allow a number of general parameters to be retrieved. The interested reader is invited to read the seminal book by Guinier and Fournet (Guinier and Fournet, 1955) and the articles by Ciccariello (Ciccariello, 1985), among others. A theoretical calculation of a 2D scattering profile for a particle fixed in space is shown in Fig. 2.

Fig. 2. Theoretical calculation of a two-dimensional scattering profile for a fixed ellipsoidal particle. The intensity is given in logarithmic scale. Insets: vertical and horizontal 1D profiles of the intensity.

Equation (5) is still too general to be used in practice. In real systems, the particles investigated are not fixed in space but are instead randomly oriented. The averaging procedure can be carried out either in real space or in reciprocal space. From the mathematical point of view it is easier to perform the average in reciprocal space, by an extra integration over the solid angle $\Omega$:

$$I\_1(q) = \left\langle f(q)^2 \right\rangle\_{\Omega} = \left(4\pi\right)^{-1} \int\_{\Omega} I\_1(\vec{q})d\Omega \tag{6}$$

Substituting (5) into (6), we have

$$\begin{split} I_{1}(q) &= \frac{1}{4\pi} \int_{0}^{\infty} \int_{0}^{4\pi} r^{2}\, dr\, d\omega \int_{0}^{4\pi} d\Omega\, \gamma(\vec{r}) \exp\left(-i\vec{q}\cdot\vec{r}\right) \\ &= \int_{0}^{\infty} r^{2} dr \int_{0}^{4\pi} \gamma\left(\vec{r}\right) d\omega\, \frac{1}{4\pi} \int_{0}^{4\pi} \exp\left(-i\vec{q}\cdot\vec{r}\right) d\Omega \end{split} \tag{7}$$


where the volume element $d\vec{r}$ was written as $r^2\,dr\,d\omega$, in spherical coordinates in real space. In the last step the terms were rearranged. The angular integrals can be performed directly:

$$\begin{aligned} \left\langle \exp \left( -i\vec{q} \cdot \vec{r} \right) \right\rangle_{\Omega} &= \frac{\sin(qr)}{qr} \\ \gamma \left( r \right) &= \frac{1}{4\pi} \int_{0}^{4\pi} \gamma \left( \vec{r} \right) d\omega = \left\langle \rho \left( \vec{r} \right) * \rho \left( -\vec{r} \right) \right\rangle_{\omega} \end{aligned} \tag{8}$$

where the function $\gamma(r)$ is known as the average self-correlation function. With these substitutions the intensity for a single randomly oriented particle is given by

$$I_1(q) = 4\pi \int_0^\infty \gamma(r) \frac{\sin(qr)}{qr} r^2 dr \tag{9}$$
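The first line of equation (8), which turns the phase factor into sin(*qr*)/(*qr*), can be checked numerically. The sketch below is an illustration only (the function name and the quadrature are assumptions, not part of the chapter). It uses the fact that, for a fixed $\vec{r}$, the average over all directions of $\vec{q}$ depends only on *u* = cos of the angle between the two vectors, so that $(4\pi)^{-1}\int d\Omega$ reduces to $\frac{1}{2}\int_{-1}^{1} du$.

```python
import numpy as np


def orientational_average(q: float, r: float, n_points: int = 2001) -> complex:
    """Average of exp(-i q.r) over all relative orientations of q and r.

    With u = cos(angle between q and r), the solid-angle average becomes
    (1/2) * integral_{-1}^{1} exp(-i q r u) du, evaluated here with the
    trapezoidal rule on a uniform grid in u.
    """
    u = np.linspace(-1.0, 1.0, n_points)
    phase = np.exp(-1j * q * r * u)
    integral = 0.5 * np.sum((phase[1:] + phase[:-1]) * np.diff(u))
    return integral / 2.0


if __name__ == "__main__":
    q, r = 0.1, 30.0  # arbitrary example values, in 1/Angstrom and Angstrom
    numeric = orientational_average(q, r)
    analytic = np.sin(q * r) / (q * r)
    print(f"numeric  = {numeric.real:.6f} (imaginary part {numeric.imag:.1e})")
    print(f"analytic = {analytic:.6f}")
```

The imaginary part cancels by symmetry, which is why the orientationally averaged intensity in equation (9) is a real function of the modulus *q* only.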

The theoretical intensity of an ellipsoidal particle randomly oriented in space is shown in Fig. 3. As can be seen directly, the 2D pattern is now independent of the azimuthal angle, and any radial cut starting from the center has the same profile.

Fig. 3. Theoretical calculation of a two-dimensional scattering profile for a randomly oriented ellipsoidal particle. The intensity is given in logarithmic scale. Inset: 1D profile of the intensity.

One usual procedure is to define the so-called pair distance distribution function *p(r)*, $p(r) = r^2 \gamma(r)$, which is a histogram of the pair distances inside the particle, weighted by the distance length and by the product of the electron densities of the infinitesimal elements of each pair (Glatter, 2002). The *p(r)* function permits the definition of the Fourier pair:

$$\begin{aligned} I_1(q) &= 4\pi \int_0^\infty p(r) \frac{\sin(qr)}{qr} dr \\ p(r) &= \frac{r^2}{2\pi^2} \int_0^\infty q^2 I_1(q) \frac{\sin(qr)}{qr} dq \end{aligned} \tag{10}$$
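The first relation of equation (10) can be tested on a case where both sides are known in closed form. The sketch below is a hedged illustration: it assumes a homogeneous sphere of radius *R* with unit contrast, for which the standard textbook results (not given in this chapter) are $\gamma(r) = V\,[1 - 3r/(4R) + r^3/(16R^3)]$ for $r \le 2R$ and $I_1(q) = V^2\,[3(\sin qR - qR\cos qR)/(qR)^3]^2$. All function names are hypothetical.

```python
import numpy as np


def trapezoid(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Trapezoidal rule along the last axis of y."""
    return 0.5 * np.sum((y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)


def sphere_pr(r: np.ndarray, radius: float) -> np.ndarray:
    """p(r) = r^2 * gamma(r) for a homogeneous sphere with unit contrast."""
    volume = 4.0 / 3.0 * np.pi * radius**3
    x = np.clip(r / (2.0 * radius), 0.0, 1.0)  # r / D_max
    gamma = volume * (1.0 - 1.5 * x + 0.5 * x**3)
    return np.where(r <= 2.0 * radius, r**2 * gamma, 0.0)


def sphere_intensity(q: np.ndarray, radius: float) -> np.ndarray:
    """Analytic I_1(q) of the same sphere (valid for q > 0)."""
    volume = 4.0 / 3.0 * np.pi * radius**3
    x = q * radius
    amplitude = 3.0 * (np.sin(x) - x * np.cos(x)) / x**3
    return (volume * amplitude) ** 2


def intensity_from_pr(q: np.ndarray, r: np.ndarray, p: np.ndarray) -> np.ndarray:
    """First line of equation (10): I_1(q) = 4*pi * int p(r) sin(qr)/(qr) dr."""
    kernel = np.sinc(np.outer(q, r) / np.pi)  # np.sinc(x) = sin(pi x)/(pi x)
    return 4.0 * np.pi * trapezoid(kernel * p, r)


if __name__ == "__main__":
    radius = 20.0                          # Angstrom, arbitrary example
    r = np.linspace(0.0, 2.0 * radius, 2001)
    q = np.linspace(0.01, 0.5, 50)         # 1/Angstrom
    numeric = intensity_from_pr(q, r, sphere_pr(r, radius))
    exact = sphere_intensity(q, radius)
    forward = (4.0 / 3.0 * np.pi * radius**3) ** 2
    print("largest deviation relative to I(0):",
          np.max(np.abs(numeric - exact)) / forward)
```

Because *p(r)* vanishes beyond the maximum dimension, the real-space integral has a finite upper limit, whereas the inverse relation requires *I(q)* over the whole *q* axis.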


Some interpretations of the *p(r)* function will be explained in the next sections. The result given in equation (10) was derived for a single particle randomly oriented in space. In real systems the particles are dispersed in a matrix with electron density $\rho_0$, and it is therefore necessary to extend this result to a system of particles. One expression for the intensity of a system of particles is given by

$$I(q) = N \left\langle f^2(\vec{q}) \right\rangle \left\langle S(\vec{q}) \right\rangle \tag{11}$$

where $\left\langle f^2(\vec{q}) \right\rangle = P(q)$ is the so-called particle form factor and the function $S(q)$ is the system structure factor. For systems composed of identical particles (monodisperse systems) the form factor is identical to the average scattering intensity of a single particle, $P(q) = I_1(q)$. For polydisperse systems the form factor is an average over the different sizes, electron densities and particle shapes. In this case, one usual procedure is to assume a known shape and electron density and to perform the average over a distribution of sizes (Glatter and Kratky, 1982). The structure factor, on the other hand, is related to the particle interactions, and there are several approaches to describe its behavior (Pedersen, 2002). For very dilute systems the particle interactions can be neglected and the structure factor is equal to 1. Therefore, for a system of identical particles in the dilute regime we have

$$I(q) = N I\_1(q) \tag{12}$$

indicating that, although the measured intensity is a contribution of a large number of particles, it contains the information of the scattering from a single particle randomly oriented. This shows that in a real case where the intensity *I(q)* is measured, it might be possible to obtain information about the single particle shape and conformation.

As mentioned above, the particles can be considered to be immersed in a matrix with constant electron density. It can be shown that in this case scattering only occurs if there is a difference between the electron density of the particles and that of the matrix. The electron density $\rho(\vec{r})$ in equations 2-5 should therefore be replaced by the electron density contrast $\Delta\rho(\vec{r}) = \rho(\vec{r}) - \rho_0$. In order to make this point clearer, one usual approach is to take the particle form factor as normalized and to show the electronic mass explicitly:

$$I(q) = N\Delta\rho^2 V^2 P\_1(q) \tag{13}$$

where $\Delta\rho = \rho - \rho_0$ is the scattering contrast between the particles and the matrix, *V* is the particle volume and $P_1(q)$ is the normalized form factor ($P_1(q=0) = 1$).

#### **2.1 Experimental aspects and absolute calibration**

A schematic setup for a typical SAXS experiment is shown in Fig. 4. Specific technical details about geometries and configurations can be found in several sources (Lindner and Zemb, 2002) and will not be presented in this chapter. However, some general characteristics have to be addressed. Since only a small fraction of the incident beam is scattered, the detectors must be able to detect reasonably low intensities. Therefore the part of the incoming beam that passes through the sample without interaction has to be blocked by a beam stopper, to avoid possible damage to the detector. The size of the beam stopper depends on the equipment geometry.

Fig. 4. Schematic setup of a SAXS experiment.

In a typical experiment it is necessary to measure the intensity from the system (sample matrix + particles) and subtract the intensity from the matrix in which the particles are immersed (blank). To normalize the data to absolute scale, scattering standards have to be used. In the applications described in the present chapter two procedures were applied. In the first procedure a known protein is measured under the same sample conditions (buffer, temperature, etc.) and the forward scattering obtained for this sample is used to normalize the other, unknown data. In the second procedure water at 20ºC is used as a primary standard. This is a convenient standard since the value of its scattering cross section can be calculated with very high accuracy from the fundamental macroscopic properties of water. In both cases, the data have to be divided by the value obtained from the standard under the same experimental conditions and multiplied by the theoretical intensity value. Assuming that the sample and the blank are measured in the same cell, the treated intensity, normalized to absolute scale, is given by:

$$I_{Treated}\left(q\right) = \left(\frac{I_{sample}\left(q\right)}{\Phi_s T_s t_s} - \frac{I_{blank}\left(q\right)}{\Phi_b T_b t_b}\right) \frac{\left(d\Sigma / d\Omega\right)(0)_{std}}{I\left(0\right)_{std}}\tag{14}$$

where *q* is the modulus of the scattering vector, defined as $q = (4\pi/\lambda)\sin\theta$, where $2\theta$ is the scattering angle as shown in Fig. 1 and Fig. 4 and $\lambda$ is the wavelength of the monochromatic beam; $I_{Treated}(q)$ is the treated scattering intensity of the sample on absolute scale, i.e. the scattering cross section of the sample; $I_{sample}(q)$ is the raw data measured for the sample; $I_{blank}(q)$ is the raw data from the matrix scattering; $I(0)_{std}$ is the value at *q = 0* of the treated standard data (background subtracted and normalized by flux, transmission and acquisition time); $\Phi_i$ is the flux of the incident beam, $T_i$ is the transmission and $t_i$ is the exposure time, where the index *i* is *s* (sample) or *b* (blank); and $(d\Sigma/d\Omega)_{std}$ is the theoretical scattering cross section of the standard. For water at 20ºC this cross section has the value 0.01632 cm⁻¹. For proteins in typical buffers without high amounts of salt, glycerol or other additives, the theoretical cross section for a system of proteins in solution with mass concentration *c* (in mg/mL) and molecular weight $M_W$ (in kDa) is given by $(d\Sigma/d\Omega)_{std} = 6.645\times10^{-4}\, c\, M_W$ cm⁻¹ (see equation 15 below).
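The normalization in equation (14) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the reduction code used by the authors: the function names and argument layout are hypothetical, the raw curves are assumed to be 1D arrays on the same *q* grid, and the standard is assumed to be water at 20ºC with the cross section quoted above.

```python
import numpy as np

# Theoretical cross section of water at 20 C quoted in the text, in 1/cm.
WATER_CROSS_SECTION_20C = 0.01632


def normalize_counts(counts, flux, transmission, exposure_time):
    """Divide raw counts by flux * transmission * exposure time."""
    return np.asarray(counts, dtype=float) / (flux * transmission * exposure_time)


def treated_intensity(i_sample, i_blank, sample_norm, blank_norm,
                      i0_std, cross_section_std=WATER_CROSS_SECTION_20C):
    """Equation (14): blank-subtracted intensity on absolute scale (1/cm).

    sample_norm, blank_norm: (flux, transmission, exposure_time) tuples.
    i0_std: forward value I(0) of the treated standard measurement,
            already background subtracted and normalized the same way.
    """
    difference = (normalize_counts(i_sample, *sample_norm)
                  - normalize_counts(i_blank, *blank_norm))
    return difference * cross_section_std / i0_std


if __name__ == "__main__":
    # Made-up numbers, only to show the call signature.
    sample = [100.0, 50.0]
    blank = [40.0, 20.0]
    absolute = treated_intensity(sample, blank,
                                 sample_norm=(2.0, 0.5, 10.0),
                                 blank_norm=(2.0, 0.5, 20.0),
                                 i0_std=0.5)
    print(absolute)
```

Keeping the normalization of sample and blank separate matters in practice, because the two measurements usually differ in transmission and exposure time.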

With the data on absolute scale, information about the contrast, the particle volume or the particle concentration can be obtained, depending on what is already known about the system. One very important parameter when studying proteins in solution is the molecular weight, which is a direct indication of the oligomerization state of the protein. Starting from equation (13), multiplying and dividing by the particle specific volume *v* and performing some simple algebraic manipulations, it is possible to rewrite the intensity *I(q)* as:

$$I(q) = c \left(\Delta \rho\_M\right)^2 \left(M\_W \;/\ N\_A\right) P(q) \tag{15}$$

where *c* is the concentration in mg/mL, $\Delta\rho_M$ is the excess scattering length density per unit mass (cm/g), $M_W$ is the molecular weight in kDa, $N_A$ is Avogadro's number and *P(q)* is the normalized form factor (*P(0)=1*). For proteins, a good approximation of $\Delta\rho_M$ is 2×10¹⁰ cm/g (Oliveira and Pedersen, unpublished). The above equation shows that the molecular weight of the proteins can be estimated directly from the forward intensity *I(0)*:

$$M_W = \frac{I(0)}{c\left(\Delta\rho_M\right)^{2}} N_A \tag{16}$$
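Equations (15) and (16) mix laboratory units (mg/mL, kDa) with CGS units (cm/g, cm⁻¹), so a short numerical sketch may help. The function names are hypothetical, and the conversions 1 mg/mL = 10⁻³ g/cm³ and 1 kDa = 10³ g/mol are made explicit in the code. With the value of $\Delta\rho_M$ quoted above, the prefactor comes out at about 6.64×10⁻⁴ cm⁻¹ per (mg/mL · kDa), in line with the coefficient quoted for the protein standard.

```python
AVOGADRO = 6.02214076e23   # 1/mol
DELTA_RHO_M = 2.0e10       # cm/g, approximation for proteins quoted in the text


def forward_intensity(c_mg_ml: float, mw_kda: float,
                      delta_rho_m: float = DELTA_RHO_M) -> float:
    """Equation (15) at q = 0 (P(0) = 1); result in 1/cm."""
    c = c_mg_ml * 1e-3       # mg/mL -> g/cm^3
    mw = mw_kda * 1e3        # kDa   -> g/mol
    return c * delta_rho_m**2 * mw / AVOGADRO


def molecular_weight_kda(i0: float, c_mg_ml: float,
                         delta_rho_m: float = DELTA_RHO_M) -> float:
    """Equation (16): molecular weight in kDa from I(0) in 1/cm."""
    c = c_mg_ml * 1e-3       # mg/mL -> g/cm^3
    mw = i0 * AVOGADRO / (c * delta_rho_m**2)   # g/mol
    return mw / 1e3


if __name__ == "__main__":
    print("I(0) per (mg/mL * kDa):", forward_intensity(1.0, 1.0), "1/cm")
    # Hypothetical measurement: I(0) = 0.05 1/cm at 2 mg/mL.
    print("estimated M_W:", molecular_weight_kda(0.05, 2.0), "kDa")
```

Since the result scales with 1/*c* and 1/$\Delta\rho_M^2$, an error in either quantity propagates directly into the estimated molecular weight.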

In general, the molecular weight determination has an uncertainty of 10% - 20%, which is sufficient to check the monodispersity of the sample or to indicate the oligomeric state. However, this approach depends strongly on accurate knowledge of the scattering contrast and of the sample concentration.
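The algebra leading from equation (13) to equation (15) is brief and can be written out. The sketch below assumes that *N* in equation (13) is the number of particles per unit volume and that $\Delta\rho_M = \Delta\rho\, v$; these identifications are not stated explicitly in the text, but they are consistent with the units given for each quantity (with *c* and $M_W$ expressed in g/cm³ and g/mol):

$$\begin{aligned} N &= \frac{c\,N_A}{M_W}, \qquad V = \frac{v\,M_W}{N_A}, \\ I(q) &= N\,\Delta\rho^2 V^2 P(q) = \frac{c\,N_A}{M_W}\,\Delta\rho^2 \left(\frac{v\,M_W}{N_A}\right)^2 P(q) = c\left(\Delta\rho\, v\right)^2 \frac{M_W}{N_A}\,P(q) = c\left(\Delta\rho_M\right)^2 \frac{M_W}{N_A}\,P(q). \end{aligned}$$

Setting *q = 0*, where *P(0) = 1*, and solving for $M_W$ gives equation (16).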

#### **2.2 Modeling methods**


From the above considerations it can be seen that structural information about the studied system might be obtained from the analysis of SAXS data. Several methods can be used, depending on the prior knowledge about the system. Usually the desired information is the scattering length density distribution $\rho(\vec{r})$, which might provide the particle shape, size, etc. This is the so-called "inverse scattering problem", i.e. retrieving real space information from data in reciprocal space. The modeling is based on the comparison of a given model with the experimental SAXS data. Given the characteristics of scattering experiments, the $\chi^2$ (chi-square) function is a good quantity to minimize in the optimization procedure. Given a set of *N* experimental points $I_{exp}(q_i)$ with standard deviations $\sigma(q_i)$ and the theoretical intensity $I_{teo}(q_i)$ calculated for the same angular values $q_i$, the $\chi^2$ function is defined as:

$$\chi^{2} = \sum_{i=1}^{N} \frac{\left(I_{exp}(q_{i}) - I_{teo}(q_{i})\right)^{2}}{\sigma(q_{i})^{2}}\tag{17}$$
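Equation (17) translates directly into code. The sketch below is illustrative (function names are hypothetical) and also returns the value divided by *N − M*, where *M* is the number of free model parameters, which is the normalization usually applied before judging the quality of a fit.

```python
import numpy as np


def chi_square(i_exp, i_teo, sigma) -> float:
    """Equation (17): sum of squared residuals weighted by 1/sigma^2."""
    i_exp, i_teo, sigma = (np.asarray(a, dtype=float) for a in (i_exp, i_teo, sigma))
    return float(np.sum(((i_exp - i_teo) / sigma) ** 2))


def reduced_chi_square(i_exp, i_teo, sigma, n_params: int) -> float:
    """chi^2 / (N - M) for N data points and M free model parameters."""
    n_points = len(i_exp)
    if n_points <= n_params:
        raise ValueError("need more data points than free parameters")
    return chi_square(i_exp, i_teo, sigma) / (n_points - n_params)


if __name__ == "__main__":
    # Made-up data in which every point deviates by exactly one sigma.
    i_exp = [1.0, 2.0, 3.0, 4.0]
    i_teo = [1.1, 1.9, 3.2, 3.8]
    sigma = [0.1, 0.1, 0.2, 0.2]
    print("chi^2         =", chi_square(i_exp, i_teo, sigma))
    print("reduced chi^2 =", reduced_chi_square(i_exp, i_teo, sigma, n_params=2))
```

Dividing the residuals by $\sigma(q_i)$ before squaring is what makes the statistic sensitive to misjudged error bars as well as to a poor model.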

A common practice is to use the reduced $\chi^2$, $\chi^2_R = \chi^2/(N-M)$, where *N* is the number of experimental points and *M* is the number of independent parameters used in the theoretical model. If a good fit is achieved, the differences between the model and the experimental data should be lower than or equal to the standard deviations $\sigma(q_i)$. Therefore, since $\chi^2_R$ is normalized by *(N-M)*, if *N* is reasonably larger than *M* the $\chi^2_R$ for a good fit should be close to 1. Values considerably larger than 1 might indicate important discrepancies between the model and the experimental data. However, they can also indicate underestimated uncertainties. On the other hand, values considerably lower than 1 can indicate overestimated uncertainty values.

Fig. 5. Theoretical calculations for scattering intensities and corresponding *p(r)* functions for bodies with simple shapes (solid sphere, long cylinder, long prism, flat particle, hollow sphere, prolate ellipsoid). The form factors were normalized to one. Axes: intensity versus *qDMAX* and *p(r)* versus *r/DMAX*.

Fig. 6. Theoretical calculations for scattering intensities and corresponding *p(r)* functions for a monomer and for dimers with opening angles of 60º, 90º and 120º. The form factors were normalized by the forward scattering of the monomer. Axes: intensity versus *q* [Å⁻¹] and *p(r)* versus *r* [Å].

will have a *p(r)* function with a bell shape, with the maximum close to *(r/DMAX)/2*. Any anisotropy will move the maximum to the left, towards lower *r/DMAX* values. Elongated (prolate) particles with constant cross-section, like cylinders or prisms, will have *p(r)* functions with linearly descending regions. Flat (oblate) particles will have *p(r)* functions with shapes different from the two previous cases. Hollow particles will have *p(r)* functions with the maximum moved to the right, towards higher *r/DMAX* values. Dimeric particles will have *p(r)* functions with shoulders, as seen in Fig. 6. Interestingly, the differences in the opening angle of a

#### **2.2.1 Indirect Fourier transform – model independent approach**

In the theoretical description shown above, the pair distance distribution function *p(r)* was introduced as a natural step in the manipulation of the equations and, as indicated in equation (10), it forms a Fourier pair with the scattering intensity of a single particle *I<sub>1</sub>(q)*. Since the total intensity from a system is proportional to the scattering of a single particle (equation (12)), this relation might be used to calculate the real space function *p(r)* from measured scattering data. This procedure has intrinsic limitations, since the Fourier transformation involves integrals from 0 to infinity while the measured scattering data are only obtained for a very small region of reciprocal space. As a consequence, direct calculations of the *p(r)* function from the integral of *I(q)* are usually not successful, since the truncation of the integral leads to strong oscillations of the *p(r)* function. Another method was introduced by Glatter (Glatter, 1977) and is known as the Indirect Fourier Transformation method (programs ITP and GIFT; Glatter, 1977; Bergmann et al, 2000; Fritz and Glatter, 2006). In this approach one starts from the *p(r)* function, describes it using a set of base functions (in the Glatter method, spline functions) and performs the Fourier transformation of those functions in order to have a corresponding set of base functions in reciprocal space. Since all operations are linear, the coefficients of the *p(r)* base functions are the same as the ones for the *I(q)* base functions, and therefore by fitting the experimental data one can directly obtain the best set of coefficients and consequently the best *p(r)* function. Since the interval of *I(q)* is still limited, this operation also leads to oscillating *p(r)* functions. In order to avoid this problem, Glatter introduced a damping parameter that is selected in the fitting procedure in order to provide a smooth *p(r)* function.
A similar approach was used by Svergun and co-workers (Semenyuk and Svergun, 1991) in the program package GNOM. In both cases the fitting process is iterative and the user has to find the maximum particle size *DMAX* that gives the best fit and *p(r)* function. In an interesting development, Hansen (Hansen, 2000) proposed a method where the maximum dimension is obtained using Bayesian probabilities. More recently, building on the Glatter method (Pedersen et al, 1994), Oliveira and Pedersen developed a procedure that enables the calculation of the *p(r)* function for both dilute (program WIFT) and concentrated systems (program WGIFT), where structure factors are taken into account in the optimization (Oliveira et al, 2009). The calculation of the *p(r)* function for concentrated systems was also implemented by Glatter in a new implementation of his approach (program GIFT), with optimization by simulated annealing.
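As a rough numerical illustration of the Fourier pair on which these methods rely, the following Python sketch computes the intensity from a known *p(r)* by direct numerical integration. It assumes that equation (10) has the standard form I(q) = 4π ∫ p(r) sin(qr)/(qr) dr and uses the textbook *p(r)* of a homogeneous sphere. The function names, the radius and the number of integration steps are illustrative choices and are not taken from any of the programs cited above.

```python
import math

def p_sphere(r, d_max):
    """p(r) of a homogeneous sphere of diameter d_max (arbitrary units)."""
    if r < 0.0 or r > d_max:
        return 0.0
    x = r / d_max
    return r * r * (1.0 - 1.5 * x + 0.5 * x ** 3)

def sinc(x):
    """sin(x)/x with the x -> 0 limit handled explicitly."""
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x

def intensity_from_pr(q, d_max, n_steps=2000):
    """Trapezoidal estimate of 4*pi * integral_0^d_max p(r) sin(qr)/(qr) dr."""
    h = d_max / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        r = k * h
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * p_sphere(r, d_max) * sinc(q * r)
    return 4.0 * math.pi * total * h

def sphere_form_factor(q, radius):
    """Analytical normalized intensity of a homogeneous sphere."""
    x = q * radius
    if abs(x) < 1e-6:
        return 1.0
    return (3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3) ** 2

radius = 20.0  # Å, illustrative value
i0 = intensity_from_pr(0.0, 2.0 * radius)
for q in (0.02, 0.05, 0.10, 0.15):
    numeric = intensity_from_pr(q, 2.0 * radius) / i0
    exact = sphere_form_factor(q, radius)
    print(f"q = {q:.2f}  numeric = {numeric:.5f}  analytic = {exact:.5f}")
```

Because the integral is linear in *p(r)*, applying the same transform to each base function gives the corresponding reciprocal-space base function, which is the property the indirect method exploits.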

A common result of all the above program packages is the pair distance distribution function *p(r)*. As mentioned above, this function is a histogram of pair distances inside the particle, weighted by the distance length and by the product of the electron densities of the infinitesimal elements of the pair. For particles with finite size, there exists a maximum distance beyond which the *p(r)* function is zero. This corresponds to the maximum size of the particle. Since the histogram is weighted by the distance length, the *p(r)* function also starts from zero. In this way, it is easy to see that *p(r)* should start from zero and end at zero when it reaches the maximum particle size. The shape of the function is a consequence of the particle shape and electron density distribution. A set of theoretical calculations for the *p(r)* function is shown in Fig5, Fig6 and Fig7. In Fig5 one can see that globular particles have a *p(r)* function with a bell shape, with the maximum close to *r/DMAX = 1/2*. Any anisotropy moves the maximum to the left, towards lower *r/DMAX* values. Elongated (prolate) particles with constant cross-section, like cylinders or prisms, have *p(r)* functions with linearly descending regions. Flat (oblate) particles have *p(r)* functions with shapes different from the two previous cases. Hollow particles have *p(r)* functions with the maximum moved to the right, towards higher *r/DMAX* values. Dimeric particles have *p(r)* functions with shoulders, as seen in Fig6. Interestingly, the differences in the opening angle of a dimeric particle are easier to detect in the *p(r)* function than in the intensity *I(q)*. Finally, particles with differences in the scattering length contrast might have *p(r)* functions with negative portions, as indicated in Fig7. For a broader and deeper review of the interpretation of *p(r)*, the reader is referred to several works in the literature (Glatter, 1979; Glatter and Kratky, 1982). The important point of this modeling approach is that, apart from the assumption that the system is composed of identical particles, no other hypotheses are made, and the *p(r)* function provides direct insight into the particle shape and dimensions. This approach is widely used in the analysis of SAXS data because it provides a first guess about the particle shape.

Fig. 5. Theoretical calculations for scattering intensities and corresponding *p(r)* functions for bodies with simple shapes (solid sphere, long cylinder, long prism, flat particle, hollow sphere and prolate ellipsoid). The form factors were normalized to one.

Fig. 6. Theoretical calculations for scattering intensities and corresponding *p(r)* functions for bodies with simple shapes (monomer and dimers with opening angles of 60°, 90° and 120°). The form factors were normalized by the forward scattering of the monomer.
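The link between particle shape and the position of the *p(r)* maximum can be reproduced with a small Monte Carlo experiment. The sketch below fills a sphere and an elongated cylinder with random points of equal weight (that is, it assumes a homogeneous electron density), builds the histogram of pair distances and reports where the maximum lies in units of *r/DMAX*. The dimensions, number of points, bin count and function names are arbitrary choices made for the illustration.

```python
import math
import random

def random_point_in_sphere(rng, radius):
    """Rejection-sample a point uniformly inside a sphere."""
    while True:
        x, y, z = (rng.uniform(-radius, radius) for _ in range(3))
        if x * x + y * y + z * z <= radius * radius:
            return (x, y, z)

def random_point_in_cylinder(rng, radius, length):
    """Sample a point uniformly inside a cylinder aligned with the z axis."""
    while True:
        x, y = rng.uniform(-radius, radius), rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            return (x, y, rng.uniform(-length / 2.0, length / 2.0))

def pair_distance_histogram(points, d_max, n_bins=20):
    """Histogram of all pair distances; for uniform density this is p(r) up to a scale."""
    hist = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            k = min(int(d / d_max * n_bins), n_bins - 1)
            hist[k] += 1
    return hist

def peak_position(hist):
    """Centre of the most populated bin, in units of r/DMAX."""
    k = max(range(len(hist)), key=hist.__getitem__)
    return (k + 0.5) / len(hist)

rng = random.Random(1)
sphere = [random_point_in_sphere(rng, 10.0) for _ in range(600)]
rod = [random_point_in_cylinder(rng, 2.0, 40.0) for _ in range(600)]

peak_sphere = peak_position(pair_distance_histogram(sphere, 20.0))
rod_d_max = math.hypot(40.0, 4.0)  # longest distance inside the cylinder
peak_rod = peak_position(pair_distance_histogram(rod, rod_d_max))
print(f"sphere: p(r) maximum near r/DMAX = {peak_sphere:.2f}")
print(f"rod:    p(r) maximum near r/DMAX = {peak_rod:.2f}")
```

For the sphere the maximum falls near the middle of the distance range, while for the rod it moves to small *r/DMAX*, in line with the qualitative rules given in the text.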

Fig. 7. Theoretical calculations for scattering intensities and corresponding *p(r)* functions for a core-shell particle with different scattering length contrasts.

#### **2.2.2 Model dependent approach – assuming a known form factor**

For simple particle shapes it is possible to integrate equation (2) and obtain the amplitude form factor $f(\vec{q})$. Then, performing the angular integral given in equation (6), it is possible to obtain an analytical or semi-analytical expression for $I_1(q)$. Some examples are shown in Table 1. A more complete list of analytical expressions for form factors can be found in several reviews (Pedersen, 1997). Also, advanced modeling approaches can be found in specialized articles in the literature (Székely et al, 2010).

| Shape | Normalized form factor |
|---|---|
| Homogeneous sphere with radius *R* | $I_1(q) = f_1^2(q,R)$, with $f_1(q,R) = \dfrac{3\left[\sin(qR) - qR\cos(qR)\right]}{(qR)^3}$ |
| Spherical shell with inner radius *R<sub>1</sub>* and outer radius *R<sub>2</sub>* | $I_1(q) = \left[\dfrac{V(R_1)\,f_1(q,R_1) - V(R_2)\,f_1(q,R_2)}{V(R_1) - V(R_2)}\right]^2$, with $V(R) = 4\pi R^3/3$ |
| Homogeneous cylinder with radius *R* and height *L* | $I_1(q) = \displaystyle\int_0^{\pi/2} \left[\dfrac{2J_1(qR\sin\alpha)}{qR\sin\alpha}\,\dfrac{\sin\left(qL\cos\alpha/2\right)}{qL\cos\alpha/2}\right]^2 \sin\alpha\, d\alpha$ |
| Ellipsoid of revolution with semi-axes *R*, *R* and *εR* | $I_1(q) = \displaystyle\int_0^{\pi/2} f_1^2\left[q, r(R,\varepsilon,\alpha)\right]\sin\alpha\, d\alpha$, with $r(R,\varepsilon,\alpha) = R\left(\sin^2\alpha + \varepsilon^2\cos^2\alpha\right)^{1/2}$ |

Table 1. A few examples of semi-analytical expressions of the scattering intensity calculated for particles with simple shapes. *J<sub>1</sub>* is the first-order Bessel function of the first kind.
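Expressions of the kind collected in Table 1 are straightforward to evaluate numerically. The following Python sketch implements the sphere, the spherical shell and the ellipsoid of revolution, using the standard textbook forms (see Pedersen, 1997). The function names, the midpoint integration rule and the number of integration steps are assumptions made for this illustration; the cylinder is left out because the Python standard library has no Bessel function.

```python
import math

def f1(q, radius):
    """Normalized scattering amplitude of a homogeneous sphere."""
    x = q * radius
    if abs(x) < 1e-4:
        return 1.0 - x * x / 10.0  # small-argument expansion
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3

def volume(radius):
    return 4.0 * math.pi * radius ** 3 / 3.0

def sphere_intensity(q, radius):
    return f1(q, radius) ** 2

def shell_intensity(q, r_inner, r_outer):
    """Spherical shell: volume-weighted difference of two sphere amplitudes."""
    num = volume(r_outer) * f1(q, r_outer) - volume(r_inner) * f1(q, r_inner)
    return (num / (volume(r_outer) - volume(r_inner))) ** 2

def ellipsoid_intensity(q, radius, epsilon, n_steps=400):
    """Ellipsoid of revolution (semi-axes R, R, epsilon*R), orientationally averaged."""
    h = (math.pi / 2.0) / n_steps
    total = 0.0
    for k in range(n_steps):
        alpha = (k + 0.5) * h  # midpoint rule
        r_eff = radius * math.sqrt(math.sin(alpha) ** 2 + (epsilon * math.cos(alpha)) ** 2)
        total += f1(q, r_eff) ** 2 * math.sin(alpha)
    return total * h

for q in (0.01, 0.05, 0.10, 0.20):
    print(f"q = {q:.2f}  sphere = {sphere_intensity(q, 20.0):.4e}  "
          f"shell = {shell_intensity(q, 15.0, 20.0):.4e}  "
          f"ellipsoid = {ellipsoid_intensity(q, 20.0, 2.0):.4e}")
```

Two simple consistency checks are that the ellipsoid with *ε = 1* reproduces the sphere, and that the shell with vanishing inner radius does the same.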

The main advantage of using analytical or semi-analytical expressions to describe the form factor is that, usually, there is a low number of parameters to adjust against the experimental data, permitting the determination of structural information with reasonable reliability. Also, if the model does not fit the data correctly, this directly indicates that the particle shape is different from the one being assumed. One example of application is presented in Fig8, where the model of an elongated cylinder was used to describe the SAXS data of mature glucagon fibers. In several cases, the possible particle shape is known but the calculation of the integrals is impractical. In these cases it is possible to use the finite element method, which consists of building up the particle shape from known subunits. One approach is to use spherical subunits and apply the Debye formula to calculate the intensity (Debye, 1915; Glatter, 1980):

$$P_{Model}(q) = \frac{f_1(q, r_{bead})^2}{N^2} \sum_{i,j=1}^{N} \frac{\sin(qr_{ij})}{qr_{ij}} \tag{18}$$

Here *N* is the number of spherical subunits, *r<sub>bead</sub>* is their radius and *r<sub>ij</sub>* is the distance between the centers of subunits *i* and *j*. This procedure enables the calculation of very complicated models. From this calculation the model parameters can be optimized against experimental data (Oliveira et al, 2009, 2010).
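Equation (18) translates almost directly into code. The sketch below is a plain double loop over bead pairs, written for clarity rather than speed. The bead coordinates describe a hypothetical straight rod and are not a model from the cited works, and the function names are illustrative.

```python
import math

def sphere_amplitude(q, radius):
    """Normalized amplitude f1(q, r_bead) of a single spherical bead."""
    x = q * radius
    if abs(x) < 1e-4:
        return 1.0 - x * x / 10.0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3

def debye_intensity(q, centers, bead_radius):
    """Normalized intensity of identical spherical beads via the Debye formula."""
    n = len(centers)
    total = float(n)  # i == j terms: sin(x)/x -> 1
    for i in range(n):
        for j in range(i + 1, n):
            x = q * math.dist(centers[i], centers[j])
            # each unordered pair appears twice in the double sum
            total += 2.0 * (math.sin(x) / x if x > 1e-12 else 1.0)
    return sphere_amplitude(q, bead_radius) ** 2 * total / n ** 2

# Hypothetical bead model: a straight rod of 10 touching beads of radius 2 Å.
rod = [(0.0, 0.0, 4.0 * k) for k in range(10)]
for q in (0.0, 0.05, 0.1, 0.2):
    print(f"q = {q:.2f}  P(q) = {debye_intensity(q, rod, 2.0):.4f}")
```

The cost grows with the square of the number of beads, which is why production codes usually bin the pair distances into a histogram before summing.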

Fig. 8. Example of application of a known form factor. Experimental data (open circles) of a mature glucagon fiber (Oliveira et al, 2009) and the theoretical fit (solid line), assuming a form factor of cylinders with radius R and length L. The SAXS data were measured at the laboratory SAXS instrument Nanostar™, from Professor Jan Skov Pedersen at the University of Aarhus, Denmark.

#### **2.2.3 Deconvolution square root – obtaining the electron density profile**

In some applications, the particle shape is known but the electron density profile and overall dimensions have to be determined. Amphiphilic molecules like surfactants and several types of diblock copolymers self-assemble into structures that can be analyzed in this way. Several proposals for the deconvolution square root procedure can be found in the literature (Pape and Kreutz, 1978; Nagle and Wiener, 1989). An initial approach was to take the square root of the scattering intensity, which gives the absolute value of the amplitude function *f(q)*. Then, by Fourier transforming this function it is possible to retrieve the electron density distribution *ρ(r)*. This procedure has serious problems, since the signs of the *f(q)* function have to be guessed and, in addition, the very short interval of data in reciprocal space precludes a reliable calculation of the inverse Fourier transformation. A more stable process was proposed by Glatter (Glatter, 1981; Glatter, 1984; Bergmann et al, 2000), where the deconvolution is made by the use of the *p(r)* function. Apart from the overall sign of the electron density profile (a factor of ±1), this procedure enables a correct estimation of the electron density profile, and it has been used in several applications (Rathgeber et al, 2002). One example of application of this method is presented in Fig9, where the radial electron density of SDS micelles could be obtained.
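The sign problem of the direct square root approach can be seen with the simplest possible case. The sketch below evaluates the amplitude of a homogeneous sphere (the standard textbook expression, used here only as an example) on both sides of its zeros and compares it with the square root of the corresponding intensity. The radius and the sampled *qR* values are arbitrary.

```python
import math

def sphere_amplitude(q, radius):
    """Normalized scattering amplitude of a homogeneous sphere."""
    x = q * radius
    if abs(x) < 1e-4:
        return 1.0 - x * x / 10.0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3

radius = 20.0  # Å, illustrative value
for qr in (2.0, 4.0, 5.0, 7.0, 8.0):
    f = sphere_amplitude(qr / radius, radius)
    recovered = math.sqrt(f * f)  # what the square root of the intensity gives
    print(f"qR = {qr:.1f}  f = {f:+.4f}  sqrt(I) = {recovered:.4f}")
```

The amplitude changes sign at each zero, but the square root of the intensity is always positive, so the sign of every lobe has to be supplied from elsewhere before the inverse transform can be attempted.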

Fig. 9. Example of application of the deconvolution method. A) Experimental SDS data (open circles) and the IFT fit (solid line). B) IFT-calculated *p(r)* function (open circles) and the theoretical *p(r)* obtained from the deconvolution method (solid line). C) Restored radial electron density profile presented as step functions (dashed line) and by the use of a smooth approximation (solid line). The SAXS data were measured at the laboratory SAXS instrument Nanostar™, from Professor Jan Skov Pedersen at the University of Aarhus, Denmark.

#### **2.2.4 Ab initio modeling – an overview**

The shape of the scattering function is directly related to the three-dimensional shape of the particle. However, since the particles are randomly oriented (equation 6) and there is only a limited measurable region in reciprocal space, the information content of a SAXS curve is very low (Patel and Schmidt, 1971). Nevertheless, even with these limitations, important developments in the last decades have proven that it is possible to obtain a 3D model from 1D SAXS curves. Starting from the seminal work of Stuhrmann in the 1970s (Stuhrmann, 1973; Stuhrmann and Miller, 1978), Svergun and co-workers used a set of spherical harmonics (multipolar expansion) to describe the particle electron density; by the use of a nonlinear minimization procedure it is possible to obtain the set of spherical harmonics coefficients that gives the best fit of the scattering data. Details on the calculations and on the representation of the scattering intensity using spherical harmonics can be found in the original articles (Svergun and Stuhrmann, 1991). The success of this method has shown that, even though it is not possible to obtain a unique solution for the particle shape, the fitting of the experimental scattering data enables a direct *ab initio* determination of the three-dimensional shape. Since the representation using spherical harmonics only enables the construction of smooth shapes without sharp edges or corners, this approach provides a rough representation of the particle shape. In this way it can be said that this method provides a very low resolution approximation of the scattering data and usually enables the fitting of only the initial part of the scattering intensity. Program packages that enable the fitting of experimental data are available in the literature (programs ASSA and SASHA, Kozin et al, 1997). An example of this procedure is shown in Fig10, where the experimental data for lysozyme in solution were fitted using the multipolar expansion with the program SASHA. It can be directly seen that the correct anisotropy and overall shape can be obtained from this approach. In all the examples shown in figures 10-14 the measurements were performed at the SAXS beamline of the Brazilian Synchrotron Light Laboratory.

Fig. 10. *Ab initio* modeling of experimental SAXS data using multipolar expansion. Left: the model obtained (red) superposed on the backbone of the protein obtained from its known crystallographic structure (pdb entry *6lyz.pdb*). Right: Fit of experimental data. Open circles – experimental data. Solid line – model fit.

A further improvement on the *ab initio* procedure for modeling SAXS data was proposed initially by Chacón (program DALAI, Chacón et al, 1998), and later by Svergun (Program DAMMIN, Svergun, 1999), Doniach (program SAXS3D, Walter et al, 2000) among others. In this method the particle is build using the finite element approach, by the use of a closed packing arrangement of spherical subunits. Since the number of possible solutions is very large, Monte Carlo based optimization methods are used to obtain the set of spherical subunits that gives the best fitting of the scattering data. The program DAMMIN is widely cited in the literature and starts by creating a spherical search space with diameter equal or slightly larger than the particle maximum diameter *DMAX* (obtained from the *p(r)* curve). By the application of a simulated annealing procedure, constrained by penalty functions that ensure particle compactness and smoothness (Volkov and Svergun, 2002), a subset of the initial search space can be obtained providing a three-dimensional model that represents the particle shape. Due to the intrinsic randomness of Monte Carlo approaches, several independent runs of these model procedures will lead to different models. However, it is possible to show that all models might share similar features like overall anisotropy, size, etc. This model approach permits a better representation of the particle shape than the multipolar expansion since it does not have the above mentioned limitations for the shape description. However, since the internal structure is not represented, this method cannot describe data up to high *q* values (Volkov and Svergun, 2002). One example of this so called

Investigating Macromolecular Complexes in Solution by Small Angle X-Ray Scattering 381

10-4

Fig. 12. *Ab initio* modeling of experimental SAXS data using dummy chain modeling. Left: optimized backbone structure (solid spheres) superimposed by the backbone obtained from the protein known crystallographic structure (pdb entry *6lyz.pdb*). Right: Fit of experimental

overall size, shape and anisotropy can be obtained from these approaches. Another very useful application of the study of proteins in solution is the use of known atomic resolution data in connection with SAXS data. If the full atomic model for the protein is known, the comparison of the theoretical scattering intensity against experimental data provides direct information about the conformation of the protein in solution in comparison with the atomic resolution structure. A good fit indicates that the structure of the protein in solution is similar to the given by the atomic resolution model. Discrepancies in the fit indicate differences between the atomic resolution model and the protein structure in solution. A widely cited procedure that enabled a successful comparison between experimental data and atomic resolution structures was developed by Svergun and co-workers (program CRYSOL, Svergun et al, 1995) where it was demonstrated that a hydration shell around the protein with slightly higher electron density than the one from the bulk was necessary to be considered. One example of this comparison between experimental data and theoretical SAXS intensity calculated from atomic coordinates is shown in Fig13 where the crystallographic structure for the protein lysozyme was used for the comparison with

10-3

I(q) [arb. u.]

data. Open circles – experimental data. Solid line – model fit.

experimental data of lysozyme in solution.

10-2

0.0 0.2 0.4 0.6 0.8 1.0 <sup>10</sup>-5

0.0 0.2 0.4 0.6 0.8 <sup>10</sup>-5

q [Å-1]

 Experimental Intensity calculated from Crystal Structure

10-4

Fig. 13. *Ab initio* modeling of experimental SAXS data using dummy chain modeling. Left: representation of the crystallographic structure of lysozyme (pdb entry *6lyz.pdb*). Right: Fit

One of the major applicability of the use of SAXS data and the knowledge about atomic resolution models for proteins is for the cases where just part of the structure is known. In

of experimental data. Open circles – experimental data. Solid line – theoretical fit.

10-3

I(q) [arb. u.]

10-2

q [Å-1]

 Experimental Model Fit

"dummy atom modeling" is shown in Fig11 where the *ab initio* model was obtained from SAXS data of lysozyme in solution.

Fig. 11. *Ab initio* modeling of experimental SAXS data using dummy atom modeling. Left: model results. Semitransparent spheres – initial search space. Solid spheres – selected subset that gives the best fit. Blue backbone – protein backbone obtained from its known crystallographic structure (pdb entry *6lyz.pdb*). Right: Fit of experimental data. Open circles – experimental data. Solid line – model fit.

When dealing with SAXS data from proteins, one additional and very useful constraint can be used for model building. Proteins are composed of a sequence of amino acids, which forms the backbone and is known as the primary structure. This sequence folds into specific patterns such as α-helices, β-sheets and turns, which compose the secondary structure. The secondary structure in turn folds into a specific three-dimensional arrangement, known as the tertiary structure. In some cases the protein is also part of a supramolecular complex, which constitutes the quaternary structure (Voet et al, 2008). Because of the intrinsically low resolution and limited information content of SAXS data, neither the atomic structure nor the secondary structure can be accessed; only the overall shape and size can be retrieved. However, the continuity of the sequence can be used as a constraint to improve the modeling of proteins in solution. This procedure was implemented by Svergun, Petoukhov and co-workers in the "dummy chain model" approach (program GASBOR, Svergun et al, 2001). In this method a chain of interconnected spheres is used to represent the protein backbone. Each sphere corresponds to one amino acid, so the total number of spheres is identical to the number of protein residues. Starting from a spherical arrangement of the backbone, the program performs a simulated annealing optimization in which the three-dimensional arrangement of the backbone is changed in order to provide the best fit to the scattering data. As in the dummy atom approach, the theoretical intensity is calculated using a variation of the Debye formula (equation 18). The natural constraint imposed by the continuity of the backbone leads to better representations of protein structures. This approach can also fit experimental data up to higher *q* values than the previous ones, since the internal structure of the protein is to some extent represented by the backbone.
One example of this so-called "dummy chain modeling" is shown in Fig. 12, where the *ab initio* model was obtained from SAXS data of lysozyme in solution.
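
As a rough illustration of how the intensity of a bead model can be evaluated, the sketch below applies the standard Debye formula, $I(q)=\sum_i\sum_j f_i f_j \sin(q r_{ij})/(q r_{ij})$, to a short chain of spheres. It is not the GASBOR implementation: the function name `debye_intensity`, the unit form factors and the straight five-bead chain are assumptions made for the example, and the exact variation used in equation 18 may differ. The 3.8 Å spacing is the typical distance between consecutive Cα atoms.

```python
import numpy as np

def debye_intensity(coords, q, f=None):
    """Orientationally averaged intensity of point scatterers (Debye formula)."""
    coords = np.asarray(coords, dtype=float)
    q = np.asarray(q, dtype=float)
    n = len(coords)
    # Assumption: unit form factors unless others are supplied
    f = np.ones(n) if f is None else np.asarray(f, dtype=float)
    # Pairwise distances r_ij (n x n matrix, zeros on the diagonal)
    diff = coords[:, None, :] - coords[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    ff = np.outer(f, f)
    # np.sinc(x) = sin(pi x)/(pi x), so sinc(q r / pi) = sin(q r)/(q r),
    # which is correctly equal to 1 when q r = 0
    return np.array([np.sum(ff * np.sinc(qk * r / np.pi)) for qk in q])

# Hypothetical straight chain of 5 beads, 3.8 Å apart along x
chain = np.array([[3.8 * i, 0.0, 0.0] for i in range(5)])
q_values = np.linspace(0.0, 0.5, 6)  # in 1/Å
intensity = debye_intensity(chain, q_values)
for qk, ik in zip(q_values, intensity):
    print(f"q = {qk:.2f} 1/Å  I(q) = {ik:.3f}")
```

In an optimization such as simulated annealing, a function of this kind would be called for every trial arrangement of the chain, which is why the double sum over all pairs dominates the computational cost for large models.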

Fig. 12. *Ab initio* modeling of experimental SAXS data using dummy chain modeling. Left: optimized backbone structure (solid spheres) superimposed on the backbone obtained from the known crystallographic structure of the protein (pdb entry *6lyz.pdb*). Right: Fit of experimental data. Open circles – experimental data. Solid line – model fit.

The previous examples showed the possibility of applying *ab initio* methods to retrieve the three-dimensional structure. Although the models obtained with the dummy atom and dummy chain methods are not unique, owing to the heuristic nature of the optimization methods, the overall size, shape and anisotropy can be obtained from these approaches. Another very useful application of the study of proteins in solution is the use of known atomic resolution data in connection with SAXS data. If the full atomic model of the protein is known, comparing the theoretical scattering intensity with the experimental data provides direct information about the conformation of the protein in solution relative to the atomic resolution structure. A good fit indicates that the structure of the protein in solution is similar to that given by the atomic resolution model. Discrepancies in the fit indicate differences between the atomic resolution model and the protein structure in solution. A widely cited procedure that enabled a successful comparison between experimental data and atomic resolution structures was developed by Svergun and co-workers (program CRYSOL, Svergun et al, 1995), who demonstrated that a hydration shell around the protein, with a slightly higher electron density than the bulk solvent, has to be taken into account. One example of this comparison between experimental data and the theoretical SAXS intensity calculated from atomic coordinates is shown in Fig. 13, where the crystallographic structure of lysozyme was compared with experimental data of lysozyme in solution.

Fig. 13. Comparison of experimental SAXS data with the theoretical intensity calculated from the atomic resolution structure. Left: representation of the crystallographic structure of lysozyme (pdb entry *6lyz.pdb*). Right: Fit of experimental data. Open circles – experimental data. Solid line – theoretical intensity calculated from the crystal structure.
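
The notion of a "good fit" between a calculated and an experimental curve can be made quantitative with a reduced chi-square after scaling the theoretical intensity. The sketch below is only an illustration under simple assumptions: the function name `scaled_chi2`, the Gaussian-shaped toy curves and the constant error bars are hypothetical, and only an overall scale factor is adjusted, whereas a program such as CRYSOL also adjusts parameters related to the hydration shell.

```python
import numpy as np

def scaled_chi2(i_exp, sigma, i_calc):
    """Best least-squares scale factor and reduced chi-square of a model curve."""
    i_exp, sigma, i_calc = (np.asarray(a, dtype=float) for a in (i_exp, sigma, i_calc))
    w = 1.0 / sigma**2
    # Scale factor c that minimizes sum(w * (i_exp - c * i_calc)**2)
    c = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc**2)
    # One fitted parameter (the scale), hence N - 1 degrees of freedom
    chi2 = np.sum(w * (i_exp - c * i_calc) ** 2) / (len(i_exp) - 1)
    return c, chi2

# Toy data (hypothetical): Gaussian-shaped curves standing in for real profiles
q = np.linspace(0.01, 0.20, 50)            # 1/Å
model_a = np.exp(-(q * 15.0) ** 2 / 3.0)   # model consistent with the "data"
model_b = np.exp(-(q * 20.0) ** 2 / 3.0)   # model with a different overall size
i_exp = 2.5 * model_a                      # noise-free "experiment", arbitrary scale
sigma = np.full_like(q, 0.01)

c_good, chi2_good = scaled_chi2(i_exp, sigma, model_a)
c_bad, chi2_bad = scaled_chi2(i_exp, sigma, model_b)
print(f"consistent model:   scale = {c_good:.3f}, chi2 = {chi2_good:.3g}")
print(f"inconsistent model: scale = {c_bad:.3f}, chi2 = {chi2_bad:.3g}")
```

With real data, a reduced chi-square close to one suggests agreement within the experimental uncertainties, while much larger values point to differences between the atomic model and the structure in solution.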

One of the major applications of combining SAXS data with atomic resolution models of proteins is in cases where only part of the structure is known. In these situations the SAXS data can be used to generate, using the dummy chain approach, the missing amino acid loops in the known structure (program BUNCH, Petoukhov and Svergun, 2005) and/or to obtain the spatial arrangement of known domains that forms the full structure (program SASREF, Petoukhov and Svergun, 2005). Both the generation of the missing loops and the optimization of the domain arrangement are performed with Monte Carlo methods which, as in the previous cases, do not lead to a unique solution. Nevertheless, the obtained model is a good representation of the overall structure. Test examples are shown in Fig. 14 and Fig. 15. In Fig. 14 part of the lysozyme structure was clipped and, as can be seen in the curve, without the loop the atomic model cannot fit the experimental data correctly. With the addition of a dummy chain loop and its optimization it is possible to obtain a very good fit of the experimental data. The generated loop (blue loop in the model) is a reasonable approximation of the real loop superposed on it.

Fig. 14. *Ab initio* modeling of the missing loop of a hypothetical structure using experimental SAXS data. Left: crystallographic structure of lysozyme (pdb entry *6lyz.pdb*) and the restored loop. Semitransparent structure – lysozyme structure with a missing part. Blue backbone – restored loop superposed on the real, clipped loop. Right: Fit of the experimental data. Open circles – experimental SAXS data for lysozyme in solution. Dotted line – fitting of the scattering data for the structure without the loop. Solid line – fitting of the scattering data for the structure with the optimized loop.

In Fig. 15 a hypothetical heterodimer is shown. The optimization of the structure components does not give perfect agreement with the initial structure, but there is a remarkable similarity, indicating that SAXS data can also be applied in these cases.

The situations presented here are just a small sample of the possible applications of these modeling tools. Advanced modeling examples based on these procedures can be found in several articles in the literature (Svergun, 2007). An intrinsic problem of any SAXS modeling is the ambiguity that might arise in the results. In general, it is not possible to obtain a unique solution from the modeling procedure. It is therefore necessary to complement any scattering modeling with additional information in order to reduce the number of possible solutions. There are several ways of doing this. When available, information about binding sites or specific arrangements of domains can be used as constraints in the modeling. Results from biochemical and biophysical techniques can provide useful information about structural changes or binding. For example, fluorescence spectroscopy and isothermal titration calorimetry can provide important information on binding and stoichiometry. In recent applications the simultaneous modeling of scattering data and other experimental data has been attempted. A simultaneous modeling of SAXS and NMR data was proposed by Mareuil and co-workers in the program DADIMODO (Mareuil et al, 2007). Automatic tools that exploit the complementarity between SAXS and NMR are also currently being developed in connection with the SAXIER project (Svergun, 2007; Svergun, 2009).

Fig. 15. Rigid body modeling of a hypothetical structure using calculated SAXS data. Left: hypothetical heterodimer built using two atomic resolution structures. Semitransparent spheres: original structure. Blue and green strands: optimized heterodimer. Right: Fit of the generated data. Open circles – generated SAXS data for the heterodimer. The data were created with the program CRYSOL from the built model. Standard deviations were added in order to mimic experimental uncertainties. Solid line – fitting of the scattering data for the optimized structure.

**3. Applications**

Two applications of SAXS analysis will be presented. In the first case, an *in-situ* aggregation study of lysozyme is presented. As a second example, the structural characterization of a giant protein complex is described. These two cases are good examples of the application of the SAXS technique to investigate biological systems.

**3.1 Lysozyme denaturation and aggregation induced by heat**

The structure of a protein is intrinsically related to its shape. The protein shape, on the other hand, is a result of the protein folding. In the native state, proteins are known to adopt hierarchical structures, which might be a result of a multistep folding process. One possible way to investigate this characteristic is to induce protein denaturation. Denaturation, or unfolding, can be induced by changes in temperature or pH, or by the addition of denaturing agents such as sodium dodecyl sulfate (SDS). A study of denaturation induced by heat will be presented here.

The experiments were performed at the SAXS beamline of the Brazilian Synchrotron Light Laboratory, Campinas, Brazil. The wavelength selected for the experiments was λ = 1.49 Å and the distance between the sample and the detector was 745 mm. The measurements were performed using a 1D Gabriel-type detector. The samples were exposed in a 1.5 mm capillary tube in a thermally controlled sample holder directly connected to the evacuated beam path. These experiments were performed with lysozyme samples at 10 mg/mL and pH 7.0 in a 10 mM phosphate buffer with 50 mM NaCl. Indirect Fourier transformations were performed using the program package GNOM, which enabled the correction of smearing effects. *Ab initio* models were built using the program DAMMIN.
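
To give a feeling for the reciprocal-space range implied by the experimental geometry quoted in Section 3.1 (λ = 1.49 Å, sample-detector distance 745 mm), the sketch below converts a position on the detector into a value of *q*. It assumes the usual definition $q = (4\pi/\lambda)\sin\theta$, with $2\theta$ the scattering angle. The function name `q_from_position` and the detector positions in the loop are hypothetical, since the active length of the detector is not stated in the text.

```python
import math

WAVELENGTH_A = 1.49   # X-ray wavelength in Å (from the text)
DISTANCE_MM = 745.0   # sample-detector distance in mm (from the text)

def q_from_position(x_mm, distance_mm=DISTANCE_MM, wavelength_a=WAVELENGTH_A):
    """Modulus of the scattering vector (1/Å) at a distance x_mm from the beam center."""
    two_theta = math.atan2(x_mm, distance_mm)  # scattering angle 2θ in radians
    return 4.0 * math.pi * math.sin(two_theta / 2.0) / wavelength_a

# Hypothetical detector positions, chosen only for illustration
for x_mm in (5.0, 20.0, 40.0):
    q = q_from_position(x_mm)
    # 2π/q is the real-space length scale probed at this q
    print(f"x = {x_mm:5.1f} mm  q = {q:.4f} 1/Å  2π/q = {2.0 * math.pi / q:.1f} Å")
```

At small angles the relation reduces to q ≈ 2πx/(λL), so *q* grows almost linearly with the position on the detector.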
