5. Expectation-maximization algorithms

The theory of the expectation-maximization algorithm can be found in many textbooks, e.g., [16]. The EM algorithm works iteratively, where in each step the conditional mean of the latent variables is used to update the parameter estimate. For the cryo-EM reconstruction problem, the variables N, T are taken to be the latent variables, and S is taken to be the parameter to be estimated. A prior can also be included for S, if necessary.

RELION [8, 9] uses the EM algorithm with a Gaussian prior on the amplitude of the Fourier coefficients of S. The resulting algorithm is rather complex, and instead of discussing all of the details of the algorithm, I will discuss a simplified version in Algorithm 2.

Line 4 in Algorithm 2 calculates the conditional probability of the alignment parameters $n_i, \theta_i, t_i$ given the image $I_i$ and the current estimate of the structure $S^k$. The $L(I_i \mid S^k, n_i, \theta_i, t_i)$ term in line 4 comes from Eq. (5). The term $p(n_i, \theta_i, t_i \mid S^k)$ in line 4 is the prior probability of the alignment parameters given $S^k$, and typically this can be set to a uniform probability density. The denominator on the right-hand side of the assignment in line 4 is the normalizing constant, which makes $\Gamma^{k+1}_{n_i,\theta_i,t_i}$ a probability density. Also, note that $\Gamma^{k+1}_{n_i,\theta_i,t_i}$ is a function of $n_i, \theta_i, t_i$ and is calculated for all values of $n_i, \theta_i, t_i$ on the vertices of the spherical, angular, and translation grids.
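Read as Bayes' rule, the description above suggests that the assignment in line 4 has the following form. This is a reconstruction from the prose, not a copy of Algorithm 2, and the integration variables $n, \theta, t$ are notation introduced here for illustration:

```latex
\Gamma^{k+1}_{n_i,\theta_i,t_i} \leftarrow
\frac{L(I_i \mid S^k, n_i, \theta_i, t_i)\; p(n_i, \theta_i, t_i \mid S^k)}
     {\int L(I_i \mid S^k, n, \theta, t)\; p(n, \theta, t \mid S^k)\; dn\, d\theta\, dt}
```

With a uniform prior, the $p(\cdot \mid S^k)$ factors cancel, and $\Gamma^{k+1}_{n_i,\theta_i,t_i}$ is simply the likelihood normalized over the alignment grid.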



Line 5 updates the structure. I have split the update into two lines to fit the format of this document. The update is effectively a weighted average of the backprojected images, where the weight depends on $\Gamma^{k+1}_{n_i,\theta_i,t_i}$, $P_{n_i}$, $P^*_{n_i}$, and $C_i$. This weighting is apparent in the integration on the right-hand side of line 5. The integrals in line 5 are approximated as Riemann sums over the spherical, angular, and translation grids. The assignments in line 5 take an especially simple form in the Fourier domain. Hence the algorithm is typically implemented in the Fourier domain. See [8, 9] for details.
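To make the two steps concrete, here is a deliberately reduced sketch in Python. It is not Algorithm 2: I assume a 1D "structure" observed only through unknown cyclic shifts, with no projection, rotation, or contrast transfer function, a uniform prior on the shift, and white Gaussian noise of known standard deviation `sigma`. All function names are hypothetical. The E-step plays the role of line 4 and the M-step the role of line 5.

```python
import numpy as np


def e_step(images, structure, sigma):
    """Posterior weight of every cyclic shift t for every image (toy analogue of line 4)."""
    length = structure.shape[0]
    # Row t holds the current structure shifted by t samples.
    shifted = np.stack([np.roll(structure, t) for t in range(length)])
    # Squared distance between image i and shifted copy t, shape (n_images, length).
    d2 = ((images[:, None, :] - shifted[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian log-likelihood with a uniform prior; subtract the row maximum
    # before exponentiating so the normalization stays numerically stable.
    log_w = -d2 / (2.0 * sigma ** 2)
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)


def m_step(images, weights):
    """Weighted average of the back-shifted images (toy analogue of line 5)."""
    n_images, length = images.shape
    new_structure = np.zeros(length)
    for t in range(length):
        # Undo shift t for all images, then weight each by the posterior of that shift.
        new_structure += weights[:, t] @ np.roll(images, -t, axis=1)
    return new_structure / n_images


def em_shift_reconstruct(images, sigma, n_iter=10):
    """Alternate the two steps, using the first image as the initial structure."""
    structure = images[0].copy()
    for _ in range(n_iter):
        weights = e_step(images, structure, sigma)
        structure = m_step(images, weights)
    return structure
```

Because the shift grid here is uniform, the Riemann sum reduces to a plain sum over `t`. The estimate is only defined up to a global cyclic shift, which is why any comparison with a known signal has to search over shifts.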

DOI: http://dx.doi.org/10.5772/intechopen.90099

A Gentle Introduction to Cryo-EM Single-Particle Reconstruction Algorithms

6. Postprocessing, multiple structures, and symmetry

6.1 Postprocessing

Best-alignment, as well as the EM algorithm, gives maximum likelihood estimates of the structure, but maximum likelihood reconstructions can be noisy. The noise is especially prevalent in high frequencies and makes it difficult to visualize the details of the structure. A number of methods are employed to "filter out" the noise. Below, I group these methods together as postprocessing methods, but keep in mind that many of them are incorporated into the reconstruction algorithm itself:

1. Filtering at the resolution: in this strategy, the spectral signal-to-noise ratio (signal-to-noise ratio in the Fourier domain) is calculated via Fourier shell correlation (FSC), and the resolution of the structure is determined as the frequency at which the FSC falls below a threshold. The structure is then low-pass filtered at the resolution. Many of the structures reported in the literature adopt this strategy.

2. Wiener filtering: in the cryo-EM context, Wiener filtering may be viewed as a more sophisticated version of the FSC strategy. In Wiener filtering, the structure is low-pass filtered with the low-pass filter adapting to the spectral signal-to-noise ratio at each spatial frequency. This strategy is used in SPIDER, FREALIGN, and RELION.

3. Sparse representation: modern approaches to denoising involve using a sparse representation of a signal in an over-complete basis and then using a joint L1-L2 minimization to reconstruct the signal from noisy data. This strategy has also been applied to cryo-EM [19] with improvement in the resolution of the structure.

6.2 Multiple structures

There are two reasons for reconstructing multiple structures instead of a single structure from cryo-EM images. One reason is that many proteins are not rigid and exhibit several different structures, called conformations. Thus, even a chemically pure cryo-EM sample may have different structures, and reconstructing only a single structure from the sample is likely to give an "average" or even a meaningless structure. Another reason to consider multiple structures is possible problems with sample preparation. If the sample is a protein complex (e.g., several proteins held together by hydrogen bonds), then it is possible that, in a given sample, some of the complexes may have dissociated into their components. Multiple structures are used during reconstruction so that some of the reconstructed structures model the dissociated components and prevent them from corrupting the reconstruction of the main structure.

Reconstructing multiple structures fits well within the EM algorithm (see [9] for details), and is routinely used in RELION and other packages.

6.3 Symmetry

Several particles exhibit symmetry. For example, the capsid of a virus may exhibit icosahedral symmetry. If the symmetry of a particle is known beforehand, say from X-ray crystallography studies, then it is possible to incorporate the symmetry into the reconstruction algorithm.
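The text does not say how a known symmetry (Section 6.3) is incorporated into reconstruction. One simple possibility, shown here purely as an illustration, is to average the current structure over the symmetry group after each update. The sketch below does this for the cyclic group C4, chosen only because its rotations map a voxel grid onto itself exactly; icosahedral symmetry would need interpolated rotations. The function name `symmetrize_c4` and the choice of rotation axis are my own assumptions.

```python
import numpy as np


def symmetrize_c4(volume):
    """Average a volume over the four rotations of C4 about its last axis.

    Assumes the first two axes of `volume` have equal length.
    """
    # np.rot90 in the plane of axes (0, 1) rotates by exact multiples of
    # 90 degrees, so no interpolation is needed for this particular group.
    rotated = [np.rot90(volume, k, axes=(0, 1)) for k in range(4)]
    return np.mean(rotated, axis=0)
```

The averaged volume is unchanged by a further 90-degree rotation about the same axis, which is the sense in which the symmetry has been imposed. Noise that does not share the symmetry is averaged down in the process.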

Intuitively speaking, the EM algorithm can be viewed as a "smoother" version of the best-alignment algorithm. The "smoothing" corresponds to calculating the probability of matching $I_i$ to all possible projections, rotations, and translations (line 4) and using these probabilities to reconstruct the "weighted-average" structure in line 5. This "smoothing" is in contrast to best-alignment, which only uses a single alignment (the best-alignment) for reconstruction.
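The contrast can be seen on a handful of numbers. In the sketch below I assume Gaussian noise with standard deviation `sigma` and a uniform prior, so that the EM weights are a normalized exponential of the squared image-to-projection distances. The distances themselves and the function names are made up for illustration.

```python
import numpy as np


def hard_weights(sq_dist):
    """Best-alignment: all weight on the single closest candidate."""
    w = np.zeros_like(sq_dist, dtype=float)
    w[np.argmin(sq_dist)] = 1.0
    return w


def soft_weights(sq_dist, sigma):
    """EM with a uniform prior and Gaussian noise: weight spread over all candidates."""
    log_w = -np.asarray(sq_dist, dtype=float) / (2.0 * sigma ** 2)
    log_w -= log_w.max()  # stabilize the exponential
    w = np.exp(log_w)
    return w / w.sum()


# Squared distances for four hypothetical candidate alignments.
sq_dist = np.array([4.0, 1.0, 1.5, 9.0])
print(hard_weights(sq_dist))             # one-hot on candidate 1
print(soft_weights(sq_dist, sigma=1.0))  # candidates 1 and 2 share most of the weight
print(soft_weights(sq_dist, sigma=0.1))  # nearly one-hot: approaches best-alignment
```

Under these assumptions, the soft weights collapse onto the best candidate as the noise level shrinks, which is one way to read best-alignment as a limiting case of the EM weighting.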

As mentioned above, RELION uses the EM algorithm in the Fourier domain. RELION also estimates the noise power spectrum within the iteration and uses a weak prior on the structure (Section 6 discusses why) [8, 9].

## 5.1 Speeding up the EM algorithm

Calculating the conditional probabilities and the structure update in the EM algorithm is just as computationally expensive as the alignment step in the best-alignment algorithms. Several methods have been suggested to speed up the EM algorithm;

