5: $S^{k+1} \leftarrow \sum_{i=1}^{N} \int \Gamma^{k+1}_{n_i,\theta_i,t_i} \, P^{*}_{n_i} C_i R_{\theta_i,t_i} I_i / \sigma^{2} \; dn_i \, d\theta_i \, dt_i$

$S^{k+1} \leftarrow \left( \sum_{i=1}^{N} \int \Gamma^{k+1}_{n_i,\theta_i,t_i} \, P^{*}_{n_i} C_i^{2} / \sigma^{2} \; dn_i \, d\theta_i \, dt_i \right)^{-1} S^{k+1}$

6: $k \leftarrow k + 1$

7: end while

8: return $S^{k}$

9: end procedure

Line 5 updates the structure. I have split the update into two lines to fit the format of this document. The update is effectively a weighted average of the back-projected images, where the weight depends on $\Gamma^{k+1}_{n_i,\theta_i,t_i}$, $P_{n_i}$, $P^{*}_{n_i}$ and $C_i$. This weighting is apparent in the integration on the right-hand side of line 5. The integrals in line 5 are approximated as Riemann sums over the spherical, angular, and translation grids. The assignments in line 5 take an especially simple form in the Fourier domain. Hence the algorithm is typically implemented in the Fourier domain. See [8, 9] for details.

Intuitively speaking, the EM algorithm can be viewed as a "smoother" version of the best-alignment algorithm. The "smoothing" corresponds to calculating the probability of matching $I_i$ to all possible projections, rotations, and translations (line 4) and using these probabilities to reconstruct the "weighted-average" structure in line 5. This "smoothing" is in contrast to best-alignment, which only uses a single alignment (the best-alignment) for reconstruction.

As mentioned above, RELION uses the EM algorithm in the Fourier domain. RELION also estimates the noise power spectrum within the iteration and uses a weak prior on the structure (Section 6 discusses why) [8, 9].

Technology, Science and Culture - A Global Vision, Volume II

### 5.1 Speeding up the EM algorithm

Calculating the conditional probabilities and the structure update in the EM algorithm is just as computationally expensive as the alignment step in the best-alignment algorithms. Several methods have been suggested to speed up the EM algorithm:

1. Stochastic gradient descent: the idea is to limit computational complexity by choosing a small, random subset of the terms to be summed over in line 4 and line 5 of Algorithm 2, at every iteration. Stochastic gradient descent is used in cryoSPARC [10].

2. Adaptive integration: the conditional probability distributions of the latent variables (line 4) often have very sharp peaks around which most of the probability mass of the distribution is concentrated. It is possible to adaptively choose an integration grid that samples the conditional probability distribution finely near the peaks and coarsely away from the peaks, thereby saving computation. This strategy was suggested in [17] and is used in RELION.

3. Adaptive basis selection: this strategy is based on the idea that the projections of a structure along nearby projection directions are very similar. Of course, this implies that the images that align to these directions are also similar. It turns out that if the structure projections and the images can be represented on a small basis (small compared to the number of projection directions and number of images), then the EM calculations can be sped up. This strategy is proposed in [18], where the bases are adaptively adjusted within the EM framework.

### 6.1 Postprocessing

Best-alignment, as well as the EM algorithm, gives maximum likelihood estimates of the structure, but maximum likelihood reconstructions can be noisy. The noise is especially prevalent in high frequencies and makes it difficult to visualize the details of the structure. A number of methods are employed to "filter out" the noise. Below, I group these methods together as postprocessing methods, but keep in mind that many of them are incorporated into the reconstruction algorithm itself:
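As one hedged illustration of what "filtering out" high-frequency noise can mean, the sketch below applies an ideal low-pass filter to a 1D signal that stands in for a single line through a reconstructed density. The function names `dft`, `inverse_dft`, and `low_pass`, and the `cutoff` parameter, are hypothetical and chosen only for this example. The text does not say which filters the cryo-EM packages actually use, and a practical implementation would work on the 3D volume with a fast Fourier transform and a smoother filter edge.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [
        sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n
        for t in range(n)
    ]


def low_pass(signal, cutoff):
    """Zero every Fourier coefficient whose frequency index exceeds `cutoff`.

    `cutoff` is a hypothetical parameter measured in DFT bins. Bins f and
    n - f carry the same physical frequency, so they are kept or dropped together.
    """
    n = len(signal)
    spectrum = dft(signal)
    filtered = [
        coeff if min(f, n - f) <= cutoff else 0.0
        for f, coeff in enumerate(spectrum)
    ]
    return [value.real for value in inverse_dft(filtered)]


# Toy "density": a slow wave (frequency 1) plus a fast ripple (frequency 6)
# that plays the role of high-frequency noise.
n = 16
slow = [math.cos(2 * math.pi * t / n) for t in range(n)]
noisy = [slow[t] + 0.5 * math.cos(2 * math.pi * 6 * t / n) for t in range(n)]
smoothed = low_pass(noisy, cutoff=2)
max_error = max(abs(a - b) for a, b in zip(smoothed, slow))
```

In this toy case the ripple lies entirely above the cutoff, so `smoothed` agrees with `slow` up to floating-point rounding. Real noise is spread over all frequencies, so a low-pass filter trades noise suppression against loss of fine detail.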


### 6.2 Multiple structures

There are two reasons for reconstructing multiple structures instead of a single structure from cryo-EM images. One reason is that many proteins are not rigid and exhibit several different structures, called conformations. Thus, even a chemically pure cryo-EM sample may contain different structures, and reconstructing only a single structure from the sample is likely to give an "average" or even a meaningless structure. Another reason to consider multiple structures is possible problems with sample preparation. If the sample is a protein complex (e.g., several proteins held together by hydrogen bonds), then it is possible that, in a given sample, some of the complexes have dissociated into their components. Multiple structures are used during reconstruction so that some of the reconstructed structures model the dissociated components and prevent them from corrupting the reconstruction of the main structure.

Reconstructing multiple structures fits well within the EM algorithm (see [9] for details), and is routinely used in RELION and other packages.
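A minimal sketch of how multiple structures can fit into EM, assuming a drastically simplified setting: the "images" are short 1D vectors, alignment (projection direction, rotation, translation) is ignored, and the class label is the only latent variable. The function `em_multi_class` and its parameters are hypothetical names for this illustration. This is not RELION's implementation, and [9] remains the reference for the actual method.

```python
import math


def em_multi_class(images, references, sigma=1.0, iterations=20):
    """Toy EM in which the only latent variable is the class of each image.

    `references` holds K initial guesses, each the same length as an image.
    Noise is assumed Gaussian with standard deviation `sigma`.
    """
    k_classes = len(references)
    refs = [list(r) for r in references]
    responsibilities = []
    for _ in range(iterations):
        # E-step: posterior probability that image i came from class k.
        responsibilities = []
        for img in images:
            log_w = [
                -sum((a - b) ** 2 for a, b in zip(img, ref)) / (2 * sigma ** 2)
                for ref in refs
            ]
            top = max(log_w)  # subtract the max for numerical stability
            w = [math.exp(v - top) for v in log_w]
            total = sum(w)
            responsibilities.append([v / total for v in w])
        # M-step: each class becomes the responsibility-weighted mean image.
        for k in range(k_classes):
            weight = sum(r[k] for r in responsibilities)
            if weight == 0.0:
                continue  # no image supports this class; keep the old estimate
            refs[k] = [
                sum(r[k] * img[p] for r, img in zip(responsibilities, images)) / weight
                for p in range(len(refs[k]))
            ]
    return refs, responsibilities


# Four toy images drawn from two clearly different "structures".
images = [
    [1.0, 1.1, 0.9],
    [0.9, 1.0, 1.1],
    [-1.0, -0.9, -1.1],
    [-1.1, -1.0, -0.9],
]
refs, resp = em_multi_class(
    images, references=[[0.5, 0.5, 0.5], [-0.5, -0.5, -0.5]], sigma=0.5
)
```

Each image contributes to every class in proportion to its posterior probability, which is the same "weighted-average" idea as in the single-structure update. In this well-separated toy data the responsibilities become almost binary, so each reference ends up close to the mean of its own two images.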

### 6.3 Symmetry

Several particles exhibit symmetry. For example, the capsid of a virus may exhibit icosahedral symmetry. If the symmetry of a particle is known beforehand, say from X-ray crystallography studies, then it is possible to incorporate the symmetry into the reconstruction algorithm.
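One simple way to impose a known symmetry is to average an estimate over all rotations in the symmetry group. The sketch below does this for a 2D grid with four-fold cyclic symmetry (C4), which is an assumption made to keep the example exact and short. Icosahedral symmetry would instead average a 3D volume over 60 rotations and would need interpolation. The function names `rotate90` and `symmetrize_c4` are hypothetical, and the text does not specify how any particular package incorporates symmetry.

```python
def rotate90(grid):
    """Rotate a square 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def symmetrize_c4(grid):
    """Average a square grid over the four rotations of the cyclic group C4."""
    n = len(grid)
    total = [[0.0] * n for _ in range(n)]
    rotated = grid
    for _ in range(4):
        for r in range(n):
            for c in range(n):
                total[r][c] += rotated[r][c]
        rotated = rotate90(rotated)
    return [[value / 4.0 for value in row] for row in total]


# A noisy, asymmetric 3x3 estimate of a density that should be four-fold symmetric.
estimate = [
    [0.0, 1.2, 0.0],
    [0.8, 4.0, 1.0],
    [0.0, 1.0, 0.0],
]
symmetric = symmetrize_c4(estimate)
```

After averaging, the four edge values are all pulled to their common mean and the result is unchanged by a further 90 degree rotation. Averaging over the group also reduces noise, because each symmetric copy contributes an independent measurement of the same density.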
