Crystallization: Its Mechanisms and Pharmaceutical Applications

*Hendrik J.R. Lemmer and Wilna Liebenberg*

#### **Abstract**

The crystallization of small-molecule drugs plays an important role in the pharmaceutical industry. Since many downstream industrial processes are heavily influenced by the crystalline properties of a drug, that is, crystal shape, size distribution, and polymorphic form, control over the crystallization process can facilitate manufacturing and testing. However, before the crystallization process can be controlled, an understanding of its underlying mechanisms is required. In this chapter, we will look at the thermodynamic driving force behind crystallization and how crystal nucleation and growth rates can be used to control the properties of the resulting crystals. Throughout the chapter, we give examples of how these control approaches can be applied in pharmaceutical research and industry to obtain crystals with desired properties. We then finish this chapter with a look at crystallization from the amorphous state, which differs from crystallization from solution and is a relevant topic in pharmaceutical sciences, since the preparation of an amorphous solid is a popular approach to enhancing the solubility of a drug.

**Keywords:** crystallization, pharmaceutical, mechanisms, models, nucleation, growth, amorphous

#### **1. Introduction**

Crystallization plays an important role in the manufacture and purification of small-molecule active pharmaceutical ingredients (APIs). It is estimated that between 70 and 80% of all small-molecule APIs have at least one crystallization step in their manufacturing processes [1, 2]. To facilitate downstream processes, such as filtration, drying, dissolution testing, and formulation, it is often desirable to consistently produce crystals with specific properties, such as crystal size distribution, crystal shape (habit), and polymorphic form. Such control over the crystallization process requires accurate descriptions of crystal nucleation and growth kinetics, as well as solubility, breakage, and agglomeration data [3]. Crystallization control has gained even more interest since the release of the United States Food and Drug Administration's (FDA) Process Analytical Technology (PAT) framework, which aims to improve efficiency in pharmaceutical development, manufacturing, and quality assurance through innovative process development, analysis, and control [4].

Designing and implementing controlled crystallization processes can be time-consuming and usually involves trial-and-error, model-based, or model-free (direct-design) approaches [5]. Accurate kinetic descriptions of crystal nucleation and growth form the backbone of the model-based approach which, although more complex, has the potential to reduce the labor-intensiveness of the traditional trial-and-error approach and to facilitate its optimization [6]. The models used in the model-based approach can even be combined to describe the underlying crystallization mechanism(s) more accurately. For example, in a study by Quilló and coworkers [6], three different crystal growth models were fitted to experimental data to find the most likely crystallization mechanism. They found that the underlying crystallization mechanism could best be described by an additive combination of simultaneous Birth-and-Spread (B+S) and Burton-Cabrera-Frank (BCF) models, which will be discussed later in this chapter. An example of the model-free approach is direct nucleation control (DNC), which attempts to directly control the number of nuclei present in a system. Using DNC, Abu Bakar and coworkers [5] were able to produce glycine crystals from water–ethanol mixtures that were larger than those obtained from uncontrolled crystallization.

To implement the above-mentioned approaches, the investigator needs to be able to monitor the crystallization process. The instrumental technique used will depend on which aspect of the process needs to be monitored. For the model-based approach, one would typically be interested in monitoring the supersaturation decay during crystallization. This can be done using techniques such as attenuated total reflectance-Fourier transform infrared (ATR-FTIR) or near-infrared (NIR) spectroscopy [7]. If the goal is to monitor the evolution of crystal size distribution over time, for example in the DNC study mentioned above, a technique like focused beam reflectance measurement (FBRM) can be used [5, 7]. In most investigations, the above-mentioned instrumental techniques are used concurrently, to gain as much information about the crystallization process as possible.

The crystallization processes and monitoring techniques mentioned above are concerned with crystallization from solution, where a reduction in temperature and/or the addition of an antisolvent is used to create supersaturated conditions. Under supersaturated conditions, the solute molecules can come together to form small masses, or nuclei, through a process called nucleation. The addition of more solute molecules to these nuclei is called crystal growth. There is also another type of crystallization that is of particular importance in pharmaceutical sciences, and that is crystallization from the amorphous state.

In this chapter, we will discuss the theoretical background of crystal nucleation and growth, look at examples of how these theoretical models can be implemented practically and give examples of applications in pharmaceutical research. We will also look at crystallization from the amorphous state and discuss techniques that can be used to delay it.

#### **2. The driving force behind crystallization**

The thermodynamic driving force behind crystal nucleation and growth rates is a chemical potential difference (Eq. (1)):

$$
\mu\_i - \mu\_i^{\rm sat} = RT \ln \left(\frac{a\_i}{a\_i^{\rm sat}}\right) \tag{1}
$$

#### *Crystallization: Its Mechanisms and Pharmaceutical Applications DOI: http://dx.doi.org/10.5772/intechopen.105056*

where *μ<sub>i</sub>* and *a<sub>i</sub>* are the chemical potential and thermodynamic activity of solute *i*, respectively, the superscript "sat" denotes the property at saturation, *R* is the universal gas constant, and *T* is the absolute temperature [6, 8]. Because the activity of a solute is hard to measure experimentally, it is usually substituted with the term *γ<sub>i</sub>x<sub>i</sub>*, where *γ<sub>i</sub>* is the activity coefficient and *x<sub>i</sub>* the mole fraction of solute *i*. Since mole fractions can be measured experimentally, Eq. (1) is often rearranged and expressed in terms of a supersaturation (*σ*) driving force [8], presented here in Eq. (2):

$$\ln\left(\sigma\right) = \ln\left(\frac{\gamma\_i x\_i}{\gamma\_i^{\text{sat}} x\_i^{\text{sat}}}\right) = \ln\left(\frac{a\_i}{a\_i^{\text{sat}}}\right) = \frac{\mu\_i - \mu\_i^{\text{sat}}}{RT} \tag{2}$$

The value of *x<sub>i</sub><sup>sat</sup>* can sometimes be determined experimentally from equilibrium solubility studies. However, if the solubility extrema cannot be determined experimentally, they can be estimated from a variety of drug solubility models [9], for example, the Apelblat model [10], or the van't Hoff–Jouyban–Acree (VH–JA) model [6].

Because the activity coefficients still need to be estimated, many studies make the simplifying assumption of setting the activity coefficients in Eq. (2) equal to 1, thereby expressing the supersaturation driving force only as a mole fraction ratio (*x<sub>i</sub>*/*x<sub>i</sub><sup>sat</sup>*), as a concentration ratio (*C<sub>i</sub>*/*C<sub>i</sub><sup>sat</sup>*), or as (*S* − 1), where *S* = *C<sub>i</sub>*/*C<sub>i</sub><sup>sat</sup>* [11–13]. Although these simplifications of the supersaturation driving force have been used successfully, they do not necessarily generalize to all solutes and solvents, and one might need to find a way to estimate the activity coefficient of the solute. Fortunately, there is a way to estimate the denominator (*γ<sub>i</sub><sup>sat</sup>x<sub>i</sub><sup>sat</sup>*) in Eq. (2) in one go.
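The supersaturation measures discussed above can be compared numerically. The following is a minimal Python sketch, not taken from the cited studies; the function names and the example mole fractions and activity coefficients are hypothetical and chosen only for illustration.

```python
import math


def ln_sigma(x, x_sat, gamma=1.0, gamma_sat=1.0):
    """Eq. (2): ln(sigma) = ln(gamma * x / (gamma_sat * x_sat)).

    With gamma = gamma_sat = 1 this reduces to the simplified
    mole fraction ratio described in the text.
    """
    if min(x, x_sat, gamma, gamma_sat) <= 0:
        raise ValueError("all inputs must be positive")
    return math.log((gamma * x) / (gamma_sat * x_sat))


def relative_supersaturation(c, c_sat):
    """(S - 1) with S = C / C_sat, the concentration-ratio simplification."""
    if c_sat <= 0:
        raise ValueError("c_sat must be positive")
    return c / c_sat - 1.0


# Hypothetical example values (assumptions, not data from the chapter).
x, x_sat = 0.012, 0.010
print("ln(sigma), activity coefficients set to 1:", ln_sigma(x, x_sat))
print("ln(sigma), assumed activity coefficients: ", ln_sigma(x, x_sat, 1.3, 1.2))
print("S - 1:", relative_supersaturation(x, x_sat))
```

For small supersaturations, ln(*S*) and (*S* − 1) are close to each other, since ln(*S*) ≈ *S* − 1 when *S* is near 1; the two measures diverge as the supersaturation increases.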

When dealing with phase equilibria, the traditional thermodynamic reference is a supercooled melt of the pure solute compound. Using this reference, the activity of a solute can be obtained from the enthalpy of fusion at the temperature of interest. In its most general form, this is done through Eq. (3) [14]:

$$\ln \left( a\_i^{\text{sat}} \right) = \ln \left( \gamma\_i^{\text{sat}} x\_i^{\text{sat}} \right) = \frac{\Delta H\_f \left( T\_m \right)}{R} \left( \frac{1}{T\_m} - \frac{1}{T} \right) - \frac{1}{RT} \int\_{T\_m}^{T} \Delta C\_p dT + \frac{1}{R} \int\_{T\_m}^{T} \frac{\Delta C\_p}{T} dT \tag{3}$$

where Δ*H<sub>f</sub>*(*T<sub>m</sub>*) is the enthalpy of fusion at the melting temperature, *T<sub>m</sub>*, *T* is the temperature of interest, and Δ*C<sub>p</sub>* is the difference in heat capacity between the supercooled melt, *C<sub>p</sub>*(*l*), and the solid, *C<sub>p</sub>*(*s*), presented here in Eq. (4):

$$
\Delta C\_p = C\_p(l) - C\_p(s) \tag{4}
$$

A practical problem with Eq. (3) is that Δ*C<sub>p</sub>* has to be integrated down from the melting temperature to the temperature of interest, but far below the melting temperature the thermodynamic properties of the supercooled melt are not experimentally accessible [14]. Looking at the right-hand side of Eq. (3), Δ*C<sub>p</sub>* is the only physical property that is difficult to obtain experimentally. This has led to several simplifying assumptions regarding Eq. (3), all concerned with how to handle Δ*C<sub>p</sub>*.

A common assumption is to completely ignore Δ*C<sub>p</sub>* [14], resulting in Eq. (5):

$$\ln\left(\gamma\_i^{\text{sat}} x\_i^{\text{sat}}\right) = \frac{\Delta H\_f(T\_m)}{R}\left(\frac{1}{T\_m} - \frac{1}{T}\right) \tag{5}$$

However, when working with temperature ranges normally used in pharmaceutical processes, some approximations obtained from Eq. (5) have been shown to be inaccurate [15, 16].

Another common assumption is that Δ*C<sub>p</sub>* is constant and can be approximated by the entropy of fusion at the melting temperature, Δ*S<sub>f</sub>*(*T<sub>m</sub>*) = Δ*H<sub>f</sub>*(*T<sub>m</sub>*)/*T<sub>m</sub>*, yielding Eq. (6) [14, 17]:

$$\ln\left(\gamma\_i^{\text{sat}} x\_i^{\text{sat}}\right) = \frac{\Delta H\_f(T\_m)}{RT\_m} \ln\left(\frac{T}{T\_m}\right) \tag{6}$$

If we can experimentally determine the isobaric heat capacity of the solid, *C<sub>p</sub>*(*s*), at different temperatures below the melting temperature, as well as the isobaric heat capacity of the melt, *C<sub>p</sub>*(*l*), the assumption can be made that Δ*C<sub>p</sub>* is constant and equal to its value at the melting temperature, Δ*C<sub>p</sub>*(*T<sub>m</sub>*), as illustrated in **Figure 1** [14]. Provided that the compound in question does not decompose upon melting, this assumption gives Eq. (7):

$$\ln\left(\gamma\_i^{\text{sat}} x\_i^{\text{sat}}\right) = \frac{\Delta H\_f(T\_m)}{R}\left(\frac{1}{T\_m} - \frac{1}{T}\right) - \frac{\Delta C\_p(T\_m)}{R}\left(\ln\left(\frac{T\_m}{T}\right) - \frac{T\_m}{T} + 1\right) \tag{7}$$
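To see how the three simplifications of Eq. (3) differ in practice, Eqs. (5), (6), and (7) can be evaluated side by side. The sketch below is illustrative only; the function names are hypothetical, and the melting temperature, enthalpy of fusion, and heat capacity difference are assumed values rather than data for any specific compound.

```python
import math

R = 8.314462618  # universal gas constant, J mol^-1 K^-1


def ln_a_sat_eq5(T, Tm, dHf):
    """Eq. (5): the heat capacity difference is ignored."""
    return dHf / R * (1.0 / Tm - 1.0 / T)


def ln_a_sat_eq6(T, Tm, dHf):
    """Eq. (6): the heat capacity difference is set to the entropy of fusion."""
    return dHf / (R * Tm) * math.log(T / Tm)


def ln_a_sat_eq7(T, Tm, dHf, dCp_Tm):
    """Eq. (7): the heat capacity difference is constant at its value at Tm."""
    cp_term = math.log(Tm / T) - Tm / T + 1.0
    return ln_a_sat_eq5(T, Tm, dHf) - dCp_Tm / R * cp_term


# Assumed, illustrative inputs (not taken from the chapter).
Tm = 440.0       # melting temperature, K
dHf = 27000.0    # enthalpy of fusion at Tm, J mol^-1
dCp_Tm = 90.0    # heat capacity difference at Tm, J mol^-1 K^-1

for T in (283.15, 298.15, 313.15):
    print(T,
          math.exp(ln_a_sat_eq5(T, Tm, dHf)),
          math.exp(ln_a_sat_eq6(T, Tm, dHf)),
          math.exp(ln_a_sat_eq7(T, Tm, dHf, dCp_Tm)))
```

Two consistency checks follow directly from the equations: all three expressions return zero at *T* = *T<sub>m</sub>*, and Eq. (7) reduces to Eq. (6) when the heat capacity difference is set equal to Δ*H<sub>f</sub>*(*T<sub>m</sub>*)/*T<sub>m</sub>*.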

If we have enough isobaric heat capacity data above the melting temperature, we can extrapolate down from *C<sub>p</sub>*(*l*) to the temperature of interest, see **Figure 1**, and use Eq. (3) in its general form [14]. In this case, we can use the linear dependency of Δ*C<sub>p</sub>* on temperature to rewrite Eq. (4) as:

$$
\Delta C\_p = C\_p(l) - C\_p(s) = q + r(T\_m - T) \tag{8}
$$

where *q* and *r* are regression parameters, obtained by extrapolating *C<sub>p</sub>*(*l*) down to the temperatures of interest, calculating the difference in heat capacity between the extrapolated data and *C<sub>p</sub>*(*s*), and plotting these differences, Δ*C<sub>p</sub>*(*T*), against (*T<sub>m</sub>* − *T*). Notice from the above that the value of *q* corresponds to Δ*C<sub>p</sub>*(*T<sub>m</sub>*) in Eq. (7). Once we have estimates of *q* and *r*, we can plug them into Eq. (3) and solve the integrals to give the most comprehensive estimate of *γ<sub>i</sub><sup>sat</sup>x<sub>i</sub><sup>sat</sup>* in the form of Eq. (9):

#### **Figure 1.**

*Illustration of the experimental heat capacities (solid lines) and extrapolated heat capacities (dashed lines) used to approximate the heat capacity term in Eq. (3). Reproduced from ref. [14] with permission from Elsevier.*

$$\begin{aligned}\ln\left(\gamma\_i^{\text{sat}} x\_i^{\text{sat}}\right) = {} & \frac{\Delta H\_f(T\_m)}{R} \left(\frac{1}{T\_m} - \frac{1}{T}\right) - \frac{q}{R} \left(\ln\left(\frac{T\_m}{T}\right) - \frac{T\_m}{T} + 1\right) \\ & - \frac{r}{R}\left(T\_m \ln\left(\frac{T\_m}{T}\right) - \frac{T\_m^2}{2T} + \frac{T}{2}\right)\end{aligned} \tag{9}$$

Once the values of *γ<sub>i</sub><sup>sat</sup>x<sub>i</sub><sup>sat</sup>* at different temperatures have been properly estimated, the values of *γ<sub>i</sub><sup>sat</sup>* can be calculated from experimental solubility data.
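The workflow around Eqs. (8) and (9) can be sketched in Python as follows: fit *q* and *r* by linear regression of the heat capacity differences against (*T<sub>m</sub>* − *T*), evaluate Eq. (9), and divide the resulting activity by an experimental mole fraction solubility to obtain the activity coefficient at saturation. This is a minimal sketch under stated assumptions; all function names and numerical inputs are hypothetical. A midpoint-rule integration of Eq. (3) is included only as a cross-check of the closed form.

```python
import math

R = 8.314462618  # universal gas constant, J mol^-1 K^-1


def fit_q_r(temps, dcp_values, Tm):
    """Least-squares fit of Eq. (8): dCp = q + r * (Tm - T)."""
    xs = [Tm - t for t in temps]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(dcp_values) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, dcp_values))
    r = sxy / sxx
    q = y_mean - r * x_mean
    return q, r


def ln_a_sat_eq9(T, Tm, dHf, q, r):
    """Closed-form Eq. (9)."""
    h_term = dHf / R * (1.0 / Tm - 1.0 / T)
    q_term = q / R * (math.log(Tm / T) - Tm / T + 1.0)
    r_term = r / R * (Tm * math.log(Tm / T) - Tm ** 2 / (2.0 * T) + T / 2.0)
    return h_term - q_term - r_term


def ln_a_sat_eq3_numeric(T, Tm, dHf, q, r, n=20000):
    """Eq. (3) with both integrals evaluated by the midpoint rule from Tm to T."""
    h = (T - Tm) / n
    int_cp = 0.0
    int_cp_over_t = 0.0
    for k in range(n):
        t = Tm + (k + 0.5) * h
        cp = q + r * (Tm - t)  # Eq. (8)
        int_cp += cp * h
        int_cp_over_t += cp / t * h
    return dHf / R * (1.0 / Tm - 1.0 / T) - int_cp / (R * T) + int_cp_over_t / R


def gamma_sat(x_sat_exp, T, Tm, dHf, q, r):
    """Activity coefficient at saturation from an experimental mole fraction."""
    return math.exp(ln_a_sat_eq9(T, Tm, dHf, q, r)) / x_sat_exp


# Assumed, illustrative inputs (not taken from the chapter).
Tm, dHf = 440.0, 27000.0
temps = [300.0, 320.0, 340.0, 360.0]
dcp = [74.0, 72.0, 70.0, 68.0]  # hypothetical heat capacity differences, J mol^-1 K^-1
q, r = fit_q_r(temps, dcp, Tm)
print("q, r:", q, r)
print("Eq. (9):        ", ln_a_sat_eq9(298.15, Tm, dHf, q, r))
print("Eq. (3) numeric:", ln_a_sat_eq3_numeric(298.15, Tm, dHf, q, r))
print("gamma_sat:", gamma_sat(0.01, 298.15, Tm, dHf, q, r))
```

An activity coefficient above 1 obtained in this way indicates that the experimental solubility is lower than the ideal value predicted from the melting properties alone.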

#### **3. Nucleation**

In the introduction, we mentioned that crystallization consists of two processes, namely nucleation and growth. Depending on the rates of these two processes, the molecules that make up the crystal may pack differently, giving rise to different crystal structures, or polymorphic forms, of the same compound.

Nucleation is the first step in the crystallization process, and it consists of two mechanisms, namely homogeneous and heterogeneous nucleation. Homogeneous nucleation is triggered by spontaneous fluctuations in the density of the liquid, while heterogeneous nucleation is triggered by contact of the liquid with a foreign solid surface, like an impurity or metastable polymorph [7]. Since most industrial crystallization processes involve heterogeneous nucleation, we will focus mainly on this mechanism, which can be expressed mathematically as Eq. (10):

$$J = N\_0 v \exp\left(\frac{-\Delta G^\* \Phi}{kT}\right) \tag{10}$$

where *J* is the number of nuclei formed per unit time per unit volume, *N*<sub>0</sub> is the number of solute molecules per unit volume, *v* is the frequency of molecular transport at the nucleus-liquid interface, Φ is the heterogeneous nucleation factor, a function of the contact angle between the nuclei and a foreign surface in the solution with values ranging from 0 to 1, and Δ*G*<sup>∗</sup> is the free energy barrier to nucleation of a sphere [7, 18], defined by:

$$
\Delta G^\* = \frac{16\pi\upsilon^2\gamma\_{sl}^3}{3(kT)^2(\ln\sigma)^2} \tag{11}
$$

where *υ* is the molecular volume of the solute, *γ<sub>sl</sub>* is the solid–liquid interfacial tension per unit area, *k* is the Boltzmann constant, and *T* and *σ* have the same meanings as before [18]. The equation for the rate of homogeneous nucleation is similar to Eq. (10) but does not contain the heterogeneous nucleation factor, Φ, and has a different pre-exponential factor [18].
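Eqs. (10) and (11) can be combined into a short calculation to explore how the nucleation rate responds to supersaturation and to the heterogeneous nucleation factor. The following Python sketch assumes SI units throughout; the function names and all numerical inputs are hypothetical and serve only as an illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J K^-1


def nucleation_barrier(mol_volume, gamma_sl, T, ln_sigma):
    """Eq. (11): free energy barrier to nucleation of a sphere, in J.

    mol_volume in m^3, gamma_sl in J m^-2, T in K.
    """
    if ln_sigma == 0:
        raise ValueError("ln_sigma must be nonzero (no driving force at saturation)")
    numerator = 16.0 * math.pi * mol_volume ** 2 * gamma_sl ** 3
    denominator = 3.0 * (K_B * T) ** 2 * ln_sigma ** 2
    return numerator / denominator


def nucleation_rate(n0, freq, mol_volume, gamma_sl, T, ln_sigma, phi=1.0):
    """Eq. (10): J = N0 * v * exp(-dG* * phi / (k T))."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie between 0 and 1")
    barrier = nucleation_barrier(mol_volume, gamma_sl, T, ln_sigma)
    return n0 * freq * math.exp(-barrier * phi / (K_B * T))


# Assumed, illustrative inputs (not taken from the chapter).
n0 = 1.0e26          # solute molecules per m^3
freq = 1.0e3         # molecular transport frequency, s^-1
mol_volume = 2.0e-28 # molecular volume, m^3
gamma_sl = 5.0e-3    # interfacial tension, J m^-2
T = 298.15

for s in (1.2, 1.5, 2.0):
    for phi in (1.0, 0.3):
        j = nucleation_rate(n0, freq, mol_volume, gamma_sl, T, math.log(s), phi)
        print(f"sigma = {s}, phi = {phi}: J = {j:.3e} m^-3 s^-1")
```

Because the barrier scales with the inverse square of ln(*σ*), doubling ln(*σ*) reduces Δ*G*<sup>∗</sup> by a factor of four, and a smaller Φ lowers the effective barrier further.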

Looking at Eq. (10), we see some interesting mechanistic features of the nucleation step. First, we can expect a higher number of nuclei to form from solvents in which the solute is more soluble, because these give higher values of *N*<sub>0</sub>, which is a concentration term. Concretely, the pre-exponential term, *N*<sub>0</sub>*v*, is an estimate of the probability of intermolecular collisions, and *v* is mainly determined by the degree of agitation, which can be held constant between experiments [18]; higher solubility therefore leads to higher values of *N*<sub>0</sub>, which increases the likelihood of nuclei forming. The exponential term has a negative argument, so it decays and asymptotically approaches zero as the magnitude of the argument grows. In other words, we can expect fewer nuclei to form per unit volume if the free energy barrier to nucleation, Δ*G*<sup>∗</sup>, is large. The free energy barrier to nucleation is itself dependent on the interfacial tension, *γ<sub>sl</sub>*, expressed as:

$$\gamma\_{sl} = 0.414kT(C\_sN\_A)^{2/3}\left(\ln\left(\frac{C\_s}{C\_{eq}}\right)\right) \tag{12}$$

where *C<sub>s</sub>* is the ratio of the density of the solute to its molar mass, *N<sub>A</sub>* is Avogadro's number, *C<sub>eq</sub>* is the equilibrium solubility of the solute, and *k* and *T* have the same meanings as above [19]. From Eq. (12), we see that *γ<sub>sl</sub>* decreases linearly with the logarithm of the equilibrium solubility, so higher values of *C<sub>eq</sub>* result in lower values of *γ<sub>sl</sub>*. Putting it all together, we see that under conditions of higher solubility, *C<sub>eq</sub>*, the interfacial tension, *γ<sub>sl</sub>*, will be lower, lowering the energy barrier to nucleation, Δ*G*<sup>∗</sup>, and thereby increasing the nucleation rate, *J*.
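The dependence of the interfacial tension on solubility in Eq. (12) can be illustrated with a short calculation. The Python sketch below assumes both concentrations are expressed in mol m<sup>−3</sup>, so that the result is in J m<sup>−2</sup>; the function name and the example values are hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J K^-1
N_A = 6.02214076e23  # Avogadro's number, mol^-1


def interfacial_tension(T, c_s, c_eq):
    """Eq. (12): gamma_sl = 0.414 k T (C_s N_A)^(2/3) ln(C_s / C_eq).

    c_s is the solid density divided by the molar mass and c_eq is the
    equilibrium solubility, both in mol m^-3 (an assumption of this sketch).
    """
    if c_s <= 0 or c_eq <= 0:
        raise ValueError("concentrations must be positive")
    return 0.414 * K_B * T * (c_s * N_A) ** (2.0 / 3.0) * math.log(c_s / c_eq)


# Assumed, illustrative inputs (not taken from the chapter).
T = 298.15
c_s = 8500.0  # mol m^-3
for c_eq in (10.0, 100.0, 1000.0):
    print(f"C_eq = {c_eq:7.1f} mol/m^3 -> gamma_sl = {interfacial_tension(T, c_s, c_eq):.4f} J/m^2")
```

Each tenfold increase in the equilibrium solubility lowers the interfacial tension by the same fixed amount, which reflects the linear dependence on the logarithm of *C<sub>eq</sub>*. Since the barrier in Eq. (11) scales with the cube of the interfacial tension, even modest reductions can noticeably lower the barrier to nucleation.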

In the introduction, we mentioned that there are model-free approaches to controlling crystallization and that DNC is an example of such an approach. In short, the DNC approach attempts to control the number of nuclei present by adding solvent or antisolvent, increasing or decreasing the temperature, or a combination of both, to manipulate the solute's solubility. Abu Bakar and coworkers employed DNC and were able to produce progressively larger glycine crystals if they lowered the number of newly formed particles [5].

Another example in which the nucleation step of crystallization was used as an intervention point to control the crystal size distribution is the work of Fujiwara and coworkers [20]. Paracetamol (acetaminophen), like many other small-molecule APIs, tends to form agglomerates during crystallization, especially if the crystals are small (< 100 *μ*m). This can lead to formulation-related problems further downstream. Fujiwara and coworkers used a solubility curve and determined the metastable limit, that is, the degree of supersaturation that corresponds to spontaneous nucleation. They found that if the degree of supersaturation in a seeded batch crystallization process is controlled through the temperature during cooling so that it stays above saturation but below the metastable limit, as illustrated in **Figure 2**, larger paracetamol crystals with negligible nucleation and agglomeration can be obtained [20].

#### **Figure 2.**

*Illustration of how the degree of supersaturation (Run 2) was controlled to fall between the metastable limit (Meta) and the solubility curve (Csat) during cooling. Reproduced from ref. [20] with permission from the American Chemical Society.*
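The idea of keeping a cooling trajectory between the solubility curve and the metastable limit can be expressed as a simple check. The Python sketch below is not the control scheme used by Fujiwara and coworkers; the solubility curve, the metastable limit, and the trajectories are all hypothetical and serve only to show the logic.

```python
import math


def solubility(T):
    """Hypothetical solubility curve C_sat(T); illustrative only."""
    return 5.0 * math.exp(0.03 * (T - 273.15))


def metastable_limit(T):
    """Hypothetical metastable limit, assumed to lie 25% above C_sat(T)."""
    return 1.25 * solubility(T)


def zone_violations(temps, concs):
    """Return the (T, C) points that fall outside C_sat(T) < C < C_meta(T)."""
    return [(T, c) for T, c in zip(temps, concs)
            if not (solubility(T) < c < metastable_limit(T))]


# Hypothetical cooling trajectories (temperatures in K).
temps = [318.15, 313.15, 308.15, 303.15]
controlled = [1.10 * solubility(T) for T in temps]
uncontrolled = controlled[:2] + [1.40 * solubility(temps[2]),
                                 0.95 * solubility(temps[3])]

print("controlled run violations:  ", zone_violations(temps, controlled))
print("uncontrolled run violations:", len(zone_violations(temps, uncontrolled)))
```

A point above the metastable limit signals a risk of spontaneous nucleation, while a point below the solubility curve means the solution is undersaturated and seed crystals may dissolve.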
