**3. Computational methods for solubility prediction**

Computational methods have become an important part of drug design and discovery in recent decades. The models are classified as theoretical, semi-empirical or empirical. Most of the models used in the pharmaceutical sciences are semi-empirical (theoretical correlations of experimentally determined values) or empirical (mathematical correlations of experimentally determined values). Semi-empirical models are correlations that use physicochemical parameters in their relationships; in other words, their parameters must be determined experimentally at least once. For example, in the Noyes-Whitney equation the diffusion coefficient must be determined at least once for a solute, so the Noyes-Whitney equation is a semi-empirical model. Quantitative structure-property relationships (QSPR) and quantitative structure-activity relationships (QSAR) are examples of empirical modelling. The pioneer of this type of equation in the pharmaceutical sciences is Prof. Corwin H. Hansch, who developed a QSPR model for predicting the solubility of liquids from their partition coefficient (Hansch et al., 1968):

$$-\log S = 1.339\log P - 0.978\tag{3}$$

Experimental and Computational Methods Pertaining to Drug Solubility 199

where log*P* is the logarithm of the partition coefficient between octanol and water for a specific liquid.

In another grouping, the correlation can be developed using linear or non-linear modelling. Simple or multiple linear regression is an example of linear modelling, and an artificial neural network is an example of non-linear modelling. Each type of modelling has advantages and disadvantages, which are listed in Table 2.

Table 2. Advantages and disadvantages of linear and non-linear modelling in QSPR studies (columns: modelling type, advantages, disadvantages; rows: linear, non-linear)

In QSPR modelling, the variables used for correlating physicochemical properties are called descriptors. They include simple structure-derived parameters (e.g. number of carbon atoms, number of single bonds), overall structural parameters (e.g. molecular weight and molecular volume), structure residue parameters (e.g. distance between two atoms, total charge on oxygen atoms) and physicochemical properties (e.g. melting point, partition coefficient). Almost all kinds of descriptors have been used in solubility correlation, and around half of the models use log*P* as one of the descriptors (Dearden, 2006). The following categories of descriptors have been used in solubility correlation:

1. physicochemical properties (PCPs, such as melting point, molecular weight, molar refraction, …),
2. structure related descriptors (such as molecular volume, solvent accessible surface area, number of rotatable/rigid bonds, number of hydrogen bond donor/acceptor atoms, …),
3. quantum chemical descriptors (such as optimized total energy, HOMO and LUMO energies, …),
4. topological parameters,
5. molecular connectivity indices,
6. electrostatic state (E-state) descriptors,
7. group contribution method or fragment based approach (different fragments derived from structure, SMILES/InChI codes),
8. solvatochromic parameters (Dearden, 2006; Katritzky et al., 2010; Jouyban et al., 2010b).

Other descriptors have been used as well, as have a number of combinations of the parameters mentioned. In the next section, the easiest and the most accurate models for solubility prediction are discussed. Approaches such as mobile order theory and differential equations of the activity coefficient have also been used as semi-empirical methods for the calculation of solubility (Dearden, 2006; Katritzky et al., 2010).

For modelling, multiple linear regression (MLR), partial least squares (PLS), support vector machine (SVM), artificial neural network (ANN), random forest (RF), Monte Carlo simulation (MCS) and other methods are used. In most cases the correlation coefficients of the non-linear methods are better than those of the linear methods and the related errors are smaller (Dearden, 2006; Katritzky et al., 2010). This might suggest that solubility behaves non-linearly.

#### **3.1 Aqueous solubility**

198 Toxicity and Drug Testing

Available models and software to predict the aqueous solubility of drugs were reviewed in a recent work (Jouyban et al., 2008). The solubility of drugs in water can be predicted using different models presented in the literature. The general solubility equation of Yalkowsky is the simplest and the most common method in the pharmaceutical area. The model requires the experimental melting point (*mp*) and the logarithm of the partition coefficient (log*P*) as input data and is expressed as:

$$
\log S_W = 0.5 - 0.01(mp - 25) - \log P \tag{4}
$$

where *S<sub>W</sub>* is the molar aqueous solubility of a drug at 25 ºC and *mp* is its melting point in ºC. If the solute has a melting point below 25 ºC, the (*mp* - 25) term is set to zero (Ran et al., 2001). The two parameters, log*P* and *mp*, represent the effects of hydrophobicity and crystal packing, respectively, on the solubility of a solute. Jain et al. (2008) provided some theoretical background for the general solubility equation from thermodynamic principles. The simplicity of the model is its main advantage; a possible disadvantage is that it needs the melting point, an experimental parameter that may not be available for some compounds in the early stages of drug discovery. An attempt to predict melting points from chemical structure was not successful (Jain and Yalkowsky, 2010), and it is recommended to use experimental melting points in computations with the general solubility equation (Chu and Yalkowsky, 2009). Drugs with high melting points that decompose before melting are also not suitable for prediction by this model. The log*P* value is either measured by experimental methods such as HPLC or calculated by computational methods, and then applied to solubility prediction.
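As an illustration, equation 4 can be evaluated in a few lines of Python. This is a minimal sketch: the function name and the input values are hypothetical and were chosen only to make the arithmetic easy to follow, and the melting point is assumed to be given in ºC.

```python
def gse_log_solubility(mp_celsius: float, log_p: float) -> float:
    """Return log10 of the molar aqueous solubility at 25 °C from equation 4."""
    # The melting-point term is set to zero for solutes melting below 25 °C.
    melting_term = max(mp_celsius - 25.0, 0.0)
    return 0.5 - 0.01 * melting_term - log_p


# Hypothetical inputs, not taken from any data set in this chapter.
solid = gse_log_solubility(mp_celsius=125.0, log_p=2.0)   # 0.5 - 1.0 - 2.0
liquid = gse_log_solubility(mp_celsius=10.0, log_p=1.0)   # 0.5 - 0.0 - 1.0
print(f"solid: {solid:.2f}, liquid: {liquid:.2f}")
```

The `max(..., 0.0)` call implements the rule that the melting-point term vanishes for solutes that are liquid at 25 ºC.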

The linear solvation energy relationship is another model developed by Abraham and his co-workers (Stovall et al., 2005a) and is presented as:

$$
\log S_W = 0.395 - 0.955E + 0.320S + 1.155A + 3.255B - 0.785A \cdot B - 3.330V \tag{5}
$$

in which E is the excess molar refraction of the compound, S is its dipolarity/polarizability, and A and B are its hydrogen-bond acidity and basicity, respectively. The latter three parameters (S, A and B) are determined from solubility data of the compound in water and in different organic solvents. The *A·B* term represents hydrogen-bond interactions between acidic and basic functional groups of the drug in its pure solid or liquid, and V is one percent of the McGowan volume, which is simply calculated using a group contribution method (Stovall et al., 2005).

In a recent work from our group, a simple equation was proposed to predict the aqueous solubility of drugs. It was trained on the solubility data of 220 pharmaceuticals and validated using various validation methods (Shayanfar et al., 2010). The proposed model is:

$$
\log S_W = -1.120E - 0.599\,\mathrm{Clog}P \tag{6}
$$

Both parameters (E and Clog*P*, the computed log*P*) employed in equation 6 are computed using Pharma-Algorithms (Pharma Algorithms, 2008); therefore, the model is an *in silico* model and no experimental data are required in the prediction procedure. In the pharmaceutical literature, an external prediction set consisting of the aqueous solubility of 21 pharmaceutical and non-pharmaceutical compounds (Ran et al., 2001) has usually been used to test the prediction capability of proposed models. This data set does not represent the aqueous solubility of pharmaceutical compounds well, and another data set has been proposed, consisting of the solubility of 75 official drugs collected from the literature. The proposed test set, together with the experimental aqueous solubilities and those predicted using equations 4-6, is listed in Table 3.

| Drug | Experimental | Equation 4 | Equation 5 | Equation 6 |
|---|---|---|---|---|
| Acetaminophen | -1.06 | -1.18 | -0.63 | -1.39 |
| Acetazolamide | -2.49 | -1.18 | 0.06 | -1.43 |
| Acyclovir | -2.24 | -0.37 | 0.91 | -1.26 |
| Allopurinol | -2.26 | -2.19 | 0.38 | -1.30 |
| Amiloride | -3.36 | -1.96 | -0.07 | -2.74 |
| Amoxicillin | -2.17 | -0.11 | -1.90 | -2.38 |
| Antipyrine | 0.39 | -0.9 | -1.16 | -1.91 |
| Atenolol | -1.30 | -0.98 | -1.85 | -1.81 |
| Atropine | -2.12 | -1.77 | -3.77 | -2.43 |
| Azathioprine | -3.21 | -1.98 | -1.95 | -3.16 |
| Baclofen | -1.67 | -0.53 | -1.84 | -0.76 |
| Benzocaine | -2.33 | -1.98 | -1.82 | -2.14 |
| Celecoxib | -3.74 | -3.73 | -5.38 | -4.55 |
| Chloramphenicol | -2.11 | -1.57 | -2.16 | -2.55 |
| Chlorpromazine | -5.27 | -4.82 | -5.97 | -5.72 |
| Ciprofloxacin | -3.73 | -1.11 | -2.78 | -2.04 |
| Colchicine | -0.96 | -1.76 | -3.93 | -3.00 |
| Cortisone | -3.00 | -2.72 | -4.43 | -2.88 |
| Dapsone | -3.19 | -2.43 | -2.05 | -2.94 |
| Diazepam | -3.76 | -3.34 | -4.58 | -4.06 |
| Diethylstilbestrol | -4.57 | -5.805 | -4.94 | -4.82 |
| Digoxin | -4.16 | -3.12 | -10.31 | -4.94 |
| Diltiazem | -2.95 | -4.39 | -4.64 | -4.41 |
| Ephedrine | -0.47 | -0.52 | -1.28 | -1.65 |
| Estradiol | -4.84 | -4.95 | -4.53 | -4.43 |
| Famotidine | -2.48 | -0.09 | -1.18 | -2.53 |
| Fluorouracil | -1.03 | -1.24 | 0.55 | -0.38 |
| Gemfibrozil | -3.16 | -4.26 | -4.44 | -3.54 |
| Griseofulvin | -4.61 | -3.45 | -3.47 | -3.28 |
| Guaifenesin | -0.60 | -0.36 | -1.10 | -1.45 |
| Haloperidol | -4.43 | -4.08 | -5.44 | -4.22 |
| Halothane | -1.71 | -1.68 | -1.99 | -1.56 |
| Hydrochlorothiazide | -2.63 | -1.61 | -1.04 | -2.18 |
| Hydroquinone | -0.18 | -1.66 | -0.23 | -1.56 |
| Isoniazid | 0.01 | -0.154 | 0.97 | -0.85 |
| Ketoprofen | -3.25 | -2.73 | -3.95 | -3.27 |
| Labetalol | -3.45 | -3.44 | -4.32 | -3.75 |
| Lamotrigine | -3.14 | -4.05 | -3.48 | -4.26 |
| Levodopa | -1.72 | -0.02 | -0.35 | -0.29 |
| Lindane | -4.60 | -4.08 | -4.53 | -3.76 |
| Lovastatin | -6.01 | -5.40 | -6.42 | -4.18 |
| Mannitol | 0.06 | 1.08 | 0.89 | -0.18 |
| Maprotiline | -4.69 | -5.28 | -5.91 | -5.03 |
| Meprobamate | -1.82 | -1.36 | -1.62 | -1.44 |
| Mercaptopurine | -3.09 | -1.84 | -0.70 | -1.66 |
| Metoclopramide | -3.18 | -2.99 | -2.85 | -3.04 |
| Metronidazole | -1.22 | -0.585 | -1.14 | -1.09 |
| Minoxidil | -1.98 | -2.97 | -2.19 | -2.46 |
| Mitomycin C | -2.56 | -2.53 | -0.12 | -2.46 |
| Mycophenolic acid | -4.39 | -3.30 | -4.99 | -3.19 |
| Nifedipine | -4.78 | -2.10 | -3.71 | -2.42 |
| Nitrofurantoin | -3.24 | -2.19 | -0.98 | -2.03 |
| Nitroglycerin | -2.26 | -1.19 | -2.22 | -1.66 |
| Omeprazole | -3.62 | -3.21 | -3.00 | -4.43 |
| Oxytetracycline | -3.09 | 0.07 | -4.04 | -3.30 |
| p-Aminobenzoic acid | -1.37 | -1.93 | -0.65 | -1.65 |
| Papaverine | -3.87 | -4.43 | -4.66 | -4.67 |
| Phenobarbital | -2.29 | -2.39 | -1.90 | -2.59 |
| Phenytoin | -3.99 | -4.07 | -3.20 | -3.35 |
| Progesterone | -4.40 | -4.35 | -5.64 | -4.08 |
| Propofol | -3.05 | -3.38 | -3.82 | -3.28 |
| Propoxyphene | -5.01 | -4.38 | -6.45 | -4.14 |
| Prostaglandin-E2 | -2.47 | -2.73 | -5.22 | -3.16 |

Table 3. List of the test data set for evaluating the capability of the models for aqueous solubility prediction, the experimental (log*S<sub>W</sub>*) and predicted values by equations 4-6
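The entries of Table 3 can be compared numerically. The sketch below takes five rows of the table (an arbitrary subset chosen for illustration, not the full 75-drug evaluation) and computes the mean absolute error of each equation in log units. The choice of metric and the names used in the code are assumptions; the chapter does not state which error measure was applied to this table.

```python
# (drug, experimental, Eq. 4, Eq. 5, Eq. 6), all as log Sw, copied from Table 3.
ROWS = [
    ("Acetaminophen", -1.06, -1.18, -0.63, -1.39),
    ("Atenolol", -1.30, -0.98, -1.85, -1.81),
    ("Diazepam", -3.76, -3.34, -4.58, -4.06),
    ("Digoxin", -4.16, -3.12, -10.31, -4.94),
    ("Ketoprofen", -3.25, -2.73, -3.95, -3.27),
]
MODELS = ("Equation 4", "Equation 5", "Equation 6")


def mean_absolute_error(rows, column):
    """Average |predicted - experimental| over the rows for one model column."""
    errors = [abs(row[column] - row[1]) for row in rows]
    return sum(errors) / len(errors)


# Model columns start at index 2 of each row.
mae = {name: mean_absolute_error(ROWS, i + 2) for i, name in enumerate(MODELS)}
for name, value in mae.items():
    print(f"{name}: {value:.3f}")
```

On this small subset the large error of equation 5 is dominated by a single compound (digoxin), which shows why a summary statistic should be read together with the individual rows.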

The solubility of a drug is affected by pH, and the effect depends largely on whether the compound has ionizable acidic or basic functional groups. Most pharmaceutical compounds are weak acids or bases, which dissociate according to the following equilibria:

$$\begin{aligned} \text{Acidic drug:} \quad & HA + H_2O \rightleftharpoons A^- + H_3O^+, \qquad & K_a &= \frac{[A^-][H_3O^+]}{[HA]} \\ \text{Basic drug:} \quad & B + H_2O \rightleftharpoons BH^+ + OH^-, \qquad & K_b &= \frac{[BH^+][OH^-]}{[B]} \end{aligned}$$

where HA and B are acidic and basic drugs, respectively, and p*K*<sub>a</sub> = -log *K*<sub>a</sub> and p*K*<sub>b</sub> = -log *K*<sub>b</sub> are the negative logarithms of the acid and base dissociation constants. The solubility of a weak acid or base in solutions of different pH is calculated by the Henderson–Hasselbalch equation:

$$\text{Acidic drug}: \quad \log S_T = \log S_0 + \log \left( 10^{pH - pK_a} + 1 \right) \tag{7a}$$

$$\text{Basic drug}: \quad \log S_T = \log S_0 + \log \left( 10^{pK_a - pH} + 1 \right) \tag{7b}$$

where *S<sub>T</sub>* and *S<sub>0</sub>* are the total and intrinsic solubility, respectively. Predicting the solubility of a drug at different pH values therefore requires its intrinsic solubility and p*K*<sub>a</sub> value (Sinko and Martin, 2006).

However, a specific p*K*<sub>a</sub> value does not mean that a compound has complete activity at every pH value; most drugs do not have complete activity in aqueous solutions.

There are some mathematical models for calculating the solubility and p*K*<sub>a</sub> of compounds (Dearden, 2006; Jouyban, 2009; Katritzky et al., 2010). Complete activity, however, is attained under only two conditions: (1) infinite dilution and (2) strongly acidic conditions for basic compounds (or strongly basic conditions for acidic compounds).
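Equations 7a and 7b are simple to apply once the intrinsic solubility and p*K*<sub>a</sub> are known. The following sketch evaluates them for a hypothetical weak acid; the function name, its arguments and the numerical inputs are illustrative assumptions and do not refer to any drug in this chapter.

```python
import math


def log_total_solubility(log_s0: float, pka: float, ph: float, kind: str) -> float:
    """Henderson-Hasselbalch total solubility (equations 7a and 7b)."""
    if kind == "acid":
        exponent = ph - pka      # equation 7a
    elif kind == "base":
        exponent = pka - ph      # equation 7b
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_s0 + math.log10(10.0 ** exponent + 1.0)


# Hypothetical weak acid: intrinsic log S0 = -4.0 and pKa = 4.5.
for ph in (2.5, 4.5, 6.5):
    value = log_total_solubility(-4.0, 4.5, ph, "acid")
    print(f"pH {ph}: log S_T = {value:.3f}")
```

At pH = p*K*<sub>a</sub> the ionized and un-ionized forms are present in equal amounts, so the total solubility is twice the intrinsic solubility (an increase of log 2, about 0.30 log units).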

| Drug | Experimental | Equation 4 | Equation 5 | Equation 6 |
|---|---|---|---|---|
| Quinine | -2.82 | -2.11 | -4.01 | -4.06 |
| Riboflavin | -3.65 | -0.43 | -2.77 | -2.21 |
| Salicylic acid | -1.93 | -2.87 | -1.53 | -2.24 |
| Sertraline | -4.94 | -6.59 | -6.17 | -4.98 |
| Sulfacetamide | -1.23 | -0.99 | -0.83 | -1.64 |
| Terfenadine | -6.69 | -6.63 | -9.05 | -6.39 |
| Testosterone | -4.06 | -4.02 | -4.89 | -3.66 |
| Theophylline | -1.38 | -2.09 | -0.18 | -1.71 |
| Thiabendazole | -3.48 | -4.68 | -3.21 | -3.94 |
| Tolbutamide | -3.46 | -2.93 | -3.13 | -2.93 |
| Trimethoprim | -2.95 | -2.22 | -6.11 | -2.61 |
| Warfarin | -3.89 | -3.19 | -7.40 | -3.61 |

Table 3 (continued). List of the test data set for evaluating the capability of the models for aqueous solubility prediction, the experimental (log*S<sub>W</sub>*) and predicted values by equations 4-6

#### **3.2 Solubility in organic solvents**

Few models have been presented for calculating the solubility of drugs in organic solvents. Yalkowsky et al. (1983) calculated the mole fraction solubility of weak electrolytes and non-electrolytes in n-octanol at 30 ºC as:

$$
\log X_{\text{Oct}} = -0.011\,mp + 0.15 \tag{8}
$$

$$
\log X_{\text{Oct}} = -0.013\,mp + 0.44 \tag{9}
$$

Dearden and O'Sullivan (1988) proposed the following equation for calculating the molar solubility of drugs in cyclohexane (*S<sub>Cyc</sub>*):

$$\log S_{\text{Cyc}} = -0.0423\,mp + 1.45\tag{10}$$

which was tested on the solubility of 12 pharmaceuticals and the mean percentage deviation was 85.1 (± 21.6) % (Jouyban, 2009).

Sepassi and Yalkowsky (2006) proposed another version of equation 8 to compute the molar solubility of drugs in octanol as:

$$\log S_{\text{Oct}} = -0.01(mp - 25) + 0.5\tag{11}$$

The mean percentage deviation of equation 11 was 147 (± 247) % (Jouyban, 2009).
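Because equations 8-11 depend only on the melting point, they can be compared side by side for a given solute. The sketch below does this for a hypothetical melting point of 100 ºC. The function names are illustrative, *mp* is assumed to be in ºC, and no clipping of the (*mp* - 25) term is applied in equation 11 because the text does not state one for this equation.

```python
def log_x_octanol_eq8(mp: float) -> float:
    """Mole fraction solubility in n-octanol, equation 8."""
    return -0.011 * mp + 0.15


def log_x_octanol_eq9(mp: float) -> float:
    """Mole fraction solubility in n-octanol, equation 9."""
    return -0.013 * mp + 0.44


def log_s_cyclohexane_eq10(mp: float) -> float:
    """Molar solubility in cyclohexane, equation 10."""
    return -0.0423 * mp + 1.45


def log_s_octanol_eq11(mp: float) -> float:
    """Molar solubility in octanol, equation 11."""
    return -0.01 * (mp - 25.0) + 0.5


mp = 100.0  # hypothetical melting point in °C
predictions = {
    "eq 8": log_x_octanol_eq8(mp),
    "eq 9": log_x_octanol_eq9(mp),
    "eq 10": log_s_cyclohexane_eq10(mp),
    "eq 11": log_s_octanol_eq11(mp),
}
for label, value in predictions.items():
    print(f"{label}: {value:.2f}")
```

Note that equations 8 and 9 return mole fraction solubilities while equations 10 and 11 return molar solubilities, so the numbers are not directly interchangeable.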

The Abraham solvation model provides a more comprehensive solubility prediction method for organic solvents (Abraham et al., 2010). The Abraham model written in terms of solubility is:

$$\log\left(\frac{S_S}{S_W}\right) = c + e \cdot E + s \cdot S + a \cdot A + b \cdot B + v \cdot V \tag{12}$$

where *S<sub>S</sub>* and *S<sub>W</sub>* are the solute solubility in the organic solvent and in water (in mol/L), respectively. In equation 12, the coefficients c, e, s, a, b and v are the model constants (i.e. the solvent's coefficients), which depend on the solvent system under consideration. These coefficients were computed by regression analysis of measured log(*S<sub>S</sub>*/*S<sub>W</sub>*) values, infinite dilution activity coefficients and partition coefficients of various solutes against the corresponding solute parameters (Abraham and Acree, 2005). The Abraham solvent coefficients (c, e, s, a, b and v) and Abraham solute parameters (E, S, A, B and V) represent the extent of all known interactions between solute and solvent in the solution (Stovall et al., 2005b).
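The structure of equation 12 can be sketched as follows. All numerical values in this example are hypothetical placeholders chosen for easy arithmetic; they are not published Abraham coefficients or descriptors for any real solvent or drug, and the class and function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SolventCoefficients:
    """Solvent-specific constants c, e, s, a, b and v of equation 12."""
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float


@dataclass(frozen=True)
class SoluteDescriptors:
    """Abraham solute parameters E, S, A, B and V."""
    E: float
    S: float
    A: float
    B: float
    V: float


def log_solubility_ratio(solvent: SolventCoefficients, solute: SoluteDescriptors) -> float:
    """Return log(S_S / S_W) according to equation 12."""
    return (solvent.c + solvent.e * solute.E + solvent.s * solute.S
            + solvent.a * solute.A + solvent.b * solute.B + solvent.v * solute.V)


# Hypothetical placeholder values, for illustration only.
solvent = SolventCoefficients(c=0.2, e=0.4, s=-0.5, a=0.1, b=-3.0, v=3.5)
solute = SoluteDescriptors(E=1.0, S=1.0, A=0.5, B=0.5, V=1.0)

ratio = log_solubility_ratio(solvent, solute)
log_s_water = -3.0                      # hypothetical aqueous log solubility
log_s_solvent = log_s_water + ratio     # rearranged equation 12
print(f"log(S_S/S_W) = {ratio:.2f}, log S_S = {log_s_solvent:.2f}")
```

Keeping the solvent coefficients and the solute descriptors in separate objects mirrors the model itself: one set of coefficients per solvent can be reused for any solute whose five descriptors are known.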

#### **3.3 Solubility at different temperatures**

The solubility of a solute in an ideal solution can be represented mathematically by the van't Hoff equation:

$$
\log S = \frac{a}{T} + b
\tag{13}
$$

Experimental and Computational Methods Pertaining to Drug Solubility 205

where *Sm* is the solubility of the solute in the mixed solvent system, *S*2 denote the aqueous

to 1 2 (log( / )) *S S* in which *S*1 is the solubility in the neat cosolvent (Yalkowsky and Roseman, 1981). The general form of the log-linear model for multi-component solvent

<sup>2</sup> log log *S S m i <sup>i</sup>*

*<sup>i</sup>* and *<sup>i</sup> f* are the solubilization power and the fractions of cosolvent i *(*Li, 2001*).* 

partition coefficient ( log *P* ) which is a key relationship and could improve the prediction

where *M* and *N* are the cosolvent constants and are not dependent on the solute's nature. The numerical values of *M* and *N* were reported for most of the common cosolvents earlier (Li and Yalkowsky, 1998) and listed in Table 4. This version of the log-linear model could be considered as a predictive model and provided the simplest solubility estimation method and requires the aqueous solubility of the drug and its experimental/calculated log*P* value as input data. The log-linear model was developed to predict the solubility of drugs at room temperature (22 – 27 C) however the solubility at other temperatures are also required in

> Solvent system M N Acetone - water 1.14 -0.10 Acetonitrile – water 1.16 -0.49 Butylamine – water 0.64 1.86 Dimethylacetamide – water 0.96 0.75 Dimethylformamide – water 0.83 0.92 Dimethylsulphoxide – water 0.79 0.95 Dioxane – water 1.08 0.40 Ethanol – water 0.93 0.40 Ethylene glycol – water 0.68 0.37 Glycerol – water 0.35 0.26 Methanol – water 0.89 0.36 Polyethylene glycol 400 – water 0.74 1.26 1-Propanol – water 1.09 0.01 2-Propanol – water 1.11 -0.50 Propylene glycol - water 0.77 0.58

is the solubilization power of the cosolvent and theoretically is equal

*f* (17)

and logarithm of drug's

*M* log *P N* (18)

solubility of drug,

where

systems could be written as:

the pharmaceutical industry.

Valvani *et al.* (1981) reported a linear relationship between

capability of the log-linear model. The relationship was expressed as:

Table 4. Updated Table from (Li and Yalkowsky, 1998; Millard et al., 2002.)

in binary solvents at a fixed temperature and expressed as:

The Jouyban-Acree model was adopted from the combined nearly ideal binary solvent/Redlich-Kister equation proposed by Prof. Acree (1992) which was derived from a thermodynamic mixing model that includes contributions from both two-body and threebody interactions (Hwang et al., 1991). The model was presented for solubility calculations

where a is the slope of the linear plot of ln*<sup>S</sup>* against <sup>1</sup> *T* and b is the intercept. The a term is

equal to 2.303 *H f R* and b is equal to 2.303 *f m H RT* for ideal solutions in which *R* is the molar gas

constant and *Tm* is the melting point expressed as K. Equation 13 provides good relationship in the narrow range of temperature. For ideal solutions, the enthalpy of mixing is zero, therefore the enthalpy of solution ( *Hs* ) is equal to the enthalpy of fusion ( *H <sup>f</sup>* ). The *Hs*

is always endothermic for ideal solutions, and the solute solubility will be increased by increasing the temperature. The pattern is different for gases, liquids and solids as shown in Figure 4 where the solubility of gases decreases with increased temperature. The Hildebrand equation is an alternative model and expressed as:

$$\log S = a \ln T + b \tag{14}$$

in which a and b are the adjustable parameters. Equations 13 and 14 fail to represent the solubility-temperature relationship of most of pharmaceutical compounds in water and other pharmaceutically interested solvents especially at a wide temperature range. There are some physico-chemical reasons for this deviation from linear relationships, e.g. formation of polymorphs or solvate forms of the drug, which was discussed in details by Grant et al. (1984). To represent such data, a combined version of the van't Hoff and Hildebrand equations could be used. The equation is:

$$\log S = \frac{a}{T} + b \ln T + c \tag{15}$$

in which a, b and c are the adjustable parameters calculated by a least square analysis (Grant et al., 1984).

Fig. 4. The van't Hoff plot for gases, liquids and solids

#### **3.4 Solubility in mixed solvents**

The log-linear model of Yalkowsky is the simplest and famous model to calculate the solubility of pharmaceuticals in mixed solvent systems and is expressed by:

$$\log S\_m = \log S\_2 + \sigma \cdot f\_1 \tag{16}$$

constant and *Tm* is the melting point expressed as K. Equation 13 provides good relationship in the narrow range of temperature. For ideal solutions, the enthalpy of mixing is zero, therefore the enthalpy of solution ( *Hs* ) is equal to the enthalpy of fusion ( *H <sup>f</sup>* ). The *Hs* is always endothermic for ideal solutions, and the solute solubility will be increased by increasing the temperature. The pattern is different for gases, liquids and solids as shown in Figure 4 where the solubility of gases decreases with increased temperature. The

in which a and b are the adjustable parameters. Equations 13 and 14 fail to represent the solubility-temperature relationship of most of pharmaceutical compounds in water and other pharmaceutically interested solvents especially at a wide temperature range. There are some physico-chemical reasons for this deviation from linear relationships, e.g. formation of polymorphs or solvate forms of the drug, which was discussed in details by Grant et al. (1984). To represent such data, a combined version of the van't Hoff and Hildebrand

log ln *<sup>a</sup> S bTc*

in which a, b and c are the adjustable parameters calculated by a least square analysis (Grant

The log-linear model of Yalkowsky is the simplest and famous model to calculate the

2 1 log log *S Sf <sup>m</sup>*

solubility of pharmaceuticals in mixed solvent systems and is expressed by:

*f m*

*H RT*

*T* and b is the intercept. The a term is

for ideal solutions in which *R* is the molar gas

log ln *SaTb* (14)

*<sup>T</sup>* (15)

(16)

where a is the slope of the linear plot of ln*<sup>S</sup>* against <sup>1</sup>

and b is equal to 2.303

Hildebrand equation is an alternative model and expressed as:

equations could be used. The equation is:

Fig. 4. The van't Hoff plot for gases, liquids and solids

**3.4 Solubility in mixed solvents** 

equal to 2.303

et al., 1984).

*H f R*

where *Sm* is the solubility of the solute in the mixed solvent system, *S*2 denote the aqueous solubility of drug, is the solubilization power of the cosolvent and theoretically is equal to 1 2 (log( / )) *S S* in which *S*1 is the solubility in the neat cosolvent (Yalkowsky and Roseman, 1981). The general form of the log-linear model for multi-component solvent systems could be written as:

$$\log S\_m = \log S\_2 + \sum \sigma\_i f\_i \tag{17}$$

where *<sup>i</sup>* and *<sup>i</sup> f* are the solubilization power and the fractions of cosolvent i *(*Li, 2001*).*  Valvani *et al.* (1981) reported a linear relationship between and logarithm of drug's partition coefficient ( log *P* ) which is a key relationship and could improve the prediction capability of the log-linear model. The relationship was expressed as:

$$
\sigma = M \cdot \log P + N \tag{18}
$$

where *M* and *N* are the cosolvent constants and do not depend on the solute's nature. The numerical values of *M* and *N* were reported earlier for most of the common cosolvents (Li and Yalkowsky, 1998) and are listed in Table 4. This version of the log-linear model can be considered a predictive model: it provides the simplest solubility estimation method and requires only the aqueous solubility of the drug and its experimental or calculated log *P* value as input data. The log-linear model was developed to predict the solubility of drugs at room temperature (22–27 °C); however, solubilities at other temperatures are also required in the pharmaceutical industry.


Table 4. Updated table from Li and Yalkowsky (1998) and Millard et al. (2002)
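As a minimal sketch of how equations 17 and 18 combine into a prediction, the following Python snippet may help. The function names and the numerical inputs (including the `M_EXAMPLE` and `N_EXAMPLE` constants) are hypothetical placeholders chosen for illustration; they are not the cosolvent constants of Table 4.

```python
import math


def sigma_from_endpoints(s_cosolvent: float, s_water: float) -> float:
    """Theoretical solubilization power: sigma = log10(S1 / S2)."""
    return math.log10(s_cosolvent / s_water)


def sigma_from_log_p(log_p: float, m: float, n: float) -> float:
    """Equation 18: sigma = M * log P + N for one cosolvent."""
    return m * log_p + n


def log_linear_log_s(log_s_water: float, sigmas, fractions) -> float:
    """Equation 17: log Sm = log S2 + sum(sigma_i * f_i)."""
    if len(sigmas) != len(fractions):
        raise ValueError("one fraction is needed per cosolvent")
    return log_s_water + sum(s * f for s, f in zip(sigmas, fractions))


# Hypothetical inputs, for illustration only (not values from Table 4).
M_EXAMPLE, N_EXAMPLE = 0.9, 0.3
sigma = sigma_from_log_p(log_p=2.0, m=M_EXAMPLE, n=N_EXAMPLE)
log_sm = log_linear_log_s(log_s_water=-3.0, sigmas=[sigma], fractions=[0.4])
print(f"sigma = {sigma:.2f}, log Sm = {log_sm:.2f}")
```

With a single cosolvent the sum in equation 17 collapses to equation 16, so the same function covers both the binary and the multi-component case.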

The Jouyban-Acree model was adapted from the combined nearly ideal binary solvent/Redlich-Kister equation proposed by Prof. Acree (1992), which was derived from a thermodynamic mixing model that includes contributions from both two-body and three-body interactions (Hwang et al., 1991). The model was presented for solubility calculations in binary solvents at a fixed temperature and expressed as:


$$\log S\_m = f\_1 \log S\_1 + f\_2 \log S\_2 + f\_1 f\_2 \sum\_{i=0}^2 A\_i \left( f\_1 - f\_2 \right)^i \tag{19}$$

where *A*<sub>i</sub> stands for the model constants. The *A*<sub>i</sub> values are calculated by regressing (log *S*<sub>m</sub> − *f*<sub>1</sub> log *S*<sub>1</sub> − *f*<sub>2</sub> log *S*<sub>2</sub>) against *f*<sub>1</sub>*f*<sub>2</sub>, *f*<sub>1</sub>*f*<sub>2</sub>(*f*<sub>1</sub> − *f*<sub>2</sub>) and *f*<sub>1</sub>*f*<sub>2</sub>(*f*<sub>1</sub> − *f*<sub>2</sub>)<sup>2</sup> by a no-intercept least squares analysis (Jouyban-Gharamaleki and Hanaee, 1997). The applicability of the model was extended to other physico-chemical properties in mixed solvents at various temperatures as:

$$\log S\_{m,T} = f\_1 \log S\_{1,T} + f\_2 \log S\_{2,T} + \frac{f\_1 f\_2}{T} \sum\_{i=0}^{2} J\_i \left( f\_1 - f\_2 \right)^i \tag{20}$$

where *S*<sub>m,T</sub>, *S*<sub>1,T</sub> and *S*<sub>2,T</sub> are the solubilities in the solvent mixture and in mono-solvents 1 and 2 at temperature *T* (K), and the *J*<sub>i</sub> terms are the model constants. The main limitations of the Jouyban-Acree model for predicting drug solubilities in solvent mixtures are: a) it requires two data points of solubilities in mono-solvent systems, and b) it requires numerical values of the model constants. To overcome the first limitation, the solubility prediction methods in mono-solvent systems should be improved. To address the second limitation, the following solutions were examined during the last couple of years:



i. the *J*<sub>i</sub> terms are obtained using the solubility of structurally related drugs in a given mixed solvent system and then used to predict the unmeasured solubility of the related drugs, where the expected mean percentage deviation was ~17 % (Jouyban-Gharamaleki et al., 1998).

ii. the model constants could be calculated using a minimum number of experimental data points, i.e. three data points, and then used to predict the solubilities at the rest of the solvent compositions, where the expected prediction mean percentage deviation was < 15 % (Jouyban-Gharamaleki et al., 2001).

iii. the trained versions of the Jouyban-Acree model, which could be employed for solubility prediction of drugs in the aqueous mixtures of a number of organic solvents, were reported. Using this version of the model, only the solubility data in mono-solvents are required. Table 5 lists the numerical values of the Jouyban-Acree model constants for the 5 cosolvents studied.

| Solvent system | J0 | J1 | J2 | Prediction % error |
|---|---|---|---|---|
| Dioxane – water | 958.44 | 509.45 | 867.44 | 27 |
| Ethanol – water | 724.21 | 485.17 | 194.41 | 48 |
| Polyethylene glycol 400 – water | 394.82 | -355.28 | 388.89 | 40 |
| Propylene glycol – water | 37.03 | 319.49 | - | 24 |
| Ethanol – ethyl acetate | 382.987 | 125.663 | 214.579 | 13 |

Table 5. The constants of the Jouyban-Acree model for a number of solvent systems (data taken from Jouyban and Acree, 2007; Jouyban, 2008)

iv. in the trained versions of the Jouyban-Acree model, the extent of the solute-solvent interactions was assumed to be the same for all solutes; however, this is not the case, since various solutes possess different functional groups leading to different extents of solute-solvent interactions. To cover this point, the deviations of the solubilities from the trained versions of the Jouyban-Acree model were correlated with the Abraham solute parameters (*E*, *S*, *A*, *B* and *V*) using available solubility data sets in ethanol – water and dioxane – water mixtures at various temperatures, and the following equations were obtained:

$$\begin{aligned} \log S\_{m,T} &= f\_1 \log S\_{1,T} + f\_2 \log S\_{2,T} \\ &+ \left(\frac{f\_1 f\_2}{T}\right) \{558.45 + 358.60E + 22.01S - 352.97A + 130.48B - 297.10V\} \\ &+ \left(\frac{f\_1 f\_2 \left(f\_1 - f\_2\right)}{T}\right) \{45.67 - 165.77E - 321.55S + 479.48A - 409.51B + 827.63V\} \\ &+ \left(\frac{f\_1 f\_2 \left(f\_1 - f\_2\right)^2}{T}\right) \{-493.81 - 341.32E + 866.22S - 36.17A + 173.41B - 555.48V\} \end{aligned} \tag{21}$$

and

$$\begin{aligned} \log S\_{m,T} &= f\_1 \log S\_{1,T} + f\_2 \log S\_{2,T} \\ &+ \left(\frac{f\_1 f\_2}{T}\right) \{648.01 - 404.99E + 428.69S + 340.99A - 59.03B - 56.94V\} \\ &+ \left(\frac{f\_1 f\_2 (f\_1 - f\_2)}{T}\right) \{-135.95 - 41.11E - 192.19S + 237.81A + 363.87B + 310.30V\} \\ &+ \left(\frac{f\_1 f\_2 \left(f\_1 - f\_2\right)^2}{T}\right) \{-1102.49 - 667.02E + 2070.16S + 421.15A - 924.73B - 271.54V\} \end{aligned} \tag{22}$$

The mean percentage deviation values for ethanol and dioxane were 34 and 22 %, respectively (Jouyban et al., 2009).

v. a generalized version of the Jouyban-Acree model was proposed by combining it with the Abraham solvation parameters, where the model constants of the Jouyban-Acree model were correlated with functions of the Abraham solvent coefficients and the solute parameters as:

$$\begin{aligned} \log S\_{m,T} &= f\_1 \log S\_{1,T} + f\_2 \log S\_{2,T} \\ &+ \left(\frac{f\_1 f\_2}{T}\right) \{1639.07 - 561.01\left(c\_1 - c\_2\right)^2 - 1344.81E\left(e\_1 - e\_2\right)^2 - 18.22S\left(s\_1 - s\_2\right)^2 \\ &\qquad - 3.65A\left(a\_1 - a\_2\right)^2 + 0.86B\left(b\_1 - b\_2\right)^2 + 4.40V\left(v\_1 - v\_2\right)^2\} \\ &+ \left(\frac{f\_1 f\_2 \left(f\_1 - f\_2\right)}{T}\right) \{-1054.03 + 1043.54\left(c\_1 - c\_2\right)^2 + 359.47E\left(e\_1 - e\_2\right)^2 - 1.20S\left(s\_1 - s\_2\right)^2 \\ &\qquad + 30.26A\left(a\_1 - a\_2\right)^2 - 2.66B\left(b\_1 - b\_2\right)^2 - 0.16V\left(v\_1 - v\_2\right)^2\} \\ &+ \left(\frac{f\_1 f\_2 \left(f\_1 - f\_2\right)^2}{T}\right) \{2895.07 - 1913.07\left(c\_1 - c\_2\right)^2 - 901.29E\left(e\_1 - e\_2\right)^2 - 10.87S\left(s\_1 - s\_2\right)^2 \\ &\qquad + 24.62A\left(a\_1 - a\_2\right)^2 + 9.79B\left(b\_1 - b\_2\right)^2 - 24.38V\left(v\_1 - v\_2\right)^2\} \end{aligned} \tag{23}$$

The mean percentage deviation of this model was 42 % for 152 data sets, which was significantly less than that of the log-linear model (78 %). Figure 5 shows the relative frequency of the individual percentage deviations of the predicted solubilities using equations 23 and 16 (log-linear), in which the error distribution of equation 23 is better than that of the log-linear model. It should be noted that the Jouyban-Acree model requires two experimental data points, i.e. *S*<sub>1,T</sub> and *S*<sub>2,T</sub>, whereas the log-linear model needs just the aqueous solubility of the drug as input data. The main advantage of equation 23 is that it can be used to predict the solubility in mixed solvents for which the Abraham solvent parameters (i.e. c, e, s, a, b and v) are available. Table 6 lists these parameters for a number of the more common solvents in the pharmaceutical industry. Unfortunately, these parameters are not available for a number of the more common pharmaceutical cosolvents, such as propylene glycol and polyethylene glycols, and this is a disadvantage of this model.

In addition to the models discussed above for predicting the solubility of drugs in solvent mixtures, there are some models derived from molecular thermodynamic approaches. These models require relatively complex computations and have not attracted much attention in the pharmaceutical area. They provide prediction accuracies comparable with those of the models discussed above. As an example, the prediction error of a method based on statistical mechanical fluctuation solution theory varied from 0.3 to 58 % (Ellegaard et al., 2010), whereas the corresponding value for the common models in the pharmaceutical area varied between 8 and 19 % (Jouyban-Gharamaleki et al., 1999).

Fig. 5. The relative frequencies of the predicted solubilities in binary solvent mixtures using Jouyban-Acree and log-linear models (x-axis: individual percentage deviation; y-axis: relative frequency)

| Solvent | c | e | s | a | b | v |
|---|---|---|---|---|---|---|
| Acetone | 0.335 | 0.349 | -0.231 | -0.411 | -4.793 | 3.963 |
| Acetonitrile | 0.413 | 0.077 | 0.326 | -1.566 | -4.391 | 3.364 |
| Dimethyl formamide | -0.438 | -0.099 | 0.670 | 0.878 | -4.970 | 4.552 |
| Dioxane | 0.098 | 0.350 | -0.083 | -0.556 | -4.826 | 4.172 |
| Ethanol | 0.208 | 0.409 | -0.959 | 0.186 | -3.645 | 3.928 |
| Ethylene glycol | 0.243 | 0.695 | -0.670 | 0.726 | -2.399 | 2.670 |
| Methanol | 0.329 | 0.299 | -0.671 | 0.080 | -3.389 | 3.512 |
| 2-Propanol | 0.063 | 0.320 | -1.024 | 0.445 | -3.824 | 4.067 |
| Water | -0.994 | 0.577 | 2.549 | 3.813 | 4.841 | -0.869 |

Table 6. The Abraham solvent parameters of a number of common solvents (data taken from Stovall et al., 2005a; 2005b)
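The trained form of the Jouyban-Acree model (equation 20 with the constants of Table 5) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the dictionary keys and function name are hypothetical, the example solubilities are invented, and treating the missing *J*<sub>2</sub> of propylene glycol – water as zero is an assumption.

```python
# Trained Jouyban-Acree constants (J0, J1, J2) as listed in Table 5.
# Assumption: the propylene glycol entry has no J2 in the table, so 0.0 is used.
TABLE_5 = {
    "dioxane-water": (958.44, 509.45, 867.44),
    "ethanol-water": (724.21, 485.17, 194.41),
    "peg400-water": (394.82, -355.28, 388.89),
    "propylene glycol-water": (37.03, 319.49, 0.0),
    "ethanol-ethyl acetate": (382.987, 125.663, 214.579),
}


def jouyban_acree_log_s(f1, log_s1, log_s2, temp_k, constants):
    """Equation 20: log solubility in a binary mixture at temperature T (K).

    f1 is the fraction of solvent 1; the fraction of solvent 2 is 1 - f1.
    log_s1 and log_s2 are the log solubilities in the two mono-solvents.
    """
    if not 0.0 <= f1 <= 1.0:
        raise ValueError("f1 must lie between 0 and 1")
    f2 = 1.0 - f1
    # Sum of J_i * (f1 - f2)^i for i = 0, 1, 2
    excess = sum(j * (f1 - f2) ** i for i, j in enumerate(constants))
    return f1 * log_s1 + f2 * log_s2 + (f1 * f2 / temp_k) * excess


# Hypothetical drug: log S = -1.0 in ethanol and -3.0 in water at 298.15 K.
mid = jouyban_acree_log_s(0.5, -1.0, -3.0, 298.15, TABLE_5["ethanol-water"])
print(f"log Sm at f1 = 0.5: {mid:.3f}")
```

At *f*<sub>1</sub> = 0 or 1 the interaction term vanishes and the function returns the mono-solvent value, which is a quick sanity check on any implementation of equation 20.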

#### **3.5 Solubility in the presence of surfactants**

Equation 24 is one of the equations used for the solubility calculation in presence of surfactant (Rangel-Yagui et al., 2005):

$$\chi = \frac{S\_T - S\_W}{C\_{Surfactant} - cmc}\tag{24}$$

where *χ* is the ratio of the concentration of the drug in micelles to the concentration of the micellar surfactant molecules, *S*<sub>T</sub> is the total drug solubility in the solution, *S*<sub>W</sub> is the aqueous solubility of the drug, *C*<sub>Surfactant</sub> is the molar concentration of the surfactant in the solution, and *cmc* is the critical micelle concentration. Another equation is (Rangel-Yagui et al., 2005):

$$K = \frac{S\_T - S\_W}{S\_W} \tag{25}$$

where *K* is the micelle-water partition coefficient of the drug.

However, these equations require at least two other experimental data points as input for predicting the total solubility of the drug in micellar solutions.
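The way equations 24 and 25 are used in practice can be illustrated with a short Python sketch. The function names and the numerical values are hypothetical; the rearrangement of equation 24 for the total solubility, and the assumption that no micellar solubilization occurs below the cmc, are ours and are not stated in the cited source.

```python
def micellar_ratio(s_total, s_water, c_surfactant, cmc):
    """Equation 24: chi = (S_T - S_W) / (C_surfactant - cmc)."""
    if c_surfactant <= cmc:
        raise ValueError("equation 24 applies only above the cmc")
    return (s_total - s_water) / (c_surfactant - cmc)


def micelle_water_partition(s_total, s_water):
    """Equation 25: K = (S_T - S_W) / S_W."""
    return (s_total - s_water) / s_water


def total_solubility(s_water, chi, c_surfactant, cmc):
    """Equation 24 rearranged for S_T (our rearrangement)."""
    # Assumption: below the cmc there are no micelles, so S_T = S_W.
    micellar = max(c_surfactant - cmc, 0.0)
    return s_water + chi * micellar


# Hypothetical measurements, all in mol/L.
chi = micellar_ratio(s_total=0.005, s_water=0.001, c_surfactant=0.05, cmc=0.008)
k = micelle_water_partition(s_total=0.005, s_water=0.001)
s_at_0_1 = total_solubility(s_water=0.001, chi=chi, c_surfactant=0.1, cmc=0.008)
print(f"chi = {chi:.4f}, K = {k:.2f}, S_T at 0.1 M surfactant = {s_at_0_1:.5f}")
```

The sketch makes the data requirement explicit: *χ* has to be obtained from at least one measured total solubility before any other surfactant concentration can be predicted.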

Abraham et al. (1995) have proposed two models for prediction of *K* for different solutes in the presence of sodium dodecylsulfate (SDS) as:

$$\begin{aligned} \log K\_x &= 1.201 + 0.542E - 0.400S - 0.133A - 1.580B + 2.793V \\ R &= 0.9849 \quad , \quad N = 132 \quad , \quad \text{standard deviation} = 0.171 \end{aligned} \tag{26}$$

and


$$\begin{aligned} \log K\_x &= 1.129 + 0.504 \log P + 1.216V\\ R &= 0.9755 \quad , \quad N = 132 \quad , \quad \text{standard deviation} = 0.215 \end{aligned} \tag{27}$$

where *K*<sub>x</sub> is the *K* of equation 25 expressed in mole fraction units (Abraham et al., 1995). Ghasemi and coworkers have developed an MLR model for micellar solubility prediction in the presence of SDS for a diverse set of compounds:

$$\begin{aligned} \log K\_S &= -0.638 + 0.001E\_b + 0.384MR - 0.112LUMO + 0.570C\log P - 0.001RepE\\ R^2 &= 0.9679 \quad , \quad N = 62 \quad , \quad RMSEP = 0.124 \end{aligned} \tag{28}$$

where *K*<sub>S</sub> is the micellar solubility, *E*<sub>b</sub> is the bending energy, *MR* is the molar refractivity, *LUMO* is the lowest unoccupied molecular orbital, *C*log*P* is the logarithm of the calculated partition coefficient and *RepE* is the repulsion energy (Ghasemi et al., 2008). In another work, they proposed a QSPR model for micellar solubility prediction for a diverse set of compounds in the presence of cetyltrimethylammonium bromide (CTAB) as:

$$\begin{aligned} \log K\_S &= -1.1522 + 0.0070MP + 0.8089 \log P - 0.1262DPLL\\ R^2 &= 0.9624 \quad , \quad N = 40 \quad , \quad RMSEP = 0.169 \end{aligned} \tag{29}$$

where *MP* is melting point of the solute, and *DPLL* is the dipole length of the solute (Ghasemi et al., 2009).

However, as mentioned above, at least intrinsic solubility is required for total solubility prediction in the presence of a surfactant and they cannot be used as *ab initio* QSPR models for solubility prediction.
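For readers who want to evaluate the two SDS models of Abraham et al. (equations 26 and 27), a minimal Python sketch follows. The coefficients are those quoted in the equations; the function names and the descriptor values in the example are hypothetical and do not describe any real compound.

```python
def log_kx_from_descriptors(e, s, a, b, v):
    """Equation 26: log Kx from the Abraham solute descriptors E, S, A, B, V."""
    return 1.201 + 0.542 * e - 0.400 * s - 0.133 * a - 1.580 * b + 2.793 * v


def log_kx_from_log_p(log_p, v):
    """Equation 27: log Kx from log P and the solute volume descriptor V."""
    return 1.129 + 0.504 * log_p + 1.216 * v


# Hypothetical descriptor values, for illustration only.
full_model = log_kx_from_descriptors(e=0.8, s=0.9, a=0.3, b=0.5, v=1.0)
short_model = log_kx_from_log_p(log_p=2.0, v=1.0)
print(f"equation 26: {full_model:.3f}, equation 27: {short_model:.3f}")
```

Both functions return the micelle-water partition coefficient in mole fraction units, so an experimental or predicted aqueous solubility is still needed before a total solubility can be estimated.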

#### **3.6 Solubility in the presence of complexing agents**

In most of the cases, by adding complexing agents (e.g. cyclodextrins) to the solution, the solubility of a specific ligand (i.e. drug) is enhanced. But this enhancement could have different types as illustrated in Figure 6.

As can be seen, different kinds of drugs show different behaviours. However, except for one condition, at low concentrations of the complexing agent the solubility changes are the same for the other types. This common part of the curves is considered as a straight line with a slope of:

$$\begin{aligned} Slope &= \frac{K\_{1:1} S\_0}{1 + K\_{1:1} S\_0} \\ K\_{1:1} &= \frac{\left[Host.Ligand\right]}{\left[Host\right] \cdot \left[Ligand\right]} \end{aligned} \tag{30}$$

where *K*<sub>1:1</sub> is the complex formation coefficient, [*Host.Ligand*] is the concentration of the formed complex between drug and complexing agent, [*Host*] is the concentration of the complexing agent, and [*Ligand*] is the concentration of the drug (Sinko and Martin, 2006; Brewster and Loftsson, 2007). To correlate the solubility in the presence of a complexing agent in this part of the solubility curve, one can use the following equation:

$$S\_{Total}^{Complex} = S\_0 + Slope \cdot C\_{Host} \tag{31}$$


where *S*<sup>Complex</sup><sub>Total</sub> is the total solubility in the presence of a complexing agent, *S*<sub>0</sub> is the intrinsic solubility, *Slope* is the slope of the first part of the solubility curve versus complexing agent concentration, and *C*<sub>Host</sub> is the concentration of the complexing agent (Sinko and Martin, 2006; Brewster and Loftsson, 2007).

Fig. 6. Possible different solubility behaviours in the presence of complexing agent.
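Equations 30 and 31 can be sketched in Python as follows. The function names and the example values are hypothetical, and the inversion of equation 30 for *K*<sub>1:1</sub> is our algebraic rearrangement rather than a formula quoted from the cited sources.

```python
def slope_from_k(k11, s0):
    """Equation 30: slope of the linear part for a 1:1 complex."""
    return k11 * s0 / (1.0 + k11 * s0)


def k_from_slope(slope, s0):
    """Equation 30 solved for K1:1 (our rearrangement)."""
    if not 0.0 < slope < 1.0:
        raise ValueError("a 1:1 complex gives a slope between 0 and 1")
    return slope / (s0 * (1.0 - slope))


def total_solubility(s0, slope, c_host):
    """Equation 31: total solubility in the linear part of the curve."""
    return s0 + slope * c_host


# Hypothetical drug: S0 = 1e-3 mol/L, K1:1 = 500 L/mol.
slope = slope_from_k(k11=500.0, s0=1e-3)
s_total = total_solubility(s0=1e-3, slope=slope, c_host=0.01)
print(f"slope = {slope:.4f}, S_total = {s_total:.5f} mol/L")
```

The round trip `k_from_slope(slope_from_k(k, s0), s0)` should return the original constant, which is a convenient check when fitting experimental phase solubility data.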

Again, as with the pH and surfactant effects, one must have the intrinsic solubility and *Slope* (or *K*<sub>1:1</sub>) for solubility prediction in the presence of complexing agents. Some QSPR models have been developed for the prediction of *Slope* (or *K*<sub>1:1</sub>), but most of them only considered the effect of the complexing agent on the solubility enhancement (i.e. *Slope*). Demian (2000) proposed equation 32 for the correlation of the *Slope* of the above-mentioned equation for aromatics and terpenes with hydroxypropyl-β-cyclodextrin:

$$\begin{aligned} Slope &= 2.86 - 0.11 \times SterimolL - 0.34 \times \log P \\ R &= 0.788 \quad , \quad N = 19 \quad , \quad \text{standard error} = 0.336 \end{aligned} \tag{32}$$

where *SterimolL* is a steric parameter which is calculated by ChemOffice software (Demian, 2000). Choi et al. (2006) have developed a QSPR model for the correlation of the *Slope* for A<sub>L</sub>-type solubility curves between drugs and α/β/γ-cyclodextrins as follows:

$$\begin{aligned} Slope &= -0.012E\_{h-g} + 0.102E\_{np\\_h-g} + 0.328E\_{np\\_g-g} + 0.305\\ R^2 &= 0.913 \quad , \quad N = 63 \quad , \quad \text{standard error} = 0.028 \end{aligned} \tag{33}$$

where *E*<sub>h-g</sub> is the interaction energy between host and guest, *E*<sub>np\_h-g</sub> is the difference between the nonpolar components of the free energy of solvation of the host–guest complex and those of the individual host and guest molecules, and *E*<sub>np\_g-g</sub> is the difference between the nonpolar components of the free energy of solvation of the guest–guest dimer and those of the individual guest molecules (Choi et al., 2006). These energy values are calculated after a Monte Carlo docking simulation between each drug and the related complexing agent. Trapani et al. (2005) have developed a QSPR model for the correlation of the ratio of the total to intrinsic solubilities of 25 drugs in the presence of 2-hydroxypropyl-β-cyclodextrin as follows:

$$\begin{aligned} \log \frac{S\_{Total}^{Complex}}{S\_0} &= 3.766 + 0.182CMR - 0.150C\log P - 0.00683TPSA - 0.0844\delta\_{tot} \\ R^2 &= 0.793 \quad , \quad N = 25 \quad , \quad Q^2 = 0.711 \end{aligned} \tag{34}$$

and

$$\begin{aligned} \log \frac{S\_{Total}^{Complex}}{S\_0} &= 1.827 - 0.00508MW + 0.0122MV - 0.179C\log P - 0.00547TPSA \\ R^2 &= 0.763 \quad , \quad N = 25 \quad , \quad Q^2 = 0.605 \end{aligned} \tag{35}$$

where *CMR* is the calculated molecular refractivity, *TPSA* is the total polar surface area, *δ*<sub>tot</sub> is the total solubility parameter, *MW* is the molecular weight, and *MV* is the molecular volume. Equation 34 was derived using an MLR method and equation 35 was derived using a PLS method (Trapani et al., 2005).

However, as mentioned earlier, none of these models can be applied directly for solubility prediction in the presence of complexing agents and intrinsic solubility is required for all of them.
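The dependence on the intrinsic solubility can be made concrete with a short Python sketch of equations 34 and 35. The coefficients are those quoted above; the function names and all descriptor values are hypothetical, and the units of the descriptors are assumed to match those of the original study, which are not given in this chapter.

```python
def log_ratio_mlr(cmr, clogp, tpsa, delta_tot):
    """Equation 34 (MLR model): log of total over intrinsic solubility."""
    return 3.766 + 0.182 * cmr - 0.150 * clogp - 0.00683 * tpsa - 0.0844 * delta_tot


def log_ratio_pls(mw, mv, clogp, tpsa):
    """Equation 35 (PLS model): log of total over intrinsic solubility."""
    return 1.827 - 0.00508 * mw + 0.0122 * mv - 0.179 * clogp - 0.00547 * tpsa


def total_from_intrinsic(s0, log_ratio):
    """Convert a predicted log ratio into a total solubility; S0 must be known."""
    return s0 * 10.0 ** log_ratio


# Hypothetical descriptor values, for illustration only.
ratio_34 = log_ratio_mlr(cmr=7.0, clogp=2.0, tpsa=60.0, delta_tot=22.0)
ratio_35 = log_ratio_pls(mw=300.0, mv=250.0, clogp=2.0, tpsa=60.0)
print(f"equation 34: {ratio_34:.4f}, equation 35: {ratio_35:.4f}")
```

Neither model returns a solubility on its own: `total_from_intrinsic` needs a measured or separately predicted *S*<sub>0</sub>, which is the limitation noted in the text.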

#### **3.7 Available software**

A large number of software packages are available for solubility prediction, and a thorough review of them was provided earlier (Jouyban et al., 2008). In this chapter, the more useful solubility prediction applications, and those which are newly developed or related to drug design and development, are discussed.

ACD/Solubility DB predicts aqueous solubility at different pH values with an average error of 0.47±0.67 (in decimal logarithm units) for 1125 compounds (ACD/Labs).

ACD/DMSO Solubility predicts whether a compound is soluble (a result of 1) or insoluble (a result of 0) in DMSO. Its predictive model is a hybrid of logistic regression and the PLS method, trained with solubility-related physicochemical parameters and considering the effects of charged groups, atom chains and ring scaffolds. About 30% of its predictions are of high reliability, 70% of moderate reliability and <1% of low reliability, with an overall accuracy of 82% correct predictions (Japertas et al.).

Simulations Plus' ADMET Predictor™ predicts aqueous solubility using 2D and 3D descriptors as input data, with average errors of 0.432 and 0.423 in logarithmic scale for 2817 and 711 compounds in the training and test sets, respectively (ADMET Predictor™). It can also predict the solubility in the biorelevant media fasted state simulated gastric fluid (FaSSGF), fasted state simulated intestinal fluid (FaSSIF) and fed state simulated intestinal fluid (FeSSIF). Its average errors in logarithmic scale for FaSSGF are 0.510 and 0.470 for 137 and 20 compounds, respectively. For FaSSIF they are 0.469 and 0.417 for 141 and 16 compounds, respectively, and for FeSSIF they are 0.424 and 0.409 for 136 and 21 compounds, respectively. These predictive tools are designed using 2D descriptors as inputs and ADMET Modeler's ANNE methodology for modelling (ADMET Predictor™). The package can also predict the possibility of supersaturation in water: it calculates the ratio of kinetic to intrinsic solubility and, if the result is higher than 1.3, supersaturation is flagged as possible. It classified 95 out of 97 training set compounds and 23 out of 24 test set compounds correctly (ADMET Predictor™).

Finally, Solvomix is a recently developed free software available via the Handbook of Solubility Data for Pharmaceuticals as a tool for the prediction of solubility in mono-solvents and solvent mixtures. It uses the GSE and Abraham models for the prediction of solubility in mono-solvents, and trained versions of the log-linear model of Yalkowsky and the Jouyban-Acree model for solubility prediction in solvent mixtures (Jouyban, 2009).
