**1. Introduction**

132 Recurrent Neural Networks and Soft Computing


A protein domain is a basic unit of protein structure that can fold and function on its own, independently of the rest of the protein sequence. Protein domains can be seen as distinct functional or structural units of a protein, and they provide some of the most valuable information for predicting protein structure, function, evolution, and design. A protein domain is detected from the protein structure, which is in turn predicted from the amino acid sequence. A protein sequence may contain a single domain, two domains, or multiple domains, with different or matching copies of a domain. Each domain is delimited by domain boundaries, which are defined over the amino acid residues: every residue in the protein chain is assigned a domain position. Each domain is a compact, folded structure that is independently stable, even though it is part of a larger protein sequence. Because of this independent, modular nature, the same domain can often be found in different proteins, or in proteins with the same domain content but in a different order. Knowledge of protein domain boundaries is therefore useful in analysing the different functions of protein sequences.
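To make the idea of per-residue domain positions concrete, the following minimal Python sketch labels each residue with a domain index, then derives the boundary positions and the single-domain, two-domain, or multiple-domain category from those labels. The function names, the integer label encoding, and the toy sequence are illustrative assumptions and are not taken from any of the methods discussed in this chapter.

```python
from typing import List


def domain_boundaries(labels: List[int]) -> List[int]:
    """Return the 0-based residue indices at which the domain label changes."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


def domain_category(labels: List[int]) -> str:
    """Categorise a chain by its number of distinct domain labels.

    Assumes (hypothetically) one integer domain label per residue.
    """
    if not labels:
        raise ValueError("labels must contain at least one residue")
    n_domains = len(set(labels))
    if n_domains == 1:
        return "single-domain"
    if n_domains == 2:
        return "two-domain"
    return "multiple-domain"


# Toy example: a made-up 10-residue chain whose first six residues belong
# to domain 0 and whose last four residues belong to domain 1.
sequence = "MKTAYIAKQR"
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
assert len(sequence) == len(labels)

print(domain_boundaries(labels))  # [6]
print(domain_category(labels))    # two-domain
```

In this encoding a boundary is simply a position where consecutive residues carry different labels, so the number of domains and the boundary positions can both be read off the same per-residue annotation.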

Several methods have been developed to detect protein domains. They can be categorized as follows: (1) methods based on similarity that use multiple sequence alignments to represent domain boundaries, e.g. KemaDom (Lusheng et al., 2006) and Biozon (Nagaranjan and Yona, 2004); (2) methods that depend on known protein structure to identify the protein domain, e.g. AutoSCOP (Gewehr et al., 2007) and DOMpro (Cheng et al., 2006); (3) methods that use dimensional structure to infer protein domain boundaries, e.g. GlobPlot (Linding et al., 2003), Mateo (Lexa and Valle, 2003), and Dompred-DPS (Marsden et al., 2002); (4) methods that use a comparative model such as Hidden Markov Models (HMM) to identify other members of a protein domain family, e.g. HMMPfam (Bateman et al., 2004) and HMMSMART (Ponting et al., 1999); and (5) methods that are based solely on protein sequence information, e.g. Armadillo (Dumontier et al., 2005) and SBASE (Kristian et al., 2005). However, these methods only produce good results in the case of single-domain proteins.

A protein sequence carries no explicit sign of where a protein domain starts and ends. Protein sequences with closely related homologues can reveal conserved regions that are functionally important (Elhefnawi et al., 2010). It is important not only to detect protein domains accurately from large numbers of protein sequences with unknown structure, but also to detect the domain boundaries within each protein sequence (Chen et al., 2010). Protein domain boundaries are important for understanding and analysing the different functions of a protein (Paul et al., 2008), as shown in Fig. 1. The difficulty in protein domain prediction lies in detecting the domain boundaries: the protein sequence alone does contain structural information, but only in a small portion of the protein space. Secondary structure provides information used in protein domain prediction, such as the similarity of protein chains and the potential domain regions and boundaries. Methods that use secondary structure information in protein domain prediction, such as DOMpro and KemaDom, have shown improvement in predicting protein domains compared to other protein domain predictors.

Fig. 1. An example of constructing a new protein from different protein domain boundaries.

Previously, Neural Networks (NN) were used as classifiers to detect protein domains, as in Armadillo, Biozon, Dompred-DPS, and DOMpro. More recently, Support Vector Machines (SVM) have come to be seen as a strong contender to NN in protein domain classification. Unlike NN, SVM is much less affected by the dimension of the input space and employs structural risk minimization rather than empirical risk minimization. SBASE (Kristian et al., 2005) and KemaDom are examples of methods that apply SVM to protein domain prediction. The results from these methods are more accurate than those obtained with NN.

BRNN-SVM: Increasing the Strength of Domain Signal to Improve Protein Domain Prediction Accuracy 135

**2. BRNN-SVM algorithm**

BRNN-SVM begins by searching for seed protein sequences using BLAST (Altschul et al., 1997) in order to generate a dataset, which is then split into training and testing sets. Multiple alignment is performed using ClustalW (Larkin et al., 2007), where the alignment is represented as a sequence of alignment columns, each associated with one position in the seed protein sequence. A Bidirectional Recurrent Neural Network (BRNN) is used to generate secondary structure from the aligned protein sequences in order to highlight the signal of protein domain boundaries. The secondary structure is predicted as one of three types: alpha-helix, beta-sheet, or coil. Information is then extracted from the secondary structure using the measures of entropy, protein sequence termination, correlation, contact profile, physicochemical properties, intron-exon information, and secondary structure score, in order to increase the domain signal. This extracted information is used as input to the SVM, which classifies the protein domain as single-domain, two-domain, or multiple-domain.

BRNN-SVM is evaluated by comparing it with existing methods based on similarity and multiple sequence alignment (Biozon and KemaDom), known protein structure (AutoSCOP and DOMpro), dimensional structure (GlobPlot, Mateo, and Dompred-DPS), comparative models (HMMPfam and HMMSMART), and sequence alone (Armadillo and SBASE). An analysis of the results demonstrates that BRNN-SVM shows outstanding performance on single-domain, two-domain, and multiple-domain proteins.

The steps involved in BRNN-SVM can be summarized as follows: (1) generate training and testing sets using BLAST; (2) perform multiple sequence alignment using ClustalW; (3) predict secondary structure by BRNN; (4) extract information from the protein secondary structure; (5) classify the protein domain by SVM; and (6) evaluate the performance using sensitivity, specificity, and accuracy.

**3. Secondary structure prediction by BRNN**

For each protein sequence, the secondary structure information is predicted by an ensemble of BRNNs. The input for predicting secondary structure is a single protein sequence from a multiple sequence alignment. The BRNN then derives protein sequence information from PSI-BLAST (Altschul et al., 1997) in order to include the homology structure used in the secondary structure prediction. Subsequently, the protein secondary structure information is divided into three classes: alpha-helices, beta-sheets, and coils.

The BRNN is described in Fig. 2–3. This BRNN involves a set of input variables $X_i$ taken from the protein sequence, a forward chain $F_i$ and a backward chain $B_i$ of hidden variables, and a set of output variables $O_i$. The relationship between these variables is implemented using feedforward NNs. Three NNs, $N_o$, $N_f$, and $N_b$, are used to implement the BRNN. The output $O_i$ (Chen and Chaudhari, 2007) is as follows:

$$O_i = N_o(X_i, F_i, B_i) \tag{1}$$

The output $O_i$ depends on the input $X_i$ at position $i$, on the forward hidden context $F_i \in \mathbb{R}^n$, and on the backward hidden context $B_i \in \mathbb{R}^m$, where $m = n$ (Chen and Chaudhari, 2007). To obtain the composite $F_i$ and $B_i$, the BRNN equation is applied as follows:
