dynactin p150 gene. Based on these facts and the found exon order of the genomic region, we expect the gene to be *trans*-spliced (Fig. 10, bottom).

**4. Conclusion**

Our algorithm provides a method to consistently predict and reconstruct tandemly arrayed gene duplicates. It has been integrated into the web interface of WebScipio, allowing the search for gene duplicates of a given query protein sequence in the respective genome assemblies. WebScipio provides access to more than 2300 genome assembly files from more than 650 eukaryotes (July 2011) and is updated as soon as further genome assemblies become available, whether from newer versions of already sequenced species or from newly sequenced genomes. The search results are presented in drawings coloured according to the sequence similarity of the gene duplicate to the search sequence, and in several human-readable formats such as detailed alignments of the found exons to the genomic DNA. Sequences and figures can be downloaded, as well as the complete raw data for later upload or further computational analysis. The new algorithm is based on the precondition that gene duplicates retain the gene structure of the original gene rather than its sequence. We could show that the new extension to WebScipio is able to correctly predict and reconstruct gene duplicates on both the forward and the reverse strand. The new algorithm is also able to correctly reconstruct complicated gene structures spread over hundreds of thousands of nucleotides, such as the skeletal muscle myosin heavy chain gene cluster in mammals. Gene duplicates often accumulate mutations that destroy gene function by introducing frame shifts and in-frame stop codons. Those potential pseudogenes are identified by WebScipio, but the user has to carefully inspect the results to distinguish between sequencing errors and real pseudogenes. WebScipio cannot distinguish between gene duplicates and duplications of small genomic regions that might encode several genes. In such cases, WebScipio can identify and reconstruct the duplicates of one gene but does not provide any hints about other genes in the intergenic regions.

*Trans*-spliced genes often contain clusters of alternative exons. Those clusters will be identified by WebScipio, but again the user needs to evaluate the results to distinguish between cases of *trans*-spliced genes, where the constitutive part is encoded by just a few exons, and real gene duplications, for which some terminal exons could not be identified because of very low sequence similarity or even assembly gaps. Altogether, WebScipio provides an easy-to-use way to analyse the genomic region of every gene of interest for the very common event of tandem gene duplication.

**5. Acknowledgments**

MK has been funded by grant KO 2251/6-1 of the Deutsche Forschungsgemeinschaft. We thank Björn Hammesfahr for fruitful discussions.

**6. References**

Adams, M. D., Celniker, S. E., Holt, R. A., Evans, C. A., Gocayne, J. D., Amanatides, P. G., Scherer, S. E., Li, P. W., Hoskins, R. A., Galle, R. F. et al. (2000). The genome sequence of Drosophila melanogaster, *Science*, Vol.287, No.5461, pp. 2185-2195

Aloni, R., Olender, T. & Lancet, D. (2006). Ancient genomic architecture for mammalian olfactory receptor clusters, *Genome Biol*, Vol.7, No.10, pp. R88


**5**

**The LRR and TM Containing Multi-Domain Proteins in Arabidopsis**

Felix Friedberg

*Howard University Medical School, Washington, DC, USA*

**1. Introduction**

Thousands of different multi-domain proteins, each the product of one separate specific gene, which exhibit one or multiple transmembrane (TM) domains, are expressed in Arabidopsis as well as in Homo sapiens (Sallman-Almen et al., 2009). These molecules may carry one or multiple TM domains very close to the amino or carboxyl terminus or even dispersed throughout the molecule. Thus one may conclude that the TM domain existed prior to the branching of plants and metazoa. In both organisms, a very small percentage of the TM domain containing proteins additionally possess multiple leucine-rich repeat (LRR) domains. These domains seem to transmit ligand perception. So one should presume that such domains also existed prior to the branching of these two forms of life. (A caveat, however, is in order: many of the combinations of the various domains which one finds in plants are not present in the same combination in animals, and vice versa.) As demonstrated in this paper, on several occasions during subsequent evolution the genes for these multi-domain proteins duplicated, and during this process they were often altered slightly to allow generation of proteins that could provide new specific functions (i.e., they underwent neofunctionalization).

Unlike the "adaptive immune" system, which exists in animals but not in plants, an "innate immune" system is present in all multicellular organisms (animals and plants). This latter system operates by way of receptors: the Toll-like receptors (TLRs) (first identified in Drosophila), which bind lipopolysaccharides (endotoxins). In Drosophila, these receptors not only activate innate immunity; they also act in dorsal-ventral specification. When one compares these molecules as they are encoded, e.g., in Arabidopsis (Dangl & Jones, 2001) with those present in Homo sapiens, one finds that in animals, but not in plants, these receptor proteins possess a cysteine-rich domain just prior to entering the membrane and also a signaling domain (which is not a protein kinase but which acts as a docking site) on the backside of the TM domain. TLRs target "pathogen associated molecular patterns" (PAMPs) by way of the LRR domains.

There are, however, besides TLRs, many other multi-domain proteins that contain both TM and LRR domains, encoded in Arabidopsis and also in Homo sapiens. Many exist in both of these species, but some of them are present in only one or the other. Below we list more than a hundred of these various proteins (each expressed from an individual gene) present in Arabidopsis. The TM domain is about 22 amino acid (AA) residues long, and the LRR is a domain of about 20-29 residues (which contains about 6 Leu). Both domains are present in proteins that

