- Authors : Raazia Rahim, Nijil Raj N
- Paper ID : IJERTCONV5IS16010
- Volume & Issue : NCETET – 2017 (Volume 5 – Issue 16)
- Published (First Online): 24-04-2018
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Method for Predicting Transmembrane Helices in Protein Sequences
Raazia Rahim (PG Scholar), Department of Computer Science and Engineering
Younus College of Engineering and Technology, Pallimukku, Kollam 691010
Prof. Nijil Raj N
Head of the Department, Department of Computer Science and Engineering
Younus College of Engineering and Technology, Pallimukku, Kollam 691010
Abstract: The growing number of protein sequences from genome projects calls for theoretical methods to predict transmembrane helical segments (TMHs). Several prediction methods have been reported, but they still have shortcomings in prediction accuracy and adaptability. Here, a method based on the discrete wavelet transform (DWT) is developed to predict the number and location of TMHs in membrane proteins. Eighty proteins with known 3D structures, containing 325 TMHs, were chosen at random from the MPtopo database as the data set, and TMH prediction on these membrane protein sequences gave satisfactory results. To verify the feasibility of the method, the 80 membrane protein sequences were treated as test sets: 308 TMHs were predicted, with a prediction accuracy of 96.3%. Compared with other prediction results, the results indicate that the proposed method has higher prediction accuracy.
Index Terms: Membrane Protein, Transmembrane Helices, Hydrophobicity.
INTRODUCTION
A transmembrane protein (TP) is a type of integral membrane protein that spans the entire biological membrane to which it is permanently attached. Many transmembrane proteins act as gateways that allow specific substances to be transported across the membrane, and they undergo significant conformational changes to move a substance through it.
Transmembrane proteins are polytopic proteins that aggregate and precipitate in water. They require detergents or nonpolar solvents for extraction, although some of them (beta-barrels) can also be extracted using denaturing agents.
The other type of integral membrane protein is the integral monotopic protein that is also permanently attached to the cell membrane but does not pass through it.
Knowledge of membrane protein function has expanded greatly, and further study of membrane proteins can provide a breakthrough in understanding protein structure and function and the genetic information in DNA sequences. Exploring the relationship between membrane protein structure and function, and understanding the mechanisms by which membrane proteins act in living systems, requires bioinformatics methods and techniques. About 20%-30% of the proteins encoded in a genome are membrane proteins [1-3], a proportion that reflects their importance in biology. Membrane proteins, especially transmembrane proteins, have essential roles in organisms, such as photosynthesis, respiration, neural signaling, immune response, and nutrient absorption, and they are also important drug targets: about 70% of known drug targets and targets under investigation are membrane proteins [4]. This paper introduces a method for predicting the transmembrane helices (TMHs) in protein sequences. The method uses the hydrophobicity values of the amino acid sequence together with a wavelet transform, and we expect it to achieve better prediction accuracy than existing methods.
Fig. 1. Schematic representation of transmembrane proteins.
LITERATURE SURVEY
Many algorithms for predicting transmembrane helical segments (TMHs) in membrane proteins have been proposed. In 1982, Kyte and Doolittle first proposed a hydrophobicity analysis method for membrane protein sequences [5]. In 1986, von Heijne put forward the well-known positive-inside rule to guide prediction [6]. SOSUI [7] and PRED-TMR [8] are based on these two approaches.
In recent years, further methods have been developed, including the statistical methods DAS [9] and TMAP [10], the neural-network method PHDhtm [11, 12], TMHMM [1, 13] and HMMTOP [14, 15], which are based on hidden Markov models, and MEMSAT-SVM, which is based on support vector machines [16, 17].
The wavelet transform was first introduced into bioinformatics research in 1996 [18] and immediately attracted wide attention [19]. Lio et al. [20] proposed a non-parametric method based on a wavelet data-dependent threshold technique for change-point analysis, which was applied to predict TMHs in membrane proteins. Qiu et al. presented the continuous wavelet transform (CWT) for predicting the number and location of helices in membrane proteins, and Pashou et al. applied a dynamic programming algorithm to wavelet-denoised hydropathy signals to determine membrane-spanning segments.
Here we exploit the hydrophobicity of amino acids and the multiresolution property of the discrete wavelet transform (DWT): the hydrophobicity profile of a TM protein sequence is decomposed into components at different resolution levels, and the locations of TMHs are then predicted from the sequence information at these different scales.
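The multiresolution decomposition described above can be illustrated with a short Python sketch. The paper uses the db10 wavelet in MATLAB; to keep the example self-contained and free of external libraries, this sketch swaps in the simpler orthonormal Haar wavelet, so its coefficients will differ from a db10 decomposition. The function names `haar_dwt_step` and `haar_wavedec` are hypothetical, the odd-length padding rule is an assumption, and the output ordering `[cA_j, cD_j, ..., cD_1]` merely mirrors a common convention.

```python
import math


def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail). An odd-length input is padded by
    repeating its last sample (a simple boundary choice assumed here).
    """
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_wavedec(signal, level):
    """Multilevel Haar decomposition.

    Returns [cA_level, cD_level, ..., cD_1]: the coarsest approximation
    followed by detail coefficients from coarsest to finest scale.
    """
    approx = list(signal)
    details = []
    for _ in range(level):
        approx, d = haar_dwt_step(approx)
        details.append(d)
    return [approx] + details[::-1]
```

Each level halves the number of approximation coefficients, so at level j = 4 one coefficient summarizes a block of 16 residues, which is on the order of a single TMH. With PyWavelets installed, `pywt.wavedec(profile, "db10", level=4)` would give a db10 decomposition closer to the paper's setting.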
MATERIALS AND METHOD
TABLE I
MEMBRANE PROTEIN FAMILIES USED FOR PREDICTION
Family Name | PDB Codes
Bacteriorhodopsin | 1ap9
ABC transporters | 1jsq, 1l7vA, 1pf4
Channel proteins | 1fqyA, 1fx8A, 1msl, 1mxm, 1oedA, 1oedB, 1oedC, 1oedE, 1p7b, 1rc2A, 1rhzA, 1rhzB
Cytochrome bc1 complexes | 1bgyE, 1bgyJ, 1bgyK
Cytochrome b6f complexes | 1um3A, 1um3B, 1um3D, 1um3F, 1um3G, 1um3H
Cytochrome c oxidases | 1ehkA, 1ehkB, 1ehkC, 1occA, 1occB, 1occC, 1occD, 1occG, 1occI, 1occJ, 1occK, 1occL, 1occM, 1qleA, 1qleB, 1qleC, 1qleD
Glycophorin | 1afoA
Light-harvesting complexes | 1kzuA, 1lghA
Photosynthetic reaction centers | 1eysH, 1eysL, 1eysM, 1prcH, 1prcL, 1prcM, 2rcrL, 2rcrM
Photosystems | 1jboA, 1jboB, 1jboF, 1jboI, 1jboJ, 1jboK
Respiratory proteins | 1a91C, 1fftA, 1fftB, 1fftC, 1fumC, 1kqgB, 1kqgC, 1lovD, 1nekC, 1nekD, 1okcA, 1q16C, 1qlaC
Rhodopsins | 1f88, 1psB, 1p8A
Translocation proteins | 1pw4A, 1s7b, 2cpb
Materials
The test data set was collected from the MPtopo database, which contains membrane protein structures that can be treated as reliable samples. The test data set consists of 80 protein sequences with known 3D structures. The data can be obtained from http://blanco.biomol.uci.edu/mptopo [23].
Method
A key feature of protein structure is the balance between hydrophobic and hydrophilic residues, and structural stability depends heavily on hydrophobic effects [21, 22]. The hydrophobicity value of each amino acid is mainly calculated from its distribution coefficient. Because many hydrophobicity scales exist, mapping a protein's amino acid sequence onto a sequence of hydrophobicity values requires choosing a suitable scale. Here we use the Kyte-Doolittle (KD) hydrophobicity values [5].
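The mapping from residues to KD values can be sketched in a few lines of Python. The dictionary holds the standard Kyte-Doolittle hydropathy values for the 20 amino acids; the name `to_hydrophobicity` is hypothetical, and raising an error on non-standard codes (X, B, Z, and so on) is an assumption, since the text does not say how such residues are handled.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_hydrophobicity(sequence, scale=KD_SCALE):
    """Map a one-letter amino acid sequence to a list of hydropathy values.

    Lowercase letters are accepted; codes missing from `scale` raise a
    KeyError, because their treatment is not specified in the text.
    """
    return [scale[aa] for aa in sequence.upper()]
```

The resulting list of floats is the signal on which the wavelet decomposition operates.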
To predict the TMHs of a membrane protein sequence, selecting a suitable wavelet basis function and threshold is essential. The threshold is chosen as the value that gives the maximum average prediction accuracy on the training set, and this threshold is then used to predict TMHs in the membrane protein sequences of the test set.
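Choosing the threshold that maximizes average training accuracy amounts to a search over candidate values. The sketch below assumes a hypothetical callback `accuracy(threshold, item)` that runs the prediction for one training protein and returns its accuracy; the candidate grid and the tie-breaking rule (first candidate wins) are illustrative choices, not details from the paper.

```python
from statistics import mean


def select_threshold(candidates, training_set, accuracy):
    """Return (threshold, score) with the highest mean training accuracy.

    `accuracy(threshold, item)` is a hypothetical callback returning the
    prediction accuracy for one training protein at that threshold.
    Ties are resolved in favour of the earliest candidate.
    """
    best_t, best_score = None, float("-inf")
    for t in candidates:
        score = mean(accuracy(t, item) for item in training_set)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

A fine grid such as `[i / 1000 for i in range(2001)]` would resolve a threshold to three decimal places, matching the precision of the reported value.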
The procedure is as follows:
1. Convert the 80 membrane protein amino acid sequences into sequences of hydrophobicity values, using the hydrophobicity value of each amino acid.
2. Randomly select the training set from the 80 protein sequences; the remaining sequences form the test set.
3. Analyze the training data to determine the wavelet function.
4. Apply the discrete wavelet transform (DWT) to obtain the wavelet coefficients and determine the optimal threshold.
5. Predict the TMHs of the test-set samples, and compute statistics on prediction accuracy by comparison with experimental data.
The method was implemented in MATLAB for convenience.
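The text does not spell out how the threshold turns a wavelet approximation into TMH positions. One plausible reading, sketched here in Python rather than MATLAB and labeled as an assumption, is to stretch the level-j approximation back onto residue positions and report maximal runs that exceed the threshold. The sketch assumes Haar approximation coefficients (not db10), and the names `upsample_approximation` and `find_segments`, along with the `min_length` filter, are hypothetical.

```python
def upsample_approximation(approx, level, length):
    """Map level-`level` Haar approximation coefficients back onto residues.

    With the orthonormal Haar wavelet, coefficient k summarizes residues
    k*2**level .. (k+1)*2**level - 1; dividing by 2**(level/2) turns it into
    the mean hydropathy of that block (exactly, when no padding was needed).
    """
    block = 2 ** level
    norm = 2 ** (level / 2)
    return [approx[min(i // block, len(approx) - 1)] / norm for i in range(length)]


def find_segments(profile, threshold, min_length=1):
    """Return maximal runs where profile[i] > threshold.

    Segments are (start, end) pairs, 0-based with `end` exclusive; runs
    shorter than `min_length` residues are discarded.
    """
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(profile) - start >= min_length:
        segments.append((start, len(profile)))
    return segments
```

`min_length` is a free parameter of this sketch; a value near typical helix lengths would be a natural choice. The threshold of 0.836 reported in the paper belongs to its own MATLAB pipeline and scaling and should not be assumed to carry over to this sketch.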
Evaluation of Results
Because of the limitations of experimental conditions, a predicted TMH is regarded as correct when more than half of it coincides with an observed TMH. Statistically, the average length of a TMH is 20 amino acids (a.a.). In this method, a predicted TMH is counted as correct when at least 9 consecutive residues lie within an observed TMH. The TMH prediction accuracy is
$Q_p = M \times C \times 100\%$,
where $M = N_{cor}/N_{obs}$ ($N_{cor}$ is the number of correctly predicted TMHs and $N_{obs}$ the number of observed TMHs) can be regarded as a measure of sensitivity, and $C = N_{cor}/N_{prd}$ ($N_{prd}$ is the total number of predicted TMHs) can be regarded as a measure of specificity.
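The segment-level evaluation can be sketched in Python. This sketch reads the evaluation formula as the product M × C × 100 (an assumption about how the extracted formula should be read), uses half-open (start, end) residue intervals, and lets each observed TMH be matched by at most one prediction; that one-to-one rule is also an assumption, and the function names are hypothetical.

```python
def overlap(a, b):
    """Number of residues shared by half-open segments a and b."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def evaluate(predicted, observed, min_overlap=9):
    """Return (N_cor, M, C, Qp) for one set of predicted vs observed TMHs.

    A predicted TMH is correct if it shares at least `min_overlap`
    consecutive residues with a not-yet-matched observed TMH (the overlap
    of two intervals is always contiguous). Qp is taken as M * C * 100.
    """
    matched = set()
    n_cor = 0
    for p in predicted:
        for j, o in enumerate(observed):
            if j not in matched and overlap(p, o) >= min_overlap:
                matched.add(j)
                n_cor += 1
                break
    m = n_cor / len(observed) if observed else 0.0
    c = n_cor / len(predicted) if predicted else 0.0
    return n_cor, m, c, m * c * 100
```

For example, with predictions `[(0, 20), (30, 40), (60, 70)]` and observations `[(5, 25), (35, 55)]`, only the first prediction overlaps an observed helix by 9 or more residues, so M = 1/2 and C = 1/3.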
For the KD hydrophobicity sequences, db10 (the Daubechies wavelet with 10 vanishing moments) was found to be the optimal wavelet basis. At decomposition level j = 4, the optimal threshold was obtained for each group of the training set. On the test set, using the threshold 0.836 gives a maximum average TMH prediction accuracy of 95.8% and a maximum average per-residue prediction accuracy of 83.1%.
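The per-residue accuracy quoted above is not defined in the text. One common definition, assumed here rather than taken from the paper, is the fraction of residues whose TM/non-TM label agrees with the observed topology. The Python sketch below uses half-open (start, end) segments, and the function names are hypothetical.

```python
def residue_labels(segments, length):
    """Binary label per residue: 1 inside any TMH segment, 0 elsewhere."""
    labels = [0] * length
    for start, end in segments:
        for i in range(max(0, start), min(end, length)):
            labels[i] = 1
    return labels


def per_residue_accuracy(predicted, observed, length):
    """Fraction of residues whose predicted TM/non-TM label matches the
    observed label (an assumed definition of per-residue accuracy)."""
    pred = residue_labels(predicted, length)
    obs = residue_labels(observed, length)
    return sum(p == o for p, o in zip(pred, obs)) / length
```

Because this measure also credits correctly labeled non-TM residues, it is usually lower than the segment-level score when helix boundaries are only approximately located.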
Fig. 3. Prediction accuracy for each group of the training and test sets using KD hydrophobicity values.
Fig. 4. ROC curve of the proposed method
CONCLUSION
This paper introduced a method for predicting transmembrane helices (TMHs) in protein sequences that uses the hydrophobicity values of the amino acid sequence and a wavelet transform, with the aim of achieving better prediction accuracy than existing methods.
Although the proposed method is simple, visually interpretable, and accurate, analysis of the predicted results on the data set shows that the position and number of the predicted TMHs still differ in some cases from the actual structure of the membrane protein. There are two kinds of error: over-prediction, in which the positions and number of the predicted TMHs do not fully correspond to the actual structure, and under-prediction, in which not all of the TMHs in the actual structure are predicted. The reasons are as follows. (1) When applying the wavelet transform, we only map the amino acid sequence of the membrane protein to a sequence of hydrophobicity values. Although the hydrophobic effect is the most important factor determining the stability of protein structure, it is not the only one: hydrogen bonds, ionic bonds, van der Waals forces, and disulfide bonds in the peptide chain also contribute. (2) The volume and electric charge of protein molecules, among many other factors, also regulate protein structure and stability. (3) According to the signal peptide hypothesis, the signal peptide can form a TMH during protein synthesis, helping the peptide chain cross the endoplasmic reticulum (ER) membrane, so it is normal for predicted TMHs to include signal peptides. Moreover, because hydrophobicity is the main sequence characteristic of transmembrane helices, long hydrophobic stretches in the hydrophobic core of water-soluble globular proteins can also produce false positives. Given these many factors, some deviation is acceptable when the position and number of membrane protein TMHs are predicted by mapping the amino acid sequence to hydrophobicity values. Taking these factors into account could further improve prediction accuracy.
REFERENCES
[1] Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001; 305: 567-580.
[2] Liu J, Rost B. Comparing function and structure between entire proteomes. Protein Sci. 2001; 10: 1970-1979.
[3] Kihara D, Shimizu T, Kanehisa M. Prediction of membrane proteins based on classification of transmembrane segments. Protein Eng. 1998; 11: 961-970.
[4] Yildirim MA, Goh KI, Cusick ME, Barabasi AL, Vidal M. Drug-target network. Nat Biotechnol. 2007; 25: 1119-1126.
[5] Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982; 157: 105-132.
[6] von Heijne G. The distribution of positively charged residues in bacterial inner membrane proteins correlates with the transmembrane topology. EMBO J. 1986; 5: 3021-3027.
[7] Hirokawa T, Boon-Chieng S, Mitaku S. SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics. 1998; 14: 378-379.
[8] Pasquier C, Promponas VJ, Palaios GA, Hamodrakas JS, Hamodrakas SJ. A novel method for predicting transmembrane segments in proteins based on a statistical analysis of the SwissProt database: the PRED-TMR algorithm. Protein Eng. 1999; 12: 381-385.
[9] Cserzo M, Wallin E, Simon I, von Heijne G, Elofsson A. Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng. 1997; 10: 673-676.
[10] Persson B, Argos P. Prediction of transmembrane segments in proteins utilizing multiple sequence alignments. J Mol Biol. 1994; 237: 182-192.
[11] Rost B, Casadio R, Fariselli P. Topology prediction for helical transmembrane segments at 86% accuracy. Protein Sci. 1996; 5: 1704-1718.
[12] Rost B, Casadio R, Fariselli P, Sander C. Prediction of helical transmembrane proteins at 95% accuracy. Protein Sci. 1995; 4: 521-533.
[13] Sonnhammer EL, von Heijne G, Krogh A. A hidden Markov model for predicting transmembrane helices in protein sequences. Proc Int Conf Intell Syst Mol Biol. 1998; 6: 175-182.
[14] Tusnady GE, Simon I. Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J Mol Biol. 1998; 283: 489-506.
[15] Tusnady GE, Simon I. Topology of membrane proteins. J Chem Inf Comput Sci. 2001; 41: 364-368.
[16] Nugent T, Jones DT. Transmembrane protein topology prediction using support vector machines. BMC Bioinformatics. 2009; 10: 159.
[17] Nugent T, Jones DT. Detecting pore-lining regions in transmembrane protein sequences. BMC Bioinformatics. 2012; 13: 169.
[18] Altaiski M, Mornev M, Polozov R. Wavelet analysis of DNA sequence. Genet Anal. 1996; 12: 165-168.
[19] Hirakawa H, Muta S, Kuhara S. The hydrophobic cores of proteins predicted by wavelet analysis. Bioinformatics. 1999; 15: 141-148.
[20] Lio P, Vannucci M. Wavelet change-point prediction of transmembrane proteins. Bioinformatics. 2000; 16: 376-382.
[21] Eisenberg D, McLachlan AD. Solvation energy in protein folding and binding. Nature. 1986; 319: 199-203.
[22] Huang DS, Zhao XM, Huang GB, Cheung YM. Classifying protein sequences using hydropathy blocks. Pattern Recognition. 2006; 39: 2293-2300.
[23] Ikeda M, Arai M, Okuno T, Shimizu T. TMPDB: a database of experimentally-characterized transmembrane topologies. Nucleic Acids Res. 2003; 31: 406-409.