MalSig

MalSig - Malaria Secretory Signal Predictions

During intra-erythrocytic growth of the malaria parasite, a number of parasite-encoded proteins are exported from the parasite and a sub-set is trafficked to the erythrocyte cytoplasm. The putative signal sequences of some exported parasite proteins are unusual in that they are recessed from the N-terminus, with hydrophobic cores beginning 20-80 amino acids from the N-terminus.

MalSig has been developed by Erica Logan in collaboration with Prof Leann Tilley and Dr Richard Hall. It uses a fuzzy logic system that analyses the first 100 residues of a sequence and makes predictions based on hydrophobicity to identify classical and non-canonical recessed secretory signals of Plasmodium falciparum proteins. Enter one or more sequences in FASTA format into the text-box below. The output represents the likelihood of the protein containing a classic signal, a recessed signal and a transmembrane domain. Prediction values range from 0 to 1. A prediction of 0.5 or more is considered significant and is highlighted. For more information on this algorithm please click here.

Enter sequences in FASTA format:


 
OR

Enter FASTA file name:  

Advanced Options:
For more information please see the Help section in the accordion below.

    obtain PSORT prediction
    obtain SignalP prediction
    obtain PlasmoAP prediction
    use entire sequence, or cleave N-terminus (before hydrophobic region), for external predictions

     
  • Links

    Useful web resources for Plasmodium falciparum

    PlasmoDB: the Plasmodium Genome Resource

    PlasmoAP: apicoplast target peptide predictions for Plasmodium falciparum

    PATS: apicoplast target peptide predictions for Plasmodium falciparum

    PlasMit: mitochondria target peptide predictions for Plasmodium falciparum

    PSORT: protein sorting signal and localisation predictions

    SignalP: signal peptide cleavage site predictions

    TMHMM: transmembrane helix predictions

  • References

    MalSig has been developed by Erica Logan, Nick Klonis, Sue Herd, Leann Tilley, Richard Hall, La Trobe University, Australia.

    A manuscript describing the development and use of this algorithm has been published:
    Logan E, Hall R, Klonis N, Herd S, and Tilley L. (2004)
    Fuzzy classification of secretory signals in proteins encoded by the Plasmodium falciparum genome.
    In: Knowledge-based Intelligent Information & Engineering Systems Proceedings (KES 2004):
    Lecture Notes in Artificial Intelligence. (Negoita, M., Howlett, R.J., Jain, L.C., eds.), Springer-Verlag, pp. 1023-1029. http://www.springerlink.com/index/3HW804QXUHDRXK36

    Any work using predictions made by this prediction server incorporating PSORT, SignalP, or PlasmoAP should cite the following references:

    PSORT
    Nakai K, and Horton P. (1999) PSORT: a program for detecting the sorting signals of proteins and predicting their subcellular localization. Trends Biochem Sci 24(1): 34-35.

    SignalP Neural Network (SignalP-NN)
    Nielsen H, Engelbrecht J, Brunak S and von Heijne G. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Engineering 10(1): 1-6.

    SignalP Hidden Markov Model (SignalP-HMM)
    Nielsen H and Krogh A. (1998) Prediction of signal peptides and signal anchors by a hidden Markov model. In Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB 6), AAAI Press, Menlo Park, California, 122-130.

    PlasmoAP
    Foth BJ, Ralph SA, Tonkin CJ, Struck NS, Fraunholz M, Roos DS, Cowman AF, McFadden GI. (2003) Dissecting Apicoplast Targeting in the Malaria Parasite Plasmodium falciparum. Science 299(5606): 705-708.
  • Help

    Instructions

    • Enter one or more sequences in FASTA format (this should include an identification line beginning with ">" and followed by the protein amino acid sequence). The sequences can be typed in, or a file containing FASTA sequences can be uploaded, using the "Browse..." button.   e.g.
      >myFirstProtein
      mmaarefdlnssslhhhrsdemmmaalalalaalhrdrdrdhdhdhdhd

      Sequences should begin with M. Because this program aims to predict classical and non-canonical secretory signals at or near the N-terminus of a protein, it is important that the N-terminal region of the protein sequence is present.

      Any characters within the protein sequence that are not contained in the 20 amino acid single letter abbreviations 'acdefghiklmnpqrstvwy' will be ignored by the program (this includes numbers and spaces).
    • Select any advanced options required.
    • Click on the "Submit sequences" button.
    • A results table will be displayed showing the predictions made for your protein sequences. Note that this may take a little time if the advanced options are chosen or many sequences are entered.
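
The input rules above can be sketched in a few lines of Python. This is only an illustration, not the server's own code: the function names `parse_fasta` and `starts_with_met` are hypothetical, and treating upper-case letters as their lower-case equivalents is an assumption.

```python
# The 20 single-letter amino acid codes accepted by the server.
VALID_RESIDUES = set("acdefghiklmnpqrstvwy")


def parse_fasta(text):
    """Return a list of (identifier, sequence) pairs from FASTA text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new identification line closes the previous record.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            # Keep only the 20 standard codes; digits, spaces and any
            # other characters are dropped. Lower-casing is an assumption.
            chunks.append("".join(c for c in line.lower() if c in VALID_RESIDUES))
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def starts_with_met(sequence):
    """True if the cleaned sequence begins with methionine (M)."""
    return sequence.startswith("m")
```

For example, `parse_fasta(">myFirstProtein\nmma 12are")` yields one record whose sequence is `mmaare`, with the space and digits removed.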

    Advanced Options

    External predictions

    As another source of information about possible secretory signals, additional predictions can be obtained from PSORT, SignalP and PlasmoAP. The protein sequences are sent to these external sites, and cleavage sites (or, in the case of PlasmoAP, apicoplast target peptide predictions) are returned for each sequence and displayed in a table alongside the fuzzy predictions. This increases the reliability of the predictions made, but also increases the time the web page takes to display them.

    N-terminal cleavage

    Recessed secretory signals are not predicted well by standard secretory signal prediction algorithms. To improve these predictions, the N-terminal region before the hydrophobic region can be removed before the sequence is sent to the external prediction servers. Only use this option if you are interested in the presence of recessed secretory signals in your protein sequences.

    Explanation of Output

    Fuzzy Logic Predictions

    Four columns are present in the output table from the fuzzy logic prediction. The first three represent the likelihood of the protein containing a classic signal, recessed signal, and transmembrane domain. These range from 0 to 1. A prediction of 0.5 or more is considered significant, and when these occur they are highlighted. These three predictions are summarised textually in the column titled "Prediction".
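
As a rough illustration of how the three scores relate to the "Prediction" column, the sketch below applies the 0.5 significance threshold and builds a short text summary. The function name `summarise` and the exact wording of the summary are hypothetical; the server's actual text may differ.

```python
SIGNIFICANT = 0.5  # threshold stated above: 0.5 or more is significant


def summarise(classic, recessed, transmembrane):
    """Build a textual summary from the three fuzzy scores (each 0 to 1)."""
    labelled = [
        ("classic signal", classic),
        ("recessed signal", recessed),
        ("transmembrane domain", transmembrane),
    ]
    # Collect the names of all predictions at or above the threshold.
    hits = [name for name, score in labelled if score >= SIGNIFICANT]
    return ", ".join(hits) if hits else "no significant prediction"
```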

    Low Complexity and Hydrophobicity

    The low complexity scale represents the percentage of low complexity regions, identified using SEG (Wootton and Federhen, 1996). The average hydrophobicity is calculated using the sum of the hydrophobicity values (from the Kyte and Doolittle scale) after removing membrane-spanning regions and signal sequences. These scales were used in our analysis to identify potential antigens.

    External Predictions

    Cleavage site predictions are made by PSORT and SignalP. SignalP makes two predictions, one using neural networks (NN) and one using a hidden Markov model (HMM). The values presented for these cleavage sites are either amino acid positions or one of the symbols - and *, which indicate a problem in collecting results from the prediction server. In some cases SignalP doesn't predict a cleavage site. For the NN this results in a blank output or a -, and for the HMM a -1 is displayed.
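
When post-processing a results table, the cell values described above could be interpreted along the following lines. This is a hedged sketch: the function name `interpret_cleavage`, the method labels and the returned status strings are all hypothetical, and a - in the NN column is ambiguous (no site predicted, or a collection problem), so it is treated here as "no site" by assumption.

```python
def interpret_cleavage(value, method):
    """Classify one cleavage-site cell.

    method is 'PSORT', 'NN' or 'HMM' (hypothetical labels).
    Returns a (status, position) pair; position is None unless a site
    was reported.
    """
    text = value.strip()
    if text == "*":
        return ("unavailable", None)
    if method == "HMM" and text == "-1":
        # SignalP-HMM reports -1 when no cleavage site is predicted.
        return ("no site", None)
    if text in ("", "-"):
        # Assumption: for SignalP-NN a blank or '-' means no site was
        # predicted; for other columns '-' means the result could not
        # be collected.
        return ("no site" if method == "NN" else "unavailable", None)
    return ("site", int(text))
```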

    PlasmoAP Symbol Definition

    PlasmoAP makes apicoplast targeting predictions in terms of ++, +, 0 and -. These are defined as:
    ++   very likely apicoplast
    +     likely apicoplast
    0     unknown
    -     unlikely apicoplast

    Errors

    If there is a problem with the input, such as the sequence not beginning with M, * will be displayed in the fuzzy predictions and the error message will be in the prediction column.

    If there is a problem with predictions from an external source, - or * will be displayed in the column for that prediction. For proteins where this occurs, try going to the web page of the external program and running your query from there (for the address see the Links page).

    System details

    This prediction server uses a fuzzy logic algorithm with Mamdani-style fuzzy inference (Mamdani and Assilian, 1975). The system is composed of six membership functions, three for input and three for output, and 20 rules. It uses centroid defuzzification.
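
To show what Mamdani-style inference with centroid defuzzification involves, here is a deliberately small toy example. It is not the MalSig rule base: it uses one input (the start position of the hydrophobic region), two made-up rules and hypothetical ramp-shaped membership functions, whereas the real system has three inputs, three outputs and 20 rules whose shapes are not given here.

```python
def rising(x, lo, hi):
    """Ramp membership: 0 at or below lo, 1 at or above hi."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def falling(x, lo, hi):
    """Complement of the rising ramp."""
    return 1.0 - rising(x, lo, hi)


def mamdani_score(start, steps=101):
    """Toy Mamdani inference for a 'recessed signal' score in [0, 1].

    Hypothetical rules:
      IF start is early THEN score is low
      IF start is late  THEN score is high
    """
    # Fuzzification of the crisp input (break points 10 and 40 are made up).
    early = falling(start, 10, 40)
    late = rising(start, 10, 40)

    # Discretised output universe from 0 to 1.
    ys = [i / (steps - 1) for i in range(steps)]

    # Implication with min, aggregation of rule outputs with max.
    aggregated = [
        max(min(early, falling(y, 0.0, 1.0)), min(late, rising(y, 0.0, 1.0)))
        for y in ys
    ]

    # Centroid defuzzification: membership-weighted mean of the output.
    area = sum(aggregated)
    if area == 0:
        return 0.5
    return sum(y * m for y, m in zip(ys, aggregated)) / area
```

With these made-up settings, a hydrophobic region starting well past residue 40 gives a score above 0.5, and one starting at the N-terminus gives a score below 0.5.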

    The fuzzy logic prediction is made via a stand-alone fuzzy C-file provided by MATLAB, and Python is used to process sequences, obtain predictions, and present results.

    When a protein sequence is entered the program first generates a hydrophobicity plot and uses this to calculate values for where the hydrophobic region starts, its length, and the maximum value of hydrophobicity in the region. This uses a Kyte and Doolittle hydropathy plot with window size 15 (Kyte and Doolittle, 1982). These values are then used as inputs to generate a prediction using fuzzy logic.
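
The hydropathy step can be sketched as follows, using the published Kyte and Doolittle values and a window of 15. The way the hydrophobic region is located is an assumption made for illustration: the threshold of 1.6, the function names, and the definition of start and length in terms of window positions are hypothetical and may not match the server's implementation.

```python
# Kyte and Doolittle (1982) hydropathy values for the 20 amino acids.
KD = {
    "a": 1.8, "r": -4.5, "n": -3.5, "d": -3.5, "c": 2.5,
    "q": -3.5, "e": -3.5, "g": -0.4, "h": -3.2, "i": 4.5,
    "l": 3.8, "k": -3.9, "m": 1.9, "f": 2.8, "p": -1.6,
    "s": -0.8, "t": -0.7, "w": -0.9, "y": -1.3, "v": 4.2,
}


def hydropathy_profile(seq, window=15):
    """Mean Kyte-Doolittle hydropathy for each sliding window."""
    values = [KD[c] for c in seq.lower()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydrophobic_region(seq, window=15, threshold=1.6, limit=100):
    """Return (start, length, peak) for the first hydrophobic run.

    Only the first `limit` residues are analysed. `start` is the index
    of the first window whose mean reaches `threshold` (hypothetical
    cut-off), `length` is the number of consecutive such windows, and
    `peak` is the highest window mean in that run. Returns None if no
    window reaches the threshold.
    """
    profile = hydropathy_profile(seq[:limit], window)
    start = next((i for i, v in enumerate(profile) if v >= threshold), None)
    if start is None:
        return None
    end = start
    while end < len(profile) and profile[end] >= threshold:
        end += 1
    return start, end - start, max(profile[start:end])
```

The three returned values correspond to the three inputs named above: where the hydrophobic region starts, its length, and its maximum hydrophobicity.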

    References

    • Kyte J, Doolittle RF. (1982) A Simple Method for Displaying the Hydropathic Character of a Protein. Journal of Molecular Biology 157: 105-132.
    • Mamdani EH, Assilian S. (1975) An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies 7: 1-13.
    • Wootton JC, Federhen S. (1996) Analysis of compositionally biased regions in sequence databases. Methods in Enzymology 266: 554-571.