PEPPER - Help   


Description of PEPPER

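PEPPER identifies three types of periodic repeats: tandem peptide repeats (TPRs), periodic occurrences of single amino acids (POSAAs), and single amino acid repeats (SAARs). It places no limit on sequence size and supports batch submissions (uploads to the server are limited to 500 kb).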
The source code is freely available for download for larger batch submissions.

Explanation of Parameters:

1. Threshold, t:

For Identifying TPRs: In the first module, the query sequence is compared with itself using a sliding window of size 10. Threshold t defines the minimum number of exact matches required for two windows to be considered similar. The default value of t is 5, which allows up to 50% mismatch between windows.
For Identifying POSAAs and SAARs: Here, threshold t corresponds to (1 - fraction of mismatches) allowed in the final output, expressed in tenths. The default value, t = 8, tolerates 20% mismatch across the complete repeat region reported.
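The window comparison described for TPRs can be illustrated with a minimal Python sketch. It is not PEPPER's source code: the function names are hypothetical, and the all-against-all loop is the simplest way to show the idea, not necessarily how the tool is implemented. Only the window size (10) and the default threshold (t = 5) come from the text above.

```python
WINDOW = 10  # window size stated in the help text


def exact_matches(a: str, b: str) -> int:
    """Count positions at which two equal-length windows carry the same residue."""
    return sum(x == y for x, y in zip(a, b))


def similar_windows(seq: str, t: int = 5, window: int = WINDOW):
    """Yield (i, j, matches) for window pairs i < j with at least t exact matches.

    Hypothetical helper: compares every pair of windows in the sequence.
    """
    last_start = len(seq) - window
    for i in range(last_start + 1):
        for j in range(i + 1, last_start + 1):
            m = exact_matches(seq[i:i + window], seq[j:j + window])
            if m >= t:
                yield i, j, m
```

With t = 5, two windows that agree at exactly five of ten positions are still treated as similar; raising t makes the comparison stricter.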

2. Tuple size, k:

For Identifying TPRs: In the second module, to find the period (i.e., length) of the repeat pattern, a sliding window of size k is used (default k = 3). For finding larger repeat patterns, the value of k may be increased.
For Identifying POSAAs and SAARs: Tuple size is an internal parameter in the identification of POSAAs and SAARs and cannot be changed by the user.
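One common way to estimate a period from k-tuples is to record the distance between successive occurrences of the same tuple and take the most frequent distance. The sketch below assumes that heuristic for illustration; the help text does not state that PEPPER works this way, and `candidate_period` is a hypothetical name.

```python
from collections import Counter


def candidate_period(seq: str, k: int = 3):
    """Return the most frequent distance between repeated k-tuples, or None.

    Illustrative heuristic only; not taken from the PEPPER source.
    """
    last_seen = {}          # k-tuple -> start index of its latest occurrence
    distances = Counter()   # distance -> number of times it was observed
    for i in range(len(seq) - k + 1):
        tup = seq[i:i + k]
        if tup in last_seen:
            distances[i - last_seen[tup]] += 1
        last_seen[tup] = i
    if not distances:
        return None
    return distances.most_common(1)[0][0]
```

A larger k makes each tuple more specific, so chance matches become rarer. This is consistent with the advice to increase k when looking for larger repeat patterns.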

3. Copy Number, n:

For Identifying TPRs: For a given n, the program reports a repeat only if the pattern occurs at least n times at a given locus. The default, n = 4, reports all tandem repeats occurring 4 or more times contiguously at a given locus.
For Identifying POSAAs and SAARs: Here, n corresponds to the minimum number of occurrences of an amino acid at a certain period to be reported in the output. The default is n = 5.
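The effect of the copy number filter for TPRs can be sketched as follows. This simplified version counts only exact, contiguous copies and ignores the mismatch tolerance set by t; the function names and the 0-based `start` index are assumptions made for the example.

```python
def tandem_copies(seq: str, start: int, period: int) -> int:
    """Count contiguous exact copies of seq[start:start+period] beginning at start."""
    if period <= 0:
        return 0
    unit = seq[start:start + period]
    if len(unit) < period:
        return 0
    copies = 0
    pos = start
    # A short or empty slice at the end of the sequence never equals the unit,
    # so the loop always terminates.
    while seq[pos:pos + period] == unit:
        copies += 1
        pos += period
    return copies


def passes_copy_number(seq: str, start: int, period: int, n: int = 4) -> bool:
    """Report a locus only if the unit occurs at least n times contiguously."""
    return tandem_copies(seq, start, period) >= n
```

For the sequence AAPQRPQRPQRPQRGG, the unit PQR starting at index 2 occurs four times in a row, so the locus passes with the default n = 4 but would be dropped with n = 5.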


Sample Input Formats:

The program accepts input as a single file in FASTA format.

Single Sequence Input:
Batch Submission:

*For batch submissions, the size limit for uploading sequences to the server is 500 kb. For larger inputs, the tool can be downloaded and run locally.
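Before uploading, it can help to confirm that a file parses as FASTA and fits the server limit. The following Python sketch is a generic FASTA reader, not part of PEPPER. The function names are hypothetical, and reading "500 kb" as 500,000 bytes is an assumption.

```python
MAX_UPLOAD_BYTES = 500 * 1000  # assumption: "500 kb" read as 500,000 bytes


def parse_fasta(text: str):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fits_server_limit(text: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Check the encoded size of the submission against the upload limit."""
    return len(text.encode("utf-8")) <= limit
```

A single-sequence file yields one record and a batch file yields one record per ">" header line, with sequence lines joined across line breaks.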


Sample Output Files:

Output for Tandem Peptide Repeats:

The output of PEPPER displays the parameters used in the analysis and a one-line description of the protein sequence used. The result section includes the Consensus pattern; Count - the copy number of the repeat pattern; Period - the length of the repeat pattern; Start/End - the repeat region boundaries; and the alignment of the consensus pattern with the repeat region. In the alignment, the first line corresponds to the repeat region, while the second line corresponds to the consensus pattern. A '*' marks a mismatch in the repeat region, while a '-' marks an insertion or deletion.
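To show how mismatches relate to the consensus pattern, the sketch below tiles a consensus across a repeat region and marks the positions that differ with '*'. It is an illustration only. It does not reproduce PEPPER's output layout, it ignores insertions and deletions, and `mismatch_track` is a hypothetical name.

```python
def mismatch_track(region: str, consensus: str) -> str:
    """Return a string with '*' where region differs from the tiled consensus."""
    if not consensus:
        raise ValueError("consensus must not be empty")
    # Repeat the consensus until it covers the region, then trim to length.
    tiled = (consensus * (len(region) // len(consensus) + 1))[:len(region)]
    return "".join(" " if r == c else "*" for r, c in zip(region, tiled))
```

For the region PQRSTPQASTPQRST and the consensus PQRST, the track contains a single '*' at index 7, where A replaces R in the second copy.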


Output for Multi-Periodicity in Tandem Peptide Repeats:

A repeat region may exhibit more than one periodicity. PEPPER reports up to three periods per region. For each period detected, the output gives the Consensus pattern; Count - the copy number of the repeat pattern; Period - the length of the repeat pattern; Start/End - the repeat region boundaries; and the alignment of the consensus pattern with the repeat region.


Output for Periodic Occurrences of Single Amino Acids:

In this case, the output displays the parameters used for the analysis, followed by AA - the repeated amino acid; Start/End - the repeat region boundaries; Periodicity - the period of the repeat; and Count - the number of times the amino acid is repeated. The complete repeat region is displayed with the repeated amino acid in red and mismatches in blue.


Output for Single Amino Acid Repeats:

In this case, the output displays the parameters used for the analysis, followed by Pattern - the repeated amino acid; Count - the number of times the amino acid is repeated; Start/End - the boundaries of the repeat region; and the complete repeat region. A '*' marks a mismatch in the repeat region.
