Table of Contents

  1. COUGER input files
  2. COUGER feature sets
  3. COUGER run
  4. COUGER output

COUGER input files

The primary input for COUGER consists of two sets of genomic regions bound by (paralogous) TFs, each in a separate file. Several file formats are supported, including FASTA and narrowPeak.

The sequences in each input file have to be sorted by a measure of their significance (e.g. p-value).
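As an illustration of the sorting requirement, the following minimal Python sketch orders narrowPeak records by significance. It assumes tab-separated narrowPeak lines, in which the eighth column holds -log10(p-value), and it assumes that "sorted" means most significant first. The function name sort_by_significance is hypothetical and is not part of COUGER.

```python
def sort_by_significance(lines):
    """Return narrowPeak lines ordered from most to least significant.

    Assumption: tab-separated narrowPeak records, where column 8
    (index 7) is -log10(p-value), so larger values are more significant.
    """
    records = [
        line.rstrip("\n").split("\t")
        for line in lines
        if line.strip() and not line.startswith("#")
    ]
    # Python's sort is stable, so ties keep their original relative order.
    records.sort(key=lambda fields: float(fields[7]), reverse=True)
    return ["\t".join(fields) for fields in records]
```

Note that a narrowPeak file may store -1 in this column when no p-value was assigned. In that case another measure of significance would have to be used for sorting.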

COUGER processes each type of input file slightly differently. For FASTA files, COUGER skips the first step, since the classes are considered predetermined, and reads only the nucleotide sequences. For all other input formats, COUGER starts with Step 1, removing from each set the sequences that overlap any sequence in the other set. In addition, if narrowPeak files are uploaded, COUGER focuses the search for putative co-factor binding sites by trimming each peak to +/-100 bases centered at the peak summit, which reduces the running time and improves the results. For this reason, narrowPeak is the recommended format for the input sets of sequences.
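The two preprocessing operations described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not COUGER's actual implementation: intervals are taken to be 0-based and half-open (the BED convention), the summit is stored as an offset from the peak start (as in narrowPeak), and the names Peak, overlaps, remove_shared and trim_to_summit are hypothetical. The order in which COUGER applies the two operations is not stated here, so the functions are shown independently.

```python
from collections import namedtuple

# Hypothetical record type; summit is an offset from start, as in narrowPeak.
Peak = namedtuple("Peak", "chrom start end summit")


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def remove_shared(set1, set2):
    """Drop from each set every peak that overlaps any peak in the other set."""
    keep1 = [p for p in set1 if not any(overlaps(p, q) for q in set2)]
    keep2 = [q for q in set2 if not any(overlaps(q, p) for p in set1)]
    return keep1, keep2


def trim_to_summit(peak, flank=100):
    """Trim a peak to +/-flank bases around its summit (clamped at 0)."""
    center = peak.start + peak.summit
    new_start = max(0, center - flank)
    return Peak(peak.chrom, new_start, center + flank, center - new_start)
```

The pairwise comparison in remove_shared is quadratic in the number of peaks, which is acceptable for sets of up to about a thousand sequences. An interval index would be preferable for much larger inputs.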

The first step in submitting a job to COUGER is to select the appropriate format for your input files:

COUGER documentation - select format

If the selected format is not FASTA, then a genome assembly is required. COUGER provides 5 different versions of the human genome, 4 versions of the mouse (Mus musculus) genome, and 3 versions of the fly (Drosophila melanogaster) genome:

COUGER documentation - select genome

This web server enforces a maximum of N sequences per class, due to time and resource constraints. If the number of unique targets exceeds this threshold, COUGER runs using only the top N sequences. N can be set to a value from 300 to 1000. We recommend small datasets for initial tests of the web server on new pairs of paralogous TFs, followed by more comprehensive tests using larger sets of sequences. We also note that for more than 1000 sequences per class the running time increases significantly, without much gain in the results.
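The effect of the threshold can be sketched in Python as below. This is a hedged illustration only: it assumes the input is already sorted by significance, that "unique" means exact duplicates are dropped while keeping the first (most significant) occurrence, and the function name top_n_unique is hypothetical.

```python
def top_n_unique(sequences, n):
    """Keep the first n unique items of a list sorted by significance.

    Assumption: n must lie in the range offered by the web server (300-1000).
    """
    if not 300 <= n <= 1000:
        raise ValueError("n must be between 300 and 1000")
    seen = set()
    unique = []
    for seq in sequences:
        if seq not in seen:  # keep only the first occurrence
            seen.add(seq)
            unique.append(seq)
    # If there are fewer than n unique items, all of them are returned.
    return unique[:n]
```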

COUGER documentation - select limit

The next step is to upload the input files and, optionally, to provide the names of the TFs:

COUGER documentation - upload files

Alternatively, it is possible to skip these first steps by selecting an existing example.

COUGER documentation - select an example


COUGER feature sets

For classification, COUGER computes features that reflect the DNA binding specificities of putative co-factors. These features are generated from one of two types of data sets:

  • PBM data (data from protein binding microarray experiments)
  • PWM data (position weight matrices)

After providing the input files, the next step is to select the desired set of features. COUGER provides the following choices (the first six for human or mouse data, and the last three for fly data):

  • PBM data from UniPROBE — 429 TFs with PBM data from UniPROBE database
  • PWMs from UniPROBE — PWMs derived from PBM data in UniPROBE database
  • PWMs from TRANSFAC — 1226 PWMs from TRANSFAC database
  • PWMs from HT-SELEX data — 239 PWMs derived from HT-SELEX data by Jolma et al., Cell 2013 (human and mouse)
  • PWMs from JASPAR CORE vertebrata — 205 PWMs from JASPAR database
  • PWMs from UniPROBE & HT-SELEX & JASPAR CORE vertebrata — all 876 PWMs from the three databases
  • PWMs from TRANSFAC — 1226 PWMs from TRANSFAC database
  • PWMs from JASPAR CORE insecta — 205 PWMs from JASPAR database
  • PWMs from TRANSFAC & JASPAR CORE insecta — all 1357 PWMs from the two databases

COUGER documentation - select feature set

After the desired set of features is selected, COUGER allows some of the factors to be removed from that set. This option is recommended if COUGER was previously run on the data and TF1 and/or TF2 were among the factors most relevant for the classification.

COUGER documentation - remove factor(s) from feature set

The TFs to be removed are selected in a separate form, where it is also possible to see all the files that correspond to the selected set of features.

COUGER documentation - remove factor(s) from feature set

To find the TF files that will be removed, a short text can be entered in the "Filter" field.

COUGER documentation - remove factor(s) from feature set

Then the desired factors can be added to the exceptions list. The filtering and adding to exceptions steps can be repeated. When the list is complete, the "Save and close" button can be used to return to the previous screen. Note that COUGER does not support the removal of more than 20 files.

COUGER documentation - remove factor(s) from feature set

After that, the Data Submission Form shows the list of files selected for removal.

COUGER documentation - remove factor(s) from feature set



COUGER run

Finally, after all the data required by the Data Submission Form has been provided, the COUGER job can be submitted:

COUGER documentation - submit job

Immediately after this, a new page will be loaded, and the status of the job will be displayed:

COUGER documentation - starting to run

The progress of the job is also displayed, in the form of the messages produced by COUGER:

COUGER documentation - showing status


COUGER output

When the job is completed, the results will be shown on the status page, and an e-mail will be sent if an e-mail address was provided. The results can be downloaded as a zip file (this is recommended, since they will be discarded from the web server after 48 hours).

COUGER documentation - results page

The main plot shows the median accuracies (over 5 runs) for each type of algorithm used and each set of selected features. These accuracies are presented in a heatmap-like, color-coded table, where green corresponds to the minimum value and red to the maximum. The results page also offers an interactive mode for viewing three types of detailed plots:

  • Boxplots of accuracy (over 5 runs)
  • Feature heatmaps
  • Accuracies for individual runs
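To make the summary in the main plot concrete, the sketch below shows how a median accuracy per (algorithm, feature set) pair could be derived from the accuracies of individual runs. The function name median_accuracies and the data layout are assumptions made for illustration. They do not describe COUGER's internal code or the format of its output files.

```python
from statistics import median


def median_accuracies(runs):
    """Map each (algorithm, feature_set) key to the median of its run accuracies.

    `runs` is a dict: (algorithm, feature_set) -> list of per-run accuracies.
    """
    return {key: median(values) for key, values in runs.items()}


# Made-up example values, for illustration only.
example = {
    ("algorithm_A", "feature_set_1"): [0.80, 0.85, 0.82, 0.90, 0.78],
    ("algorithm_B", "feature_set_1"): [0.70, 0.75, 0.72, 0.71, 0.74],
}
summary = median_accuracies(example)
```

With five runs, the median is the third value once the accuracies are sorted, so a single unusually good or bad run does not shift the summary.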

COUGER documentation - results option
