Welcome to COUGER!
CO-factors associated with Uniquely-bound GEnomic Regions
Most eukaryotic transcription factors (TFs) are members of protein families that share a common DNA binding domain and have highly similar DNA binding preferences. However, individual TF family members (i.e., paralogous TFs) often have different functions and bind to different genomic regions in vivo. One potential mechanism for achieving this regulatory specificity is interaction with protein co-factors.
COUGER can be applied to any two sets of genomic regions bound by paralogous TFs (e.g., regions derived from ChIP-seq experiments) to identify putative co-factors that provide specificity to each TF. The framework determines the genomic targets uniquely bound by each TF, and identifies a small set of co-factors that best explain the in vivo binding differences between the two TFs. COUGER uses two classification algorithms (support vector machines and random forests) with features that reflect the DNA binding specificities of putative co-factors. The features are generated either from high-throughput TF-DNA binding data (from protein binding microarray experiments) or from large collections of DNA motifs. COUGER then applies a combined feature selection procedure, consisting of random forest recursive feature elimination and minimum Redundancy Maximum Relevance (mRMR) feature selection, to obtain a small subset of non-redundant factors that are most important for distinguishing between the genomic regions bound by the two paralogous TFs.
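As a rough illustration of the first step, the sketch below shows one way the regions uniquely bound by each TF could be derived from two peak sets. It is not COUGER's actual implementation: the function name unique_regions, the example coordinates, and the overlap rule (a region counts as shared if it overlaps a region of the other TF by at least one base pair) are all assumptions made for this example.

```python
def unique_regions(regions_a, regions_b):
    """Return the regions in regions_a that overlap no region in regions_b.

    Regions are (chrom, start, end) tuples with half-open coordinates,
    as in BED files. The "any overlap of >= 1 bp" rule is an assumption
    for illustration, not COUGER's documented criterion.
    """
    # Group the second set by chromosome so only same-chromosome
    # intervals are compared.
    by_chrom = {}
    for chrom, start, end in regions_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    unique = []
    for chrom, start, end in regions_a:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(
            start < b_end and b_start < end
            for b_start, b_end in by_chrom.get(chrom, [])
        )
        if not overlaps:
            unique.append((chrom, start, end))
    return unique


# Hypothetical peak sets for two paralogous TFs.
tf1_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
tf2_peaks = [("chr1", 150, 250), ("chr2", 300, 400), ("chr3", 10, 20)]

tf1_unique = unique_regions(tf1_peaks, tf2_peaks)
tf2_unique = unique_regions(tf2_peaks, tf1_peaks)
print(tf1_unique)
print(tf2_unique)
```

The two resulting sets would then serve as the two classes for the classifiers described above. This simple version compares every pair of regions on a chromosome; for genome-scale ChIP-seq peak sets, a sorted sweep or an interval tree would likely be preferable.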