Efforts in this area focus on the development of novel algorithms geared towards specific biomedical applications, using both knowledge-based and physics-based approaches. Research is carried out by investigators at the Columbia University Medical Center (Andrea Califano, Aris Floratos, Carol Friedman, Barry Honig, Diana Murray), the Graduate School of Arts and Sciences (Harmen Bussemaker), and the University of Chicago (Yves Lussier, Andrey Rzhetsky). Work is divided across four research projects:
(Lead Investigators: Barry Honig, Diana Murray)
The project develops a series of computational tools for the prediction of protein structure and function. These include (a) the Pudge structure prediction server, (b) the SkyLine high-throughput homology modeling pipeline, (c) the MarkUs function annotation server, and (d) the SkyBase database, which contains SkyLine and MarkUs results. Pudge is an interactive protein structure prediction server, while SkyLine generates models for each sequence through PSI-BLAST searches for related structures in the PDB. Because SkyLine models are constructed automatically, they may be less accurate than those obtained by manually applying the most effective modeling tools, as contained in Pudge. Yet the very construction of a model facilitates the detection of remote sequence relationships. SkyBase uses quantitative reliability criteria for assessing model quality (“modelability”) to decide whether a sequence relationship is likely to be meaningful. Thus, if an apparently good model can be built, it is constructed and deposited in SkyBase for further analysis (for example, with MarkUs) even when the sequence relationship is uncertain. MarkUs uses geometric alignments to identify structural neighbors of a query protein, which can then be filtered based on biological criteria, such as common GO annotation, conserved residue patterns, and putative functional sites.
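The "modelability" decision described above can be illustrated with a minimal sketch. This is not SkyBase's actual code or its actual criteria: the `CandidateModel` fields, the `quality_score` scale, and both cutoffs are hypothetical names and values introduced only to show the idea that model quality, rather than sequence similarity alone, decides whether a model is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateModel:
    """A homology model built for one query/template pair (hypothetical record)."""
    query_id: str
    template_id: str
    e_value: float        # significance of the sequence hit (lower is better)
    quality_score: float  # assumed model-quality score in [0, 1] (higher is better)


# Hypothetical cutoffs, chosen only for illustration.
E_VALUE_CONFIDENT = 1e-3
QUALITY_CUTOFF = 0.7


def should_deposit(model: CandidateModel) -> bool:
    """Keep any model whose quality passes the cutoff, whatever its e-value."""
    return model.quality_score >= QUALITY_CUTOFF


def is_remote_candidate(model: CandidateModel) -> bool:
    """Flag kept models whose sequence relationship alone would be uncertain."""
    return should_deposit(model) and model.e_value > E_VALUE_CONFIDENT
```

Under these assumptions, a model with a weak sequence hit but a high quality score is both deposited and flagged as a candidate remote relationship, which is the case the text singles out for follow-up analysis.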
(Lead Investigators: Andrea Califano, Harmen Bussemaker, Andrey Rzhetsky)
The objective of the project is to develop algorithms for the inference of pairwise and multivariate regulatory interactions in cells using information-theoretic and machine learning methods, and to extend these methods to signaling networks. A number of experimentally validated tools for the dissection of regulatory networks in human cells have been implemented, including (a) ARACNe and MINDy for reverse-engineering transcriptional and post-translational networks, respectively, (b) MatrixREDUCE for estimating the free energy parameters that define the sequence specificity of transcription factors (TFs) from expression profile or ChIP-chip data, and for estimating TF protein activity from expression profiles, and (c) CSA, a novel method for the analysis of ChIP-chip data. These methods have been used to infer genome-wide interactomes for a number of malignant phenotypes (T-cell acute lymphoblastic leukemia, glioblastoma multiforme, and breast cancer). Following successful reconstruction and experimental validation of several regulatory networks, network-based tools were developed to elucidate (a) genetic and epigenetic abnormalities in specific tumor subtypes (e.g., good vs. bad prognosis), (b) the mechanism of action of chemical perturbations, and (c) master regulators of development and transformation. These efforts have led to the development of novel algorithms: IDEA (Interactome Dysregulation Enrichment Analysis) and MARINa (Master Regulator Inference algorithm).
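As an illustration of the information-theoretic idea behind pairwise interaction inference, the sketch below estimates the mutual information between two expression profiles after discretizing each into equal-frequency bins. It is a simplified stand-in, not the estimator used by ARACNe or MINDy; the function names, the binning scheme, and the default of two bins are assumptions made for the example.

```python
import math
from collections import Counter


def discretize(values, bins=2):
    """Assign each value to one of `bins` equal-frequency bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * bins // len(values)
    return labels


def mutual_information(x, y, bins=2):
    """Plug-in estimate of mutual information (in bits) between two profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of samples")
    xs, ys = discretize(x, bins), discretize(y, bins)
    n = len(xs)
    joint = Counter(zip(xs, ys))   # joint bin counts
    px, py = Counter(xs), Counter(ys)  # marginal bin counts
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[a] * py[b]))
    return mi
```

A gene pair whose profiles rise and fall together scores high, while a pair whose binned values are statistically independent scores near zero. A real analysis would also need a significance threshold and a way to discount indirect interactions, which this sketch omits.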
(Lead Investigators: Yves Lussier, Carol Friedman)
Statements such as "gene Y is a transcriptional target of protein X" are not universally true. For instance, they may be true in yeast and Drosophila but not in mammalian cells. More importantly, when cells are organized into distinct cellular phenotypes (i.e., a distinct tissue or disease state), these statements may be true or false in a phenotype-dependent manner. Finally, at the molecular level, the transcriptional activation of gene Y by protein X may be contingent on protein X being activated by an acetylation or phosphorylation event. Hence, simple integration of the evidence across algorithms and databases will not be useful unless the molecular and cellular contexts are fully accounted for. These issues are being addressed using formal ontologies that span the entire spectrum from the molecular, to the cellular, to the disease-related level.
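One way to picture the problem is to attach an explicit context to every regulatory statement, so that evidence is only combined when contexts agree. The sketch below is a toy data structure, not the project's ontology: the `Context` and `Assertion` classes, their fields, and the rule that an unspecified field matches anything are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Context:
    """Conditions under which a statement holds (hypothetical schema)."""
    organism: str
    phenotype: Optional[str] = None        # tissue or disease state
    regulator_state: Optional[str] = None  # e.g. "phosphorylated", "acetylated"


@dataclass(frozen=True)
class Assertion:
    """'target is a transcriptional target of regulator' within a context."""
    regulator: str
    target: str
    context: Context


def applies(assertion: Assertion, query: Context) -> bool:
    """An assertion applies when every field it specifies agrees with the query."""
    ctx = assertion.context
    return (
        ctx.organism == query.organism
        and ctx.phenotype in (None, query.phenotype)
        and ctx.regulator_state in (None, query.regulator_state)
    )


def is_supported(assertions, regulator: str, target: str, query: Context) -> bool:
    """True if any recorded assertion supports the statement in the query context."""
    return any(
        a.regulator == regulator and a.target == target and applies(a, query)
        for a in assertions
    )


# Made-up example records: X regulates Y in yeast generally, and in human
# glioblastoma only when X is phosphorylated.
EXAMPLE_ASSERTIONS = [
    Assertion("X", "Y", Context("yeast")),
    Assertion("X", "Y", Context("human", "glioblastoma", "phosphorylated")),
]
```

With these records, the same statement about X and Y is supported for a yeast query and for phosphorylated X in human glioblastoma, but not for human breast cancer, which mirrors the phenotype-dependent behavior described above.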
(Lead Investigators: Aris Floratos, Andrea Califano)
The methods, models, and data produced in the context of all of the MAGNet Center's activities are made available as interoperable, grid-enabled components of a state-of-the-art bioinformatics platform, geWorkbench.