
The central theme of the Center is the multiscale analysis of cellular networks. This theme is manifested in seven Driving Biological Projects (DBPs) that target broad areas of basic biology research, including

(a) tackling the issue of biomolecular interaction directly, at the structural and physicochemical level,
(b) constructing a context-specific map of cellular interactions, and
(c) using such a map to dissect complex diseases.

The biological questions posed by the DBPs generate the requirements that drive the biomedical computation research carried out by the Center:

COMPUTATIONAL SCIENCES

Efforts in this area provide critical expertise in the advancement of theoretical knowledge-based methods that are then applied to the solution of specific biomedical problems. Research is carried out by investigators in Columbia's School of Engineering and Applied Science (SEAS) and the Columbia University Medical Center (CUMC). Of the SEAS researchers, Christina Leslie and David Waltz are at the Center for Computational Learning Systems (CCLS); Rocco Servedio, Yechiam Yemini, Gail Kaiser and Kenneth Ross are in the Computer Science Department (CS); and Chris Wiggins is a faculty member in Applied Physics and Applied Math (APAM). Of the CUMC researchers, Andrea Califano, Carol Friedman, and Yves Lussier are faculty members in the Department of Biomedical Informatics (DBMI). Research projects follow three leading themes:

  1. Machine Learning (ML)

    : CCLS specializes in Machine Learning (ML) theory, algorithms, and applications, and includes developers of two of the most important modern large-margin ML methods, Support Vector Machines (SVMs; Vapnik) and boosting (Freund). Participating investigators are broadly versed in all modern ML methods, including PAC learning, random forest learning, kernel-based methods, kNN methods, bottleneck methods, information-theoretic methods, clustering/module-discovery methods, and graphical methods. They have been innovators in the use of ranking rather than classification for more accurate prediction, learning in the absence of a "gold standard" for training, and training using weighted and uncertain evidence. Projects in ML are divided across three separate topics:
    • Protein Function, Structure, and Interactions

      : This includes the development of algorithms for
      (a) evidence integration (Peer) and design of SVM kernels (Bio-Kernels), and
      (b) identification and classification of pockets on protein structures (Pockets).
    • Reverse Engineering of Gene Regulatory Networks

      : This includes an information-theoretic algorithm (ARACNE) and two Boosting algorithms (GeneClass and MEDUSA) that integrate sequence and expression data to learn regulatory interactions predictive of mRNA expression data.
    • Network-Theoretic Analyses

      : These include a graph-diffusion-based method for protein similarity analysis (RankProp), large-margin ML methods for inferring evolutionary mechanisms from biological network topologies (NetClass), and a parameter-free algorithm for organizing networks into modules (InfoMod).
  2. NLP and Ontologies

    : The lead investigators (Friedman & Lussier) are innovators and leaders in the use of Natural Language Processing (NLP) for extracting biological knowledge from text databases. Their research in new NLP methods directly impacts the Reverse Engineering and the Phenotypes projects. Ontologies are also used to define complex biomedical informatics concepts and their relationships for component interoperability and interface design. The key effort here is to bridge the gap between the NLP systems and the standard phenotype schemas and ontologies specified by the biological community. The NLP projects build on the MedLEE system, which processes patient reports; GENIES, which captures biomolecular interactions from the literature; and BioMedLEE, which captures genotypic-phenotypic relations associated with the underlying causes and treatments of diseases.
  3. Large Scale Systems

    : CS systems researchers (Yemini, Kaiser, and Ross) are experts in interoperability, complex distributed systems, and database technologies. They are innovators in modern software engineering technologies including object-oriented languages, self-diagnosing and self-healing systems, and publish-subscribe (pub-sub) technology. Dr. Califano has been involved in a number of academic and industrial large-scale software development efforts, including the development of caWorkbench, which will constitute the foundation of the MAGNet Center bioinformatics platform. He leads Columbia University's activities in caBIG, the NCI-sponsored effort to establish a grid of interoperable bioinformatics services for cancer research. The main goal of this area is to develop a formal Biomedical Informatics Structured ONtology (BISON) for the representation of bioinformatics data structures and data-structure transformations (algorithms, applications, tools).
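
As a rough illustration of the information-theoretic style of network reverse engineering mentioned under Machine Learning, the Python sketch below scores gene pairs by mutual information and then drops the weakest edge of every fully connected triplet (a data-processing-inequality filter). It is not the Center's ARACNE code: the function names, the plug-in estimator on discretized values, the toy profiles, and the zero threshold are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def prune_with_dpi(mi, threshold=0.0):
    """Keep edges scoring above threshold, then drop the weakest edge of each triangle."""
    edges = {pair for pair, value in mi.items() if value > threshold}
    genes = sorted({g for pair in edges for g in pair})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(edge in edges for edge in triangle):
            # The weakest link of a closed triplet is treated as indirect.
            to_remove.add(min(triangle, key=lambda edge: mi[edge]))
    return edges - to_remove


def infer_network(profiles, threshold=0.0):
    """Score every gene pair, then apply the triangle filter."""
    mi = {
        frozenset((a, b)): mutual_information(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }
    return prune_with_dpi(mi, threshold)


# Hypothetical discretized expression profiles (0 = low, 1 = high) forming a
# chain TF -> G1 -> G2: each gene is a slightly perturbed copy of the previous one.
profiles = {
    "TF": [0, 0, 0, 0, 1, 1, 1, 1],
    "G1": [0, 0, 0, 1, 1, 1, 1, 1],
    "G2": [0, 0, 1, 1, 1, 1, 1, 1],
}
network = infer_network(profiles)
```

In this toy chain the TF-G2 pair has the lowest score of the three, so the filter removes it and keeps the two direct links. Estimators for continuous expression data and tolerance parameters are deliberately left out of the sketch.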
     
     

COMPUTATIONAL BIOLOGY AND BIOMEDICAL INFORMATICS SCIENCES

Efforts in this area target the development of novel algorithms geared towards specific biomedical applications using both knowledge-based and physics-based approaches. Research is carried out by investigators in Columbia University Medical Center (CUMC) in collaboration with researchers from Columbia's School of Engineering and Applied Science (SEAS). All investigators are affiliated with the Center for Computational Biology and Bioinformatics (C2B2). Drs. Barry Honig and Burkhard Rost are in the Department of Biochemistry and Molecular Biophysics (CUMC); Dr. Dianna Murray is in the Department of Pharmacology; Drs. Andrea Califano, Carol Friedman, Andrey Rzhetsky, Yves Lussier, and Dennis Vitkup are in the Department of Biomedical Informatics (CUMC); Dr. Bussemaker is in the Department of Biological Sciences (SEAS); and Dr. Chris Wiggins is in the Department of Applied Physics and Applied Math (SEAS). Research projects are organized around four leading themes:

  1. Sequence- and structure-based annotation of protein function (specifically protein-protein interactions)

    : In the context of the Northeast Structural Genomics Consortium (NESG), the Honig, Rost, and Murray groups are clustering protein sequences into individual domain families, and using structural information to annotate each of these clusters in terms of biological function. They have also developed methods for a new structure prediction pipeline that generates homology models once the structure of one or more members of a sequence cluster has been determined. Building on this ongoing research, MAGNet-specific activities include the development of new sequence- and structure-based approaches for functional annotation and protein-protein interaction analysis. The new algorithms make use of evidence integration methods and have been integrated into the Center's software platform (geWorkbench), providing a unified suite of programs.
  2. Cellular interaction reverse engineering algorithms

    : By leveraging the core Computational Sciences methods, we have implemented a variety of tools for the inference of molecular interactions in the cell. These include protein-DNA, protein-protein, and protein-mRNA interactions as well as the interaction of small molecules with any of these macromolecular structures. In particular, we have implemented algorithms for the reverse engineering of cellular interactions from experimental and literature data using regression, NLP, and information theory. These methods are used
    (a) to create a cellular network knowledge-base,
    (b) to identify regulators responsible for activating and deactivating specific interactions (e.g. a kinase activating the transcriptional interaction between a transcription factor and a target gene, via phosphorylation of the TF), and
    (c) to identify modular control structures conserved across distinct cellular states or types.

    Significant emphasis is placed on two integration activities:
    (a) the output of multiple algorithms is integrated into a single knowledge-base and
    (b) specific algorithms (i.e. REDUCE and GeneClass) are also integrated.
  3. Using cellular and molecular phenotypes for context filtering

    : Statements such as "gene Y is a transcriptional target of protein X" are not universally true. For instance, they may be true in yeast and Drosophila but not in mammalian cells. More importantly, when cells are organized into distinct cellular phenotypes (i.e. a distinct tissue or disease state), these statements may be true or false in a phenotype-dependent manner. Finally, at the molecular level, the transcriptional activation of gene Y by protein X may be contingent on protein X being activated by an acetylation or phosphorylation event. Hence, simple integration of the evidence across algorithms and databases will not be useful unless the molecular and cellular contexts are fully accounted for. These issues are being addressed using formal ontologies across the entire spectrum, from the molecular, to the cellular, to the disease-related level.
  4. Software platform (geWorkbench)

    : The methods, models, and data produced in the context of all of the MAGNet Center's activities are made available as interoperable, grid-enabled components of a state-of-the-art bioinformatics platform, geWorkbench. This allows them

    (a) to be integrated with a variety of other existing bioinformatics modules for the analysis, visualization, and management of multiple data modalities and
    (b) to be assembled into complex bioinformatics workflows and biomedical applications using a simple yet powerful visual front-end and a scripting language.

    We also define and use a Biomedical Informatics Structured Ontology (BISON) to create interoperable interfaces for geWorkbench components. Further, components that are data- or computation-intensive are being wrapped as grid services.
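
The context-filtering idea described in theme 3 can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the `Interaction` record, its fields, and the filtering functions are hypothetical names invented here, not part of geWorkbench or BISON, and the statements about X and Y are made-up examples in the spirit of the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interaction:
    """A regulatory statement annotated with the context in which it was observed."""
    regulator: str
    target: str
    organism: str
    phenotype: Optional[str] = None              # None: not restricted to a phenotype
    required_modification: Optional[str] = None  # e.g. "phosphorylation"


def holds_in_context(stmt, organism, phenotype, active_modifications):
    """Return True only if every contextual condition attached to stmt is met."""
    if stmt.organism != organism:
        return False
    if stmt.phenotype is not None and stmt.phenotype != phenotype:
        return False
    if stmt.required_modification is not None:
        if (stmt.regulator, stmt.required_modification) not in active_modifications:
            return False
    return True


def filter_by_context(statements, organism, phenotype, active_modifications=frozenset()):
    """Keep the statements that apply to the queried molecular and cellular context."""
    return [
        s for s in statements
        if holds_in_context(s, organism, phenotype, active_modifications)
    ]


# Hypothetical evidence pooled from several algorithms and databases.
evidence = [
    Interaction("X", "Y", organism="yeast"),
    Interaction("X", "Y", organism="human", phenotype="tumor",
                required_modification="phosphorylation"),
    Interaction("X", "Z", organism="human"),
]

# X is phosphorylated in the queried human tumor context.
in_tumor = filter_by_context(
    evidence, "human", "tumor", active_modifications={("X", "phosphorylation")}
)
# Same organism, but a different phenotype and no active modification.
in_normal = filter_by_context(evidence, "human", "normal")
```

Here `in_tumor` keeps both human statements, while `in_normal` keeps only the unconditional X-Z statement. A real system would draw organism, phenotype, and modification terms from formal ontologies rather than free-text strings.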

MAGNet