Identifying Hox protein-specific DNA-binding sites and probing their shapes.
Tom Tullius, Richard Mann, Barry Honig, Harmen Bussemaker
The Hox genes encode a set of homeodomain-containing transcriptional regulators that play critical roles in the development of all metazoans. Mutations in Hox genes and in their DNA-binding cofactors underlie several human diseases and birth defects, most notably leukemias (Argiropoulos 2007, Eklund 2007). The human Hox genes all encode proteins with nearly indistinguishable binding specificities despite having distinct functions in vivo, raising the question of how these factors achieve specificity (Berger 2008, Noyes 2008). A partial answer to this paradox is that Hox proteins achieve specificity only when binding cooperatively to DNA with various cofactors (Mann 2009, Moens 2006). Although a number of crystal structures exist for ternary complexes of DNA and the DNA-binding homeodomains of a Hox protein and one of its cofactors, until recently they provided no insight into the source of specificity. Part of the problem was that the crystal structures had been determined with non-specific DNA. Our studies of a site that is specific to a single Hox protein revealed that, in addition to the major groove binding pattern common to all Hox proteins, specificity is achieved, at least in part, through the recognition of sequence-specific minor groove shape. However, we do not yet understand how other Hox family members use this readout mechanism. Acquiring such information is essential to eventually “solving” the Hox specificity problem. It should also provide insight into the general problem of DNA recognition by transcription factors (TFs), and thus a paradigm for how other TF families achieve specificity.
This project represents a collaborative effort of the four labs largely responsible for determining that minor groove width, and DNA shape in general, play a central role in protein-DNA binding specificity. The Honig and Mann labs have collaborated on this problem and have produced widely recognized papers, while the Tullius lab has been a pioneer in using experimental methods to characterize DNA shape and in demonstrating that it is evolutionarily conserved. The current project expands observations on individual Hox proteins by using high-throughput and whole-genome methods to identify sites that are Hox protein-specific, and then by characterizing these sites in structural terms with hydroxyl radical cleavage measurements and computational tools. We plan to combine new predictive algorithms for DNA shape with information extracted from SELEX data about the binding sites recognized by individual Hox proteins. The goal of this combined approach is to characterize the specificity determinants of the entire Hox family in both sequence and structural terms.
In a recently published study we described the SELEX-seq approach for identifying Hox-Exd binding preferences (Slattery, Riley et al. 2011). Specifically, we defined the affinities for all 12-mers over a 100-fold range for all eight Drosophila Hox proteins when they bind in combination with the dimeric cofactor Exd-HM (HM is the homeodomainless isoform of Hth). We also used the same SELEX-seq platform to define a subset of specificities for the monomeric Hox proteins, in the absence of any cofactor. From these comparisons we established the concept of ‘latent specificity’: the cofactor (Exd-HM) reveals a specificity that is built into the Hox proteins but cannot be used in the absence of the cofactor. We posit that this sort of mechanism may be generally used to distinguish the specificities of members of other gene families. We also extended our previous observations that DNA shape plays an important role in Hox-Exd DNA binding preferences. A previously established Monte Carlo approach was used to show that anterior Hox family members prefer DNA sequences whose minor groove width differs from that of the sequences preferred by posterior Hox family members. Moreover, a novel high-throughput method was used to characterize the DNA shapes of tens of thousands of binding sites. This more sensitive method was able to distinguish the specificities of all eight Hox-Exd complexes based only on minor groove width. A dendrogram based on these specificities ordered the Hox factors in a manner that is collinear with their expression domains along the anterior-posterior axis during embryogenesis, a collinearity shared by other features of this transcription factor family, such as the position of the Hox genes on the chromosome.
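As a rough illustration of how relative affinities for k-mers might be estimated from SELEX-seq reads, the sketch below counts k-mers in an unselected library and in a library sequenced after a number of selection rounds, then takes the per-round enrichment of each k-mer and scales it to the best binder. This is a simplified sketch under stated assumptions and not the pipeline used in the published study: the function names, the pseudocount, and the use of raw initial-round counts (rather than a statistical model of the initial library) are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (sliding window) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def relative_affinities(round0_reads, selected_reads, k, rounds, pseudocount=1.0):
    """Estimate relative affinities from k-mer enrichment (illustrative only).

    Assumption: the frequency ratio between the selected and the initial
    library grows roughly as affinity**rounds, so the per-round enrichment
    is the rounds-th root of that ratio.
    """
    c0 = count_kmers(round0_reads, k)
    cr = count_kmers(selected_reads, k)
    n0 = sum(c0.values())
    nr = sum(cr.values())

    enrichment = {}
    for kmer, count in cr.items():
        # The pseudocount (an assumption) avoids division by zero for
        # k-mers that were never observed in the initial library.
        f0 = (c0.get(kmer, 0) + pseudocount) / (n0 + pseudocount)
        fr = count / nr
        enrichment[kmer] = (fr / f0) ** (1.0 / rounds)

    # Scale so that the most enriched k-mer has relative affinity 1.
    top = max(enrichment.values())
    return {kmer: value / top for kmer, value in enrichment.items()}


if __name__ == "__main__":
    # Toy reads, invented purely to exercise the function.
    toy_round0 = ["ACGT", "TTTT"]
    toy_selected = ["ACGT", "ACGT", "TTTT"]
    print(relative_affinities(toy_round0, toy_selected, k=4, rounds=1))
```

With real data, low-count k-mers would need to be filtered or modeled more carefully, because their enrichment estimates are dominated by sampling noise.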
In a second study we extended our ChIP-chip analysis for the Hox protein Ubx in imaginal discs (Slattery, Ma et al. 2011) by asking to what extent these in vivo binding sites can be explained by the Ubx-Exd DNA preferences defined by the SELEX-seq method. Remarkably, we found that the SELEX-seq-defined binding sites were enriched in the Ubx ChIP-chip peaks. Moreover, the preferences were selective: binding sites for other Hox-Exd complexes (e.g. Antp-Exd or Scr-Exd) were not enriched in the Ubx ChIP-chip peaks (Slattery, Riley et al. 2011). These data provide strong in vivo validation for the SELEX-seq-defined sequences.
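One simple way to quantify this kind of enrichment is a hypergeometric test: given a set of genomic regions, some of which contain a high-affinity site, ask whether the regions under ChIP peaks contain such sites more often than expected by chance. The sketch below is a generic illustration and makes no claim about the statistic used in the cited study; the function name and the way regions are counted are assumptions.

```python
from math import comb


def site_enrichment(n_regions, n_regions_with_site, n_peaks, n_peaks_with_site):
    """Fold enrichment and upper-tail hypergeometric p-value (illustrative).

    n_regions            total regions considered (peaks plus background)
    n_regions_with_site  regions containing at least one site
    n_peaks              regions that are ChIP peaks
    n_peaks_with_site    ChIP peaks containing at least one site
    """
    expected = n_peaks * n_regions_with_site / n_regions
    fold = n_peaks_with_site / expected

    # P(X >= n_peaks_with_site) when n_peaks regions are drawn at random
    # without replacement from all regions.
    upper = min(n_regions_with_site, n_peaks)
    tail = sum(
        comb(n_regions_with_site, x)
        * comb(n_regions - n_regions_with_site, n_peaks - x)
        for x in range(n_peaks_with_site, upper + 1)
    )
    p_value = tail / comb(n_regions, n_peaks)
    return fold, p_value


if __name__ == "__main__":
    # Toy counts, invented purely to exercise the function.
    print(site_enrichment(10, 5, 4, 4))
```

Running the same test with sites defined for a different Hox-Exd complex would give a direct comparison of selectivity, provided the two site sets are defined at comparable affinity thresholds.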
In a third study (Bishop, Rohs et al. 2011) we published the first detailed quantitative analysis of the hydroxyl radical cleavage pattern of DNA, demonstrating that the cleavage pattern of a DNA strand is a quantitative measure of DNA backbone solvent accessibility. We then showed that combining the cleavage patterns for both DNA strands provides a new metric, ORChID2, that quantitatively relates hydroxyl radical cleavage intensity to minor groove width and electrostatic potential. This new work extends our earlier findings (Joshi, Passner et al. 2007; Rohs, West et al. 2009) that recognition of minor groove electrostatic potential is an important but previously unappreciated mechanism for specificity in protein binding to DNA. The earlier work depended on analysis of high-resolution X-ray crystal structures of DNA and DNA-protein complexes, which are not available for genome-scale investigations. The ability to use the results of chemical probe experiments (hydroxyl radical cleavage) to map electrostatic potential will allow this new recognition principle to be applied to entire genomes. As the first application of ORChID2 to genome-scale protein-DNA recognition, we aligned the ORChID2 patterns for more than 40,000 nucleosome-binding sequences from yeast and Drosophila and showed that a periodic structural pattern exists in DNA sequences that have been found experimentally to bind to nucleosomes. To make this dataset available to the scientific community we have deposited an ORChID2 track in the UCSC human genome browser.
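The idea of combining the cleavage patterns of the two strands can be sketched as follows: for each position, average the cleavage intensity on one strand with the intensity on the opposite strand at the nucleotide that lies across the minor groove. This is a minimal sketch and not the published ORChID2 procedure. In particular, the stagger between strands (`offset`, set here to 3 base pairs), its sign, and the indexing convention (both lists indexed by base-pair position along the top strand) are assumptions that should be checked against the original method.

```python
def combine_strand_cleavage(top, bottom, offset=3):
    """Average cleavage intensities of nucleotides paired across the minor groove.

    top[i] and bottom[i] are the cleavage intensities of the two nucleotides
    of base pair i. Position i on the top strand is combined with position
    i + offset on the bottom strand. The offset value and its sign are
    assumptions. Positions without a partner are returned as None.
    """
    if len(top) != len(bottom):
        raise ValueError("both strands must cover the same base pairs")

    n = len(top)
    combined = []
    for i in range(n):
        j = i + offset
        if 0 <= j < n:
            combined.append((top[i] + bottom[j]) / 2.0)
        else:
            combined.append(None)
    return combined


if __name__ == "__main__":
    # Toy intensities, invented purely to exercise the function.
    print(combine_strand_cleavage([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

Treating the offset as a parameter keeps the stagger explicit, so the same function can be rerun with a different value if the geometry calls for it.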