Retroviral DNA Integration: Viral and Cellular Determinants of Target-Site Selection
Abstract
Retroviruses differ in their preferences for sites for viral DNA integration in the chromosomes of infected cells. Human immunodeficiency virus (HIV) integrates preferentially within active transcription units, whereas murine leukemia virus (MLV) integrates preferentially near transcription start sites and CpG islands. We investigated the viral determinants of integration-site selection using HIV chimeras with MLV genes substituted for their HIV counterparts. We found that transferring the MLV integrase (IN) coding region into HIV (to make HIVmIN) caused the hybrid to integrate with a specificity close to that of MLV. Addition of MLV gag (to make HIVmGagmIN) further increased the similarity of target-site selection to that of MLV. A chimeric virus with MLV Gag only (HIVmGag) displayed targeting preferences different from those of both HIV and MLV, further implicating Gag proteins in targeting as well as IN. We also report a genome-wide analysis indicating that MLV, but not HIV, favors integration near DNase I–hypersensitive sites (i.e., +/− 1 kb), and that HIVmIN and HIVmGagmIN also favored integration near these features. These findings reveal that IN is the principal viral determinant of integration specificity; they also reveal a new role for Gag-derived proteins, and strengthen models for integration targeting based on tethering of viral IN proteins to host proteins.
Synopsis
A required step in the replication cycle of retroviruses is the integration of a DNA copy of the viral genome into a host cell chromosome. Recent studies have shown that human immunodeficiency virus (HIV) and murine leukemia virus (MLV) favor integration near different chromosomal features. HIV preferentially targets active genes, while MLV prefers integration near start sites of gene transcription. The authors investigated integration-target site–selection by HIV derivatives substituted with segments of MLV to determine which viral proteins are responsible for integration-targeting preferences. They found that the viral integrase protein is the dominant determinant of integration-site selection, probably through its tethering to cellular proteins bound near preferred genomic regions. In addition, components of the viral structural polyprotein, Gag, appear to be involved in targeting. These findings provide a functional map of the viral proteins involved in directing integration-site selection.
Introduction
The selection of target sites for integration of retroviral DNA is central to the biology of retroviruses and the application of retroviral vectors to gene therapy. The recent setbacks in human gene-therapy trials, in which a therapeutic retroviral vector integrated near the LMO-2 proto-oncogene and caused leukemia-like illness in three patients [1–3], have focused particular attention on the mechanisms responsible for integration targeting. Here we map the retroviral determinants of integration-target site–selection and investigate candidate mechanisms.
The basic DNA cleavage and joining reactions mediating retroviral integration are common among retroviruses (summarized in Figure 1A), but integration in vivo shows pronounced favored and disfavored chromosomal regions that differ among retroviruses. Retroviral integration-site selection is not strongly sequence-specific with respect to the target DNA at the point of joining, though a weakly conserved palindromic sequence can be detected when many integration-target sites are aligned [4–7]. Early studies of murine leukemia virus (MLV) integration targeting led to the suggestion that integration may be favored in open chromatin [8], since a positive correlation was detected between integration frequency and DNase I–hypersensitive sites [9,10]. More recently, the completion of the draft human genome sequence has allowed systematic studies of integration targeting by high-throughput sequencing of integration acceptor sites [11–14]. Human immunodeficiency virus (HIV) integration sites are found predominantly in active transcription units [11,13]. A cellular protein, lens epithelium–derived growth factor (LEDGF/p75), binds HIV IN [15–18] and is partially responsible for favored integration in genes [19]. For MLV, in contrast, roughly 25% of integration events are near transcription start sites and associated CpG islands, while integration within transcription units is only slightly favored [14]. Avian sarcoma-leukosis virus (ASLV) shows the most random pattern of integration-site selection—ASLV favors transcription units only weakly and does not favor transcription start sites [11,12]. Thus for the three retroviruses studied in detail, three different patterns of favored integration sites were found.
Here we investigate the requirements for integration targeting using chimeric viruses in which gene segments of MLV were substituted for the corresponding segments of the HIV genome (Figure 1B). The chimeras contained MLV gag gene segments substituted for HIV gag (HIVmGag) [20], MLV IN substituted for HIV IN (HIVmIN) [21], or both MLV gag and MLV IN substituted for their HIV counterparts (HIVmGagmIN) [21].
Previous characterization has shown that these viruses differ in their ability to infect interphase cells, and that this property maps to the gag gene polyprotein precursor [20,21]. MLV integrates only after mitosis, while HIV can integrate at any time during the cell cycle. The chimeric viruses HIVmGag and HIVmGagmIN show the same requirement for cell division as does MLV [20,21], while HIVmIN, like HIV, can infect non-dividing cells [21] (summarized in Figure 1B). Thus MLV Gag imposes the requirement for cell division on HIV. Consequently, tests of integration-target site–selection by these chimeras provide an opportunity to probe the influence of cell-cycle progression on integration-target site–selection.
Integration-site selection by the chimeric and control viruses was assayed by cloning and sequencing 2,440 junctions between human and proviral DNA generated by infection of human cells. We found that HIVmIN and HIVmGagmIN favored integration near transcription start sites and CpG islands, paralleling the preferences of MLV and implicating IN as the main specificity determinant. The resemblance was closest between MLV and HIVmGagmIN, implicating Gag as a cofactor for targeting as well as IN. HIVmGag exhibited a phenotype that did not resemble either parent—it did not favor transcription starts and CpG islands like MLV, and it did not favor integration in transcription units or gene-rich regions as strongly as did HIV, further implicating Gag as well as IN. In addition, we used new genome-wide data on preferential DNase I cleavage sites [22] to analyze the relationship with favored integration sites, and found that MLV favored integration within 1 kb of DNase I cleavage sites, as did the HIVmIN and HIVmGagmIN chimeras. However, HIV, ASLV, and L1 retrotransposons did not favor these sites, indicating that possible open chromatin marked by DNase I cleavage sites was not globally favorable for integration of new DNA sequences. This result is more consistent with models based on specific interactions between MLV IN and cellular proteins bound near DNase I cleavage sites. These data elucidate the viral determinants of integration targeting, disclose a role for Gag in integration, and indicate that models for targeting based solely on open chromatin or cell-cycle effects are unlikely to be correct.
Results
Cloning and Analysis of Integration Sites
The chimeric viruses used in this study were deleted for the env gene and complemented with the vesicular stomatitis virus G protein (VSV-G) to boost titer and restrict infection to a single round. These chimeras were less infectious than the wild-type virus [20,21], so the puromycin resistance gene was cloned in place of nef, allowing infected cells to be selected with puromycin to enrich for provirus-containing cells. Vpr was also deleted because of its cellular toxicity [23]. In order to control for possible biases in integration-site recovery due to puromycin selection, control infections were carried out with an HIV derivative transducing the puromycin resistance gene (termed “HIVPuro”) and an MLV vector (LPCX) also transducing the puromycin resistance gene (termed “MLVPuro”). Attempts to make reciprocal constructs (HIV gene segments into an MLV background) did not yield infectious viruses (unpublished data). HeLa cells were chosen as infection target cells because they are highly susceptible to infection and because they had been used in a previous study comparing MLV and HIV integration targeting [14].
To clone integration sites, genomic DNA from infected cells was extracted, digested with MseI and ligated to adapters. The junctions between proviral DNA and genomic DNA were amplified by nested PCR using primers complementary to proviral and adaptor sequences, cloned, sequenced, and mapped to the human genome as described [11,13,14,24]. Newly determined sets of integration sites (a total of 2,440 sites for the five viruses studied) were compared to each other and to previously reported datasets (Table 1). The distribution of integration sites was also compared to random sites in the human genome generated computationally. As is discussed in Protocol S1, a bioinformatic procedure was used to control for potential biases in integration-site recovery due to possible nonrandom distributions of restriction sites in the human genome.
Table 1
As a test for correct integration by the chimeric viruses, we determined the target-site duplication lengths for a few integration events of each chimeric virus (Figure 1C). Each chimeric virus showed mostly the duplication length that is characteristic of the virus donating the IN segment—4 bp for MLV and 5 bp for HIV—which is as expected because IN is known to dictate the length of the duplication [25,26]. For unknown reasons, one out of five duplications for the HIVmGagmIN chimera was 5 bp instead of the expected 4 bp; all the others were as expected. In addition, all integration events showed evidence of correct cleavage at the viral DNA 3′ end by integrase. These data support the idea that the IN–DNA complexes of the chimeras generally assembled and functioned normally.
The target DNA sequences at the point of integration were then compared (Figure 1D). Previous studies showed that retroviruses have weak preferences for specific primary DNA sequences at integration sites and, when large numbers of sites are analyzed, these biases become statistically very significant [4–7]. We found that two MLV datasets and the HIVmGagmIN dataset showed the previously determined MLV-favored site, and that the HIVPuro and HIVmGag datasets matched the known HIV sequence. Unexpectedly, the HIVmIN site showed lower information content than the others and was somewhat intermediate in sequence. Pairwise comparisons of selected positions in the consensus sequence showed significant differences (e.g., p < 0.0001 for comparison of HIVmIN to HIVPuro at position −3; p < 0.0001 for comparison of HIVmIN to MLVPuro at position −2, analyzed by chi-square). This indicates that Gag determinants, as well as IN determinants, can influence the favored primary sequences at integration sites.
Integration Frequency near Transcription Start Sites and CpG Islands
Integration sites for each of the five viruses were mapped to the human genome, and nearby features were assessed (Figure 2). To begin to compare integration by the chimeras, we evaluated the frequency of integration near transcription start sites and CpG islands (Figure 3A and 3B; Table 2). The MLVPuro control exhibited a strong preference for integration near transcription start sites: 26.1% of MLVPuro sites were within ± 5 kb of a RefSeq gene-transcription start site compared to 5.0% of matched random control sites. For the HIVPuro virus, only 6.9% were near transcription start sites. Thus the preferential integration near transcription start sites by MLV, but not HIV, reported previously [13,14] was seen despite the puromycin selection of transduced cells (p < 0.0001 for pairwise comparison of HIVPuro and MLVPuro; chi-square test).
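The proximity measure used here (the share of integration sites lying within a fixed window of an annotated feature) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name `fraction_near`, the single-chromosome simplification, and the toy coordinates are all assumptions made for the example.

```python
from bisect import bisect_left

def fraction_near(sites, features, window):
    """Fraction of sites lying within +/- window bp of any feature position.

    Assumption for this sketch: all coordinates are integers on a single
    chromosome. Real data would be grouped by chromosome first.
    """
    feats = sorted(features)
    hits = 0
    for s in sites:
        # Index of the first feature at or beyond the left edge of the window.
        i = bisect_left(feats, s - window)
        # That feature counts only if it is not past the right edge.
        if i < len(feats) and feats[i] <= s + window:
            hits += 1
    return hits / len(sites) if sites else 0.0

# Toy coordinates invented for illustration; these are not data from the study.
toy_tss = [10_000, 50_000, 90_000]
toy_sites = [9_000, 20_000, 54_999, 70_000]
print(fraction_near(toy_sites, toy_tss, 5_000))  # 0.5
```

Applying the same function to a set of matched random control sites would give the baseline against which enrichment is judged.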
Table 2
The HIVmIN and HIVmGagmIN target-site preferences closely paralleled MLV, showing 20.7% and 22.4% of integration events within ± 5 kb of transcription start sites, respectively. These high frequencies were significantly different from HIVPuro (p < 0.0001 for both comparisons; chi-square test), and the random control (Table 2), but not significantly different from MLVPuro (p > 0.05 for both comparisons; chi-square test). HIVmGag differed, showing only 3.9% of integration events near transcription start sites, which was significantly lower than HIVPuro (p = 0.0342; chi-square test). Thus MLV IN is a sufficient determinant for directing favored integration near transcription start sites, and Gag-derived proteins also influence integration near these features.
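The pairwise comparisons above are chi-square tests on counts of sites near versus not near a feature. A minimal sketch of such a test on a 2 × 2 table is given below. It uses the Pearson statistic without continuity correction, which is an assumption (the text does not state which variant was applied), and the counts are hypothetical rather than taken from Table 2.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(stat / 2)).
    p = erfc(sqrt(stat / 2.0))
    return stat, p

# Hypothetical counts (near feature, not near feature) for two viruses.
stat, p = chi_square_2x2(100, 300, 30, 370)
print(f"chi-square = {stat:.2f}, p = {p:.3g}")
```

With small expected counts, an exact test would be a more cautious choice than this approximation.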
The integration frequency near CpG islands was then compared. CpG islands are regions rich in the CpG dinucleotide, which are undermethylated and frequently associated with gene regulatory regions [27]. MLV strongly favors integration near CpG islands while HIV does not [11,14]. We quantified integration frequency near CpG islands and found that the MLVPuro, HIVmIN, and HIVmGagmIN viruses all favored integration near these sites. Specifically, 11.8%, 9.4%, and 9.9% of sites, respectively, were within 1 kb of a CpG island midpoint, compared to 1.0% of matched random sites. HIVPuro was not significantly different from random sites (0.2%), while the HIVmGag virus significantly disfavored regions within 1 kb of a CpG island midpoint (0%, p = 0.0224 for chi-square comparison to random sites). The MLVPuro, HIVmIN, and HIVmGagmIN datasets showed significantly more frequent integration near CpG islands than did the HIVPuro and HIVmGag datasets (p < 0.0001 for any pairwise comparison between the two groups; chi-square test).
In summary, the HIVmIN and HIVmGagmIN chimeras resembled MLV in their strong preferences for integration near transcription start sites and CpG islands. Evidently, MLV IN is sufficient to direct favored integration near these features. HIVPuro showed significant differences from HIVmGag, implicating Gag in integration targeting near these features as well.
Another difference between HIV and MLV is the frequency of integration within transcription units (Table 2). The HIVPuro virus favored integration in these sequences (77.9% in RefSeq genes), while the MLVPuro virus showed a much weaker trend (44.3% in RefSeq genes), which is only slightly above the frequency for random sites (33.9%). The difference in frequency between HIVPuro and MLVPuro was significant (p < 0.0001; chi-square test). The HIVmIN and HIVmGagmIN viruses did not differ significantly from the MLVPuro virus (p = 0.1112 and p = 0.5713, respectively; chi-square test). Both HIVmIN and HIVmGagmIN differed significantly from HIVPuro (p < 0.0001 for both comparisons; chi-square test). HIVmGag showed an intermediate phenotype: its frequency of integration in transcription units was significantly lower than that of HIVPuro (reduced 11%; p < 0.0001; chi-square test), but still significantly greater than that of the MLVPuro, HIVmIN, or HIVmGagmIN viruses (p < 0.0001 for all comparisons; chi-square test). Thus, analysis of integration frequency in transcription units also indicated that IN was the key determinant, but gag also contributed.
We next assessed the effects of transcriptional activity on integration frequency using transcriptional profiling data for the HeLa target cells. All viruses tested favored active transcription units for integration compared to randomly selected genes (p < 0.0001, Mann-Whitney U-test on signal values). The median expression level of genes targeted for integration was highest for the HIVPuro and HIVmGag viruses but lower for HIVmIN, HIVmGagmIN, and MLVPuro viruses—in that order. The median signals were significantly different between the HIVPuro virus and the HIVmGagmIN and MLVPuro viruses (p = 0.0005 and p = 0.0241; Mann-Whitney U-test of signal values for genes targeted by HIVPuro versus HIVmGagmIN and by HIVPuro versus MLVPuro, respectively). Thus the chimeras containing MLV IN in the HIV background paralleled MLV by this measure as well.
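The comparison of expression signal values between sets of targeted genes relies on the Mann-Whitney U-test. A minimal sketch of a two-sided version is shown below. It assumes the large-sample normal approximation with midranks for ties and no continuity correction; the text does not specify these details, and the function name and toy values are illustrative only.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U for sample x, p_value). Tied values receive midranks and the
    variance is tie-corrected. No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                          # size of this group of tied values
        midrank = (i + 1 + j) / 2.0        # mean of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0                    # all values identical
    z = (u_x - mean_u) / sqrt(var_u)
    return u_x, erfc(abs(z) / sqrt(2.0))

# Toy signal values invented for illustration.
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(p, 3))
```

For samples as small as the toy example an exact test would be preferable; the approximation is more appropriate for the hundreds of genes per dataset analyzed here.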
Integration Frequency near DNase I Cleavage Sites
Early studies of MLV integration targeting suggested that MLV favors DNase I–hypersensitive sites for integration [8–10]. DNase I–hypersensitive sites are believed to be nucleosome-depleted chromosomal regions associated with regulatory elements [28]. Genome-wide mapping of DNase I cleavage sites in chromatin has revealed that they are enriched near the 5′ ends of transcription units, near CpG islands, and near active genes, reinforcing the idea that they are markers for regulatory regions [22,29].
To assess the correlation between retroviral integration and DNase I cleavage frequency genome-wide, we quantified integration sites within 1 kb of two positions of DNase I cleavage mapped by Crawford et al. [22]. We chose to use two cleavage sites in the analysis instead of a single site to better match the experimental definition of DNase I–hypersensitive sites, which relies on multiple cleavage events. The conclusions were similar whether one, two, or three DNase I cleavage sites were used for analysis (unpublished data). For technical reasons, Crawford et al. analyzed cleavage sites in resting T cells, but further analysis showed that 80% of sites were shared between resting T cells and HeLa cells [22]; we therefore extrapolated their data for comparison to integration sites in HeLa cells studied here.
Table 2 shows the percentage of integration sites that were in intervals (plus or minus 1 kb of the integration sites) containing two or more DNase I cleavage sites. The enrichment relative to the matched random control is shown in Figure 3C. We also analyzed previously published datasets from MLV [14], HIV [4,11,13,14], ASLV [11,12], and the L1 retrotransposon [30,31] and plotted these in Figure 3C for comparison.
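The interval criterion described above (an integration site is scored as positive when the window of plus or minus 1 kb around it contains two or more DNase I cleavage sites) can be sketched as below. The function name, the single-chromosome simplification, and the toy coordinates are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

def fraction_with_min_cleavages(sites, cleavage_sites, window=1_000, min_count=2):
    """Fraction of sites whose +/- window interval holds >= min_count cleavage sites.

    Assumption for this sketch: integer coordinates on a single chromosome.
    """
    cuts = sorted(cleavage_sites)
    hits = 0
    for s in sites:
        lo = bisect_left(cuts, s - window)    # first cleavage site inside the window
        hi = bisect_right(cuts, s + window)   # one past the last cleavage site inside
        if hi - lo >= min_count:
            hits += 1
    return hits / len(sites) if sites else 0.0

# Toy coordinates invented for illustration; these are not data from the study.
toy_cuts = [1_000, 1_500, 9_000]
toy_sites = [1_200, 8_500, 20_000]
print(fraction_with_min_cleavages(toy_sites, toy_cuts))
```

Setting `min_count` to 1 or 3 gives the alternative definitions that the text reports as yielding similar conclusions.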
Of these, MLV showed by far the strongest preference for integration near DNase I cleavage sites. HIV and L1 retrotransposons showed no preference for integration near DNase I cleavage sites, while ASLV showed a weak preference that barely achieved statistical significance. Thus, contrary to the expectation that open chromatin at DNase I cleavage sites is globally favorable for integration, we find that strong favoring of integration near DNase I cleavage sites is specific to MLV.
DNase I cleavage sites are known to be enriched near promoters, raising the question of whether the association of DNase I cleavage sites and MLV integration sites is just a reflection of favored integration near promoters. However, a bioinformatic analysis of this issue (Protocol S2; unpublished data) indicates that proximity to DNase I cleavage sites is favorable for integration independently of proximity to promoters. For example, when promoter locations are approximated as the 1 kb of DNA just upstream from a RefSeq transcription start site, analysis of integration sites outside these regions still reveals increased frequency of MLV integration plus or minus 500 bp from a DNase I cleavage site (p < 0.00001).
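The logic of this control can be illustrated with a short sketch: integration sites falling in the approximated promoter (1 kb upstream of a transcription start site) are set aside, and proximity to DNase I cleavage sites is then scored only for the remaining sites. The actual procedure is given in Protocol S2 and is not reproduced here; the strand-aware definition of "upstream", the function names, and the toy coordinates are assumptions.

```python
from bisect import bisect_left

def in_upstream_promoter(site, tss_list, length=1_000):
    """True if site lies in the `length` bp upstream of any TSS.

    tss_list holds (position, strand) pairs with strand '+' or '-'.
    Upstream means lower coordinates on '+' and higher coordinates on '-'.
    """
    for pos, strand in tss_list:
        if strand == "+" and pos - length <= site < pos:
            return True
        if strand == "-" and pos < site <= pos + length:
            return True
    return False

def near_any(site, sorted_positions, window):
    """True if any position lies within +/- window bp of site."""
    i = bisect_left(sorted_positions, site - window)
    return i < len(sorted_positions) and sorted_positions[i] <= site + window

def nonpromoter_fraction_near_cleavage(sites, tss_list, cleavage_sites,
                                       promoter_len=1_000, window=500):
    """Among sites outside promoters, the fraction near a cleavage site."""
    cuts = sorted(cleavage_sites)
    outside = [s for s in sites
               if not in_upstream_promoter(s, tss_list, promoter_len)]
    if not outside:
        return None
    return sum(near_any(s, cuts, window) for s in outside) / len(outside)

# Toy coordinates invented for illustration; these are not data from the study.
toy_tss = [(10_000, "+"), (50_000, "-")]
toy_cuts = [9_500, 30_000, 50_500]
toy_sites = [9_600, 30_200, 50_400, 70_000]
print(nonpromoter_fraction_near_cleavage(toy_sites, toy_tss, toy_cuts))  # 0.5
```

The same fraction computed for matched random sites outside promoters would supply the comparison needed for a significance test.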
HIVmIN and HIVmGagmIN were similar to the MLVPuro virus in that they strongly favored DNase I–hypersensitive sites for integration, and all three differed significantly from HIVPuro or HIVmGag (p < 0.0001 for any pairwise comparison between the two groups; chi-square test). Like the HIVPuro virus, the HIVmGag virus did not favor these sites for integration above the expectation for random placement (Table 2). Thus substituting MLV IN into HIV was sufficient to transfer the tendency to favor integration near DNase I cleavage sites.
Transcription-Factor Binding Sites near Integration Sites
Given the favoring of integration by MLV, HIVmIN, and HIVmGagmIN near promoters, we investigated whether transcription-factor binding sites were enriched near integration sites of these viruses. We evaluated possible enrichment of 546 transcription-factor binding-site motifs within plus or minus 1 kb of integration sites compared to matched random control sites. To assess the generality of any findings, we also included in this study a previously published set of MLV integration sites in HeLa cells (termed MLV-Burgess; [14]). The MLVPuro, MLV-Burgess, HIVmIN, and HIVmGagmIN datasets showed by far the highest numbers of significantly (p < 0.001) enriched transcription-factor binding-site motifs (54, 33, 25, and 24, respectively). The HIVPuro and HIVmGag returned far fewer (1 and 0). Strikingly, for the MLV group of motifs, many were common to all four datasets, or were shared between multiple group members (Figure 4). Seventeen significantly enriched factors were common to all four, thus specifying a set of cellular factors correlated with MLV (plus HIVmIN and HIVmGagmIN) integration (see Table S1). No single motif was common between the HIVPuro and HIVmGag datasets.
However, many of the sites in Figure 4 were not found to be enriched when promoter sequences were used as controls instead of randomly chosen genomic sites (see Table S1), indicating the general features of promoters correlate most strongly with MLV integration. Nevertheless, a few transcription-factor binding sites were still significantly enriched when promoters were used as controls (requiring 1.5-fold enrichment and p ≤ 0.001), suggesting potential specific interactions. Among the datasets in Figure 4, binding sites for the Ap-1 and Bach1 transcription factors were enriched relative to promoter controls in three out of four datasets (HIVmIN was the exceptional dataset). In addition, a regression analysis indicated that the presence of a nearby promoter could not fully account for the favorable effect of transcription-factor binding sites on integration frequency, again indicating a possible effect of the transcription-factor binding sites beyond just marking promoters (Protocol S2).
Global Comparison of Trends in Integration Targeting
To assess the similarities among integration-site datasets, a machine learning algorithm based on RandomForest was developed to cluster the datasets, taking into account 109 different types of genomic features (Figure 5A; Protocol S3). Examples of genomic features included: gene calls, CpG islands, G/C content, DNase I cleavage sites, and gene boundaries (a detailed list is included in Protocol S3). The MLVPuro, HIVmIN, and HIVmGagmIN integration-site datasets were clustered together by this means. HIVmGagmIN resembled MLV the most closely. HIVPuro and HIVmGag clustered together, though the analysis also emphasized the distinctions between the two datasets. The genomic features most responsible for distinguishing among integration-site datasets could be determined by further analysis of the clustering results (summarized in Protocol S3). Measures of proximity to transcription start sites and gene boundaries were prominent, as were measures of integration in genes and gene density, all as expected from the data in Table 2 and Figure 3.
Another significant feature was the G/C content at integration sites. The effects of G/C content in isolation are presented in Figure 5B. For the highest G/C content category, there are obvious, strong effects (p < 0.0005 for each integration complex). In the human genome, regions of high G/C content are also high in transcription units, SINE elements, CpG islands, and a variety of other features. Controlling for these features would be expected to reduce the strength of the relationship shown in Figure 5B. However, after controlling for the presence of a CpG island within ± 2.5 kb, the effects of being in the highest G/C content category are still significant (at p < 0.005 for each virus studied). HIVPuro and HIVmGag differed by this measure (particularly in the highest G/C content category where p < 1e−10), indicating that Gag proteins play a role in integration targeting near these sequences.
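G/C content at an integration site is a simple quantity to compute once the flanking sequence is in hand. The sketch below shows one way to do so. The window size and the handling of ambiguous bases are not specified in the text, so skipping non-ACGT characters and the example window are assumptions.

```python
def gc_fraction(seq):
    """Fraction of G or C among unambiguous bases; None if there are none.

    Characters other than A, C, G, T (for example N) are ignored, which is
    an assumption of this sketch.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return None
    return (seq.count("G") + seq.count("C")) / counted

def window_around(chrom_seq, site, half_width):
    """Sequence within +/- half_width of a 0-based site, clipped to the ends."""
    start = max(0, site - half_width)
    end = min(len(chrom_seq), site + half_width + 1)
    return chrom_seq[start:end]

# Toy sequence invented for illustration.
toy_seq = "AATTGGCCGGCCAATT"
print(gc_fraction(window_around(toy_seq, 7, 3)))  # 1.0
```

Sites can then be binned into G/C content categories and compared with matched random sites, as in Figure 5B.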
Effects of Selection on Populations of Proviruses
In order to clone a large number of integration sites from the poorly infectious chimeric viruses, it was necessary to select infected cells with puromycin, raising the question of to what extent the selection for proviral gene expression affected the ultimate distribution of integration sites. Previous work showed that selecting for proviral expression can influence the population of integration sites recovered, though the effect was modest [24]. To account for this, puromycin-transducing HIV (HIVPuro) and MLV (MLVPuro) control viruses were used in the present study for comparison to the chimeras. Thus the data from this study, combined with previous work, allow the effects of selection to be analyzed by comparing the HIVPuro and MLVPuro datasets to unselected HIV and MLV datasets (Wu et al. [14]; Table 1), which were also generated by infection of HeLa cells. The pairs of datasets were compared in a semiautomated fashion with respect to many types of genomic annotation. The results are presented in detail in Protocols S4 and S5, and highlights are shown in Figure 6 and Table S2.
The unselected HIV-Burgess dataset did not differ significantly from the HIVPuro dataset over many forms of annotation. For example, the two did not differ in the frequency of integration in RefSeq genes (Figure 6A), the proportion of sites within 1 kb of a CpG island, or the proportion of sites within 1 kb of two DNase I cleavage sites (Table S2). However, the two datasets did differ with respect to the gene density of regions hosting integration sites (p = 0.0085 for gene density in a 4-Mb window surrounding each integration site; Protocol S4), as did the response to transcriptional intensity in the surrounding region (Figure 6B). These data suggest that gene-dense regions are more favorable for HIV provirus expression, reinforcing earlier findings that integration within long intergenic regions disfavored subsequent proviral gene expression [24].
The MLVPuro dataset did not differ from the unselected MLV-Burgess dataset in the proportion of integration sites within RefSeq genes (Figure 6C), within 5 kb of a RefSeq gene-transcription start site, within 1 kb of a CpG midpoint, or within 1 kb of two DNase I cleavage sites (Table S2). However, like HIV, the selected and unselected MLV sites did differ in their frequency in gene-dense regions (p < 2.22e−16 for gene density in a 4-Mb window surrounding each integration site; Protocol S5) and their response to local transcriptional intensity (Figure 6D). Selected and unselected MLV also differed in the G/C content at integration sites (p = 4.52e−11; see Protocol S5). Evidently, gene-dense regions and correlated regions of high G/C content are favorable for MLV gene expression after integration.
Discussion
Previous studies of target-site selection by mobile DNA elements have revealed that the determinants of integration targeting can be diverse. The prokaryotic transposons Tn7 and bacteriophage Mu each encode specific proteins, distinct from the element-encoded transposase enzymes, that bind to integration-target DNA and direct site selection (reviewed in [32]). For the Saccharomyces cerevisiae Ty retrotransposons, in contrast, there is strong evidence for a tethering mechanism involving direct binding of the Ty integrase enzyme to a cellular protein bound near favored sites on target DNA [33–37]. Here we report that two virus-encoded determinants are involved for retroviruses: the IN protein and components of the Gag polyprotein.
An alternative explanation for the data presented here could have been that the viral nucleic acid sequence, and not the encoded protein, was the determinant of target-site specificity. As shown in Figure 1B, the viral DNAs that become integrated retain the IN and gag coding regions. Thus it appears possible that a binding site for a cellular protein might exist in the DNA encoding IN or gag, and that binding of a cellular factor to this DNA site could mediate integration-site selection. However, this model can be ruled out, because integration-site sequence data have been obtained for both HIV and MLV using retroviral vectors that lack the gag and IN coding regions, and these show the same target-sequence preferences as the viruses that do contain the IN and gag coding regions studied here (MLVPuro is such a dataset for MLV, and [11,13] report examples for HIV). Thus the IN and Gag-derived proteins are responsible for selecting the integration target, and not a DNA site within the region encoding gag or IN.
The earliest model for the mechanism of integration-site selection by retroviruses proposed that open chromatin was favored because MLV favored integration near DNase I–hypersensitive sites [8–10]. However, our genome-wide data indicate that DNase I–sensitive regions are not universally favorable. Only MLV, and not HIV, ASLV, or L1, strongly favored integration near these sites. It is unclear whether relatively greater exposure of DNA at DNase I–hypersensitive sites is involved in integration targeting at all. Binding of MLV integration complexes to specific cellular proteins bound at or near DNase I–hypersensitive sites may fully explain the observations. Contrary to the initial interpretation of the data on integration and DNase I–hypersensitive sites, we conclude that none of the available data require explanations based on DNA accessibility to explain integration targeting near these sites.
Another model for the mechanism of integration targeting invokes effects of the cell cycle. HIV and MLV differ in the cell-cycle dependence of infection. HIV can infect cells regardless of cell-cycle phase [38,39], while MLV infection requires host cells to pass through mitosis [40,41]. The transcriptional state of a cell is known to vary with the cell cycle, so the organization of chromosomal DNA encountered by the MLV and HIV integration complexes should differ. The HIVmGag chimera exhibited cell cycle–restricted infectivity, like that of MLV [20]; thus HIVmGag would likely encounter the chromosomal DNA in the same state as does MLV. The targeting preferences of the HIVmGag chimera did differ from those of HIVPuro, potentially supporting the cell-cycle model, but the HIVmGag integration pattern was very different from that of MLV. Thus cell-cycle effects may have a modest influence on integration, but other factors appear to dominate. Consistent with this, studies of HIV integration targeting in non-dividing cells have not shown large differences from studies of integration in dividing cells [42,43].
The best-supported model at present invokes direct tethering interactions between retroviral proteins and cellular factors. Evidence suggests that HIV IN is one determinant of integration targeting, since it binds LEDGF/p75 protein [15–18], and cells lacking LEDGF/p75 show reduced frequency of integration in transcription units [19]. However, the IN–LEDGF/p75 interaction is not a complete explanation for the HIV integration-target preference, because HIV integration in cells depleted for LEDGF/p75 shows only a modest reduction in integration in transcription units, indicating that other factors may be involved [19].
Data reported here implicate IN as the primary determinant of integration targeting, with Gag-derived proteins playing an auxiliary role. For the MLV case, IN is clearly a dominant determinant, because it reprograms HIV integration toward the MLV-like pattern. It is possible that determinants for targeting HIV exist in other HIV genes, but are recessive to MLV IN. However, the data with LEDGF suggest that HIV IN is one determinant of HIV target-site selection. The mechanism of MLV targeting is not fully specified by our data, but a direct tethering interaction between MLV IN and transcription factors (Figure 4; Table S1) or other proteins bound at promoters is consistent with our findings. The role of Gag is less clear. It could be that MLV Gag–derived proteins are involved indirectly by acting as cofactors for correct assembly of complexes containing MLV IN. Consistent with this idea is the finding that the target-sequence preference at the point of integration is perturbed in the HIVmIN dataset, but fully matches MLV in HIVmGagmIN (Figure 1D). That is, lack of the matched MLV Gag may cause incorrect assembly of MLV IN, resulting in incorrect recognition of the target DNA. MLV Gag could also interact directly with cellular proteins. A third possibility is that MLV Gag is acting through its ability to regulate the relationship of integration to the cell cycle [20], as is discussed above. Our results also suggest that HIV Gag–derived proteins are involved in integration targeting, because the HIVmGag chimera differed significantly from HIVPuro in target-sequence preferences (see, for example, Figure 5B).
In summary, we found that substitution of MLV IN for HIV IN reprogrammed HIV integration-site selection towards that of MLV. Furthermore, addition of MLV gag resulted in a closer parallel with MLV integration targeting. In addition, we found that favored integration near DNase I–hypersensitive sites was an MLV-specific trend, and this tendency also could be transferred to HIV by substituting MLV IN into HIV. These data clarify the viral determinants of integration-site selection, reveal a new role for Gag proteins, and constrain models for the mechanisms directing integration targeting by retroviruses.
Materials and Methods
DNA constructions.
To generate the MLVPuro dataset, we used LPCX (Clontech, Palo Alto, California, United States), which is an MLV-based vector that expresses the puromycin resistance gene from the MLV LTR. All other vectors used were based on the full-length HIV clone pLAI [44]. Vpr was mutated by the insertion of four bases at the NcoI site at 5,207 bp, and env has a deletion between the BglII sites at 6,634 and 7,214 bp [23]. The puromycin resistance gene was cloned in place of nef. The MLV gag gene segment encoding MA, p12, and CA from pAMS [45] was cloned in place of HIV MA and CA for MHIV-mMA12CA-ΔenvΔvprΔnef-puromycin (for the HIVmGag dataset) and MHIV-mMA12CA-mIN-ΔenvΔvprΔnef-puromycin (for HIVmGagmIN) as described previously [20]. For MHIV-mIN-ΔenvΔvprΔnef-puromycin (HIVmIN) and MHIV-mMA12CA-mIN-ΔenvΔvprΔnef-puromycin (HIVmGagmIN), the MLV IN–encoding portion of the pAMS pol gene was cloned in place of HIV IN, starting at the same position of the 5′ end of the HIV IN gene segment. The 3′ end of the HIV IN–encoding region with the cPPT remains and is separated from the end of MLV IN by two stop codons [21]. (The junction sequence is CGTGGAAGCCCTTAATAGTCTgaattc.)
Infections.
VSV-G–pseudotyped virus was prepared as described previously [20]. HeLa cells were infected by spinoculation [46] with concentrated viral supernatant and 20 μg/ml DEAE-dextran. Infected cells were selected with 0.7 μg/ml puromycin for 2 wk. Genomic DNA was extracted from pooled colonies.
Cloning integration sites.
Genomic DNA was digested with MseI and ligated to a linker as described previously [14]. The ligase was heat-inactivated at 65 °C for 15 min, and the genomic DNA was digested with a second restriction enzyme to limit the amplification of an internal viral fragment. SpeI was used for the MLVPuro virus, and SacI was used for the HIV-based viruses. Viral-host DNA junctions were amplified by nested PCR using primers specific for the proviral LTR (reading out from the 3′ end) and the linker essentially as described in the GeneWalker Kit manual (Clontech). Nested-PCR products were cloned using the TOPO TA cloning system (Invitrogen, Carlsbad, California, United States). Clones were sequenced and mapped to the human genome with BLAT (University of California, Santa Cruz, California, United States). The viral genotypes in each genomic DNA sample were confirmed by PCR using primers that detected sequences from HIV gag, HIV IN, MLV gag, and MLV IN.
For analysis of the length of target-site duplications, integration-site clones were randomly chosen and genomic sequence-specific primers were designed. The viral-host DNA junction from the 5′ LTR of the provirus was amplified from undigested genomic DNA and cloned using the TOPO TA cloning system (Invitrogen). Oligonucleotides used in this study are listed in Table S3.
One might ask whether infecting with the VSV-G envelope, rather than the authentic HIV or MLV envelopes, affects integration targeting; a direct study of this issue found no differences [43].
Bioinformatic analysis.
A detailed statistical analysis is presented in Protocols S1–S5. In order to control for possible biases in the datasets due to the choice of restriction endonuclease used in cloning integration sites, each experimental integration site was paired with ten randomly selected sites in the genome that were exactly the same distance from an MseI site. These matched random control sites were generated in silico and were used for comparison to the integration-site datasets as previously described [11].
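The matched-control idea can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes a single chromosome represented by a sorted list of restriction-site coordinates, ignores strand, measures distance to the nearest site at or downstream of the integration site, and samples with replacement. The function names `distance_to_next_site` and `matched_random_controls` are hypothetical.

```python
import bisect
import random


def distance_to_next_site(position, sites):
    """Distance from `position` to the nearest restriction site at or after it.

    `sites` is a sorted list of unique site coordinates on one chromosome
    (an assumption of this sketch). Returns None if no such site exists.
    """
    i = bisect.bisect_left(sites, position)
    if i == len(sites):
        return None
    return sites[i] - position


def matched_random_controls(position, sites, n_controls=10, rng=None):
    """Draw control positions lying the same distance from randomly chosen sites.

    A site is eligible only if the control placed `d` bases upstream of it
    has that site as its nearest downstream site, so the distance matches.
    """
    rng = rng or random.Random()
    d = distance_to_next_site(position, sites)
    if d is None:
        raise ValueError("no restriction site downstream of the integration site")
    eligible = []
    for j, site in enumerate(sites):
        previous = sites[j - 1] if j > 0 else -1
        if site - d >= 0 and previous < site - d:
            eligible.append(site)
    # Sampling is with replacement; a real analysis would draw across the genome.
    return [rng.choice(eligible) - d for _ in range(n_controls)]


if __name__ == "__main__":
    toy_sites = [100, 250, 400, 1000, 1500]  # illustrative coordinates only
    print(matched_random_controls(320, toy_sites, rng=random.Random(0)))
```

Because every control is constrained to the same distance from a restriction site as its experimental partner, any recovery bias tied to that distance affects both sets equally and cancels in the comparison.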
The statistical analysis of favored binding-site motifs (Figure 4 and Table S1) was carried out as follows. Transcription-factor binding-site motifs, described as positional-weight matrices, were obtained from the TRANSFAC database. Let X and Y denote sets of significant motifs around the integration sites in two independent experiments, with c motifs in common. Assuming a random sampling of |X| and |Y| distinct factors from a pool of 546 transcription-factor motifs, the hypergeometric p-value estimates the probability of sampling c or more common motifs.
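As a worked illustration of this calculation, the following Python sketch computes the upper-tail hypergeometric probability from the set sizes |X| and |Y|, the overlap c, and the pool of 546 motifs. The function name `shared_motif_pvalue` and the example numbers are assumptions made for illustration, not values from the study; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def shared_motif_pvalue(n_x, n_y, c, pool_size=546):
    """Probability of c or more shared motifs under random sampling.

    n_x and n_y are the numbers of significant motifs in the two
    experiments, each treated as a random draw of distinct motifs
    from a pool of `pool_size` motifs.
    """
    if not (0 <= n_x <= pool_size and 0 <= n_y <= pool_size):
        raise ValueError("set sizes must lie between 0 and pool_size")
    upper = min(n_x, n_y)  # the overlap cannot exceed the smaller set
    total = comb(pool_size, n_y)
    tail = sum(
        comb(n_x, k) * comb(pool_size - n_x, n_y - k)
        for k in range(max(c, 0), upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical set sizes, chosen only to show the call.
    print(shared_motif_pvalue(n_x=30, n_y=50, c=10))
```

The sum runs over overlaps from c up to the size of the smaller set, so a small p-value indicates that two datasets share more enriched motifs than random sampling from the pool would be expected to produce.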
For the analysis of the effects of host-cell transcription on integration, we acquired a set of HeLa transcriptional profiling data (assayed with Affymetrix HG-U133A microarrays [Santa Clara, California, United States]) from NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/projects/geo/index.cgi).
Supporting Information
Protocol S1. Association of Genomic Features with Integration (650 KB PDF)
Protocol S2. Screening Transcription Factors for Effects on Retroviral Integration (309 KB PDF)
Protocol S3. Similarity of Integration Sites of Different Integration Complexes (451 KB PDF)
Protocol S4. Association of Genomic Features with Integration: Unselected versus Puromycin-Selected HIV (524 KB PDF)
Protocol S5. Association of Genomic Features with Integration: Unselected versus Puromycin-Selected MLV (521 KB PDF)
Table S1. Transcription-Factor Binding-Site Motifs Enriched in Each Integration-Site Dataset (41 KB XLS)
Table S2. Comparison of Selected and Unselected Datasets (14 KB XLS)
Table S3. Oligonucleotides Used in This Study (16 KB XLS)
Accession Numbers
The NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/projects/geo/index.cgi) accession numbers for the publicly available data that we used in our analysis of host-cell transcription effects on integration are GSM23372, GSM23373, GSM23377, and GSM23378.
The NCBI GenBank (http://www.ncbi.nlm.nih.gov/Genbank/index.html) accession numbers for integration sites sequenced in this study are: HIVPuro (DX597229–DX598304), HIVmGag (DX588312, DX593208–DX594687), HIVmIN (DX594688–DX597228), HIVmGagmIN (DX590011–DX590615), and MLVPuro (DX598305–DX598906).
Acknowledgments
We thank members of the Bushman and Emerman laboratories for helpful discussions.
Abbreviations
ASLV: avian sarcoma-leukosis virus
HIV: human immunodeficiency virus
LEDGF: lens epithelium–derived growth factor
MLV: murine leukemia virus
VSV-G: vesicular stomatitis virus G protein
Footnotes
Author contributions. MKL, MY, ME, AC, JL, SH, CCB, and FDB conceived and designed the experiments. MKL, MY, AC, HM, PS, JL, SH, and CCB performed the experiments. MKL, MY, ME, AC, HM, GC, FC, PS, JL, SH, CCB, and FDB analyzed the data. MKL, MY, ME, AC, HM, GC, FC, JRE, and FDB contributed reagents/materials/analysis tools. MKL, CCB, and FDB wrote the paper.
Competing interests. The authors have declared that no competing interests exist.
Funding. This work was supported by NIH grants AI52845 and AI34786 (to FDB) and AI30927 (to ME), the James B. Pendleton Charitable Trust, and Robin and Frederic Withington (to FDB) and the Fritz B. Burns Foundation (to JRE).
References
- Check E. Gene therapy put on hold as third child develops cancer. Nature. 2005;433:561.
- Hacein-Bey-Abina S, von Kalle C, Schmidt M, Le Deist F, Wulffraat N, et al. A serious adverse event after successful gene therapy for X-linked severe combined immunodeficiency. N Engl J Med. 2003;348:255–256.
- Hacein-Bey-Abina S, Von Kalle C, Schmidt M, McCormack MP, Wulffraat N, et al. LMO2-associated clonal T cell proliferation in two patients after gene therapy for SCID-X1. Science. 2003;302:415–419.
- Carteau S, Hoffmann C, Bushman F. Chromosome structure and human immunodeficiency virus type 1 cDNA integration: Centromeric alphoid repeats are a disfavored target. J Virol. 1998;72:4005–4014.
- Holman AG, Coffin JM. Symmetrical base preferences surrounding HIV-1, avian sarcoma/leukosis virus, and murine leukemia virus integration sites. Proc Natl Acad Sci U S A. 2005;102:6103–6107.
- Stevens SW, Griffith JD. Sequence analysis of the human DNA flanking sites of human immunodeficiency virus type 1 integration. J Virol. 1996;70:6459–6462.
- Wu X, Li Y, Crise B, Burgess SM, Munroe DJ. Weak palindromic consensus sequences are a common feature found at the integration target sites of many retroviruses. J Virol. 2005;79:5211–5214.
- Panet A, Cedar H. Selective degradation of integrated murine leukemia proviral DNA by deoxyribonucleases. Cell. 1977;11:933–940.
- Rohdewohld H, Weiher H, Reik W, Jaenisch R, Breindl M. Retrovirus integration and chromatin structure: Moloney murine leukemia proviral integration sites map near DNase I-hypersensitive sites. J Virol. 1987;61:336–343.
- Vijaya S, Steffen DL, Robinson HL. Acceptor sites for retroviral integrations map near DNase I-hypersensitive sites in chromatin. J Virol. 1986;60:683–692.
- Mitchell RS, Beitzel BF, Schroder AR, Shinn P, Chen H, et al. Retroviral DNA integration: ASLV, HIV, and MLV show distinct target site preferences. PLoS Biol. 2004;2:e234. DOI: 10.1371/journal.pbio.0020234.
- Narezkina A, Taganov KD, Litwin S, Stoyanova R, Hayashi J, et al. Genome-wide analyses of avian sarcoma virus integration sites. J Virol. 2004;78:11656–11663.
- Schroder AR, Shinn P, Chen H, Berry C, Ecker JR, et al. HIV-1 integration in the human genome favors active genes and local hotspots. Cell. 2002;110:521–529.
- Wu X, Li Y, Crise B, Burgess SM. Transcription start regions in the human genome are favored targets for MLV integration. Science. 2003;300:1749–1751.
- Cherepanov P, Maertens G, Proost P, Devreese B, Van Beeumen J, et al. HIV-1 integrase forms stable tetramers and associates with LEDGF/p75 protein in human cells. J Biol Chem. 2003;278:372–381.
- Llano M, Delgado S, Vanegas M, Poeschla EM. Lens epithelium-derived growth factor/p75 prevents proteasomal degradation of HIV-1 integrase. J Biol Chem. 2004;279:55570–55577.
- Llano M, Vanegas M, Fregoso O, Saenz D, Chung S, et al. LEDGF/p75 determines cellular trafficking of diverse lentiviral but not murine oncoretroviral integrase proteins and is a component of functional lentiviral preintegration complexes. J Virol. 2004;78:9524–9537.
- Turlure F, Devroe E, Silver PA, Engelman A. Human cell proteins and human immunodeficiency virus DNA integration. Front Biosci. 2004;9:3187–3208.
- Ciuffi A, Llano M, Poeschla E, Hoffmann C, Leipzig J, et al. A role for LEDGF/p75 in targeting HIV DNA integration. Nat Med. 2005;11:1287–1289.
- Yamashita M, Emerman M. Capsid is a dominant determinant of retrovirus infectivity in nondividing cells. J Virol. 2004;78:5670–5678.
- Yamashita M, Emerman M. The cell cycle independence of HIV infections is not determined by known karyophilic viral elements. PLoS Pathog. 2005;1:e18. DOI: 10.1371/journal.ppat.0010018.
- Crawford GE, Holt IE, Whittle J, Webb BD, Tai D, et al. Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res. 2005;16:123–131.
- Rogel ME, Wu LI, Emerman M. The human immunodeficiency virus type 1 vpr gene prevents cell proliferation during chronic infection. J Virol. 1995;69:882–888.
- Lewinski MK, Bisgrove D, Shinn P, Chen H, Hoffmann C, et al. Genome-wide analysis of chromosomal features repressing HIV transcription. J Virol. 2005;79:6610–6619.
- Bushman FD, Fujiwara T, Craigie R. Retroviral DNA integration directed by HIV integration protein in vitro. Science. 1990;249:1555–1558.
- Craigie R, Fujiwara T, Bushman F. The IN protein of Moloney murine leukemia virus processes the viral DNA ends and accomplishes their integration in vitro. Cell. 1990;62:829–837.
- Bird AP. CpG-rich islands and the function of DNA methylation. Nature. 1986;321:209–213.
- Gross DS, Garrard WT. Nuclease hypersensitive sites in chromatin. Annu Rev Biochem. 1988;57:159–197.
- Crawford GE, Holt IE, Mullikin JC, Tai D, Blakesley R, et al. Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites. Proc Natl Acad Sci U S A. 2004;101:992–997.
- Gilbert N, Lutz-Prigge S, Moran JV. Genomic deletions created upon LINE-1 retrotransposition. Cell. 2002;110:315–325.
- Symer DE, Connelly C, Szak ST, Caputo EM, Cost GJ, et al. Human L1 retrotransposition is associated with genetic instability in vivo. Cell. 2002;110:327–338.
- Stellwagen AE, Craig NL. Mobile DNA elements: Controlling transposition with ATP-dependent molecular switches. Trends Biochem Sci. 1998;23:486–490.
- Boeke JD, Devine SE. Yeast retrotransposons: Finding a nice quiet neighborhood. Cell. 1998;93:1087–1089.
- Bushman FD. Targeting survival: Integration site selection by retroviruses and LTR-retrotransposons. Cell. 2003;115:135–138.
- Craig NL, Craigie R, Gellert M, Lambowitz A, editors. Mobile DNA II. Washington (D. C.): American Society Microbiology; 2002. 1204 p.
- Sandmeyer S. Integration by design. Proc Natl Acad Sci U S A. 2003;100:5586–5588.
- Zhu Y, Zou S, Wright DA, Voytas DF. Tagging chromatin with retrotransposons: Target specificity of the Saccharomyces Ty5 retrotransposon changes with the chromosomal localization of Sir3p and Sir4p. Genes Dev. 1999;13:2738–2749.
- Lewis P, Hensel M, Emerman M. Human immunodeficiency virus infection of cells arrested in the cell cycle. Embo J. 1992;11:3053–3058.
- Weinberg JB, Matthews TJ, Cullen BR, Malim MH. Productive human immunodeficiency virus type 1 (HIV-1) infection of nonproliferating human monocytes. J Exp Med. 1991;174:1477–1482.
- Lewis PF, Emerman M. Passage through mitosis is required for oncoretroviruses but not for the human immunodeficiency virus. J Virol. 1994;68:510–516.
- Roe T, Reynolds TC, Yu G, Brown PO. Integration of murine leukemia virus DNA depends on mitosis. Embo J. 1993;12:2099–2108.
- Ciuffi A, Mitchell RS, Hoffmann C, Leipzig J, Shinn P, et al. Integration site selection by HIV-based vectors in dividing and growth-arrested IMR-90 lung fibroblasts. Mol Ther. 2006;13:366–373.
- Barr SD, Ciuffi A, Leipzig J, Shinn P, Ecker JR, et al. HIV integration site selection: Targeting in macrophages and the effect of different routes of viral entry. Mol Ther. 2006. In press.
- Peden K, Emerman M, Montagnier L. Changes in growth properties on passage in tissue culture of viruses derived from infectious molecular clones of HIV-1LAI, HIV-1MAL, and HIV-1ELI. Virology. 1991;185:661–672.
- Miller AD, Law MF, Verma IM. Generation of helper-free amphotropic retroviruses that transduce a dominant-acting, methotrexate-resistant dihydrofolate reductase gene. Mol Cell Biol. 1985;5:431–437.
- O'Doherty U, Swiggard WJ, Malim MH. Human immunodeficiency virus type 1 spinoculation enhances infection through virus binding. J Virol. 2000;74:10074–10080.
Articles from PLOS Pathogens are provided here courtesy of PLOS
Full text links
Read article at publisher's site: https://doi.org/10.1371/journal.ppat.0020060