Population Level Analysis of Human Immunodeficiency Virus Type 1 Hypermutation and Its Relationship with APOBEC3G and vif Genetic Variation†
Abstract
APOBEC3G and APOBEC3F restrict human immunodeficiency virus type 1 (HIV-1) replication in vitro through the induction of G→A hypermutation; however, the relevance of this host antiviral strategy to clinical HIV-1 is currently not known. Here, we describe a population level analysis of HIV-1 hypermutation in near-full-length clade B proviral DNA sequences (n = 127). G→A hypermutation conforming to expected APOBEC3G polynucleotide sequence preferences was inferred in 9.4% (n = 12) of the HIV-1 sequences, with a further 2.4% (n = 3) conforming to APOBEC3F, and was independently associated with reduced pretreatment viremia (reduction of 0.7 log10 copies/ml; P = 0.001). Defective vif was strongly associated with HIV-1 hypermutation, with additional evidence for a contribution of vif amino acid polymorphism at residues important for APOBEC3G-vif interactions. A concurrent analysis of APOBEC3G polymorphism revealed this gene to be highly conserved at the amino acid level, although an intronic allele (6,892 C) was marginally associated with HIV-1 hypermutation. These data indicate that APOBEC3G-induced HIV-1 hypermutation represents a potent host antiviral factor in vivo and that the APOBEC3G-vif interaction may represent a valuable therapeutic target.
APOBEC3G (apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like 3G) and the closely related APOBEC3F are recently identified anthropoid-specific proteins that restrict human immunodeficiency virus (HIV) type 1 (HIV-1) replication by deaminating cytosine residues in intermediary single-stranded HIV DNA. This host antiviral strategy thereby introduces DNA editing errors into the retroviral genome sequence, resulting in the fixation of an inordinate number of proviral HIV DNA guanine-to-adenine (G→A) substitutions referred to as hypermutation (10, 20, 43). Previous studies have defined polynucleotide motifs within single-stranded DNA that are preferentially targeted by APOBEC3G (resulting in proviral DNA GG→AG substitutions [substituted bases are underlined and italicized]) and APOBEC3F (GA→AA) (2, 16, 36). These studies have also identified HIV-1 viral infectivity protein (vif) as the principal viral factor that counteracts APOBEC3-mediated DNA editing by promoting the degradation of APOBEC3-vif complexes via the proteasomal pathway (6, 21, 42). Recent in vitro data also indicate that APOBEC3G and APOBEC3F are partially resistant to vif (3), suggesting a more fundamental role for these APOBEC3 proteins in directing HIV-1 genetic variation. However, at present little is known of the disease-modulating effects of APOBEC3G and/or APOBEC3F in HIV-infected patients, and it is uncertain if APOBEC3-mediated HIV DNA editing in vivo requires permissive conditions, such as defective vif activity and/or APOBEC3 genetic variation. Here, we have utilized near-full-length clade B HIV-1 proviral DNA sequences (an average of 6,820 ± 1,187 nucleotides/subject; total, 11,202 G→A substitutions) from 127 HIV-infected, antiretroviral therapy-naïve individuals to address each of these issues at a population level.
MATERIALS AND METHODS
Patient selection.
To be included in the study, pretreatment proviral HIV-1 DNA sequences were required to be of clade B (see below) and of sufficient length (>1,000 nucleotides). These criteria allowed the inclusion of 136 adult HIV-infected patients from the Western Australian HIV cohort (19), representing a predominantly Caucasian (84%) male (88%) population who had acquired HIV infection through sexual contact (83%). All sequences were utilized in the construction of the population consensus HIV-1 sequence in this study.
For the analysis of HIV-1 hypermutation based on proportions of nonconsensus nucleotides representing G→A substitutions in one of the dinucleotide sequence contexts (GA, GG, GC, or GT), a further nine sequences with 77 or fewer nucleotide substitutions from the consensus were omitted. These formed a tight cluster of outliers disparate from the remainder of the sample and provided potentially unstable estimates of the relevant proportions which could corrupt results. Our analyses were thus based on 127 cases.
HIV-1 clade assignment.
HIV sequences were analyzed by phylogenetic analysis using Molecular Evolutionary Genetic Analysis, version 3 (MEGA 3.0; http://www.megasoftware.net) as follows. The nucleic acid sequences for the individual HIV-1 protein products (e.g., p17, p24, and reverse transcriptase) were extracted from the database. These were combined with 6 to 15 sequences from each of clades A, B, C, D, and F as well as the sequence for HXB2 and the M-group ancestral sequence; these reference sequences were downloaded from the Los Alamos HIV sequence database. Phylogenetic trees were constructed using the parameters (i) neighbor-joining tree inference, (ii) pairwise deletion, and (iii) the Kimura two-parameter substitution model. The resulting trees were then rooted on the M-group ancestral sequence and inspected. Most of the sequences clearly sorted with reference sequences of a particular clade, and these were assigned as belonging to that clade. Sequences that did not clearly belong to a specific clade were assigned as unknown. A final assignment of clade was made based on the assignments for all of the proteins. If all of the protein products for a particular patient were unambiguously assigned to the same clade, then that sample was assigned to that clade. If all protein products were of the same clade except for one protein product assigned as ambiguous, the consensus clade was again assigned to that sample. Patient samples in which the proteins belonged to several clades were noted as such (e.g., AB, BC) and excluded from the study.
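The final per-sample assignment rule described above can be sketched as follows. This is an illustration only, not the software used in the study: the function name `assign_sample_clade`, the `"unknown"` label, and the handling of samples with more than one ambiguous protein (which the text does not spell out) are assumptions.

```python
def assign_sample_clade(protein_clades):
    """Combine per-protein clade calls into one call for a patient sample.

    protein_clades maps a protein name to a clade letter, or to "unknown"
    when the tree placement was ambiguous (hypothetical label).
    Returns (clade_label, include_in_study).
    """
    calls = list(protein_clades.values())
    clades = sorted({c for c in calls if c != "unknown"})
    n_unknown = sum(1 for c in calls if c == "unknown")

    if len(clades) == 1 and n_unknown <= 1:
        # All proteins agree, allowing at most one ambiguous protein.
        # Only clade B samples were eligible for this study.
        return clades[0], clades[0] == "B"
    if len(clades) > 1:
        # Proteins sort with several clades: note as e.g. "AB" and exclude.
        return "".join(clades), False
    # Cases not described in the text (assumption: leave unassigned).
    return "unknown", False
```

For example, a sample whose p17, p24, and reverse transcriptase sequences all sort with clade B is retained, whereas one with p17 in clade A and p24 in clade B is labeled "AB" and excluded.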
Amplification and sequencing of HIV-1 proviral DNA, measurement of pretreatment HIV RNA levels, and HLA and CCR5 genotyping.
Amplification and sequencing of HIV-1 proviral DNA, measurement of pretreatment HIV RNA levels, and HLA and CCR5 genotyping were performed as previously described in the work of Moore et al. (23). Patient DNA was extracted from blood samples by use of a QIAGEN DNA extraction kit according to the manufacturer's instructions. The PCR conditions and primers used for the full-length amplification of the proviral HIV genome have been previously described (26). Briefly, the first-round PCR was performed with an Expand long-template PCR kit (Boehringer Mannheim) to produce a 9-kb amplified product. This first round product was then used as a template for 13 individual nested PCRs using Taq polymerase (Boehringer Mannheim) to amplify the entire HIV genome from gag p17 to the 3′ long terminal repeat. First- and second-round PCRs were performed on ABI 9700 and 9600 thermocyclers. Successfully amplified PCR products were sequenced in the reverse and forward directions using BigDye Terminator ready reaction prism kits (v3). The samples were electrophoresed on an ABI 3100 genetic analyzer, and sequencing data analyzed using ABI software package Seqscape Version 1.1. Mixtures (sites where more than one nucleotide was observed) were named according to the IUPAC standard. Sites that were unable to be assigned were designated as “N” and excluded from the analyses. Nucleotide sequence length within the study population averaged 6,820 ± 1,187 nucleotides (range, 1,329 to 8,768 nucleotides).
Analysis of G→A substitutions.
To estimate G→A substitutions, individual HIV-1 proviral DNA sequences were aligned against the population consensus clade B sequence (n = 136). Only G→A substitutions where the nucleotide at position +1 in the sample matched the corresponding nucleotide in the consensus sequence were examined (where GA, GC, GG, and GT dinucleotides in the consensus sequence were observed as AA, AC, AG, and AT, respectively, in the sample sequence). Mixture nucleotide results obtained from chromatograph analysis (indicating the presence of mixed nucleotide populations) were assigned as missing values. To estimate “general” G→A hypermutation for each sequence (i.e., without reference to the expected dinucleotide sequence context for APOBEC3F or APOBEC3G), we incorporated two previously used measures of hypermutation (27), G→A preference (#G→A substitutions/#all mutations, where # indicates “number of”) and G→A burden (#G→A substitutions/ #consensus G) into a single formula:
This represents the proportion of all mutations that are G→A substitutions adjusted for the proportion of nucleotides sequenced that are G in the consensus sequence.
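Taken at face value, the verbal definition above is a ratio of two proportions. The sketch below computes that ratio from raw counts. The function and argument names are hypothetical, and the optional log10 transform is an assumption based on the log10-transformed values referred to later in the Methods.

```python
import math

def general_hypermutation_score(n_g_to_a, n_all_mutations,
                                n_consensus_g, n_nucleotides, log10=True):
    """(G->A / all mutations) divided by (consensus G / nucleotides sequenced)."""
    preference = n_g_to_a / n_all_mutations      # share of mutations that are G->A
    g_fraction = n_consensus_g / n_nucleotides   # share of sequenced sites that are G
    ratio = preference / g_fraction
    # Assumption: scores are reported on a log10 scale.
    return math.log10(ratio) if log10 else ratio
```

With illustrative counts of 75 G→A substitutions among 365 mutations, and an assumed 1,630 consensus guanines among 6,900 sequenced nucleotides, the untransformed ratio is about 0.87 and the log10 score is about −0.06.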
Analysis of APOBEC3G and APOBEC3F target motifs.
To investigate the contribution of APOBEC3G (3G) and APOBEC3F (3F) to hypermutation, we examined the dinucleotide sequence contexts of G→A substitutions using the following formulae:
These represent the proportion of all G→A substitutions that occurred in the GG (APOBEC3G) and GA (APOBEC3F) contexts adjusted for the number of available GG (APOBEC3G) and GA (APOBEC3F) dinucleotides.
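One way to obtain the counts behind these context scores is sketched below, following the rule stated earlier that a G→A substitution is examined only when the +1 nucleotide is the same in the sample and the consensus. The function names are hypothetical, and the exact choice of denominator (all usable consensus G sites) is an assumption drawn from the verbal description rather than from the study's own code.

```python
from collections import Counter

def count_g_to_a_by_context(consensus, sample):
    """Count consensus G sites and G->A substitutions by the +1 nucleotide.

    Both sequences must be pre-aligned and of equal length. A site is used
    only when the +1 base is identical in consensus and sample and is an
    unambiguous A, C, G, or T; mixture codes and gaps are skipped.
    """
    available, substituted = Counter(), Counter()
    for i in range(len(consensus) - 1):
        nxt = consensus[i + 1]
        if consensus[i] != "G" or nxt not in "ACGT" or sample[i + 1] != nxt:
            continue
        if sample[i] not in "ACGT":
            continue  # mixture or unassigned base: treated as missing
        available["G" + nxt] += 1
        if sample[i] == "A":
            substituted["G" + nxt] += 1
    return available, substituted

def context_score(available, substituted, context):
    """Share of G->A substitutions in `context` ("GG" for 3G, "GA" for 3F),
    relative to the share of consensus G sites that sit in that context.
    Assumes at least one G->A substitution and one site in the context."""
    observed = substituted[context] / sum(substituted.values())
    expected = available[context] / sum(available.values())
    return observed / expected
```

A score above 1 means that G→A substitutions fall in that dinucleotide context more often than its availability in the consensus would predict.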
Analysis of APOBEC3G-mediated hypermutation (HM-3G) and HM-3F.
We combined the hypermutation formula with either the 3G or the 3F formula to identify sequences that had both an inordinate number of G→A substitutions relative to other substitutions and a high preference for G→A substitutions specifically in the sequence context targeted by either APOBEC3G or APOBEC3F as follows:
Thus, these represent the proportions of all mutations that were G→A substitutions that occurred in the GG (consolidated 3G score) and GA (consolidated 3F score) dinucleotide contexts, adjusted for nucleotide availability in the consensus sequence.
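Read this way, the consolidated score is simply the product of the general score and the context-specific score, because the G→A and consensus-G counts cancel. The sketch below checks that identity numerically; all names and counts are hypothetical.

```python
import math

def consolidated_score(n_ctx_g_to_a, n_all_mutations,
                       n_ctx_consensus, n_nucleotides):
    """(context G->A / all mutations) / (context dinucleotides / nucleotides)."""
    return (n_ctx_g_to_a / n_all_mutations) / (n_ctx_consensus / n_nucleotides)

def product_of_parts(n_g_to_a, n_all_mutations, n_consensus_g, n_nucleotides,
                     n_ctx_g_to_a, n_ctx_consensus):
    """General score multiplied by the context-specific score."""
    general = (n_g_to_a / n_all_mutations) / (n_consensus_g / n_nucleotides)
    context = (n_ctx_g_to_a / n_g_to_a) / (n_ctx_consensus / n_consensus_g)
    return general * context

# Hypothetical counts for one sequence.
direct = consolidated_score(90, 505, 450, 6600)
combined = product_of_parts(210, 505, 1600, 6600, 90, 450)
agree = math.isclose(direct, combined)
```

On a log10 scale the product becomes a sum. The group means in Table 1 are consistent with this reading if the tabulated scores are log10 values: for HM-3G sequences, 0.22 (general) plus 0.19 (3G specific) is 0.41 (consolidated 3G).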
Histogram and Q-Q plots of the consolidated 3G values (log10 transformed) suggested a bimodal distribution with a normally distributed main group and a smaller group with higher values. The bimodal distribution was fitted by a mixture of two normal distributions using maximum likelihood, giving estimated distributions with means ± standard deviations (SD) of −0.277 ± 0.151 and 0.417 ± 0.151, respectively, with an estimated 9.0% of observations in the latter group. The presence of a mixture was highly significant (P < 0.00005; likelihood ratio test). The likelihood of belonging to the upper group was higher for the largest 12 observations (9.4%), with one likelihood ratio of 8 and the rest greater than 140. Hierarchical cluster analysis using between-group and centroid linkage both gave two clear groups with numbers of 115 and 12, as did k-means clustering. The nonhypermutated (NH) group is approximately normal. Corresponding cluster analyses of the non-APOBEC3G-hypermutated cases suggest three cases with inordinate consolidated 3F values.
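A two-component normal mixture of this kind can be fitted by maximum likelihood with the expectation-maximization (EM) algorithm. The sketch below is one such fit and is not the procedure used in the study, whose software settings are not described. The common-SD constraint is an assumption suggested by the identical SDs reported above, and the function names and starting values are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_two_normal_mixture(values, n_iter=200):
    """EM fit of a two-component normal mixture with a common SD.

    Returns (mean_low, mean_high, sd, weight_high, posteriors), where
    posteriors[i] is the probability that values[i] is in the upper group.
    Assumes the data are not all identical.
    """
    n = len(values)
    # Crude starting values: extremes as means, a quarter of the range as SD.
    m1, m2 = min(values), max(values)
    sd = (m2 - m1) / 4.0
    w2 = 0.5
    post = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability of the upper component.
        post = []
        for x in values:
            a = (1.0 - w2) * normal_pdf(x, m1, sd)
            b = w2 * normal_pdf(x, m2, sd)
            post.append(b / (a + b))
        # M-step: weighted means, pooled variance, mixing weight.
        s2 = sum(post)
        s1 = n - s2
        m1 = sum((1.0 - p) * x for p, x in zip(post, values)) / s1
        m2 = sum(p * x for p, x in zip(post, values)) / s2
        var = sum((1.0 - p) * (x - m1) ** 2 + p * (x - m2) ** 2
                  for p, x in zip(post, values)) / n
        sd = math.sqrt(var)
        w2 = s2 / n
    return m1, m2, sd, w2, post
```

Applied to log10 consolidated 3G scores, the returned posteriors play the role of the group-membership likelihoods described above, and sequences with a posterior above a chosen cutoff would be classed as hypermutated.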
APOBEC3G allele frequencies.
From eight participants with the highest G→A hypermutation scores and a pooled DNA control sample (n = 187 Caucasian individuals) (32), the entire APOBEC3G gene, including 2 kb 5′ of the transcription start site and 1.0 kb 3′ of the 3′ untranslated region, was amplified as two products of approximately 6.5 kb (3′ half) and 8 kb (5′ half) in size. For amplification of each product, 1 μl of DNA was amplified in a volume of 50 μl containing 10× Hi-Fi PCR buffer, 2 mM MgSO4 (1.5 mM MgSO4 for the 3′ half), 0.8 mM deoxynucleoside triphosphate mix, 100 nM of each forward and reverse primer, 1 U of PlatinumTaq Hi-Fi, and 2% dimethyl sulfoxide (1.5% for the 5′ half) under the following conditions: 94°C for 30 s, followed by 35 cycles of 94°C for 30 s, 63°C for 30 s (60°C for the 5′ half), and then by a 9-min extension step at 68°C. The PCR products were then purified using Exosap according to the manufacturer's protocol and sequenced using overlapping primers (see Table S1 in the supplemental material) and a BigDye Terminator kit, version 3.3 (PE, Applied Biosystems). The allele frequencies of the pooled DNA sample were determined by chromatograph relative peak height and compared to the allele frequencies for the eight patients harboring the HIV-1 proviral sequence with the greatest G→A hypermutation scores.
Statistical analysis.
Statistical analyses were carried out using SPSS software (version 12.0.1; SPSS Inc.). All values are expressed as means ± standard deviations unless otherwise stated. Where data were normally distributed, analysis of variance was used for group comparisons, with correction for multiple comparisons utilizing the Bonferroni method where appropriate. For nonnormal data, Mann-Whitney tests were used. The presence of in-frame stop codons in vif amino acid sequences was determined by putative translation of the proviral DNA sequences. With reference to analyses of the determinants of plasma HIV RNA levels prior to antiretroviral therapy, HLA-B27 (n = 7) and HLA-B58 (n = 4) were also considered as covariates along with HLA-B57 (n = 12) and the CCR5Δ32 genotype (n = 25).
RESULTS
Analysis of G→A substitutions.
We first sought to ascertain the population distribution of HIV-1 G→A substitutions in order to identify hypermutated HIV-1 sequences within the context of natural in vivo sequence variation. The study was restricted to sequences identified as clade B by phylogenetic analysis (n = 127), involving a study population of predominantly male (87%) Caucasian (84%) patients with chronic HIV-1 infection. All participants were antiretroviral therapy naïve at the time of clinical assessment and collection of proviral HIV-1 DNA for sequencing, as reflected in the distribution of HIV RNA/ml viral load (mean ± SD, 4.903 ± 0.760 log10 copies), CD4+ T cell counts (380.6 ± 298.3 cells/ml), and percentage of CD4+ T cells (19.3% ± 10.9%). To investigate G→A substitutions at the population level, we aligned each individual's HIV-1 proviral sequence to the population consensus sequence and for each sequence considered two measures of hypermutation as previously described (27): (i) the preference for G→A substitutions relative to all other substitutions (defined by the proportion of all mutations that were G→A substitutions) and (ii) the burden of G→A substitutions relative to the number of available consensus guanine nucleotides (defined by the proportion of consensus guanines that exhibited G→A substitutions). As expected, we found that G→A substitutions were the most common substitution observed (median ± interquartile range, 21.0% ± 4.0% of all substitutions), and there was a highly significant correlation between G→A preference and G→A burden in the study population (Spearman's r = 0.631; P < 0.001), as shown in Fig. 1A.
We then incorporated both G→A burden and G→A preference into a single index (general G→A hypermutation score; see Materials and Methods) to ascertain if the distribution of G→A substitutions conformed to a model in which a proportion of sequences could be described as hypermutated relative to natural sequence variation. Utilizing analyses of population mixtures, these data were best described by a bimodal mixed distribution rather than as data from a single population (P < 0.001), consistent with the hypothesis that a proportion of proviral DNA sequences exhibited HIV-1 hypermutation.
Analysis of APOBEC3G and APOBEC3F dinucleotide target motifs.
As mentioned previously, APOBEC3G and APOBEC3F are known to target specific single-stranded polynucleotide DNA motifs. Therefore, to further characterize APOBEC3-associated HIV-1 hypermutation, we examined the preference for G→A substitutions in the APOBEC3G (GG) and APOBEC3F (GA) dinucleotide contexts. This was first accomplished by investigating the dinucleotide sequence context for G→A substitutions (3G- and 3F-specific G→A substitution scores; see Materials and Methods). Again, the population distribution of the 3G-specific G→A substitution scores indicated the presence of APOBEC3G-mediated G→A substitutions in a proportion of sequences, in that there were two distinct clusters in the distribution of GG→AG substitutions in the study population (P < 0.001) (Fig. 1B). In contrast, while there were three relatively extreme 3F-specific G→A substitution scores, it was not possible to demonstrate clear clusters (Fig. 1C).
Classification of APOBEC3G- and APOBEC3F-hypermutated HIV-1 sequences.
In order to formally identify sequences with evidence of HM-3G, we then analyzed the population distribution of HIV-1 G→A substitutions incorporating both (i) G→A preference relative to other mutations and (ii) preference for G→A substitutions within the GG context (consolidated 3G score; see Materials and Methods) (Fig. 2). Mixture models and cluster analyses consistently identified 12 sequences (9.4%) that fulfilled these criteria (P < 0.0005). As shown in Table 1, these sequences exhibited a 1.9-fold preference for G→A substitutions on average and a 3.1-fold increase in G→A burden compared with the remaining NH sequences. In addition, 42.7% ± 8.7% of all G→A substitutions were in the GG context, compared with 17.7% ± 5.0% in the NH sequences. The average proportion of consensus guanine bases that demonstrated G→A substitutions in these hypermutated sequences (14.2%) was consistent with previous descriptions of hypermutated HIV-1 proviral DNA obtained from env gene sequences in vivo (16%) (15) and utilizing Δvif virions in vitro (24).
TABLE 1.
Characteristic or score | Value for NH DNAa | Value for HM-3G DNAa | Value for HM-3F DNAa |
---|---|---|---|
Patient characteristics | |||
No. | 112 | 12 | 3 |
CD4+ count (cells/ml) | 367.2 ± 306.1 | 491.3 ± 236.7 | 421.3 ± 152.1 |
CD4+ percentage | 18.5 ± 10.7 | 24.7 ± 11.4 | 28.3 ± 8.3 |
Viral load (log10 copies/ml) | 4.981 ± 0.747 | 4.476 ± 0.50* | 3.721 ± 0.680* |
HIV-1 sequence characteristics | |||
No. of nucleotides sequenced per patient | 6,906 ± 1,016 | 6,571 ± 1,549 | 4,591 ± 3,182 |
No. of non-G→A mutations | 290 ± 66 | 295 ± 97 | 246 ± 129 |
No. of G→A substitutions | 75 ± 18 | 210 ± 107 | 110 ± 40 |
G→A rate (per 100 nucleotides) | 1.08 ± 0.21 | 3.4 ± 1.8 | 3.3 ± 2.4 |
G→A substitutions: all substitutions | 0.206 ± 0.029 | 0.40 ± 0.123 | 0.321 ± 0.049 |
G→A burden (% of consensus G's) | 4.6 ± 0.9 | 14.2 ± 7.5 | 14.9 ± 11.4 |
GG (% of G→A substitutions)b | 17.7 ± 5.0 | 42.7 ± 8.7 | 10.8 ± 5.2 |
GA (% of G→A substitutions)b | 32.1 ± 6.0 | 26.1 ± 7.0 | 59.9 ± 11.3 |
Hypermutation scoresc | |||
General G→A hypermutation | −0.06 ± 0.06 | 0.22 ± 0.11 | 0.14 ± 0.08 |
3G-specific G→A substitutions | −0.21 ± 0.13 | 0.19 ± 0.11 | −0.45 ± 0.25 |
3F-specific G→A substitutions | −0.05 ± 0.08 | −0.15 ± 0.11 | 0.22 ± 0.06 |
Consolidated 3G | −0.28 ± 0.15 | 0.41 ± 0.16 | −0.30 ± 0.16 |
Consolidated 3F | −0.12 ± 0.11 | 0.07 ± 0.15 | 0.36 ± 0.15 |
Three additional sequences showed a strong preference for HM-3F using these criteria (Table 1), with 59.9% ± 11.3% of all G→A substitutions occurring in the GA context, compared with only 26.1% ± 7.0% and 32.1% ± 6.0% in the HM-3G and NH sequences, respectively.
While G→A substitutions were the most common substitution observed for the 112 NH sequences, there was no association between the general G→A hypermutation score and the 3G-specific G→A substitution score (Pearson's r = 0.093; P = 0.331) or the 3F-specific G→A substitution score (r = 0.140; P = 0.142). Hence, it is unlikely that APOBEC3G- or APOBEC3F-mediated DNA editing contributes significantly to the rate of G→A substitutions (identified by bulk PCR sequencing) within these nonhypermutated proviral DNA sequences.
HIV-1 genomic distribution of G→A hypermutation.
From the data presented in Fig. 3, measures of general G→A hypermutation (Fig. 3A), APOBEC3G-specific G→A substitutions (Fig. 3B), and APOBEC3G-mediated hypermutation (Fig. 3C) were distributed relatively evenly along the genome. Consistent with results from Yu and colleagues (41), general G→A hypermutation and APOBEC3G-mediated hypermutation scores in HM-3G sequences were significantly higher for pol (P < 0.001) than for gag (P = 0.005); in contrast, however, they were significantly lower for env than for pol (P values for both were <0.001). Furthermore, consistent with results from Wurtzer and colleagues (37), the distance from the central polypurine tract was significantly negatively correlated to the general G→A hypermutation (r = −0.283; P = 0.007), 3G-specific G→A substitution (r = −0.250; P = 0.020), and consolidated 3G (r = −0.297; P = 0.006) scores.
Association of vif amino acid polymorphisms and G→A hypermutation.
We subsequently sought to investigate associations between vif amino acid polymorphisms and hypermutation. For such analyses, we performed a putative translation of available vif sequences (n = 124) and examined vif polymorphism within the NH sequences (vif amino acids 1 to 193) compared with the corresponding amino acid in the HM-3G sequences. HM-3F sequences were omitted from these analyses so that specific associations relevant to the APOBEC3G-vif interaction could be examined, as it has recently been reported that specific vif amino acid polymorphisms differentially influence APOBEC3G versus APOBEC3F interaction (31).
vif peptide sequences derived from the HM-3G group were significantly different from NH vif sequences in terms of a higher overall rate of nonconsensus amino acids (P < 0.0005; Mantel-Haenszel). The strongest associations were tryptophan-to-in-frame stop substitutions at positions 70, 89, 174, and 21 and substitution of isoleucine for the methionine start codon (all Pc [corrected P] values were <0.04 by the Bonferroni method). These substitutions, which would be anticipated to code for truncated and functionally defective vif proteins, are all in the trinucleotide sequence context targeted by APOBEC3G (TGG) and therefore are likely to have resulted from, rather than caused, HIV-1 hypermutation. HM-3G sequences were also associated with an R90K polymorphism (Pc = 0.07; Bonferroni); again, however, this could be attributed to the action of APOBEC3G. No other vif polymorphisms were found to be significantly associated with APOBEC3G-associated hypermutation (all Pc values were >0.4).
Regarding the potential role of truncated vif sequences in permitting APOBEC3G-mediated HIV-1 hypermutation, it is notable that while HM-3G sequences were significantly associated with in-frame stop codons specifically in vif (P < 0.001), in-frame stop codons in non-vif or non-vif, non-env proteins were not associated with HM-3G sequences (P values were 0.20 and 0.63, respectively). Additionally, HIV-1 hypermutation was not evident in the two sequences in which the only vif in-frame stop codon was at vif amino acid position 153 (leucine) or 174 (tryptophan). In these sequences, G→A substitutions accounted for only 20.7% and 24.9% of all substitutions, affecting only 4.0% and 5.0% of consensus guanine residues, respectively. It has previously been demonstrated that a naturally occurring vif protein truncated at tryptophan174 is indistinguishable from wild-type vif in terms of the maintenance of viral infectivity (25). Here, the occurrence of a naturally occurring vif protein truncated at leucine153 also suggests that the C-terminal 39 amino acids are dispensable for this protein's roles in promoting viral infectivity as well as counteracting APOBEC3G. Interestingly, despite the codon for tryptophan being a target for APOBEC3G, tryptophan residues 5, 11, and 79 were entirely conserved in the hypermutated vif sequences and are therefore unlikely to play a role in inhibiting APOBEC3G or APOBEC3F activity, in contrast to data from Tian and colleagues (33).
We also identified vif polymorphisms that were unique to the HM-3G sequences within this study population, given previous evidence that sequence variation at a number of critical residues can affect vif-APOBEC3G interactions (29, 35). Twenty-three vif amino acid variants were found uniquely in HM-3G sequences (see Fig. 5), of which 82% were within regions previously shown to be required for vif-APOBEC3G interaction (29, 35). Of these, 13 could not have resulted from an APOBEC3G-mediated G→A substitution of the NH consensus amino acid, all of which were located within the vif N terminus. It is also interesting that a charged consensus amino acid was present in 14/23 of these positions and that the variant vif sequence was frequently associated with either a reversal of the charge state (6/14, 43%) or substitution of a neutral amino acid (3/14, 21%). An additional three vif amino acids were unique to the two HM-3F vif sequences available and also could not have resulted from a G→A substitution of any of the alternative amino acids present in the NH sequences. Again, two of the HM-3F unique amino acids were located within the vif N terminus (L98 and S103).
Figure 4 demonstrates the high degree of vif polymorphism that appears to be tolerated without apparently increasing viral susceptibility to the effects of APOBEC3G and/or APOBEC3F. Within the nonhypermutated group, polymorphism was observed at the serine144 and leucine148 residues (underlined) of the SOCS-box N-terminal BC-box motif (144SLQYLA149) required for vif phosphorylation (40) and Elongin BC and Cul5 binding (22), respectively, and at the proline162 (1.0%) residue of the SOCS-box C-terminal motif 161PPLP164, required for vif multimerization and subsequent vif function (39). In contrast to these reports, the existence of polymorphism at these sites in our population suggests that they may not be essential for maintaining vif function. However, the recently described Hx5Cx17-18Cx3-5H zinc-binding motif (38), critical for assembly and activity of the vif-Cul5-E3 ligase (18), was entirely conserved.
APOBEC3G genetic variation and G→A hypermutation.
To assess the contribution of APOBEC3G genetic variation to hypermutation, we sequenced the entire 10.5-kb APOBEC3G gene, including 2 kb 5′ of the transcription start site and 0.5 kb 3′ of the 3′ untranslated region. We initially compared the frequency of APOBEC3G alleles from eight patients with the greatest G→A burden to that estimated from a pooled DNA sample (187 Caucasian individuals) (32) as a control, based on relative chromatogram peak height. As demonstrated in Table 2, the APOBEC3G amino acid sequence was highly conserved in this predominantly Caucasian study population. Of the 22 single nucleotide polymorphisms (SNPs) identified (17 previously described and 5 novel SNPs), significant differences in allele frequencies between the patients with hypermutated sequences and the pooled DNA control sample were evident at positions 625 (9607609) and 6,892 (5757467) relative to the transcription start site (Table 2). We then genotyped the 625 and 6,892 SNPs for patients that had DNA available (n = 119). The frequency of the 6,892 C allele tended to be higher in patients with evidence of APOBEC3G-mediated hypermutation (50.0% versus 29.9%; P = 0.062), with homozygosity for the 6,892 C allele present in 25% of patients with evidence of APOBEC3G-mediated hypermutation compared with 7.5% of patients with nonhypermutated sequences (P = 0.082). However, this marginally significant univariate association was abrogated after adjusting for multiple comparisons (Pc > 0.5). The frequencies of the 625 C allele were similar for patients with evidence of APOBEC3G-mediated hypermutation and for patients harboring NH sequences (allele frequency, 75.0% versus 77.2% [P = 0.80]; CC genotype frequency, 58.3% versus 57.3% [P = 1.0]).
TABLE 2.
NCBI dbSNP IDb | Positionc | Control sample frequencyd of allele 1 | Control sample frequencyd of allele 2 | Frequencyd of allele 2 from the work of Do et al. (7) | HM-3G patient frequencyd of allele 1 | HM-3G patient frequencyd of allele 2 | No. of HM-3G patients that weree A1.A1 | No. of HM-3G patients that weree A1.A2 | No. of HM-3G patients that weree A2.A2 |
---|---|---|---|---|---|---|---|---|---|
7291971 | −1,193 | 0.60 | 0.40 | 0.40 | 0.44 | 0.56 | 2 | 3 | 3 |
6519166 | −1,190 | 0.70 | 0.30 | 0.31 | 0.56 | 0.44 | 2 | 5 | 1 |
5750743 | −321 | 0.50 | 0.50 | 0.28 | 0.44 | 0.56 | 1 | 5 | 2 |
12160242 | 480 | >0.90 | <0.10 | 0.09 | 0.94 | 0.06 | 7 | 1 | 0 |
9607609 | 625 | 0.80 | 0.20 | ND | 0.38 | 0.62 | 2 | 2 | 4 |
5757465 | 3,756† | 0.60 | 0.40 | 0.51 | 0.75 | 0.25 | 4 | 4 | 0 |
12158985 | 5,985 | >0.90 | <0.10 | 0.07 | 0.88 | 0.12 | 6 | 2 | 0 |
2294366 | 5,986 | 0.70 | 0.30 | 0.34 | 0.56 | 0.44 | 2 | 5 | 1 |
2294367 | 6,207 | 0.60 | 0.40 | 0.42 | 0.44 | 0.56 | 2 | 3 | 3 |
NL | 6,848 | >0.90 | <0.10 | ND | 0.88 | 0.12 | 6 | 2 | 0 |
3091374 | 6,878 | 0.50 | 0.50 | ND | 0.44 | 0.56 | 2 | 3 | 3 |
5757467 | 6,892 | 0.70 | 0.30 | ND | 0.31 | 0.69 | 0 | 5 | 3 |
5757468 | 6,893 | 0.70 | 0.30 | ND | 0.56 | 0.44 | 2 | 5 | 1 |
NL | 6,894 | >0.90 | <0.10 | ND | 0.88 | 0.12 | 6 | 2 | 0 |
NL | 8,984‡ | >0.90 | <0.10 | ND | 0.88 | 0.12 | 6 | 2 | 0 |
5757471 | 9,218 | >0.90 | <0.10 | ND | 0.94 | 0.06 | 7 | 1 | 0 |
28474760 | 9,250 | >0.90 | <0.10 | 0.07 | 0.88 | 0.12 | 6 | 2 | 0 |
28474761 | 9,432 | >0.90 | <0.10 | 0.12 | 0.88 | 0.12 | 6 | 2 | 0 |
NL | 10,393 | 0.50 | 0.50 | ND | 0.92 | 0.08 | 5 | 1 | 0 |
NL | 10,399 | 0.70 | 0.30 | ND | 1.00 | 0.0 | 8 | 0 | 0 |
5757472 | 10,600 | 0.50 | 0.50 | ND | 0.44 | 0.56 | 1 | 5 | 2 |
3891126 | 10,811 | >0.90 | <0.10 | ND | 0.88 | 0.12 | 6 | 2 | 0 |
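The HM-3G patient allele frequencies in Table 2 appear to follow directly from the genotype counts in the last three columns, since each patient contributes two alleles. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
def allele_frequencies(n_a1a1, n_a1a2, n_a2a2):
    """Allele 1 and allele 2 frequencies from diploid genotype counts."""
    n_alleles = 2 * (n_a1a1 + n_a1a2 + n_a2a2)
    freq_a1 = (2 * n_a1a1 + n_a1a2) / n_alleles  # homozygotes carry two copies
    return freq_a1, 1.0 - freq_a1
```

For the SNP at position 6,892 (0 A1.A1, 5 A1.A2, and 3 A2.A2 patients) this gives 5/16 = 0.3125 and 11/16 = 0.6875, matching the tabulated 0.31 and 0.69 after rounding.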
G→A hypermutation and HIV-1 viremia.
The biological and clinical relevance of HIV-1 hypermutation in proviral DNA sequences is ultimately a function of its impact on productive HIV infection, measured by the level of viremia in plasma samples. We therefore sought to investigate the effect of HIV-1 hypermutation on pretreatment viremia in this study population. Univariate analysis indicated that patients harboring hypermutated sequences as defined had viral loads significantly lower than those with nonhypermutated sequences (4.32 ± 0.60 versus 4.98 ± 0.75 log10 copies HIV RNA/ml; P = 0.001). By use of linear regression models, this association between hypermutation and lower pretreatment viremia remained significant (P = 0.013) even after adjusting for CD4+ percentage (P < 0.001) and presence of at least one of the known protective host alleles, CCR5Δ32 (17), HLA-B57, HLA-B58, or HLA-B27 (9) (P = 0.096), corresponding to approximately 67% and 40% reductions in viral loads attributable to hypermutation and host protective alleles, respectively. Hence, the clinical influence of HIV-1 hypermutation appears to be demonstrable within this group of patients, who manifest a marked preference for G→A substitutions in an APOBEC3G or APOBEC3F sequence context.
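Differences on the log10 scale and percent reductions in viral load are related by a simple conversion, sketched below with hypothetical function names. The regression coefficients themselves are not reported here, so the example only converts between the two scales.

```python
import math

def percent_reduction(delta_log10):
    """Percent reduction in copies/ml for a drop of delta_log10 log10 units."""
    return 100.0 * (1.0 - 10.0 ** (-delta_log10))

def log10_drop(percent):
    """log10 drop that corresponds to a given percent reduction."""
    return -math.log10(1.0 - percent / 100.0)
```

The unadjusted difference in group means reported above (4.98 − 4.32 = 0.66 log10 copies/ml) corresponds to a reduction of about 78%, while the adjusted reductions of 67% and 40% correspond to drops of about 0.48 and 0.22 log10 copies/ml, respectively.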
DISCUSSION
The data presented in this population-based study of clade B HIV-1 sequences suggest that HIV-1 G→A hypermutation is a prevalent phenomenon associated with significantly reduced plasma HIV RNA levels in vivo, indicating that APOBEC3G- and APOBEC3F-mediated hypermutation can take its place alongside other protective host genetic factors as a clinically and biologically relevant antiretroviral restriction factor. Indeed, the reduced HIV-1 viremia associated with hypermutation was substantially greater than that exerted by known host factors, such as the CCR5Δ32 chemokine receptor variant, and remained highly significant after adjusting for these variables as well as for the influence of CD4+ count. Such reductions in viral load are substantially greater than those previously associated with protective HLA alleles (28) and are comparable to the effects of zidovudine monotherapy in early studies of antiretroviral treatment (4).
Here, it was estimated that hypermutated proviral DNA was present in 12% of study participants by use of a bulk PCR sequencing approach that approximates the dominant species within a heterogeneous viral population. In many respects, these findings complement the work of Kieffer and colleagues (14), who examined the phenomenon of in vivo hypermutation in detail utilizing 319 pol clones derived from nine patient samples. This study revealed at least one hypermutated proviral DNA sequence from each individual examined, although hypermutated sequences accounted for less than 10% of the proviral population within an individual. Similar results have been obtained by Janini et al. (11), who detected the presence of G→A hypermutation in 45% of HIV protease sequences from a Tanzanian study population. These observations are consistent with in vitro data demonstrating low-level APOBEC3G- and APOBEC3F-mediated cytidine deaminase activity directed against HIV-1 sequences even in the presence of functional vif (13, 21). It has also been suggested that the rapid turnover of vif through proteasomal degradation contributes to constitutively low cellular protein expression (34), thereby providing an incomplete barrier to APOBEC3G and/or APOBEC3F activity. In this study, we now provide evidence that hypermutated proviral HIV-1 DNA sequences can be demonstrated by bulk PCR methods in a significant minority of clade B HIV-1-infected individuals. Taken together, these data suggest that APOBEC3-mediated cytidine deamination is highly prevalent but effectively constrained by functional interactions with vif, so that G→A hypermutation is generally confined to a minor proportion of the overall viral population.
It should be noted that the sequence context targeted by APOBEC3G appears to specifically select for loss-of-function mutations, including within the vif genomic sequence. For example, targeted substitutions involving tryptophan residues (TGG) produce in-frame stop codons (TAG), while the vif start codon and adjacent aspartic acid residue also create an APOBEC3G target motif (ATGG) that results in loss of the methionine initiation signal following G→A substitution. To some extent, this susceptibility appears to have been overcome by the utilization of multiple alternative start codons at methionine residues 8, 16, and 29, although it is notable that the capacity to counteract APOBEC3G activity appears to be lost as a consequence of these N-terminal truncations (31, 34). In this study, substitutions associated with truncated vif sequences could be entirely explained by APOBEC3G-mediated cytidine deamination (Fig. 5), although it is notable that hypermutation was associated specifically with vif in-frame stop codons but not with in-frame stop codons in non-vif viral peptides. Hence, while it is difficult to attribute causality in the relationship between hypermutation and defective vif, the most parsimonious explanation appears to be provided by a mechanism in which APOBEC3-mediated G→A substitutions, occurring in a relatively nonpermissive cellular environment but targeting TGG trinucleotide motifs stochastically along the genome, can become unconstrained if this process selects for defective vif sequences.
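The codon-level consequences described above can be illustrated with a minimal sketch. It assumes a deliberately simplified, saturating model in which every G immediately followed by G on the proviral plus strand is converted to A (GG→AG); in vivo editing is partial and stochastic. The function names and the short test sequence are hypothetical and are not taken from the study data.

```python
# Toy model: apply APOBEC3G-like GG->AG editing and look for new stop codons.
# Assumption: every G followed by G (in the unedited sequence) is converted.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def a3g_edit(seq: str) -> str:
    """Return seq with each G that is followed by G replaced by A."""
    seq = seq.upper()
    edited = []
    for i, base in enumerate(seq):
        followed_by_g = i + 1 < len(seq) and seq[i + 1] == "G"
        edited.append("A" if base == "G" and followed_by_g else base)
    return "".join(edited)


def codons(seq: str) -> list:
    """Split seq into complete in-frame codons, starting at position 0."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def first_stop(seq: str):
    """Return the index of the first in-frame stop codon, or None."""
    for index, codon in enumerate(codons(seq)):
        if codon in STOP_CODONS:
            return index
    return None


# Hypothetical open reading frame: Met-Pro-Trp-Gln
toy_orf = "ATGCCATGGCAA"
edited_orf = a3g_edit(toy_orf)
```

In this toy frame the tryptophan codon TGG becomes TAG, so `first_stop(edited_orf)` reports a premature stop at the third codon, whereas the unedited frame has none. Applying `a3g_edit` to a sequence beginning ATGG converts the initiating ATG to ATA, which corresponds to the loss of the methionine initiation signal noted in the text.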
Although the translated vif sequences in this cohort were highly polymorphic (Fig. 4), there were 23 vif amino acid variants unique to HIV-1 hypermutated sequences, of which 13 associated with HM-3G sequences and three unique to HM-3F sequences could not have resulted from cytidine deamination by the relevant APOBEC3 protein (Fig. 5). These data are consistent with those that suggest the vif N terminus contains distinct motifs that are likely to interface with APOBEC3G and APOBEC3F (33, 35) and may therefore represent true vif allelic variants that facilitate hypermutation. Although the associated amino acids differed, four of the 23 vif amino acids unique to HM-3G sequences (K45, E75, E138, and K185) occurred at positions previously associated with naturally occurring nonfunctional vif variants (31).
With reference to the nonhypermutated vif sequences, we observed polymorphism at residues within key motifs shown to be required for vif phosphorylation (40), vif-E3 ligase complex formation (22), vif multimerization (39), and β-sheet formation (8). In this study, however, these variants were not associated with APOBEC3-mediated hypermutation, suggesting that polymorphisms within these motifs are tolerated without significantly compromising vif function. While it is unclear if these polymorphisms alter the efficiency of polyubiquitination and proteasomal degradation of the vif-APOBEC3G-E3 ligase complex in vivo, the lack of hypermutation in these sequences could potentially be attributed to an undiminished capacity for vif-APOBEC3G complex formation, which is sufficient to restrict APOBEC3G-mediated hypermutation in vitro (13). Despite the high degree of vif polymorphism, several stretches of conserved residues, most notably the first 18 N-terminal amino acids, were observed and may represent novel drug targets.
The dinucleotide sequence context of the G→A substitutions suggests that APOBEC3G was the dominant contributor to hypermutation in this study group. These findings are consistent with the previously mentioned in vivo study by Kieffer et al., which involved clade B HIV-1 sequences (14). In contrast, two studies involving Tanzanian study populations infected with predominantly non-clade B virus (11, 15) indicate less bias towards the APOBEC3G context. These differences warrant further exploration, particularly as a nonsynonymous APOBEC3G 186R polymorphism associated with accelerated progression to AIDS (and therefore implying loss of APOBEC3G function) has been found to be frequent in African Americans (37%) but rare in European Americans (5%) (1). Hence, a relatively greater contribution of APOBEC3F to HIV-1 cytidine deaminase editing may be anticipated in at least some populations of African origin. In this study, there was no convincing evidence that APOBEC3G genetic variation contributed to the development of HIV-1 hypermutation, and it is notable that we were unable to identify significant amino acid variation within APOBEC3G sequences in these Caucasian populations (Table 2). A marginal univariate association between an intronic APOBEC3G 6,892 C allele and hypermutation was demonstrated, although the role of this intronic polymorphism on APOBEC3G function is unclear. However, it is possible that functional polymorphisms are located in the as-yet-undefined promoter or regulatory regions of this genetic locus, and in this regard it is notable that APOBEC3G mRNA levels in stimulated peripheral blood mononuclear cells have recently been found to exhibit an inverse correlation with HIV-1 pretreatment viremia (12), though no correlation exists in resting T cells (5).
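The way dinucleotide context attributes a G→A substitution to one enzyme or the other can be sketched as follows. This is an illustrative tally only, not the analysis pipeline used in this study. It assumes a pre-aligned, equal-length reference and query pair, reads the 3′ neighbor from the reference sequence (a simplifying assumption; dedicated tools may define context differently), and does not treat alignment gaps specially. The function name, category labels, and example sequences are hypothetical.

```python
# Tally G->A substitutions by the base immediately 3' of the substituted G.
# GG->AG is treated as APOBEC3G-like, GA->AA as APOBEC3F-like.

from collections import Counter


def classify_g_to_a(reference: str, query: str) -> Counter:
    """Count G->A changes by dinucleotide context in the reference."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    reference, query = reference.upper(), query.upper()
    counts = Counter()
    for i, (ref_base, query_base) in enumerate(zip(reference, query)):
        if ref_base != "G" or query_base != "A":
            continue  # not a G->A substitution
        # A G at the final position has no 3' neighbor and counts as "other".
        neighbor = reference[i + 1] if i + 1 < len(reference) else ""
        if neighbor == "G":
            counts["A3G"] += 1
        elif neighbor == "A":
            counts["A3F"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical aligned pair with one substitution in each context
example_counts = classify_g_to_a("TGGAGACGT", "TAGAAACAT")
```

A sequence in which the `A3G` count dominates would be read as biased towards the APOBEC3G context, in the sense used in the paragraph above. Any formal inference of hypermutation would also need a comparison against the number of unmutated target sites, which this sketch omits.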
In conclusion, this study provides strong support for the proposition that APOBEC3G-mediated cytosine deamination provides a potent source of innate HIV-1 restriction at the population level. Moreover, the prevalence and antiretroviral impact of APOBEC3G- and APOBEC3F-mediated HIV-1 hypermutation within the study population can be compared favorably with known host protective factors. Since Sheehy and colleagues first isolated the APOBEC3G gene 4 years ago and tentatively proposed a cytidine deaminase function for its product (30), considerable advances in the understanding of APOBEC3 proteins and their antiviral activity have been made. We believe that these data contribute to the premise that APOBEC3G comprises a form of innate antiviral resistance that is clinically and biologically relevant and may lend a fresh perspective to the area of HIV/AIDS therapies (2).
Acknowledgments
We declare that we have no competing financial interests. S. Gaudieri was supported by a Healy Fellowship from the Raine Medical Research Foundation.
We are indebted to all participants in the Western Australian HIV Cohort Study and to past and present laboratory staff of the Department of Clinical Immunology & Biochemical Genetics, Royal Perth Hospital, Western Australia, and the Centre for Clinical Immunology & Biomedical Statistics. In particular, we acknowledge the contribution of Filipa Carvalho in the area of HIV genomic sequencing. We thank Graeme Stewart for the kind donation of the pooled DNA sample, L. Park for HIV-1 clade assignment, and A. Rauch for critically reading the manuscript.
REFERENCES
Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)
Read article at publisher's site: https://doi.org/10.1128/jvi.00888-06