Constituents of SH1, a Novel Lipid-Containing Virus Infecting the Halophilic Euryarchaeon Haloarcula hispanica
Abstract
Recent studies have indicated that a number of bacterial and eukaryotic viruses that share a common architectural principle are related, leading to the proposal of an early common ancestor. A prediction of this model would be the discovery of similar viruses that infect archaeal hosts. Our main interest lies in icosahedral double-stranded DNA (dsDNA) viruses with an internal membrane, and we now extend our studies to include viruses infecting archaeal hosts. While the number of sequenced archaeal viruses is increasing, very little sequence similarity has been detected between bacterial and eukaryotic viruses. In this investigation we rigorously show that SH1, an icosahedral dsDNA virus infecting Haloarcula hispanica, possesses lipid structural components that are selectively acquired from the host pool. We also determined the sequence of the 31-kb SH1 genome and positively identified genes for 11 structural proteins, with putative identification of three additional proteins. The SH1 genome is unique and, except for a few open reading frames, shows no detectable similarity to other published sequences, but the overall structure of the SH1 virion and its linear genome with inverted terminal repeats is reminiscent of lipid-containing dsDNA bacteriophages like PRD1.
Viruses infecting hosts in the domain Archaea represent diverse morphotypes (25, 57, 58). However, only some dozen genome sequences have been deposited into sequence databases to date (December 2004). Within the Crenarchaeota, viruses have been isolated and sequenced that infect members of the classes Sulfolobales and Thermoproteales. By morphology, the viruses of Sulfolobales represent spindle-shaped (Sulfolobus spindle-shaped viruses [72]), rod-shaped (Sulfolobus islandicus filamentous virus [3]; S. islandicus rod-shaped viruses 1 and 2 [53]; Acidianus filamentous virus 1 [15]), and icosahedral (Sulfolobus turreted icosahedral virus [STIV] [60]) forms. For Thermoproteales the only sequenced isolate is the Pyrobaculum spherical virus, which has been proposed to possess a lipid envelope surrounding a helical nucleoprotein core (30). The kingdom Euryarchaeota contains physiologically diverse members of methanogenic, halophilic, and thermoacidophilic organisms, and viruses of this group include the Methanothermobacter viruses psiM1 and psiM2 (47, 54), the lytic HF1 that infects cells of the Haloferax and Haloterrigena genera (68), the lytic HF2 that infects Halorubrum cariense (67), and the temperate Natrialba magadii virus Ch1 (40). All have head-tail morphology reminiscent of bacteriophages belonging to the families Myoviridae (such as T4) and Siphoviridae (such as λ).
There is a growing body of sequence information accumulating for these viruses. The tailed double-stranded DNA (dsDNA) bacteriophages, the viruses most commonly isolated from the environment, have strongly influenced the current view of viral genomics (18, 52). Such viruses probably infect most if not all bacterial and some archaeal species. The genomes of tailed phages are mosaics that have likely experienced global genetic exchange (44, 67, 68). On the other hand, tailless bacterial viruses with an internal membrane belonging to the family Tectiviridae do not seem to obey this rule. The family consists of six isolates that infect gram-negative bacteria (PRD1, PR3, PR4, PR5, L17, and PR772) and three that infect gram-positive bacteria (Bam35, AP50, and GIL01) (5; www.ncbi.nlm.nih.gov/ICTV). The genome sequences of isolates that infect gram-negative bacteria are highly similar, with overall identities between 91.9% and 99.8% (A.-M. Saren, J. J. Ravantti, S. D. Benson, R. M. Burnett, L. Paulin, D. H. Bamford, and J. K. H. Bamford, unpublished data). The two sequenced Tectiviridae isolates that infect gram-positive bacteria, Bam35 and GIL01 (59, 71), have only a few nucleotide differences.
Although the two groups of tectiviruses (infecting gram-negative or gram-positive bacteria) show no sequence similarity, their morphologies are nearly identical, and their genome organization and length are similar (59). Much is known of the structural organization of PRD1, with the atomic resolution structure determined down to 4 Å (1). The structure revealed a widely utilized assembly principle for the protein capsid and showed the icosahedrally ordered internal virus membrane (1, 20). When the sequence of the Bam35 coat protein was threaded to the X-ray structure of the PRD1 coat protein, the result indicated that these viruses most probably share a related coat protein fold (14). Most interestingly, the archaeal virus STIV has been proposed to have a capsid architecture and coat protein fold similar to that of PRD1 and to possibly have a membrane (14, 60).
The detailed characterization of archaeal viruses is in its infancy because the Archaea are the most recently described domain of life, and many of the members live in extreme environmental conditions (16, 28). The sequencing of some of the archaeal virus genomes has yielded comparable results to those obtained for the tailed bacteriophages; e.g., the sequences of haloviruses HF1 and HF2 suggest the presence of mosaic genomes produced by genetic exchange (68). However, DNA sequences have not revealed much information about the structures or functions of individual viral proteins since matches to known sequences in the databases are scarce. The same observation has been made for sequences of viruses in the family Tectiviridae (8, 59, 71).
Our interests reside in icosahedral dsDNA viruses with an internal membrane. We have proposed that a number of these viruses that infect bacterial and eukaryotic hosts, such as PRD1, Bam35, Paramecium bursaria Chlorella virus-1 (PBCV-1), Chilo iridescent virus, and Mimivirus, are structurally related, suggesting a common ancestor that predates the separation of the three domains of life (6, 7, 14). Strong support for such a proposal would come from data that viruses with such architecture also infect archaeal hosts. The first such proposal has been made based on structural studies of STIV (31, 60). However, rigorous evidence of a lipid component in the STIV virion has yet to be provided.
To further explore the relationships between archaeal, bacterial, and eukaryotic viruses, we analyzed SH1, a recently identified virus isolate that infects the halophilic euryarchaeon Haloarcula hispanica (25, 55). We sequenced the viral genome, analyzed the lipid components, and identified genes for most of the structural proteins. We found SH1 to be an icosahedral dsDNA virus with morphology that suggests the presence of a membrane underneath the protein capsid, similar to PRD1 and related bacterial and eukaryotic viruses (14, 55). Thus, SH1 is a prime candidate to be included in the PRD1-type virus lineage.
MATERIALS AND METHODS
Growth and purification of SH1.
SH1 virus was grown on H. hispanica (ATCC 33960) (34) according to the method of Dyall-Smith (24). The virus purification has previously been described (55). In short, the produced virus was concentrated from the virus stock (5 × 10¹⁰ to 1 × 10¹¹ PFU/ml) with 10% (wt/vol) polyethylene glycol (PEG 8000) (Sorvall SLA3000 rotor; 10,000 rpm for 30 min at 4°C). The virus was then resuspended in purification buffer used throughout the procedure (40 mM Tris-HCl, pH 7.2, 40 mM MgCl2, and 1 M NaCl), followed by purification in a linear 5% to 20% (wt/vol) sucrose gradient (Sorvall AH626 rotor; 22,000 rpm for 55 min at 20°C) and an equilibrium CsCl gradient (Sorvall AH627 rotor; 24,000 rpm for 22 h at 20°C). The equilibrated virus band was collected and diluted 1:2 in 40 mM Tris-HCl (pH 7.2)-40 mM MgCl2 and collected by differential centrifugation (Sorvall T647.5 rotor; 32,000 rpm for 3 h at 20°C). After the pellet was resuspended in purification buffer, the protein concentration was measured by the Coomassie blue method of Bradford (17) using bovine serum albumin as a standard. The number of infective particles was then determined. The specific infectivity was ~2 × 10¹² PFU/mg of protein.
Extraction and analysis of the lipids. (i) Lipid extraction.
Lipids from cell and virus suspensions were extracted according to the method of Folch et al. (27), as modified for halophiles (37), and stored in chloroform-methanol (9:1, vol/vol) at −20°C until analyzed.
(ii) TLC.
Lipids of total extracts were separated by thin-layer chromatography (TLC) on silica gel 60 plates (Merck) using chloroform-methanol-90% acetic acid (65:4:35, vol/vol/vol) as the solvent (21). The lipids were visualized by iodine vapor. Molybdate (23) and 0.1% orcinol-sulfuric acid reagent were used to detect phospholipids and glycolipids, respectively. The phosphorus content of lipid spots was determined as described by Kahma et al. (35). Lipid-containing areas were scraped from a preparative TLC and reextracted for mass spectrometry (MS) analysis. Semiquantitative TLC was performed as described previously (42).
(iii) MS analyses.
Samples were dissolved in chloroform-methanol (1:2, vol/vol) plus 1% ammonia and infused (7 μl/min) into a Micromass Quattro Micro mass spectrometer (Micromass, Manchester, United Kingdom) operated in the negative ion mode. Nitrogen was used as a desolvation (flow, 500 liters/h) and cone (flow, 50 liters/h) gas. The potentials of the capillary, cone, extractor, and RF lens were −3800 V, −40 V, −2, and −0.5, respectively. The spectra were acquired over the m/z range from 200 to 1,999 (5 s/scan), and 30 to 50 scans were averaged for each spectrum. Lipids were identified based on (i) their m/z value, (ii) product ion analysis, and (iii) precursor ion scans. Thus, for example, the presence of phosphate, sulfate, and glycerophosphate moieties was detected by scanning for precursors of m/z 79, 97, and 153, respectively.
Genome sequencing.
Genomic DNA was isolated from the purified virus particles by proteinase K and sodium dodecyl sulfate (SDS) treatment. For this, the virus preparation was diluted 1:5 with water to lower the salt concentration of the buffer. Proteinase K was added to a final concentration of 150 μg/ml, and the mixture was incubated for 1 h at 37°C. SDS was added (final concentration, 2%), and the incubation was continued for an additional 40 min, after which the mixture was treated with multiple phenol-chloroform and ether extractions. DNA was recovered from the aqueous phase by ethanol precipitation and used for cloning or as template for sequencing.
The DNA was digested with AccI, RsaI, or HincII restriction enzymes and treated with Klenow polymerase to create blunt ends, and the fragments were ligated into the SmaI-digested pSU18 vector (10). The resulting clones were sequenced, and the sequences were further used to design specific oligonucleotides for primer walking using the virus genome as a template. Sequencing reactions were performed by the dideoxy chain termination method using a DYEnamic ET Terminator Cycle Sequencing Kit with ThermoSequenase II DNA polymerase (Amersham Biosciences) on an ABI Prism 377 automated sequencer (Applied Biosystems) at the sequencing service facility of the Institute of Biotechnology, University of Helsinki. The genomic sequence was assembled using the assembly program of the Vector NTI Suite 8 software package (Informax Inc.). Difficult regions, which did not yield high-quality sequence with direct genomic walking, were amplified by PCR, and the sequence was determined from the fragments using specific primers. As a result, each nucleotide was determined more than twice except for the region 3050 to 4690, where only one strand could be determined (several times, however).
Protein analysis. (i) Protein identification.
The protein composition of the virion was analyzed by SDS-polyacrylamide gel electrophoresis (PAGE) using either 8% or 17% polyacrylamide gels (51) or 14% polyacrylamide-tricine-SDS gels (63).
(ii) N-terminal sequence analysis.
SDS-PAGE-separated proteins were transferred by electroblotting (46) onto a polyvinylidene difluoride membrane (ProBlott; Perkin Elmer Applied Biosystems Division, Foster City, Calif.). After being stained with Coomassie brilliant blue, the protein bands of interest were removed and subjected to N-terminal sequence analysis in a Procise 494A protein sequencer (Perkin Elmer Applied Biosystems Division, Foster City, CA).
(iii) Protein digestion and mass spectrometric analysis of fragments.
SDS-PAGE-separated proteins were stained with Coomassie brilliant blue, and protein bands of interest were cut from the gel. The proteins were “in-gel” digested essentially as described by Shevchenko et al. (64). Proteins were reduced with dithiothreitol and alkylated with iodoacetamide before digestion with trypsin (Sequencing Grade Modified Trypsin, V5111; Promega). Peptides generated by enzymatic cleavage were analyzed by matrix-assisted laser desorption-ionization time of flight (MALDI-TOF) MS for mass fingerprinting and electrospray tandem mass spectrometry for determining partial amino acid sequences. MALDI-TOF mass spectra of peptide fragments were acquired using an Ultraflex TOF/TOF instrument (Bruker-Daltonik GmbH, Bremen, Germany) and electrospray ionization quadrupole time of flight tandem mass spectra for de novo sequencing using a Q-TOF instrument (Micromass, Manchester, United Kingdom) as described previously (56). The obtained peptide masses and partial sequences were then compared with the sequences of the structural proteins translated from the determined gene nucleotide sequences.
Computational methods.
Programs from the EMBOSS software package (61) were used with locally made shell scripts to calculate several of the results. The GC percentage and length were calculated using the program Infoseq. Inverted DNA repeats were identified using the Einverted program. The predicted genes were translated from open reading frames (ORFs) to proteins with Transeq. Statistics for each ORF were calculated with the programs Pepstats and Freak. The secondary structure predictions were made using the mfold program (73). Several web-based services were used to further characterize the SH1 genome. The putative ORFs were found using GeneMark.hmm and GeneMark programs (45; http://opal.biology.gatech.edu/GeneMark/). Several different Markov models were used to ensure that all reasonable ORFs were found. Possible homologies to known proteins were searched with PSI-BLAST (2; http://www.ncbi.nlm.nih.gov/BLAST/). PSI-BLAST was executed for each ORF until no new hits were reported. Prediction of transmembrane helices was performed using the TMHMM program (41; http://www.cbs.dtu.dk/services/TMHMM/). Protein fold recognition was performed using the 3D-PSSM method (39; http://www.sbg.bio.ic.ac.uk/~3dpssm/).
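The two simplest statistics named above, the GC percentage and the length of a terminal inverted repeat, can be illustrated with a short Python sketch. This is not the EMBOSS implementation: the function names are hypothetical, only unambiguous A/C/G/T bases are handled, and the repeat search is restricted to exact matches at the two genome ends, whereas Einverted also scores imperfect and internal repeats.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(seq: str) -> int:
    """Length of the exact inverted repeat shared by the two genome ends."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = 0
    # The left end is compared with the reverse complement of the right end;
    # the comparison stops at the first mismatch or at the genome midpoint.
    while n < len(seq) // 2 and seq[n] == rc[n]:
        n += 1
    return n
```

For a toy sequence such as `GGACTTTTAAACCAGTCC`, whose last five bases are the reverse complement of its first five, the last function reports a repeat length of 5.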
Nucleotide sequence accession number.
Nucleotide sequence data for SH1 have been deposited in the GenBank database under accession number AY950802.
RESULTS
SH1 is a lipid-containing virus.
The total lipid extracts of H. hispanica and highly purified SH1 were examined by thin-layer chromatography (Fig. 1A). The individual lipid components were identified based on their relative positions and specific staining of the lipid spot on the TLC plate, the mass of the intact lipid, and its fragmentation products as determined by electrospray ionization mass spectrometry (Fig. 1B).
TLC analysis showed that the lipid composition of H. hispanica was quite complex, containing four major and some minor components (Fig. 1A, spots designated 1 to 8). The Rf of spot 1 and a minor, doubly charged peak (m/z = 760.2) in the negative ion mass spectrum (not shown) suggested the presence of cardiolipin. Spots 3, 5, and 6 most probably represent archaeal phosphatidylglycerol (PG), phosphatidylglycerophosphate methyl ester (PGP-Me), and phosphatidylglycerosulfate (PGS), respectively, as indicated by their relative Rf values (21) and specific staining (data not shown). The masses of these lipids (805.4, 899.4, and 885.4, respectively) (Fig. 1B), as well as precursor and product ion fragmentation analyses (see Materials and Methods), strongly support this identification. These lipids are common in halophiles (37), and their masses are identical to those of the saturated C20/C20 molecular species that are typical for Haloarcula species (36). However, small amounts of lower-mass species of PG (m/z range, 793.5 to 805.5), PGS (m/z range, 873.4 to 885.4), and PGP-Me (m/z range, 887.4 to 899.5) were also detected, indicating that molecular species with unsaturated and/or cyclopentyl-containing alkyl chains could also be present. Spot 7 was carbohydrate stain positive, suggesting the presence of a glycolipid in the host. The mass spectrum of the material extracted from this spot showed two peaks with m/z of 1137.65 and 1173.65 (Fig. 1B). These masses (and their isotopic patterns) correspond to those predicted for triglycosyl glycerodiether (TGD) ([M-H]− = 1137.82), found in Haloarcula species (26, 38), and its hydrogen chloride adduct ([M-H]− + HCl = 1173.82). The minor lipid components (i.e., spots 2, 4, and 8) could not be positively identified.
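The mass arithmetic behind these assignments can be checked with a small Python sketch. It is offered only as an illustration: the helper names are hypothetical, the proton mass is a standard constant rather than a value from this study, and the ions are assumed to be simple deprotonated species of the form [M − zH]z−.

```python
PROTON_MASS = 1.007276  # Da; standard constant, not taken from the text


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M of an [M - zH]z- ion observed in negative ion mode."""
    return mz * charge + charge * PROTON_MASS


def ppm_error(observed: float, predicted: float) -> float:
    """Signed deviation of an observed m/z from its predicted value, in ppm."""
    return (observed - predicted) / predicted * 1e6


# Doubly charged peak attributed to cardiolipin (m/z 760.2).
cardiolipin_mass = neutral_mass(760.2, 2)

# Observed versus predicted [M-H]- for TGD.
tgd_error_ppm = ppm_error(1137.65, 1137.82)

# Spacing between the two peaks from spot 7, for comparison with one HCl.
adduct_spacing = 1173.65 - 1137.65
```

Under these assumptions the doubly charged peak corresponds to a neutral mass of about 1,522.4 Da, the TGD peak lies roughly 150 ppm below its predicted value, and the two spot 7 peaks are separated by 36.0 mass units, the same spacing as between the two predicted values.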
Figure 1A and B indicate that the lipid composition of SH1 is simpler than that of the host; i.e., it contained only three major lipids (PG, PGP-Me, and PGS). Phosphate analysis revealed that SH1 contained a higher proportion of PGP-Me and a lower proportion of PGS compared to the host. The proportion of PG was similar in the host and the virus (Table 1). SH1 did not contain detectable amounts of glycolipids, which constituted approximately 20% of total host lipids, as estimated from semiquantitative TLC analysis. The TLC data also indicated that neutral lipids represented only about 10% to 15% in SH1, while they comprised 20% to 30% of the total lipids of the host.
TABLE 1.
Lipid | Phospholipid composition (mol%): SH1 | Phospholipid composition (mol%): H. hispanica |
---|---|---|
PG | 16.5 ± 1.9 | 13.6 ± 2.8 |
PGP-Me | 81.7 ± 2.0 | 56.9 ± 4.1 |
PGS | 1.4 ± 0.7 | 24.7 ± 2.0 |
Other | ~0.5 | ~4 |
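To make the selective acquisition of lipids in Table 1 easier to compare across species, the virus-to-host ratio of each phospholipid can be computed as in the following Python sketch. The ratio is an illustrative summary introduced here, not a statistic reported in the study, and it uses the mean values only, ignoring the stated deviations.

```python
# Mean phospholipid proportions (mol%) from Table 1: (SH1, H. hispanica).
TABLE_1 = {
    "PG": (16.5, 13.6),
    "PGP-Me": (81.7, 56.9),
    "PGS": (1.4, 24.7),
}

# A ratio above 1 means the lipid is enriched in the virion relative to
# the host pool; a ratio below 1 means it is depleted.
virus_to_host = {lipid: virus / host for lipid, (virus, host) in TABLE_1.items()}
```

With these means, PG is close to parity (about 1.2), PGP-Me is enriched about 1.4-fold, and PGS is reduced to roughly 6% of its proportion in the host.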
Genome sequence.
The double-stranded linear genomic DNA (55) was isolated from highly purified virus particles, and its sequence was determined. The linear genome of SH1 is 30,898 bp long and contains 309-bp inverted terminal repeats (ITRs). The genome has an overall G+C content of 68.4%, which is comparable to the corresponding values in halophilic archaeal organisms, e.g., the sequenced regions of the host of SH1, H. hispanica (63.5%), the entire genome of Halobacterium sp. NRC-1 (66.9% [50]), and the largest chromosome (62%) of the recently sequenced genome of Haloarcula marismortui (4). Since the linear SH1 genome has long ITRs, it was computationally folded to determine whether specific hairpin structures occur in the single-stranded form, as in some linear virus genomes (A.-M. Saren, J. J. Ravantti, S. D. Benson, R. M. Burnett, L. Paulin, D. H. Bamford, and J. K. H. Bamford, unpublished data). The minimum folding energy of SH1 single-stranded DNA (−4884.0 kcal/mol) did not differ from that of five randomly generated sequences with statistics (ITRs, genome length, and G+C content) similar to the SH1 genome, suggesting the absence of such structures in the SH1 genome.
The potential ORFs were assigned using the GeneMark.hmm 2.1 program (http://opal.biology.gatech.edu/GeneMark/) and the Halobacterium sp. NRC-1 codon usage table (http://www.kazusa.or.jp/codon/). The analysis yielded 48 potential ORFs corresponding to proteins longer than 40 amino acid residues. We also analyzed the regions upstream of the potential ORFs to determine putative (bacterial type) ribosome binding sequences, and additional potential ORFs were found. By combining these observations, we concluded that the SH1 genome might contain 56 ORFs (Table 2). This number agrees well with the annotated ORF frequencies in other sequenced archaeal virus genomes (1.7 ORFs/kb on average for all previously known archaeal virus genomes versus 1.8 ORFs/kb in SH1). Interestingly, using these criteria, two of the ORFs (ORFs 1 and 56) are assigned to the ITR regions. It will be intriguing to see whether they are functional. SH1 ORFs begin with either ATG (35) or GTG (20). Many of the ORFs are translationally coupled by overlapping start and stop codons (TAATG or TAGTG). Based on the transcriptional direction of the ORF clusters in the SH1 genome, there appears to be a minimum of four operons (Fig. 2). BLAST searches provided only a few hits, the most significant being an ORF17 hit to ATPases (for details, see the description of structural proteins below). ORF48 has high similarity (52%) to an unassigned ORF71 of halovirus Ch1 (accession number AAM88745).
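The ORF density quoted above, and the overlaps between consecutive ORFs, follow directly from the genome length and the coordinates in Table 2. The Python sketch below shows the arithmetic for the first five ORFs only; the helper name is hypothetical, and the overlap is computed purely from the listed start and stop positions, without assuming which codon motif produces it.

```python
GENOME_LENGTH_BP = 30898
N_ORFS = 56

# ORFs per kilobase of genome.
orfs_per_kb = N_ORFS / (GENOME_LENGTH_BP / 1000)

# (name, start, stop) for the first five plus-strand ORFs listed in Table 2.
ORFS = [
    ("ORF 1", 164, 421),
    ("ORF 2", 418, 573),
    ("ORF 3", 570, 848),
    ("ORF 4", 849, 1106),
    ("ORF 5", 1103, 1330),
]


def overlap_bp(upstream, downstream):
    """Nucleotides shared by two consecutive ORFs (0 if they do not overlap)."""
    return max(0, upstream[2] - downstream[1] + 1)


overlaps = [overlap_bp(a, b) for a, b in zip(ORFS, ORFS[1:])]
```

The density evaluates to about 1.8 ORFs/kb, in line with the figure given in the text, and three of the four neighboring pairs in this sample share 4 nucleotides according to the tabulated coordinates.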
TABLE 2.
ORF | Start position | Stop position | Direction | No. of residues | Molecular mass (kDa) | Calculated isoelectric point | TM | GC% | Functional assignment | RBS sequence | RBS distance |
---|---|---|---|---|---|---|---|---|---|---|---|
ORF 1 | 164 | 421 | + | 85 | 8.89 | 11.62 | | 64.73 | | | |
ORF 2 | 418 | 573 | + | 51 | 5.68 | 12.57 | | 73.72 | | AGG | −9 |
ORF 3 | 570 | 848 | + | 92 | 10.00 | 12.04 | | 64.87 | | | |
ORF 4 | 849 | 1106 | + | 85 | 9.42 | 3.74 | | 69.38 | | AGGAGG | −8 |
ORF 5 | 1103 | 1330 | + | 75 | 8.52 | 3.49 | | 67.54 | | GGTG | −5 |
ORF 6 | 1323 | 1733 | + | 136 | 14.25 | 4.97 | 2 | 69.59 | | GGTG | −6 |
ORF 7 | 1730 | 1870 | + | 46 | 4.77 | 4.52 | | 71.63 | | (GGGGG) | −7 |
ORF 8 | 1863 | 2177 | + | 104 | 11.67 | 5.13 | | 68.57 | | GGAGGT | −8 |
ORF 9 | 2170 | 2430 | + | 86 | 9.32 | 3.94 | | 68.97 | | GAGGTG | −5 |
ORF 10 | 2427 | 2585 | + | 52 | 5.35 | 12.12 | | 66.04 | | GGAGG | −7 |
ORF 11 | 2582 | 2920 | + | 112 | 12.63 | 4.43 | | 68.14 | | GGAGG | −5 |
ORF 12 | 3047 | 3208 | + | 53 | 5.71 | 11.74 | | 61.11 | | | |
ORF 13 | 3321 | 7493 | + | 1,390 | 152.33 | 3.96 | | 68.58 | VP1 | | |
ORF 14 | 7498 | 7770 | + | 90 | 9.50 | 4.46 | 2 | 70.33 | | AGGAGGTG | −7 |
ORF 15 | 7772 | 8125 | + | 117 | 12.27 | 4.09 | 3 | 67.51 | | GGAGG | −8 |
ORF 16 | 8118 | 8348 | + | 76 | 8.48 | 3.55 | | 67.97 | | AGGAGG | −9 |
ORF 17 | 8400 | 9122 | − | 240 | 26.89 | 5.37 | | 64.45 | ATPase? | AGGAGG | −5 |
ORF 18 | 9123 | 9326 | − | 67 | 6.93 | 4.13 | | 67.16 | | AGGAGG | −3 |
ORF 19 | 9319 | 9750 | − | 143 | 15.53 | 3.91 | | 67.13 | | GGT | −7 |
ORF 20 | 9747 | 10397 | − | 216 | 23.07 | 3.16 | | 67.28 | | GGTGA | −9 |
ORF 21 | 10397 | 10804 | − | 135 | 14.37 | 3.69 | 1 | 67.65 | | GGAGG | −9 |
ORF 22 | 10809 | 11480 | − | 223 | 24.18 | 3.95 | | 66.07 | | AGGA | −4 |
ORF 23 | 11545 | 11829 | + | 94 | 9.83 | 11.03 | 2 | 61.05 | VP12 | GAGG | −15 |
ORF 24 | 11845 | 12402 | + | 185 | 20.01 | 4.15 | | 64.7 | VP7 | GGAGGT | −8 |
ORF 25 | 12404 | 13102 | + | 232 | 25.70 | 3.88 | | 65.67 | VP4 | (GGGGG) | −8 |
ORF 26 | 13118 | 13339 | + | 73 | 7.88 | 3.64 | | 63.96 | | GGAGGT | −8 |
ORF 27 | 13343 | 13585 | + | 80 | 8.81 | 4.88 | | 65.43 | VP13 | GGA | −14 |
ORF 28 | 13589 | 16063 | + | 824 | 81.06 | 3.64 | | 71.84 | VP2 | GGAG | −8 |
ORF 29 | 16060 | 16971 | + | 303 | 31.84 | 4.41 | | 69.74 | VP5 | GGAGGTG | −7 |
ORF 30 | 16974 | 17459 | + | 161 | 16.88 | 4.31 | | 68.52 | VP10 | GGAGGT | −8 |
ORF 31 | 17459 | 17908 | + | 149 | 16.54 | 3.97 | | 62.67 | VP9 | (GGGGG) | −10 |
ORF 32 | 17921 | 18934 | + | 337 | 37.50 | 4.02 | | 63.51 | VP3 | | |
ORF 33 | 18941 | 19633 | + | 230 | 24.97 | 4.09 | | 65.95 | VP6 | GGAG | −9 |
ORF 34 | 19636 | 19950 | + | 104 | 11.40 | 5.63 | 1 | 67.62 | | GTGA | −2 |
ORF 35 | 20047 | 20232 | − | 61 | 6.69 | 8.5 | | 71.51 | | GGAGG | −9 |
ORF 36 | 20229 | 20408 | − | 59 | 6.20 | 3.9 | | 70 | | GGAGG | −7 |
ORF 37 | 20405 | 20521 | − | 38 | 4.23 | 7.36 | | 73.5 | | GAGGTG | −5 |
ORF 38 | 20518 | 20664 | − | 48 | 5.37 | 7.81 | | 65.99 | | GGTG | −5 |
ORF 39 | 20661 | 20930 | − | 89 | 9.73 | 4.51 | | 71.85 | | GGAGGT | −6 |
ORF 40 | 20927 | 21088 | − | 53 | 6.39 | 12.17 | | 66.67 | | GGAGG | −5 |
ORF 41 | 21085 | 21576 | − | 163 | 19.15 | 5.32 | | 67.07 | | GGAGG | −5 |
ORF 42 | 21573 | 22772 | − | 399 | 44.38 | 4.93 | | 69.25 | | GGAGG | −8 |
ORF 43 | 22776 | 23135 | − | 119 | 13.12 | 3.83 | | 69.17 | | GAGG | −3 |
ORF 44 | 23119 | 24357 | − | 412 | 46.30 | 4.04 | | 67.88 | | GGAGGTG | −7 |
ORF 45 | 24354 | 24530 | − | 58 | 6.20 | 8.81 | | 66.1 | | | |
ORF 46 | 24533 | 24790 | − | 85 | 10.03 | 4.12 | | 64.34 | | GGT | −5 |
ORF 47 | 24794 | 24970 | − | 58 | 6.55 | 8.67 | | 68.36 | | GAGGTGA | −6 |
ORF 48 | 24967 | 25407 | − | 146 | 16.34 | 4.64 | | 70.52 | | GGTGA | −6 |
ORF 49 | 25404 | 25940 | − | 178 | 19.79 | 4.5 | | 70.95 | | AGGTG | −5 |
ORF 50 | 25937 | 26602 | − | 221 | 24.38 | 4.5 | | 67.72 | | GGT | −4 |
ORF 51 | 26708 | 27085 | − | 125 | 13.43 | 4.23 | | 72.49 | | GGAGG | −7 |
ORF 52 | 27082 | 27330 | − | 82 | 9.04 | 4.02 | | 67.87 | | GGAGG | −10 |
ORF 53 | 27327 | 27677 | − | 116 | 12.90 | 3.84 | | 71.23 | | GAGG | −12 |
ORF 54 | 27770 | 27883 | − | 37 | 3.97 | 7.12 | | 71.05 | | GAGG | −7 |
ORF 55 | 27880 | 30477 | − | 865 | 94.89 | 4.02 | | 72.4 | | | |
ORF 56 | 30514 | 30726 | − | 70 | 7.67 | 13.06 | | 68.54 | | | |
The putative ribosome binding sequence (RBS) and its distance from the translation initiation codon (ATG or GTG) in each ORF are presented in Table 2. Analogous to prokaryotic viral genes, a sequence complementary to the 3′ end of the host H. hispanica 16S rRNA was found in most of the SH1 ORFs. However, seven of the ORFs did not have obvious RBSs. This is not surprising since archaeal transcription and translation machineries are mosaics of eukaryotic and prokaryotic ones (11). Typical bacterial features of archaeal transcripts are that they are polycistronic and the genes usually have upstream RBSs (22). However, the first gene of the transcript (or an individually transcribed gene) often lacks such signals, suggesting a different initiation mechanism (62, 69). If both mechanisms are used in SH1, there could be six to eight operons (Table 2).
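A search of this kind can be sketched in Python as a scan of the region upstream of each start codon for the longest fragment of an RBS consensus. Several points are assumptions made for illustration rather than the procedure used in the study: the consensus string `AGGAGGTGA` is simply assembled from the overlapping motifs listed in Table 2, the minimum fragment length of 3 matches the shortest entries there, and the distance is counted from the last base of the motif to the start codon.

```python
def find_rbs(upstream, consensus="AGGAGGTGA", min_len=3):
    """Longest consensus fragment present in the upstream window.

    Returns (motif, distance), where distance is minus the number of bases
    between the end of the motif and the start codon, or None if no
    fragment of at least min_len bases is found.
    """
    upstream = upstream.upper()
    # Longer fragments of the consensus are tried before shorter ones.
    for size in range(len(consensus), min_len - 1, -1):
        for i in range(len(consensus) - size + 1):
            fragment = consensus[i:i + size]
            pos = upstream.rfind(fragment)  # occurrence closest to the start codon
            if pos != -1:
                gap = len(upstream) - (pos + size)
                return fragment, -gap
    return None


# Hypothetical upstream window ending immediately before a start codon.
example = find_rbs("CCTTGGAGGTACTTCAA")
```

For the hypothetical window shown, the sketch reports the motif `GGAGGT` at a distance of −7; a window with no consensus fragment returns `None`, corresponding to an ORF without an obvious RBS.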
Most of the predicted SH1 gene products had a low calculated isoelectric point (pI < 5) (Fig. 2), similar to haloarchaeal organisms (4, 67). However, based on the calculated pI, some of the SH1 gene products were extremely basic in character (pI > 10). One of the highly basic gene products, VP12 encoded by ORF23 (see below), was confirmed to be a protein component of the virion, suggesting that the acidic character of a predicted protein cannot alone be used as a criterion to determine the putative gene products of halophilic viruses.
SH1 virion is composed of ~15 structural protein species.
To identify the genes encoding the structural proteins of the virion, the highly purified virus material was subjected to SDS-PAGE. The resulting 15 separable Coomassie brilliant blue-stained bands were subjected to identification by protein chemical methods. The proteins, corresponding to apparent molecular masses from 185 kDa to 4 kDa (Fig. 3), were assigned as VP1 to VP15 (where VP is virion protein). For gene start identification, the proteins were subjected to N-terminal sequence analysis, with the results shown in Table 3 (additional data may be accessed at www.DBLAB.HELSINKI.FI/SUPPLEMENTS). Eight N-terminal amino acid sequences were identified, all of which correlated with the translated gene products from the genome sequence. In all the determined protein N termini, the initiating methionine had been removed. The seven protein bands yielding no detectable signals in N-terminal sequencing (VP5, VP6, VP8, VP11, VP13, VP14, and VP15) were all minor bands in the Coomassie-stained SDS-polyacrylamide gel (Fig. 3) and were hardly visible on the polyvinylidene difluoride membrane after electroblotting and staining. Repeated attempts to obtain N-terminal sequences from these protein bands failed, even though our sequencer is routinely used for sequencing proteins in the 2 to 5 pmol range.
TABLE 3.
Protein | N-terminal sequence | No. of masses | No. of peptides/no. of residues |
---|---|---|---|
VP1 | ADSTNTDMPL. . . | 17 | 9/69 |
VP2 | GFFSDLKG. . . | 15 | 7/72 |
VP3 | TTIGPKTDNL. . . | 2 | 2/17 |
VP4 | ADQTQEYTISH. . . | 6 | 6/64 |
VP5 | No result | 5 | No result |
VP6 | No result | No result | 1/12 |
VP7 | GNIGNLSAEK. . . | 6 | 6/56 |
VP8 | No result | No result | No result |
VP9 | VPGLXDNED. . . | 5 | No result |
VP10 | GKFALAS. . . | 4 | 7/71 |
VP11 | No result | No result | No result |
VP12 | ASINVSR. . . | No result | 1/10 |
VP13 | No result | 1 | No result |
VP14 | No result | No result | No result |
VP15 | No result | No result | No result |
All proteins (VP1 to VP15) were further subjected to mass spectrometric identification by peptide mass fingerprint analysis. For this the protein bands were digested in gel with trypsin, and the resulting peptides were analyzed. Peptide mass fingerprint analysis by MALDI-TOF MS yielded peptide masses that were compared with the theoretical tryptic peptide masses from the deduced virion proteins. The number of peptide masses from the mass fingerprint corresponding to the masses of theoretical tryptic peptides (mass error within 100 ppm) for each of the virion proteins is shown in Table 3. Analysis of tryptic peptides by liquid chromatography-electrospray tandem mass spectrometry yielded partial sequences from a number of virion proteins. These data (Table 3) were further used to confirm the genes and to determine nucleotide sequences. The results from mass fingerprint analyses and determined partial amino acid sequences are provided elsewhere (www.DBLAB.HELSINKI.FI/SUPPLEMENTS). Protein sequencing with Edman degradation and MALDI-TOF mass fingerprint analysis together with tandem mass spectrometry confirmed the identification of genes for the virion proteins VP1 to VP7, VP9, VP10, VP12, and VP13. No sequence or peptide mass information could be obtained from the suggested viral protein bands VP8, VP11, VP14, and VP15.
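The comparison of observed masses with theoretical tryptic peptide masses can be illustrated with a minimal Python sketch. It is a simplification and not the software used in the study: the function names are hypothetical, trypsin is modeled as cleaving after K or R unless the next residue is P with no missed cleavages, masses are monoisotopic [M+H]+ values for unmodified peptides (the carbamidomethylation of cysteine that follows iodoacetamide treatment is ignored), and the example protein is a toy sequence rather than an SH1 protein.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def match_masses(observed, protein, tol_ppm=100.0):
    """Pair observed masses with theoretical peptides within tol_ppm."""
    hits = []
    for peptide in tryptic_peptides(protein):
        theoretical = mh_mass(peptide)
        for mass in observed:
            if abs(mass - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((mass, peptide))
    return hits


# Toy example: two observed masses against the toy protein "GKAAR".
toy_hits = match_masses([204.134, 500.0], "GKAAR")
```

In the toy example only the first observed mass falls within 100 ppm of a theoretical peptide (`GK`); counting such hits per protein gives the kind of tally reported in the "No. of masses" column of Table 3.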
Most of the genes for the structural proteins are located in the middle of the genome.
Of the 15 potential structural proteins, 11 were confirmed using protein chemistry methods. Comparison of the amino acid sequences to the DNA sequences revealed that the proteins are coded by the following ORFs: ORF13 (VP1), ORF23 (VP12), ORF24 (VP7), ORF25 (VP4), ORF27 (VP13), ORF28 (VP2), ORF29 (VP5), ORF30 (VP10), ORF31 (VP9), ORF32 (VP3), and ORF33 (VP6). All genes encoding structural proteins, except ORF13, are located in the middle of the genome and are most probably transcribed from a single operon from left to right in the genome map (Fig. 2). The most abundant virion proteins are VP3, VP4, VP7, and VP12 (Fig. 3). The calculated masses of VP3 (37.5 kDa), VP4 (25.7 kDa), and VP7 (20.0 kDa) suggest that they might represent the coat-associated proteins of the SH1 virion (see also protein complexes below). VP12 (9.8 kDa) is highly abundant and has two membrane-spanning helices according to the TMHMM prediction (Table 2). These characteristics suggest a role for VP12 as a major structural membrane protein of SH1.
The calculated molecular mass of VP2 is 81.1 kDa. VP2 is rich in glycine (15%) and serine (7%) residues. There are two linker-type regions in the protein: 11 glycine residues at positions 114 to 124 and a GS-rich repeat at positions 282 to 302 that separate the N-terminal, middle, and C-terminal domains. A stretch of 19 amino acid residues is repeated in the N-terminal domain (Fig. 4). The C-terminal domain contains a heptameric sequence DXAARGY (where X is D/E and Y is A/S/T) that is repeated five times (amino acids 640 to 674). If more variation were allowed in the sequence alignments, the heptad repeat could cover 178 residues starting at residue 490 and ending at residue 688 (Fig. 4). The heptapeptide repeat pattern is characteristic for alpha-helical coiled-coil proteins, and secondary structure prediction proposes high helicity (data not shown). These characteristics suggest that VP2 could be an elongated fiber-like protein suitable for forming an extended spike. Similar to bacteriophage PRD1 spike protein P5, VP2 has a domain structure. In the PRD1 spike, a collagen-like repeated region and eight glycine residues function as linkers to separate the protein domains and to make the spike flexible (9, 19, 32).
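The strict form of the heptad, DXAARGY with X = D/E and Y = A/S/T, translates directly into a regular expression, and runs of back-to-back copies can be located as in the Python sketch below. The function name and the test sequence are hypothetical (the VP2 sequence itself is not reproduced here), and the looser alignment mentioned above is not modeled. Five consecutive copies span 35 residues, consistent with the stated positions 640 to 674.

```python
import re

# One or more back-to-back copies of the strict heptad D[DE]AARG[AST].
HEPTAD_RUN = re.compile(r"(?:D[DE]AARG[AST])+")


def heptad_runs(protein):
    """Return (start, n_repeats) for each run, with a 1-based start position."""
    return [
        (m.start() + 1, len(m.group(0)) // 7)
        for m in HEPTAD_RUN.finditer(protein)
    ]


# Hypothetical test sequence: three tandem heptads, a spacer, one more heptad.
toy = "MKT" + "DDAARGA" + "DEAARGS" + "DDAARGT" + "LLV" + "DEAARGA"
runs = heptad_runs(toy)
```

For the toy sequence the sketch reports one run of three repeats starting at position 4 and a single repeat at position 28.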
No peptide sequence or mass spectrometric data could be obtained for resolving the genes of the minor virion proteins VP8, VP11, VP14, and VP15. The database search with ORF17 (coding capacity of 26.9 kDa) gave hits to ATPases including PRD1 DNA packaging ATPase. In PRD1 the packaging ATPase is a structural component of the virion, unlike the packaging terminases of tailed phages (8, 48, 65). When dsDNA virus ATPase sequences were aligned, the predicted gene product of SH1 ORF17 appeared to have the same conserved motifs as the packaging ATPases of PRD1-type bacterial viruses. These motifs include the classical Walker A and B motifs as well as a third conserved motif common to membrane-containing dsDNA viruses (such as PBCV-1 and vaccinia virus) but not found in other dsDNA viruses (such as tailed phages or herpesvirus [66]). SH1 has a linear dsDNA genome and an internal membrane reminiscent of PRD1. The packaging system might also resemble that of PRD1, with the SH1 packaging ATPase a structural component of the virion, possibly VP8 (apparent molecular mass, 20 kDa).
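As a rough illustration of how Walker motifs can be flagged in a predicted gene product, the sketch below scans a sequence with simplified, commonly cited consensus patterns (Walker A as G-x(4)-G-K-[S/T]; Walker B as four hydrophobic residues followed by D or E). These patterns, the choice of hydrophobic residues, the function name, and the toy sequence are assumptions made for illustration; they are not the alignment procedure used in this study, and the third conserved motif mentioned above is not modeled.

```python
import re

# Simplified consensus patterns (assumptions for illustration only):
#   Walker A (P-loop): G-x(4)-G-K-[S/T]
#   Walker B: four hydrophobic residues followed by D or E
WALKER_A = re.compile(r"G.{4}GK[ST]")
WALKER_B = re.compile(r"[AVILMFWY]{4}[DE]")


def scan_motifs(sequence):
    """Return 1-based start positions of Walker A and Walker B candidates."""
    return {
        "walker_a": [m.start() + 1 for m in WALKER_A.finditer(sequence)],
        "walker_b": [m.start() + 1 for m in WALKER_B.finditer(sequence)],
    }


# Made-up toy sequence (NOT the ORF17 product).
toy = "MSTR" + "GPSGSGKT" + "QLARN" + "VLIVDE" + "PTK"

if __name__ == "__main__":
    print(scan_motifs(toy))
```

Short patterns of this kind match by chance fairly often, so candidate hits would normally be checked against a multiple sequence alignment of known ATPases, as was done in the analysis described above.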
Virion proteins VP1, VP4, and VP7 form stable complexes.
Electrophoresis of a highly purified virion sample in nonreducing conditions revealed additional major bands representing protein complexes (55). The information on SH1 gene sequences and structural proteins that we obtained was used to analyze these complexes. Figure 5 depicts a nonreducing low-percentage polyacrylamide (8%) gel displaying proteins and protein complexes from 50 kDa to several hundred kDa. The identity of indicated bands was confirmed by peptide mass fingerprinting with a significant number of peptide hits. Monomeric proteins VP1 (calculated molecular mass, 152.3 kDa) and VP2 (calculated molecular mass, 81.1 kDa) were identified. In addition, five major bands designated C1 to C5 (where C is complex) were observed. C1 and C2 yielded tryptic peptide masses only corresponding to VP1. Although the apparent complex masses from SDS-PAGE are somewhat lower than those calculated from the amino acid sequence for a VP1 trimer and dimer, respectively, they most probably represent such homomultimers. Band C3 (apparent mass, 74 kDa) yielded tryptic peptide masses from VP4 only. Bands C4 and C5 yielded tryptic peptide masses from both VP4 and VP7. It should be noted, however, that the accuracy of determining molecular masses for multimers using electrophoresis analysis is relatively low, especially under nonreducing conditions.
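The comparison between apparent band masses and calculated multimer masses can be made explicit with a small calculation. The sketch below uses the calculated monomer masses quoted in the text; the function names are hypothetical, and the nearest whole copy number is only a rough guide given the limited accuracy of mass estimates from nonreducing gels.

```python
# Calculated monomer masses (kDa) quoted in the text.
MONOMER_KDA = {"VP1": 152.3, "VP4": 25.7, "VP7": 20.0}


def homomultimer_mass(protein, copies):
    """Calculated mass (kDa) of a complex made of `copies` identical subunits."""
    return round(MONOMER_KDA[protein] * copies, 1)


def nearest_copy_number(protein, apparent_kda):
    """Whole number of subunits whose combined mass is closest to the apparent mass."""
    return max(1, round(apparent_kda / MONOMER_KDA[protein]))


if __name__ == "__main__":
    print("VP1 dimer:", homomultimer_mass("VP1", 2), "kDa")
    print("VP1 trimer:", homomultimer_mass("VP1", 3), "kDa")
    # Band C3 has an apparent mass of 74 kDa and contains VP4 only.
    print("Nearest VP4 copy number for C3:", nearest_copy_number("VP4", 74))
```

For bands C4 and C5, which contain both VP4 and VP7, the stoichiometry cannot be resolved from mass alone because several combinations of the two subunits give similar totals.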
DISCUSSION
Here we directly demonstrated the presence of lipids in the highly purified SH1 virion. Several phospholipids, at least one glycolipid (TGD), and ~25% neutral lipids were present in H. hispanica, whereas SH1 contained only three major phospholipids (PG, PGP-Me, and PGS) and ~12% neutral lipids. In addition to qualitative differences, there were also quantitative ones; for example, the level of PGP-Me was higher and that of PGS was lower in the virus than in the host (Table 1). Such qualitative and quantitative differences indicate that the virus selectively acquires lipids from the host lipid pool. Several mechanisms could explain this selectivity. For instance, the virus could derive its lipids from a putative host membrane domain (lateral or transversal) that lacks glycolipids. Alternatively, selection could take place due to selective lipid-protein and/or lipid-DNA interactions, as suggested for the lipid-containing bacteriophage PRD1 (20, 42), or could be due to coulombic repulsion and the shape of lipid molecules, as discussed elsewhere (43). Interestingly, MS analysis showed that in addition to lipid head group selectivity, SH1 is enriched in certain PG species (e.g., m/z = 793.4) that probably contain unsaturated and/or cyclopentyl-containing alkyl chains. Such enrichment supports the idea that the molecular shape of such lipids favors their incorporation in the viral membrane (see references 42 and 43).
Database searches using the SH1 genome sequence did not reveal clear similarities to previously identified sequences. This reinforces the observation that only a limited number of archaeal genomes have been sequenced and that archaeal genes differ considerably from those of bacterial and eukaryotic origin. However, the ORFs seem to be arranged in multiple polycistronic operons, and most of them possess a potential 5′ ribosome binding site, a genome organization that resembles that of bacterial viruses (Table 2).
We performed a detailed protein chemistry analysis of the SH1 structural proteins. This allowed us to positively identify 11 genes encoding structural proteins. Comparison of SH1 protein composition to that of PRD1, a bacterial dsDNA virus with an internal membrane, highlighted a clear difference: PRD1 has at least nine integral membrane proteins, whereas SH1 has one very abundant putative integral membrane protein (VP12) and possibly only a few others. As the PRD1 membrane proteins are mostly involved in DNA delivery (29), this suggests that the infection mechanism for SH1 deviates from that of PRD1. Interestingly, one of the SH1 ORFs (ORF 17) encodes a protein with ATPase motifs similar to those observed in internal membrane-containing viruses of bacterial and eukaryotic hosts. A similar protein has been identified as the packaging ATPase in PRD1 (66). The presence of a putative packaging ATPase in SH1 predicts an assembly pathway in which preformed empty viral particles (procapsids) are packaged. This prediction is in agreement with the observed empty progeny particles in thin sections of infected cells (55). The abundance of VP1, VP2, VP3, VP4, and VP7 in the virus particle and their sequence features and multimericity (except VP2 and VP3) suggest that they are components of the virion coat (Fig. 4 and 5). The most obvious explanation for the multimericity under nonreducing conditions would be the presence of S-S bridges. However, VP4, VP7, and VP1 all have only one cysteine residue. The nature of these multimers requires a more detailed structural analysis.
In addition to the 11 genes encoding structural proteins, there appears to be a large number of ORFs (approximately 40) residing in several operons. Surprisingly, we were not able to identify any ORF with canonical DNA polymerase motifs. Obviously, there are many putative genes available for study that may be involved in replication, transcription, and regulatory functions as well as in cell entry and exit, creating a challenge for future studies.
Can SH1 be related to other viruses? The closest candidate obviously is STIV. SH1 and STIV infect hosts belonging to different archaeal kingdoms. Both viruses are icosahedral with no tail and most likely have an internal membrane component. It has been proposed that the STIV coat architecture and coat protein fold resemble those of bacterial virus PRD1 (60). PRD1, again, has a coat architecture and capsid protein fold that are similar to adenovirus, PBCV-1, and possibly several other viruses infecting bacterial and eukaryotic hosts (12, 13, 14, 33, 49, 70). Based on such observations, it has been proposed that all these viruses originated from a common ancestor (6, 7). This lineage hypothesis predicts that architecturally similar viruses should also be found to infect archaeal hosts. More detailed structural studies of SH1 will reveal whether this lineage hypothesis has predictive power.
Acknowledgments
Terhi Kemppinen is acknowledged for skillful technical assistance.
This investigation was supported by grant 1201964 (to J.K.H.B.), grant 55081 (to P.S.), and grants 1202855 and 1202108 (to D.H.B., Finnish Centre of Excellence Program, 2002 to 2005) from the Academy of Finland. The laboratory of M.D.-S. is supported by an MRGS grant from the University of Melbourne.
REFERENCES
Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)
Full text links
Read article at publisher's site: https://doi.org/10.1128/jvi.79.14.9097-9107.2005
Read article for free, from open access legal sources, via Unpaywall: https://europepmc.org/articles/pmc1168735?pdf=render