Accessory Gene Products of Influenza A Virus

  Paul Digard
  The Roslin Institute, University of Edinburgh, Midlothian EH25 9RG, United Kingdom
  Correspondence: paul.digard@roslin.ed.ac.uk
  • 1 Present address: MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, United Kingdom

Abstract

Influenza A virus has long been known to encode 10 major polypeptides, produced, almost without exception, by every natural isolate of the virus. These polypeptides are expressed in readily detectable amounts during infection and are either fully essential or their loss severely attenuates virus replication. More recent work has shown that this core proteome is elaborated by expression of a suite of accessory gene products that tend to be expressed at lower levels through noncanonical transcriptional and/or translational events. Expression and activity of these accessory proteins vary between virus strains and are nonessential (sometimes inconsequential) for virus replication in cell culture, but in many cases have been shown to affect virulence and/or transmission in vivo. This review describes, when known, the expression mechanisms and functions of this influenza A virus accessory proteome and discusses its significance and evolution.

THE CORE IAV PROTEOME

In total, the influenza A virus (IAV) genome comprises ∼13.6 kb of single-stranded, negative-sense RNA. This is split into eight segments (virion RNAs [vRNAs]) with lengths ranging from 0.89 kb to 2.3 kb, which together encode all the viral polypeptide products. The segmented genome facilitates virus evolution through reassortment. The greatest diversity of IAV strains is found in avian host species, but many mammalian species (e.g., humans, swine, horses) have their own reasonably stable IAV lineages, constantly refreshed by spillover from the avian reservoir (Yoon et al. 2014; Lycett et al. 2019). All IAV vRNAs have a generic structure of a long open reading frame (ORF) in antisense flanked on both sides by short segment-unique untranslated regions (UTRs) and then conserved sequences that form a promoter structure for the viral polymerase (Fig. 1). Genomic vRNAs are replicated via low-abundance replicative intermediates (complementary RNAs [cRNAs]) that are not capped or polyadenylated and, as far as is known, are not substrates for translation (Hay et al. 1977, 1982). Instead, viral gene expression is mediated by 5′-capped and polyadenylated messenger RNAs (mRNAs) that are an alternative form of positive-sense RNA transcribed from the vRNA molecules. Caps are derived by “cap snatching,” a process in which the 5′-cap structure and 10-14 downstream nucleotides of mRNA are cleaved from host cell pre-mRNAs by endonucleolytic cleavage and the snatched fragment used to prime viral transcription (Plotch et al. 1981). Polyadenylation is achieved by the polymerase stuttering on a short poly(U) stretch near the 5′-end of the vRNA template (Robertson et al. 1981; Poon et al. 1999).
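
The way a viral mRNA is assembled from these parts can be illustrated with a minimal Python sketch. It is a toy model only: the sequences, the function name sketch_viral_mrna, the five-residue U run, and the tail length are all invented for illustration, and the cap structure itself is not modeled. The only values taken from the text are the 10-14 nucleotide size of the snatched fragment and the position of the poly(U) stretch near the 5′-end of the vRNA.

```python
# Toy model of IAV mRNA assembly: host-derived primer + copy of the vRNA
# template + poly(A) tail. All sequences and defaults are hypothetical.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def sketch_viral_mrna(vrna, host_pre_mrna, primer_len=12, tail_len=15, u_run="UUUUU"):
    """Assemble a toy positive-sense mRNA from a negative-sense vRNA (5'->3')."""
    if not 10 <= primer_len <= 14:
        raise ValueError("snatched fragment is described as 10-14 nucleotides")
    # Cap snatching: the 5' end of a host pre-mRNA primes transcription.
    primer = host_pre_mrna[:primer_len]
    # The polymerase copies the template from its 3' end up to the U run.
    u_start = vrna.find(u_run)
    if u_start < 0:
        raise ValueError("no U run found in toy vRNA")
    template = vrna[u_start + len(u_run):]
    body = reverse_complement(template)
    # Stuttering on the U run is represented by a fixed-length poly(A) tail,
    # so the vRNA 5' end beyond the U run is never copied into the mRNA.
    return primer + body + "A" * tail_len


TOY_VRNA = "GGAUUUUUCGAUCAUGC"        # invented; 5' end, U run, then template
TOY_HOST = "GAAUACUCAAGCUAUGCAUC"     # invented host pre-mRNA 5' end
toy_mrna = sketch_viral_mrna(TOY_VRNA, TOY_HOST)
print(toy_mrna)
```

In this sketch the mRNA differs from a full-length complementary copy at both ends, carrying a host-derived 5′ extension and a poly(A) tail in place of the complement of the vRNA 5′-end.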

Figure 1.

Schematic of the three classes of RNA produced by the influenza A virus (IAV). Colored boxes represent the major open reading frame (ORF) within a segment, gray boxes the segment terminal repeats that form a promoter structure, and lines the untranslated regions (UTRs). Note that mRNA has a 5′-extension derived from host pre-mRNA (orange line) and has a poly(A) tail in place of the 3′-promoter element. (mRNA) Messenger RNA, (cRNA) complementary RNA, (vRNA) virion RNA.

Each segment contains a principal coding region that is usually translated from the first AUG codon in the mRNA to produce the primary gene product of the segment. This leads to expression of the three elements of the heterotrimeric viral RNA-dependent RNA polymerase (PB2, PB1, and PA from segments 1, 2, and 3, respectively), the surface glycoproteins (HA from segment 4 and NA from segment 6), the single-strand RNA-binding nucleoprotein (NP from segment 5), the virion matrix protein M1, and the primary interferon antagonist NS1 from segments 7 and 8, respectively (Table 1; Fig. 2, dark blue bars). Six of these eight polypeptides are absolutely essential for virus replication, whereas the NA and NS1 proteins are dispensable in tissue culture conditions in which, respectively, neuraminidase activity is supplied exogenously or an intact interferon response is lacking (Liu and Air 1993; García-Sastre et al. 1998). Viruses encoding severely truncated but (with modern hindsight) still partially functional NS1 genes have been isolated (Norton et al. 1987), and there has been a more recent report of human clinical isolates lacking an NA gene (Moules et al. 2010); however, viruses such as these are the exception and the vast majority of naturally occurring IAV strains encode NA and NS1.

Figure 2.

Schematic of the IAV coding strategy. mRNAs from the eight segments are symbolized by horizontal black lines. Coding regions (only approximately to scale) are represented by boxes with colors defining different reading frames (dark blue: primary products, frame 1; light blue: secondary products, frame 1; magenta: secondary products, frame 2). mRNA splicing in segments 1, 7, and 8 is denoted by deflected lines connecting coding regions.

Table 1.

Influenza A virus (IAV) gene products, expression mechanisms, and functions

Two further viral polypeptides are considered to be part of the core IAV proteome. Unlike many other RNA viruses, IAV RNA synthesis occurs in the host cell nucleus, giving access to the cellular mRNA splicing machinery. Transcripts from segments 7 and 8 are differentially spliced, giving rise to mRNAs encoding the M2 and NEP (NS2) proteins respectively (Lamb and Lai 1980; Inglis and Brown 1981). These polypeptides are also produced by canonical translation initiating at the first AUG codon in the mRNA, but out-of-frame splicing events direct translation shortly thereafter into an alternative reading frame downstream of the splice acceptor sites (Fig. 2). M2 and NEP are produced by every natural isolate of IAV, and NEP is considered to be essential for virus replication. Viruses lacking an intact M2 gene can be generated in the laboratory but are attenuated (Watanabe et al. 2001; Takeda et al. 2002) and, in some cases, complemented by expression of an alternative isoform of M2 that is considered (in more detail below) to be an accessory gene product (Wise et al. 2012).

These 10 proteins encoded by the 8 IAV segments (PB1, PB2, PA, HA, NP, NA, M1, M2, NS1, and NEP) are all highly expressed in infected cells and, in most cases, were identified as unique polypeptide species before the viral genome sequence became available (Skehel 1972). When the first genomic sequences for IAVs were published in the late 1970s/early 1980s (Sleigh et al. 1979), the coding origins of these polypeptides were obvious. These properties, along with the largely essential nature of the viral gene products, provide reasonable justification for defining them as the core IAV proteome.

THE IAV ACCESSORY PROTEOME

The last of the core polypeptides encoded by IAV (M2) was identified in 1981 (Lamb and Choppin 1981), and there things mostly rested for 20 years. However, since the turn of the century, starting with the identification of the PB1-F2 polypeptide (Chen et al. 2001), it has become evident that the core proteome is further elaborated by the expression of many further minor polypeptide species. In contrast to the core gene products, this second class of products tends to be expressed in low abundance and in a virus strain–dependent manner and, when expressed, is nonessential for virus replication. Nevertheless, in many cases, expression of these accessory proteins has been shown to affect virus pathogenicity in vivo, although not necessarily in an easily predictable fashion. These “accessory” proteins are produced by various mechanisms, including minor mRNA splicing events and/or examples of noncanonical translation, including leaky ribosomal scanning and ribosomal frameshift events. The following sections will describe, in segment order, these accessory proteins and, when known, their expression mechanisms and function in the IAV replication cycle.

SEGMENT 1: PB2-S1

Identified in 2015 (Yamayoshi et al. 2016), PB2-S1 (“PB2 Spliced-1”) expression results from a spliced mRNA from segment 1 in which the region corresponding to nucleotides 1513–1894 of the PB2 mRNA is excised so that translation of the mRNA produces the first 495 amino acids of PB2 and then continues in the +1 frame from the S1 ORF for a further 13 codons (Fig. 2). First identified in the highly laboratory-passaged A/Wilson-Smith Neurotropic/33 (WSN) H1N1 strain, the splice donor (SD) and splice acceptor (SA) sequences are conserved in the human H1N1 lineage descended from the 1918 pandemic, and expression of PB2-S1 has been demonstrated for a number of strains. However, the splice sites are not conserved in human H3N2 viruses or the more recent swine-derived human H1N1 viruses originating from the 2009 influenza pandemic (H1N1pdm09), and expression of PB2-S1 from segment 1 of these viruses was not detected (Yamayoshi et al. 2016). Localization of PB2-S1 is mainly mitochondrial and occurs via an amino-terminal mitochondrial localization signal shared with PB2 of human lineage viruses (Carr et al. 2006; Graef et al. 2010). Similarly to full-length PB2, PB2-S1 was shown to inhibit the retinoic acid–inducible gene I (RIG-I)-dependent/mitochondrial antiviral-signaling protein (MAVS) interferon signaling pathway (Yamayoshi et al. 2016). It also interacts with the PB1 subunit of the viral polymerase, but negatively interferes with its activity. Nonetheless, abrogation of expression of PB2-S1 did not alter virus fitness in vitro or virulence in murine in vivo systems (Yamayoshi et al. 2016). The biological significance of this accessory protein thus remains unclear.

SEGMENT 2: PB1-F2

PB1-F2 is a small (typically 57–101 amino acids depending on virus strain) (Zell et al. 2007) polypeptide encoded by an ORF entirely contained within the PB1 coding sequence (Fig. 3A). Translation initiation occurs at the fourth AUG codon in segment 2 mRNA, accessed via leaky ribosomal scanning (Chen et al. 2001; Wise et al. 2009, 2011). This process is facilitated by the suboptimal Kozak (Kozak 1986) translational initiation context of the PB1 AUG codon but also potentially negatively regulated by sequence elements within the 3′-end of the PB1-F2 gene and further downstream in the PB1 gene (Buehler et al. 2013). There is also evidence that amino-terminally truncated PB1-F2 isoforms are expressed from one or more of AUGs 7–9 (Zamarin et al. 2006; Kamal et al. 2015), but how these start codons are accessed by ribosomes is unknown.
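
The reasoning behind leaky scanning can be made concrete with a short Python sketch that lists every AUG in an mRNA, its reading frame relative to the first AUG, and a rough Kozak context class. The two-position rule used here (a purine at −3 and a G at +4, with the A of the AUG as +1) is a common simplification and an assumption of this sketch, not necessarily the scoring used for Figure 3; the sequence and the function names are invented for illustration.

```python
# Sketch: enumerate AUG codons and classify their Kozak context.
# Assumed rule: purine at -3 and G at +4 -> strong; one of the two ->
# intermediate; neither -> weak.


def kozak_strength(mrna, i):
    """Classify the context of the AUG whose A sits at 0-based index i."""
    minus3 = mrna[i - 3] if i >= 3 else ""
    plus4 = mrna[i + 3] if i + 3 < len(mrna) else ""
    score = (minus3 in ("A", "G")) + (plus4 == "G")
    return ("weak", "intermediate", "strong")[score]


def scan_augs(mrna):
    """Return (1-based position, frame offset from first AUG, strength)."""
    hits = []
    first = mrna.find("AUG")
    i = first
    while i != -1:
        hits.append((i + 1, (i - first) % 3, kozak_strength(mrna, i)))
        i = mrna.find("AUG", i + 1)
    return hits


TOY_MRNA = "CUUUCAUGACGCCACCAUGGAACUAUGCAA"  # invented, not a viral sequence
for position, frame, strength in scan_augs(TOY_MRNA):
    print(position, frame, strength)
```

In the toy sequence the first AUG sits in a weak context, which is the situation in which a scanning ribosome is more likely to continue to a downstream AUG, whether in the same frame or a different one.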

Figure 3.

Schematic of the ORFs and encoded protein domain structures of the 5′-end of segment 2, 3, 5, and 6 mRNAs. Boxes represent ORFs in each of the three reading frames (Fr) or the major polypeptide species and their functional domains (top bars). AUG codons are represented by vertical lines, colored according to the predicted strength of their Kozak translation initiation context (green, strong; amber, intermediate; red, weak). Black arrows indicate AUGs known to be used for translation initiation for the indicated proteins. Gray arrows indicate AUG codons proposed to fire on the basis of ribosomal profiling data. (A) Represents the first 900 nucleotides of the PR8 sequence. Thin brackets indicate the approximate location of two sequence elements proposed to negatively regulate (blunt arrow) PB1-F2 expression. (Panel adapted from data in Figure 1A of Wise et al. 2009.) (B) Represents the first 900 nucleotides of PR8 sequence; note that in this (and most IAV strains), PA-N155 and -N182 are initiated from the 13th and 15th AUG codons in the segment, respectively. The stepped arrow indicates the +1 ribosomal frameshift event that accesses the X-ORF to produce PA-X. The gray line in the X-ORF indicates a common length polymorphism present in the H1N1pdm09 virus lineage. (C) Represents the first 150 nucleotides of A/England/195/2009 (H1N1) sequence, as well as downstream sequences around 450 nucleotides. (D) Represents the first 150 nucleotides of the WSN sequence.

PB1-F2 was first identified as a short-half-life protein that interacted with mitochondria, postulated to induce apoptosis in monocytes (Chen et al. 2001). Further studies revealed that PB1-F2 translocated into mitochondria via Tom40 channels, culminating in a loss of mitochondrial membrane potential and apoptosis (Gibbs et al. 2003; Zamarin et al. 2005; Varga et al. 2012; Yoshizumi et al. 2014). Studies have also linked PB1-F2 to inhibition of cellular antiviral responses (potentially related to its mitochondrial function), through interactions with MAVS and the RIG-I and NF-κB pathways (Varga et al. 2011, 2012; Reis and McCauley 2013; Yoshizumi et al. 2014; Leymarie et al. 2017; James et al. 2019). Further postulated functions include pro-inflammatory activity, viroporin function, amyloid behavior, and regulatory interactions with the viral polymerase complex—in short, a plethora of possible roles that are reviewed in more detail elsewhere (Kamal et al. 2018; Cheung et al. 2020).

Despite much effort at molecular characterization of PB1-F2 in vitro, attempts to elucidate in vivo function have given conflicting results. In some virus backgrounds, disruption of PB1-F2 expression (or, conversely, repairing the gene in viruses that have lost it) has given no phenotype in animal experiments (e.g., Zamarin et al. 2006; Hai et al. 2010). In other circumstances, the presence or precise sequence of PB1-F2 does affect pathogenesis (Zamarin et al. 2006; Conenello et al. 2007; Schmolke et al. 2011; Leymarie et al. 2014), but no simple overall conclusion can be drawn as to whether PB1-F2 expression leads to exacerbation or amelioration of disease. This almost certainly reflects viral strain- and host-dependent effects and is reviewed elsewhere (Kamal et al. 2018; Cheung et al. 2020). Consistent with virus-derived variation, not all IAV strains possess a full-length (defined as ≥87 codon) ORF; in general, avian strains of IAV tend to have full-length genes, whereas human and swine viruses (notably including the H1N1pdm09 virus) often do not (Zell et al. 2007; Pasricha et al. 2013; Kamal et al. 2018).

PB1-N40

Like PB1-F2, PB1-N40 is expressed from segment 2 mRNA, but originates from the fifth AUG codon on the message (Fig. 3A) and thus corresponds to an amino-terminally truncated version of the main segment product PB1 (Wise et al. 2009). PB1-N40 was in fact originally identified a decade prior to the discovery of PB1-F2 as an unexpected PB1-related polypeptide (Akkina et al. 1991), but the techniques of the day did not permit detailed characterization. As with PB1-F2, ribosomes reach AUG 5 through leaky scanning past the first few codons on the mRNA, probably aided by a termination-reinitiation mechanism following translation of an intervening short ORF started at AUG 3 (Wise et al. 2011). Importantly, expression of PB1-N40 is down-regulated by the presence of the PB1-F2 start codon (Wise et al. 2009), a confounding factor for studies that have used the mutational strategy of altering AUG 4 to remove PB1-F2 expression. The amino-terminal 39 amino acids present in PB1, but missing from PB1-N40, include the primary interaction site for the PA polymerase subunit (Fig. 3A; Pérez and Donis 1995), and therefore PB1-N40 lacks heterodimerization function and, consequently, nuclear import and transcriptase functions (Fodor and Smith 2004; Wise et al. 2009). PB1-N40 nevertheless retains some ability to interact with PB2 and the polymerase trimer, as well as with a variety of cellular proteins (Wise et al. 2009; Wang et al. 2019).

The PB1-N40 AUG codon and its favorable Kozak context are very highly conserved in IAV isolates, and the protein is expressed at reasonable levels (∼5% of that of PB1) in a variety of IAV strains that have been tested (Wise et al. 2009). Nevertheless, PB1-N40 function remains to be clarified. Ablating PB1-N40 expression (by mutating its AUG codon, with consequent mutation of the full-length PB1 protein) or inducing its overexpression (by mutating the PB1-F2 AUG codon) slightly reduced viral fitness of the H1N1 laboratory-adapted A/Puerto Rico/8/34 (PR8) and WSN strains in vitro. However, double deletion of PB1-N40 and PB1-F2 resulted in WT-like virus propagation (Wise et al. 2009; Tauber et al. 2012). Thus, as it currently stands, PB1-N40 is an accessory protein in search of a function.

SEGMENT 3: PA-N155 AND PA-N182

PA-N155 and PA-N182 are amino-terminally truncated isoforms of PA translated from the 11th and 13th AUG codons (in the WSN strain), respectively, of segment 3 mRNA (Muramoto et al. 2013). Similar to PB1-N40, these products were noted prior to the advent of reverse genetics and other ready means of further characterization (Akkina et al. 1991). Also like PB1-N40, the missing amino-terminal regions prevent the PA isoforms from supporting viral transcriptase function (Muramoto et al. 2013), as they lack the amino-terminal endonuclease domain necessary for cap snatching (Fig. 3B; Dias et al. 2009; Yuan et al. 2009; Pflug et al. 2014). The mechanism (transcriptional or translational) by which these AUG codons are accessed by ribosomes has not been elucidated. Inspection of the ORF structure of the 5′-end of segment 3 suggests that it is unlikely to be simple leaky ribosomal scanning as, unlike the PB1 AUG, the PA AUG has a strong Kozak consensus and there are multiple other methionine codons upstream of those used to make PA-N155 and -N182 (Fig. 3B). However, the PA isoform initiator codons are highly conserved among IAV sequences and are present in isolates from different host species, making it likely that the majority of IAV strains will express them (Akkina et al. 1991; Muramoto et al. 2013). The precise roles of these truncated PA polypeptides within the IAV life cycle have not yet been elucidated. Recent studies on these polypeptides from an H5N1 virus suggested interactions of PA-N155 and PA-N182 with a variety of (chicken) cellular proteins involved in RNA processing, protein transport, and various cellular signaling pathways (Wang et al. 2018). Many, if not most, of these interactions might be common to the interactome of full-length PA protein and their significance is unclear. Nonetheless, mutant WSN viruses unable to express PA-N155 and/or -N182 following mutation of the relevant AUG codons showed delayed virus replication in vitro and reduced virulence in mouse models (Muramoto et al. 2013). As with PB1-N40 mutants, however, it must be borne in mind that the strategy used to ablate expression of the accessory proteins also mutates the primary gene product of the segment, making it hard to rule out pleiotropic effects. As with PB1-N40, further work is required to understand the significance of these polypeptides.

PA-X

PA-X expression occurs through a low-level (∼1%) +1 ribosomal frameshift event, driven by a rare arginine codon (CGU) next to a sequence that facilitates transfer RNA (tRNA) realignment, leading to the production of a fusion protein containing the first 191 amino acids of PA (corresponding to the endonuclease domain) and a short carboxy-terminal domain translated from reading frame 2 that lacks its own AUG codon (the X-ORF; Fig. 3B; Firth et al. 2012; Jagger et al. 2012). The frameshift site is highly conserved among IAV strains, suggesting that most IAV strains will express it, but the length of the X-ORF varies, with common polymorphisms being either 41 or 61 codons (Shi et al. 2012; Rash et al. 2014). The best described function of the protein is to contribute to host cell shutoff, through the PA endonuclease domain acting as a broad (but not nonspecific) mRNA endonuclease when liberated from the rest of the viral polymerase complex (Jagger et al. 2012; Desmet et al. 2013; Khaperskyy et al. 2016; Muller and Glaunsinger 2017; Chaimayo et al. 2018; Gaucherand et al. 2019). This is not to say that the sequences from the X-ORF are nonfunctional; polymorphisms in this region affect shutoff activity, possibly by affecting PA-X subcellular localization (Bavagnoli et al. 2015; Oishi et al. 2015; Feng et al. 2016; Hayashi et al. 2016; Nogales et al. 2018).
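
The effect of a +1 frameshift on the translated product can be sketched in a few lines of Python. This is an illustration under stated assumptions: the mRNA is an invented toy sequence rather than segment 3, the function name translate and its shift_after_codon parameter are hypothetical, and the shift is forced at a chosen codon rather than occurring at the low frequency seen in infection.

```python
# Sketch: translate a toy mRNA with an optional +1 ribosomal frameshift.
# The codon table is the standard genetic code in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna, start=0, shift_after_codon=None):
    """Translate from `start`; skip one nucleotide after the given codon index."""
    protein = []
    pos = start
    codon_index = 0
    while pos + 3 <= len(mrna):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
        pos += 3
        if codon_index == shift_after_codon:
            pos += 1  # +1 frameshift: the next codon starts one nucleotide later
        codon_index += 1
    return "".join(protein)


# Invented toy message: a UUU codon followed by the rare arginine codon CGU.
TOY_MRNA = "AUGGCAUCCUUUCGUCAGUCUGAAUAAGUGA"
in_frame = translate(TOY_MRNA)                        # product of frame 1
shifted = translate(TOY_MRNA, shift_after_codon=3)    # fusion product
print(in_frame, shifted)
```

Both products share the same amino terminus and diverge after the shift site, which mirrors how PA-X combines the amino-terminal 191 residues of PA with a carboxy-terminal tail read from the X-ORF.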

PA-X functions other than destroying host mRNAs are also possible. The strength of the PA-X shutoff activity varies between IAV strains but even in viruses with low PA-X shutoff activity (e.g., the lab-adapted WSN and PR8 strains), altering PA-X expression has phenotypic effects (Desmet et al. 2013; Hussain et al. 2018; Rigby et al. 2019). Other postulated mechanisms include interactions with stress granules and histone deacetylase 4 (Khaperskyy et al. 2012, 2014; Galvin and Husain 2019). In sum, the picture is of a viral protein that limits cellular responses, including induction of innate immunity, to infection.

Expression of PA-X is not required for virus replication in vitro, in mice, in chickens, or in ovo (Jagger et al. 2012; Hu et al. 2015; Hussain et al. 2018). Nevertheless, alteration of PA-X expression or sequence has shown phenotypic effects in a variety of animal models of IAV disease. In mice, using H1N1 and H5N1 strains of virus, the absence of PA-X resulted in augmented clinical disease and/or immune responses (Jagger et al. 2012; Gao et al. 2015a; Hu et al. 2015; Rigby et al. 2019). Conversely, comparisons between wild-type and mutant H9N2 avian or human H1N1 2009 pandemic viruses resulted in the opposite effect of reduced virulence in the absence of PA-X, suggesting that viral context is important for the in vivo effect of PA-X on disease outcome (Gao et al. 2015b; Hayashi et al. 2015; Lee et al. 2017; Nogales et al. 2017). Data for other host species are more limited, but in chickens, loss of PA-X expression increased pathogenicity (Hu et al. 2015), whereas in pigs, mutating the frameshift site decreased pathogenicity (Xu et al. 2017). Therefore, as seen for PB1-F2, PA-X exhibits virus strain- and host-specific functions whose interplay and outcome cannot currently be predicted with certainty.

SEGMENT 5: AN AMINO-TERMINALLY EXTENDED ISOFORM OF NP

The UTRs of IAV are generally more conserved than the coding regions and in many cases are disregarded when “full-length” segment sequences are reported on public databases. Nevertheless, the UTRs do exhibit sequence variation (Furuse and Oshitani 2011; Benkaroun et al. 2018), and many H1N1 viruses descended from the 1918 pandemic virus (both human and swine) contain a C28A nucleotide polymorphism in the 5′-UTR of segment 5 that results in the addition of a new start codon upstream of the canonical NP start site (Wanitchang et al. 2011; Wise et al. 2019). This new AUG codon is in a poor Kozak consensus, but in frame with the primary NP AUG so that its use leads to an NP isoform with an extra six amino acids at the amino terminus (Fig. 3C). First noticed in the context of the 2009 pandemic virus, the presence of this upstream AUG (uAUG) was associated with a slight (less than twofold) increase in viral gene expression (minireplicon) activity, but whether or not it was used for translation initiation was not addressed (Wanitchang et al. 2011). Subsequent work has shown that the uAUG is indeed used, but not exclusively, for translation initiation in vitro and in vivo to produce a mixture of normal and elongated (e)NP. The presence or absence of the uAUG codon had little effect on virus replication in vitro, but in vivo, in both mice and pigs, expression of eNP correlated with increased viral pathogenicity (Wise et al. 2019). The mechanism behind this phenotypic effect in vivo is unknown. The additional six amino acids in eNP lie at the amino terminus of a flexible sequence primarily characterized as containing an unconventional nuclear localization signal (Wang et al. 1997). Differential interactions between NP and importin α isoforms have been associated with IAV cell tropism (Gabriel et al. 2011), so this represents a potentially testable hypothesis for eNP function.

SEGMENT 6: NA43

NA43 is an amino-terminally truncated form of the NA polypeptide that lacks the first 14 amino acids of the canonical protein and thus the entire amino-terminal cytoplasmic domain and a portion of the transmembrane domain (Fig. 3D). It was discovered in a PR8 reassortant virus with WSN segment 6 as an unexpected peak in a ribosome profiling (Ribo-seq) data set generated using the inhibitor lactimidomycin to bias reads toward sites of translation initiation (Machkovech et al. 2019). Initiation at the second AUG present in the mRNA (codon 15 in the NA gene, starting at the 43rd nucleotide downstream of the primary NA AUG, hence the protein name chosen by the authors) seems likely to occur via leaky scanning; the primary AUG, serving as the initiation site for NA translation, is in a poor Kozak consensus and the second (NA43) AUG is in a moderate Kozak consensus (Fig. 3D). Most, but not all, N1 NAs have this methionine codon, suggesting that NA43 expression may be widespread. The existence of the truncated NA polypeptide is well supported by plasmid-based experiments for WSN and the early H1N1pdm09 strain A/California/04/2009, but has not been unequivocally demonstrated in actual virus infection (Machkovech et al. 2019). Despite the partial loss of the transmembrane domain, exogenously expressed NA43 still accumulates on the plasma membrane. However, probing its importance through reverse genetics gave no indication of functional importance for replication in vitro or in mice (Machkovech et al. 2019), so its significance remains unclear.
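
The coordinate arithmetic behind names such as NA43 and PB1-N40 can be checked with a small Python sketch. It assumes 1-based numbering in which the A of the primary AUG is nucleotide 1 of the ORF, which is one reading of the description above; the function names are hypothetical.

```python
# Sketch: relate ORF nucleotide positions, codon numbers, and the number of
# amino-terminal residues missing from a truncated isoform.
# Assumption: positions are 1-based and the A of the primary AUG is nt 1.


def codon_number(nt_in_orf):
    """1-based codon that contains a given 1-based ORF nucleotide."""
    return (nt_in_orf - 1) // 3 + 1


def first_nt_of_codon(codon):
    """1-based ORF nucleotide at which a given 1-based codon starts."""
    return 3 * (codon - 1) + 1


def residues_missing(start_codon):
    """Residues lost when translation starts at an internal in-frame codon."""
    return start_codon - 1


na43_codon = codon_number(43)               # NA43 starts at nucleotide 43
na43_missing = residues_missing(na43_codon)
pb1_n40_missing = residues_missing(40)      # PB1-N40 starts at codon 40
print(na43_codon, na43_missing, pb1_n40_missing)
```

Under this numbering, nucleotide 43 is the first position of codon 15, so NA43 lacks 14 residues, and PB1-N40 lacks the 39 residues stated for it in the segment 2 section.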

SEGMENT 7: M42 AND OTHER SEGMENT 7 GENE PRODUCTS

Segment 7 is known to produce a primary unspliced mRNA and up to three other transcripts via alternative splicing: mRNAs 2–4. These splice variants use a common splice acceptor site but different splice donor sequences (for review, see Dubois et al. 2014). Virus strain–dependent variations in the sequences of these splice donor sites lead to variability in their use. All IAVs produce the unspliced transcript in abundance, from which the M1 polypeptide is translated from the first AUG codon in the transcript (Fig. 4A).

Figure 4.

Schematic of the open reading frames (ORFs) and encoded protein domain structures of segment 7 mRNAs (PR8 strain). Boxes represent ORFs in each of the three reading frames (Fr) or the major polypeptide species and their functional domains (partially shaded bars). Zigzag lines represent splice junctions. AUG codons are represented by vertical lines, colored according to the predicted strength of their Kozak translation initiation context (green, strong; amber, intermediate; red, weak). Purple bars indicate CUG codons thought to be used for translation initiation. Black arrows indicate AUGs known to be used for translation initiation for the indicated proteins. Gray arrows indicate AUG codons proposed to fire on the basis of ribosomal profiling data or (DRiPs) from T-cell epitope data. (A) Represents the first 350 nucleotides of the unspliced transcript. (B) Represents mRNA2 (M2). (C) Represents mRNA3. (D) Represents mRNA4.

Spliced mRNA2 is produced by virtually all strains of IAV and encodes the ion channel, M2. The structure of the spliced transcript is such that the M2 ORF has the initiator codon and the first nine codons in common with M1 but thereafter encodes the carboxy-terminal 88 amino acids from what was the distal end of the +2 reading frame before the splice event (Fig. 4B; Lamb et al. 1981).

Spliced mRNA3 is produced using the most 5′-proximal splice donor site, but because this is at the boundary of the conserved terminal promoter element and segment-unique UTR, its use removes the M1/M2 start codon and the resulting short mRNA lacks major protein-coding potential. The first potential ORF encodes a putative nine amino acid peptide from the carboxyl terminus of M1 (Fig. 4C) that has been named M3. Production of mRNA3 has been proposed to negatively regulate segment 7 protein expression at early stages of infection (Shih et al. 1995). However, its presence is nonessential for virus growth in vitro, and protein expression from the mRNA has also not been detected (Chiang et al. 2008; Jackson and Lamb 2008). Production of mRNA3 has become enshrined in influenza textbooks through its early discovery (Lamb et al. 1981), but it is not clear if all strains of IAV produce it; for example, the PR8 strain accumulates far lower quantities of mRNA3 than other human strains of IAV (Wise et al. 2012).

A small fraction of IAV strains produce appreciable amounts of a third splice variant, mRNA4 (Shih et al. 1998; Wise et al. 2012; Dubois et al. 2014). This mRNA is produced using the most 3′ of the known splice donor sites and is predicted to express a 54 amino acid–long internally deleted version of M1 from the first AUG codon (Fig. 4D), but no such protein product has been detected. However, in the PR8 strain of IAV, mRNA4 has been shown to encode an M2 isoform, named M42 (Wise et al. 2012). M42 translation is initiated from the second AUG codon in the +2 frame, to produce a polypeptide that is identical to M2 downstream of the splice junction but which has a unique amino-terminal ectodomain sequence (Fig. 4D). M42 can functionally replace M2 during virus replication, despite accumulating less at the plasma membrane, and more at the Golgi, than M2 (Wise et al. 2012).

SEGMENT 8: tNS1 AND NS3

The main polypeptide species made from segment 8 is NS1, the primary viral IFN antagonist (Hale et al. 2008). NS1 is translated from the first AUG in the unspliced mRNA transcript, to produce a protein around 230 amino acids long (the length varies between IAV strains) that folds into two domains: an amino-terminal RNA-binding domain and a carboxy-terminal “effector” domain, responsible for binding multiple cellular polypeptides (Fig. 5A). Like segment 7, segment 8 encodes two core IAV polypeptides. Differential splicing of the primary transcript using a splice donor site near the 5′-end of the mRNA leads to the production of a 121 amino acid NEP protein that shares its amino-terminal 10 residues with NS1 (Lamb et al. 1978; Inglis et al. 1979) but contains a carboxyl terminus expressed from the +2 reading frame of the segment (Fig. 5B). All strains of IAV contain this 5′ splice donor sequence and produce NEP, as it is essential for RNP nuclear export and completion of the virus life cycle (O'Neill et al. 1998; Neumann et al. 2000).

Figure 5.

Schematic of the ORFs and encoded protein domain structures of segment 8 mRNAs and vRNA (PR8 strain). Zigzag lines represent splice junctions. Boxes represent ORFs in each of the three reading frames (Fr) or the major polypeptide species and their functional domains (partially shaded bars). AUG codons are represented by vertical lines, colored according to the predicted strength of their Kozak translation initiation context (green, strong; amber, intermediate; red, weak). Black arrows indicate AUGs known to be used for translation initiation for the indicated proteins. Gray arrows indicate AUG codons proposed to fire on the basis of T-cell epitope data (DRiPs). (A) Represents the unspliced transcript. (B) Represents mRNA2 (NEP). (C) Represents mRNA3 (NS3). (D) Represents vRNA (the putative NSP protein).

Recently, amino-terminally truncated versions of NS1 have been identified (Kuo et al. 2016). These “tNS1” polypeptides are expressed from the 5th and/or 6th AUG codons of segment 8 and thus lack the first 78 or 80 amino acids of NS1, effectively producing the NS1 effector domain in isolation (Fig. 5A). A mutant PR8 virus lacking these AUG codons induced higher levels of IRF3 phosphorylation and hence greater IFN-β induction (Kuo et al. 2016). A priori, this could be due to nonsynonymous mutation of NS1 and/or failure to express the tNS1 polypeptides. However, exogenously expressed tNS1 polypeptides displayed differential localization (cytoplasmic versus nuclear) to the WT protein and were able to antagonize IRF3 activation, supporting the suggestion that loss of their expression contributed to the phenotype. The mechanism underlying their expression has not been characterized—the main NS1 AUG codon is in a seemingly strong Kozak consensus, so leaky ribosomal scanning is perhaps not the explanation.

A third variant of the NS1 protein (“NS3”) has also been identified. The NS3 protein is encoded by a splice variant mRNA in which an intron between nucleotides 400 and 528 has been removed to produce an internally truncated version of NS1 (Fig. 5C), missing amino acids 126-168 (Selman et al. 2012). NS3 was discovered in the process of characterizing the phenotypic effects of a D125G change in the NS1 ORF following adaptation of the prototype virus of the 1968 H3N2 pandemic (A/Hong Kong/1/1968) to growth in mice (Brown et al. 2001). This revealed that increased virus propagation in murine cells actually resulted from the mutation introducing a novel splice donor site, rather than the nonsynonymous change in NS1 (Selman et al. 2012). As with segment 7 mRNA4, production of the NS3 mRNA is likely to occur in only a small minority of IAV strains, because most do not contain the necessary splice donor sequence.
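As a quick consistency check on the NS3 numbers, the arithmetic linking the intron coordinates to the missing residues can be written out. The sketch assumes that "between nucleotides 400 and 528" denotes an intron spanning nucleotides 400 to 528 inclusive; that reading is an assumption, and the function name is hypothetical.

```python
def internal_deletion(intron_first: int, intron_last: int):
    """Return (intron length in nt, frame preserved?, whole codons removed).

    Positions are 1-based and inclusive (an assumed reading of the text).
    """
    length = intron_last - intron_first + 1
    return length, length % 3 == 0, length // 3

length, in_frame, codons_removed = internal_deletion(400, 528)
residues_missing = 168 - 126 + 1  # amino acids 126-168, inclusive
print(length, in_frame, codons_removed, residues_missing)
```

Under that reading, the intron length is a multiple of three, so the downstream reading frame is preserved and the number of codons removed matches the number of residues reported missing from NS3.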

DRiPs, UFOs, AND NONSENSE PEPTIDES

The IAV gene products described in the sections above are all bona fide proteins—within the size range expected for polypeptides capable of folding into a discrete domain (Garbuzynskiy et al. 2013) and, in all cases, visualized at some point as a “band on a gel.” However, there is also evidence for a further understratum of viral gene expression that occurs in infected cells, largely seen at the level of peptide species. Much of the data showing the existence of these products stem from work aimed at understanding the rapid kinetics of MHC Class I peptide presentation of virus-derived peptides—seemingly too fast to be solely explained through the natural turnover of stably folded viral polypeptides (Esquivel et al. 1992). Instead, the defective ribosomal product (DRiP) hypothesis holds that the relatively rare mistranslation events that produce defective polypeptides provide a source of more rapidly turned-over polypeptides that feed the antigenic processing pathway (Yewdell et al. 1996). These DRiPs could be miscoded products from bona fide viral ORFs or products from viral ORFs that have not been selected to encode functional proteins. The sensitive assay methods developed in the course of testing this hypothesis (reviewed elsewhere [Wei and Yewdell 2019]) have not only shed light on fundamental processes of cell biology but have identified a variety of unexpected translation events that occur during IAV replication. As well as prompting the discovery of PB1-F2 (promptly reclassified as a genuine accessory gene) (Chen et al. 2001), specific DRiPs that provide T cell epitopes originating from both canonical (AUG) and noncanonical (CUG) internal translation initiation in the M1 and NS1 genes (Figs. 4B and 5A, respectively) have been identified (Yang et al. 2016; Zanker et al. 2019).

In addition, T-cell epitope presentation provides the best evidence to date for protein expression from the negative-strand RNA of IAV; from the enigmatic long ORF (167 codons in PR8 but up to 216 codons in some strains; Fig. 5D) present in segment 8 vRNA (Baez et al. 1980; Zhirnov et al. 2007; Clifford et al. 2009; Hickman et al. 2018). Expression of a full-length negative-strand polypeptide (NSP) has not been detected, however (Hickman et al. 2018).

Other evidence for noncanonical gene expression events in IAV has come from the technique of ribosomal profiling. This technique, which maps fragments of mRNA protected from RNase digestion by ribosomes back to the genome to (among other things) identify sites of translation initiation (Ingolia 2016), has identified at least four previously unrecognized ribosome start sites in the IAV genome (Machkovech et al. 2019). These are internal methionine codons at positions 86 and 136 of the PA and NP genes (Fig. 3B,C), respectively, and out-of-frame ORFs in the NA (Fig. 3D) and M1 genes (Fig. 4A). No matching polypeptides have yet been defined for these initiation events, in contrast to the NA43 polypeptide described above, whose identification also stemmed from the same study.

A final layer of complexity in IAV gene expression comes from the demonstration that the cap-snatching mechanism itself can generate novel translational start sites on viral mRNAs, when the leader sequence generated from the host cell pre-mRNA contains an AUG codon or non-AUG initiation site (Sloan et al. 2019; Ho et al. 2020). The evidence for this comes from a variety of methods, including deep sequencing of the capped leader sequences, ribosomal profiling, T-cell epitope assays, and mass-spectrometric detection of novel peptides. This mechanism can in theory produce amino-terminal extensions to the core IAV polypeptides, as well as providing means to drive expression of otherwise cryptic ORFs near the 5′-end of the viral segments. However, given the level of variability in the cellular sequences (upward of 4 million unique leader sequences generated during the course of infection [Clohisey et al. 2020]), this mechanism is unlikely to generate large amounts of a discrete polypeptide species, and its ultimate significance remains to be determined.
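The two outcomes described above (an amino-terminal extension versus expression of an otherwise cryptic upstream ORF) depend only on the reading frame of the leader-derived AUG and on whether a stop codon intervenes before the main start codon. A minimal sketch follows, assuming invented leader and viral sequences and hypothetical function names; it considers AUG codons only and ignores non-AUG initiation.

```python
# Toy illustration: leaders and the viral 5' region below are invented.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def leader_aug_outcomes(leader: str, viral_mrna: str, main_start: int):
    """Classify each AUG that begins within the snatched leader.

    main_start is the 0-based index of the main ORF AUG within viral_mrna.
    Returns (index, outcome, codon_count) tuples for the chimeric mRNA.
    """
    if viral_mrna[main_start:main_start + 3] != "AUG":
        raise ValueError("main_start must point at an AUG")
    chimeric = leader + viral_mrna
    main = len(leader) + main_start
    outcomes = []
    for i in range(len(leader)):
        if chimeric[i:i + 3] != "AUG":
            continue
        # Read codons from this AUG until the first stop (or the end).
        j = i
        while j + 3 <= len(chimeric) and chimeric[j:j + 3] not in STOP_CODONS:
            j += 3
        if (main - i) % 3 == 0 and j > main:
            # In frame with no stop before the main AUG: extended polypeptide.
            outcomes.append((i, "n_terminal_extension", (main - i) // 3))
        else:
            # Out of frame, or terminated upstream: a separate short ORF.
            outcomes.append((i, "upstream_orf", (j - i) // 3))
    return outcomes

viral = "AGCAAAAGCAGG" + "UCUAAC" + "AUGGAUCCAAACUAA"  # toy 5' end, AUG at index 18
in_frame = leader_aug_outcomes("AUGCCUUACCGA", viral, 18)
out_of_frame = leader_aug_outcomes("GCAUGCCUUACC", viral, 18)
no_aug = leader_aug_outcomes("GCGCCUUACCGA", viral, 18)
print(in_frame, out_of_frame, no_aug)
```

With these toy inputs, the first leader adds 10 codons in frame with the main ORF, the second opens a short out-of-frame ORF that terminates before the main AUG, and the third contributes no start codon at all.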

CONCLUDING REMARKS

Nearly 40 years ago, IAV was characterized as encoding 10 polypeptides, initiated from eight AUG codons elaborated by two major mRNA splice isoforms. Since then, a further 10 polypeptide species have been defined in some detail, and as many again potential protein/peptide products suggested. These novel products are produced from multiple alternative translation initiation sites (both AUG and CUG codons), minor spliced mRNAs, and one example of ribosomal frameshifting. So, what does it all mean? A negative characterization could be gene expression mayhem in a colonized and dying cell, producing random junk. We think that a more useful way of considering the data is in the context of the current theories of how cellular genes are birthed de novo. In this hypothesis, genes do not just originate by diversification following gene duplication but can evolve step by step from noncoding sequence, through acquisition of an ORF, that, once translated, allows selection (if the protein product is useful) to drive increases in expression, ORF length, protein domain structure, and stability (Carvunis et al. 2012; Schmitz and Bornberg-Bauer 2017).

Applying this model of gene evolution to IAV (or any other virus) gene structure provides a simple framework for conceptualizing the plethora of “new” polypeptide species defined in recent years (Fig. 6). At least a subset of the many ORFs in the IAV genome is translated to at least some extent. Some of these have been defined as DRiPs in the narrow sense of fortuitously being detected as a T-cell epitope in a mouse, as well as having a small size and/or low expression level and rapid turnover. Others, such as the amino-terminally truncated isoforms of the polymerase proteins and NA, in which no function has been shown, might also fall into a broader DRiP category of being translational noise of no importance to viral fitness, although their evident stability and domain structure perhaps argue against this. Further IAV polypeptides, in which unambiguous evidence of in vivo importance exists (e.g., PB1-F2, PA-X), can be firmly classified as accessory proteins. However, it is important to note that this model of gene birth also includes death, in which (lack of) selection can lead to gene loss (Carvunis et al. 2012). Indeed, the PB1-F2 gene might be following this backward trajectory in swine and/or humans (Zell et al. 2007; Trifonov et al. 2009). An admitted problem with the de novo gene birth hypothesis is the difficulty of observing the process in motion, given the evolutionary timescales over which animal and plant genomes evolve (Schmitz and Bornberg-Bauer 2017). The far higher (perhaps a million-fold) mutation rate of RNA viruses (Belshaw et al. 2008) may make them a useful model system for observing gene evolution in action.

Figure 6.

A model for gene birth and death in IAV. Boxes represent ORFs. Arrows and colored bars indicate approximate expression levels, whereas darker shading of the recycling symbol indicates faster turnover of the polypeptide species. (Figure based on data in Carvunis et al. 2012.)

ACKNOWLEDGMENTS

P.D. and S.L. are supported by Institute Strategic Programme (BB/P013740/1). P.D. is also supported by project grant (BB/S00114X/1) funding from the U.K. Biotechnology and Biological Sciences Research Council, as well as the European Union's Horizon 2020 research and innovation programme under grant no. 727922 (DELTA-FLU). S.L. is also supported by the European Union's Horizon 2020 research and innovation programme under grant no. 874835 (VEO). E.G. is supported by a Wellcome Trust/Royal Society Sir Henry Dale Fellowship (211222/Z/18/Z). R.M.P. was supported by a PhD studentship from the University of Edinburgh.

This article has been made freely available online courtesy of TAUNS Laboratories.

