Charleston Noble1-3*, Jason Olejarz1*, Kevin M. Esvelt4, George M. Church2-3, Martin A. Nowak1,5-6
1Program for Evolutionary Dynamics, Harvard University, 2Wyss Institute for Biologically Inspired Engineering, Harvard University, 3Department of Genetics, Harvard Medical School, 4Media Laboratory, Massachusetts Institute of Technology, 5Department of Mathematics, Harvard University, 6Department of Organismic and Evolutionary Biology, Harvard University, USA
*These authors contributed equally to this work.
The alteration of wild populations has been discussed as a solution to a number of humanity’s most pressing ecological and public health concerns. Enabled by the recent revolution in genome editing, CRISPR gene drives, selfish genetic elements which can spread through populations even if they confer no advantage to their host organism, are rapidly emerging as the most promising approach. But before real-world applications are considered, it is imperative to develop a clear understanding of the outcomes of drive release in nature. Toward this aim, we mathematically study the evolutionary dynamics of CRISPR gene drives. We demonstrate that the emergence of drive-resistant alleles presents a major challenge to previously reported constructs, and we show that an alternative design which selects against resistant alleles greatly improves evolutionary stability. We discuss all results in the context of CRISPR technology and provide insights which inform the engineering of practical gene drive systems.
Gene drive systems are selfish genetic elements which bias their own inheritance and spread through populations in a super-Mendelian fashion (Fig. 1A). Such elements have been discussed as a means of contributing to the eradication of insect-borne diseases such as malaria, reversing herbicide and pesticide resistance in agriculture, and controlling destructive invasive species (1–12). Various examples of gene drive can be found in nature, including transposons (13), Medea elements (14, 15), and segregation distorters (16–19), but for ecological engineering purposes, endonuclease gene drive systems have received the most significant attention in the literature (1–10, 20–22). In general, these elements function by converting drive-heterozygotes into drive-homozygotes through a two-step process: (i) the drive construct, encoding a sequence-specific endonuclease, induces a double-strand break (DSB) at its own position on a homologous chromosome, and (ii) subsequent DSB repair by homologous recombination (HR) copies the drive into the break site. Any sequence adjacent to the endonuclease will be copied as well; if a gene is present we refer to it as ‘cargo’, as it is ‘driven’ by the endonuclease through the population.
Though originally proposed over a decade ago (1), the chief technical difficulty of this approach—inducing easily programmable cutting at arbitrary target sites—has only recently been overcome by the discovery and development of the CRISPR/Cas9 genome editing system (23–27). Briefly, Cas9 is an endonuclease whose target site is prescribed by an independently expressed guide RNA (gRNA) via a 20-nucleotide protospacer sequence. Because virtually any position in a genome can be uniquely targeted by Cas9, so-called RNA-guided gene drive elements can be constructed by simply inserting a suitable sequence encoding both Cas9 and gRNA(s).
Recent studies have demonstrated highly functional CRISPR gene drive elements in mosquitoes (5, 6), yeast (7), and fruit flies (8). In each case, the basic construct consists of a copy of Cas9 with a single corresponding gRNA and cargo sequence (Fig. 1B). Despite drive inheritance of about 95% on average in the published studies (compared to 50% expected by Mendelian inheritance), the evolutionary stability of these constructs in large populations has been debated due to the potential emergence of drive resistance within a population (1, 2, 21). A resistant allele is anticipated to arise whenever the cell repairs the drive-induced DSB using non-homologous end joining (NHEJ) instead of HR, a process which typically introduces a small insertion or deletion mutation at the target sequence. Because the reported constructs cut only at a single site, a large fraction of NHEJ events will create drive-resistant alleles which could prevent the construct from spreading to the entire population (Fig. 1B).
Drive resistance was first mathematically studied in the context of single-cutting homing endonuclease-based drive elements (21). There, it was concluded that drive is most effective when the fitness cost of the drive is low and the fitness cost of resistance is high (see SM Section 1 for a description of that work). Unfortunately, in the drive constructs reported thus far, these two requirements are fundamentally at odds: the fitness cost of resistance arises from disruption of the target sequence, but the drive copies itself precisely by disrupting the target sequence.
Here we study the evolutionary dynamics of an alternative drive architecture (2) which decouples these effects by rescuing function of the target gene, but only if the drive cassette is successfully copied. This is accomplished by targeting multiple sites within the 3’ end of a gene for cutting by the drive and including a completely genetically recoded (28, 29) copy of this 3’ target sequence in the drive construct (Fig. 1C). The 3' UTR of the gene is also replaced with an equivalent sequence in order to remove all homology between the cut sites and the drive components, which ensures that the drive cassette is copied as a single unit. If repair occurs by HR, the target gene is restored to functionality as the drive is copied. But if repair occurs by NHEJ, then the target gene is mutated, potentially resulting in a knockout and a corresponding loss of fitness. Using this design, drive resistance can be selected against by simply choosing an essential or even haploinsufficient gene as the drive target. In addition, the construct employs multiple gRNAs. The use of multiple gRNAs offers two important benefits with respect to resistance: (i) all gRNA target sites must be mutated or lost before a single allele becomes drive-resistant, and (ii) if cutting occurs at two or more gRNA target sites simultaneously, then the intervening DNA sequence is lost, resulting in a large deletion and a knockout of the target gene. This is in contrast to single-cutting constructs, where a knockout can be avoided by an in-frame indel or substitution mutation.
For numerical simulations, we further consider a mechanistic model which explicitly describes the mechanism of drive in individuals (Fig. 2B, Supplementary Material Section 7.3). We assume that, in the germline of an individual which is heterozygous for a drive construct and a susceptible allele (DSi where 0 <= i < n or DRi where 1 <= i < n), each susceptible target site undergoes cutting independently with probability q. If there is at least one cut, then HR occurs with probability P, while NHEJ occurs with probability 1-P. If HR occurs, then the cell is converted to a drive homozygote. But if NHEJ occurs, there are a few possibilities, depending on the number of cuts.
If there is exactly one cut, then one gRNA target is lost on the susceptible allele. If the susceptible allele was initially functional (Si), then with probability γ it retains function and converts to Si+1, otherwise it loses function and converts to Ri+1. We assume that the parameter γ is the probability that the reading frame is unaffected, so γ = 1/3. If the susceptible allele is initially nonfunctional (Ri) then we assume that it cannot regain function, so it converts to Ri+1.
If there are two or more cuts, then all j susceptible gRNA targets between and including the outermost targets in the locus are lost (2 <= j <= n-i). The resulting allele is certainly nonfunctional and thus converts to Ri+j. For simplicity, we assume that the i resistant targets are uniformly distributed among the n total sites in order to determine a probability distribution for the number of targets lost. We assume that sequential cutting and repair events do not occur.
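The cutting and repair rules above can be sketched as a small calculation. This is a minimal illustration, not the authors' implementation: the function name `germline_outcomes` and the outcome labels are hypothetical, and the distribution over the number j of targets lost after multiple cuts (which depends on where the resistant sites lie) is deliberately left out, so all multi-cut NHEJ events are lumped into one category.

```python
def germline_outcomes(n, i, q, P, gamma=1/3, functional=True):
    """Coarse germline outcome probabilities for a drive heterozygote.

    n: total gRNA target sites; i: sites already resistant on the
    non-drive allele; q: per-site cutting probability; P: probability of
    HR given at least one cut. Hypothetical helper for illustration only;
    multi-cut NHEJ outcomes are lumped rather than resolved by j.
    """
    m = n - i  # susceptible sites remaining on the non-drive allele
    if m <= 0:
        raise ValueError("allele has no susceptible sites")
    p_none = (1 - q) ** m                     # no site is cut
    p_one = m * q * (1 - q) ** (m - 1)        # exactly one site is cut
    p_multi = max(0.0, 1 - p_none - p_one)    # two or more sites are cut
    nhej = 1 - P
    # A single-cut NHEJ event preserves function with probability gamma,
    # but only if the allele was functional (S_i) to begin with.
    keep = gamma if functional else 0.0
    return {
        "unchanged": p_none,
        "converted_to_drive": (1 - p_none) * P,
        "one_site_lost_functional": p_one * nhej * keep,            # S_{i+1}
        "one_site_lost_nonfunctional": p_one * nhej * (1 - keep),   # R_{i+1}
        "several_sites_lost_nonfunctional": p_multi * nhej,         # R_{i+j}, j >= 2
    }


outcomes = germline_outcomes(n=5, i=0, q=0.95, P=0.95)
for label, prob in outcomes.items():
    print(f"{label}: {prob:.6f}")
```

With several guides and a high cutting probability, nearly all NHEJ events in this sketch fall into the multi-cut category, which is the regime in which the model assumes the allele is knocked out.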
Now we address two fundamental questions: whether a CRISPR gene drive will invade a resident wild-type population and, if so, whether it will be evolutionarily stable (30). We begin with the former. We find that a CRISPR gene drive will invade a wild population if:
A derivation of this result can be found in Supplementary Material (Sections 3, 7.1). For the drive to spread when initially rare, the advantage from inheritance biasing (pWD,D)—typically about 95% in published studies—must overcome the difference in fitness between the drive/wild-type heterozygote (fWD) and the wild-type (fWW). Note that this condition holds in the context of drive resistance and is agnostic to individual-level drive dynamics and thus applies both to previous drive architectures and our proposed architecture. Indeed, Eq. 1 explains the apparent success of CRISPR drive constructs reported in the literature (5–8), which easily invade wild-type laboratory populations—or would be predicted to do so after optimization of drive expression: over short timescales, drive resistance is rare and thus does not affect the dynamics.
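To see where a condition of this kind can come from, consider the following heuristic sketch. It assumes random mating, discrete generations, and a drive so rare that it occurs only in WD heterozygotes. The symbols x (frequency of WD individuals) and d' (frequency of D gametes in the next generation) are introduced here for illustration, and the resulting inequality is an informal reading of the verbal description above, not a substitute for the derivation in the Supplementary Material.

```latex
\begin{align*}
% D gametes after selection; mean population fitness is approximately f_{WW}
d' &\approx \frac{p_{WD,D}\, f_{WD}\, x}{f_{WW}} \\
% nearly every D gamete pairs with a W gamete
x' &\approx 2 d' = \frac{2\, p_{WD,D}\, f_{WD}}{f_{WW}}\, x \\
% the drive increases when rare if the growth factor exceeds one
&\Longrightarrow\quad 2\, p_{WD,D}\, f_{WD} > f_{WW}
\end{align*}
```

Under Mendelian inheritance (pWD,D = 1/2) this sketch reduces to fWD > fWW, whereas with pWD,D of about 0.95 the heterozygote could have fitness as low as roughly 0.53 fWW and the drive would still increase when rare.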
However, over longer time scales, NHEJ-mediated resistance will dramatically affect the dynamics. We find that a resident drive population is stable against invasion by resistant alleles only if:
Here the maximization is over all non-drive alleles S0, ..., Sn, R1, ..., Rn. Intuitively, the drive is stable if and only if no other allele can invade, and each non-drive allele has an invasion condition identical in form to Eq. 1 (SM Sections 4, 7.2).
Disconcertingly, Eq. 2 suggests that drive constructs are necessarily unstable in sufficiently large populations. An individual which is heterozygous for the drive and the fully resistant, cost-free allele Sn has probability pDSn,Sn = ½ of producing an Sn gamete, and this individual has fitness equivalent to (or potentially greater than) that of the drive/wild-type heterozygote. Thus, if the drive construct has lower fitness than the wild-type, and if the fully resistant, cost-free allele has a nonzero rate of production in the population, the latter will certainly invade a resident drive population. This is especially problematic for highly deleterious population suppression drives, as in (6), which have low fitness relative to the wild-type and cost-free resistant alleles.
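The reasoning in this paragraph can be written out as a short worked inequality. The sketch below assumes, by analogy with the invasion condition and not taken from the Supplementary Material, that a rare non-drive allele A in a resident drive population occurs only in DA heterozygotes under random mating. The symbol y for the frequency of DA individuals is introduced here for illustration.

```latex
\begin{align*}
% growth of a rare allele A in a population of DD homozygotes
y' &\approx \frac{2\, p_{DA,A}\, f_{DA}}{f_{DD}}\, y \\
% A invades if the growth factor exceeds one
&\Longrightarrow\quad 2\, p_{DA,A}\, f_{DA} > f_{DD} \\
% fully resistant allele: Mendelian transmission, p_{DS_n,S_n} = 1/2
A = S_n:&\quad f_{DS_n} > f_{DD}
\end{align*}
```

Under these assumptions a cost-free, fully resistant allele spreads whenever the drive/resistant heterozygote is fitter than the drive homozygote, which is the situation described above for costly drives.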
Population alteration drives (sometimes referred to as replacement drives) might not require long-term persistence in a population to produce their desired effect. Indeed, some applications might still be successful as long as the drive construct attains and persists at a sufficiently high frequency in the population over some length of time.
To quantify the relative effectiveness of the two drive architectures, we considered three quantities: (i) the maximum frequency achieved by a drive construct released in a wild population, (ii) the time required for a drive construct to attain 90% of its maximum frequency, and (iii) the frequency of the drive construct after 200 generations, roughly the longest relevant timescale for a typical application. We computed these quantities numerically for drives featuring cutting and HR probabilities consistent with average drive inheritance rates observed in previous fruit fly (8) and mosquito (5, 6) experiments (q = P = 0.95, corresponding to a drive inheritance rate of 95.1% from DW individuals).
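The quoted 95.1% follows directly from q and P for a single guide. Here is a minimal sketch of that arithmetic, with a hypothetical function name and the assumption that a germline cell which is not converted transmits the drive with the Mendelian probability of 1/2:

```python
def drive_inheritance_rate(q, P, n=1):
    """Probability that a DW heterozygote transmits the drive allele.

    Illustrative assumptions: each of n target sites is cut independently
    with probability q, HR follows with probability P when at least one
    cut occurs, and unconverted cells transmit the drive with probability 1/2.
    """
    p_cut = 1 - (1 - q) ** n       # at least one site is cut
    p_convert = p_cut * P          # germline cell becomes a drive homozygote
    return p_convert + (1 - p_convert) * 0.5


rate = drive_inheritance_rate(q=0.95, P=0.95, n=1)
print(f"{rate:.3f}")  # 0.951
```

In this sketch the conversion probability is q × P = 0.9025, and the remaining 9.75% of cells contribute the drive half of the time, giving 0.9025 + 0.04875 = 0.95125.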
Our results suggest that, as anticipated from Eq. 1, both the previous and proposed drive constructs should spread similarly in the short term immediately following release (Fig. 3A, B, and D). However, over longer timescales, the two constructs undergo dramatically different dynamics. The proposed drive constructs, released at an initial frequency of 1% in a wild population, employing five gRNAs and targeting an essential gene, can attain >99% frequency in a population (Fig. 3B, C) in 10-20 generations (Fig. 3B, D) and remain above 99% for at least 200 generations (Fig. 3B, E). Furthermore, this is seen over a large range of drive fitness costs, up to approximately 30% (Fig. 3C-E). The previously demonstrated constructs, in contrast, attain maximum frequencies between 90% and 95% over a narrower range of fitness values (Fig. 3A, C) and demonstrate significantly reduced stability (Fig. 3E). In particular, previous constructs exceeding 8% fitness cost invariably fall below their initial release frequency in fewer than 200 generations.
Here we have mathematically shown that previously demonstrated CRISPR gene drives constructed as proofs-of-principle should effectively invade wild populations—consistent with experimental observations—but could have limited utility due to their inherent instability, brought about by their production of resistant alleles. We studied an alternative drive architecture which contains (i) multiple CRISPR guide RNAs which target the 3’ end of a gene, and (ii) a recoded copy of the target gene which is functional but resistant to cutting. We concluded that this architecture substantially improves the stability of CRISPR gene drives.
Another alternative strategy which we have not modeled here would involve multiple independent single-guide drive constructs targeting the same locus. This is conceptually symmetric to the strategy considered here: rather than a single drive with multiple (n) gRNAs (“multiple guides”), one might consider multiple (n) drives with one gRNA each (“multiple drives”). In this strategy, each independent drive would behave similarly to the previously demonstrated constructs studied here. This strategy would likely outperform the previous strategy, but we anticipate that it would not outperform the multiple guide strategy. This is because, in the multiple drive strategy, each gRNA target can undergo NHEJ-mediated mutation independently, providing stepping stones to fully resistant alleles. Furthermore, the multiple drive strategy lacks the benefit of large NHEJ knockouts from multiple simultaneous cuts, which help combat cost-free resistance (Fig. 2B, red box), although it would be capable of editing regions unimportant to fitness.
In conclusion, we suggest three concrete design principles for future CRISPR gene drive systems. Constructs will maximize efficacy and stability if (i) multiple guide RNAs with minimal off-target effects are employed, (ii) disruption of the target locus is highly deleterious, and (iii) any cargo genes are as close to neutral as possible.
1. A. Burt, Site-specific selfish genes as tools for the control and genetic engineering of natural populations. Proc. Biol. Sci. 270, 921–928 (2003).
2. K. M. Esvelt, A. L. Smidler, F. Catteruccia, G. M. Church, Concerning RNA-guided gene drives for the alteration of wild populations. Elife. 3, e03401 (2014).
3. O. S. Akbari et al., Safeguarding gene drive experiments in the laboratory. Science. 349, 927–9 (2015).
4. K. A. Oye et al., Regulating gene drives. Science. 345, 626–8 (2014).
5. V. M. Gantz et al., Highly efficient Cas9-mediated gene drive for population modification of the malaria vector mosquito Anopheles stephensi. Proc. Natl. Acad. Sci., 201521077 (2015).
6. A. Hammond et al., A CRISPR-Cas9 gene drive system targeting female reproduction in the malaria mosquito vector Anopheles gambiae. Nat. Biotechnol. (2015), doi:10.1038/nbt.3439.
7. J. E. DiCarlo, A. Chavez, S. L. Dietz, K. M. Esvelt, G. M. Church, Safeguarding CRISPR-Cas9 gene drives in yeast. Nat. Biotechnol. (2015), doi:10.1038/nbt.3412.
8. V. M. Gantz, E. Bier, The mutagenic chain reaction: A method for converting heterozygous to homozygous mutations. Science. 348, 442–444 (2015).
9. S. P. Sinkins, F. Gould, Gene drive systems for insect disease vectors. Nat. Rev. Genet. 7, 427–435 (2006).
10. N. Windbichler et al., A synthetic homing endonuclease-based gene drive system in the human malaria mosquito. Nature. 473, 212–215 (2011).
11. O. S. Akbari et al., A synthetic gene drive system for local, reversible modification and suppression of insect populations. Curr. Biol. 23, 671–7 (2013).
12. L. Alphey, Genetic control of mosquitoes. Annu. Rev. Entomol. 59, 205–24 (2014).
13. B. Charlesworth, C. H. Langley, The population genetics of Drosophila transposable elements. Annu. Rev. Genet. 23, 251–287 (1989).
14. C.-H. Chen et al., A Synthetic Maternal-Effect Selfish Genetic Element Drives Population Replacement in Drosophila. Science. 316, 597–600 (2007).
15. C. M. Ward et al., Medea selfish genetic elements as tools for altering traits of wild populations: a theoretical analysis. Evolution. 65, 1149–62 (2011).
16. T. W. Lyttle, Segregation distorters. Annu. Rev. Genet. 25, 511–557 (1991).
17. B. Charlesworth, D. L. Hartl, Population dynamics of the segregation distorter polymorphism of Drosophila melanogaster. Genetics. 89, 171–192 (1978).
18. Y. Tao, D. L. Hartl, C. C. Laurie, Sex-ratio segregation distortion associated with reproductive isolation in Drosophila. Proc. Natl. Acad. Sci. U. S. A. 98, 13183–8 (2001).
19. S. Henikoff, K. Ahmad, H. S. Malik, The centromere paradox: stable inheritance with rapidly evolving DNA. Science. 293, 1098–102 (2001).
20. A. Deredec, H. C. J. Godfray, A. Burt, Requirements for effective malaria control with homing endonuclease genes. Proc. Natl. Acad. Sci. U. S. A. 108, E874–80 (2011).
21. A. Deredec, A. Burt, H. C. J. Godfray, The population genetics of using homing endonuclease genes in vector and pest management. Genetics. 179, 2013–2026 (2008).
22. N. Windbichler et al., Homing endonuclease mediated gene targeting in Anopheles gambiae cells and embryos. Nucleic Acids Res. 35, 5922–33 (2007).
23. M. Jinek et al., A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. 337, 816–21 (2012).
24. P. Mali et al., RNA-guided human genome engineering via Cas9. Science. 339, 823–6 (2013).
25. L. Cong et al., Multiplex genome engineering using CRISPR/Cas systems. Science. 339, 819–23 (2013).
26. P. Mali, K. M. Esvelt, G. M. Church, Cas9 as a versatile tool for engineering biology. Nat. Methods. 10, 957–63 (2013).
27. J. A. Doudna, E. Charpentier, The new frontier of genome engineering with CRISPR-Cas9. Science. 346, 1258096–1258096 (2014).
28. M. J. Lajoie et al., Probing the limits of genetic recoding in essential genes. Science. 342, 361–3 (2013).
29. M. J. Lajoie et al., Genomically recoded organisms expand biological functions. Science. 342, 357–60 (2013).
30. M. A. Nowak, Evolutionary Dynamics (Harvard University Press, 2006).