Biological systems possess traits that are a product of coordinated gene expression, heavily influenced by environmental pressures. This is true of both desirable and undesirable traits (i.e. disease). In cases of unwanted traits, it would be tremendously useful to have a means of systematically lowering the expression of genes implicated in producing the undesirable traits. This article describes the implementation of a computational biology algorithm designed to compute small interfering RNA sequences, capable of systematically silencing the expression of specific genes.
Introduction and Background
Small interfering RNAs (siRNAs) are short, single-stranded fragments of RNA, typically 20–30 nucleotides in length, that have created tremendous excitement in the biological sciences because of their ability to down-regulate the expression of genes in a targeted, systematic manner. siRNAs mediate their knockdown effects by activating a gene expression surveillance mechanism widely conserved in eukaryotic cells, known as RNA interference (RNAi). Once activated by siRNA, the RNAi machinery of the cell suppresses the expression of a specific gene by blocking translation (the synthesis of protein by ribosomes) or by promoting the degradation of mRNA needed to synthesize a protein. Figure 1 highlights the basic interplay between siRNA and RNAi. Further excellent reviews of siRNA and RNAi can be found in [1–3]. Having the ability to purposefully engineer siRNAs to “turn off” (or “turn down”) the expression of genes promoting disease phenotypes like cancer, multiple sclerosis, diabetes, inflammation, etc. is of immediate and widespread interest to scientists, clinicians, and patients living with the burden of disease. Indeed, several clinical trials are currently evaluating the effectiveness of siRNA therapy for conditions such as transthretin amyloidosis [4], elevated intraocular pressure [5], and non-small cell lung cancer [6].
Figure 1. (A) A genetically modified retroviral vector is used to insert DNA instructions into the nucleus of a eukaryotic cell. (B) The cell uses the inserted DNA instructions to manufacture short hairpin RNA (shRNA). (C) Following enzymatic trimming in the nucleus, the shRNA is exported from the nucleus. (D) An enzyme named Dicer removes the loop from the shRNA hairpin, releasing a double-stranded version of siRNA. (E) A complex of proteins named RISC (short for RNA-induced silencing complex) enzymatically removes the passenger strand of the siRNA duplex. (F) The mature single-stranded siRNA binds to its target—a specific mRNA from a gene being expressed by the cell. (G) The presence of siRNA attached to the mRNA interferes with translation (protein production), effectively short-circuiting the cell’s intention to fully express the information contained in its DNA.
This article describes a Mathematica implementation (named siRNA Silencer) of the seminal algorithm created by Naito et al. [7] to design siRNA sequences for the express purpose of silencing a specific gene. Like Naito’s algorithm, siRNA Silencer (SRS for short) uses “rules” of effective siRNA design that have been elucidated through meticulous experimentation [8]. To generate viable siRNA candidates, SRS progressively works through the four steps described in Table 1.
These 23-mer sequences correspond to the complementary sequence of 21-nucleotide guide strand and 2-nucleotide 3′ overhang of the passenger strand within the target sequence (Figures 1 and 2).Step 2 (select functional siRNAs)
Rule C: Choose 23-mers that have at least 4 A or U residues in the 5′ terminal 7 base pairs of the guide strand.Step 3 (reduce seed-dependent off-target effects)
Calculate the melting temperature (Tm) of the seed-target duplex (i.e. nucleotide pairs 2–8 of the siRNA guide and passenger strands as bound to their target sequence) for the 23-mers that survive Step 2. Accept only those sequences with a Tm less than 21.5 °C.
As an example of how the steps in Table 1 are applied, Figure 2 presents a detailed illustration of some of the output generated by SRS when it is tasked with silencing the human gene IFNβ1. (Interferon beta 1 is a powerful antiviral protein, manufactured from instructions in the IFNβ1
gene [9].)
Figure 2. One of 23 siRNA candidates generated by siRNA Silencer to silence IFNβ1 expression is presented in detail here. The middle strip of letters contains a portion of the mRNA sequence manufactured from instructions in the human IFNβ1 gene. The black box surrounds nucleotides 591 to 613 of the mRNA target, which is precisely 23 nucleotides in length, corresponding to Step 1 above. The sequence outlined in red is the guide strand of RNA that will eventually bind to the target mRNA and block protein synthesis (Figure 1). Notice how the guide sequence satisfies rules A, C, and D in Step 2 above. The passenger sequence (blue outline) satisfies rules B and D in Step 2. The predicted melting temperatures (details described below) of the guide-target seed sequence (green bar) and the passenger-target seed sequence (purple bar) are 8.98 °C and 13.3 °C, respectively, satisfying Step 3 above. Step 4 output will be discussed below.
As the end user interprets the output of siRNA Silencer and ultimately makes a decision about which presented siRNA candidate to use, it becomes a (relatively) simple cloning project to engineer the guide and passenger sequences into an expression vector (Figure 1) to silence the gene in a living cell.
The siRNA Silencer (SRS) Algorithm
SRS is template driven, meaning the algorithm expects several pieces of user-defined information to be provided in a notebook cell that is used as a template for entering information. The features of SRS will be illustrated using the pre-mRNA sequence of the human gene IFNβ1 (interferon beta 1) to design siRNA candidates capable of silencing IFNβ1.
The example above contains several variables that must be completed by the user to let SRS do its job. The reader may modify them to run the code.
The variables requiring user input are:
1. refseqlocation: This variable holds the directory location for finding pre-parsed RNA data files from specific organisms of interest. The pre-parsed RNA data files were generated by downloading transcript source files from NCBI’s Map Viewer FTP (ftp://ftp.ncbi.nih.gov/genomes/MapView). The data contained in these files is part of NCBI’s RefSeq database, which maintains a nonredundant catalog of all known biological molecules in a species-specific manner [10]. Original source files include:
Once downloaded, the source files were parsed by custom Mathematica code (not shown here) to generate expressions containing transcript accession values, annotation information, and sequence information for the genes contained in the source files. These expressions were saved and serve as the primary data source for the SRS program.
2. refdatabaseversion: This variable contains the name of the specific organism from which a gene is to be silenced. Options available include: "humantranscripts", "mousetranscripts", "rattranscripts", "dogtranscripts", and "cattranscripts". The option chosen must correspond to the name of a file present in defined above.
3. savelocationroot: This variable holds the location where the user would like the final results of the analysis to be saved.
4. studyname: This variable lets the user name the output generated by SRS. The output of SRS is saved using this name to the location provided in "saverootlocation" above.
5. query: This variable contains the transcript sequence, in string format, of the gene to be silenced.
SRS starts by loading the pre-parsed RNA data file from which the query gene sequence is to be silenced and then proceeds to create a preliminary list of siRNA candidates that are filtered through the four steps referenced in Table 1.
For the specific gene highlighted here (IFNβ1), successive pruning of the initial list of 818 23-nucleotide-long sequences is highlighted by a decline in the length of variables containing the pruning results. The variables highlighted below contain the results of completing Step 1, Step 2 (Rules A and B), and Step 2 (Rules C and D).
A small portion of initial siRNA candidates is shown here, where each row contains a guide and passenger sequence (Figure 2) generated from IFNβ1 that may, if they pass additional requirements, prove useful in silencing the expression of IFNβ1.
The algorithm moves next to complete Step 3, in which the predicted melting temperatures of the guide and passenger seed sequences (Figure 2) are calculated and screened to accept only those sequences with predicted melting temperatures below 21.5 °C. Melting temperature is a reflection of thermodynamic stability, and for the purposes of gene silencing, experiments have suggested melting temperatures below 21.5 °C are optimal [11]. If no candidate siRNAs can meet that demand, the requirement is waived, but the algorithm will print a warning message that the requirement could not be met.
Inspection of the first five candidates that survived Step 3 screening reveals that none of them match the first five candidates that survived Step 2 screening. This means that none of the first five candidates from Step 2 screening had melting temperatures below 21.5 °C, causing SRS to remove them from further consideration.
Indeed, there is considerable reduction in the size of the candidate list from Step 2 to Step 3.
After the candidate list of potential siRNAs is pruned through Step 3, the algorithm presents a visual representation of the current list of siRNA candidates as they are mapped to their respective positions within the query sequence.
For publication purposes, the next output is reduced in size to allow the data to fit within printable margins.
Figure 3. A map of the siRNA candidates (shown in red) that have past the first three steps of Table 1, aligned to their positions within the target gene sequence (shown in black) to be silenced. The example here shows siRNA candidates suggested for silencing IFNβ1.
From here, SRS begins an exhaustive search for genes within <refdatabaseversion> that contain subsequences that match the siRNA candidates graphically presented above. In our current example, SRS searches the siRNA candidates generated from IFNβ1 that may also bind to alternative human genes with similar subsequences. The results of this search are presented for inspection so the user can decide which of the potential siRNA candidates is least likely to create undesirable effects if used to silence the gene of interest. Making this decision is largely based on human experience, and is an area that highlights the “art of doing science.”
Caution: Due to the sheer volume of computation that needs to be completed using the data described in this article, the next segment of code will likely take 20–40 minutes to complete (depending on the speed of your computer) and consume roughly 24 GB of RAM. Computations on machines with less RAM will finish, but will require significant use of the hard drive, slowing computation considerably.
For publication purposes, the next output is reduced in size to allow the data to fit within printable margins.
SRS then finishes by saving its results to the location specified by the user in the template described above. SRS will output a JPEG version of Figure 3, as well as Mathematica and Microsoft Excel versions of the data in Table 2. The Mathematica version of Table 2 is complete, including all the potential off-targets that were found. The Excel version of Table 2 lacks the data of the potential off-targets because there is no reasonable way to organize such a large volume of information within Excel.
Interpreting SRS Output
Table 2 provides summary information about the siRNA candidates the algorithm is proposing as effector mediators of gene silencing. In the example of this article, the IFNβ1 gene generates 23 siRNA candidates that should be capable of effectively reducing the expression of the IFNβ1 gene; all 23 candidates have passed the four steps of Table 1. It is left to the end user to make the final decision about which siRNA sequence to use. While much is known about RNA interference and the use of siRNAs to down-regulate the expression of a gene, there are still aspects that benefit from human experience and intuition, which is why Table 2 presents information that may be contextually important to the researcher.
The first column of Table 2 conveys the positions of the siRNA candidates in relation to the target gene sequence. This also corresponds to the same graphical information presented in Figure 3. These 23-mer candidates represent subregions of the gene’s transcript that are likely to be suitable targets for the guide RNA (in combination with RISC, Figure 1) to actually silence a gene. While the position of the siRNA candidate may be important to effect gene silencing, there are no universally accepted rules to develop algorithmic prediction based on position alone, which is one reason the final decision about which siRNA candidate to use is best left to the end user. Maximal suppression of gene expression by the siRNA candidate list will likely require empirical comparison of several siRNA candidates.
The second column of Table 2 provides 23-mer sequences (written 5′ to 3′) specific to the gene of interest to be silenced. The black box of Figure 2 highlights one example of a 23-mer from within IFNβ1. Each 23-mer represents a region of the mRNA transcript to which a guide sequence of RNA (i.e. the guide strand, Figure 1) can bind and mediate repression of protein synthesis.
The third column of Table 3 reports the 21-nucleotide guide and passenger sequences (Figures 1 and 2) that are found originally in the small hairpin RNA (shRNA), itself manufactured from the vector referenced in Figure 1. DNA versions of these RNA sequences can be subsequently used to construct vectors for permanent or transient repression of a specific gene. As this sounds more complicated than it actually is, Figure 4 provides a graphical description of how to take the information from Table 2 and create a vector capable of using the information provided by SRS to silence a gene.
Figure 4. Basic workflow involved in using the output of SRS to silence expression of IFNβ1. Here, the siRNA candidate at position 591–613 of the target gene (referenced in Figure 2 and Table 2) is used to compute the passenger and guide strand RNA sequences that a cell would need to manufacture from instructions contained in a suitable expression vector (Figure 1). The sequences outlined in red refer to the passenger and guide strand RNAs, respectively, while the sequence outlined in green represents one of many possible loop structures that could be designed to ensure the expression vector contains the necessary information for a cell to manufacture short hairpin RNA (shRNA, Figure 1). Where indicated, “Column 2” and “Column 3” refer to the columns of Table 2 that contain information displayed here.
The fourth and fifth columns of Table 2 report the melting temperatures (Tm) for the seed-target duplexes (Figure 2) of both the guide and passenger RNA sequences. The lower the Tm value, the more stable the seed-target duplex is. A good rule of thumb for selecting siRNA candidates, which is computationally enforced by SRS, is to use candidates that have seed-target melting temperatures below 21.5 °C. All things considered, the lower the Tm, the better.
The final two columns of Table 2 (columns 6 and 7) contain checkboxes that will, when selected, reveal the results of SRS’s attempt to find similar genes that may be inadvertently silenced by the siRNA candidates under consideration. Checkboxes are used because the amount of information to be displayed can be considerable. When a checkbox is selected, a pop-up window reveals the data that was discovered. Figure 5 shows a typical result displayed when selecting a checkbox in Table 2. SRS will discard any similar gene sequences with five or more differences, as the possibility of those sequences being repressed by the siRNA sequence is virtually zero.
Figure 5. Example output produced when the first checkbox contained in the third row of Table 2 is selected. The first column of output displays sequence alignments between the siRNA candidate chosen (in this case the 23-mer sequence positioned between nucleotides 68–90 of IFNβ1) and other gene sequences contained within the RefSeq database used by SRS. Mismatch positions are highlighted in green. The second column contains annotation information about the genes that are aligned to the siRNA candidate. As it is possible that siRNA candidates with a limited number of mismatches to other gene sequences could inappropriately silence these other genes, SRS provides critical information to allow the end user to choose the siRNA candidate that best fits their needs.
SRS Performance
To gauge the performance of SRS, two genes were chosen randomly from each of the five species the program can accommodate, and each of the genes was used as an input query to generate a list of siRNA candidates (Table 3). Timings came from running Mathematica 10 under Windows 7 (64 bit) using an Intel 2500K processor overclocked to 4.48 GHz. Total system memory is 32 GB. All reported timings use a fresh kernel.
Conclusion
As biologists continue to develop better methods to identify causal genes driving changes in phenotype, facile methods of altering the expression of those genes are necessary to interrupt undesirable phenotypes, most notably those of disease. SRS brings to the Mathematica community a contemporary algorithm enabling the end user to quickly design small interfering RNAs capable of suppressing a gene’s output, while simultaneously minimizing off-targeting effects that have challenged earlier attempts at using siRNA as a gene suppression technology.
References
[1] | E. M. Youngman and J. M. Claycomb, “From Early Lessons to New Frontiers: The Worm as a Treasure Trove of Small RNA Biology,” Frontiers in Genetics, Nov 27, 2014. doi:10.3389/fgene.2014.00416. |
[2] | D. Barbosa Dogini, V. D’Avila Bittencourt Pascoal, S. H. Avansini, A. Schwambach Vieira, T. Campos Pereira, and I. Lopes-Cendes, “The New World of RNAs,” Genetics and Molecular Biology, 37(1 Suppl), 2014 pp. 285–293. www.ncbi.nlm.nih.gov/pmc/articles/PMC3983583. |
[3] | K. Gavrilov and W. M. Saltzman, “Therapeutic siRNA: Principles, Challenges, and Strategies,” Yale Journal of Biology and Medicine, 85(2), 2012 pp. 187–200. www.ncbi.nlm.nih.gov/pmc/articles/PMC3375670. |
[4] | T. Coelho, D. Adams, A. Silva, P. Lozeron, P. N. Hawkins, T. Mant, J. Perez, J. Chiesa, S. Warrington, E. Tranter, M. Munisamy, R. Falzone, J. Harrop, J. Cehelsky, B. R. Bettencourt, M. Geissler, J. S. Butler, A. Sehgal, R. E. Meyers, Q. Chen, T. Borland, R. M. Hutabarat, V. A. Clausen, R. Alvarez, K. Fitzgerald, C. Gamba-Vitalo, S. V. Nochur, A. K. Vaishnaw, D. W. Y. Sah, J. A. Gollob, and O. B. Suhr, “Safety and Efficacy of RNAi Therapy for Transthyretin Amyloidosis,” New England Journal of Medicine, 369, 2013 pp. 819–829. doi:10.1056/NEJMoa1208760. |
[5] | J. Moreno-Montañés, B. Sádaba, V. Ruz, A. Gómez-Guiu, J. Zarranz, M. V. González, C. Pañeda, and A. I. Jimenez, “Phase I Clinical Trial of SYL040012, a Small Interfering RNA Targeting -Adrenergic Receptor 2, for Lowering Intraocular Pressure,” Molecular Therapy, 22(1), 2014 pp. 226–232. www.nature.com/mt/journal/v22/n1/full/mt2013217a.html. |
[6] | J. A. McCarroll, T. Dwarte, H. Baigude, J. Dang, L. Yang, R. B. Erlich, K. Kimpton, J. Teo, S. M. Sagnella, M. C. Akerfeldt, J. Liu, P. A. Phillips, T. M. Rana, and M. Kavallaris, “Therapeutic Targeting of Polo-Like Kinase 1 Using RNA-Interfering Nanoparticles (iNOPs) for the Treatment of Non-Small Cell Lung Cancer,” Oncotarget, 6(14), 2014 pp. 12020–12034. www.ncbi.nlm.nih.gov/pmc/articles/PMC4494920. |
[7] | Y. Naito, J. Yoshimura, S. Morishita, and K. Ui-Tei, “siDirect 2.0: Updated Software for Designing Functional siRNA with Reduced Seed-Dependent Off-Target Effect,” BMC Bioinformatics, 10(392), 2009. doi:10.1186/1471-2105-10-392. |
[8] | K. Ui-Tei, Y. Naito, F. Takahashi, T. Haraguchi, H. Ohki-Hamazaki, A. Juni, R. Ueda, and K. Saigo, “Guidelines for the Selection of Highly Effective siRNA Sequences for Mammalian and Chick RNA Interference,” Nucleic Acids Research, 32(3), 2004 pp. 936–948. doi:10.1093/nar/gkh247. |
[9] | J. Bekisz, H. Schmeisser, J. Hernandez, N. D Goldman, and K. C Zoon, “Human Interferons Alpha, Beta and Omega,” Growth Factors, 22(4), 2004 pp. 243–251. doi:10.1080/08977190400000833. |
[10] | K. D. Pruitt, T. Tatusova, G. R. Brown, and D. R. Maglott, “NCBI Reference Sequences (RefSeq): Current Status, New Features and Genome Annotation Policy,” Nucleic Acids Research, 40(D1), 2012 pp. D130–D135. doi:10.1093/nar/gkr1079. |
[11] | K. Ui-Tei, Y. Naito, K. Nishi, A. Juni, and K. Saigo, “Thermodynamic Stability and Watson–Crick Base Pairing in the Seed Duplex Are Major Determinants of the Efficiency of the siRNA-Based Off-Target Effect,” Nucleic Acids Research, 36(22), 2008 pp. 7100–7109. doi:10.1093/nar/gkn902. |
T. D. Allen, “A Computational Strategy for Effective Gene Silencing through siRNAs,” The Mathematica Journal, 2016. dx.doi.org/doi:10.3888/tmj.18-1. |
About the Author
Todd Allen is an associate professor of biology at HACC, Lancaster. His interest in computational biology using Mathematica took shape during his postdoctoral research years at the University of Maryland, where he developed a custom cDNA microarray chip to study gene expression changes in the chestnut blight pathogen, Cryphonectria parasitica.
Todd D. Allen, Ph.D.
Harrisburg Area Community College (Lancaster Campus)
East 206R
1641 Old Philadelphia Pike
Lancaster, PA 17602
tdallen@hacc.edu