The Epi4K Data Browser is a catalogue of genetic variants identified in 337 probands who were ascertained and sequenced for a diagnosis of epileptic encephalopathy. The database includes single nucleotide substitution variants (SNVs) and insertion and deletion variants (indels).


Members

The members of Epi4K are Andrew S. Allen, Samuel F. Berkovic, Patrick Cossette, Norman Delanty, Dennis Dlugos, Evan E. Eichler, Michael P. Epstein, Tracy Glauser, David B. Goldstein, Erin L. Heinzen, Michael R. Johnson, Ruben Kuzniecky, Daniel H. Lowenstein, Anthony G. Marson, Heather C. Mefford, Sahar Esmaeeli Nieh, Terence J. O’Brien, Ruth Ottman, Stephen Petrou, Slavé Petrovski, Annapurna Poduri, Elizabeth K. Ruzzo, Ingrid E. Scheffer, and Elliott Sherr.

The members of EPGP are Bassel Abou-Khalil, Brian K. Alldredge, Eva Andermann, Frederick Andermann, Dina Amron, Jocelyn F. Bautista, Samuel F. Berkovic, Judith Bluvstein, Alex Boro, Gregory Cascino, Damian Consalvo, Patricia Crumrine, Orrin Devinsky, Dennis Dlugos, Michael P. Epstein, Miguel Fiol, Nathan B. Fountain, Jacqueline French, Daniel Friedman, Eric B. Geller, Tracy Glauser, Simon Glynn, Kevin Haas, Sheryl R. Haut, Jean Hayward, Sandra L. Helmers, Sucheta Joshi, Andres Kanner, Heidi E. Kirsch, Robert C. Knowlton, Eric H. Kossoff, Rachel Kuperman, Ruben Kuzniecky, Daniel H. Lowenstein, Shannon M. McGuire, Paul V. Motika, Edward J. Novotny, Ruth Ottman, Juliann M. Paolicchi, Jack Parent, Kristen Park, Annapurna Poduri, Lynette Sadleir, Ingrid E. Scheffer, Renée A. Shellhaas, Elliott Sherr, Jerry J. Shih, Rani Singh, Joseph Sirven, Michael C. Smith, Joe Sullivan, Liu Lin Thio, Anu Venkat, Eileen P.G. Vining, Gretchen K. Von Allmen, Judith L. Weisenberg, Peter Widdess-Walsh, and Melodie R. Winawer.


Data Generation

Sequencing of DNA was performed by the Duke Center for Human Genome Variation (now Institute for Genomic Medicine, Columbia University). Samples were exome sequenced using either the Agilent All Exon (65 Mb) or the Roche SeqCap EZ 3.0 Exome Enrichment kit. According to standard protocols, six individually barcoded samples were sequenced across two lanes of an Illumina HiSeq 2000 or 2500 sequencer.

The Illumina lane-level fastq files were aligned to the Human Reference Genome (NCBI Build 37) using the Burrows-Wheeler Alignment Tool (BWA). Picard software was used to remove duplicate reads and process these lane-level SAM files, resulting in a sample-level BAM file that was used for variant calling. GATK was used to recalibrate base quality scores, realign around indels, and call variants. For Epi4Kdb, variants were required to have a quality score (QUAL) of at least 30, a quality by depth score of at least 2, a mapping quality score of at least 40, a genotype quality (GQ) score of at least 20, a read position rank sum score greater than -10, and at least 10x coverage. Additionally, variants were restricted according to VQSR tranche (calculated using the known SNV sites from HapMap v3.3, dbSNP, and the Omni chip array from the 1000 Genomes Project): the cutoffs were a tranche of 99.9% for SNVs and 99% for indels. Variants are flagged in the “Genotype Confidence” field if they were determined to be sequencing, batch-specific or kit-specific artifacts, or Hardy-Weinberg equilibrium (HWE) violations, or if they were marked as artifacts by the Exome Variant Server (EVS).
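The thresholds listed above can be read as a single pass/fail rule per variant call. The following is a minimal Python sketch of that rule, not the pipeline's actual code: the `VariantCall` class, its field names, and the `passes_filters` function are hypothetical, the tranche is assumed to be stored as a percentage, and treating a missing read position rank sum as passing is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantCall:
    """Hypothetical container for the per-variant metrics named in the text."""
    qual: float                         # variant quality score (QUAL)
    qual_by_depth: float                # quality by depth
    mapping_quality: float              # mapping quality
    genotype_quality: float             # genotype quality (GQ)
    read_pos_rank_sum: Optional[float]  # may be absent for some calls
    coverage: int                       # read depth at the site
    is_indel: bool                      # True for indels, False for SNVs
    vqsr_tranche: float                 # VQSR tranche, as a percentage


def passes_filters(v: VariantCall) -> bool:
    """Apply the thresholds stated in the text to one variant call."""
    # Assumption: a missing rank-sum value does not fail the variant.
    rank_sum_ok = v.read_pos_rank_sum is None or v.read_pos_rank_sum > -10
    # Tranche cutoffs from the text: 99.9% for SNVs, 99% for indels.
    tranche_cutoff = 99.0 if v.is_indel else 99.9
    return (
        v.qual >= 30
        and v.qual_by_depth >= 2
        and v.mapping_quality >= 40
        and v.genotype_quality >= 20
        and rank_sum_ok
        and v.coverage >= 10
        and v.vqsr_tranche <= tranche_cutoff
    )


# Example with made-up metric values for a single SNV call.
snv = VariantCall(50.0, 12.3, 60.0, 99.0, 0.4, 35, False, 99.0)
print(passes_filters(snv))
```

Keeping the SNV and indel tranche cutoffs in one place makes the only type-specific difference in the rule easy to see.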

Variant calls were restricted to coordinates within the Consensus Coding Sequence (CCDS) release 14, extended by two base pairs on each side of every protein-coding exon. All variants were annotated to Ensembl 73 using Variant Effect Predictor (VEP). For the summary information, only the single most damaging variant effect prediction is reported; however, the effect of a variant on all transcripts can be found on the variant-level page.
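Reporting the single most damaging effect amounts to ranking the per-transcript predictions by severity and keeping the top one. The sketch below illustrates the idea in Python. The `SEVERITY_ORDER` list is a short illustrative subset of consequence terms, not the ranking the browser actually uses, and the function name and transcript identifiers are hypothetical.

```python
# Illustrative subset of consequence terms, most damaging first.
# Assumption: the real ranking used by the browser is not given in the text.
SEVERITY_ORDER = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "splice_region_variant",
    "synonymous_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY_ORDER)}


def most_damaging(effects):
    """Return the (transcript_id, consequence) pair with the most severe term.

    Terms missing from SEVERITY_ORDER are ranked as least severe.
    """
    if not effects:
        raise ValueError("at least one transcript effect is required")
    return min(effects, key=lambda e: RANK.get(e[1], len(RANK)))


# Example with hypothetical transcript identifiers.
effects = [
    ("T1", "synonymous_variant"),
    ("T2", "missense_variant"),
    ("T3", "stop_gained"),
]
print(most_damaging(effects))
```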

Coverage information for carrier and non-carrier sites is summarized as the percentage of 337 sequenced probands ascertained for an epileptic encephalopathy that had at least 3x, 10x, 20x and 201x read-depth coverage at the site.
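The coverage summary described above is a per-site percentage: the share of probands whose read depth at the site meets each threshold. A minimal Python sketch follows. The function name `coverage_summary` and the input layout (one depth value per proband) are assumptions, and the thresholds are supplied by the caller rather than fixed in the code.

```python
def coverage_summary(depths, thresholds):
    """Percentage of probands with read depth >= each threshold at one site.

    depths: one read-depth value per proband at the site.
    thresholds: the minimum depths to report on.
    """
    if not depths:
        raise ValueError("at least one proband depth is required")
    n = len(depths)
    return {t: 100.0 * sum(d >= t for d in depths) / n for t in thresholds}


# Example with four made-up proband depths at a single site.
print(coverage_summary([2, 5, 12, 25], (3, 10, 20)))
# prints {3: 75.0, 10: 50.0, 20: 25.0}
```

For the browser, `depths` would hold 337 values per site, one for each sequenced proband.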


Website team

Nick Ren, Joshua Bridgers, Quanli Wang, Slavé Petrovski