Study Guide
📖 Core Concepts
Genomics – the study of whole genomes (structure, function, evolution, mapping, editing).
Genome – complete DNA set of an organism, including genes and 3‑D DNA configuration.
Omics – collective analysis of large‑scale molecular data sets (e.g., proteomics, epigenomics).
High‑throughput sequencing – parallel sequencing of millions of DNA fragments, drastically lowering cost per base.
Shotgun sequencing – random fragmentation of DNA, sequencing of each piece, and computational re‑assembly using overlapping ends.
Coverage – average number of times each base is read; higher coverage generally → more accurate consensus and assembly, provided read quality is good.
De novo assembly – building a genome from reads without a reference, using de Bruijn or overlap graphs.
Annotation – attaching biological meaning to assembled sequences (structural: ORFs, gene boundaries; functional: gene‑product roles).
---
📌 Must Remember
Sanger read length: 800–1000 nt; still used for small projects and long contiguous reads.
NGS error rate: ≈0.1 % (1 error per 1 000 bases) for Illumina; long‑read platforms (PacBio, Nanopore) ≈1 % but give 10–100 kb reads.
Human Genome Project error rate: < 1 error per 20 000 bases.
Coverage formula: \(\text{Coverage (X)} = \frac{N \times L}{G}\) where N = number of reads, L = read length, G = genome size.
Three main annotation steps: (1) Identify non‑coding regions, (2) Predict gene structures, (3) Assign functional information (e.g., via BLAST).
Key “‑ome” suffix: denotes the total set of a molecular entity (genome, proteome, metabolome, etc.).
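Worked example for the coverage formula above: a minimal Python sketch, not part of the original guide. The function names `coverage` and `reads_needed` are illustrative, and the numbers (1 000 000 reads of 150 nt on a 5 Mb genome) are made up for the exercise.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage: (N * L) / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest N such that (N * L) / G >= target_coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical project: 1,000,000 reads of 150 nt on a 5 Mb bacterial genome.
depth = coverage(1_000_000, 150, 5_000_000)   # 150 Mb of sequence / 5 Mb genome
n = reads_needed(30, 150, 5_000_000)          # reads required for 30x
print(f"depth = {depth:.1f}x, reads for 30x = {n}")
```

Rearranging the formula as N = (Coverage × G) / L is the form exam questions usually ask for.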
---
🔄 Key Processes
Illumina Dye‑Sequencing Cycle
Incorporate a fluorescently labelled reversible‑terminator nucleotide → image the emitted fluorescence → chemically remove the dye and terminator → repeat.
Ion Torrent Sequencing
Incorporation of a nucleotide releases an H⁺ ion → the resulting pH change is detected as an electrical signal → base called.
Shotgun Assembly Workflow
Fragment DNA → sequence each fragment → align overlapping ends → merge into contigs → use paired‑end reads to order contigs into scaffolds → fill gaps (finishing).
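The "align overlapping ends → merge into contigs" step can be illustrated with a toy greedy merger. This is a sketch under simplifying assumptions, not how production assemblers work: reads are error‑free, all on the same strand, and no read is contained inside another. The names `overlap` and `greedy_assemble` and the `min_len=3` threshold are hypothetical choices for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATGGCGT", "GCGTACC", "TACCGGA"]
print(greedy_assemble(reads))
```

With these three toy reads, the 4‑base overlaps (GCGT, TACC) chain everything into a single contig. A repeat longer than the overlap threshold would make the "best" merge ambiguous, which is the intuition behind repeat‑induced misassemblies.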
De novo Assembly (Eulerian path)
Break reads into k‑mers → construct de Bruijn graph → find Eulerian path that visits each edge once → reconstruct genome.
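The k‑mer → de Bruijn graph → Eulerian path steps above can be sketched in a few lines of Python. This is a teaching sketch only: it assumes error‑free, single‑strand reads, uses each distinct k‑mer once, and assumes an Eulerian path exists. The function names are hypothetical, and the path search uses Hierholzer's algorithm, one standard way to find an Eulerian path.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Each distinct k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    kmers = sorted({read[i:i + k] for read in reads for i in range(len(read) - k + 1)})
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    nodes = sorted(set(graph) | set(in_deg))
    # Start where out-degree exceeds in-degree by one; otherwise any node (a cycle).
    start = next(
        (n for n in nodes if len(graph.get(n, [])) - in_deg.get(n, 0) == 1),
        nodes[0],
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: emit node
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path."""
    path = eulerian_path(build_de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


print(assemble(["ATGGCGT", "GCGTACC", "TACCGGA"], k=4))
```

In this toy input every (k‑1)‑mer is unique, so the path is unambiguous. When a repeat is longer than k‑1, several Eulerian paths exist and the graph alone cannot say which one is the true genome.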
Functional Annotation Pipeline
Run similarity search (e.g., BLAST) → map to known proteins/GO terms → integrate expression data (RNA‑seq) → curate manually if needed.
---
🔍 Key Comparisons
Sanger vs. Illumina
Read length: 800‑1000 nt vs. 50‑300 nt (short).
Throughput: One fragment at a time vs. millions simultaneously.
Cost per base: High vs. Low.
Short‑read (Illumina) vs. Long‑read (PacBio/Nanopore)
Accuracy: Illumina ≈0.1 % error, long‑reads ≈1 % error.
Read length: ≤300 nt vs. 10‑100 kb.
Best use: SNP detection & high‑coverage genomes vs. repeat resolution & structural variants.
De novo vs. Comparative Assembly
Reference needed: No vs. Yes.
Complexity: Higher (graph algorithms) vs. Lower (read alignment).
Typical when: No close reference genome exists vs. studying strain variation.
---
⚠️ Common Misunderstandings
“Higher coverage always means better assembly.”
True only if reads are high quality and repeats are resolved; excessive coverage can amplify systematic errors.
“Sanger is obsolete.”
Still the gold standard for long, accurate reads (e.g., finishing gaps, validation).
“All ‘‑omics’ data are interchangeable.”
Each omic layer (genome, transcriptome, proteome, epigenome) provides distinct biological insight; they complement, not replace, each other.
---
🧠 Mental Models / Intuition
Puzzle‑piece model for assembly: Think of reads as jigsaw pieces; overlapping edges let you connect them into larger pictures (contigs), and paired‑end “tags” tell you where distant pieces belong (scaffolds).
Signal‑to‑noise in sequencing: Short‑read platforms give a clean, low‑noise picture of each base (high accuracy) but miss the “big picture” of repeats; long‑read platforms provide the big picture but with more “static” (errors).
---
🚩 Exceptions & Edge Cases
Highly repetitive genomes (e.g., many transposons): Short reads often collapse repeats → need long reads or hybrid assembly.
Polyploid organisms: Multiple homologous chromosome sets complicate de novo assembly; may require haplotype‑specific approaches.
Metagenomic samples: No single reference; assembly must disentangle mixed species—often results in fragmented bins rather than complete genomes.
---
📍 When to Use Which
Choose Sanger when you need >99.9 % accuracy for a limited region (e.g., clinical validation, gap closing).
Choose Illumina short‑read NGS for large‑scale SNP/variant discovery, population genomics, and high‑coverage projects where cost matters.
Choose PacBio / Oxford Nanopore when resolving structural variants, repetitive regions, or assembling a new genome de novo.
Use comparative assembly if a high‑quality reference from a close relative exists; saves compute time and improves scaffold accuracy.
Apply functional annotation pipelines (automatic + manual curation) for any new assembly; rely on BLAST/Ensembl for well‑studied organisms, but add RNA‑seq evidence for novel species.
---
👀 Patterns to Recognize
Spikes in read depth → possible duplicated regions, collapsed repeats, or mapping artifacts.
Consistent mismatches at the same position across many reads → likely a true SNP (rule out systematic error); scattered, isolated mismatches → likely sequencing error.
Contig ends enriched for repeat motifs → assembly terminated due to unresolved repeats.
High proportion of “unknown” ORFs → likely novel genes or poor annotation; check for conserved domains.
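The "consistent vs. random mismatch" pattern above can be made concrete with a toy pileup. This is a sketch, not a real variant caller: reads are assumed to be gap‑free and already aligned, and the thresholds `min_depth=4` and `min_fraction=0.8` are arbitrary illustrative values (a heterozygous site in a diploid would sit nearer 0.5). All names are hypothetical.

```python
from collections import Counter


def classify_mismatches(reference, aligned_reads, min_depth=4, min_fraction=0.8):
    """Label each mismatching position as a candidate SNP or a likely error.

    aligned_reads: list of (start, sequence) pairs, 0-based, gap-free.
    Returns {position: (label, most_common_non_reference_base)}.
    """
    piles = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                piles[pos][base] += 1

    calls = {}
    for pos, counts in enumerate(piles):
        depth = sum(counts.values())
        mismatches = {b: n for b, n in counts.items() if b != reference[pos]}
        if not mismatches:
            continue
        base, n = max(mismatches.items(), key=lambda item: item[1])
        if depth >= min_depth and n / depth >= min_fraction:
            calls[pos] = ("candidate SNP", base)
        else:
            calls[pos] = ("likely sequencing error", base)
    return calls


reference = "ACGTACGT"
reads = [
    (0, "ACGAACGT"),
    (0, "ACGAACGT"),
    (0, "ACGAACGT"),
    (0, "ACGAACGA"),  # extra isolated mismatch at position 7
    (0, "ACGAACGT"),
]
for pos, (label, base) in sorted(classify_mismatches(reference, reads).items()):
    print(pos, reference[pos], "->", base, label)
```

Position 3 mismatches in every read, so it is flagged as a candidate SNP. Position 7 mismatches in one read out of five, so it is flagged as a likely error.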
---
🗂️ Exam Traps
“Long‑read platforms have lower error rates than short‑read platforms.” – Wrong; they have higher per‑base error but provide longer contiguity.
“Coverage of 1× is sufficient to assemble a genome.” – Wrong; at 1× many bases are never read at all. Roughly 30× (short‑read) or 10‑15× (long‑read) is a typical minimum for reliable assembly.
“All genes are annotated automatically with 100 % accuracy.” – False; automatic pipelines miss rare genes and pseudogenes, so manual curation is still required.
“Shotgun sequencing only works with Sanger reads.” – Incorrect; modern shotgun projects rely on NGS reads.
“Epigenomics studies DNA sequence changes.” – Misleading; it studies reversible chemical modifications (methylation, histone marks) that do not alter the DNA sequence.
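Why 1× coverage is not enough can be checked with a quick calculation. The sketch below assumes reads land uniformly at random, so per‑base depth is approximately Poisson with mean equal to the coverage. This idealised model is an added assumption, not something stated in the guide, and the function names are hypothetical.

```python
import math


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases with zero reads under a Poisson model: e^(-c)."""
    return math.exp(-coverage)


def prob_depth_at_least(coverage: float, min_depth: int) -> float:
    """P(depth >= min_depth) for Poisson-distributed depth with mean `coverage`."""
    below = sum(
        math.exp(-coverage) * coverage ** k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - below


for c in (1, 5, 10, 30):
    print(
        f"{c:>2}x: uncovered = {fraction_uncovered(c):.2e}, "
        f"P(depth >= 5) = {prob_depth_at_least(c, 5):.4f}"
    )
```

At 1× the model leaves about 37 % of bases unread (e⁻¹ ≈ 0.37), while at 30× the uncovered fraction is negligible. Real data are less uniform than this model (GC bias, repeats), so practical targets tend to be set conservatively.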