
Overview

Bioluminescence has long fascinated biologists, particularly its origin and evolution. Fireflies (Lampyridae) are among the best-known luminescent organisms and are therefore an important subject of scientific study, especially of their bioluminescent behavior and biochemistry. Like other luminous beetles, such as Rhagophthalmidae, Phengodidae, and some Elateridae, fireflies produce light when the enzyme luciferase catalyzes the oxidation of luciferin. The sequence, structure, and function of firefly luciferase have been studied extensively, resulting in numerous molecular, biomedical, pharmaceutical, and bioanalytical applications.

However, the genetic basis and evolutionary history of the firefly luciferase gene remain unclear, and little genomic information about fireflies is available in public databases. Sequencing firefly genomes is needed to better understand the mechanisms underlying the complex features of their life history. The PacBio SMRT and Oxford Nanopore sequencing platforms can generate high-quality firefly genomes. Integrating multiple levels of data (including comparative genomics, proteomics and transcriptomics of luminescent organs and their 3D reconstruction, in vitro functional validation of genes, and CRISPR/Cas9 gene editing) can provide new perspectives on luciferin biosynthesis and on the origin and evolution of bioluminescence and light patterning.

The pathway of luciferin biosynthesis proposed based on multilevel data. (Zhang et al., 2020)

Advantages of Long-read Genome Sequencing in Fireflies

Integrated Genome Assembly

Conventional short-read sequencing technologies often struggle with repetitive genomic regions. For Aquatica lateralis, whose genome complexity is similar to that of its relatives Abscondita cerata and Lamprigera yunnana, such repetitive sequences are important for understanding its unique biology. Long reads can span these problematic regions and so produce a more contiguous and complete genome assembly.

Resolving Complex Regions

In addition to simple repeats, firefly genomes can contain complex structural variants that are critical to their bioluminescence and behavior. Long-read sequencing excels at mapping these regions, providing insights into the unique characteristics of fireflies, such as the origin of bioluminescence and its evolution.

Enhanced Annotation Capabilities

Because long-read sequencing produces longer reads, it improves gene annotation, especially for genes that may be fragmented or missing in short-read assemblies. For organisms like fireflies, this is critical for a comprehensive understanding of their functional genetic landscape, from bioluminescence to mating behavior.

 

Applications of Long-read Genome Sequencing in Fireflies

Evolutionary Insights

Long-read sequencing of Aquatica lateralis, building on the draft genomes of species such as Abscondita cerata, can give a more complete picture of firefly evolution. In turn, this could help researchers identify when specific traits (such as different light patterns or UV sensitivity in vision) appeared in the evolutionary timeline.

Biotechnological Potential

The luciferase enzyme responsible for firefly luminescence is already widely used as a reporter gene in molecular biology and in biomedical imaging. A deeper understanding of the Aquatica lateralis genome may reveal new proteins or pathways that can be used for biotechnological applications.

Conservation Research

As human activities reshape landscapes, understanding the genetic adaptations and resilience of species becomes critical. Genomic insights into Aquatica lateralis from long-read sequencing could inform conservationists about the vulnerability of fireflies and contribute to their conservation in their natural habitat.

 

Reference

  1. Zhang, Ru, et al. "Genomic and experimental data provide new insights into luciferin biosynthesis and bioluminescence evolution in fireflies." Scientific Reports 10.1 (2020): 15882.

Population genetics and precision health research rely on large genomic datasets. Long-read sequencing from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) has reached a level of accuracy and throughput that allows studies to progress from single genomes and small groups of individuals to the detection of variation across large populations. Population-scale genomic studies matter because they reflect the genetic diversity of target populations, resolve challenging genomic regions, and serve as a resource for population genetics, translational research, and drug discovery.

Long-read Sequencing for Population-scale Genomic Study

Overview of population-scale studies using long-read sequencing. (De Coster et al., 2021)

Overview

Population-scale sequencing, that is, sequencing the deoxyribonucleic acid (DNA) or messenger ribonucleic acid (mRNA) of many individuals from one or more species, aims to reveal allelic variation across a population. This approach provides a foundation for addressing questions in evolutionary biology, agricultural biotechnology, and translational medicine. Earlier population-scale genomic studies, especially genome-wide association studies (GWAS), have struggled to capture the full range of genetic determinants of human phenotypes and disease. This gap can largely be attributed to structural variation (SV). SVs include inversions, deletions, and other complex chromosomal rearrangements that often remain undetected by traditional sequencing methods.

High-throughput short-read sequencing platforms produce reads of roughly 25 to 400 base pairs (bp). They struggle to detect variation hidden within the "dark matter" regions of the genome, and they do not accurately resolve large or complex variants. These limitations compromise genetic inferences drawn from ancestry cohort datasets and lead to an incomplete understanding of the relationship between genetic markers and disease etiology.

Long-read sequencing addresses these limitations by reading long contiguous stretches of the genome. This enables comprehensive characterization of SVs across the human genome and sets the stage for population-scale long-read sequencing. With this technique, researchers can uncover previously hidden SVs linked to phenotypes in humans, crops, fruit flies, and songbirds. The shift is more than a technological advance: it marks a substantial step forward for genomic research.

Project Strategies for Population-scale Sequencing

At the start of a population-scale sequencing project, there are multiple strategies with specific budget requirements to consider, as shown below. These strategies allow for different sizes and budgets, which can have an impact on the level of resolution at which genetic variants are detected.

Full Coverage Approach

This strategy sequences every sample in a population at moderate to high coverage, allowing for the highest level of resolution. The main criterion for determining the coverage required per sample is whether the analysis relies on de novo assembly (requiring >40-fold coverage) or on reference-based alignment (requiring >12-fold coverage). The advantages of this strategy are its comprehensiveness, simplicity of study design, and relatively simple computational workflow. In addition, the samples are similarly covered and therefore equally well-studied, and rare variants in each sample can be easily detected.
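The coverage thresholds above translate directly into a sequencing budget. The following is a minimal sketch of that arithmetic; the genome size (3.1 Gb) and cohort size (100 samples) are illustrative assumptions that do not come from the text, and the function name is hypothetical.

```python
def required_yield_gb(genome_size_gb: float, coverage: float, n_samples: int) -> float:
    """Total sequencing yield (in gigabases) for a cohort sequenced at uniform coverage."""
    return genome_size_gb * coverage * n_samples


# Illustrative assumptions, not values from the article.
GENOME_GB = 3.1
N_SAMPLES = 100

# Lower bounds taken from the thresholds quoted in the text (>40x and >12x).
de_novo_gb = required_yield_gb(GENOME_GB, 40, N_SAMPLES)
alignment_gb = required_yield_gb(GENOME_GB, 12, N_SAMPLES)

print(f"de novo assembly:      at least {de_novo_gb:,.0f} Gb")
print(f"reference-based:       at least {alignment_gb:,.0f} Gb")
print(f"yield ratio (40x/12x): {de_novo_gb / alignment_gb:.2f}")
```

Under these assumptions, an assembly-based design needs a little over three times the yield of an alignment-based one, which is why the choice of analysis method is usually settled before the coverage target.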

Mixed Coverage Approach

In a "mixed-coverage" approach, a subset of samples representing subgroups of a cohort (e.g., ethnicities or subpopulations) is sequenced at high coverage, and the remaining samples are sequenced at low coverage. Although this approach is generally less expensive than the full coverage approach, it still achieves high overall detection sensitivity and is therefore particularly suitable for studies with a large number of individuals or a limited budget. However, some analytical challenges remain, especially in genotyping accurately across multiple samples and in distinguishing somatic variants from heterozygous germline variants, which is further complicated by regions exhibiting recurrent mutations. In addition, the mixed-coverage approach will introduce a bias toward common alleles, as many rare alleles may be missed, especially when a locus is heterozygous and the alternative allele is therefore sparsely covered.
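Why rare heterozygous alleles are easily missed at low coverage can be illustrated with a simple binomial model. This is a sketch under stated assumptions: exactly `depth` reads cover the site, each read samples either allele with probability 0.5, sequencing errors are ignored, and a variant counts as detected only if at least `min_alt_reads` reads support it. The threshold of two reads and the depths shown are hypothetical choices, not values from the text.

```python
from math import comb


def prob_detect_het(depth: int, min_alt_reads: int = 2) -> float:
    """Probability that at least `min_alt_reads` of `depth` reads carry the
    alternative allele at a heterozygous site (each read: p = 0.5)."""
    # Number of outcomes with fewer than min_alt_reads alt-supporting reads,
    # divided by the 2**depth equally likely outcomes.
    p_too_few = sum(comb(depth, i) for i in range(min_alt_reads)) / 2 ** depth
    return 1.0 - p_too_few


for depth in (2, 4, 8, 12, 30):
    print(f"{depth:>2}x coverage: P(detect) = {prob_detect_het(depth):.4f}")
```

In this simplified model, detection is close to certain at high depth but drops sharply at the lowest depths. A variant carried by only one or two low-coverage samples may therefore never be seen, while a common variant gets many chances across the cohort.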

Hybrid Sequencing Methods

This approach sequences only a small subset of samples (e.g., 10-20% of all samples) with long reads and the remaining samples with short reads, then genotypes the variants discovered in the long reads across the whole cohort. Once the long-read subset has produced a set of identified SVs, these SVs can be genotyped at their breakpoint coordinates in the short-read dataset. In this way, robust allele frequencies for the identified variants can be obtained. This strategy has been applied to diversity panels of human SVs to discover new expression quantitative trait loci (eQTL) and evolutionarily adaptive traits.
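Once the SVs have been genotyped across the cohort, the allele frequency of each variant follows from the genotype counts. The sketch below assumes diploid samples, genotypes encoded as the number of alternative alleles (0, 1, or 2), and `None` for samples where genotyping failed. The variant identifiers and genotype values are made-up illustrations.

```python
from typing import Dict, List, Optional


def allele_frequency(genotypes: List[Optional[int]]) -> Optional[float]:
    """Alternative-allele frequency over the samples that were successfully genotyped.

    Each genotype is the count of alternative alleles in a diploid sample (0, 1, 2);
    None marks a missing call and is excluded from numerator and denominator.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no usable genotype for this variant
    return sum(called) / (2 * len(called))


# Hypothetical SVs discovered in the long-read subset, genotyped in all samples.
cohort_genotypes: Dict[str, List[Optional[int]]] = {
    "DEL_example_1": [0, 1, 1, 2, None, 0],
    "INS_example_2": [0, 0, 0, 1, 0, 0],
    "INV_example_3": [None, None, None, None, None, None],
}

frequencies = {sv: allele_frequency(gts) for sv, gts in cohort_genotypes.items()}
for sv, freq in frequencies.items():
    print(sv, "no calls" if freq is None else f"{freq:.3f}")
```

Excluding missing calls from the denominator avoids underestimating the frequency of variants that are hard to genotype from short reads.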

Overview of long-read population study design. (De Coster et al., 2021)

The Importance of Long-Read Sequencing Technology in Population-Scale Study

One of the inherent challenges of population genetics is the accurate phasing of haplotypes, that is, determining which combinations of alleles lie on the same chromosome. Long-read sequencing provides an effective solution: because each read covers a long DNA fragment, haplotype structure can be determined directly, without relying on statistical inference or family-based studies. This capability is transformative for population-scale studies, where understanding the distribution and combination of specific allele sets can help decipher population history, migration patterns, and shared inheritance patterns.
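The idea of read-based phasing can be sketched in a few lines. This is a deliberately simplified illustration, not the algorithm used by any particular phasing tool: each read is reduced to the alleles it shows at heterozygous sites (0 = reference, 1 = alternative), and adjacent sites are linked by a majority vote over the reads that span both. The positions and reads are invented for the example.

```python
from typing import Dict, List

Read = Dict[int, int]  # heterozygous site position -> observed allele (0 = ref, 1 = alt)


def phase_sites(reads: List[Read], sites: List[int]) -> List[int]:
    """Return the alleles of one haplotype at `sites` (the other is its complement).

    Adjacent sites are linked by majority vote over reads covering both sites.
    Raises ValueError when no read, or a tied vote, connects two neighbouring sites.
    """
    haplotype = [0]  # arbitrary anchor: this haplotype carries the ref allele at the first site
    for left, right in zip(sites, sites[1:]):
        cis = trans = 0
        for read in reads:
            if left in read and right in read:
                if read[left] == read[right]:
                    cis += 1    # same allele type at both sites on this molecule
                else:
                    trans += 1  # different allele types at the two sites
        if cis == trans:
            raise ValueError(f"cannot link sites {left} and {right}")
        previous = haplotype[-1]
        haplotype.append(previous if cis > trans else 1 - previous)
    return haplotype


sites = [100, 5000, 12000]
long_reads: List[Read] = [
    {100: 0, 5000: 1},
    {100: 1, 5000: 0},
    {5000: 1, 12000: 1},
    {5000: 0, 12000: 0},
    {100: 0, 5000: 1, 12000: 1},
]
haplotype_a = phase_sites(long_reads, sites)
haplotype_b = [1 - allele for allele in haplotype_a]
print(haplotype_a, haplotype_b)
```

Reads that each cover only one site contain no linkage information, so the same function raises an error for them. That is the situation short reads are in whenever heterozygous sites are farther apart than the read length.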

Structural variants, such as deletions, duplications, and inversions, can have profound effects on gene function and expression. Capturing these variants at high resolution is critical when studying large populations. Long-read sequencing can identify structural variants that may be overlooked or inaccurately represented by short-read methods.

Population-scale Sequencing Downstream Analysis Methods

The choice of analytical tools is critical for downstream analysis at the population scale. Prior to downstream analysis, quality control must be performed on the experimental factors that directly affect the performance of assembly, SV detection, and read phasing. There are several strategies for population-scale downstream analysis:

Read Alignment-based Analysis

Alignment-based methods are often the preferred approach for population-scale studies because they place all samples in a common coordinate system (i.e., the reference genome). In addition, these methods are usually less computationally demanding and require much less coverage than assembly-based methods. Alignment-based methods rely on aligning sequencing reads to a reference genome, so the overall correctness of the reference affects the analysis of the read data.

Aligners designed for long reads, such as NGMLR and LAST, speed up the alignment process and improve the accuracy of long-read alignment. In addition, a variety of variant detection tools reduce the need for high sequencing coverage by enabling SV calling and genotyping at lower coverage.
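One signal that alignment-based SV detection relies on is a large insertion or deletion recorded inside a single read alignment. As a minimal sketch, the function below scans a SAM-style CIGAR string and reports indels above a size threshold. It is not how any specific SV caller is implemented (real tools also use split alignments, clustering across reads, and more), and the 50 bp threshold and example CIGAR string are assumptions chosen for illustration.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance the reference position


def large_indels(ref_start: int, cigar: str, min_size: int = 50) -> List[Tuple[str, int, int]]:
    """Return (type, reference_position, length) for each insertion or deletion
    of at least `min_size` bases in one alignment."""
    events = []
    ref_pos = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in ("I", "D") and length >= min_size:
            events.append(("INS" if op == "I" else "DEL", ref_pos, length))
        if op in REF_CONSUMING:
            ref_pos += length
    return events


# Hypothetical alignment starting at reference position 1000.
events = large_indels(1000, "500M120D300M75I200M3D100M")
print(events)
```

The 3 bp deletion is ignored as a small indel, while the 120 bp deletion and the 75 bp insertion are reported with the reference coordinate at which they occur.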

Population-scale De Novo Assemblies

Traditional reference genomes, often based on short-read sequencing, can be fragmented and may miss key sequences. Such omissions may lead to significant errors, including false-positive or false-negative variant calls. Therefore, there is a pressing need to construct and compare de novo assemblies.

The increased availability and affordability of long-read sequencing data have led to a surge of faster and more accurate genome assembly tools. De novo assembly-based methods are often more sensitive and better suited to reconstructing highly diverse regions of the genome than alignment-based methods. The increasing yield of long-read sequencing technologies will allow each sample to be sequenced at sufficient coverage for high-quality de novo assembly.
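When many de novo assemblies are produced, their contiguity is commonly summarized with the N50 statistic: the contig length at which contigs of that length or longer contain at least half of the assembled bases. The text does not name a quality metric, so N50 is offered here only as one widely used example, and the contig lengths are invented.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


# Two hypothetical assemblies with the same total size (300 units).
contiguous = [80, 70, 50, 40, 30, 20, 10]
fragmented = [10] * 30

print("contiguous assembly N50:", n50(contiguous))
print("fragmented assembly N50:", n50(fragmented))
```

Comparing N50 across samples is a quick check for assemblies that came out unusually fragmented, for example because a sample received too little coverage.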

Graph Genome Methods

Both read alignment and de novo assembly methods can have systematic problems with complex structural variants, insertion sequences missing from the reference genome, repetitive variants, and highly polymorphic loci. A major benefit of graph genomes is that short reads can be used for genotyping SVs. In addition, a graph-based approach avoids the often discussed dichotomy in population studies between aligning to an existing reference genome and constructing a new reference by de novo assembly, because in either case all downstream sequences must be aligned to a backbone, whether a single (reference) assembly or a pan-genome graph, for variant identification, annotation, and statistical evaluation.
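A toy example shows how a graph can hold an insertion that a linear reference lacks, and how a short read can then support one allele or the other. Everything here is a hypothetical illustration: the node names and sequences are invented, and exact substring matching stands in for the read-to-graph alignment that real graph genome tools perform.

```python
from typing import Dict, List

# Toy sequence graph: node n2 is an insertion absent from the linear reference,
# so the graph has two paths, n1-n2-n3 (insertion) and n1-n3 (reference).
NODES: Dict[str, str] = {"n1": "ACGT", "n2": "TTTGGA", "n3": "CCAT"}
EDGES: Dict[str, List[str]] = {"n1": ["n2", "n3"], "n2": ["n3"], "n3": []}


def spell_paths(node: str) -> List[str]:
    """Sequences spelled by every path from `node` to a node without successors."""
    successors = EDGES[node]
    if not successors:
        return [NODES[node]]
    return [NODES[node] + tail for nxt in successors for tail in spell_paths(nxt)]


def supported_alleles(read: str, alleles: Dict[str, str]) -> List[str]:
    """Names of the alleles whose sequence contains the read exactly."""
    return [name for name, sequence in alleles.items() if read in sequence]


paths = spell_paths("n1")  # order follows EDGES["n1"]: insertion path first
alleles = {"insertion": paths[0], "reference": paths[1]}

for read in ("GTTTTGGAC", "CGTCCA", "ACGT"):
    print(read, "->", supported_alleles(read, alleles))
```

Reads that cross a breakpoint support exactly one allele, whereas a read lying entirely within shared sequence is compatible with both and carries no genotype information.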

References

  1. De Coster, Wouter, Matthias H. Weissensteiner, and Fritz J. Sedlazeck. "Towards population-scale long-read sequencing." Nature Reviews Genetics 22.9 (2021): 572-587.
  2. Rech, Gabriel E., et al. "Population-scale long-read sequencing uncovers transposable elements associated with gene expression variation and adaptive signatures in Drosophila." Nature Communications 13.1 (2022): 1948.