Lack of genome storage: Crucial genetic databases are running out of space
- December 15, 2022
Insight summary
As cutting-edge technologies like precision medicine, CRISPR, and gene therapy continue to evolve, more storage space is needed for their associated data. The increase in genomics data also brings a greater demand for computing power to manage it. The long-term implications of the lack of genome storage could include an accelerating shift to cloud-based systems and investments in faster DNA analysis tools.
Lack of genome storage context
Genomics has become the basis of drug discovery since the 2010s, and it has helped scientists create more precise medicines with a greater chance of succeeding in clinical trials. In 2020, customized treatments accounted for 39 percent of US Food and Drug Administration (FDA)-approved medicines, according to the Personalized Medicine Coalition.
The continual rise in the application of genomics in drug discovery and individualized therapies can be linked to reduced DNA sequencing costs. In 2021, the biotech news site Labiotech estimated that anyone could get their genome sequenced for under US$1,000, with results processed within the day. This advancement built on the success of the Human Genome Project, which cost nearly US$2.6 billion and took 13 years.
By 2025, it is anticipated that over 100 million genomes will have been sequenced as part of genomic research projects. Efforts by both big pharma and national population genomics groups are already generating massive quantities of data, and those quantities are sure to rise. The University of Illinois at Urbana-Champaign estimates that by 2025, the world may run out of data storage space for human genomes. The US National Institutes of Health predicts that by 2030, genomics research will create a staggering 2 to 40 exabytes of data.
Disruptive impact
Data compression technologies can help reduce the size and storage costs of genomic data. For example, the biotech company PetaGene focuses on reducing the size of genomic data so that it is more manageable. Meanwhile, the research organization the Garvan Institute recognized that the capabilities of its existing infrastructure were insufficient for processing a large number of genomes.
Genomic sequencing became even more essential during the COVID-19 pandemic, and the Garvan Institute concluded that moving to the public cloud was the only way forward. About 14,000 genomes were processed as part of a pilot program launched by the organization to migrate to the public cloud.
Moving to the public cloud may be a feasible solution, but every sequenced genome still needs to be analyzed thoroughly to yield valuable scientific observations. The average big data analytics project for genomics creates 100 gigabytes per genome, and storing and processing data at that scale can be too costly for some organizations.
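To put these figures in perspective, the short Python sketch below multiplies the genome count cited earlier (100 million) by the 100 gigabytes per genome mentioned above. It is a back-of-the-envelope estimate only: it assumes decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes) and that every genome produces the same amount of data, and the function name is illustrative rather than taken from any genomics tool.

```python
# Rough storage estimate; decimal (SI) units are an assumption.
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18


def total_storage_exabytes(num_genomes: int, gb_per_genome: int) -> float:
    """Return the total storage in exabytes for a given number of genomes."""
    total_bytes = num_genomes * gb_per_genome * BYTES_PER_GB
    return total_bytes / BYTES_PER_EB


if __name__ == "__main__":
    # 100 million genomes at 100 GB each (both figures from the article).
    print(total_storage_exabytes(100_000_000, 100))  # prints 10.0
```

Under these assumptions the total comes to 10 exabytes, which sits inside the 2 to 40 exabyte range predicted for 2030.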
Cloud computing has proven to be an effective method for analyzing massive data sets while avoiding the costs associated with maintaining and upgrading servers. The pay-per-use service allows consumers to rent computational power and storage. However, cloud data storage is not unlimited. Data centers are also nearing capacity, and other solutions will need to be developed to address the increasing costs of global data consumption.
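The pay-per-use model can be illustrated with a minimal cost sketch in Python. The price per gigabyte-month used here is a hypothetical placeholder, not a quote from any cloud provider, and pairing the 14,000-genome pilot with the 100 gigabytes per genome figure is an assumption made purely for illustration.

```python
def monthly_storage_cost_usd(
    num_genomes: int, gb_per_genome: float, usd_per_gb_month: float
) -> float:
    """Return the monthly storage bill under simple pay-per-use pricing."""
    total_gb = num_genomes * gb_per_genome
    return total_gb * usd_per_gb_month


if __name__ == "__main__":
    # HYPOTHETICAL price; real provider pricing varies by tier and region.
    assumed_price = 0.02  # USD per GB per month
    cost = monthly_storage_cost_usd(14_000, 100, assumed_price)
    print(f"Estimated monthly storage cost: ${cost:,.2f}")
```

Because the bill scales linearly with both the number of genomes and the data kept per genome, either reducing stored data (for example through compression) or negotiating a lower unit price lowers the cost proportionally.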
Implications of a lack of genome storage
Wider implications of a lack of genome storage infrastructure may include:
- Public cloud providers establishing more profitable partnerships with biopharma firms to host their genomic data.
- End-to-end genomic research being performed on the cloud for faster processing. However, this trend can lead to an increased carbon footprint for the sector.
- Cloud providers implementing edge cloud services to establish data centers near laboratories and research institutions to manage data better.
- Increased investments in faster genomic analysis tools, including tools with enhanced data privacy and security features.
- Genomics research, biopharma, and cloud providers being monitored by governments to ensure that they comply with genetic data privacy laws.
- Biopharma firms building their own private data centers to reduce reliance on public cloud providers, ensuring greater control over their genomic data and data security.
- A surge in data breaches and cyberattacks targeting valuable genetic information, requiring the development of robust cybersecurity measures across the genomics industry.
- Governments establishing international agreements and standards for genomic data storage and sharing to promote global collaboration in research while safeguarding data privacy and ethical considerations.
Questions to consider
- What are other possible storage solutions for genetic research?
- How can governments and cloud service firms collaborate to ensure there is sufficient storage for healthcare research?