A minimal genome is the smallest group of genes needed for an organism to live and grow in a place with plenty of nutrients and no stress. It can also be described as the set of genes that allow an organism to survive in a laboratory culture with rich nutrients and no other living things. The specific genes that make up a minimal genome may vary based on the environment where the organism lives.
The idea of a minimal genome came from noticing that many genes are not needed for survival. If scientists could collect all the genes that are essential for life, they could create a minimal genome in a controlled environment. Adding more genes to this set could help create organisms with specific traits.
To design a new organism, scientists must identify the smallest number of genes needed for the organism to get energy and reproduce. This can be done by studying the chemical processes inside cells through experiments and computer models. A good example of an organism with a minimal genome is Mycoplasma genitalium, which has a very small genome. Most of its genes are believed to be essential for survival, and scientists have suggested that 256 genes form the minimal set. Once essential genes are identified, scientists can use computer models and laboratory experiments to map important biological processes and build new genomes.
In science, studying minimal genomes helps scientists find the most important genes and simplify genetic systems, making engineered organisms easier to predict. In industry and agriculture, minimal genome research could help create plants that survive harsh conditions, bacteria that produce useful chemicals, or microbes that create helpful biological products.
According to an early study by Mushegian et al., the minimal genome of a bacterium should include: a nearly complete set of proteins for copying DNA (replication) and making proteins (translation); the machinery for reading DNA instructions (transcription), including four subunits of RNA polymerase; a basic set of proteins for DNA recombination and repair; several helper proteins (chaperones); the ability to make energy without oxygen by breaking down sugar (glycolysis) and substrate-level phosphorylation; the conversion of one amino acid into another while it is attached to a tRNA molecule (glutamyl-tRNA to glutaminyl-tRNA); lipid synthesis (but not fatty acid synthesis); eight cofactor enzymes; the machinery that exports proteins from the cell; and a limited system for moving small molecules in and out of the cell, including ATPase pumps in the cell membrane.
Proteins encoded by the smallest bacterial genomes are often more similar to proteins in other types of organisms, such as archaea and eukaryotes, than the average bacterial protein is. This suggests that many of these proteins are shared by nearly all living things. Minimal gene sets reconstructed from known genes do not rule out simpler systems in more primitive cells, such as RNA genomes, which would not need the DNA replication machinery that is otherwise part of the minimal gene set of modern cells.
Genes that are most likely to stay even when others are lost are those involved in copying DNA, reading DNA instructions, and making proteins. However, some exceptions exist. For example, parts of the complex that copies DNA and some DNA repair genes are often lost. Most proteins that build ribosomes are kept, though some, like rpmC, may be missing. In some cases, certain enzymes that help build amino acids are also lost. Gene loss is also common in genes that make parts of the cell wall, build molecules like purine, handle energy, and other functions. Genes that can be removed without harming the cell are often left out of minimal genomes.
Minimal gene sets correspond to small genomes because bacterial genomes usually contain about one protein-coding gene per kilobase. Mycoplasma genitalium, with a genome of 580 kilobases and 482 protein-coding genes, is a key example used to study minimal genomes.
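The gene-density rule of thumb can be checked against the M. genitalium figures quoted above. The sketch below is only an illustration: the function names are made up for this example, and the size estimate for a 256-gene set assumes the rough "one gene per kilobase" density rather than any measured value.

```python
def genes_per_kb(protein_coding_genes: int, genome_size_kb: float) -> float:
    """Return the number of protein-coding genes per kilobase of genome."""
    if genome_size_kb <= 0:
        raise ValueError("genome size must be positive")
    return protein_coding_genes / genome_size_kb


def estimated_size_kb(gene_count: int, density: float = 1.0) -> float:
    """Rough genome size for a gene set, assuming a fixed gene density."""
    if density <= 0:
        raise ValueError("density must be positive")
    return gene_count / density


# Figures quoted in the text for Mycoplasma genitalium.
M_GENITALIUM_GENES = 482
M_GENITALIUM_SIZE_KB = 580

density = genes_per_kb(M_GENITALIUM_GENES, M_GENITALIUM_SIZE_KB)
print(f"M. genitalium: {density:.2f} protein-coding genes per kb")

# Assumption: about 1 gene per kb, applied to the proposed 256-gene set.
print(f"256-gene set at 1 gene/kb: about {estimated_size_kb(256):.0f} kb")
```

At roughly 0.83 genes per kilobase, M. genitalium sits slightly below the one-gene-per-kilobase figure, which is why that figure is best read as an approximation.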
In nature
The smallest known genome of a free-living bacterium is 1.3 Mb with about 1,100 genes. However, naturally occurring symbiotic and parasitic organisms often have much smaller genomes. In these organisms, genome size can decrease over time due to genetic changes, such as gene deletions, in small or asexual populations. These changes are common in symbionts and parasites, which often experience fast evolution, changes in how certain genetic codes are used, a preference for DNA made of adenine (A) and thymine (T) bases, and increased problems with proteins not folding correctly. These issues require the use of special proteins called molecular chaperones to help other proteins function properly. These changes are often linked to an increase in mobile genetic elements, nonfunctional genes (pseudogenes), changes in genome structure, and the loss of parts of chromosomes.
This happens because symbionts or parasites can rely on the host organism to perform certain cellular tasks, which allows them to lose the genes responsible for those tasks. Beneficial symbionts can reduce their genomes more than parasites because they often work closely with their hosts, allowing them to lose more genes. A key difference between genome reduction in parasites and endosymbionts is that parasites lose both the gene and its function, while endosymbionts may still keep the function if the host takes over the task.
The most extreme examples of genome reduction are seen in endosymbionts that are passed from mother to offspring and have evolved with their hosts for a long time, losing much of their ability to function independently. In some cases, these endosymbionts may lose their entire genome. For example, some organelles called mitosomes and hydrogenosomes (which are simplified versions of mitochondria in certain organisms) have completely lost their genes, while human mitochondria still have a small genome.
The human mitochondrial genome is 16.6 kb in size and contains 37 genes. Mitochondrial genomes encode between 3 and 67 different proteins, suggesting that the mitochondrial genome of the last common ancestor of eukaryotes had at least 70 genes. The smallest known mitochondrial genome is found in Plasmodium falciparum, with a size of 6 kb and only three protein-coding genes and a few rRNA genes. Similarly small genomes are also found in related organisms called apicomplexans. In contrast, the largest known mitochondrial genome is 490 kb. Meanwhile, the mitochondrial genomes of land plants have grown larger, with some reaching over 11 Mb, which is larger than the genome of some bacteria and simple eukaryotes.
Plastids in plants, such as chloroplasts, chromoplasts, and leucoplasts, which were once free-living cyanobacteria, typically have genomes of 100–200 kb with 80–250 genes. In one study of 15 chloroplast genomes, the chloroplasts had between 60 and 200 genes. Across these chloroplasts, 274 different protein-coding genes were identified, but only 44 were found in all of them.
Examples of organisms with reduced genomes include species of Buchnera, Chlamydia, Treponema, Mycoplasma, and others. Studies of multiple endosymbiont genomes from the same species and lineage show that even long-term symbionts continue to lose genes and transfer them to the host's nucleus. DNA from mitochondria or plastids that moves into the nucleus is sometimes called "numts" and "nupts," respectively.
Many symbionts have genomes smaller than 500 kb, with most being bacterial symbionts of insects, often from the groups Pseudomonadota and Bacteroidota. The parasitic archaeon Nanoarchaeum equitans has a genome of 491 kb. In 2002, some species of Buchnera were found to have genomes as small as 450 kb. In 2021, the endosymbiont Candidatus Azoamicus ciliaticola was found to have a genome of 290 kb. The symbiont Zinderia insecticola has a genome of 208 kb. In 2006, the endosymbiont Carsonella ruddii was found to have a genome of 160 kb with 182 protein-coding genes. Surprisingly, gene loss in Carsonella is still happening. Some genes in other reduced genomes have become pseudogenes, meaning they no longer work because the host performs their function. The genome of Candidatus Hodgkinia cicadicola, a symbiont of cicadas, is 144 kb. In 2011, Tremblaya princeps was found to have an intracellular endosymbiont with a genome of 139 kb, so small that some translation-related genes were lost. In 2013, two leafhopper symbionts were found to have highly reduced genomes: Sulcia muelleri with a genome of 190 kb and Nasuia deltocephalinicola with a genome of only 112 kb and 137 protein-coding genes. Together, these two symbionts can only make ten amino acids and some parts of the machinery for DNA replication, transcription, and translation. Genes for ATP production through oxidative phosphorylation have been lost.
Viruses and virus-like particles have the smallest genomes in nature. For example, the bacteriophage MS2 has a genome of only 3,569 nucleotides (single-stranded RNA) and encodes just four proteins, which overlap to use the genome space efficiently. Similarly, among eukaryotic viruses, porcine circoviruses are among the smallest, encoding only 2–3 open reading frames. Viroids are circular RNA molecules that do not have any protein-coding genes, but the RNA itself acts as a ribozyme to help with its replication. Viroid genomes are between 200 and 400 nucleotides in length.
History
The study of minimal genomes began with a team effort between NASA and two scientists, Harold Morowitz and Mark Tourtellotte, in the 1960s. At that time, NASA was looking for life beyond Earth and believed that if such life existed, it might be very simple. To share his ideas, Morowitz wrote about mycoplasmas, which are tiny and simple living cells. NASA and the scientists decided to try building a living cell using parts from mycoplasmas. Mycoplasmas were chosen because they have only the most basic parts needed for life, such as a cell membrane, structures that make proteins (ribosomes), and a single circular DNA molecule. Morowitz aimed to understand all the parts of a mycoplasma cell at the molecular level. He believed that working together internationally would help achieve this goal.
The plan included these steps:
1. Mapping and fully reading the mycoplasma’s genetic code.
2. Identifying sections of DNA that code for proteins.
3. Determining which proteins each section of DNA produces.
4. Learning what each gene does.
5. Rebuilding the mycoplasma’s cell parts.
By the 1980s, Richard Herrmann’s lab completed the full reading and analysis of the 800,000-letter genome of M. pneumoniae. This process took three years. In 1995, a team from the Institute for Genomic Research (TIGR) in Maryland worked with scientists from Johns Hopkins University and the University of North Carolina. This group studied the genome of M. genitalium, which has a smaller genome of 580,000 letters. This project was completed in six months.
To find the smallest number of genes needed for life, scientists turned off or removed genes one by one and tested the effects. The J. Craig Venter Institute did these experiments on M. genitalium and found that 382 genes are essential for survival.
Later, the J. Craig Venter Institute began a project to create a synthetic organism called Mycoplasma laboratorium, using the essential genes identified from M. genitalium.
As of 1999, two organisms with nearly minimal genomes were Haemophilus influenzae and M. genitalium. Scientists compared their proteins to identify those needed for life. This comparison showed how the organisms evolved and helped find genes that are not necessary for survival. Since H. influenzae is Gram-negative and M. genitalium is Gram-positive, scientists expected the genes they share to be important for all life. The comparison showed that 244 of the shared proteins were not related to parasitism. The study also suggested that similar functions might be carried out by different proteins: even when scientists mapped the biochemical processes of both organisms, many pathways were incomplete, and some functions common to both were performed by proteins that are not related through evolution.
Most research focuses on the original genome of organisms rather than the minimal genome. Studies of these genomes showed that genes shared between H. influenzae and M. genitalium are not always essential for survival. In fact, genes that are not shared were found to be more important. Scientists also learned that proteins do not need to have the same structure or shape to perform the same functions.
JCVI projects
The J. Craig Venter Institute (JCVI) studied the essential genes of M. genitalium using a method called global transposon mutagenesis. They discovered that 382 out of 482 protein-coding genes were essential for the organism's survival. Of these essential genes, 28% encoded proteins with unknown functions. JCVI had previously studied the non-essential genes of M. genitalium using the same method. However, it was unclear whether the products of these non-essential genes had important biological roles. Only through studies on gene essentiality did JCVI determine a hypothetical minimal gene set required for life.
In their 1999 study of M. genitalium and Mycoplasma pneumoniae, JCVI mapped about 2,200 transposon insertion sites and identified 130 putative non-essential genes. They grew cells transformed with Tn4001, isolated genomic DNA, and sequenced amplicons to locate transposon insertions. Genes with insertions were often hypothetical proteins or considered non-essential.
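The core bookkeeping step in this kind of screen, matching insertion sites to genes, can be sketched as follows. This is a simplified illustration and not JCVI's actual pipeline: the gene names, coordinates, and insertion positions are hypothetical toy data, and an insertion inside a gene does not by itself prove that the gene is dispensable.

```python
from typing import Iterable, NamedTuple, Set


class Gene(NamedTuple):
    name: str
    start: int  # first base of the gene (1-based, inclusive)
    end: int    # last base of the gene (inclusive)


def disrupted_genes(genes: Iterable[Gene], insertion_sites: Iterable[int]) -> Set[str]:
    """Return the names of genes that contain at least one insertion site."""
    gene_list = list(genes)
    hits = set()
    for site in insertion_sites:
        for gene in gene_list:
            if gene.start <= site <= gene.end:
                hits.add(gene.name)
    return hits


# Hypothetical toy data, not real M. genitalium coordinates.
genes = [
    Gene("geneA", 100, 400),
    Gene("geneB", 450, 900),
    Gene("geneC", 950, 1500),
]
insertion_sites = [120, 130, 920, 1000]  # 920 falls between genes

putative_nonessential = disrupted_genes(genes, insertion_sites)
candidate_essential = {g.name for g in genes} - putative_nonessential

print(sorted(putative_nonessential))  # ['geneA', 'geneC']
print(sorted(candidate_essential))    # ['geneB']
```

Genes without any insertion are only candidates for essentiality; with few insertion sites, a gene can escape disruption by chance.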
Some genes initially labeled as non-essential were later found to be essential after further analysis. This error may have occurred because some genes tolerated transposon insertions, cells had duplicate copies of the same gene, or gene products were shared between cells. Insertions were assumed to disrupt gene function, but because researchers did not confirm that the gene products were absent, they mistakenly classified these genes as non-essential. Examples included the genes for isoleucyl- and tyrosyl-tRNA synthetases, the DNA replication gene dnaA, and DNA polymerase III subunit α.
The 1999 study was later expanded and updated in 2005. Researchers improved the study by isolating and characterizing transposon insertions in individual colonies, which provided more accurate results. Previously, colonies often contained mixtures of mutants, but a filter cloning approach helped separate these mixtures.
In the 2005 study, the number of non-essential genes was reduced from 130 to 67. Of the remaining 63 genes, 26 were only disrupted in M. pneumoniae, meaning their M. genitalium counterparts were likely essential. After further analysis, JCVI identified 100 non-essential genes in M. genitalium out of 482 protein-coding genes.
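The counts quoted in this section can be cross-checked with simple arithmetic. The sketch below uses only the figures stated in the text; the variable names are chosen for this example.

```python
# Figures quoted in the text for M. genitalium.
TOTAL_PROTEIN_CODING = 482
NONESSENTIAL_FINAL = 100           # 2005 study, after further analysis
PUTATIVE_NONESSENTIAL_1999 = 130   # 1999 study
RETAINED_IN_2005 = 67
UNKNOWN_FUNCTION_FRACTION = 0.28   # share of essential genes with unknown function

# Essential genes are the protein-coding genes not classed as non-essential.
essential = TOTAL_PROTEIN_CODING - NONESSENTIAL_FINAL

# Genes dropped from the 1999 list of putative non-essential genes.
dropped = PUTATIVE_NONESSENTIAL_1999 - RETAINED_IN_2005

# Approximate number of essential genes with unknown function.
unknown_essential = round(essential * UNKNOWN_FUNCTION_FRACTION)

print(essential)          # 382
print(dropped)            # 63
print(unknown_essential)  # 107
```

The result of 382 essential genes matches the number reported from the transposon mutagenesis work, and 63 matches the "remaining" genes mentioned above.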
The final result of JCVI’s projects was the creation of a synthetic organism, Mycoplasma laboratorium, based on 387 protein-coding regions and 43 structural RNA genes from M. genitalium.
In 2010, JCVI successfully created a synthetic bacterial cell capable of self-replication. The team synthesized a 1.08 million base pair chromosome of a modified Mycoplasma mycoides, named Mycoplasma mycoides JCVI-syn1.0. The DNA was designed on a computer, synthesized, and transplanted into a cell with its original genome removed. The recipient cell’s molecules and reaction networks used the artificial DNA to produce daughter cells, which replicated further.
The first phase of the project took 15 years. Researchers designed a digitized genome of M. mycoides, building 1,078 DNA cassettes, each 1,080 base pairs long. These cassettes overlapped by 80 base pairs at the ends. The assembled genome was transplanted into yeast cells and grown as yeast artificial chromosomes. After refinement and gene reduction, JCVI-syn3.0 was constructed.
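The cassette figures are consistent with the 1.08 million base pair chromosome. The sketch below is a back-of-the-envelope check, assuming that every cassette is exactly 1,080 base pairs long and that every junction overlaps by exactly 80 base pairs; the real assembly may deviate from this idealized tiling.

```python
def tiled_length_circular(n_cassettes: int, cassette_bp: int, overlap_bp: int) -> int:
    """Length covered when cassettes tile a circle, each overlapping the next."""
    # On a circle every cassette shares one overlap with its successor,
    # so each cassette contributes (cassette_bp - overlap_bp) unique bases.
    return n_cassettes * (cassette_bp - overlap_bp)


def tiled_length_linear(n_cassettes: int, cassette_bp: int, overlap_bp: int) -> int:
    """Length covered when cassettes tile a line, with overlaps at inner junctions only."""
    return n_cassettes * cassette_bp - (n_cassettes - 1) * overlap_bp


N_CASSETTES = 1078
CASSETTE_BP = 1080
OVERLAP_BP = 80

circular = tiled_length_circular(N_CASSETTES, CASSETTE_BP, OVERLAP_BP)
linear = tiled_length_linear(N_CASSETTES, CASSETTE_BP, OVERLAP_BP)

print(circular)  # 1078000
print(linear)    # 1078080
```

Either way of counting gives about 1.08 million base pairs, in line with the chromosome size reported for JCVI-syn1.0.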
Number of essential genes
The number of essential genes varies between organisms. For example, the genome of E. coli can be reduced by about 30%, showing that this species can survive with fewer genes than it has in nature. The number of essential genes also depends on the strain tested and on the environment in which the organism is studied. In some bacteria and in other microbes such as yeast, scientists have removed genes one by one to find out which are necessary for survival. These experiments are usually done in rich media, which provide all the nutrients a cell needs. When all nutrients are available, genes that help make nutrients are not counted as essential. When cells are grown on minimal media, which contain very few nutrients, more genes become essential because the cells must produce nutrients such as vitamins themselves. The numbers in the table below were collected using rich media.
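The dependence of essentiality on the growth medium can be illustrated with a toy model. Everything in the sketch below is hypothetical: the gene names, the nutrients, and the rule that a biosynthesis gene is dispensable whenever the medium supplies its product. Real essentiality is determined by experiment.

```python
from typing import Dict, Optional, Set

# Hypothetical genes. The value is the nutrient the gene helps to make,
# or None for core machinery that no nutrient can replace.
GENES: Dict[str, Optional[str]] = {
    "replication_gene": None,
    "translation_gene": None,
    "vitamin_gene": "vitamin",
    "amino_acid_gene": "amino acid",
}


def essential_genes(genes: Dict[str, Optional[str]], nutrients_in_medium: Set[str]) -> Set[str]:
    """Return genes whose loss cannot be compensated by the medium."""
    return {
        name
        for name, nutrient in genes.items()
        if nutrient is None or nutrient not in nutrients_in_medium
    }


RICH_MEDIUM = {"vitamin", "amino acid"}  # supplies both nutrients
MINIMAL_MEDIUM: Set[str] = set()         # supplies neither

essential_rich = essential_genes(GENES, RICH_MEDIUM)
essential_minimal = essential_genes(GENES, MINIMAL_MEDIUM)

print(sorted(essential_rich))     # ['replication_gene', 'translation_gene']
print(sorted(essential_minimal))  # all four genes
```

In this toy model the essential set on rich medium is a subset of the essential set on minimal medium, which mirrors why counts obtained in rich media are lower.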
The essential gene counts in the table came from the Database of Essential Genes (DEG), except for B. subtilis, where the data was taken from the Genome News Network. The organisms listed in the table were tested in a planned way to identify essential genes.
The following table lists minimal genome projects, which show the number of essential genes for different organisms, along with the methods used in each study.