A groundbreaking Japanese genome study reveals best practices for managing massive DNA databases
At a time when large-scale human genome analysis was not yet commonplace, the Tohoku Medical Megabank Organization (ToMMo) launched an ambitious genome cohort study. A decade later, the team is sharing what it has learned about the techniques required to analyze, manage, maintain, and update a genomic database covering 100,000 individuals. The findings, published in the JMA Journal on October 3, 2025, offer researchers worldwide a practical roadmap for advancing genome research and laying the foundation for personalized genetic healthcare.
The study's first author, Fumiki Katsuoka, emphasizes the significance of this achievement, stating, "As large-scale genome sequencing becomes more prevalent, we wanted to share our extensive learnings from these ten years. We are proud that some of our unique techniques are now adopted by other institutions."
ToMMo's journey began in 2013 with the goal of completing whole genome sequencing for 100,000 Japanese individuals. Whole genome sequencing determines a person's entire DNA sequence, the genetic code that underlies each individual's unique traits. Carrying out this analysis at such a scale, however, poses significant technical and operational challenges that remain a hurdle for many countries even today.
Quality control was central to the effort. As Katsuoka explains, "Maintaining high accuracy and consistent quality required careful planning, optimized equipment, and developing innovative new techniques."
ToMMo's approach was twofold. In the early phase, the team developed a method named qMiSeq, in which a small-scale sequencing analysis was performed for each group of samples (typically 96 samples), and the optimal sequencing conditions were then determined from the resulting data volume. This method proved effective in optimizing the sequencing process. Later, the team introduced a protocol named iDeal, which divided the sequencing of each group into multiple runs to ensure an equal data yield, further improving efficiency.
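The article does not give qMiSeq's formulas, so the sketch below only illustrates the general idea of using a small pilot run to tune a larger one. It assumes, hypothetically, that the "conditions" being optimized are the volumes of each sample's library pooled for the full run, and that pilot read counts scale linearly with library concentration. The function `rebalance_pool` and its parameters are invented for this example and are not ToMMo's code.

```python
from typing import Dict


def rebalance_pool(pilot_reads: Dict[str, int],
                   pooled_volume_ul: Dict[str, float]) -> Dict[str, float]:
    """Suggest new pooling volumes from a small pilot sequencing run.

    Hypothetical sketch: assumes a library's read count in the pilot run is
    proportional to (effective concentration x pooled volume), so volumes
    inversely proportional to concentration should give equal reads per
    sample in the full-scale run. The total pooled volume is kept unchanged.
    """
    if set(pilot_reads) != set(pooled_volume_ul):
        raise ValueError("both mappings must list the same samples")
    if any(n <= 0 for n in pilot_reads.values()):
        raise ValueError("every sample needs a positive pilot read count")
    if any(v <= 0 for v in pooled_volume_ul.values()):
        raise ValueError("pooled volumes must be positive")

    # Reads per microlitre serves as a proxy for effective library concentration.
    reads_per_ul = {s: pilot_reads[s] / pooled_volume_ul[s] for s in pilot_reads}

    # Equal expected reads requires volume proportional to 1 / concentration.
    inverse = {s: 1.0 / c for s, c in reads_per_ul.items()}

    # Rescale so the new volumes sum to the original total pool volume.
    scale = sum(pooled_volume_ul.values()) / sum(inverse.values())
    return {s: w * scale for s, w in inverse.items()}


if __name__ == "__main__":
    # Toy 3-sample pool; ToMMo's groups typically held 96 samples.
    pilot = {"S1": 200_000, "S2": 100_000, "S3": 100_000}
    volumes = {"S1": 1.0, "S2": 1.0, "S3": 1.0}
    new = rebalance_pool(pilot, volumes)
    print({s: round(v, 3) for s, v in new.items()})
    # -> {'S1': 0.6, 'S2': 1.2, 'S3': 1.2}
```

In the toy example, S1 was over-represented in the pilot run, so the sketch gives it half the pooled volume of S2 and S3, which would be expected to even out read counts in the full run under the stated linearity assumption.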
The project also reflects a commitment to transparency in scientific research. Data from ToMMo's 100,000-genome project, including frequency and summary data, are freely available on jMorp and are widely used by researchers worldwide. Individual-level genome data, by contrast, are accessible only under specific conditions, following a rigorous application-based review process.
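As a generic illustration of the difference between summary and individual-level data (not a description of jMorp's actual pipeline), the sketch below reduces a hypothetical vector of per-person genotypes at one variant site to a single alternate-allele frequency. Only an aggregate number like this would appear in a frequency table; the per-person vector is the kind of individual-level data that requires an application and review. The function name and the 0/1/2 dosage encoding are assumptions made for this example.

```python
from typing import Optional, Sequence


def allele_frequency(dosages: Sequence[Optional[int]]) -> float:
    """Alternate-allele frequency at one biallelic site (hypothetical helper).

    Each entry is one diploid individual's count of the alternate allele
    (0, 1 or 2); None marks a missing genotype call and is excluded.
    """
    called = [d for d in dosages if d is not None]
    if not called:
        raise ValueError("no called genotypes at this site")
    if any(d not in (0, 1, 2) for d in called):
        raise ValueError("dosages must be 0, 1 or 2")
    # Each called diploid individual contributes two alleles.
    return sum(called) / (2 * len(called))


if __name__ == "__main__":
    # Six hypothetical individuals, one with a missing call.
    genotypes = [0, 1, 2, 0, None, 1]
    print(allele_frequency(genotypes))  # -> 0.4
```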
As the field of genome analysis expands, so does the potential for healthcare innovation. The insights from this study will serve as a valuable resource for the genomics community in Japan and beyond, contributing to the advancement of genomic medicine and personalized prevention. The study also highlights the value of collaboration, innovation, and knowledge sharing in the pursuit of scientific progress.