Figure 1. Overview of Genome and Multi-omics Analysis in the Tohoku Medical Megabank Project. The TMM Project recruited 80,000 participants for its community-based cohort and 70,000 participants for its Birth-and-Three-Generation cohorts. As illustrated, various types of samples have been collected along with data on participants’ health, lifestyle, and medical history. Follow-up surveys are conducted every 5 years. Utilizing these samples and their associated rich datasets, genome and multi-omics analyses are actively being carried out. TMM: Tohoku Medical Megabank.

From: Advancements in Whole-Genome Sequencing Protocols: A Decade of In-House Operations and Quality Controls at the Tohoku Medical Megabank

Figure 2. Pedigree information available in the 100K Repository. The repository contains pedigree data for 16,757 pairs and 9,038 trios, which include either parents and their child or grandparents and one parent. In addition, the repository contains 181 hepta-families, each comprising two sets of grandparents, parents, and their child. Light blue represents the grandfather, while pink represents the grandmother. Dark blue represents the father, and red indicates the mother. Black indicates the child.

Figure 3. The annual progress of the TMM WGS project. The TMM genome reference panels have been continuously released since 2015 (red), and the TMM repositories, including samples related to family pedigrees, have been available since 2022 (blue). The schematic diagram on the left shows the HiSeq 2500, which was used at the project’s inception, while the diagram on the right shows the NovaSeq 6000, which accelerated the project. In March 2024, we achieved the milestone of completing the WGS of 100,000 participants. Informatics work is underway to release a new reference panel, TMM-61KJPN, and a repository based on this dataset in 2025.
TMM: Tohoku Medical Megabank; WGS: whole-genome sequencing.

Figure 4. QC workflow for HiSeq 2500. Pooled new libraries, along with a few libraries previously sequenced on the HiSeq 2500, are processed through small-scale sequencing on the MiSeq to obtain the index ratio. When the new libraries are pooled in equal volumes, the index ratio reflects the relative concentration of each library. Based on this relative concentration, the run conditions for the HiSeq 2500 are determined.
QC: quality control.

Figure 5. The QC workflow for the NovaSeq series. In QC1, the concentration of the pooled new libraries is measured using the Qubit. In QC2, library size is assessed using the TapeStation RNA Kit. QC3 is iDeal (initial run-based data equalization), in which multiplex sequencing runs are performed multiple times. Based on the relative concentration of each library obtained from the initial run, the volume of each sample for re-pooling is determined.
QC: quality control.

Figure 6. Representative data of the library size assessment using TapeStation D1000 and TapeStation RNA. With TapeStation D1000, the Illumina PCR-free library of ~600 bp migrates as a larger fragment, typically around 1000 bp or more, because of its Y-shaped adapters (A and B). In contrast, with TapeStation RNA, the Illumina PCR-free library of ~600 bp migrates at the expected size, because the library is denatured and migrates as a single strand (C and D).
PCR: polymerase chain reaction; RNA: ribonucleic acid.

Figure 7. Representative data of iDeal-based sequencing. A total of 96 pooled libraries were sequenced on the NovaSeq 6000 using three S4 flow cells. In the first run, equally pooled libraries were sequenced to obtain relative concentration data for each library based on its index ratio. Using these relative concentration data, the 96 libraries were then re-pooled with adjusted volumes and sequenced on the remaining two flow cells so that the final data would be consistent across libraries. The number of reads from each library in the first run is shown in blue, while those from the second and third runs are shown in orange and red, respectively.
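
The re-pooling arithmetic described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the read count of each library in the first run scales linearly with its concentration, and the function name `repool_volumes` and the parameter `base_volume_ul` are hypothetical names introduced only for illustration.

```python
def repool_volumes(first_run_reads, base_volume_ul=5.0):
    """Suggest per-library re-pooling volumes from first-run read counts.

    first_run_reads: dict mapping library ID -> reads obtained in the
        first run, in which all libraries were pooled in equal volumes.
    base_volume_ul: hypothetical volume (µL) assigned to a library whose
        relative concentration is exactly average (1.0).
    """
    total_reads = sum(first_run_reads.values())
    n_libraries = len(first_run_reads)
    volumes = {}
    for library, reads in first_run_reads.items():
        if reads <= 0:
            raise ValueError(f"library {library!r} produced no reads")
        # Index ratio: the share of all reads carrying this library's index.
        index_ratio = reads / total_reads
        # Relative concentration: 1.0 means an average library in the pool.
        relative_concentration = index_ratio * n_libraries
        # Assumed linear model: pipette less of a concentrated library and
        # more of a dilute one so that expected yields become equal.
        volumes[library] = base_volume_ul / relative_concentration
    return volumes


if __name__ == "__main__":
    example = {"lib_A": 100, "lib_B": 200, "lib_C": 300}
    for library, volume in repool_volumes(example).items():
        print(f"{library}: {volume:.2f} µL")
```

Under this assumption, a library that produced half the average number of reads in the first run receives twice the base volume in the re-pool. In practice, pipetting limits and minimum volumes would also constrain the result.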

Figure 8. Snapshot of mean coverage data at the CYP2D6 gene locus. The data are taken from the jBrowser embedded in jMorp. The chromosome position and RefSeq genes at the locus are shown at the top. The mean coverage, counting only reads with a mapping quality score (MAPQ) ≥20, is calculated from the WGS data of 1,000 samples sequenced on each platform and protocol shown on the left. Regions inaccessible to all protocols are indicated by blue arrows, while a region accessible with the 161 or 162 bp PE protocol is marked with red arrows. Regions where DNBSEQ-T7 shows low mean coverage are indicated by green arrowheads.
PE: paired-end.
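
The mean coverage described in this caption can be illustrated with a small in-memory sketch. It is not the pipeline behind jMorp: alignments are represented here as hypothetical `(start, end, mapq)` tuples with half-open coordinates, and the function names are illustrative only. A real workflow would read alignments from BAM or CRAM files.

```python
def depth_per_position(alignments, region_start, region_end, min_mapq=20):
    """Per-base depth over [region_start, region_end) for one sample.

    alignments: iterable of (start, end, mapq) tuples, half-open
        coordinates. Reads with mapq below min_mapq are ignored.
    """
    depth = [0] * (region_end - region_start)
    for start, end, mapq in alignments:
        if mapq < min_mapq:
            continue  # drop ambiguously mapped reads
        # Clip the read to the region before counting.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth


def mean_coverage(samples, region_start, region_end, min_mapq=20):
    """Average the per-base depth across samples (one alignment list each)."""
    totals = [0] * (region_end - region_start)
    for alignments in samples:
        sample_depth = depth_per_position(
            alignments, region_start, region_end, min_mapq
        )
        for i, d in enumerate(sample_depth):
            totals[i] += d
    return [t / len(samples) for t in totals]


if __name__ == "__main__":
    samples = [
        [(0, 3, 60), (1, 4, 10)],  # second read is filtered out (MAPQ 10)
        [(2, 5, 30)],
    ]
    print(mean_coverage(samples, 0, 5))
```

Positions where the mean stays near zero for every protocol would correspond to the inaccessible regions marked in the figure, since reads that map ambiguously there fall below the MAPQ threshold.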
