In a collector’s edition embossed full metal jacket…
As physicists, we often think about information in terms of entropy, computation, or fundamental limits. But what about the sheer data volume contained within a biological system as complex as Homo sapiens? After some thought, we explored the storage requirements if we were to digitise two key components of human biological information: the genomic sequence and a high-resolution neuro-connectome derived from advanced imaging modalities.
Let’s break down the numbers.
The Genomic Data Footprint
The human genome comprises approximately $3.1\times10^{9}$ base pairs. Watson-Crick pairing fixes each base's partner (A with T, C with G), so a base pair has only four possible states (A-T, T-A, C-G, G-C). It can therefore be encoded in $\log_2 4 = 2$ bits (e.g., A-T as 00, T-A as 11, C-G as 01, G-C as 10), and the total information content of the haploid human genome sequence is roughly:
$$3.1\times10^{9}\ \text{base pairs}\times 2\ \text{bits/base pair} = 6.2\times10^{9}\ \text{bits}$$
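The 2-bit encoding can be sketched in a few lines of Python. This is a minimal illustration, assuming the sequence arrives as one strand of uppercase A/C/G/T only (real reference files also contain N for unknown bases, which this sketch does not handle); the helper names `pack` and `unpack` are hypothetical. Because the partner base is implied by pairing, storing one strand is enough, and four base pairs fit into each byte.

```python
# 2-bit code per base pair, using the mapping from the text:
# A-T -> 00, T-A -> 11, C-G -> 01, G-C -> 10.
# Each pair is fully determined by the base on one strand, so we key on that base.
CODE = {"A": 0b00, "T": 0b11, "C": 0b01, "G": 0b10}
DECODE = {v: k for k, v in CODE.items()}


def pack(strand: str) -> bytes:
    """Pack one strand into 2 bits per base, 4 bases per byte (hypothetical helper)."""
    out = bytearray((len(strand) + 3) // 4)  # round up to whole bytes
    for i, base in enumerate(strand):
        # First base of each byte goes in the two most significant bits.
        out[i // 4] |= CODE[base] << (6 - 2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, n: int) -> str:
    """Recover the first n bases from packed data (hypothetical helper)."""
    return "".join(
        DECODE[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(n)
    )


seq = "GATTACA"
packed = pack(seq)
print(len(seq), "bases ->", len(packed), "bytes")  # 7 bases -> 2 bytes
assert unpack(packed, len(seq)) == seq
```

The length must be stored alongside the packed bytes, since the final byte may be only partly filled.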
Converting $6.2\times10^{9}$ bits to more familiar storage units (using 1 byte = 8 bits and 1 MB = $1024^{2}$ bytes, strictly a mebibyte, MiB):
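$$6.2\times10^{9}\ \text{bits} \div 8\ \text{bits/byte} = 7.75\times10^{8}\ \text{bytes}$$
$$7.75\times10^{8}\ \text{bytes} \div \left(1024\ \text{bytes/KB}\times 1024\ \text{KB/MB}\right) \approx 739.1\ \text{MB}$$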
So the fundamental sequence information of one copy of the human genome is surprisingly compact: roughly the capacity of a standard 700 MB CD, which it only slightly exceeds.
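The conversion chain above can be checked with a short Python sketch. It simply restates the text's assumptions ($3.1\times10^{9}$ base pairs, 2 bits per pair, 1 MB = $1024^{2}$ bytes) and compares the result with a 700 MB CD.

```python
import math

BASE_PAIRS = 3.1e9              # approximate haploid human genome length (from the text)
BITS_PER_PAIR = math.log2(4)    # four possible pairs -> 2 bits each

bits = BASE_PAIRS * BITS_PER_PAIR
n_bytes = bits / 8
mib = n_bytes / 1024**2         # the text's "MB" is 1024**2 bytes (a mebibyte)

print(f"{bits:.3g} bits = {n_bytes:.3g} bytes = {mib:.1f} MB")
# 6.2e+09 bits = 7.75e+08 bytes = 739.1 MB
print(f"{mib / 700:.2f} x a 700 MB CD")
# 1.06 x a 700 MB CD
```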
Incorporating the Neuro-Connectome
Beyond the static genetic code, the dynamic and structural connectivity of the brain represents a vastly larger information space. Capturing a detailed human neuro-connectome using modalities like fMRI (functional connectivity, temporal dynamics) and PET (metabolic activity, receptor distribution) generates significant data volumes. The exact size is highly variable depending on spatial and temporal resolution, scan duration, and the complexity of the derived connectomic models (e.g., adjacency matrices, graph representations).

Judging by large-scale projects such as the Human Connectome Project, raw and minimally pre-processed fMRI data for a single subject can range from tens to potentially hundreds of gigabytes. A comprehensive, processed connectome dataset integrating multiple modalities would plausibly fall in the range of tens of gigabytes per individual. For our calculation, let’s use a representative figure of 30 GB for a processed fMRI/PET neuro-connectome dataset, acknowledging that this is a simplified estimate.
Converting the neuro-connectome data to megabytes:
$$30\ \text{GB}\times 1024\ \text{MB/GB} = 30{,}720\ \text{MB}$$
The total estimated data size for the human blueprint (DNA) plus the neuro-connectome is:
$$739.1\ \text{MB (DNA)} + 30{,}720\ \text{MB (neuro-connectome)} = 31{,}459.1\ \text{MB}$$
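This combined dataset is approximately 30.7 GB in the binary units used so far (31,459.1 MB ÷ 1024 MB/GB), or about 33 GB in decimal units (1 GB = $10^{9}$ bytes).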
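A short Python sketch of the combined total makes the unit choice explicit: the same byte count reads as about 30.7 GB in binary units and about 33 GB in decimal units. The 30 GB connectome figure is the text's representative estimate, not a measured value.

```python
GENOME_MB = 739.1       # genome size from the calculation above (1 MB = 1024**2 bytes)
CONNECTOME_GB = 30      # representative estimate chosen in the text (1 GB = 1024 MB)

connectome_mb = CONNECTOME_GB * 1024
total_mb = GENOME_MB + connectome_mb
total_bytes = total_mb * 1024**2

print(f"total = {total_mb:,.1f} MB")                                  # 31,459.1 MB
print(f"      = {total_mb / 1024:.1f} GB (binary, 1 GB = 1024 MB)")     # 30.7
print(f"      = {total_bytes / 1e9:.1f} GB (decimal, 1 GB = 10**9 B)")  # 33.0
```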

Storage Media Requirements:

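Now, let’s consider how this combined data volume maps onto common optical storage media. DVD and Blu-ray capacities are marketed in decimal units (1 GB = $10^{9}$ bytes), so they are converted to the binary MB used above before dividing; the nominal 700 MB of a CD-R is already close to binary megabytes.
- CD-R (700 MB capacity): $31{,}459.1\ \text{MB} \div 700\ \text{MB per CD-R} \approx 44.94$, so 45 CD-Rs are required.
- DVD-R (4.7 GB = $4.7\times10^{9}$ bytes ≈ 4482.3 MB capacity): $31{,}459.1\ \text{MB} \div 4482.3\ \text{MB per DVD-R} \approx 7.02$, so 8 DVD-Rs are required.
- BD-R (single-layer, 25 GB ≈ 23,841.9 MB capacity): $31{,}459.1\ \text{MB} \div 23{,}841.9\ \text{MB per disc} \approx 1.32$, so 2 single-layer Blu-ray discs are required.
- BD-R (dual-layer, 50 GB ≈ 47,683.7 MB capacity): $31{,}459.1\ \text{MB} \div 47{,}683.7\ \text{MB per disc} \approx 0.66$, so 1 dual-layer Blu-ray disc is enough.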
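The disc counts can be reproduced with a ceiling division over byte counts. This sketch assumes nominal marketed capacities (decimal gigabytes for DVD and Blu-ray, binary megabytes for the 700 MB CD-R) and ignores file-system overhead, so a real archive could need slightly more media.

```python
import math

TOTAL_BYTES = 31_459.1 * 1024**2   # combined dataset from the text, MB -> bytes

# Nominal capacities in bytes (assumed; formatted capacity differs slightly).
DISCS = {
    "CD-R (700 MB)": 700 * 1024**2,
    "DVD-R (4.7 GB)": 4.7e9,
    "BD-R single-layer (25 GB)": 25e9,
    "BD-R dual-layer (50 GB)": 50e9,
}

for name, capacity in DISCS.items():
    ratio = TOTAL_BYTES / capacity
    # Any fractional remainder still needs a whole extra disc.
    print(f"{name}: {ratio:.2f} -> {math.ceil(ratio)} disc(s)")
```

Run as-is, this reports 45 CD-Rs, 8 DVD-Rs, 2 single-layer and 1 dual-layer Blu-ray disc. Working in bytes avoids mixing binary and decimal prefixes, which is the easiest place for this kind of estimate to go wrong.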
In conclusion, while the fundamental genetic sequence is relatively compact in terms of digital storage, the complexity captured by advanced neuroimaging modalities increases the data footprint roughly fortyfold (30,720 MB versus 739.1 MB in this estimate). Storing a combined dataset of one individual’s genome and a representative neuro-connectome moves from requiring multiple DVDs to fitting comfortably on a single dual-layer Blu-ray disc. This highlights the scale difference between the relatively static genetic blueprint and the highly complex, spatially and temporally resolved data representing brain connectivity.