Sapiens Garden
HumanGenomeMeta =
⟨
Alphabet Σ = {A, C, G, T};
GenomeLength:
n ≈ 3.1×10^9 // base pairs, haploid nuclear genome, reference GRCh38
Composition:
P(A) ≈ 0.295
P(T) ≈ 0.295
P(C) ≈ 0.205
P(G) ≈ 0.205
GC = P(G)+P(C) ≈ 0.41
AT = P(A)+P(T) ≈ 0.59
Information:
Entropy H ≈ 1.9–2.0 bits/base // ≈ 1.98 is the zeroth-order value from Composition; local sequence correlations lower it
Structure:
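Chromosomes = 22 autosomes + X, Y // 24 distinct nuclear chromosomes in the reference
Karyotype = 46,XX or 46,XY // diploid: 22 autosome pairs + 2 sex chromosomes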
Repeats:
RepetitiveFraction ≈ 0.5 // LINE, SINE/Alu, LTR and other repeats
SequenceModel:
Order k = 2 // second-order Markov chain over Σ
For all u,v,x ∈ Σ:
P(s_i = x | s_{i-2} = u, s_{i-1} = v) = T[u,v,x]
Constraint: T[u,v,x] ≥ 0 and ∑_x T[u,v,x] = 1 for every context (u,v)
// Transition tensor T (4×4×4) is estimated from the human reference genome (Homo sapiens, GRCh38)
VariationModel:
SNP_rate ≈ 1×10^-3 per base // ~1 SNP per 1000 base pairs between humans
Indel_rate ≈ 1×10^-4 per base
N_SNP ~ Poisson(SNP_rate × n)
N_Indel ~ Poisson(Indel_rate × n)
MutationMatrix M(a→b), a,b ∈ Σ, a ≠ b // probability that a substituted base a becomes b; ∑_{b≠a} M(a→b) = 1 for each a
⟩
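
The quantities in the specification can be checked with a minimal Python sketch. It computes the zeroth-order entropy from the Composition values (about 1.98 bits/base), samples from an order-2 chain, and returns the Poisson means of the VariationModel. Everything named here is illustrative and not part of the specification: the function names, the two-base seed, and above all `placeholder_transitions`, which fills T with the base composition for every context because the real GRCh38 estimates are not given. The context order (u = base two positions back, v = previous base) is also an assumption.

```python
import math
import random

# Base composition copied from the Composition block of the spec.
COMPOSITION = {"A": 0.295, "C": 0.205, "G": 0.205, "T": 0.295}


def shannon_entropy(probs):
    """Zeroth-order Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def placeholder_transitions(composition):
    """Build a stand-in T[(u, v)][x] in which every context uses the same
    base composition. ASSUMPTION: a real T would be estimated from
    trinucleotide counts in GRCh38; this one carries no context signal."""
    return {(u, v): dict(composition) for u in composition for v in composition}


def check_rows(T, tol=1e-9):
    """Verify the constraint sum_x T[u,v,x] = 1 for every context (u, v)."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in T.values())


def sample_sequence(T, length, rng, start="AC"):
    """Sample from the order-2 chain: each new base depends on the previous two.
    ASSUMPTION: `start` is an arbitrary two-base seed, not part of the spec."""
    seq = list(start)
    while len(seq) < length:
        row = T[(seq[-2], seq[-1])]
        bases = list(row)
        seq.append(rng.choices(bases, weights=[row[b] for b in bases], k=1)[0])
    return "".join(seq[:length])


def expected_variant_counts(n, snp_rate=1e-3, indel_rate=1e-4):
    """Poisson means for N_SNP and N_Indel over a region of n bases."""
    return {"snp": snp_rate * n, "indel": indel_rate * n}


if __name__ == "__main__":
    print(f"H0 = {shannon_entropy(COMPOSITION):.3f} bits/base")
    T = placeholder_transitions(COMPOSITION)
    print("rows normalized:", check_rows(T))
    print(sample_sequence(T, 60, random.Random(0)))
    print(expected_variant_counts(1_000_000))
```

With the placeholder T the sampled sequence is statistically identical to independent draws from the composition. Swapping in context-specific rows (for example, a lower probability of G after C, reflecting CpG depletion) is what would make the order-2 model differ from the zeroth-order one. The last call uses a 1 Mb window purely as an example; passing the full genome length n gives the genome-wide Poisson means.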