Sapiens Garden

HumanGenomeMeta =
⟨
  Alphabet Σ = {A, C, G, T};

  GenomeLength:
    n ≈ 3.1×10^9  // haploid base pairs of nuclear DNA, reference GRCh38 (gaps included)

  Composition:
    P(A) ≈ 0.295
    P(T) ≈ 0.295
    P(C) ≈ 0.205
    P(G) ≈ 0.205
    GC = P(G)+P(C) ≈ 0.41
    AT = P(A)+P(T) ≈ 0.59

  Information:
    Entropy H ≈ 1.9–2.0 bits/base  // zero-order value from Composition ≈ 1.98; local sequence correlations lower it slightly

  Structure:
    Chromosomes = 22 autosomes + X, Y  // 24 distinct chromosomes
    Karyotype = 46,XX or 46,XY  // diploid: 22 autosome pairs + 2 sex chromosomes

  Repeats:
    RepetitiveFraction ≈ 0.5  // LINE, SINE/Alu, LTR and other repeats

  SequenceModel:
    Order k = 2  // second-order Markov chain over Σ

    For all u,v ∈ Σ:
      For all x ∈ Σ:
        P(x | u,v) = T[u,v,x]  // probability of base x given the two preceding bases u, v
      Constraint: ∑_{x ∈ Σ} T[u,v,x] = 1

    // Transition table T (16 contexts × 4 bases) is estimated from the Homo sapiens reference genome (GRCh38)

  VariationModel:
    SNP_rate   ≈ 1×10^-3 per base   // ~1 SNP per 1000 base pairs between two human genomes
    Indel_rate ≈ 1×10^-4 per base

    N_SNP   ~ Poisson(SNP_rate × n)
    N_Indel ~ Poisson(Indel_rate × n)

    MutationMatrix M(a→b), a,b ∈ Σ, a ≠ b: probability that a substitution at base a yields b; ∑_{b≠a} M(a→b) = 1

⟩
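
The Composition, SequenceModel and VariationModel fields above can be exercised with a minimal Python sketch. It is an illustration under stated assumptions, not the page's own implementation. All function names are hypothetical. The transition table here is a placeholder that reuses the base composition for every context (the spec only says T is estimated from GRCh38 and does not give its values). M(a→b) is assumed uniform over the three other bases, and the sequence length is a toy value far below the real genome size.

```python
import math
import random
from itertools import product

SIGMA = "ACGT"
# Composition values copied from the spec above.
COMPOSITION = {"A": 0.295, "T": 0.295, "C": 0.205, "G": 0.205}
SNP_RATE, INDEL_RATE = 1e-3, 1e-4  # per base, from VariationModel


def shannon_entropy(probs):
    """Zero-order Shannon entropy in bits per base."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def placeholder_T(composition):
    """Hypothetical T[u,v,x]: every context (u, v) reuses the base composition.
    A real table would come from trinucleotide counts (see estimate_T)."""
    return {ctx: dict(composition) for ctx in product(SIGMA, repeat=2)}


def estimate_T(seq, pseudocount=1.0):
    """Estimate T[u,v,x] from trinucleotide counts; seq must contain only A, C, G, T."""
    counts = {ctx: dict.fromkeys(SIGMA, pseudocount)
              for ctx in product(SIGMA, repeat=2)}
    for u, v, x in zip(seq, seq[1:], seq[2:]):
        counts[(u, v)][x] += 1
    # Normalise each context row so that sum_x T[u,v,x] = 1.
    return {ctx: {x: c / sum(row.values()) for x, c in row.items()}
            for ctx, row in counts.items()}


def sample_sequence(T, length, rng):
    """Draw a sequence from the second-order chain, starting from a uniform random context."""
    seq = [rng.choice(SIGMA), rng.choice(SIGMA)]
    while len(seq) < length:
        row = T[(seq[-2], seq[-1])]
        seq.append(rng.choices(SIGMA, weights=[row[x] for x in SIGMA])[0])
    return "".join(seq[:length])


def sample_poisson(lam, rng):
    """Exact Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam].
    Runs in O(lam) time, so it suits toy lengths, not the full genome."""
    k, t = 0, rng.expovariate(1.0)
    while t <= lam:
        k += 1
        t += rng.expovariate(1.0)
    return k


def apply_snps(seq, n_snp, rng):
    """Substitute n_snp distinct positions; M(a→b) is assumed uniform over b != a."""
    bases = list(seq)
    for i in rng.sample(range(len(bases)), n_snp):
        bases[i] = rng.choice([b for b in SIGMA if b != bases[i]])
    return "".join(bases)


rng = random.Random(0)
toy_n = 100_000  # toy length, far below the real genome size
reference = sample_sequence(placeholder_T(COMPOSITION), toy_n, rng)
T_hat = estimate_T(reference)                      # re-estimate T from the sample
n_snp = sample_poisson(SNP_RATE * toy_n, rng)      # N_SNP ~ Poisson(SNP_rate × n)
n_indel = sample_poisson(INDEL_RATE * toy_n, rng)  # N_Indel ~ Poisson(Indel_rate × n)
individual = apply_snps(reference, n_snp, rng)
print(f"H0 = {shannon_entropy(COMPOSITION):.3f} bits/base")
print(f"N_SNP = {n_snp}, N_Indel = {n_indel}")
```

The zero-order entropy computed from the listed composition is about 1.98 bits/base, which sits inside the 1.9–2.0 range given under Information. Indels are only counted here, not applied, because the spec gives their rate but not their length distribution.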