A probabilistic framework for knowledge acquisition, separating simple information intake from insight-generating knowledge that drives exponential growth.
A mathematical framework formalising the multiplicative growth patterns observed across expert learning domains — distinguishing simple information intake from insight-generating integration, where existing knowledge catalyses understanding of new information. Closed-form solutions cover all three growth regimes; a stochastic extension models individual-level variance; optimal control proves front-loaded study schedules yield 55–96% higher cumulative knowledge than constant pacing; and an information-theoretic result ties the leverage coefficient $\alpha$ to the mutual information between new and existing knowledge.
When a mathematician learns a new theorem, they don't simply add a fact: the theorem reveals connections, new approaches and patterns across domains. Each connection is additional knowledge beyond the original input. Every learning event has probability $p$ of triggering a "eureka moment" that compounds on existing knowledge, and the more you know, the more likely the next insight: a self-reinforcing cycle.
In plain terms: knowledge grows when new material sticks (r), some of it sparks extra insight (p), and some of it fades from memory (δ). The equations below make that precise.
| Symbol | Name | Description |
|---|---|---|
| K(t) | Knowledge base | Total concepts, facts, skills or connections at time t |
| r(t) | Learning rate | New learning opportunities encountered per unit of time |
| p(t) | Insight probability | Probability that a learning event generates extra derivative ideas |
| I | Insight size | Random variable: number of extra insights per insight event |
| δ | Forgetting rate | Proportional decay: fraction of current knowledge lost per unit of time |
| α | Knowledge leverage | How much each unit of existing knowledge raises future insight probability |
| λ | Growth coefficient | $r\alpha \cdot \mathbb{E}[I] - \delta$ — determines whether growth accelerates or plateaus |
The sign of $\lambda$ determines your entire learning trajectory.
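To see why the sign matters, here is a minimal sketch. It assumes, beyond what the symbol table states, that insight probability rises linearly with knowledge, $p = p_0 + \alpha K$, so that the deterministic dynamics reduce to $dK/dt = c + \lambda K$ with $c = r(1 + p_0\,\mathbb{E}[I])$. That linear form, the function names and the starting value `k0` are illustrative assumptions, not the framework's stated equations.

```python
import math


def growth_coefficient(r, alpha, mean_insight, delta):
    """lambda = r * alpha * E[I] - delta, as defined in the symbol table."""
    return r * alpha * mean_insight - delta


def regime(lam, tol=1e-12):
    """Classify the growth regime from the sign of lambda."""
    if lam > tol:
        return "accelerating"
    if lam < -tol:
        return "plateau"
    return "linear"


def knowledge_closed_form(t, r, alpha, mean_insight, delta, p0, k0=0.0):
    """K(t) for the ASSUMED linear ODE dK/dt = c + lambda * K.

    c = r * (1 + p0 * E[I]) is plain intake plus baseline insight.
    """
    lam = growth_coefficient(r, alpha, mean_insight, delta)
    c = r * (1.0 + p0 * mean_insight)
    if abs(lam) < 1e-12:
        # lambda = 0: knowledge grows linearly.
        return k0 + c * t
    # lambda != 0: exponential growth (lambda > 0) or saturation (lambda < 0).
    return (k0 + c / lam) * math.exp(lam * t) - c / lam


baseline = dict(r=0.5, alpha=0.02, mean_insight=2.0, delta=0.05, p0=0.20)
lam = growth_coefficient(0.5, 0.02, 2.0, 0.05)
print(regime(lam), knowledge_closed_form(200.0, **baseline))
```

Under this assumed form, a negative $\lambda$ makes $K(t)$ level off at $c/|\lambda|$, a zero $\lambda$ gives a straight line, and a positive $\lambda$ gives exponential growth.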
At baseline ($r=0.5$, $\alpha=0.02$, $\mathbb{E}[I]=2.0$, $\delta=0.05$, $p_0=0.20$), $\lambda = -0.03$, which puts the system in the plateau regime. Which parameter pushes the system fastest towards the accelerating regime?
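One way to approach that question is to differentiate $\lambda = r\alpha\,\mathbb{E}[I] - \delta$ with respect to each parameter at the baseline. The sketch below does this analytically; the function and variable names are illustrative, and the comparison is per unit change of each parameter, which is an assumption about how "fastest" is meant.

```python
def growth_coefficient(r, alpha, mean_insight, delta):
    """lambda = r * alpha * E[I] - delta."""
    return r * alpha * mean_insight - delta


def sensitivities(r, alpha, mean_insight, delta):
    """Analytic partial derivatives of lambda with respect to each parameter."""
    return {
        "r": alpha * mean_insight,      # d(lambda)/d(r)
        "alpha": r * mean_insight,      # d(lambda)/d(alpha)
        "mean_insight": r * alpha,      # d(lambda)/d(E[I])
        "delta": -1.0,                  # d(lambda)/d(delta)
    }


baseline = dict(r=0.5, alpha=0.02, mean_insight=2.0, delta=0.05)
lam = growth_coefficient(**baseline)
grads = sensitivities(**baseline)
ratio_alpha_to_r = grads["alpha"] / grads["r"]

for name, value in grads.items():
    print(f"d(lambda)/d({name}) = {value:+.3f}")
print(f"alpha vs r: {ratio_alpha_to_r:.1f}x")
```

At the baseline the derivative with respect to $\alpha$ is 1.0 and the derivative with respect to $r$ is 0.04, a ratio of 25. Note that $p_0$ does not enter $\lambda$ at all, and that this ratio is in absolute units: a proportional change in $r$ or in $\alpha$ moves $\lambda$ by the same amount, since both enter through the product $r\alpha\,\mathbb{E}[I]$.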
The deterministic model captures average behaviour. In reality, individual trajectories vary widely: some "take off", while others with identical parameters stagnate. This variance is modelled with a Cox–Ingersoll–Ross (CIR) stochastic differential equation (SDE). An SDE describes how a quantity evolves under randomness, and CIR is a well-known form of it used in economics.
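As a rough illustration of how such individual-level variance can arise, the sketch below simulates a CIR-type process with an Euler–Maruyama scheme. It assumes a drift of the form $c + \lambda K$ and a diffusion term $\sigma\sqrt{K}$, which is the defining feature of a CIR process. The drift form, the value of `sigma` and the function name are all hypothetical choices for illustration, not the framework's own specification.

```python
import math
import random


def simulate_cir_knowledge(c, lam, sigma, k0, horizon, dt, seed=0):
    """Euler-Maruyama path of dK = (c + lam*K) dt + sigma*sqrt(K) dW.

    Uses full truncation: drift and diffusion are evaluated at max(K, 0)
    so the square root is always defined. Returned values are clipped at 0.
    """
    rng = random.Random(seed)
    steps = int(round(horizon / dt))
    k = k0
    path = [k]
    for _ in range(steps):
        k_pos = max(k, 0.0)
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        k = k + (c + lam * k_pos) * dt + sigma * math.sqrt(k_pos) * dw
        path.append(k)
    return [max(x, 0.0) for x in path]


# Identical parameters, different random seeds: the end points differ.
finals = [
    simulate_cir_knowledge(c=0.7, lam=-0.03, sigma=0.5, k0=0.0,
                           horizon=50.0, dt=0.01, seed=s)[-1]
    for s in range(5)
]
print([round(x, 2) for x in finals])
```

Each seed stands in for one learner. With the same `c`, `lam` and `sigma`, the paths still end at different knowledge levels, which is the qualitative behaviour the stochastic extension is meant to capture.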
The equations below describe how knowledge "grows" over time — you don't need to read the math to follow the conclusion.
Given fixed total effort $R = \int_0^T r(t)\,dt$, which learning schedule maximises cumulative knowledge $\int_0^T K(t)\,dt$? The answer follows from optimal control theory (Pontryagin et al., 1962).
| Schedule | Pattern | Rank | vs Constant | Mechanism |
|---|---|---|---|---|
| Front-loaded | r_max → 0 | 1st | +55–96% | Cumulative leverage — early K boosts all future p(t) |
| Spaced | r̄(1 + 0.8 sin) | 2nd | +4% | Reduced interference |
| Constant | r̄ | 3rd | ref. | Baseline |
| Back-loaded | 0 → r_max | 4th | −33% | Forgetting in idle period |
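
The direction of the ranking in the table can be explored with a small numerical experiment. The sketch below assumes the dynamics $dK/dt = r(t)\,(1 + p\,\mathbb{E}[I]) - \delta K$ with $p = p_0 + \alpha K$, uses linear ramps as one concrete reading of "r_max → 0" and "0 → r_max" with $r_{max} = 2\bar{r}$, and picks a horizon of 50 time units. All of these are assumptions, so the output is not expected to reproduce the table's percentages. The spaced schedule is omitted because its oscillation period is not given.

```python
R_BAR = 0.5  # mean learning rate, taken from the baseline value of r


def constant(t, horizon):
    return R_BAR


def front_loaded(t, horizon):
    # Linear ramp from 2 * R_BAR down to 0 (same total effort as constant).
    return 2.0 * R_BAR * (1.0 - t / horizon)


def back_loaded(t, horizon):
    # Linear ramp from 0 up to 2 * R_BAR (same total effort as constant).
    return 2.0 * R_BAR * t / horizon


def cumulative_knowledge(schedule, horizon=50.0, dt=0.01, alpha=0.02,
                         mean_insight=2.0, delta=0.05, p0=0.20):
    """Forward-Euler integral of K(t) over [0, horizon] under the ASSUMED ODE."""
    steps = int(round(horizon / dt))
    k, total = 0.0, 0.0
    for i in range(steps):
        r = schedule(i * dt, horizon)
        p = min(1.0, p0 + alpha * k)  # keep the probability at or below 1
        dk = r * (1.0 + p * mean_insight) - delta * k
        total += k * dt
        k += dk * dt
    return total


results = {
    "front_loaded": cumulative_knowledge(front_loaded),
    "constant": cumulative_knowledge(constant),
    "back_loaded": cumulative_knowledge(back_loaded),
}
for name, value in results.items():
    gain = 100.0 * (value / results["constant"] - 1.0)
    print(f"{name:>12}: {value:8.1f}  ({gain:+.0f}% vs constant)")
```

Under these assumptions the front-loaded ramp accumulates more knowledge than both the constant and the back-loaded schedule, because knowledge acquired early is available for longer and raises $p$ for every later learning event.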
The coefficient $\alpha$ is a 25× stronger lever than the learning rate $r$ (see Sensitivity Analysis). But what determines it physically?
| Domain | α (estimate) | δ (estimate) | p₀ | Regime (K=50) |
|---|---|---|---|---|
| Chess (grandmaster) | 0.06–0.10 | 0.02–0.04 | 0.35–0.55 | Accelerating |
| Clinical medicine | 0.03–0.06 | 0.03–0.06 | 0.20–0.40 | Borderline |
| Intro. mathematics | 0.02–0.05 | 0.04–0.08 | 0.15–0.30 | Plateau → Linear |
| Factual recall | 0.01–0.02 | 0.06–0.10 | 0.10–0.20 | Plateau |
A theoretical framework is only valuable if it can be falsified. The model produces three classes of falsifiable predictions, testable with suitable longitudinal data.
The framework generates concrete, testable predictions about which strategies should be most effective — those that push $\lambda$ above zero by maximising $p$ and $\alpha$.
The model makes significant simplifying abstractions. Acknowledging them doesn't weaken the model; it delineates precisely where the model holds and opens directions for extension.
The question "how to learn more effectively" has a quantitative answer: maximise α, not r. The sign of λ = rαE[I] − δ decides whether knowledge self-reinforces or hits a ceiling. Closed-form solutions, bang-bang optimal control with a +55–96% gain from front-loading, and three classes of falsifiable predictions make the framework testable, not merely descriptive.