Let the Speedrun Search Itself
Eval-gated, config-only autoresearch on the canonical super fp8 lane
Reproducing Canon, mHC, and Engram
A research narrative: false starts, PhysicsLM4 alignment, and one real polysemy failure
RDEP
Keeping sparse expert compute hot across a whole NVLink fabric
Do MoE Experts Need Different Learning Rates?
Why Moonlet's old 15x expert-LR rule overshoots in bf16 AdamW
The Atlas Hypothesis
Why output-only dashboards cannot name what pretraining built, and what a real receipt would have to measure
Super-4096
Loss keeps improving while routing collapses under extreme sparsity
NVFP4 Dynamics
Why our NVFP4 recipe lagged BF16, and what actually closed almost all of the gap
What Are We Holding Fixed?
Dense-vs-MoE comparisons depend on the fairness contract; a failed `#420` transfer exposed the real problem
The Speedrun Loop
A small-model speedrun is our fastest honest instrument for architecture research
Make It Measurable
What to track when loss isn't enough
What We Built
A production-grade MoE training system, because reproducibility is the experiment
Why Training MoEs Is So Hard
Three failure modes that make frontier MoE training qualitatively different