Generalization Dynamics of LM Pre-training

An AI has a limited amount of “capacity” (brainpower). Early in training, it develops quick, shallow circuits to memorize data because that’s the easiest way to get the right answer. Later, it develops complex circuits for actual reasoning. Because space is limited, these two internal systems are constantly competing for control. Whichever type of data the AI happens to be reading in a specific moment determines which circuit wins the battle.


People typically assume that LMs stably mature from pattern-matching parrots to generalizable intelligence during pre-training. We build a toy eval suite and show this mental model is wrong: throughout pre-training, LMs frequently and suddenly hop between parrot-like and intelligence-like modes, i.e. distinct algorithms implemented by distinct circuits. We call this mode-hopping. Across our suite, LMs can suddenly latch onto memorized or in-context patterns instead of in-context learning, use System 1 instead of System 2 thinking, pick up what sounds true instead of what is true, and fail at multi-hop persona QA, out-of-context reasoning, and emergent misalignment, and then just as suddenly revert and generalize. Mode-hopping is not explained by standard optimization dynamics: it is locally stable and cannot be fixed by checkpoint averaging. We instead think of it as a capacity allocation problem: in a capacity-bounded model, generalizable circuits must compete with the shallow ones learned early in training, and the data in each pre-training window decides which circuits win. Our suite provides a cheap set of pre-training monitors and a new lens on generalization. Building upon our insights, we demonstrate three applications: (i) selecting intermediate pre-training checkpoints that strongly generalize reasoning and alignment, better than the final pre- or mid-training checkpoints, (ii) selecting pre-training data that controls and stabilizes generalization dynamics, and (iii) testing prior generalization predictors, falsifying the monolithic belief that “simpler solutions generalize better.”
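As a rough illustration of how per-checkpoint monitor scores could be used, the sketch below flags abrupt score changes between consecutive checkpoints and picks the best-scoring intermediate checkpoint. The function names, the 0.3 threshold, and the score values are all hypothetical; the abstract does not specify how hops are detected or how checkpoints are ranked.

```python
from typing import List, Tuple


def find_mode_hops(scores: List[float], threshold: float = 0.3) -> List[int]:
    """Return indices i where the score changes by more than `threshold`
    between checkpoint i-1 and checkpoint i (an assumed hop criterion)."""
    return [
        i
        for i in range(1, len(scores))
        if abs(scores[i] - scores[i - 1]) > threshold
    ]


def best_checkpoint(steps: List[int], scores: List[float]) -> Tuple[int, float]:
    """Return (training step, score) of the highest-scoring checkpoint."""
    idx = max(range(len(scores)), key=scores.__getitem__)
    return steps[idx], scores[idx]


# Hypothetical generalization scores from one monitor, one per saved checkpoint.
steps = [1000, 2000, 3000, 4000, 5000]
scores = [0.20, 0.80, 0.25, 0.85, 0.60]

hops = find_mode_hops(scores)
best = best_checkpoint(steps, scores)
print("hop indices:", hops)
print("best checkpoint:", best)
```

In this made-up trace the final checkpoint is not the best one, which is the situation application (i) is meant to exploit.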

Building general AI without generalization is doable but meh. We want an intelligence that learns deep, transferable structure, not a parrot that matches shallow patterns. Real generalization would unblock many of today’s key open problems: data-efficient (online) learning, shortcut learning, transferring capabilities from verifiable domains (math, coding) to broader non-verifiable yet economically valuable domains, and maintaining a coherent character that truly aligns with human values.

The distinction between parrots and intelligence is computational. Parrots repeat in-context patterns; intelligence infers in-context functions. Parrots encode a persona as bags of disconnected facts and traits; intelligence learns a shared persona representation that connects them all. Parrots memorize reasoning steps; intelligence forms general reasoning circuits for entity tracking, backtracking, or even for highly abstract concepts like truth.

Designing better quantum circuits with AI

Researchers from the group of theoretical physicist Hans Briegel have collaborated with NVIDIA to develop an AI method that automatically generates efficient quantum circuits, addressing a key bottleneck in making quantum computers practically useful.

The work was published in Machine Learning: Science and Technology, in a paper titled “Synthesis of discrete–continuous quantum circuits with multimodal diffusion models.”

Before a quantum computer can perform any useful task, a quantum algorithm needs to be translated into a sequence of elementary quantum operations, known as quantum gates. Writing these quantum circuits efficiently is one of the hardest open problems in the field.
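To make "a sequence of elementary quantum gates" concrete, here is a minimal sketch in plain Python that applies a two-gate circuit (a Hadamard followed by a CNOT) to two qubits and produces a Bell state. It is a textbook illustration only, not the method from the paper; the helper names `kron` and `apply_gate` are made up for this example.

```python
import math


def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def apply_gate(matrix, state):
    """Multiply a gate matrix by a state vector."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]


INV_SQRT2 = 1 / math.sqrt(2)
H = [[INV_SQRT2, INV_SQRT2], [INV_SQRT2, -INV_SQRT2]]  # Hadamard gate
I2 = [[1, 0], [0, 1]]                                   # single-qubit identity
CNOT = [                                                # control: qubit 0, target: qubit 1
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

# The circuit is just an ordered list of gates acting on the 2-qubit state.
circuit = [kron(H, I2), CNOT]

state = [1, 0, 0, 0]  # start in |00>
for gate in circuit:
    state = apply_gate(gate, state)

print(state)  # amplitudes of |00>, |01>, |10>, |11>
```

Finding a short gate list like `circuit` for a given target operation is the synthesis problem the article describes; the difficulty grows quickly with qubit count and gate set.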

New quantum algorithm solves “impossible” materials problem in seconds

A new quantum-inspired algorithm has cracked a problem so massive that conventional supercomputers struggle to even approach it. Researchers used the method to simulate extraordinarily complex quantum materials known as quasicrystals, opening the door to powerful new quantum devices and ultra-efficient electronics. The work could help scientists design advanced topological qubits and materials for future quantum computers.

Quobly Toolbox Explores Quantum Phase Estimation Pipeline With Tensor Networks

An international collaboration between a French quantum startup and a major Taiwanese electronics manufacturer has yielded a new open-source tool for exploring a critical area of quantum computing. Quobly and Taiwan’s Hon Hai Research Institute, the R&D arm of Foxconn, jointly released a numerical toolbox dedicated to the Quantum Phase Estimation (QPE) algorithm, described as a cornerstone of fault-tolerant quantum computing with major applications in quantum chemistry and materials science. While QPE’s theoretical benefits are understood, simulating its practical resource needs has proven difficult; the toolbox aims to bridge this gap by allowing researchers to explore implementations and their implications. The tool focuses on practical, interpretable numerical experiments, enabling full circuit executions for up to 20 qubits and circuits ranging from 1,000 to 100,000 gates on standard laptops.
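For a sense of scale, the back-of-the-envelope sketch below computes the memory a dense state vector would need, assuming one 16-byte complex amplitude per basis state. This is an assumption made purely for illustration: the toolbox is described as built on tensor networks, which can represent states more compactly.

```python
BYTES_PER_AMPLITUDE = 16  # assumed: one double-precision complex number


def statevector_bytes(n_qubits: int) -> int:
    """Memory for a dense state vector with 2**n_qubits amplitudes."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE


for n in (20, 30, 40):
    mib = statevector_bytes(n) / 2 ** 20
    print(f"{n} qubits: {mib:,.0f} MiB")
```

At 20 qubits the dense vector fits comfortably in laptop memory; each additional qubit doubles the requirement.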

Quantum Phase Estimation Toolbox for Molecular Systems

While the theoretical underpinnings of QPE are well established, simulating its practical demands has proven a significant hurdle, limiting exploration beyond simplified models. The toolbox addresses this gap by offering a platform for practical, interpretable numerical experiments, allowing scientists to investigate QPE implementations without requiring access to full-scale quantum hardware, which is currently unavailable. Built upon advanced tensor network techniques and the open-source quimb library, the toolbox facilitates the preparation of initial states using DMRG and matrix product states, and allows encoding of molecular Hamiltonians into quantum circuits through methods like Trotterization and qubitization. Researchers can directly compare standard QPE with the single-ancilla Robust Phase Estimation (RPE) method, analyzing circuit depth, gate counts, and potential error sources.
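Trotterization can be illustrated on a toy single-qubit Hamiltonian. The sketch below assumes H = X + Z (Pauli matrices), approximates exp(-iHt) by n repeated steps of exp(-iXt/n)·exp(-iZt/n), and compares the result with the exact evolution. It is a generic first-order Trotter example in plain Python, not the toolbox's code or API, and all function names are hypothetical.

```python
import math

X = [[0, 1], [1, 0]]   # Pauli X
Z = [[1, 0], [0, -1]]  # Pauli Z


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def involution_exp(p, theta):
    """exp(-i*theta*P) for a matrix P with P*P = I: cos(theta)*I - i*sin(theta)*P."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * (1 if i == j else 0) - 1j * s * p[i][j] for j in range(2)]
            for i in range(2)]


def exact_evolution(t):
    """exp(-i*(X+Z)*t). Since (X+Z)^2 = 2*I, (X+Z)/sqrt(2) squares to I."""
    r = math.sqrt(2)
    unit = [[(X[i][j] + Z[i][j]) / r for j in range(2)] for i in range(2)]
    return involution_exp(unit, r * t)


def trotter_evolution(t, n):
    """First-order Trotter approximation with n steps."""
    step = matmul(involution_exp(X, t / n), involution_exp(Z, t / n))
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = matmul(result, step)
    return result


def max_error(a, b):
    """Largest entry-wise absolute difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


for n in (1, 10, 100):
    print(n, max_error(trotter_evolution(1.0, n), exact_evolution(1.0)))
```

More Trotter steps reduce the approximation error but lengthen the circuit, which is the kind of depth-versus-accuracy trade-off such a toolbox lets researchers examine.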

String theory is uniquely derived from basic assumptions about the universe, physicists show

If you could take an apple and break it into smaller and smaller parts, you would find molecules, then atoms, followed by subatomic particles like protons and the quarks and gluons that make them up. You might think you hit the bottom, but, according to string theorists, if you keep going to even smaller scales—about a billion billion times smaller than a proton—you will find more: tiny vibrating strings.

Developed in the 1960s, string theory proposes that everything in the universe is made from invisible strings. The theory arose as a possible solution to the problem of “quantum gravity,” the quest to align quantum mechanics, which describes our world at the smallest scales, with the general theory of relativity, which explains how our universe works on the largest scales (and includes gravity). Researchers have tried to reconcile the two theories—asking, for example, how gravity behaves in the quantum realm—but their equations go berserk, or in mathematical terms, go to infinity.

String theory is a mathematical solution that tames the unruly infinities. It purports that all particles, including the graviton—the hypothetical particle believed to convey the force of gravity—are generated by very small vibrating strings. The math behind string theory requires the strings to vibrate in at least 10 dimensions, rather than the four we live in (three for space and one for time), which is one of the reasons some scientists are not convinced that string theory is correct. But perhaps the biggest challenge for the theory is the ultrahigh energies required for testing it: Such an experiment would require a particle collider the size of a galaxy.

Engineered proteins store digital files with 30 times density at one-tenth cost

Massive volumes of digital data are generated every day from AI training, big data analytics and smart devices. As conventional hard drives and cloud storage are increasingly constrained by high costs, limited capacity, high power consumption and short lifespans, molecular data storage has emerged as a breakthrough storage alternative.

Researchers at The Hong Kong Polytechnic University (PolyU) have pioneered a method that uses engineered proteins to store digital data and, for the first time, completed the full process from data storage to data retrieval in de novo designed unnatural proteins.

This demonstrates the potential of establishing a sustainable protein-based storage framework with high storage capacity and high stability, offering a promising solution to the explosive global growth of AI-generated data.

String Theory Emerges from “Almost Nothing”

What is a physicist to do? One way they can probe the theory is to turn to a “bootstrap” approach, in which researchers start with certain assumptions they believe to be true about the universe, and then see what laws emerge out of those assumptions. In a new paper titled “Strings from Almost Nothing,” accepted for publication in Physical Review Letters, Caltech researchers, and their colleagues at New York University and Institut de Fisica d’Altes Energies in Barcelona, have done just that. From a couple of basic assumptions about how particles should scatter off one another at very high energies, they derived the elements of string theory.

Universal Bridge Theorem

We proved that our Universe was made from an AI algorithm.


What if spacetime itself is the result of a gigantic self-learning quantum neural network? 🤯🌌

A new framework called the Universal Bridge Theorem (UBT) proposes a deep equivalence between:

🧠 Neural network training
and
🌌 The evolution of spacetime geometry.

The proposal combines:
