
Midtraining Bridges Pretraining and Posttraining Distributions

Recently, many language models have been pretrained with a “midtraining” phase, in which higher-quality, often instruction-formatted data is mixed in at the end of pretraining. Despite the popularity of this practice, there is little scientific understanding of this phase of model training or why it is effective. In this work, we conduct the first systematic investigation of midtraining through controlled experiments with language models pretrained from scratch and fine-tuned on supervised fine-tuning datasets in different domains. We find that, when compared after supervised fine-tuning, the effectiveness of midtraining is highest in the math and code domains, where midtraining can best reduce the syntactic gap between pretraining and posttraining data. In these cases, midtraining consistently outperforms continued pretraining in both in-domain validation loss and pretraining data forgetting after posttraining. Using code midtraining as a case study, we conduct ablations on the starting time of the midtraining phase and the mixture weights of the midtraining data, and find that timing has a greater impact than mixture weights: earlier introduction of specialized data yields greater in-domain benefits while better preserving general language modeling. These findings establish midtraining as a domain adaptation technique that, compared to continued pretraining, yields better performance through reduced forgetting.
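The core idea, mixing specialized data into the data stream only after a chosen point in pretraining, can be made concrete with a toy sampling schedule. This is a minimal sketch, not the paper's actual implementation; the function name and the default values (start at 80% of training, 30% specialized weight) are illustrative assumptions.

```python
def mixture_weights(step, total_steps, midtrain_start_frac=0.8, special_weight=0.3):
    """Return sampling weights for (web, specialized) data at a given training step.

    Before the midtraining phase, batches are drawn only from general web text;
    once the phase begins, a fixed fraction of each batch comes from
    higher-quality, instruction-formatted or domain-specific data.
    """
    if step < midtrain_start_frac * total_steps:
        return {"web": 1.0, "specialized": 0.0}
    return {"web": 1.0 - special_weight, "specialized": special_weight}

# Early in training: pure web data.
early = mixture_weights(step=10, total_steps=100)
# In the midtraining phase: specialized data makes up 30% of each batch.
late = mixture_weights(step=90, total_steps=100)
```

The ablations described above correspond to sweeping `midtrain_start_frac` (timing) and `special_weight` (mixture weight); the paper's finding is that the former matters more.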

Artificial neurons replicate biological function for improved computer chips

Researchers at the USC Viterbi School of Engineering and School of Advanced Computing have developed artificial neurons that replicate the complex electrochemical behavior of biological brain cells.

The innovation, documented in Nature Electronics, is a leap forward in neuromorphic computing technology. It could reduce chip size and energy consumption by orders of magnitude, and could advance artificial general intelligence.

Unlike conventional digital processors or existing silicon-based neuromorphic chips that merely simulate neural activity, these devices physically embody or emulate the analog dynamics of their biological counterparts. Just as neurochemicals initiate brain activity, chemicals can be used to initiate computation in these neuromorphic (brain-inspired) chips. Because they physically replicate the biological process, they differ from prior artificial neurons, which were solely mathematical equations.

Unit-free theorem pinpoints key variables for AI and physics models

Machine learning models are designed to take in data, to find patterns or relationships within those data, and to use what they have learned to make predictions or to create new content. The quality of those outputs depends not only on the details of a model’s inner workings but also, crucially, on the information that is fed into the model.

Some models follow a brute force approach, essentially adding every bit of data related to a particular problem into the model and seeing what comes out. But a sleeker, less energy-hungry way to approach a problem is to determine which variables are vital to the outcome and only provide the model with information about those key variables.

Now, Adrián Lozano-Durán, an associate professor of aerospace at Caltech and a visiting professor at MIT, and MIT graduate student Yuan Yuan have developed a theorem that takes any number of possible variables and whittles them down, leaving only those that are most important. In the process, the method removes all units, such as meters and feet, from the underlying equations, making them dimensionless, something scientists require of equations that describe the physical world. The work can be applied not only to machine learning but to any model of a physical system.
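The flavor of this idea can be seen in classical dimensional analysis (in the spirit of the Buckingham π theorem), though the new theorem is more general than this sketch. Here, for the textbook pendulum example, variables are tracked as exponent vectors over (mass, length, time); a combination of variables is dimensionless exactly when its exponents sum to zero, and a variable that cannot enter any dimensionless group (here, the mass) is irrelevant to the model. All names here are illustrative, not from the paper.

```python
# Dimensions written as exponents of (mass, length, time).
DIMS = {
    "T": (0, 0, 1),    # pendulum period [s]
    "L": (0, 1, 0),    # pendulum length [m]
    "g": (0, 1, -2),   # gravitational acceleration [m/s^2]
    "m": (1, 0, 0),    # bob mass [kg]
}

def combined_dims(powers):
    """Dimensions of a product prod(var**p), as summed exponent tuples."""
    total = [0, 0, 0]
    for var, p in powers.items():
        for i, e in enumerate(DIMS[var]):
            total[i] += p * e
    return tuple(total)

# The group T * sqrt(g / L) is dimensionless: all units cancel.
assert combined_dims({"T": 1, "g": 0.5, "L": -0.5}) == (0, 0, 0)
# The mass m is the only variable carrying a mass dimension, so no
# dimensionless group can include it -- it would be discarded as an input.
assert combined_dims({"m": 1}) != (0, 0, 0)
```

A machine learning model for the pendulum would then need only the single dimensionless group, not all four raw variables.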

Researcher improves century-old equation to predict movement of dangerous air pollutants

A new method developed at the University of Warwick offers the first simple and predictive way to calculate how irregularly shaped nanoparticles—a dangerous class of airborne pollutant—move through the air.

Every day, we breathe in millions of airborne particles, including soot, dust, pollen, microplastics, viruses, and synthetic nanoparticles. Some are small enough to slip deep into the lungs and even enter the bloodstream, contributing to conditions such as heart disease, stroke, and cancer.

Most of these are irregularly shaped. Yet the mathematical models used to predict how these particles behave typically assume they are perfect spheres, simply because the equations are easier to solve. This makes it difficult to monitor or predict the movement of real-world, non-spherical—and often more hazardous—particles.
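The spherical assumption the article refers to shows up in classical results such as Stokes' law, which gives a closed-form settling speed only for a sphere. The sketch below illustrates that baseline; the material constants are typical illustrative values for air, not figures from the Warwick study, and real irregular particles deviate from this prediction.

```python
def stokes_settling_velocity(d, rho_p, rho_f=1.2, mu=1.8e-5, g=9.81):
    """Terminal settling speed (m/s) of a SPHERE of diameter d (m) in air,
    from Stokes' law: v = d^2 (rho_p - rho_f) g / (18 mu).

    Valid only at low Reynolds number and, crucially, only for spheres --
    the simplification that makes real, irregular particles hard to predict.
    rho_p, rho_f: particle and fluid densities (kg/m^3); mu: air viscosity (Pa s).
    """
    return d**2 * (rho_p - rho_f) * g / (18 * mu)

# A 1-micron soot-like sphere (density ~1800 kg/m^3) settles at a few tens
# of micrometers per second -- slow enough to stay airborne for hours.
v = stokes_settling_velocity(1e-6, 1800.0)
```

Note the quadratic dependence on diameter: doubling the size quadruples the settling speed, which is why the finest particles linger longest in the air.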

Gravitational wave events hint at ‘second-generation’ black holes

In a paper published in The Astrophysical Journal Letters, the international LIGO-Virgo-KAGRA Collaboration reports on the detection of two gravitational wave events in October and November of 2024 with unusual black hole spins. This observation adds an important new piece to our understanding of the most elusive phenomena in the universe.

Gravitational waves are “ripples” in spacetime that result from cataclysmic events in deep space, with the strongest waves produced by the collision of black holes.

Using sophisticated algorithmic techniques and mathematical models, researchers are able to reconstruct, from the analysis of gravitational signals, many physical features of the detected black holes, such as their masses, the distance of the event from Earth, and even the speed and direction of their rotation around their axis, called spin.

Mathematical proof unites two puzzling phenomena in spin glass physics

A fundamental link between two counterintuitive phenomena in spin glasses—reentrance and temperature chaos—has been mathematically proven for the first time. By extending the Edwards–Anderson model to include correlated disorder, researchers at Science Tokyo and Tohoku University provided the first rigorous proof that reentrance implies temperature chaos.

Spin glasses are magnetic materials in which atomic “spins,” or tiny magnetic moments, point in random directions rather than aligning neatly as in a regular magnet. These disordered spins can remain stable for extremely long periods of time, possibly even indefinitely. This frozen randomness gives rise to unusual physical properties not seen in any other physical system.

To describe spin glass behavior, physicists use models such as the Edwards–Anderson (EA) model, which simulates how spins interact in two or three dimensions—conditions that more closely reflect real-world systems than the well-studied mean-field model. Numerical studies of the EA model have uncovered two strange and counterintuitive phenomena: reentrant transitions and temperature chaos.
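The EA model itself is simple to state: spins s_i = ±1 on a lattice, with energy H = -Σ J_ij s_i s_j over neighboring pairs, where the couplings J_ij are random but frozen ("quenched disorder"). The numerical studies mentioned above typically use Monte Carlo simulation; the following is a minimal Metropolis sketch of the 2D ±J version, purely illustrative and unrelated to the rigorous proof in the paper.

```python
import math
import random

def ea_energy(spins, Jh, Jv, L):
    """Energy H = -sum_<ij> J_ij s_i s_j of the 2D Edwards-Anderson model
    on an L x L periodic lattice; Jh[i][j] couples (i,j)-(i,j+1), Jv couples (i,j)-(i+1,j)."""
    E = 0.0
    for i in range(L):
        for j in range(L):
            s = spins[i][j]
            E -= Jh[i][j] * s * spins[i][(j + 1) % L]
            E -= Jv[i][j] * s * spins[(i + 1) % L][j]
    return E

def metropolis_sweep(spins, Jh, Jv, L, beta, rng):
    """One Metropolis sweep at inverse temperature beta."""
    for _ in range(L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        # Local field on spin (i, j) from its four neighboring bonds.
        h = (Jh[i][j] * spins[i][(j + 1) % L]
             + Jh[i][(j - 1) % L] * spins[i][(j - 1) % L]
             + Jv[i][j] * spins[(i + 1) % L][j]
             + Jv[(i - 1) % L][j] * spins[(i - 1) % L][j])
        dE = 2 * spins[i][j] * h  # energy change if this spin flips
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            spins[i][j] *= -1

# Quenched +/-1 couplings: the randomness is drawn once, then only spins evolve.
rng = random.Random(42)
L = 8
Jh = [[rng.choice([-1, 1]) for _ in range(L)] for _ in range(L)]
Jv = [[rng.choice([-1, 1]) for _ in range(L)] for _ in range(L)]
spins = [[rng.choice([-1, 1]) for _ in range(L)] for _ in range(L)]
for _ in range(100):
    metropolis_sweep(spins, Jh, Jv, L, beta=1.5, rng=rng)
```

Temperature chaos refers to the equilibrium spin configuration of such a system reorganizing drastically under even tiny changes of `beta`; the new proof ties that behavior to reentrance in the phase diagram.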

Music of the Spheres and the Lessons of Pythagoras

I. Using simple mathematics, Pythagoras was able to describe the basis of almost all musical scales, including the pentatonic, the Western, the chromatic and the Arabic scales. This shows the power and excitement of science. For the first time, Pythagoras could answer the question, WHY? Why are these notes and scales special? The answer is that they are formed in a simple, systematic, and mathematical manner. Most importantly, Pythagoras showed that the notes are not random or arbitrary and that they could be understood on a deeper level.

II. Pythagoras's discoveries raise a deeper psychological question. Scales were first developed by ear: we and the Neanderthals chose these particular notes before there was any understanding of mathematics or physics. The notes were chosen simply because they were pleasing to the ear. But, as it turns out, the scales also follow basic mathematical constructs. So the question is, what does this say about our likes and emotions? Is there a mathematical/physical basis to them, as well?

III. The power of spectroscopy. What Pythagoras did was take a physical system (the musical scale), find its characteristic frequencies (pitches/notes), and find simple mathematical relationships between those frequencies (ratios of 3/2, for example). This process became a fundamental part of physics, and of modern physics in particular.
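The construction behind these scales can be reproduced in a few lines: stack perfect fifths (frequency ratio 3/2) and fold each result back into a single octave by halving. This is the standard Pythagorean tuning procedure, sketched here as an illustration of the 3/2 ratios mentioned above.

```python
from fractions import Fraction

def pythagorean_scale(n_notes):
    """Build frequency ratios by stacking perfect fifths (3/2),
    folding each ratio back into one octave [1, 2)."""
    ratios = []
    r = Fraction(1)
    for _ in range(n_notes):
        ratios.append(r)
        r *= Fraction(3, 2)
        while r >= 2:
            r /= 2
    return sorted(ratios)

# Five stacked fifths yield the ratios 1, 9/8, 81/64, 3/2, 27/16 --
# a pentatonic scale, derived from nothing but a single ratio.
pentatonic = pythagorean_scale(5)
```

Continuing the stacking to 7 or 12 notes produces the Western diatonic and chromatic scales in the same systematic way, which is precisely the "WHY" Pythagoras could answer.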

Statistical mechanics method helps machines better understand complex systems

A study by University of Hawaiʻi researchers is advancing how we learn the laws that govern complex systems—from predator-prey relationships to traffic patterns in cities to how populations grow and shift—using artificial intelligence (AI) and physics.

The research, published in Physical Review Research, introduces a new method based on statistical mechanics to improve the discovery of equations directly from noisy real-world data. Statistical mechanics is a branch of physics that explains how collective behavior emerges from individual particles, such as how the random motion of gas molecules leads to predictable changes in pressure and temperature.

In this new work, statistical mechanics is used to understand how different mathematical models “compete” when trying to explain a system. This matters because many scientific fields rely on understanding how systems change over time, whether tracking disease spread, analyzing or predicting the stock market. But real-world data is often messy, and traditional AI models can be unreliable when the data gets noisy or incomplete.
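The article does not spell out the method's machinery, but the "competition" framing has a standard statistical-mechanics form: treat each candidate model's fitting error as an energy and assign it a Boltzmann weight. The sketch below is a generic illustration of that idea under that assumption, not the paper's algorithm; all names are hypothetical.

```python
import math

def boltzmann_model_weights(errors, beta=1.0):
    """Assign each candidate model a Boltzmann weight w_k ~ exp(-beta * E_k),
    where E_k is its fitting error ('energy'), normalized to sum to 1.

    Lower-error models receive exponentially more probability mass;
    beta (inverse temperature) controls how sharply the best model wins.
    """
    raw = [math.exp(-beta * e) for e in errors]
    Z = sum(raw)  # partition function
    return [w / Z for w in raw]

# Three candidate equations with different fitting errors on noisy data:
weights = boltzmann_model_weights([0.5, 1.0, 3.0], beta=2.0)
```

At low beta (high temperature) the weights spread out, reflecting genuine uncertainty when data are noisy or incomplete; at high beta a single winning equation dominates.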

Mathematical model reveals why cracks sharpen during rapid rubber fracture

A research group from the University of Osaka, Zen University, and the University of Tokyo has mathematically uncovered the mechanism that causes crack tips to sharpen during the rapid fracture of rubber.

The bursting of balloons or tire blowouts is caused by rapid fracture, a phenomenon in which a small crack propagates instantaneously. During this process, the crack tip sharpens, accelerating the fracture. However, the reason behind this sharpening had long remained unexplained. Traditionally, it was believed to result from the material’s complex nonlinear effects.

The research group—comprising Hokuto Nagatakiya, a doctoral student; Shunsuke Kobayashi, assistant professor; and Ryuichi Tarumi, professor at the University of Osaka; along with Naoyuki Sakumichi, associate professor at Zen University and project associate professor at the University of Tokyo—has mathematically solved the problem of crack propagation. They derived equations that describe both the shape of the crack and the overall deformation of the material.
