Reinforcement Pre-Training (RPT) is a new method for training large language models (LLMs) by reframing the standard task of predicting the next token in a sequence as a reasoning problem solved using ...
Bob McGrew, OpenAI’s former Head of Research, led OpenAI's research from the GPT-3 breakthrough to today’s reasoning models. The three main pillars of AGI are scaled pre-training with Transformers, post-training, and ...
For years, progress in robotics has followed a familiar pattern. Researchers train increasingly powerful ...
OpenAI co-founder Ilya Sutskever recently lectured at the Neural Information Processing Systems (NeurIPS) 2024 conference in Vancouver, Canada, arguing that the age of artificial intelligence ...
The Allen Institute for AI (Ai2) recently released what it calls its most powerful family of models yet, Olmo 3. But the company kept iterating on the models, expanding its reinforcement learning (RL) ...
Singapore-based AI startup Sapient Intelligence has developed a new AI architecture that can match, and in some cases vastly outperform, large language models (LLMs) on complex reasoning tasks, all ...
Chatbots can make quick work of routine e-commerce customer service tasks and information retrieval. Sephora’s Smart Skin Scan, for example, provides personalized product recommendations, while Lowe’s ...