AI Research

Latest papers, benchmarks, and research developments

Paper of the Day

Autonomous Agent Architectures with Hierarchical Memory and Self-Reflective Planning

Sarah Chen, James Park, Aisha Patel · Mar 27, 2026

We present a new agent architecture that combines hierarchical episodic memory with self-reflective planning loops. The system maintains a structured memory of past interactions, retrieves relevant episodes during planning, and critically evaluates its own reasoning before acting. On the AgentBench suite, our architecture achieves state-of-the-art results, outperforming the previous best by 12 points. We find that self-reflective planning is the single most impactful component, accounting for over half the improvement.

cs.AI · cs.CL · cs.LG
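The paper does not publish code, but the retrieve-then-reflect loop the abstract describes can be sketched in a few lines. All names here (`Episode`, `EpisodicMemory`, `plan_with_reflection`) are illustrative, not the authors':

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    observation: str
    action: str
    outcome: str

@dataclass
class EpisodicMemory:
    episodes: list = field(default_factory=list)

    def store(self, ep: Episode) -> None:
        self.episodes.append(ep)

    def retrieve(self, query: str, k: int = 3) -> list:
        # Toy relevance score: word overlap; a real system would use embeddings.
        return sorted(
            self.episodes,
            key=lambda e: len(set(query.split()) & set(e.observation.split())),
            reverse=True,
        )[:k]

def plan_with_reflection(task, memory, propose, critique, max_revisions=2):
    """Draft a plan from retrieved episodes, then revise while the
    self-critique step still flags a problem."""
    context = memory.retrieve(task)
    plan = propose(task, context)
    for _ in range(max_revisions):
        issue = critique(task, plan)
        if issue is None:  # critique found nothing to fix: act on this plan
            break
        plan = propose(task + " | fix: " + issue, context)
    return plan
```

The `critique` callback is where the paper's "single most impactful component" would live; here it is just a function the caller supplies.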

Latest Papers

Scaling Sparse Mixture-of-Experts to 10 Trillion Parameters with Dynamic Routing

Yichen Zhang, Priya Nair, Tomás Rivera · Mar 25, 2026

We present a novel dynamic routing mechanism for sparse mixture-of-experts architectures that enables efficient scaling to 10 trillion parameters. Our approach reduces inference cost by 40% compared to dense models of equivalent capability while maintaining competitive performance across standard benchmarks.

cs.AI · cs.LG
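The paper's dynamic routing mechanism is not public, but the sparse mixture-of-experts pattern it builds on is standard: a gate scores experts per token, only the top-k are evaluated, and their outputs are combined with renormalized gate weights. A minimal plain-Python sketch (function names are ours):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_route(gate_logits, k=2):
    """Pick the k highest-probability experts and renormalize their weights.
    Dynamic-routing variants adjust k or the logits per token; this is plain top-k."""
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]

def moe_forward(x, experts, gate_logits, k=2):
    # Only the k selected experts run: this sparsity is the source of the
    # inference savings the abstract reports.
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))
```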

ReasonGraph: Chain-of-Thought Verification via Directed Acyclic Proof Structures

Sarah Chen, Marcus Weber, Aisha Patel · Mar 23, 2026

We introduce ReasonGraph, a framework that structures chain-of-thought reasoning as directed acyclic graphs with verifiable proof nodes. This approach catches 87% of reasoning errors that sequential chain-of-thought methods miss, significantly improving mathematical problem solving.

cs.AI · cs.CL
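The core idea, each reasoning step as a node whose validity depends on its premise nodes, can be sketched with the standard library's topological sorter. This is our toy reconstruction of the framework's shape, not the authors' implementation:

```python
from graphlib import TopologicalSorter  # Python 3.9+

def verify_proof_dag(nodes, edges, check_step):
    """nodes: {id: claim}; edges: {id: set of premise ids};
    check_step(claim, premises) -> bool.

    A node verifies only if all of its premises verified AND its local check
    passes. Sequential chain-of-thought lacks this per-step premise tracking,
    which is what lets errors propagate silently."""
    graph = {n: set(edges.get(n, ())) for n in nodes}
    verified = {}
    for nid in TopologicalSorter(graph).static_order():  # premises come first
        premises = [nodes[p] for p in graph[nid]]
        verified[nid] = (all(verified[p] for p in graph[nid])
                         and check_step(nodes[nid], premises))
    return verified
```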

Multi-Agent Constitutional AI: Emergent Cooperation in Self-Governing Language Models

James Park, Li Wei, Elena Sokolova · Mar 22, 2026

We study emergent cooperative behavior in systems of multiple constitutional AI agents tasked with self-governance. Our experiments demonstrate that groups of 8 or more agents reliably converge on stable behavioral norms that align with human preferences without explicit reward shaping.

cs.AI · cs.CL · cs.LG
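The convergence result is empirical, but the underlying intuition (agents repeatedly adjusting toward the group consensus until behavior stabilizes) can be simulated with toy dynamics that are ours, not the paper's:

```python
import statistics

def converge_norms(norms, rounds=50, rate=0.5):
    """Each round, every agent moves its scalar 'norm' toward the group mean.
    The mean is preserved each round, so the population settles on a single
    shared value; the spread shrinks by (1 - rate) per round."""
    for _ in range(rounds):
        mean = statistics.fmean(norms)
        norms = [n + rate * (mean - n) for n in norms]
    return norms
```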

TokenFormer: Replacing Attention with Learned Token Interactions at Scale

David Kim, Fatima Al-Rashid, Igor Petrov · Mar 20, 2026

We propose TokenFormer, an architecture that replaces standard self-attention with learned pairwise token interaction functions. On language modeling benchmarks, TokenFormer achieves comparable perplexity to transformers while reducing memory requirements by 60% for long contexts.

cs.LG · cs.CL
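In outline, the idea is to swap attention's fixed query-key dot product for an arbitrary learned pairwise function. A sketch of that token-mixing step (the `interact` function stands in for whatever the paper learns; this is not the authors' code):

```python
import math

def mix_tokens(tokens, interact):
    """For each token i, output a weighted average of all tokens j, with
    weights softmax(interact(tokens[i], tokens[j])). With interact = dot
    product this reduces to single-head attention without projections."""
    out = []
    for xi in tokens:
        scores = [interact(xi, xj) for xj in tokens]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * xj[d] for wj, xj in zip(w, tokens)) / z
                    for d in range(len(xi))])
    return out
```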

Grounding Language Models in Real-Time Sensor Data for Robotic Manipulation

Anna Kowalski, Raj Mehta, Yuki Tanaka · Mar 18, 2026

We present a method for grounding large language models in continuous real-time sensor streams for robotic manipulation tasks. Our system processes tactile, visual, and proprioceptive data at 100Hz, enabling language-guided dexterous manipulation with a 94% task success rate.

cs.AI · cs.LG
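The system details are in the paper; structurally, grounding a policy in multiple live streams amounts to a fixed-rate loop that fuses the latest reading from each sensor before every policy call. A bare-bones sketch with hypothetical reader callbacks:

```python
def run_control_loop(read_tactile, read_visual, read_proprio, policy, ticks):
    """Each tick (100 Hz in the paper), poll all three streams, fuse the
    latest readings into one state dict, and act on it."""
    actions = []
    for _ in range(ticks):
        state = {
            "tactile": read_tactile(),
            "visual": read_visual(),
            "proprio": read_proprio(),
        }
        actions.append(policy(state))
    return actions
```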

Federated Reinforcement Learning from Human Feedback Across Distributed Deployments

Michael Torres, Chloe Dubois, Kenji Nakamura · Mar 16, 2026

We introduce a federated approach to RLHF that enables distributed model improvement without centralizing sensitive preference data. Our protocol achieves 95% of the alignment quality of centralized RLHF while preserving user privacy across deployment boundaries.

cs.AI · cs.LG
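The protocol itself is not spelled out in the abstract, but the aggregation step at the heart of any FedAvg-style scheme is simple: each site trains its reward model on local preference data, and only the resulting weights cross the deployment boundary. A sketch under that assumption:

```python
def fedavg(client_weights, client_sizes):
    """Size-weighted average of per-client weight vectors (FedAvg).
    Raw preference comparisons never leave a client; only these
    weight vectors are shared with the aggregator."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
            for d in range(dim)]
```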

Emergent Tool Use in Language Agents Without Explicit Tool Descriptions

Sophie Martin, Ahmed Hassan, Laura Gomez · Mar 14, 2026

We demonstrate that language agents can discover and learn to use novel tools through environmental interaction alone, without explicit tool descriptions or documentation. Agents trained with our method successfully utilize 78% of previously unseen APIs within 10 interaction steps.

cs.AI · cs.CL
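The training method is the paper's contribution; the discovery loop itself can be caricatured as trial-and-error filtering, keeping whichever tool calls improve a task score over doing nothing. Names and scoring here are illustrative:

```python
def discover_tools(tools, score, budget=10):
    """tools: {name: zero-arg callable}; score(result) -> float, where
    score(None) is the no-op baseline. Try each tool once within the
    interaction budget and keep those that beat the baseline."""
    baseline = score(None)
    useful = {}
    for name, fn in list(tools.items())[:budget]:
        try:
            result = fn()
        except Exception:
            continue  # a failed call is informative too: drop this tool
        if score(result) > baseline:
            useful[name] = fn
    return useful
```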

Compression-Aware Training: Producing Models That Quantize Without Quality Loss

Robert Yang, Natasha Ivanova, Felix Braun · Mar 12, 2026

We propose compression-aware training, a method that produces models resilient to post-training quantization down to 2-bit precision. Models trained with our approach retain 99.2% of their full-precision performance after aggressive quantization, enabling efficient edge deployment.

cs.LG · cs.AI
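Whatever the training recipe, the post-training step it must survive is uniform quantization: snapping each weight to one of 2^bits grid levels. A minimal version of that rounding (not the paper's method, just the operation it targets):

```python
def quantize(ws, bits=2):
    """Uniform quantization of a list of weights onto a (2**bits)-level grid
    spanning [min(ws), max(ws)]. At bits=2 that is only 4 levels, which is
    why surviving it requires compression-aware training."""
    levels = 2 ** bits - 1
    lo, hi = min(ws), max(ws)
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((w - lo) / scale) * scale for w in ws]
```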

Benchmark Tracker

Benchmark    Claude Opus 4.6    GPT-4.5    Gemini 2.5 Pro    Llama 4
MMLU         92.4               90.8       91.1              86.3
HumanEval    95.1               93.7       94.2              88.9
GPQA         74.6               71.2       72.8              63.5

Scores represent published results as of March 2026. Higher is better.