Cognitive Memory in Large Language Models -> https://huggingface.co/papers/2504.02441
Covers what is used specifically in LLMs: external memory, KV-cache methods, parameter-based approaches, and hidden-state models, with concrete techniques for storage, retrieval, and compression
MemOS: A Memory OS for AI System -> https://huggingface.co/papers/2507.03724
Introduces MemOS, a memory operating system for LLMs that unifies parameter, activation, and external memories, enabling explicit memory management, lower training and inference costs, and more updatable knowledge across interactions
MemEvolve: Meta-Evolution of Agent Memory Systems -> https://huggingface.co/papers/2512.18746
MemEvolve is a framework that shows how to jointly adapt agent experience and memory architecture. It also presents EvolveLab, a modular codebase for comparing memory designs, showing improved performance and transfer across tasks and LLMs
If models forget everything, how can they be reliable? AI systems need to remember past interactions, update knowledge, stay consistent over time, and work beyond a single prompt. That's why memory in AI is drawing more and more attention.
Here’s a useful set of studies and videos on where AI memory stands today:
1. Memory in the Age of AI Agents (2512.13564)
A great survey that organizes agent memory research. It gives concrete taxonomies across memory form, function, and dynamics, summarizes benchmarks, frameworks, and emerging directions for building systematic agent memory systems
2. When Will We Give AI True Memory? A conversation with Edo Liberty, CEO and founder @ Pinecone -> https://youtu.be/ITbwVFZYepc?si=_lAbRHciC740dNz0
Edo Liberty discusses what real memory in LLMs requires beyond RAG - from scalable vector storage to reliable knowledge systems - and why storage, not compute, is becoming the key bottleneck for building dependable AI agents.
3. Why AI Intelligence is Nothing Without Visual Memory | Shawn Shen on the Future of Embodied AI -> https://youtu.be/3ccDi4ZczFg?si=SbJg487kwrkVXgUu
Shawn Shen argues AI needs a separate, hippocampus-like memory to move beyond chatbots, enabling long-term visual memory, object permanence, and on-device intelligence for robots, wearables, and the physical world
4. From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs (2504.15965)
Links human memory types to LLM memory, introduces a taxonomy across object, form, and time, and identifies concrete limitations and future research directions
5. Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions -> https://arxiv.org/abs/2505.00675v2
Proposes a concrete taxonomy, core operations, and research directions to systematically organize and advance agent memory systems.
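The store/retrieve loop these surveys keep returning to can be sketched in a few lines. This is a purely illustrative toy (the `MemoryStore` class and Jaccard scoring are my assumptions, not taken from any of the papers); real systems use embeddings and vector indexes instead of token overlap.

```python
# Toy long-term memory for an agent: write entries, then retrieve the
# most relevant ones by token overlap. Illustrative only - real systems
# use embedding similarity over a vector store.

class MemoryStore:
    def __init__(self):
        self.entries = []  # list of (token set, original text)

    def add(self, text):
        self.entries.append((set(text.lower().split()), text))

    def retrieve(self, query, k=2):
        q = set(query.lower().split())
        # Score each memory by Jaccard overlap with the query
        scored = sorted(
            self.entries,
            key=lambda e: len(q & e[0]) / max(len(q | e[0]), 1),
            reverse=True,
        )
        return [text for _, text in scored[:k]]

memory = MemoryStore()
memory.add("User prefers concise answers")
memory.add("User is building a Rust web service")
memory.add("Meeting notes from 2023 offsite")
print(memory.retrieve("what language is the user coding in", k=1))
```

Everything the surveys call "memory dynamics" (consolidation, forgetting, updating) happens on top of a loop like this.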
Read further below ⬇️
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
Prompt templates – reusable prompt structures with placeholders for inputs.
Retrieval-augmented prompting – injecting external information into the prompt. ⟶ https://arxiv.org/abs/2512.04106
P.S. These are only the main prompt engineering techniques, but there are many more. Here's a good survey where you can find them: https://arxiv.org/pdf/2402.07927
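The "prompt template" idea above is just a reusable string with named slots. A minimal, framework-free sketch (the template text is invented for illustration):

```python
# A prompt template: a reusable structure with placeholders for inputs.
# Generic illustration, not tied to any specific prompting library.

TEMPLATE = (
    "You are a {role}.\n"
    "Task: {task}\n"
    "Answer in at most {max_words} words."
)

prompt = TEMPLATE.format(
    role="senior researcher",
    task="summarize the attention mechanism",
    max_words=50,
)
print(prompt)
```

Libraries like LangChain wrap the same idea with validation and composition, but the core is string interpolation.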
▪️ Context Engineering (designing the information environment the model operates in)
Retrieval-Augmented Generation (RAG) – dynamically injecting external knowledge retrieved from databases, search, or vector stores.
Tool calling / function calling – enabling the model to use external tools (APIs, calculators, code, search) that return extra information the model needs to perform the task.
Structured context – providing schemas, JSON, tables, or graphs instead of free-form text.
System prompts / policies – persistent high-level instructions that govern behavior across interactions. ⟶ https://arxiv.org/abs/2212.08073
Short-term memory – passing recent interaction history or intermediate state; summarizing information from the ongoing conversation. ⟶ https://arxiv.org/pdf/2512.13564
Long-term memory – storing and retrieving user profiles, facts, or past decisions and conversations over time. ⟶ https://arxiv.org/abs/2503.08026
Environment state – exposing the current world, task, or agent state (files, variables, observations).
Multi-agent context – sharing state or messages between multiple LLM-based agents. ⟶ https://arxiv.org/abs/2505.21471
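The RAG pattern at the top of this list can be sketched end to end: retrieve the most relevant snippet, then splice it into the prompt. The word-overlap scoring and the toy documents below are stand-ins for a real embedding/vector search, purely for illustration:

```python
# Minimal RAG sketch: pick the most relevant snippet from a toy
# document store and inject it into the prompt. Word overlap stands
# in for real embedding similarity.
import re

DOCS = [
    "The Eiffel Tower is 330 metres tall.",
    "Photosynthesis converts light into chemical energy.",
    "The capital of Australia is Canberra.",
]

def tokens(s):
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def retrieve(query, k=1):
    q = tokens(query)
    # Rank documents by how many query words they share
    return sorted(DOCS, key=lambda d: -len(q & tokens(d)))[:k]

def build_prompt(question):
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."

print(build_prompt("How tall is the Eiffel Tower?"))
```

Swapping the scoring function for embedding similarity and `DOCS` for a vector store gives the production version of the same flow.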
Earlier on, we relied on clever prompt wording, but now structured, complete context matters more than magic phrasing. The coming year looks set to be the year of context engineering, which expands beyond prompt engineering. The two complement each other: prompt engineering shapes how we ask, while context engineering shapes what the model knows, sees, and can do.
To keep things clear, here are the main techniques and design patterns in both areas, with some useful resources for further exploration:
▪️ 9 Prompt Engineering Techniques (configuring input text)
1. Zero-shot prompting – giving a single instruction without examples. Relies entirely on pretrained knowledge.
2. Few-shot prompting – adding input–output examples to show the model the desired behavior. ⟶ https://arxiv.org/abs/2005.14165
3. Role prompting – assigning a persona or role (e.g. "You are a senior researcher," "Say it as a specialist in healthcare") to shape style and reasoning. ⟶ https://arxiv.org/abs/2403.02756
4. Instruction-based prompting – explicit constraints or guidance, like "think step by step," "use bullet points," "answer in 10 words"
5. Chain-of-Thought (CoT) – encouraging intermediate reasoning traces to improve multi-step reasoning. It can be explicit ("let’s think step by step"), or implicit (demonstrated via examples). ⟶ https://arxiv.org/abs/2201.11903
6. Tree-of-Thought (ToT) – the model explores multiple reasoning paths in parallel, like branches of a tree, instead of following a single chain of thought. ⟶ https://arxiv.org/abs/2305.10601
7. Reasoning–action prompting (ReAct-style) – prompting the model to interleave reasoning steps with explicit actions and observations. It defines action slots and lets the model generate a sequence of "Thought → Action → Observation" steps. ⟶ https://arxiv.org/abs/2210.03629
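The ReAct loop above is easy to see in code: the harness parses each Action line, runs the tool, and appends the Observation to the history. The model here is a canned stub with a fixed transcript (a real agent would call an LLM at that point); all names are illustrative:

```python
# Sketch of a ReAct-style Thought -> Action -> Observation loop.
# The "model" is stubbed with canned outputs; the harness parses
# Action lines, runs the tool, and feeds back the Observation.

def calculator(expr):
    # Toy tool; never use eval on untrusted input in real code
    return str(eval(expr, {"__builtins__": {}}))

SCRIPT = iter([
    "Thought: I need to compute 12 * 7.\nAction: calculator[12 * 7]",
    "Thought: I have the result.\nFinal Answer: 84",
])

def fake_model(history):
    return next(SCRIPT)  # a real agent would send `history` to an LLM

def react_loop(question, max_steps=5):
    history = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_model(history)
        history += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        if "Action: calculator[" in step:
            expr = step.split("calculator[")[1].rstrip("]")
            history += f"\nObservation: {calculator(expr)}"
    return None

result = react_loop("What is 12 * 7?")
print(result)
```

The ReAct paper's contribution is the prompt format that makes the model emit this interleaved structure reliably; the harness itself is this simple.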
Read further ⬇️
Also subscribe to Turing Post: https://www.turingpost.com/subscribe
AI coding is moving fast, and it’s getting harder to tell what actually works. Agents, workflows, context management and many other aspects are reshaping how software gets built.
We’ve collected a set of resources to help you understand how AI coding is evolving today and what building strategies work best:
1. AI Agentic Programming: A Survey of Techniques, Challenges, and Opportunities (2508.11126)
Provides a clear taxonomy, compares agent architectures, and exposes practical gaps in tools, benchmarks, and reliability that AI coding agents now struggle with
2. Does AI-Assisted Coding Deliver? A Difference-in-Differences Study of Cursor's Impact on Software Projects (2511.04427)
This study from Carnegie Mellon University shows causal evidence that LLM agent assistants deliver short-term productivity gains but have lasting quality costs that can slow development over time
3. A Survey of Vibe Coding with Large Language Models (2510.12399)
Turns Vibe Coding from hype into a structured field, categorizing real development workflows. It shows which models, infrastructure, tool requirements, context, and collaboration setups affect real software development outcomes
4. From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence (2511.18538) (from Chinese institutes and companies like ByteDance and Alibaba)
Compares real code LLMs, shows how training and alignment choices affect code quality and security, and connects academic benchmarks to everyday software development
5. Build Your Own Coding Agent via a Step-by-Step Workshop⟶ https://github.com/ghuntley/how-to-build-a-coding-agent
A great guide that covers the basics of building an AI-powered coding assistant – from a chatbot to a file reader/explorer/editor and code search
6. State of AI Coding: Context, Trust, and Subagents⟶ https://www.turingpost.com/p/aisoftwarestack
Here is our in-depth analysis of where AI coding is heading and the new directions we see today – like agent swarms and context management importance – offering an emerging playbook beyond the IDE
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
Superposition Yields Robust Neural Scaling → https://neurips.cc/virtual/2025/loc/san-diego/poster/116346
By controlling superposition in toy models and checking real LLMs, researchers show that strong superposition naturally creates the familiar “bigger model = lower loss” power laws, explaining when scaling laws work and when they might fail
Why Diffusion Models Don’t Memorize: The Role of Implicit Dynamical Regularization in Training → https://neurips.cc/virtual/2025/loc/san-diego/poster/119372
Shows diffusion models hit an early “good samples” phase and a later memorization phase. Larger datasets widen the "generalization window," avoiding overfitting much longer and revealing implicit regularization
Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? → https://neurips.cc/virtual/2025/loc/san-diego/poster/119944
Explains that while RLVR makes models better at finding correct answers efficiently, it doesn’t create genuinely new reasoning abilities. RLVR models mostly reuse patterns already present in the base model, highlighting the need for better RL to unlock reasoning gains
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities → https://neurips.cc/virtual/2025/loc/san-diego/poster/115731
Simply making RL models way deeper, up to 1024 layers, can massively improve self-supervised RL, letting agents learn far better behaviors from scratch and boosting performance by 2-50× on locomotion and manipulation tasks
Titans + MIRAS: Helping AI have long-term memory → https://research.google/blog/titans-miras-helping-ai-have-long-term-memory/
Titans is a new architecture with a deep MLP memory that updates itself during inference using a “surprise” signal, letting the model keep important info, forget noise, and handle million-token contexts with RNN-like speed and Transformer-like accuracy
Generative Data Augmentation via Diffusion Distillation, Adversarial Alignment, and Importance Reweighting → https://neurips.cc/virtual/2025/loc/san-diego/poster/116854
Introduces DAR-GDA, which distills diffusion models into a fast one-step generator, aligns them with real data via adversarial training, and reweights synthetic samples to remove bias
Slow Transition to Low-Dimensional Chaos in Heavy-Tailed RNNs → https://arxiv.org/abs/2505.09816
Shows that RNNs with brain-like heavy-tailed weights don’t behave like Gaussian ones. They shift and widen the edge-of-chaos transition but reduce the system’s effective dimensionality.
Evaluating multiple models using labeled and unlabeled data → https://arxiv.org/abs/2501.11866
Introduces Semi-Supervised Model Evaluation (SSME), a way to evaluate classifiers using both labeled and unlabeled data by modeling how predictions relate to true labels, giving far more accurate performance estimates when labeled data is limited
Riemannian Consistency Model → https://arxiv.org/abs/2510.00983
Extends consistency models to curved spaces, enabling few-step generation that stays on the manifold, using exponential maps and covariant derivatives, and works well on spheres, tori, and 3D rotations
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model → https://arxiv.org/abs/2505.23579
BioReason links a DNA model with an LLM so the LLM can reason over genomic data, yielding clear biological explanations and strong accuracy gains on pathway and variant prediction tasks
NFL-BA: Near-Field Light Bundle Adjustment for SLAM in Dynamic Lighting → https://asdunnbe.github.io/NFL-BA/NeurIPS2025_NFL_BA.pdf
Introduces NFL-BA, a SLAM loss that models near-field lighting so systems work better in settings like endoscopy or dark indoor scenes, yielding large improvements in camera tracking and mapping
NeurIPS 2025, as a premier annual event in machine learning and computational neuroscience, tackles major topics like the future of AI, current research, and the most difficult challenges. While we’re not attending this year, we’re closely following the updates and today we pull together a quick, easy-to-digest roundup of a few standout papers so you can jump in without getting overwhelmed.
Here is a list of 15 papers from NeurIPS 2025, including 8 top research papers that received awards, along with 7 others that caught our attention:
1. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks → https://neurips.cc/virtual/2025/loc/san-diego/test-of-time/128328
Test of Time Award winner. Introduces the RPN, a small convnet that predicts objectness and boxes on shared features, enabling Faster R-CNN to share computation and run around 5 fps on a GPU
2. Artificial Hivemind: The Open-Ended Homogeneity of LMs (and Beyond) → https://neurips.cc/virtual/2025/loc/san-diego/poster/121421
Releases a huge open-ended prompt dataset, showing that LLMs often fall into an “artificial hivemind” – generating surprisingly similar answers – and measures diversity collapse
3. Optimal Mistake Bounds for Transductive Online Learning → https://neurips.cc/virtual/2025/loc/san-diego/poster/119098
Settles a 30-year-old question by showing how much unlabeled data helps in online learning – it gives a precise quadratic advantage with tight matching bounds
4. Gated Attention for LLMs: Non-linearity, Sparsity, and Attention-Sink-Free → https://neurips.cc/virtual/2025/loc/san-diego/poster/120216
Demonstrates how gating actually affects attention: a simple sigmoid gate after Scaled Dot-Product Attention (SDPA) boosts performance, stability, and long-context behavior by adding useful nonlinearity and sparse modulation
Read further below ⬇️
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe
QuantAgent → https://huggingface.co/papers/2509.09995
A multi-agent LLM system for high-frequency trading in real time. It splits the job between 4 agents – Indicator, Pattern, Trend, and Risk – to make quick, precise decisions, based on short-term market signals
MAC-Flow → https://huggingface.co/papers/2511.05005
Learns complex multi-agent coordination with a flow model and distills it into fast one-step policies, providing diffusion-level coordination with Gaussian-level real-time speed
MrlX → https://github.com/AQ-MedAI/MrlX
A multi-agent RL framework where 2 agents talk through a multi-turn dialogue (Agent A initiates it, Agent B engages in responses), learn from each other, and update their models in a continuous “generate → train → sync” loop. The agents co-evolve and get better at collaborative decision-making over time
M-GRPO for Multi-Agent Deep Research → https://huggingface.co/papers/2511.13288
This training method lets different agents in a MAS use their own specialized LLMs while still learning together. It gives each agent its own local reward signal and aligns their uneven trajectories, so they stay coordinated even when running at different speeds or on different servers
MarsRL → https://huggingface.co/papers/2511.11373
Trains the Solver, Verifier, and Corrector agents together with separate rewards for each and a pipeline-style RL setup, making them better at catching mistakes and refining answers, and reaching much higher accuracy on math benchmarks
The idea to split tasks across multiple agents instead of relying on one universal agent is now seen as one of the most effective ways to build an AI stack. Concepts like “agent swarms” were highlighted at the AI Engineer Code Summit in NYC (Nov 20–21) as the winning architecture. And this trend is not only about coding and software. It applies across all AI domains.
So here is some recent research that helps keep multi-agent systems (MAS) better and up-to-date:
1. LatentMAS → Latent Collaboration in Multi-Agent Systems (2511.20639)
AI agents share their hidden "thoughts" directly in latent space instead of talking through text. This makes collaboration and reasoning way faster and more accurate (no extra training needed)
2. Puppeteer → Multi-Agent Collaboration via Evolving Orchestration (2505.19591)
Uses a “puppeteer” LLM that dynamically decides which agents (“puppets”) to call and in what order. By learning this orchestration with reinforcement learning (RL), the system solves complex tasks more efficiently and at lower compute cost
3. MADD → MADD: Multi-Agent Drug Discovery Orchestra (2511.08217)
A MAS with 4 agents for drug discovery. It lets researchers describe a drug discovery task in plain language. Then MADD automatically builds and runs the full hit-identification pipeline, making AI-driven drug design a simple end-to-end workflow
4. Multi-Agent Tool-Integrated Policy Optimization (MATPO) → Multi-Agent Tool-Integrated Policy Optimization (2510.04678)
Lets one LLM act as multiple agents (like a planner and a worker) by using different prompts and training them together with RL. So you get the benefits of a multi-agent system without needing multiple models
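The MATPO idea of one model playing multiple roles via different prompts can be sketched with a stub in place of the LLM. The `model` function and its canned routing below are invented for illustration; a real system would send each role's system prompt to the same LLM:

```python
# Sketch of one "model" acting as planner and worker via different
# system prompts (the MATPO-style single-model multi-agent setup).
# The model is a canned stub; all names here are illustrative.

def model(system_prompt, user_msg):
    # Stub: route by role keyword; a real call would hit one shared LLM
    if "planner" in system_prompt:
        return ["look up population of France", "look up population of Germany"]
    return f"done: {user_msg}"

def run_task(task):
    steps = model("You are a planner. Break the task into steps.", task)
    results = [model("You are a worker. Execute the step.", s) for s in steps]
    return results

for r in run_task("compare populations of France and Germany"):
    print(r)
```

Because both roles share one set of weights, RL updates from planner and worker trajectories train the same model, which is the point of the approach.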
If you're interested in trends in multi-agent for software development of the future, explore my article with the emergent playbook. This is super interesting → https://www.turingpost.com/p/aisoftwarestack
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe
Read further below ⬇️
A Survey of Large Language Model-Powered Spatial Intelligence Across Scales: Advances in Embodied Agents, Smart Cities, and Earth Science -> https://arxiv.org/abs/2504.09848
Explores spatial memory and reasoning in LLMs, and compares spatial intelligence across scales – from agents to cities to the planet – offering a framework and insights for the future
SITE: Towards Spatial Intelligence Thorough Evaluation → https://arxiv.org/abs/2505.05456
Introduces the SITE benchmark to evaluate spatial intelligence in vision-language models across modalities and scales
In AI, spatial intelligence is basically the model’s “sense of space” – its ability to understand where things are, how they relate, and how they move. It lets AI models navigate a room, interpret a scene, or figure out how objects fit together, like giving them a built-in mental map. For example, world models can't live without spatial intelligence.
Here are 6 good reads to explore what spatial intelligence is and how it's evolving:
1. From Words to Worlds: Spatial Intelligence is AI’s Next Frontier by Fei-Fei Li → https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence
Fei-Fei Li, the godmother of AI, is a key figure in spatial intelligence, since her work in computer vision, especially ImageNet, helped AI learn to recognize and understand objects in space. She's recently started a blog, and this post, in particular, argues that true intelligence requires grounding in space, understanding geometry, motion and consequences in the real world
2. Spatial Reasoning in Multimodal LLMs: A Survey of Tasks, Benchmarks and Methods → https://arxiv.org/abs/2511.15722
Breaks down how AI models handle spatial reasoning from a cognitive angle and maps all the existing tasks and benchmarks onto that framework
3. What is Spatial Intelligence? → https://www.turingpost.com/p/cvhistory5
Our special article easily explains what spatial intelligence actually is, why it matters, and how researchers are trying to boost it so machines can better understand and navigate the physical world
4. From 2D to 3D Cognition: A Brief Survey of General World Models → https://arxiv.org/pdf/2506.20134
Shows how AI world models are evolving from simple 2D perception to full-on 3D understanding, explaining the tech behind it, what new 3D abilities these models gain, and where they’re used in the real world
Read further below ⬇️
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
- TD-JEPA (Temporal difference JEPA) → https://huggingface.co/papers/2510.00739
An unsupervised RL method that uses TD learning to model long-term latent dynamics, training encoders and a policy-conditioned predictor for zero-shot reward optimization
5 Iconic JEPA types:
I-JEPA (Image-based) → https://huggingface.co/papers/2301.08243
Masks out parts of an image and predicts their latent representation from the remaining context region. Uses Vision Transformers; no pixel-level reconstruction needed
V-JEPA (Video-based) → https://huggingface.co/papers/2404.08471
Predicts future or missing frame embeddings from observed frames. Learns temporal dynamics without contrastive negatives or text supervision
- V-JEPA 2, trained on 1M+ hours of internet videos and a little robot interaction data, can watch, understand, answer questions, and help robots plan and act in the physical world → https://huggingface.co/papers/2506.09985
MC-JEPA (Motion-Content) → https://huggingface.co/papers/2307.12698
Jointly learns motion (optical flow) and content features with a shared encoder. It combines a flow prediction task with a standard image representation task (VICReg) in one model
A-JEPA (Audio-based) → https://huggingface.co/papers/2311.15830
Extends JEPA to audio spectrograms. Masks time-frequency patches of the spectrogram (with a curriculum strategy) and predicts their latent features from the unmasked context
TI-JEPA (Text-Image) → https://huggingface.co/papers/2503.06380
Aligns text and image embeddings in a shared latent space via an energy-based predictive objective
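What all these variants share is one objective: predict the latent of a masked target from the latent of the visible context, never reconstructing pixels. A deliberately tiny one-dimensional sketch (the "encoder" and all numbers are pedagogical stand-ins for the deep networks in real JEPAs):

```python
# Toy JEPA-style objective in 1D: a frozen encoder maps "patches" to
# latents; a predictor learns to map the context latent to the target
# latent. Prediction happens in latent space, not pixel space.

def encoder(x):          # frozen toy encoder: latent = 2x + 1
    return 2 * x + 1

context, target = 3.0, 5.0            # "visible" and "masked" patches
z_ctx, z_tgt = encoder(context), encoder(target)

w = 0.0                               # predictor: z_hat = w * z_ctx
lr = 0.01
losses = []
for _ in range(200):
    z_hat = w * z_ctx
    loss = (z_hat - z_tgt) ** 2       # latent-space prediction error
    losses.append(loss)
    w -= lr * 2 * (z_hat - z_tgt) * z_ctx   # gradient step on predictor only
print(round(losses[0], 2), losses[-1])
```

In real JEPAs the context encoder is also trained (with an EMA or stop-gradient target encoder to prevent collapse); here only the predictor learns, to keep the sketch minimal.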
We break down how JEPA works and its main ideas in this comprehensive article: https://www.turingpost.com/p/jepa
Check out more JEPA types here:
Since Yann LeCun together with Randall Balestriero released a new paper on JEPA (Joint-Embedding Predictive Architecture), laying out its theory and introducing an efficient practical version called LeJEPA, we figured you might need even more JEPA. Here are 7 recent JEPA variants plus 5 iconic ones:
1. LeJEPA → LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics (2511.08544)
Explains a full theory for JEPAs, defining the “ideal” JEPA embedding as an isotropic Gaussian, and proposes the SIGReg objective to push JEPA toward this ideal, resulting in practical LeJEPA
2. JEPA-T → JEPA-T: Joint-Embedding Predictive Architecture with Text Fusion for Image Generation (2510.00974)
A text-to-image model that tokenizes images and captions with a joint predictive Transformer, enhances fusion with cross-attention and text embeddings before training loss, and generates images by iteratively denoising visual tokens conditioned on text
3. Text-JEPA → Speaking in Words, Thinking in Logic: A Dual-Process Framework in QA Systems (2507.20491)
Converts natural language into first-order logic, with a Z3 solver handling reasoning, enabling efficient, explainable QA with far lower compute than large LLMs
4. N-JEPA (Noise-based JEPA) → Improving Joint Embedding Predictive Architecture with Diffusion Noise (2507.15216)
Connects self-supervised learning with diffusion-style noise by using noise-based masking and multi-level schedules, especially improving visual classification
5. SparseJEPA → SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures (2504.16140)
Adds sparse representation learning to make embeddings more interpretable and efficient. It groups latent variables by shared semantic structure using a sparsity penalty while preserving accuracy
6. TS-JEPA (Time Series JEPA) → Joint Embeddings Go Temporal (2509.25449)
Adapts JEPA to time-series by learning latent self-supervised representations and predicting future latents for robustness to noise and confounders
Read further below ↓
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
FP4 → https://arxiv.org/abs/2310.16836 (4-bit Transformer); https://arxiv.org/abs/2305.14314 (QLoRA)
Experimental format for ultra-compact inference. It's used in research and quantization-aware inference, including 4-Bit Floating-Point Quantized Transformers and 4-bit NormalFloat (NF4) in QLoRA
INT8/INT4 → https://arxiv.org/abs/2004.09602
Integer low-precision formats that use 8 or 4 bits, primarily used in inference. The model's weights and activations are converted into integer values that can be processed efficiently on hardware optimized for integer arithmetic
2-bit (ternary or binary quantization) → https://research.ibm.com/blog/low-precision-computing
Experimental ultra-low precision for computation in ultra-efficient AI accelerators. Uses values like {-1, 0, 1}. It turns multiplications into additions/subtractions - extremely cheap operations
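The weight-to-integer conversion described above can be sketched with a toy symmetric INT8 scheme. This is illustrative only (real kernels calibrate scales per tensor or per channel), and the example weights are made up:

```python
# Minimal symmetric INT8 quantization sketch: scale floats into the
# int8 range, round, then dequantize. Round-trip error is bounded by
# about half the scale.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(round(max_err, 4))
```

The 2-bit ternary case is the same idea with the codebook shrunk to {-1, 0, 1}, which is what turns multiplications into additions and subtractions.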
Precision is very important in AI as it shapes how accurate and efficient models are. It controls how finely numbers are represented, approximating real-world values with formats like fixed-point and floating-point. A recent BF16 → FP16 study renewed attention to the impact of precision.
Here are the main precision types used in AI, from full precision for training to ultra-low precision for inference:
1. FP32 (Float32):
Standard full-precision float used in most training: 1 sign bit, 8 exponent bits, 23 mantissa bits. Default for backward-compatible training and baseline numerical stability
2. FP16 (Float16) → https://arxiv.org/abs/2305.10947v6
Half-precision float. It balances accuracy and efficiency. 1 sign bit, 5 exponent bits, 10 mantissa bits. Common on NVIDIA Tensor Cores and mixed-precision setups. There’s now a new wave of using it in reinforcement learning: https://www.turingpost.com/p/fp16
3. BF16 (BFloat16) → https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus
Same dynamic range as FP32 but fewer mantissa bits: 1 sign bit, 8 exponent bits (same as FP32), 7 mantissa bits. It was developed by Google Brain as part of its AI/ML infrastructure work. Preferred on TPUs and modern GPUs
4. FP8 (E4M3 / E5M2) → https://proceedings.neurips.cc/paper_files/paper/2018/file/335d3d1cd7ef05ec77714a215134914c-Paper.pdf
Emerging standard for training and inference on NVIDIA Hopper (H100) and Blackwell (B200) tensor cores and AMD MI300. Also supported in NVIDIA’s Transformer Engine: https://developer.nvidia.com/blog/floating-point-8-an-introduction-to-efficient-lower-precision-ai-training/
E4M3 = 4 exponent, 3 mantissa bits
E5M2 = 5 exponent, 2 mantissa bits
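To make the bit layouts above concrete, here is a stdlib-only sketch that unpacks a float32 into its 1-8-23 fields; FP16 (1-5-10), BF16 (1-8-7), and FP8 (1-4-3 or 1-5-2) differ only in how the bits are split between exponent and mantissa:

```python
# Decompose a float32 into sign / exponent / mantissa fields to see
# the 1-8-23 layout. The exponent is stored with a bias of 127.
import struct

def float32_fields(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8 biased exponent bits
    mantissa = bits & 0x7FFFFF          # 23 fraction bits
    return sign, exponent, mantissa

s, e, m = float32_fields(-1.5)
print(s, e, m)  # 1 127 4194304: -1.5 = -1 * 2^(127-127) * (1 + 0.5)
```

Shrinking the exponent field (FP16, E4M3) trades range for precision; keeping FP32's 8 exponent bits with a short mantissa (BF16, E5M2-style) does the opposite, which is why BF16 rarely overflows where FP16 does.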
Read further below ⬇️
If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
Agentic Entropy-Balanced Policy Optimization (AEPO) → https://huggingface.co/papers/2510.14545
Keeps web agents from collapsing during training by balancing entropy in data collection and policy updates, and adjusting gradients on high-uncertainty steps
Agent- and Turn-wise Grouped Reinforcement Policy Optimization (AT-GRPO) → https://huggingface.co/papers/2510.11062
PO for multi-agent LLM systems. It groups training by agent roles and dialogue turns, allowing each agent to learn more effectively within its context
Direct Group Preference Optimization (DGPO) → https://huggingface.co/papers/2510.08425
RL method made for diffusion models. Learns directly from group-level preferences between samples, allowing it to use fast deterministic ODE samplers instead of noisy stochastic ones
Entropy-regularized Policy Optimization (EPO) → https://huggingface.co/papers/2509.22576
Controls entropy and adapts it across training phases, encouraging exploration early on and steady convergence later
Multiplayer Nash Preference Optimization (MNPO) → https://huggingface.co/papers/2509.23102
Extends human feedback alignment to a multiplayer game setup. Each policy competes with a population of others, capturing more complex and realistic human preference patterns while keeping stable Nash equilibria
Policy optimization (PO) algorithms are central to training AI models with preference-based feedback. In recent weeks, numerous new PO methods have emerged that build on or replace the popular PPO and GRPO, solving their issues. Here are 11 of them:
1. BAlanced Policy Optimization (BAPO) → BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping (2510.18927)
Dynamically adjusts the clipping bounds in PPO-style updates to balance positive and negative gradients and prevent entropy collapse
2. Training-Free GRPO → Training-Free Group Relative Policy Optimization (2510.08191)
Instead of using numeric rewards, it compares rollouts semantically to distill useful knowledge as a token prior, which is then applied during inference to guide the model’s behavior
3. Asymmetric Importance Sampling Policy Optimization (ASPO) → ASPO: Asymmetric Importance Sampling Policy Optimization (2510.06062)
Fixes imbalanced token weighting in LLM training. It flips the importance sampling ratios for positive tokens to correct over- and under-updates, and adds a soft dual-clipping step to keep gradients stable
4. In-Context Steered Policy Optimization (ICPO) → https://arxiv.org/abs/2510.26519
Uses a model’s own in-context learning ability to guide training with existing data. It combines Mixed-Policy GRPO with Implicit Expert Forcing to expand exploration and adds Expert Region Reject Sampling and Annealed Expert-Bonus Reward Shaping to ensure stability and balanced expert influence
5. Graph-Enhanced Policy Optimization (GEPO) → https://arxiv.org/abs/2510.26270
Builds a graph of an agent’s experiences to understand how different states connect, guide exploration and assign rewards more effectively
6. Information Gain-based Policy Optimization (IGPO) → Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents (2510.14967)
Uses the model’s own belief updates to create dense, informative feedback for smoother multi-turn learning
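Most of the methods above build on GRPO's group-relative advantage: each rollout's reward is standardized within its group, so no learned value function is needed. A minimal sketch of just that baseline computation (the rewards are invented):

```python
# GRPO-style group-relative advantages: standardize each rollout's
# reward within its group of rollouts for the same prompt.

def group_advantages(rewards, eps=1e-8):
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# 4 rollouts for one prompt, scored by a verifier (1 = correct)
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 2) for a in group_advantages(rewards)])  # → [1.0, -1.0, -1.0, 1.0]
```

The newer methods each modify a different piece of the surrounding update: BAPO adapts the clipping bounds, ASPO flips importance-sampling ratios for positive tokens, and IGPO replaces the sparse verifier reward with denser information-gain signals.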
Read further below ⬇️
If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
DeepCode → https://github.com/HKUDS/DeepCode
A platform where multiple AI agents work together to turn research papers or natural language descriptions into full, production-ready applications
AutoGPT → https://github.com/Significant-Gravitas/AutoGPT
A platform for building, deploying, and running continuous AI agents for complex workflows – available as a free self-hosted setup and a soon-to-launch cloud service
Kilo Code → https://github.com/Kilo-Org/kilocode
AI coding agent for VS Code, powered by all top models like GPT-5 and Claude 4. It turns your editor into a self-checking, multi-mode AI coworker that streamlines development from planning to debugging
CodeGeeX → https://github.com/zai-org/CodeGeeX (the later update CodeGeeX4: https://github.com/zai-org/CodeGeeX4)
Streamlines global software development by enabling seamless cross-language coding, faster prototyping, and intelligent code assistance across multiple platforms and IDEs
Coding is the field where AI is welcomed with open arms. Here’s a collection to help you take your AI-assisted coding workflows to the next level of convenience and efficiency:
1. Smol Developer → https://github.com/smol-ai/developer
A lightweight AI “junior dev” that takes your product spec and automatically scaffolds or helps you build full codebases
2. Tabby → https://github.com/TabbyML/tabby
A self-hosted AI coding assistant that runs locally as an alternative to GitHub Copilot. Easy to integrate, GPU-friendly, and doesn’t rely on the cloud
3. Beads (bd) Issue Tracker → https://github.com/steveyegge/beads
Gives coding agents long-term memory, letting them organize, plan, and execute complex tasks reliably across sessions
4. MetaGPT → https://github.com/FoundationAgents/MetaGPT
A multi-agent framework that imitates a software company team using LLMs. It assigns AI agents roles like PM, Architect, and Developer to produce user stories, designs, specs, and final code
5. Open Interpreter → https://github.com/openinterpreter/open-interpreter
Gives you ChatGPT’s coding power with full local control – no limits, no sandbox – so you can automate, analyze, and create anything right from your desktop through a chat interface
6. OpenSpec → https://github.com/Fission-AI/OpenSpec
A lightweight, spec-driven development tool that helps humans and AI agree on what to build before any code is written
7. PR-Agent → https://github.com/qodo-ai/pr-agent
An AI code reviewer that automatically reviews, describes, and improves pull requests across GitHub, GitLab, and other platforms
8. BabyAGI → https://github.com/yoheinakajima/babyagi
A self-building AI framework that gives agents the ability to write, manage, and refine their own functions, turning them from passive tools into active, self-building systems
9 ...⬇️
Subscribe to the Turing Post: https://www.turingpost.com/subscribe – your shortcut to deep, clear AI analysis
If you want to understand the multifaceted AI landscape in 2025 and see where the field is heading – start with (or revisit) these legendary talks. They can help you capture what’s happening in AI from multiple angles:
1. Andrej Karpathy: Software Is Changing (Again) → https://www.youtube.com/watch?v=LCEmiRjPEtQ
Unveils Software 3.0 – a paradigm where LLMs are the new computers, programmed with prompts instead of code. The key: developers must now master coding, training, and prompting as AI becomes the heart of software building
2. Richard Sutton, The OaK Architecture: A Vision of SuperIntelligence from Experience → https://www.youtube.com/watch?v=gEbbGyNkR2U
Unveils the OaK (Options and Knowledge) architecture – a model-based RL framework for continual intelligence, where every component learns, meta-learns & builds hierarchical abstractions
3. GTC March 2025 Keynote with NVIDIA CEO Jensen Huang → https://www.youtube.com/watch?v=_waPvOwL9Z8
Dives into accelerated computing and the importance of Physical AI. From the Blackwell GPU architecture & AI factories to breakthroughs in agentic AI & robotics, Jensen Huang explains how NVIDIA aims to power every layer of the AI ecosystem
4. Yann LeCun "Mathematical Obstacles on the Way to Human-Level AI" → https://www.youtube.com/watch?v=ETZfkkv6V7
Yann LeCun always argues we need a new path to machines that reason about the world – not LLMs or RL. So this lecture is about self-supervised systems with world models, planning, memory and energy-based learning
5. Andrew Ng: State of AI Agents → https://www.youtube.com/watch?v=4pYzYmSdSH4
Highlights one of the most pressing topics of 2025 – agents, explaining why most effective AI agents rely on simple, linear workflows built from modular “Lego-brick” tasks + what predicts AI startup success in the new agent era
Subscribe to the Turing Post: https://www.turingpost.com/subscribe – your shortcut to deep, clear AI analysis