Large Reasoning Models (LRM)
Published on October 19, 2025
Large Reasoning Models (LRMs) are a new category of artificial intelligence systems that go far beyond the simple generation of fluent text. Whereas large language models (LLMs) such as GPT-4 or LLaMA focus on predicting the next word or sentence from training statistics, LRMs are designed for reasoning: multi-step inference, chain-of-thought, tree-of-thought, or reasoning graphs.
They combine LLM-type architectures with explicit inference modules, reflection mechanisms, sometimes heuristic search, and even reinforcement learning to better simulate human or quasi-human reasoning.
In concrete terms, an LRM can, for example:
faced with a complex math or logic problem, not answer directly, but generate a sequence of “thinking steps” before arriving at the answer,
explore several possible solutions, compare and verify them, and choose the most appropriate,
be specialized in structured reasoning tasks, such as medical diagnosis, programming, planning and simulation.
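The first behavior in the list above, answering via explicit “thinking steps” rather than directly, can be sketched with a toy example. The reasoning trace here is hand-built arithmetic, not model output; it only mimics the shape of what an LRM produces:

```python
# Minimal illustration of answering with explicit "thinking steps"
# rather than a direct answer. The trace is computed by hand here;
# it only imitates the structure of an LRM's output.

def solve_with_steps(apples_start: int, eaten: int, bought: int):
    steps = []
    steps.append(f"Start with {apples_start} apples.")
    after_eating = apples_start - eaten
    steps.append(f"After eating {eaten}: {apples_start} - {eaten} = {after_eating}.")
    total = after_eating + bought
    steps.append(f"After buying {bought}: {after_eating} + {bought} = {total}.")
    return steps, total

steps, answer = solve_with_steps(5, 2, 3)
for s in steps:
    print(s)
print("Answer:", answer)  # Answer: 6
```

The point is that the intermediate steps are part of the output, which is exactly what makes the final answer auditable.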
So, we can say: an LRM is a model trained or refined for reasoning, rather than for simple text prediction.
The rise of LRMs addresses an important limitation of conventional LLMs: even the most powerful are often weak in tasks that require a real chain of reasoning, with multiple steps, verification, planning or abstract logic. They can generate fluent text, but they don’t “think” like someone who questions, explores alternatives and checks their hypotheses.
LRMs aim to fill this gap: they are meant to be more robust and more reliable in demanding contexts.
A few points of distinction:
Core function: LLM → fluent text generation. LRM → complex problem solving, reasoning.
Typical use cases: LLM → translation, summarization, conversation, generation. LRM → mathematics, logic, programming, diagnostics, decision-making.
Time & efficiency: LRMs are often slower and more computationally expensive, as they perform internal reflection steps.
Internal structure: LRMs incorporate “thinking steps”, sometimes explicitly, while LLMs remain more of a “black box”.
The operation of an LRM is based on several key elements:
Like an LLM, an LRM starts by encoding the input (text, possibly image or structure). Then:
it generates a “chain of thought” in which several intermediate steps are formulated,
it can use search strategies (e.g. exploring several hypotheses, “tree of thought” or “graph of thought”),
it can incorporate a verification or revision loop, comparing different paths before choosing the final solution.
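The loop described above, generate several candidate paths, verify them, keep the best, can be sketched as follows. `generate_candidates` and `verify` are hypothetical stand-ins for model calls; here they operate on a toy equation-solving task:

```python
# Hedged sketch of a tree-of-thought-style loop: generate several
# candidate reasoning paths, score each with a verifier, keep the best.
# generate_candidates and verify are toy stand-ins for model calls.

def generate_candidates(problem):
    # Toy "hypotheses": candidate integer values for x.
    return [{"x": v, "trace": f"try x = {v}"} for v in range(-5, 6)]

def verify(problem, candidate):
    # Verifier scores a candidate: 1.0 if it satisfies a*x + b == c.
    a, b, c = problem["a"], problem["b"], problem["c"]
    return 1.0 if a * candidate["x"] + b == c else 0.0

def reason(problem):
    candidates = generate_candidates(problem)
    scored = [(verify(problem, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda t: t[0])
    return best if best_score > 0 else None

# Solve 3x + 1 = 10
best = reason({"a": 3, "b": 1, "c": 10})
print(best)  # {'x': 3, 'trace': 'try x = 3'}
```

Real LRMs replace the exhaustive candidate list with learned proposal and scoring models, but the explore-verify-select structure is the same.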
For a model to become an LRM, it’s not enough to train a standard LLM:
training data include not only answers but also reasoning traces (intermediate steps),
methods such as reinforcement learning from human feedback (RLHF) are applied, but adapted to reasoning: logical chains of thought and correct paths are rewarded, errors are penalized,
hybrid architectures combining symbolic or heuristic components with neural learning are sometimes used.
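The reward adaptation mentioned above can be sketched as a shaping function. This is a minimal sketch under stated assumptions: `check_step` is a hypothetical step verifier, and the trace format (expression, claimed value) is invented for illustration:

```python
# Hedged sketch of reasoning-oriented reward shaping: reward a trace
# when every intermediate step checks out and the final answer is
# correct, penalize otherwise. check_step is a toy verifier.

def check_step(step):
    # Each toy step is (expression, claimed_value).
    expr, claimed = step
    return eval(expr) == claimed  # real systems use safer checkers

def reward(trace, final_answer, gold_answer):
    step_ok = all(check_step(s) for s in trace)
    answer_ok = (final_answer == gold_answer)
    if step_ok and answer_ok:
        return 1.0   # correct, well-founded reasoning
    if answer_ok:
        return 0.2   # right answer, flawed chain: weakly rewarded
    return -1.0      # wrong answer: penalized

trace = [("5 - 2", 3), ("3 + 3", 6)]
print(reward(trace, 6, 6))  # 1.0
```

The key design choice is that the chain itself is graded, not just the final answer, which is what distinguishes this from standard RLHF.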
Studies show that LRMs enter different “performance regimes” depending on the complexity of the task:
for simple tasks, a conventional LLM can sometimes do as well or even better than an LRM, because the additional reasoning adds no value,
for moderately complex tasks, the advantage of LRMs is felt: their reasoning ability adds value,
for very complex tasks, LRMs can “fall apart”: their accuracy drops, and they expend a lot of effort without producing good results.
Here are the main benefits of this category of models:
When the problem requires multiple steps, hypotheses, deduction or induction, LRMs outperform conventional LLMs. They are better equipped for diagnostics, programming, logical reasoning or mathematical tasks.
Thanks to the generation of visible chains of thought, it becomes possible to track how the model arrived at an answer – which reinforces trust, auditability, and alignment (a critical need in sectors such as healthcare and finance).
In fields such as law, medicine and finance, where a “right” answer is essential and must be well-founded, the LRM approach is more appropriate. They enable decision-making processes to be modeled, hypotheses to be verified and choices to be justified.
LRMs represent a step towards systems that don’t just generate text but can think, or at least simulate reasoning, which is a key element on the path to more general artificial intelligence.
Despite their power, LRMs still present significant obstacles:
Generating intermediate steps, exploring branches of reasoning, checking or revising, involves much more computation, memory and time than “simple” LLMs.
As mentioned, recent studies show that above a certain complexity threshold, even LRMs falter: they reduce their reasoning effort and their performance drops precipitously. This raises questions about the fundamental limits of automated reasoning.
Even when some answers are good, it remains debated whether the AI really “reasons” or simply applies powerful heuristics. Some research shows that chains of thought can be superficial or contain logical errors.
Even with intermediate steps exposed, the underlying logic can remain opaque: why did the model choose such and such a branch? We don’t yet have the level of transparency we’d like for critical decisions.
The need for long chains of evidence and complex scenarios makes data acquisition costly. There is also the risk of bias or uncovered areas.
| Criteria | LLM (Large Language Model) | LRM (Large Reasoning Model) |
|---|---|---|
| Main objective | Fluid text generation | Multi-step structured reasoning |
| Response times | Fast, optimized | Slower, more computation |
| Best domain | Text generation tasks, simple | Complex, logical, diagnostic tasks |
| Explainability | Limited to the final output | Visible, accessible reasoning steps |
| Latency & cost | Relatively low | Relatively high |
| Efficient for simple tasks | Yes | Not optimized for very simple tasks |
| Efficient for highly complex tasks | Limited | Better, but collapses above a certain threshold |
Organizations dealing with decision, logic, verification or compliance problems are well advised to turn to LRMs: they offer a qualitative leap over conventional LLMs.
This means greater reliability, better traceability, and stronger alignment with sensitive uses where error is costly.
LRMs are a field of intense research: how to simulate human reasoning, how to structure chains of thought, how to demonstrate true robustness? All this is helping to push forward the frontier of AI.
The question of “AGI” (artificial general intelligence) undoubtedly requires a greater capacity for reasoning, and LRMs are an important milestone.
Mastering LRMs, their architectures, data and uses, is becoming a strategic asset for public and private technology players. Those able to build, adapt or control such models have a competitive advantage.
A few major trends emerge for LRM:
Efforts to improve efficiency: reduce latency and token consumption, manage the chain of thought more intelligently so as not to generate unnecessary text.
Hybrid models: combine LRM approaches with agents, knowledge bases and symbolic systems to boost robustness.
Adapting to real-time uses: moving away from research towards industrial applications where time and cost count.
Multimodal extension: reasoning not only on text, but also on images, video, audio, structured data, with multi-modal thought chains.
Governance, ethics, reliability: guaranteeing that LRM decisions are transparent, audited and secure is of paramount importance, especially in sensitive areas.
Large Reasoning Models represent a significant step forward in AI innovation: they no longer merely generate text, but aim to think, reason and analyze. For tasks of medium to high complexity, this is a clear advantage over pure generation models.
However, these models are not yet perfect: latency, cost, collapse at high complexity and limited explainability remain challenges.
For any company, researcher or decision-maker interested in AI systems for critical use – reasoning, decision, diagnosis – LRMs are today an avenue to follow closely.