Artificial intelligence

Full definition of Generative AI

Published on September 24, 2025

1) Academic definition (rigorous)

Generative AI covers machine learning methods that aim to model the distribution of data (text, image, audio, video, code, etc.) in order to generate plausible new samples. Formally, a generative model learns p_θ(x) (or p_θ(x | c) when conditioned on a context c) from a training corpus, then samples new instances x* that respect the learned statistical regularities.

In other words, instead of just predicting or classifying, we create content in line with the style and structures of the original data.


2) Intuition and principles

  • Objective: to learn the “shape” of the data (its distribution) so as to be able to produce new, credible variants.

  • Two key questions:

    1. How do we learn p(x)? (optimization, loss function, architecture)

    2. How to sample? (generation procedure, style and diversity control)

  • Model types:

    • Explicit likelihood (the log-likelihood, or a bound on it, is maximized)

    • Adversarial / implicit (p(x) is never computed explicitly; instead a discriminator is “fooled”)

    • Noise-based (progressive denoising is learned for sampling)


3) Main families of generative models

  • Autoregressive transformers (LLM)

    • Principle: predict the next token, p(x_t | x_{<t}).

    • Strengths: text, code, recent multimodal models; very good at prompt-driven composition and formatting.

    • Control: temperature, top-k, top-p (nucleus), format constraints (JSON), external tools (RAG, functions).

  • GANs (Generative Adversarial Networks)

    • Principle: a generator produces examples, a discriminator tries to distinguish them from the real thing; zero-sum game.

    • Strengths: high-fidelity images, style and detail;

    • Limitations: training instabilities, mode collapse, sometimes fragile evaluation metrics.

  • VAEs (Variational Autoencoders)

    • Principle: encode x into a latent space z, then reconstruct x from z, with probabilistic regularization.

    • Strengths: interpretable latents, smooth interpolation, conditional generation.

  • Diffusion / Score-based models

    • Principle: learn to remove noise added in several stages; at inference, we denoise step by step to sample.

    • Strengths: excellent image quality, video/3D progress; fine control via classifier-free guidance, ControlNet.

  • Normalizing Flows

    • Principle: transform a simple distribution into a complex one via bijective transformations; exact log-density.

    • Strengths: exact, tractable likelihood;

    • Limits: architectural constraints to remain invertible.

  • Energy-based models (EBM)

    • Principle: define an energy function whose minima correspond to probable data; sampling via MCMC.

    • Strengths: general theoretical framework;

    • Limitations: sampling can be costly.


4) Data, training and alignment

  • Data preparation: cleansing, deduplication, quality filtering, domain balancing, copyright management, PII.

  • Training objectives:

    • Next-token prediction (LLM)

    • Denoising (diffusion, VAEs)

    • Adversarial loss (GAN)

  • Optimization: Adam/AdamW, learning-rate schedules, gradient clipping, mixed precision; scaling laws (quality improves jointly with model size, data and compute).

  • Fine-tuning & specialization:

    • SFT (Supervised Fine-Tuning) on high-quality demonstrations.

    • PEFT (LoRA/QLoRA, adapters) to reduce memory/compute costs.

    • RAG (Retrieval-Augmented Generation) to anchor answers to verifiable sources.

    • Human preferences: RLHF / RLAIF or DPO/IPO/ORPO alternatives to control style, security and usefulness.
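The next-token objective mentioned above is just a cross-entropy loss over the vocabulary at each position. A minimal sketch (numpy, shapes and names are illustrative assumptions):

```python
import numpy as np

def next_token_loss(logits, targets):
    """Mean cross-entropy of predicted next tokens.

    logits:  (T, V) unnormalized scores, one row per position.
    targets: (T,) the token that actually came next at each position.
    """
    # Log-softmax with the usual max-shift for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Sanity check: a model that puts all its mass on the right tokens
# has near-zero loss; a uniform model has loss log(V).
V = 4
targets = np.array([1, 2, 3, 0])
good = np.eye(V)[targets] * 100.0        # confident and correct
uniform = np.zeros((len(targets), V))    # no preference at all
print(next_token_loss(good, targets))     # ~0.0
print(next_token_loss(uniform, targets))  # ~log(4) ≈ 1.386
```

Pretraining, SFT and DPO-style methods all ultimately push this kind of log-probability around; they differ in which sequences (and whose preferences) define the targets.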


5) Inference, control and constraints

  • Sampling (text): temperature (diversity), top-k (candidate vocabulary size), top-p (probability mass).

  • Output constraints: guided decoding, grammars/JSON Schema, beam search (when deterministic coherence is preferred).

  • Image/video control (diffusion): classifier-free guidance, ControlNet, image-to-image, inpainting, IP-Adapter.

  • Tools & agents: function-calling, external tools (search, code), toolformer-like; planning and execution in loops (agentic).

  • Performance: quantization (8-bit, 4-bit), KV cache, MoE (mixture-of-experts), batching, distillation, speculative decoding.
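The temperature/top-k/top-p knobs above are simple transformations of the model's output logits before sampling. A self-contained sketch of one sampling step (function name and defaults are illustrative assumptions, not a library API):

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """One sampling step with temperature, top-k and top-p controls."""
    rng = rng or np.random.default_rng()
    # Temperature rescales the logits: low = sharper, high = flatter.
    logits = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    if top_k is not None:
        # Keep only the k most likely tokens.
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
    if top_p is not None:
        # Keep the smallest set of tokens whose mass reaches top_p.
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        masked = np.zeros_like(probs)
        masked[keep] = probs[keep]
        probs = masked

    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.5, -1.0]
token = sample_token(logits, temperature=0.7, top_k=3,
                     rng=np.random.default_rng(0))
print(token)  # index of the sampled token
```

Note the order of operations: as temperature tends to zero this degenerates into greedy decoding, while top-k and top-p only prune the candidate set before the draw.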


6) Assessment (quality, reliability, safety)

  • Text: perplexity (fluency proxy), ROUGE/BLEU (summarization/translation), BERTScore/COMET, human evals (accuracy, usefulness).

  • Image: FID, KID, IS, CLIPScore; human perceptual assessments.

  • Factuality & safety: hallucination rate, accuracy on open-book, robustness to prompt injection, toxicity/bias.

  • Format compliance: JSON/SQL validity, strict schemas, accuracy on constraints (units, value ranges).
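Perplexity, listed first above, is just the exponential of the average negative log-likelihood per token, so it is easy to compute from the per-token log-probabilities a model reports (a minimal sketch; the helper name is an assumption):

```python
import numpy as np

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_log_probs: natural-log probabilities the model assigned
    to each observed token in a held-out text.
    """
    return float(np.exp(-np.mean(token_log_probs)))

# A model that assigns probability 0.25 to every token behaves like
# a uniform choice over 4 tokens: its perplexity is exactly 4.
print(perplexity(np.log([0.25, 0.25, 0.25, 0.25])))  # 4.0
```

Intuitively, a perplexity of N means the model is, on average, as uncertain as a uniform choice among N tokens, which is why lower is better.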


7) Major use cases

  • Content & productivity: assisted copywriting, summarization, translation, visual asset generation, storyboarding.

  • Code & data: programming assistance, test generation, migration/modernization, data synthesis for balancing training sets.

  • Operations & support: internal assistants, RAG on document bases, dynamic SOPs, compliant chatbots.

  • Design, R&D, industry: CAD assistance, visual prototyping, simulation (synthetic data), anomaly detection.

  • Health/Science: molecular design, imaging, scientific literature analysis (with strong safeguards).

  • Finance/Insurance: report generation, structured document extraction, what-if analyses (with dedicated models and strict controls).


8) Limits and risks

  • Hallucinations (LLM): fluent but false responses when not grounded (RAG) or properly constrained.

  • Bias & representativeness: historical data → reproduced bias; need for debiasing and targeted assessments.

  • Security: prompt injection, data exfiltration, jailbreaks; need for continuous red teaming.

  • Intellectual property & rights: data provenance, copyright, licensing; handling of logos and faces.

  • Sensitive data & confidentiality: PII, secrets; differential privacy, synthetic data with care.

  • Costs & footprint: compute/energy; trade-offs between model size and business value.


9) Governance, compliance and best practices

  • Responsible lifecycle: model cards, data sheets, prompt logging, version tracking.

  • Pre-release checks: off-domain evaluation, attack tests (security), guardrails, rate limiting.

  • Document grounding (RAG): citations/justifications, source grounding, update management.

  • Media authenticity: watermarks, C2PA (provenance), content authenticity.

  • Regulatory frameworks: trusted-AI principles (fairness, explainability, robustness) and growing requirements (e.g. obligations tiered by risk level).


10) Structuring trends

  • Native multimodal (text-image-audio-video-sensors) and reasoning tools (code, search, business tools).

  • Specialist vs. foundation models: combining a generalist LLM + light experts via routing/MoE.

  • Efficiency: small, high-performance language models for targeted domains, quantized and adapted for on-prem/edge deployment.

  • Constrained generation: structured output (JSON/SQL), direct integration into workflows and databases.

  • Next-generation security: prompt attack detection, contextual content moderation, policy engines.


11) Glossary of this article

  • Autoregressive: generates one token at a time, conditioned on history.

  • Temperature: controls diversity (high = more creative, low = more conservative).

  • Top-k / Top-p: restrict candidate space to stabilize style.

  • LoRA/PEFT: fine-tuning a large model with few trainable parameters.

  • RAG: retrieve relevant documents and ground generation in them.

  • Diffusion: progressive denoising of a noisy signal.

  • FID: generated image quality metric (statistical proximity to reality).


12) Project mini-checklist (operational)

  1. Framing usage: business cases, risks, value metrics.

  2. Select approach: LLM + RAG? Diffusion? VAE?

  3. Data & rights: quality, governance, compliance.

  4. Experiment: prompting + constraints + safeguards.

  5. Evaluate: quality, factuality, safety, cost.

  6. Industrialize: monitoring, feedback loops, updates.
