When teams talk about LLM hallucinations, they usually mean "the model said something false." That definition is accurate but too broad to be actionable. Effective hallucination detection requires understanding that there are fundamentally different types of hallucination, each with a distinct cause and a distinct detection strategy.
Treating all hallucinations as the same problem leads to evaluation gaps where you are measuring one type well while missing the others entirely. This guide breaks down the taxonomy and explains what each type means for your detection approach.
Type 1: Factual Confabulation
Factual confabulation is what most people mean when they say "hallucination." The model generates a plausible-sounding statement that is factually incorrect — inventing a statistic, misattributing a quote, citing a study that does not exist, or stating a historical fact incorrectly.
The mechanism behind factual confabulation is a gap between the model's training distribution and the factual reality it is trying to describe. Models learn statistical patterns in language, not grounded facts. When asked for specific factual claims that fall outside the dense region of their training distribution, they generate the most plausible-sounding continuation — which may bear little resemblance to reality.
Detection approach: Factual confabulation is best detected through retrieval-augmented verification, where outputs are checked against authoritative external sources. For closed-domain applications with a defined knowledge base, grounding verification — checking whether every factual claim in the output is supported by the provided context — is highly effective. For open-domain factual questions, retrieval against a knowledge graph or search index is necessary.
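As a sketch of the closed-domain case, the snippet below flags claims whose best match in a provided knowledge base is weak. It assumes the sentence-transformers package and the public all-MiniLM-L6-v2 model, and it uses embedding similarity as a cheap proxy for support rather than a full entailment check; the 0.6 threshold is illustrative, not a recommendation.

```python
# Sketch of claim-level grounding verification against a fixed knowledge base.
# Assumes the sentence-transformers package and the "all-MiniLM-L6-v2" model;
# similarity is a crude proxy for support, and the threshold is illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def flag_unsupported_claims(claims, knowledge_base, threshold=0.6):
    """Return claims whose best-matching source passage scores below threshold."""
    claim_emb = model.encode(claims, convert_to_tensor=True)
    kb_emb = model.encode(knowledge_base, convert_to_tensor=True)
    scores = util.cos_sim(claim_emb, kb_emb)  # shape: (claims, passages)
    flagged = []
    for i, claim in enumerate(claims):
        best = float(scores[i].max())
        if best < threshold:
            flagged.append((claim, best))
    return flagged

claims = [
    "The library was first released in 2016.",
    "It supports streaming responses out of the box.",
]
kb = ["The project's initial release was in 2016.", "Batch inference is supported."]
print(flag_unsupported_claims(claims, kb))  # the second claim should surface for review
```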
Key difficulty: You cannot verify that a claim is factually correct without a ground truth source to check against. For domains where authoritative sources are unavailable or contested, factual confabulation detection reduces to uncertainty quantification — identifying outputs where the model is likely operating outside its reliable knowledge.
Type 2: Grounding Hallucination (RAG-Specific)
Grounding hallucination occurs in retrieval-augmented generation systems where the model generates claims that are not supported by, or contradict, the retrieved context documents. Unlike factual confabulation (where there is no reference source), grounding hallucination is detectable with high precision because you have an explicit ground truth: the retrieved documents themselves.
Common forms include: claiming a document says something it does not say, attributing a statement to the wrong source document, combining information from two separate documents in a way that creates a false implication, and extrapolating beyond what the source material supports.
Detection approach: Natural language inference (NLI) classifiers trained to detect entailment can identify whether model outputs are entailed by, contradict, or are neutral with respect to the provided context. Sentence-level attribution — tracking which sentence in the output was grounded in which retrieved chunk — enables fine-grained detection. Confidence calibration signals, where the model indicates uncertainty about claims weakly supported by the context, also help.
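A minimal version of the NLI approach might look like the following. It assumes the transformers library and the publicly available roberta-large-mnli checkpoint; both are tooling assumptions, not part of any specific product pipeline.

```python
# Minimal NLI-based grounding check. Label handling reads the model config
# rather than hard-coding an ordering.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def nli_label(premise: str, hypothesis: str) -> str:
    """Classify whether the retrieved context (premise) entails the claim (hypothesis)."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    pred = int(logits.argmax(dim=-1))
    return model.config.id2label[pred]  # e.g. ENTAILMENT / NEUTRAL / CONTRADICTION

context = "The retrieved document states the API rate limit is 60 requests per minute."
claim = "The API allows 600 requests per minute."
print(nli_label(context, claim))  # a contradiction should be flagged for review
```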
Key difficulty: Models often blend information from multiple retrieved chunks in ways that individually look correct but together create a misleading picture. Detection systems need to evaluate the aggregate consistency of a response, not just individual sentence-level grounding.
Type 3: Citation Fabrication
Citation fabrication is a specific subset of factual confabulation that is particularly damaging in professional, academic, and legal contexts. The model invents references — paper titles, author names, DOIs, URLs, case law citations — that do not exist. These fabricated references have the surface appearance of legitimate citations, making them especially likely to mislead users who do not verify them.
High-profile incidents involving citation fabrication — including legal briefs citing invented case law — have created significant reputational and liability risk for teams that deploy LLMs in document-heavy workflows without proper validation.
Detection approach: For structured citation formats (academic references, legal citations), automated lookup verification against reference databases (CrossRef, PubMed, legal databases) can achieve near-complete detection. For URLs and web references, link validation with content verification is effective. The challenge is building comprehensive enough lookup coverage to catch fabrications across diverse domains.
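For DOIs specifically, a lookup against the public CrossRef REST API is enough to establish whether a reference resolves at all. The sketch below assumes the requests package; equivalent checks against PubMed or legal databases follow the same pattern, and broader coverage is what takes real engineering effort.

```python
# Illustrative DOI lookup against the public CrossRef REST API
# (https://api.crossref.org/works/<doi>); a 404 response suggests the DOI does
# not resolve to a registered work.
import requests

def doi_exists(doi: str, timeout: float = 10.0) -> bool:
    """Return True if CrossRef knows about this DOI."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=timeout)
    return resp.status_code == 200

# The second DOI is deliberately malformed to show the failure path.
for doi in ["10.1038/nature14539", "10.9999/not-a-real-doi"]:
    print(doi, "->", "found" if doi_exists(doi) else "not found: flag for review")
```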
Type 4: Numerical and Quantitative Inconsistency
Models frequently generate numerical errors that are not obviously wrong — claiming a percentage that is slightly off, getting the order of magnitude wrong on a large number, or performing an arithmetic operation incorrectly mid-reasoning. These errors are dangerous precisely because they are plausible: a claim that a treatment is "67% effective" is easy to accept uncritically even if the actual figure is 37%.
Detection approach: For structured documents with defined numerical fields, schema validation against authoritative data sources catches most quantitative errors. For free-form text, pattern extraction followed by range checking (flagging values that fall outside plausible bounds for the domain) provides a useful first-pass filter. For calculations, having the model re-derive the result through step-by-step reasoning and checking for internal consistency is more reliable than checking the final answer alone.
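A minimal first-pass filter for percentages might look like the sketch below. The regex and the 0 to 100 bound are generic; real deployments would substitute domain-specific bounds (dosage ranges, price bands, plausible dates).

```python
# First-pass range check for percentages in free-form text. The bounds are
# generic placeholders; domain-specific limits would be supplied per application.
import re

PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")

def out_of_range_percentages(text: str, low: float = 0.0, high: float = 100.0):
    """Return percentage values that fall outside the plausible [low, high] bound."""
    return [float(m.group(1)) for m in PERCENT.finditer(text)
            if not (low <= float(m.group(1)) <= high)]

sample = "The treatment was 137% effective in the trial, up from 37% last year."
print(out_of_range_percentages(sample))  # [137.0] -> flag for review
```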
Type 5: Temporal Hallucination
Temporal hallucination occurs when models apply stale information as if it is current — describing the state of technology as it was during their training window, attributing current leadership roles to people who have since changed positions, or presenting outdated statistics as present-day facts. This type of hallucination is particularly insidious because the information was accurate at training time, making it harder to flag automatically.
Detection approach: Temporal markers in outputs (phrases like "currently," "as of today," "recently") combined with entity extraction and knowledge currency checks provide partial coverage. For applications where currency of information is critical, systematic retrieval augmentation with date-filtered sources is more reliable than relying on detection alone. Prompting strategies that encourage the model to express uncertainty about time-sensitive claims also help.
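A crude but useful first pass is to flag sentences that assert currency so they can be routed to a freshness check or a date-filtered retrieval step. The marker list below is illustrative and deliberately small.

```python
# Flag sentences containing currency-asserting markers so they can be routed
# to a knowledge-currency check. The marker list is illustrative, not exhaustive.
import re

TEMPORAL_MARKERS = re.compile(
    r"\b(currently|as of today|as of now|recently|at present|this year)\b", re.I
)

def sentences_needing_freshness_check(text: str):
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if TEMPORAL_MARKERS.search(s)]

answer = ("The framework currently ships with two backends. "
          "It was originally written in 2019.")
print(sentences_needing_freshness_check(answer))  # only the first sentence is flagged
```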
Type 6: Confidence Miscalibration
Confidence miscalibration is arguably the most dangerous hallucination type because it concerns the relationship between certainty and correctness rather than the content of the output itself. A miscalibrated model expresses high confidence in incorrect outputs and, conversely, hedges on outputs that are actually correct. Users who treat the model's expressed confidence as a reliability signal are misled.
Detection approach: Calibration evaluation requires comparing expressed confidence (via verbal hedges, explicit uncertainty markers, or log probabilities) against empirical accuracy on a held-out dataset. Well-calibrated models should be correct approximately X% of the time when they express X% confidence. Measuring and tracking calibration error over time — and across prompt types — helps identify where the model's confidence signals can and cannot be trusted.
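Expected calibration error (ECE) is a common way to summarize this comparison: bin responses by expressed confidence and measure the gap between average confidence and empirical accuracy in each bin. A minimal sketch, assuming per-response confidences in [0, 1] and binary correctness labels from a graded held-out set; the bin count is an illustrative choice.

```python
# Expected calibration error over binned confidence scores.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Gap between empirical accuracy and mean confidence in this bin,
            # weighted by the bin's share of all samples.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

conf = [0.9, 0.8, 0.95, 0.6, 0.55, 0.7]
hit  = [1,   0,   1,    1,   0,    0]
print(f"ECE: {expected_calibration_error(conf, hit):.3f}")
```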
Building a Multi-Layer Detection Stack
No single detection technique catches all hallucination types. An effective hallucination detection system combines multiple layers:
- Grounding verification for RAG and document-based applications
- Reference validation for any output that contains citations, URLs, or specific factual claims
- Semantic consistency checking to catch self-contradiction within a response
- Calibration monitoring to ensure the model's expressed confidence is meaningful
- Domain-specific validators for numerical, temporal, or schema-constrained content
The specific combination depends heavily on your application type. A customer service chatbot has very different hallucination risks than a medical information assistant or a legal document drafting tool. Designing your detection stack requires understanding which types of hallucination would cause the most harm in your specific deployment context.
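One way to wire these layers together is a simple validator interface where each layer returns its own issues and the stack aggregates them. The names below are placeholders standing in for the checks sketched above, not a prescribed API.

```python
# Sketch of a composable detection stack: each layer is a callable that
# inspects an output (plus any context) and returns the issues it found.
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Issue:
    layer: str
    detail: str

Validator = Callable[[str, Dict], List[Issue]]

def run_detection_stack(output: str, context: Dict, validators: List[Validator]) -> List[Issue]:
    """Run every configured layer and aggregate the flagged issues."""
    issues: List[Issue] = []
    for validator in validators:
        issues.extend(validator(output, context))
    return issues

def numeric_layer(output: str, context: Dict) -> List[Issue]:
    """Example layer: the percentage range check from the quantitative section."""
    bad = [m.group(0) for m in re.finditer(r"(\d+(?:\.\d+)?)\s*%", output)
           if not 0 <= float(m.group(1)) <= 100]
    return [Issue("numeric", f"implausible percentage: {v}") for v in bad]

# A RAG application would also register grounding and citation layers here.
stack: List[Validator] = [numeric_layer]
print(run_detection_stack("Coverage improved to 180%.", {}, stack))
```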
Measurement Strategy
When building hallucination metrics for your evaluation framework, track each hallucination type separately rather than aggregating into a single score. A single "hallucination rate" conflates problems that require fundamentally different interventions — you cannot address grounding hallucination with the same approach you use for confidence miscalibration.
Define separate acceptance thresholds for each type based on the harm level if that hallucination reaches a user. Factual confabulation in a casual recommendation app is much lower stakes than citation fabrication in a legal brief. Your thresholds should reflect the actual risk profile of your application, not a generic standard.
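Concretely, this can be as simple as a per-type threshold table checked against per-type observed rates. The values below are placeholders, not recommendations; they should come from your application's own risk assessment.

```python
# Illustrative per-type acceptance thresholds. All numbers are placeholders.
THRESHOLDS = {
    "factual_confabulation": 0.05,
    "grounding": 0.02,
    "citation_fabrication": 0.0,   # e.g. zero tolerance in legal or academic contexts
    "numerical": 0.03,
    "temporal": 0.05,
    "calibration_error": 0.10,     # maximum acceptable ECE rather than a rate
}

def failing_types(observed_rates: dict, thresholds: dict = THRESHOLDS):
    """Return the hallucination types whose observed rate exceeds its threshold."""
    return [t for t, rate in observed_rates.items() if rate > thresholds.get(t, 0.0)]

print(failing_types({"grounding": 0.04, "numerical": 0.01, "citation_fabrication": 0.0}))
```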
Confident AI's hallucination detection covers all six types.
Our multi-layer detection pipeline identifies grounding failures, citation fabrication, numerical inconsistency, and confidence miscalibration as distinct signals in your evaluation results. See how it works →
Detect Every Hallucination Type
Confident AI's multi-layer detection identifies the hallucination types that matter for your specific application.