Methodological rigour

How QualIntel OS Implements the Trustworthy LLM-QDA Framework (arXiv:2501.00775) — In Full

Published 12 June 2026

In January 2025, Jie Gao and colleagues (Johns Hopkins University and collaborators) published “Efficiency with Rigor! A Trustworthy LLM-powered Workflow for Qualitative Data Analysis” (arXiv:2501.00775). It has become the clearest articulation of a position the field was converging on anyway: large language models can make qualitative analysis dramatically more efficient without sacrificing rigour — but only if the workflow meets specific, checkable conditions.

QualIntel OS was built independently of this paper, yet satisfies its framework in full — and the convergence is the point. When a working product and an academic framework arrive at the same requirements by different routes, those requirements look less like one team's preference and more like the real conditions for trustworthy AI-assisted analysis. This post maps each requirement of the paper to the corresponding mechanism in the product — not as marketing gloss, but as the axis-by-axis account a supervisor, examiner, or methods-chapter author needs when deciding whether an AI-assisted analysis can be trusted.

The framework's four requirements

Stripped to its core, the paper requires four things of any trustworthy LLM-assisted qualitative workflow: the researcher keeps interpretive authority (the AI is a supervised assistant, never an autonomous analyst); every AI contribution is documented in a way others can trace and scrutinise; the assistance operates inside the researcher's stated methodology rather than as generic text processing; and the prompting approach itself is disclosed, so the analysis can be assessed and reproduced.

Most AI research tools satisfy none of these cleanly. Auto-coding tools fail the first requirement by design: the AI codes and the human checks, which inverts the supervision relationship the paper requires. General chatbots fail the second and third: they keep no audit trail and work from no stated methodology. And almost everything fails the fourth, because prompting is treated as invisible plumbing rather than method.

Axis by axis: framework requirement → product mechanism

Human primacy

The paper requires: The AI is a supervised assistant, never an autonomous analyst. Interpretive authority stays with the researcher.

In QualIntel: Nothing is coded without explicit researcher confirmation. The AI surfaces candidate evidence segments; the researcher accepts or rejects each one individually. There is no bulk-accept, and no analytic conclusion can rest on an unreviewed output.
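To make the confirmation rule concrete, here is a minimal sketch of a per-candidate review gate. It is an illustration under stated assumptions, not QualIntel's actual implementation: the names Candidate, Status, decide and coded_evidence are hypothetical, and the example segments are invented.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Candidate:
    """One AI-surfaced segment proposed for a researcher-defined code (hypothetical type)."""
    segment: str
    code: str
    status: Status = Status.PENDING


def decide(candidate: Candidate, accept: bool) -> None:
    """Record the researcher's decision on exactly one candidate."""
    if candidate.status is not Status.PENDING:
        raise ValueError("candidate already decided")
    candidate.status = Status.ACCEPTED if accept else Status.REJECTED


def coded_evidence(candidates: list[Candidate]) -> list[Candidate]:
    """Only individually accepted candidates count as coded evidence."""
    return [c for c in candidates if c.status is Status.ACCEPTED]


# Invented example data, for illustration only.
candidates = [
    Candidate("I stopped asking for help after that.", "withdrawal"),
    Candidate("The ward was busy that week.", "withdrawal"),
    Candidate("Nobody checked in on me.", "withdrawal"),
]
decide(candidates[0], accept=True)
decide(candidates[1], accept=False)
# candidates[2] is still pending, so it cannot support any conclusion.
print([c.segment for c in coded_evidence(candidates)])
```

The sketch deliberately offers no function that takes a list of candidates and accepts them all: the only way to change a status is one decision on one candidate.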

Auditability

The paper requires: Documentation must let others trace and scrutinise analytic decisions.

In QualIntel: Every AI suggestion, every accept/reject decision, and every revision is timestamped and researcher-attributed in a complete audit trail. Each LLM call is logged with its model and prompt-template version. The trail exports with the submission package.
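One way to picture such a trail is as an append-only list of immutable entries. The sketch below assumes a simple in-memory log with hypothetical field names (timestamp, researcher, action, target, model, prompt_template); it is not QualIntel's schema, and a real system would persist the entries rather than hold them in memory.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: an entry cannot be altered once written
class AuditEvent:
    timestamp: str
    researcher: str
    action: str  # "ai_suggestion", "accept", "reject" or "revision"
    target: str  # hypothetical identifier of the segment or coding
    model: Optional[str] = None
    prompt_template: Optional[str] = None


class AuditTrail:
    """Append-only in-memory log (illustrative only)."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, researcher: str, action: str, target: str,
               model: Optional[str] = None,
               prompt_template: Optional[str] = None) -> AuditEvent:
        event = AuditEvent(
            timestamp=datetime.now(timezone.utc).isoformat(),
            researcher=researcher, action=action, target=target,
            model=model, prompt_template=prompt_template,
        )
        self._events.append(event)
        return event

    def export(self) -> str:
        """Serialise the whole trail, oldest entry first."""
        return json.dumps([asdict(e) for e in self._events], indent=2)


trail = AuditTrail()
trail.record("researcher_01", "ai_suggestion", "segment-17",
             model="claude-sonnet-4-6", prompt_template="suggest_codings v3")
trail.record("researcher_01", "accept", "segment-17")
print(trail.export())
```

Entries that come from an LLM call carry the model and prompt-template version; entries that record a human decision leave those fields empty.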

Methodological grounding

The paper requires: LLM assistance must operate within the researcher's stated methodology, not generic text processing.

In QualIntel: Seven methodology modes (Reflexive TA, IPA, Grounded Theory, Gioia, Codebook TA, Content Analysis, Template Analysis), each with deep per-methodology prompting and a matching reporting standard or methodological authority (RTARG, COREQ, SRQR, Smith et al., Charmaz). The methodology is auto-detected from the research design and confirmed by the researcher before analysis begins.

Prompting disclosure

The paper requires: The prompting approach and reasoning process should be disclosed so an examiner can assess and reproduce the analysis.

In QualIntel: Every project generates a non-editable AI Method Statement: model and provider, task framing, retrieval method, methodology conditioning, the human-confirmation guarantee, and version-pinned prompt-template provenance. It appears in the audit panel and in the exported AI disclosure statement.

The fourth axis is the one nobody else ships

Human confirmation, audit trails, and methodology-awareness can each be found, partially, somewhere in the qualitative software landscape. Prompting disclosure cannot. Yet the paper is explicit that reproducibility depends on it: two researchers using “the same tool” can obtain very different behaviour from different prompting, so an analysis cannot be assessed — let alone reproduced — without knowing how the model was instructed.

QualIntel resolves this with an automatically generated, non-editable AI Method Statement, built from run metadata and appended to every exported AI disclosure statement. A real example:

“Model and provider: claude-sonnet-4-6 (Anthropic), accessed via API. Task framing: the model was prompted to surface candidate text segments semantically related to each researcher-defined code, and to draft starting-point prose from researcher-accepted evidence; it was not asked to assign codes, generate themes, or interpret data. Retrieval method: candidate segments were retrieved via semantic similarity over vector embeddings of the researcher-uploaded data, then assessed for candidate-code fit before being presented for researcher review. Methodology awareness: all prompting was conditioned on Reflexive Thematic Analysis. Outputs were structured against the RTARG reporting standard. Human confirmation: every surfaced candidate was individually accepted or rejected by the researcher; no analytic conclusion rests on an unreviewed AI output. The prompting approach is version-pinned for reproducibility: scaffold_section v5; suggest_codings v3 (runs logged 2026-06-01 to 2026-06-10).”

Note what the statement does and does not contain. It pins the prompting approach to named, versioned templates with dates — enough for an examiner to assess the AI's bounded role and for the approach to be reproduced. It does not expose raw prompt text, which would add little methodological value while creating its own integrity problems. The disclosure is generated from logged metadata, and the researcher cannot edit it — the same property that makes the rest of QualIntel's disclosure trustworthy.
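As an illustration of how a provenance line like the one in the example statement could be derived from logged metadata alone, here is a minimal sketch. The run records and the provenance_line function are hypothetical; the only thing taken from the statement is the output format (template name, version, date range).

```python
from datetime import date

# Hypothetical logged run metadata: (template name, template version, run date).
runs = [
    ("suggest_codings", 3, date(2026, 6, 1)),
    ("scaffold_section", 5, date(2026, 6, 4)),
    ("suggest_codings", 3, date(2026, 6, 10)),
]


def provenance_line(runs):
    """Summarise logged runs as pinned template versions plus a date range."""
    if not runs:
        raise ValueError("no logged runs to summarise")
    # Deduplicate (name, version) pairs and sort them for a stable output.
    pinned = sorted({(name, version) for name, version, _ in runs})
    templates = "; ".join(f"{name} v{version}" for name, version in pinned)
    days = [day for _, _, day in runs]
    return (
        f"{templates} "
        f"(runs logged {min(days).isoformat()} to {max(days).isoformat()})"
    )


print(provenance_line(runs))
# prints: scaffold_section v5; suggest_codings v3 (runs logged 2026-06-01 to 2026-06-10)
```

Because the line is computed from the log and never from free text, it names which templates ran and when without containing any prompt wording.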

Why “the AI codes, you check” fails the framework

The most common architecture in AI-assisted qualitative tools is auto-coding with human review: the model assigns codes across the dataset and the researcher edits the result. This is efficient, and for some applied contexts it may be defensible. But measured against the framework, it fails the first axis at the design level. When the AI codes first, the human is positioned as a checker of machine output — and checking is not the same act as interpreting. The volume of machine-generated codings also makes genuinely individual review implausible, which then erodes the audit trail's meaning: a log of bulk-approvals documents compliance, not interpretation.

The supervised-assistant architecture inverts this. The researcher defines the codebook; the AI retrieves and proposes; the researcher decides, one piece of evidence at a time. This is slower per decision, but every entry in the audit trail records a real interpretive act, which is precisely what Lincoln and Guba's dependability and confirmability criteria, and the examiners applying them, are looking for.
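The "AI retrieves and proposes" step can be pictured as a similarity ranking, in line with the retrieval method named in the example Method Statement (semantic similarity over vector embeddings). The sketch below is a toy under stated assumptions: the three-dimensional vectors stand in for real embeddings, and the cosine and propose functions are hypothetical names, not QualIntel's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def propose(code_vector, segments, top_k=2):
    """Rank segments by similarity to a code; return the top_k as proposals only."""
    ranked = sorted(segments, key=lambda s: cosine(code_vector, s[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Toy vectors stand in for embeddings of researcher-uploaded data.
segments = [
    ("I stopped asking for help.", [0.9, 0.1, 0.0]),
    ("The car park was full.", [0.0, 0.2, 0.9]),
    ("Nobody checked in on me.", [0.8, 0.3, 0.1]),
]
code_vector = [1.0, 0.0, 0.0]  # stand-in embedding of a researcher-defined code
print(propose(code_vector, segments))
```

The function returns proposals and nothing more: assigning a code remains a separate, human step.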

Using this in your methods chapter

If you are writing up an AI-assisted analysis, the framework gives you a defensible structure: name the tool and model, state the AI's bounded role, evidence your interpretive authority with the audit trail, and disclose the prompting approach. Our companion guide — How to Disclose AI Use in Your Qualitative Methods Chapter — walks through the disclosure paragraph itself, with a worked example you can adapt.

If you use QualIntel, the artefacts are generated for you: the audit trail, the AI disclosure statement, and the AI Method Statement are exported together in the submission package, alongside your evidence pack and codebook.

Frequently asked questions

What is the trustworthy LLM-QDA framework?

It is the framework set out in Gao et al., “Efficiency with Rigor! A Trustworthy LLM-powered Workflow for Qualitative Data Analysis” (arXiv:2501.00775). It argues that large language models can be used in qualitative data analysis without sacrificing rigour, provided the AI acts as a transparent, supervised assistant: the researcher retains interpretive authority, every AI contribution is auditable, the workflow is grounded in a named methodology, and the prompting approach is disclosed so others can assess and reproduce the analysis.

Does any qualitative analysis software implement the trustworthy LLM-QDA framework?

QualIntel OS implements all four requirements: researcher confirmation of every AI-surfaced candidate (human primacy), a timestamped researcher-attributed audit trail (auditability), seven methodology modes with per-methodology prompting and reporting standards (methodological grounding), and an automatically generated AI Method Statement covering model, task framing, retrieval method, and version-pinned prompt provenance (prompting disclosure).

What is an AI Method Statement?

A methods-grade description of how AI was used in an analysis, written for examiners: which model and provider, what the model was asked to do (and not do), how candidate evidence was retrieved, which methodology conditioned the prompting, the guarantee that a human confirmed every output, and the prompt-template versions used — so the approach is reproducible without exposing raw prompts.

Why does prompting disclosure matter in qualitative research?

Because reproducibility requires knowing how the AI was instructed. Two researchers using “the same tool” can get very different behaviour from different prompting. Disclosing the prompting approach, at minimum its task framing and a pinned version, lets an examiner assess whether the AI's role was appropriately bounded, and lets another researcher reproduce the setup.

Reference: Gao, J. et al., “Efficiency with Rigor! A Trustworthy LLM-powered Workflow for Qualitative Data Analysis”, arXiv:2501.00775 (2025). See also Lincoln & Guba (1985) on trustworthiness criteria; Braun & Clarke on reflexive thematic analysis and RTARG; COREQ and SRQR reporting standards.