Two working papers on whether inference-time self-refinement can improve citation faithfulness in retrieval-augmented generation when the backbone model is small ($\leq 4$B parameters). Phase 1 studies a cheap lexical heuristic; the extended paper adds token-level mechanistic attribution and a cross-backbone replication on Llama-3.2-3B. Evaluated on ALCE-ASQA and GaRAGe with Qwen3-4B. Both papers are shareable pre-prints; the extended version is a work in progress.
Compares six refinement conditions on Qwen3-4B over ALCE-ASQA (5 retrieved passages, mostly relevant) and GaRAGe (15 retrieved passages, mixed relevance). Headline findings: (1) Self-Refine degrades citation precision on clean retrieval by $9.37$ NLI points ($p<0.01$); (2) a twenty-line passage-overlap heuristic beats Self-Refine on both benchmarks and beats the no-refinement baseline on GaRAGe by $+7.09$ RFCP points; (3) STR-EM detects none of these effects. Includes a deployment-oriented decision tree and an AML worked example.
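The paper's actual twenty-line heuristic is not reproduced here, but the core idea of a passage-overlap check is simple enough to sketch. A hypothetical minimal version (function names, stopword list, and threshold are illustrative assumptions, not taken from the paper): keep a citation only when the cited sentence's content tokens sufficiently overlap the retrieved passage.

```python
# Hypothetical sketch of a lexical passage-overlap citation check.
# Not the paper's implementation; names and the 0.5 threshold are illustrative.

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "on"}

def overlap_score(sentence: str, passage: str) -> float:
    """Fraction of the sentence's content tokens that also appear in the passage."""
    sent_tokens = {t for t in sentence.lower().split() if t not in STOPWORDS}
    passage_tokens = set(passage.lower().split())
    if not sent_tokens:
        return 0.0
    return len(sent_tokens & passage_tokens) / len(sent_tokens)

def keep_citation(sentence: str, passage: str, threshold: float = 0.5) -> bool:
    """Retain the citation only if lexical overlap clears the threshold."""
    return overlap_score(sentence, passage) >= threshold
```

A check like this costs one set intersection per (sentence, passage) pair, which is why it can be viable where an extra LLM critique pass is not.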
Extends Phase 1 with three additions. (a) An attempt to replace the lexical heuristic with token-level DynamicLRP attribution, its failure on Qwen3-4B, and a pivot to plain input$\times$gradient. (b) A critique-prompt ablation identifying the "binary-trigger insight": attribution's value is in \emph{when} to refine, not \emph{what} to criticize. (c) A cross-backbone replication on Llama-3.2-3B using LXT v2.1's AttnLRP, which reproduces the Self-Refine advantage directionally but not the beats-baseline advantage (Qwen3-specific). Honest negative results documented throughout.
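The input$\times$gradient fallback mentioned in (a) is the standard attribution rule $a_i = x_i \cdot \partial f / \partial x_i$. A minimal illustration on a linear scorer $f(x) = w \cdot x + b$, where the gradient is just $w$ (the function name is illustrative; the paper applies the rule to transformer token embeddings, not a toy linear model):

```python
# Input×gradient attribution for a linear scorer f(x) = w·x + b.
# For a linear model, df/dx_i = w_i, so the attribution of feature i is x_i * w_i.
# Toy illustration only; in the paper the rule is applied to token embeddings.

def input_times_gradient(x: list[float], w: list[float]) -> list[float]:
    """Per-feature attributions a_i = x_i * w_i for f(x) = w·x + b."""
    return [xi * wi for xi, wi in zip(x, w)]
```

For linear models the attributions sum exactly to $f(x) - b$ (completeness), which makes this toy case a convenient sanity check before moving to the nonlinear transformer setting.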