Towards Verifiable Text Generation with Symbolic References

Publication Date:

November 15, 2023

Authors:

Lucas Torroba Hennigen, Shannon Zejiang Shen, Aniruddha Nrusimha, Bernhard Gapp, David Sontag, Yoon Kim

Journal or conference name:

arXiv preprint

Abstract:

LLMs are vulnerable to hallucinations, and their outputs generally require laborious human verification for high-stakes applications. This paper proposes symbolically grounded generation (SymGen), a simple approach for enabling easier manual validation of LLM output. SymGen prompts an LLM to interleave its regular output with explicit symbolic references to fields present in the conditioning data, such as a table in JSON format. These references display the provenance of different spans of generated text, reducing the effort required for verification. Across data-to-text and question-answering experiments, LLMs are able to output text with accurate symbolic references while maintaining fluency and factuality. A human study further finds that SymGen annotations reduce average verification time by 20%.
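To illustrate the idea of interleaving text with symbolic references, here is a minimal sketch of how such references might be resolved against JSON conditioning data. The abstract does not specify the reference syntax, so the `{{ dotted.path }}` placeholder format, the helper names `lookup` and `resolve_references`, and the sample table are all assumptions made for illustration, not the authors' implementation.

```python
import json
import re

# Hypothetical placeholder syntax: {{ dotted.path }}.
# The abstract does not state the actual reference format used by SymGen.
REF_PATTERN = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(data, path):
    """Follow a dotted path such as 'home.score' through nested dicts and lists."""
    node = data
    for key in path.split("."):
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


def resolve_references(generated, data):
    """Substitute each reference with its value and record its provenance.

    Returns the rendered text plus a list of (start, end, path) tuples giving
    the character span in the rendered text and the field it was copied from.
    """
    pieces, provenance = [], []
    cursor = 0   # position in the raw generated string
    out_len = 0  # length of the rendered text so far
    for match in REF_PATTERN.finditer(generated):
        literal = generated[cursor:match.start()]
        pieces.append(literal)
        out_len += len(literal)
        value = str(lookup(data, match.group(1)))
        provenance.append((out_len, out_len + len(value), match.group(1)))
        pieces.append(value)
        out_len += len(value)
        cursor = match.end()
    pieces.append(generated[cursor:])
    return "".join(pieces), provenance


# Made-up conditioning data and model output, for illustration only.
table = json.loads(
    '{"home": {"name": "Hawks", "score": 102},'
    ' "away": {"name": "Nets", "score": 97}}'
)
generated = (
    "The {{ home.name }} beat the {{ away.name }} "
    "{{ home.score }}-{{ away.score }}."
)
text, provenance = resolve_references(generated, table)
print(text)
for start, end, path in provenance:
    print(f"{text[start:end]!r} <- {path}")
```

A verification interface could use the recorded spans to highlight each value and link it back to its source field, so a reader checks only the free text around the references rather than every number and name.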
