Ensure accuracy when abstracting using Large Language Models (LLMs)
Publication Date:
Authors:
Monica Agrawal, Stefan Hegselmann, Hunter Lang, Yoon Kim, David Sontag
Journal or conference name:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Abstract:
A long-running goal of the clinical NLP community is the extraction of important variables trapped in clinical notes. Roadblocks have included dataset shift from the general domain and a lack of public clinical corpora and annotations. This work shows that large language models such as InstructGPT perform well at zero- and few-shot information extraction from clinical text despite not being trained specifically for the clinical domain. The paper covers span identification, token-level sequence classification, and relation extraction, and introduces new datasets for benchmarking few-shot clinical information extraction. On the clinical extraction tasks studied, GPT-3 systems significantly outperform existing zero- and few-shot baselines.

