The Triple-R dataset enhances fact verification by leveraging external evidence and generating human-readable explanations. Built on the LIAR dataset, Triple-R uses a three-component pipeline: Retriever, Ranker, and Reasoner. The Retriever gathers evidence from the web, the Ranker scores and selects the most relevant paragraphs, and the Reasoner uses GPT-3.5-Turbo to generate a reason for each claim.
The Triple-R dataset was constructed by applying this methodology to the original LIAR dataset. For each claim, the top web-retrieved documents were processed and the paragraphs providing the most relevant evidence were selected. The Reasoner component, powered by GPT-3.5-Turbo, then generated an explanation based on this evidence.
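The sketch below illustrates one way the three components could fit together in code. It is a minimal illustration, not the authors' implementation: the web-search step is left as a placeholder, the TF-IDF ranking is a simple stand-in for the actual Ranker, and the prompt wording sent to GPT-3.5-Turbo is assumed.

```python
# Minimal sketch of the Retriever-Ranker-Reasoner flow (illustrative only).
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def retrieve(claim: str) -> list[str]:
    """Retriever: fetch top web documents for the claim (placeholder)."""
    raise NotImplementedError("plug in a web-search client here")


def rank(claim: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Ranker: keep the paragraphs most similar to the claim (TF-IDF stand-in)."""
    paragraphs = [p for doc in documents for p in doc.split("\n\n") if p.strip()]
    vectors = TfidfVectorizer().fit_transform([claim] + paragraphs)
    scores = cosine_similarity(vectors[0], vectors[1:]).ravel()
    return [paragraphs[i] for i in scores.argsort()[::-1][:top_k]]


def reason(claim: str, evidence: list[str]) -> str:
    """Reasoner: ask GPT-3.5-Turbo for an explanation grounded in the evidence."""
    prompt = (
        f"Claim: {claim}\n\nEvidence:\n" + "\n".join(evidence)
        + "\n\nExplain whether the evidence supports or refutes the claim."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```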
The Triple-R dataset can be used to train models for misinformation detection and explainable AI systems, making it ideal for applications requiring transparency in decision-making.
Each sample in the dataset contains the following columns:
| Column | Description |
|---|---|
| id | A unique identifier for each sample. |
| statement | The claim or statement to be verified. |
| label | The truthfulness label of the claim (true, false, etc.). |
| evidence | Relevant information retrieved from the web. |
| reason | Generated explanation based on the evidence (training set only). |
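A quick way to inspect these columns is shown below, assuming the data is distributed as CSV files; the file name is a placeholder and should be adjusted to match the actual release.

```python
# Quick look at the dataset columns with pandas.
import pandas as pd

train = pd.read_csv("triple_r_train.csv")  # placeholder file name
print(train.columns.tolist())  # ['id', 'statement', 'label', 'evidence', 'reason']

sample = train.iloc[0]
print(sample["statement"])  # the claim to verify
print(sample["label"])      # its truthfulness label
print(sample["evidence"])   # web-retrieved evidence paragraphs
print(sample["reason"])     # GPT-3.5-Turbo explanation (training split only)
```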
The Triple-R dataset is publicly available for researchers. If you use our dataset, you must cite the related research paper below, which covers important details about its usage and application.
The authors would like to thank the Canadian Institute for Cybersecurity for its financial and educational support.
Mohammadamin Kanaani, Sajjad Dadkhah, and Ali A. Ghorbani. 2024. Triple-R: Automatic Reasoning for Fact Verification Using Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16831–16840, Torino, Italia. ELRA and ICCL.