Triple-R Dataset 2024 | Datasets | Research | Canadian Institute for Cybersecurity | UNB

Canadian Institute for Cybersecurity

Triple-R: Automatic reasoning for fact verification

The Triple-R dataset enhances fact verification by leveraging external evidence and generating human-readable explanations. Built on the LIAR dataset, Triple-R uses a three-component system: Retriever, Ranker and Reasoner. The Retriever gathers evidence from the web, the Ranker scores and selects the most relevant paragraphs and the Reasoner utilizes GPT-3.5-Turbo to generate reasons for the claims.

The Triple-R dataset was constructed by applying the Triple-R methodology to the original LIAR dataset. For each claim, the top web-retrieved documents were processed and the paragraphs that provide relevant evidence were selected. The Reasoner component, powered by GPT-3.5-Turbo, then generated explanations based on this evidence.
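The construction steps described above can be pictured with a minimal sketch. It is not the authors' implementation: the `retrieve` and `reason` callables are hypothetical stand-ins for web retrieval and GPT-3.5-Turbo, the Jaccard word-overlap score is a simple substitute for the actual Ranker, and storing evidence as a list of paragraphs is an assumption.

```python
import re
from typing import Callable, Dict, List


def tokenize(text: str) -> set:
    """Lowercase word tokens; a deliberately simple stand-in for real ranking features."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_paragraphs(claim: str, paragraphs: List[str], top_k: int = 2) -> List[str]:
    """Score paragraphs by Jaccard overlap with the claim and keep the top_k.

    Assumption: word overlap replaces the real Ranker, whose scoring is not shown here.
    """
    claim_tokens = tokenize(claim)

    def score(paragraph: str) -> float:
        tokens = tokenize(paragraph)
        union = claim_tokens | tokens
        return len(claim_tokens & tokens) / len(union) if union else 0.0

    return sorted(paragraphs, key=score, reverse=True)[:top_k]


def build_record(
    claim: str,
    label: str,
    retrieve: Callable[[str], List[str]],      # hypothetical: claim -> documents
    reason: Callable[[str, List[str]], str],   # hypothetical: (claim, evidence) -> explanation
    top_k: int = 2,
) -> Dict[str, object]:
    """Run the Retriever -> Ranker -> Reasoner stages for one claim."""
    documents = retrieve(claim)
    # Split each document into paragraphs on blank lines.
    paragraphs = [p.strip() for doc in documents for p in doc.split("\n\n") if p.strip()]
    evidence = rank_paragraphs(claim, paragraphs, top_k)
    return {
        "statement": claim,
        "label": label,
        "evidence": evidence,
        "reason": reason(claim, evidence),
    }
```

Passing the retrieval and reasoning steps in as functions keeps the sketch runnable offline; in a real pipeline they would wrap a search API and a language model call.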

The Triple-R dataset can be used to train models for misinformation detection and explainable AI systems, making it ideal for applications requiring transparency in decision-making.

In summary, the contributions of this study are:

  • Our proposed causal language model can determine the truthfulness of a claim, enabling us to understand how the model makes decisions. This leads to greater transparency and interpretability in the process of fact verification.
  • We use a larger language model to supervise a smaller one, improving our framework's accuracy and effectiveness.
  • We present a hybrid zero-shot ranker that retrieves supporting information to justify the claim. The gathered evidence serves as an explanation that reinforces the generated reasoning.
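As a rough illustration of the second contribution, where a larger model supervises a smaller one, each training record could be turned into a prompt and target pair whose target carries the label and the generated reason. The prompt template, the function name `to_training_pair` and the handling of evidence as either a string or a list are assumptions made for this sketch, not details taken from the paper.

```python
from typing import Dict, Tuple


def to_training_pair(sample: Dict[str, object]) -> Tuple[str, str]:
    """Format one training record as (prompt, target) text for a smaller causal LM.

    Assumption: the template below is illustrative only. The target reuses the
    reason produced by the larger model as the supervision signal.
    """
    evidence = sample["evidence"]
    # Evidence may be a single string or a list of paragraphs (format assumed).
    if isinstance(evidence, (list, tuple)):
        evidence = "\n".join(str(item) for item in evidence)
    prompt = (
        f"Claim: {sample['statement']}\n"
        f"Evidence:\n{evidence}\n"
        "Verdict and reason:"
    )
    target = f" {sample['label']}. {sample['reason']}"
    return prompt, target
```

Only records from the training file would qualify, since the test file carries no reason field.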

Dataset files

  1. Train.json: Includes 10,047 samples with statements, labels, evidence, and generated reasons.
  2. Test.json: Contains 1,283 samples with statements, labels, and evidence.
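A minimal loading sketch follows. The on-disk layout is an assumption: the function expects either a single JSON array of objects or one JSON object per line (JSON Lines), and raises an error for anything else. The function name `load_samples` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_samples(path: str) -> List[Dict[str, Any]]:
    """Load samples from a JSON array file, falling back to JSON Lines.

    Assumption: the layout of Train.json and Test.json is not documented here,
    so both common layouts are tried.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: try one object per non-empty line.
        data = [json.loads(line) for line in text.splitlines() if line.strip()]
    if not isinstance(data, list):
        raise ValueError(f"expected a list of samples, got {type(data).__name__}")
    return data
```

If the files match the description above, `len(load_samples("Train.json"))` should be 10,047 and `len(load_samples("Test.json"))` should be 1,283, which gives a quick sanity check after download.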

Feature columns

Column      Description
id          A unique identifier for each sample.
statement   The claim or statement to be verified.
label       The truthfulness of the claim (true, false, etc.).
evidence    Relevant information retrieved from the web.
reason      Generated explanation based on the evidence (in train set only).
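The columns above map naturally onto a small typed record. The sketch below is one possible representation, not part of the dataset's tooling: the `Sample` class and `parse_sample` helper are hypothetical names, the field types are assumptions, and `reason` is optional because it appears in the train set only.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Columns expected in every sample; "reason" exists in the train set only.
REQUIRED_COLUMNS = ("id", "statement", "label", "evidence")


@dataclass(frozen=True)
class Sample:
    id: Any                       # type of the identifier is an assumption
    statement: str
    label: str
    evidence: Any                 # string or list of paragraphs (format assumed)
    reason: Optional[str] = None  # None for test samples


def parse_sample(record: Dict[str, Any]) -> Sample:
    """Validate one raw record and convert it to a Sample."""
    missing = [name for name in REQUIRED_COLUMNS if name not in record]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return Sample(
        id=record["id"],
        statement=record["statement"],
        label=record["label"],
        evidence=record["evidence"],
        reason=record.get("reason"),
    )
```

Counting labels over parsed samples, for example with `collections.Counter(s.label for s in samples)`, is a simple way to inspect the class distribution before training.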

License

The Triple-R dataset is publicly available for researchers. If you use the dataset, you must cite our related research paper, which covers important details of its usage and application.

Acknowledgements

The authors would like to thank the Canadian Institute for Cybersecurity for its financial and educational support.

Using the dataset

To learn more about why this dataset was created, watch the video "Defending Democracy: Combatting Information Disorder" by Sajjad Dadkhah.

Citation

Mohammadamin Kanaani, Sajjad Dadkhah, and Ali A. Ghorbani. 2024. Triple-R: Automatic Reasoning for Fact Verification Using Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16831–16840, Torino, Italia. ELRA and ICCL.

Download the dataset