This dissertation investigates the cost-effectiveness of large language models (LLMs) for fake news detection, within the scope of the interdisciplinary project “Artificial Intelligence for Early Detection of Fake News.” Initially designed to analyze real-time data from Twitter (now X), the project was redirected to pre-existing datasets after changes to the platform's data access policy. The study evaluates several LLMs available on AWS across multiple fake news datasets, examining their performance under different prompting strategies and fine-tuning techniques.
Three key research questions guide this work: (i) Can LLMs reliably identify fake news? (ii) How does their performance compare to traditional state-of-the-art methods? (iii) What is the trade-off between model size, accuracy, and operational cost? To support this evaluation, a novel metric, the PoC-score (Performance over Cost), is proposed; it quantifies each model's efficiency by relating its F1-score to its hourly cost.
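A minimal reading of this definition, assuming the metric is the plain ratio of the two quantities named above (an assumption for illustration, not necessarily the exact formulation used in the dissertation), is

\[
\text{PoC-score} = \frac{F_1}{C_{\text{hour}}},
\]

where \(F_1\) is the model's F1-score on the evaluation set and \(C_{\text{hour}}\) its inference cost per hour (for instance, the hourly price of the AWS instance or endpoint serving the model). Under this reading, higher values indicate models that deliver more predictive quality per unit of operational expense.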
Results show that LLMs perform well on datasets with richer and more structured content, such as COVID-related news, but struggle on more ambiguous or noisy datasets such as LIAR or PolitiFact. While larger models tend to achieve slightly higher accuracy, their cost escalates disproportionately, making smaller fine-tuned models, such as Gemma-7B, more attractive for real-world deployment. Ultimately, the findings suggest that LLMs are promising tools for fake news detection, but their adoption should take into account domain characteristics, computational constraints, and application goals.