A Trust-Aware Recommender System for Open Data: Integrating NLP and LLM-Powered Social Trust

Chenhao Li

The exponential growth of open data and academic publications has created a fundamental challenge for researchers and data consumers: identifying not merely relevant resources, but trustworthy ones. Traditional recommender systems rely primarily on content-based similarity or collaborative filtering, failing to model the multi-dimensional trust that practitioners naturally employ when evaluating information sources. This dissertation addresses this gap by developing and evaluating a trust-aware recommender framework for open data that integrates natural language processing, machine learning, and large language model (LLM)-based reasoning.

The research unfolds in two complementary implementations. The first introduces a three-layer trustworthy article ranking system that combines BERT-based semantic embeddings for relevance matching, a Random Forest classifier trained on retraction signals (achieving 90% accuracy and 97% recall for retracted articles), and a custom multi-factor scoring function that balances relevance with trust indicators derived from citation count, Altmetric score, and journal impact factor. Evaluated on 16,052 articles spanning five research domains, citation count emerges as the dominant trust signal (53.26% feature importance), and the system outperforms conventional ranking approaches in surfacing both relevant and reliable literature.

The second implementation extends the trust model into the social dimension by building a trust-based recommender on the Model Context Protocol (MCP). This system orchestrates three data sources (MongoDB for social interaction storage, CrossRef for bibliographic metadata, and Semantic Scholar for citation metrics) and delegates trust and field-similarity judgments to GPT-4. A critical finding emerges from two-phase validation: LLMs without explicit scoring rubrics produce near-random trust judgments (Pearson r = −0.087), while structured prompting with rubric guidelines achieves 85.7% accuracy in trust assessment. A dual-threshold recommendation filter and a weighted combination score (60% trust, 40% field similarity) yield reliable personalized recommendations across synthetic academic collaboration networks of 50 researchers in six fields.

Together, these contributions establish a coherent, extensible framework for trust-aware open data recommendation, demonstrating that citation-based trust signals and socially derived trust are complementary dimensions that, when properly engineered, can substantially improve the quality and reliability of information discovery systems.
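
As an illustration of the weighted combination score and dual-threshold filter described in the abstract, the following is a minimal sketch and not the dissertation's implementation. The 60/40 weights come from the abstract. Everything else is assumed for illustration: the threshold values, the reading of "dual-threshold" as one cutoff on trust plus one on field similarity, scores normalized to the range 0 to 1, and all names (Candidate, combined_score, recommend).

```python
from dataclasses import dataclass

# Weights stated in the abstract: 60% trust, 40% field similarity.
TRUST_WEIGHT = 0.6
SIMILARITY_WEIGHT = 0.4

# Hypothetical cutoffs: the abstract mentions a dual-threshold filter
# but does not give the threshold values or say what they apply to.
MIN_TRUST = 0.5
MIN_SIMILARITY = 0.5


@dataclass
class Candidate:
    """A candidate researcher with scores assumed to lie in [0, 1]."""
    name: str
    trust: float
    field_similarity: float


def combined_score(c: Candidate) -> float:
    """Weighted combination of trust and field similarity."""
    return TRUST_WEIGHT * c.trust + SIMILARITY_WEIGHT * c.field_similarity


def recommend(candidates: list[Candidate]) -> list[Candidate]:
    """Keep candidates that pass both cutoffs, ranked by combined score."""
    kept = [
        c for c in candidates
        if c.trust >= MIN_TRUST and c.field_similarity >= MIN_SIMILARITY
    ]
    return sorted(kept, key=combined_score, reverse=True)


if __name__ == "__main__":
    pool = [
        Candidate("a", trust=0.9, field_similarity=0.6),
        Candidate("b", trust=0.6, field_similarity=0.9),
        Candidate("c", trust=0.95, field_similarity=0.3),  # fails similarity cutoff
        Candidate("d", trust=0.4, field_similarity=0.9),   # fails trust cutoff
    ]
    for c in recommend(pool):
        print(c.name, round(combined_score(c), 2))
```

Under these assumptions, filtering before ranking means a candidate with very high trust but little field overlap is excluded outright, and the 60/40 weighting only orders the candidates that pass both cutoffs.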

