Abstract
The exponential growth of open data and academic publications has created a fundamental challenge for researchers and data consumers: identifying not merely relevant resources, but trustworthy ones. Traditional recommender systems rely primarily on content-based similarity or collaborative filtering and fail to model the multi-dimensional trust that practitioners naturally apply when evaluating information sources. This dissertation addresses this gap by developing and evaluating a trust-aware recommender framework for open data that integrates natural language processing, machine learning, and large language model (LLM)-based reasoning.
The research unfolds in two complementary implementations. The first introduces a three-layer trustworthy article ranking system that combines BERT-based semantic embeddings for relevance matching, a Random Forest classifier trained on retraction signals (achieving 90% accuracy and 97% recall for retracted articles), and a custom multi-factor scoring function that balances relevance against trust indicators derived from citation count, Altmetric score, and journal impact factor. In an evaluation on 16,052 articles spanning five research domains, citation count emerges as the dominant trust signal (53.26% feature importance), and the system outperforms conventional ranking approaches in surfacing literature that is both relevant and reliable.
The second implementation extends the trust model into the social dimension by building a trust-based recommender on the Model Context Protocol (MCP). This system orchestrates three data sources (MongoDB for social interaction storage, CrossRef for bibliographic metadata, and Semantic Scholar for citation metrics) and delegates trust and field-similarity judgments to GPT-4. A critical finding emerges from two-phase validation: without explicit scoring rubrics, the LLM produces near-random trust judgments (Pearson r = −0.087), whereas structured prompting with rubric guidelines achieves 85.7% accuracy in trust assessment. A dual-threshold recommendation filter and a weighted combination score (60% trust, 40% field similarity) yield reliable personalized recommendations across synthetic academic collaboration networks of 50 researchers in six fields.
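The final recommendation step of the second implementation can be sketched as follows. This is a minimal illustration, assuming the LLM's trust and field-similarity judgments have already been converted to scores in [0, 1]; the `Candidate` class, the functions `combined_score` and `recommend`, the reading of "dual-threshold" as one cut-off on trust and one on field similarity, and the values `trust_min=0.5` and `field_min=0.3` are all hypothetical. Only the 60/40 weighting comes from the text above.

```python
from dataclasses import dataclass

# Weighting stated in the abstract: 60% trust, 40% field similarity.
TRUST_WEIGHT = 0.6
FIELD_WEIGHT = 0.4


@dataclass
class Candidate:
    name: str
    trust: float      # LLM trust judgment, assumed normalized to [0, 1]
    field_sim: float  # LLM field-similarity judgment, assumed normalized to [0, 1]


def combined_score(c: Candidate) -> float:
    """Weighted combination of trust and field similarity."""
    return TRUST_WEIGHT * c.trust + FIELD_WEIGHT * c.field_sim


def recommend(candidates, trust_min=0.5, field_min=0.3, top_k=5):
    """Dual-threshold filter followed by ranking on the combined score.

    The threshold values are hypothetical placeholders.
    """
    # A candidate must clear BOTH cut-offs to be considered at all.
    eligible = [c for c in candidates if c.trust >= trust_min and c.field_sim >= field_min]
    # Rank the survivors by combined score, highest first.
    eligible.sort(key=combined_score, reverse=True)
    return eligible[:top_k]


if __name__ == "__main__":
    pool = [
        Candidate("A", trust=0.9, field_sim=0.4),
        Candidate("B", trust=0.6, field_sim=0.9),
        Candidate("C", trust=0.95, field_sim=0.1),
        Candidate("D", trust=0.3, field_sim=0.95),
    ]
    for c in recommend(pool):
        print(c.name, round(combined_score(c), 2))
```

With this pool, only B (0.72) and A (0.7) are recommended. C is highly trusted but works in an unrelated field, and D is close in field but poorly trusted, so the thresholds exclude both outright instead of merely ranking them lower.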
Together, these contributions establish a coherent, extensible framework for trust-aware open data recommendation. They demonstrate that citation-based trust signals and socially derived trust are complementary dimensions that, when properly engineered, can substantially improve the quality and reliability of information discovery systems.