Abstract
Causal discovery from free text is a complex task in natural language processing (NLP). Extracting and understanding causal relationships from text is essential for many applications, including risk assessment, medical research, and automated decision-making. Causal relationship extraction and discovery enable systems to interpret the reasons behind events and predict potential outcomes, but achieving accuracy across diverse text sources remains challenging. Variations in language across domains, such as scientific, industrial, and medical texts, require models that can adapt to a wide range of causal expressions, both explicit and implicit. While deep learning models have proven effective at capturing causal dependencies, traditional methods that rely on token-level accuracy often miss the contextually nuanced relationships that are crucial for cross-domain application. To address these challenges, this thesis presents a novel approach to cross-domain causal extraction that fine-tunes a T5 model with a custom loss function. The custom loss combines token-level cross-entropy with a sequence-level ROUGE-based loss, weighted by an alpha parameter that was empirically tuned to 0.5 to balance precision and contextual relevance. The approach was evaluated on three datasets: Licensee Event Reports (LER), Neuropathic Pain, and Tubingen. It substantially improves causal extraction accuracy, achieving the highest F1 score (1.0) on the Neuropathic Pain dataset while consistently outperforming baseline models across all datasets. These findings underscore the value of a flexible loss function for adapting to diverse causal structures, offering a scalable framework for accurate causal inference in fields ranging from industrial event reporting to healthcare data analysis.
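To illustrate how a weighted token-level and sequence-level objective of this kind might be assembled, the sketch below assumes a convex combination L = alpha * L_CE + (1 - alpha) * (1 - ROUGE-L F1). Here the ROUGE-based term is taken to be one minus the ROUGE-L F1 score between a decoded output and its reference. The abstract does not specify the exact form of the loss, the ROUGE variant, or the tokenization, so these choices and the helper names (`lcs_length`, `rouge_l_f1`, `combined_loss`) are hypothetical. ROUGE computed on decoded strings is also not differentiable as written. The sketch only shows how the two terms are weighted, not how gradients are propagated through the sequence-level term.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b):
            if x == y:
                curr.append(prev[j] + 1)                # extend diagonal match
            else:
                curr.append(max(prev[j + 1], curr[j]))  # best of up / left
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 on lowercased whitespace tokens (simplified, hypothetical)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def combined_loss(ce_loss, candidate, reference, alpha=0.5):
    """Hypothetical weighted loss: alpha * CE + (1 - alpha) * (1 - ROUGE-L F1).

    ce_loss is assumed to be an already computed token-level cross-entropy
    value; the ROUGE term here is a plain float and carries no gradient.
    """
    rouge_loss = 1.0 - rouge_l_f1(candidate, reference)
    return alpha * ce_loss + (1.0 - alpha) * rouge_loss


loss = combined_loss(
    0.8,
    "valve failure caused reactor trip",
    "valve failure caused the reactor trip",
)
print(round(loss, 4))
```

With alpha = 0.5, the two terms contribute equally. Setting alpha = 1.0 recovers plain cross-entropy, and smaller values shift the weight toward sequence-level overlap with the reference.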