Transparent Practices: OCR and AI in the Archives
Journal article   Open access   Peer reviewed

Rebecca Hastings and Andrew Weymouth
Collections (Walnut Creek, Calif.), Vol.22(2), pp.130-152
06/01/2026

Abstract

Keywords: optical character recognition; archival ethics; digital preservation; artificial intelligence; accessibility; digital stewardship; sustainable digital practices; computer vision
This paper examines optical character recognition (OCR) through the lens of archival ethics as outlined in the Society of American Archivists (SAA) Core Values Statement and Code of Ethics, in light of current debates surrounding artificial intelligence (AI). A literature review highlights persistent challenges of authenticity and integrity, transparency and accountability, access and equity, and responsible stewardship and sustainability, as well as new concerns about bias, sustainability, and accountability raised by the use of large language models (LLMs). A case study describes systematic testing of LLM, transformer model (TM), and neural network (NN) architectures and examines the challenges of creating a reliable, scalable in-house OCR tool named Opticolumn. The case study finds that NN approaches align better with archival ethics than LLM tools, which may generate fabrications, but that the choice of OCR tool will depend on the capacities and preferences of individual institutions.
