TY - CONF
AU - Şen, Mehmet Umut
AU - Bilecen, Ali
AU - Bilgin Taşdemir, Esma Fatıma
AU - Yanıkoğlu, Berrin
TI - Transcription of Ottoman Documents using Transformer Based Models - Osmanlıca Dokümanların Dönüştürücü Tabanlı Modeller ile Transkripsiyonu
PB - IEEE
M1 - DZNE-2025-01114
SP - 1
EP - 4
PY - 2025
AB - Although access to a large number of Ottoman documents has become easier today, the Arabic-Persian-based Ottoman script remains a barrier for interested users in utilizing these documents. To address this challenge, there is a need for automatic transcription systems. While some deep learning-based commercial and academic models exist for Ottoman transcription, no studies have yet explored models based on transformer architectures. This paper introduces an Ottoman transcription system developed using TrOCR, a transformer-based model. Instead of the commonly used two-step approach in the literature, a model was designed to perform both optical character recognition and transcription into Turkish in one step. Additionally, the decoder responsible for language modeling was initialized with a BERT-based model trained on Turkish data, achieving results comparable to the original model. During testing, this model produced outputs more quickly due to improved tokenization performance.
T2 - 33rd Signal Processing and Communications Applications Conference
CY - 25 Jun 2025 - 28 Jun 2025, Sile (Istanbul)
Y2 - 25 Jun 2025 - 28 Jun 2025
M2 - Sile, Istanbul
LB - PUB:(DE-HGF)8 ; PUB:(DE-HGF)7
DO - 10.1109/SIU66497.2025.11112382
UR - https://pub.dzne.de/record/281367
ER -