Transcription of Ottoman Documents using Transformer Based Models | Osmanlıca Dokümanların Dönüştürücü Tabanlı Modeller ile Transkripsiyonu

Şen, Mehmet Umut; Bilecen, Ali; Bilgin Taşdemir, Esma Fatıma; Yanıkoğlu, Berrin

doi:10.1109/SIU66497.2025.11112382

TY  - CONF
AU  - Şen, Mehmet Umut
AU  - Bilecen, Ali
AU  - Bilgin Taşdemir, Esma Fatıma
AU  - Yanıkoğlu, Berrin
TI  - Transcription of Ottoman Documents using Transformer Based Models  -  Osmanlıca Dokümanların Dönüştürücü Tabanlı Modeller ile Transkripsiyonu
PB  - IEEE
M1  - DZNE-2025-01114
SP  - 1 - 4
PY  - 2025
AB  - Although access to a large number of Ottoman documents has become easier, the Arabic-Persian-based Ottoman script remains a barrier for users who want to work with these documents. Automatic transcription systems are needed to address this challenge. While some deep learning-based commercial and academic models exist for Ottoman transcription, no studies have yet explored models based on transformer architectures. This paper introduces an Ottoman transcription system developed using TrOCR, a transformer-based model. Instead of the two-step approach common in the literature, the model performs both optical character recognition and transcription into Turkish in one step. Additionally, the decoder responsible for language modeling was initialized with a BERT-based model trained on Turkish data, achieving results comparable to the original model. During testing, this model produced outputs more quickly due to improved tokenization performance.
T2  - 33rd Signal Processing and Communications Applications Conference
CY  - 25 Jun 2025 - 28 Jun 2025, Sile (Istanbul)
Y2  - 25 Jun 2025 - 28 Jun 2025
M2  - Sile, Istanbul
LB  - PUB:(DE-HGF)8 ; PUB:(DE-HGF)7
DO  - DOI:10.1109/SIU66497.2025.11112382
UR  - https://pub.dzne.de/record/281367
ER  -
