%0 Conference Paper
%A Şen, Mehmet Umut
%A Bilecen, Ali
%A Bilgin Taşdemir, Esma Fatıma
%A Yanıkoğlu, Berrin
%T Transcription of Ottoman Documents using Transformer Based Models - Osmanlıca Dokümanların Dönüştürücü Tabanlı Modeller ile Transkripsiyonu
%I IEEE
%M DZNE-2025-01114
%P 1 - 4
%D 2025
%< 2025 33rd Signal Processing and Communications Applications Conference (SIU) : [Proceedings] - IEEE, 2025. - ISBN 979-8-3315-6655-5 - doi:10.1109/SIU66497.2025.11112382
%X Although access to a large number of Ottoman documents has become easier today, the Arabic-Persian-based Ottoman script remains a barrier for interested users in utilizing these documents. To address this challenge, there is a need for automatic transcription systems. While some deep learning-based commercial and academic models exist for Ottoman transcription, no studies have yet explored models based on transformer architectures. This paper introduces an Ottoman transcription system developed using TrOCR, a transformer-based model. Instead of the commonly used two-step approach in the literature, a model was designed to perform both optical character recognition and transcription into Turkish in one step. Additionally, the decoder responsible for language modeling was initialized with a BERT-based model trained on Turkish data, achieving results comparable to the original model. During testing, this model produced outputs more quickly due to improved tokenization performance.
%B 33rd Signal Processing and Communications Applications Conference
%C 25 Jun 2025 - 28 Jun 2025, Sile (Istanbul)
%F PUB:(DE-HGF)8 ; PUB:(DE-HGF)7
%9 Contribution to a conference proceedings ; Contribution to a book
%R 10.1109/SIU66497.2025.11112382
%U https://pub.dzne.de/record/281367