Home > Publications Database > Transcription of Ottoman Documents using Transformer Based Models | Osmanlica Dok manlarin D n st r c Tabanli Modeller ile Transkripsiyonu |
Contribution to a conference proceedings/Contribution to a book | DZNE-2025-01114 |
; ; ;
2025
IEEE
This record in other databases:
Please use a persistent id in citations: doi:10.1109/SIU66497.2025.11112382
Abstract: Although access to a large number of Ottoman documents has become easier today, the Arabic-Persian-based Ottoman script remains a barrier for interested users in utilizing these documents. To address this challenge, there is a need for automatic transcription systems. While some deep learning-based commercial and academic models exist for Ottoman transcription, no studies have yet explored models based on transformer architectures. This paper introduces an Ottoman transcription system developed using TrOCR, a transformer-based model. Instead of the commonly used two-step approach in the literature, a model was designed to perform both optical character recognition and transcription into Turkish in one step. Additionally, the decoder responsible for language modeling was initialized with a BERT-based model trained on Turkish data, achieving results comparable to the original model. During testing, this model produced outputs more quickly due to improved tokenization performance.
![]() |
The record appears in these collections: |