modern and archaic German. Pilz et al. considered spelling variations in English and German historical texts [23]. In general, the main challenge for historical European languages such as Dutch, English and German is spelling variation.

Furthermore, Kimura and Maeda proposed a retrieval method that considers not only language differences over time but also cultural and temporal differences between modern and archaic Japanese [24]. Tripathi developed a retrieval system that considers the differences among the various Brahmic (Indic) scripts and writing systems and proposed a method to retrieve Sanskrit documents written in Sanskrit script or Brahmic families' scripts, such as Devanagari, Kannada, Telugu and Bengali [25]. To cope with cross-chronological and cross-script Mongolian documents, Khaltarkhuu and Maeda proposed a retrieval technique that is capable of searching traditional Mongolian script documents using a modern Mongolian query [26–28].

We improved Khaltarkhuu and Maeda's grammatical-rule-based approach [26–28] and proposed an 'ancient-to-modern information retrieval' method [7, 29] by adding a dictionary-based query translation technique. The technique accounts for cross-chronological differences between the writing systems of ancient and modern Mongolian, so that cross-chronological and cross-script ancient Mongolian documents can be accessed with a query in modern Mongolian written in Cyrillic. To boost the quality of the translation, the 'ancient-to-modern information retrieval' approach [7, 29] first matches query terms to words in a dictionary. If no exact match is found, the grammatical-rule-based approach [26–28] is used. In other words, the grammatical-rule-based query translation approach is used for inflected words, words with ancient spellings or grammar, and words missing from the dictionary. For word sense disambiguation, when a word has multiple candidates, we choose the most frequent one. In our approach, we merge spelling variants of ancient Mongolian words.

We have already integrated the 'ancient-to-modern information retrieval' method into the TMSDL, and it can be easily applied to our digital edition for accessing ancient Mongolian historical collections written in traditional Mongolian script.

*3.3.2. Applying the proposed approach to other languages*

We have been demonstrating a facility for cross-language searching between English and Japanese that enables English-speaking users to search Ukiyo-e databases available in Japanese by using English queries [30–32]. Such a feature is very useful, since the Ukiyo-e databases in Japanese institutions are mostly available in Japanese, so users who do not understand Japanese may not find the desired information. Ukiyo-e, a traditional Japanese form of woodblock printing, is known worldwide as one of the fine arts of the Edo period (1603–1868). The texts of Ukiyo-e databases contain archaic Japanese words which reflect the Japanese language of the Edo period.

Like the 'ancient-to-modern information retrieval' method, a dictionary-based query translation approach is adopted by utilizing a domain-specific dictionary, which contains terms related to Japanese arts and culture. The proposed feature works well with a variety of keywords (i.e., not full sentences) that may include personal names and specific terms such as 'Geisha', traditional Japanese female entertainers; 'Fuji', Mount Fuji, the highest mountain in

156 Multilingualism and Bilingualism

**4. Summary and future directions**

In this chapter, we have described our research to achieve cross-lingual and cross-chronological information access to ancient Mongolian historical materials. More specifically, we have introduced methods for providing information access that cuts across different historical periods and dialects.

In Section 3, we introduced an information extraction method for digitized ancient Mongolian historical manuscripts of the 13th–16th centuries. The proposed method performs large-scale computerized analysis of Mongolian historical documents and can significantly reduce the traditional, labour-intensive manual analysis of Mongolian historical text. Named entities that are difficult to identify by manual examination, such as the historical figures and places of ancient Mongolia, are recognized in the historical manuscripts.

The extracted results are utilized for building a digital edition of an ancient Mongolian historical document, which is made available through a web-based system.<sup>1</sup> We also believe that the TEI-encoded digital edition, which reflects the ancient Mongolian manuscripts, will help scholars of ancient history uncover knowledge about the Middle Ages of Mongolia that is hidden in ancient Mongolian historical documents and is not available in modern-language documents. Furthermore, explicitly encoded digital text enables users to search and browse ancient Mongolian manuscripts using the visualization of named entities; that is, it allows not only retrieving information but also analysing and visualizing its contents. We also hope that digital editions, together with the scanned images, will recreate the experience of encountering the original manuscripts. The information visualization feature for ancient Mongolian texts, together with the TMSDL feature that retrieves ancient manuscripts written in traditional Mongolian script using a query in modern Mongolian (Cyrillic), should help researchers who wish to use digital representations of ancient historical manuscripts as scholarly tools while working in a modern language. Such a feature is very useful, since the needs of humanities researchers are diverse and may require access to information in ancient languages, rather than searching and browsing limited collections in modern languages. Indeed, ancient Mongolian documents are mostly available in ancient scripts and dialects, so users who do not understand ancient Mongolian may not find the desired information.
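The query translation strategy behind this retrieval feature (dictionary lookup first, grammatical rules as a fallback, the most frequent candidate for ambiguous words, and merged spelling variants) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the TMSDL implementation: the dictionary entries, the variant table, the token sample, the function names and the character-level `TOY_RULES` mapping are all hypothetical stand-ins, and the actual grammatical rules [26–28] are far richer than a character mapping.

```python
from collections import Counter

# All data below is hypothetical and only illustrates the control flow.
# Modern (Cyrillic) query term -> candidate ancient forms (toy entries).
DICTIONARY = {
    "хаан": ["qaγan", "qagan", "qan"],
    "улс": ["ulus"],
}
# Spelling variant -> canonical form, so that variants are merged.
VARIANTS = {"qagan": "qaγan"}
# Toy stand-in for the grammatical-rule-based conversion.
TOY_RULES = {"а": "a", "л": "l", "н": "n", "с": "s", "у": "u", "х": "q"}


def normalize(form: str) -> str:
    """Merge a spelling variant into its canonical form."""
    return VARIANTS.get(form, form)


def build_frequencies(tokens: list[str]) -> Counter:
    """Count ancient word forms after merging spelling variants."""
    return Counter(normalize(token) for token in tokens)


def rule_based_translate(term: str) -> str:
    """Placeholder fallback: map each character with TOY_RULES."""
    return "".join(TOY_RULES.get(ch, ch) for ch in term)


def translate_term(term: str, freq: Counter) -> str:
    """Translate one query term: dictionary first, rules as fallback."""
    candidates = DICTIONARY.get(term)
    if candidates:
        # Merge variants, then pick the most frequent candidate.
        # Sorting first makes tie-breaking deterministic.
        merged = sorted({normalize(c) for c in candidates})
        return max(merged, key=lambda c: freq[c])
    # No exact dictionary match: fall back to the rule-based conversion.
    return normalize(rule_based_translate(term))


def translate_query(query: str, freq: Counter) -> list[str]:
    """Translate a whitespace-separated keyword query term by term."""
    return [translate_term(term, freq) for term in query.split()]


if __name__ == "__main__":
    corpus_tokens = ["qaγan", "qagan", "qan", "ulus", "qaγan"]
    freq = build_frequencies(corpus_tokens)
    print(translate_query("хаан улс", freq))  # ['qaγan', 'ulus']
```

In this sketch, the frequency table is built after variant merging, so a word spelled in several ways is counted once under its canonical form before the most frequent candidate is chosen. A term that is absent from the dictionary goes through `rule_based_translate` instead.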

<sup>1</sup> http://www.dl.is.ritsumei.ac.jp/AltanTovch/

Finally, the proposed prototype could be applied to other documents in the Todo, Manchu and Sibe scripts, which are derived from the traditional Mongolian script. The systems introduced in this chapter are targeted primarily at researchers in the humanities. Nevertheless, these systems are expected to be useful to users other than researchers, in the sense that they open up new possibilities for acquiring the kinds of information that cannot be found solely in modern documents available on the web.

Cross-Lingual and Cross-Chronological Information Access to Multilingual Historical Documents

http://dx.doi.org/10.5772/intechopen.72421

on Association for Computational Linguistics (ACL '05); June 25-30, 2005; Ann Arbor, Michigan. Stroudsburg, PA, USA: Association for Computational Linguistics; 2005. pp. 363-370. DOI: 10.3115/1219840.1219885

[9] Choimaa Sh. Qad-un űndűsűn quriyangγui altan tobči (Textological Study). vol. 1 ed. Ulaanbaatar: Urlakh Erdem Khevleliin Gazar: Ulaanbaatar: Centre for Mongol Studies, National University of Mongolia; 2002. 276 pp

[10] Ramshaw LA, Marcus MP. Text chunking using transformation-based learning. In: Armstrong S, Church K, Isabelle P, Manzi S, Tzoukermann E, Yarowsky D, editors. Natural Language Processing Using Very Large Corpora. Dordrecht: Springer Netherlands; 1999. pp. 157-176. DOI: 10.1007/978-94-017-2390-9\_10

[11] Asahara M, Matsumoto Y. Japanese named entity extraction with redundant morphological analysis. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1; May 27-June 01, 2003; Edmonton, Canada. Stroudsburg, PA, USA: Association for Computational Linguistics; 2003. pp. 8-15. DOI: 10.3115/1073445.1073447

[12] Yoshimura M, Kimura F, Maeda A. Personal name extraction from ancient Japanese texts. In: Proceedings of the Exploration, Navigation and Retrieval of Information in Cultural Heritage ENRICH 2013 Workshop; 1 August 2013; Dublin, Ireland. New York, NY, USA: ACM; 2013. pp. 31-34

[13] Chinggaltai, editor. A grammar of the Mongol language. Revised ed. New York: Frederick Ungar Publishing Co; 1963. 173 pp

[14] Batjargal B, Khaltarkhuu G, Kimura F, Maeda A. An approach to named entity extraction from Mongolian Historical Documents. In: Proceedings of the 2015 International Conference on Culture and Computing; 17-19 October; Kyoto, Japan. IEEE Computer Society; 2015. pp. 205-206. DOI: 10.1109/Culture.and.Computing.2015.41

[15] Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ. LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research. 2008;**9**:1871-1874

[16] Del Turco RR, Buomprisco G, Pietro CD, Kenny J, Masotti R, Pugliese J. Edition visualization technology: A simple tool to visualize TEI-based digital editions. Journal of the Text Encoding Initiative [Online]. 2014;**8**. DOI: 10.4000/jtei.1077

[17] Batjargal B, Khaltarkhuu G, Kimura F, Maeda A. Applying text encoding initiative guidelines to a historical record in traditional Mongolian script. In: Proceedings of the 2013 International Conference on Culture and Computing; 16-18 September; Kyoto, Japan. IEEE Computer Society; 2013. pp. 141-142. DOI: 10.1109/CultureComputing.2013.36

[18] Soualah MO, Hassoun MA. TEI P5 manuscript description adaptation for cataloguing digitized Arabic manuscripts. Journal of the Text Encoding Initiative [Online]. 2012;**2**. DOI: 10.4000/jtei.398

[19] Ernst-Gerlach A, Fuhr N. Retrieval in text collections with historic spelling using linguistic and spelling variants. In: Rasmussen E, editor. Proceedings of the 7th ACM/IEEE-CS
