1 Introduction
The wide proliferation of multilingual data on the Semantic Web results in many ontologies scattered across the web in various natural languages. According to the Linked Open Vocabularies (LOV)1, the majority of the ontologies in the Semantic Web are in English; however, ontologies in other Indo-European languages also exist. For instance, out of a total of 681 vocabularies found in LOV, 500 are in English, 54 in French, 39 in Spanish, and 33 in German. Few ontologies exist in non-Indo-European languages, such as 13 in Japanese and seven in Arabic. Monolingual ontologies with labels or local names presented in a certain language are not easily understandable to speakers of other languages. Therefore, in order to enhance semantic interoperability between monolingual ontologies, approaches for building multilingual ontologies from the existing monolingual ones should be developed [26]. Multilingual ontologies can be built by applying cross-lingual ontology enrichment techniques, which expand the target ontology with additional concepts and semantic relations extracted from external resources in other natural languages [23]. For example, suppose we have two ontologies: the Scientific Events Ontology (SEO) in English and the Conference ontology in German. SEO and Conference contain complementary information, i.e., SEO has some information which does not exist in Conference and vice versa. Let us consider a scenario where a user wants to use information from both SEO and Conference in an ontology-based application. This may not be possible without a cross-lingual ontology enrichment solution, which enriches the former with the complementary information in the latter. Manual ontology enrichment is a resource-demanding and time-consuming task. Therefore, fully automated cross-lingual ontology enrichment approaches are highly desired [23]. Most of the existing work on ontology enrichment focuses on enriching English ontologies from English sources only (monolingual enrichment) [23]. To the best of our knowledge, only our previous work [1, 14] has addressed the cross-lingual ontology enrichment problem by proposing a semi-automated approach to enrich ontologies from multilingual text or from other ontologies in different natural languages.
In this paper, we address the following research question: how can we automatically build multilingual ontologies from monolingual ones? We propose a fully automated ontology enrichment approach (OECM) to create multilingual ontologies from monolingual ones using cross-lingual matching. We extend our previous work [14] by: (1) using semantic similarity to select the best translation of class labels, (2) enriching the target ontology by adding new classes in addition to all their related subclasses in the hierarchy, (3) using ontologies in non-Indo-European languages (e.g., Arabic) as the source of information, (4) building multilingual ontologies, and (5) developing a fully automated approach. OECM comprises six phases: (1) translation: translate the class labels of the source ontology, (2) pre-processing: process the class labels of the target and the translated source ontologies, (3) terminological matching: identify potential matches between class labels of the source and the target ontologies, (4) triple retrieval: retrieve the new information to be added to the target ontology, (5) enrichment: enrich the target ontology with the new information extracted from the source ontology, and (6) validation: validate the enriched ontology. A noticeable feature of OECM is that we consider multiple translations for a class label. In addition, the use of semantic similarity has significantly improved the quality of the matching process. We present a use case for enriching the Scientific Events Ontology (SEO) [9], a scholarly communication ontology for describing scientific events, from German and Arabic ontologies. We compare OECM to five state-of-the-art approaches for the cross-lingual ontology matching task. OECM outperforms these approaches in terms of precision, recall, and F-measure. Furthermore, we evaluate the enriched ontology by comparing it against a gold standard created by ontology experts. The implementation of OECM and the datasets used in the use case are publicly available2.
The remainder of this paper is structured as follows: we present an overview of related work in Sect. 2. An overview of the proposed approach is given in Sect. 3. In order to illustrate possible applications of OECM, a use case is presented in Sect. 4. Experiments and evaluation results are presented in Sect. 5. Finally, we conclude and outline future work in Sect. 6.
2 Related Work
A recent review of the literature on the multilingual Web of Data found that the potential of the Semantic Web to be multilingual can be realized by techniques that build multilingual ontologies from monolingual ones [12]. Multilingual enrichment approaches are used to build multilingual ontologies from different resources in different natural languages [5, 6, 24]. Espinoza et al. [6] proposed an approach to generate multilingual ontologies by enriching existing monolingual ontologies with multilingual information in order to translate these ontologies to a particular language and culture (ontology localization). In fact, ontology enrichment depends on matching the target ontology with external resources in order to provide the target ontology with additional information extracted from these resources.
The existing literature has focused on cross-lingual ontology matching techniques, which are used for matching linguistic information expressed in different natural languages across ontologies [12, 26]. Meilicke et al. [20] created a benchmark dataset (MultiFarm) that results from the manual translation of a set of ontologies from the conference domain into eight natural languages. This dataset is widely used to evaluate cross-lingual matching approaches [7, 15, 16, 28]. Manual translation of ontologies can be infeasible when dealing with large and complex ontologies. Trojahn et al. [27] proposed a generic approach which relies on translating the concepts of the source ontologies, using machine translation techniques, into the language of the target ontology. In the translation step, they obtain one translation for each concept (one-to-one translation) and then apply monolingual matching approaches to match concepts between the source ontologies and the translated ones. Fu et al. [10, 11] proposed an approach to match English and Chinese ontologies by considering the semantics of the target ontology, the mapping intent, the operating domain, the time and resource constraints, and user feedback. Hertling and Paulheim [13] proposed an approach which utilizes Wikipedia's inter-language links for finding corresponding ontology elements. Lin and Krizhanovsky [18] proposed an approach which uses Wiktionary3 as a source of background knowledge to match English and French ontologies. Tigrine et al. [25] presented an approach which relies on the multilingual semantic network BabelNet4 as a source of background knowledge to match several ontologies in different natural languages. In the context of the OAEI 2018 campaign5 for evaluating ontology matching technologies, AML [7], KEPLER [16], LogMap [15] and XMap [28] provide high-quality alignments. These systems use terminological and structural alignments in addition to external lexicons, such as WordNet6 and the UMLS lexicon7, in order to obtain sets of synonyms for ontology elements. In order to deal with multilingualism, AML and KEPLER rely on one-to-one translation using machine translation technologies, such as Microsoft Translator, before starting the matching process. LogMap and XMap do not provide any information about the translation methodology they use. Moreover, LogMap is an iterative process that starts from initial mappings ('almost exact' lexical correspondences) to discover new mappings. It is mentioned in [15] that the main weakness of LogMap is that it cannot find matches between ontologies which do not provide enough lexical information, as it depends mainly on the initial mappings. A thorough survey of state-of-the-art approaches in cross-lingual ontology matching is provided in [26].
Most of the literature has focused on enriching monolingual ontologies with multilingual information in order to translate or localize these ontologies. In addition, in the cross-lingual ontology matching task, the lack of exact one-to-one translations between terms across different natural languages negatively affects the matching results. We address these limitations in our proposed approach by building multilingual ontologies, where a class label is represented in several natural languages, from monolingual ones. Such an approach supports the ontology matching process with multiple translations for a class label in order to enhance the matching results.
3 The Proposed Approach
Goal: Given two ontologies S and T, in two different natural languages (the source and the target language, respectively), represented as RDF triples over (C, R, L), where C is the set of ontology domain entities (i.e., classes), R is the set of relations, and L is the set of literals, we aim at finding the complementary information from S in order to enrich T.
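To make the setup concrete, the following Scala sketch expresses the goal with illustrative types; the names are not taken from OECM's actual code, and the complementary triples are assumed to have been produced by the phases described below rather than by a syntactic comparison.

```scala
// A minimal sketch (illustrative names, not OECM's code) of the problem setup:
// an ontology is a set of RDF triples over classes, relations and literals,
// tagged with the natural language of its labels.
final case class Triple(s: String, p: String, o: String)
final case class Ontology(lang: String, triples: Set[Triple])

// The enrichment goal: once the complementary triples of S have been identified
// (Sects. 3.1-3.4), they are added to T (Sect. 3.5).
def enrich(target: Ontology, newTriples: Set[Triple]): Ontology =
  target.copy(triples = target.triples ++ newTriples)
```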
Fig. 1. The workflow of OECM.
3.1 Translation
Let C_S and C_T be the sets of classes in S and T, respectively. Each class is represented by a label or a local name. The aim of this phase is to translate each class in C_S to the language of T. Google Translator8 is used to translate the classes of the source ontology. All available translations are considered for each class. Therefore, the output of the translation phase is C_S', in which each class of S is associated with the list of all its available translations. For example, the class Thema in German has a list of English translations (Subject and Topic), and an Arabic class label may have a list of English translations such as "Review", "Revision", and "Check". The best translation will be selected in the terminological matching phase (Subsect. 3.3).
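The following Scala sketch shows how the translation lists could be collected; `translateAll` is a stand-in for a machine translation client (e.g., a wrapper around the Google Translate API) and is an assumption, not part of OECM's published code.

```scala
// Sketch of the translation phase: each class label of the source ontology is
// mapped to *all* available translations in the target language, not just one.
def translateClasses(
    sourceClasses: Set[String],                     // e.g. Set("Thema", "Gutachter")
    targetLang: String,                             // e.g. "en"
    translateAll: (String, String) => List[String]  // hypothetical translation client
): Map[String, List[String]] =
  sourceClasses.map(label => label -> translateAll(label, targetLang)).toMap

// Example (values as reported in the text):
//   translateClasses(Set("Thema"), "en", client)
//   -> Map("Thema" -> List("Subject", "Topic"))
```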
3.2 Pre-processing
Fig. 2. Illustration of the terminological matching between the list of translations, in English, for every concept in C_S, in Arabic, and C_T, in English.
3.3 Terminological Matching
The aim of this phase is to identify potential matches between class labels of S and T. We perform a pairwise lexical and/or semantic similarity computation between the list of translations of each class in C_S' and the classes in C_T in order to select, for each class in S, the best translation that matches the corresponding class in T (see Algorithm 1). Jaccard similarity [22] is used to filter the identical concepts instead of using semantic similarity from the beginning, because no extra computation is needed to establish the semantic similarity between two identical classes. The reason behind choosing Jaccard similarity is that, according to the experiments conducted for the ontology alignment task on the MultiFarm benchmark in [2], Jaccard similarity achieved the best score in terms of precision. For non-identical concepts, we compute the semantic similarity using the path length measure based on WordNet, which returns the shortest path between two words in the WordNet hierarchy [3]. If two words are semantically equivalent, i.e., belonging to the same WordNet synset, the path distance is 1.00. We use a threshold θ in order to obtain the set of matched terms (matched classes) M; the value of θ that yields the best matching results was determined by running the experiments ten times. If no match is found, we consider the class as a new class that can be added to T, and we consider its list of translations as synonyms for that class. Generally, class labels have more than one word, for example "InvitedSpeaker"; therefore, the semantic similarity between sentences presented in [21] is adapted as described in Algorithm 1, line 9. Given two sentences sentence1 and sentence2, the semantic similarity of each sentence with respect to the other is defined as follows: for each word w1 in sentence1, the word w2 in sentence2 that has the highest path similarity with w1 is determined. The word similarities are then summed up and normalized by the number of similar words between the two sentences. Next, the same procedure is applied starting with the words in sentence2 to identify the semantic similarity of sentence2 with respect to sentence1. Finally, the resulting similarity scores are combined using a simple average. Based on the similarity results, the best translation is selected and C_S' is updated. For example, the Arabic class in Fig. 2 has a list of English translations such as "President", "Head", and "Chief". After computing the similarity between these translations and the classes in C_T, "President" has the highest similarityScore of 1.00 with the class "Chairman" in C_T, because they are semantically equivalent. Therefore, "President" is selected as the best translation for that class. The output of this phase is the list of matched terms M between C_T and the updated C_S'.
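The following Scala sketch illustrates the similarity computation described above. It assumes a word-level WordNet path similarity function `pathSim` is available (the WordNet access itself is not shown), the labels are already pre-processed into lower-cased tokens, and the names are illustrative rather than taken from Algorithm 1; the normalization is a simplification of the one in [21].

```scala
// Jaccard similarity over label tokens, used to accept identical concepts cheaply.
def jaccard(a: Set[String], b: Set[String]): Double =
  if (a.isEmpty && b.isEmpty) 1.0
  else a.intersect(b).size.toDouble / a.union(b).size

// Adapted sentence similarity: for each word of one label, take its best path
// similarity to the other label's words, normalize, then average both directions.
// Assumes non-empty token sequences.
def sentenceSim(s1: Seq[String], s2: Seq[String],
                pathSim: (String, String) => Double): Double = {
  def directed(from: Seq[String], to: Seq[String]): Double = {
    val best = from.map(w => to.map(pathSim(w, _)).max)
    best.sum / best.size
  }
  (directed(s1, s2) + directed(s2, s1)) / 2.0
}

// Best-translation selection for one source class: identical labels pass via
// Jaccard; otherwise the highest semantic similarity above the threshold theta wins.
def bestMatch(translations: List[Seq[String]], targetClasses: List[Seq[String]],
              theta: Double, pathSim: (String, String) => Double)
    : Option[(Seq[String], Seq[String], Double)] = {
  val scored = for {
    t <- translations
    c <- targetClasses
    score = if (jaccard(t.toSet, c.toSet) == 1.0) 1.0 else sentenceSim(t, c, pathSim)
  } yield (t, c, score)
  scored.filter(_._3 >= theta).sortBy(-_._3).headOption
}
```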
3.4 Triple Retrieval
The aim of this phase is to identify which new information can be added to T, and where. Each class in S is replaced by its best translation found in C_S' in the previous phase in order to obtain a translated ontology S' (see Algorithm 2). We design an iterative process in order to obtain the set of triples newTriples, which holds all possible multilingual information from S to be added to T. We initiate the iterative process with all matched terms (M) in order to get all related classes, if they exist. The iterative process has three steps: (1) for each class c, all triples tempTriples are retrieved from S' where c is a subject or an object, (2) a list newClasses of new classes is obtained from tempTriples, and (3) tempTriples is added to newTriples, which will be added to T. These three steps are repeated until no new classes can be found (newClasses.isEmpty() = true). Next, we retrieve all available information in the other language for each class in newTriples, such as ⟨president, rdfs:label, "…"@ar⟩. The output of this phase is newTriples, which contains all multilingual triples (i.e., in the source and target languages) to be added to T.
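A sketch of this iterative process using the Jena API (on which the implementation builds, cf. Sect. 5) is given below. Variable names follow the description above, but the code is illustrative rather than OECM's actual implementation, and the retrieval of the original-language labels is omitted.

```scala
import org.apache.jena.rdf.model.{Model, ModelFactory, Resource, Statement}
import scala.jdk.CollectionConverters._
import scala.collection.mutable

// Iterative triple retrieval over a Jena model of the translated source ontology S'.
// `matched` holds the resources of the matched classes M.
def retrieveTriples(translatedSource: Model, matched: Set[Resource]): Model = {
  val newTriples = ModelFactory.createDefaultModel()
  val seen = mutable.Set[Resource]() ++= matched
  var newClasses: Set[Resource] = matched

  while (newClasses.nonEmpty) {
    // (1) all triples of S' where a current class occurs as subject or object
    // (a full scan is kept for clarity; a selector could be used instead)
    val tempTriples: Seq[Statement] =
      translatedSource.listStatements().asScala.toSeq.filter { st =>
        newClasses.contains(st.getSubject) ||
          (st.getObject.isResource && newClasses.contains(st.getObject.asResource()))
      }
    // (2) classes mentioned in tempTriples that have not been visited yet
    val discovered = tempTriples.flatMap { st =>
      st.getSubject +:
        (if (st.getObject.isResource) Seq(st.getObject.asResource()) else Seq.empty)
    }.filterNot(seen.contains).toSet
    // (3) accumulate the retrieved triples; repeat until no new classes are found
    tempTriples.foreach(st => newTriples.add(st))
    seen ++= discovered
    newClasses = discovered
  }
  newTriples // in OECM, source-language rdfs:label triples are also retrieved here
}
```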
3.5 Enrichment

The aim of this phase is to enrich T using the triples in newTriples. By using OECM, the target ontology can be enriched from several ontologies in different natural languages sequentially, i.e., one-to-many enrichment. In this case, the enriched ontology can have more than two natural languages. For instance, an English T can be enriched from a German ontology, and the enriched ontology can then be enriched again from a different, Arabic ontology, i.e., the final result is presented in English, German, and Arabic. With the completion of this phase, we have successfully enriched T and created a multilingual ontology from monolingual ones.
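As a sketch (again using Jena, with an illustrative file name and serialization choice), the enrichment itself reduces to adding the retrieved triples to the target model and serializing the result, e.g., as Turtle as in Fig. 3:

```scala
import java.io.FileOutputStream
import org.apache.jena.rdf.model.Model

// Sketch of the enrichment step: add the multilingual triples to T and write the
// result out. Calling this again with triples retrieved from another source
// ontology yields the sequential one-to-many enrichment described above.
def enrichTarget(target: Model, newTriples: Model, outFile: String): Model = {
  target.add(newTriples)
  target.write(new FileOutputStream(outFile), "TURTLE")
  target
}
```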
3.6 Validation
Fig. 3. Small fragment from the SEO ontology after the enrichment. The newly added information is marked in bold.

Table 1. Use case: the sample output of each phase, from translation to triple retrieval.
| Phase | Output |
|---|---|
| Translation | (Thema); (Gutachter); (Herausgeber); (Fortschritte der Konferenz) |
| Pre-processing | SizeOrDuration; WorkshopProposals; InvitedSpeaker; In-useTrack |
| Terminological matching (score results) | (invited speaker, keynote speaker, 0.57); (person, person, 1.00); (tutorial, tutorial proposals, 0.78); (prize, award, 1.00); (conference document, license document, 0.61); (publisher, publisher, 1.00); (conference series, event series, 0.79); (conference series, symposium series, 0.75); (proceedings, proceedings, 1.00); (poster, posters track, 0.78) |
| Triple retrieval (iterative process) | Iteration 1: …; Iteration 2: … |
| Triple retrieval (…) | … |
4 Use Case: Enriching the Scientific Events Ontology
In this use case, we use an example scenario to enrich the SEO10 ontology (49 classes), in English, using the MultiFarm dataset (see Sect. 5). We use the Conference ontology (60 classes) and the ConfOf ontology (38 classes), in German and Arabic respectively, as source ontologies. This use case aims to show the whole process, starting from submitting the source and target ontologies until producing the enriched multilingual ontology. Here, the source ontology is the German ontology Conference and the target ontology is the English ontology SEO. The output is the enriched ontology SEO, which becomes a multilingual ontology in English and German. Table 1 demonstrates the enrichment process for SEO from Conference and shows a sample output of each phase, starting from the translation phase up to the produced set of triples which are used to enrich SEO. In the terminological matching task, the relevant matching results are identified by their similarity scores, shown in bold. The iterative process in the triple retrieval phase is initiated with the identified matched terms, for example, the person class. At the first iteration, six triples are produced (not all results are shown in the table due to limited space), such as ⟨…, rdfs:subClassOf, person⟩, where the matched term person is located at the object position. New classes are determined from the produced triples, such as conference contributor and committee member (in bold). At the second iteration, all triples that have these new classes as subject or object are retrieved; for example, for the committee member class, the triples ⟨committee member, rdf:type, Class⟩ and ⟨…, rdfs:subClassOf, committee member⟩ are retrieved. This process is repeated, and new classes are identified from the produced triples, such as chairman. The iterative process ends at the fifth iteration, where three triples are produced without any new classes. The output of this phase is newTriples, which contains 40 new triples (with 20 new classes and their German labels) to be added to SEO. Figure 3 shows a small fragment of the enriched ontology SEO, in Turtle, after completing the enrichment process. The resulting multilingual ontology contains a newly added class CommitteeMember with its English and German labels, a new relation rdfs:subClassOf between the two classes CommitteeMember and Chair, and new German labels, such as Herausgeber and Vorsitzender, for the classes Publisher and Chair, respectively. Similarly, SEO is enriched from the Arabic ontology ConfOf, where all classes with English labels in SEO are matched with class labels in ConfOf. The enriched SEO then contains 113 new triples with 37 new classes and their Arabic labels. The final output can be found at the OECM GitHub repository.
5 Evaluation
The aim of this evaluation is to measure the quality of the cross-lingual matching process in addition to the enrichment process. We use ontologies in MultiFarm benchmark11, a benchmark designed for evaluating cross-lingual ontology matching systems. MultiFarm consists of seven ontologies (Cmt, Conference, ConfOf, Edas, Ekaw, Iasted, Sigkdd) originally coming from the Conference benchmark of OAEI, their translation into nine languages (Chinese, Czech, Dutch, French, German, Portuguese, Russian, Spanish and Arabic), and the corresponding cross-lingual alignments between them.
Experimental Setup. All phases of OECM have been implemented using Scala and Apache Spark12. The SANSA-RDF library13 [17], together with the Apache Jena framework14, is used to parse and manipulate the input ontologies (as RDF triples). Stanford CoreNLP15 [19] is used to process the class labels. All experiments were carried out on the Ubuntu 16.04 LTS operating system with an Intel Core i7-4600U CPU @ 2.10 GHz x 4 and 10 GB of memory. In our experiments, we consider English ontologies as target ontologies to be enriched from German and Arabic ontologies.
Our evaluation has three tasks: (1) evaluating the effectiveness of the cross-lingual matching process in OECM compared to the reference alignment provided in the MultiFarm benchmark, (2) comparing OECM matching results with four state-of-the-art approaches, in addition to our previous work (OECM 1.0) [14], and (3) evaluating the quality of the enrichment process.
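For task (1), an alignment can be treated as a set of matched class pairs and scored against the MultiFarm reference alignment with the standard measures. The sketch below uses illustrative names and is not the evaluation script used in the paper.

```scala
// Precision, recall and F-measure of a found alignment against a reference
// alignment, both represented as sets of (sourceClass, targetClass) pairs.
final case class Scores(precision: Double, recall: Double, fMeasure: Double)

def evaluate(found: Set[(String, String)],
             reference: Set[(String, String)]): Scores = {
  val correct = found.intersect(reference).size.toDouble
  val p = if (found.isEmpty) 0.0 else correct / found.size
  val r = if (reference.isEmpty) 0.0 else correct / reference.size
  val f = if (p + r == 0.0) 0.0 else 2 * p * r / (p + r)
  Scores(p, r, f)
}
```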
Table 2. Precision, recall, and F-measure for the cross-lingual matching. For the Arabic ontologies, results are reported before and after applying linguistic corrections to the Arabic ontologies (cf. Sect. 6).

| Ontology pairs | German | | | Arabic | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | Precision | Recall | F-measure | Precision (before) | Precision (after) | Recall (before) | Recall (after) | F-measure (before) | F-measure (after) |
| Conference | 1.00 | 0.38 | 0.56 | 1.00 | 1.00 | 0.33 | 0.42 | 0.50 | 0.59 |
| ConfOf | 1.00 | 0.70 | 0.82 | 1.00 | 1.00 | 0.30 | 0.60 | 0.46 | 0.75 |
| Sigkdd | 1.00 | 0.90 | 0.95 | 1.00 | 1.00 | 0.40 | 0.80 | 0.57 | 0.89 |
Table 3. State-of-the-art comparison results. Bold entries are the top scores.

| Approaches | Conference | | | Conference | | |
|---|---|---|---|---|---|---|
| | Precision | Recall | F-measure | Precision | Recall | F-measure |
| AML [7] | 0.56 | 0.20 | 0.29 | 0.86 | 0.35 | 0.50 |
| KEPLER [16] | 0.33 | 0.16 | 0.22 | 0.43 | 0.18 | 0.25 |
| LogMap [15] | 0.71 | 0.20 | 0.31 | 0.71 | 0.29 | 0.42 |
| XMap [28] | 0.18 | 0.16 | 0.17 | 0.23 | 0.18 | 0.20 |
| OECM 1.0 [14] | 0.75 | 0.67 | 0.71 | 0.93 | 0.76 | 0.84 |
| OECM 1.1 | **1.00** | **0.80** | **0.89** | **1.00** | **0.78** | **0.88** |

| Approaches | Conference | | | Conference | | |
|---|---|---|---|---|---|---|
| | Precision | Recall | F-measure | Precision | Recall | F-measure |
| AML [7] | 0.64 | 0.39 | 0.28 | 0.71 | 0.42 | 0.29 |
| KEPLER [16] | 0.40 | 0.30 | 0.24 | 0.40 | 0.30 | 0.24 |
| LogMap [15] | 0.40 | 0.13 | 0.08 | 0.40 | 0.18 | 0.12 |
| XMap [28] | **1.00** | 0.00 | 0.00 | **1.00** | 0.00 | 0.00 |
| OECM 1.1 | **1.00** | **0.50** | **0.67** | 0.86 | **0.67** | **0.75** |

| Approaches | Conference' | | | Conference' | | |
|---|---|---|---|---|---|---|
| | Precision | Recall | F-measure | Precision | Recall | F-measure |
| OECM 1.1 | 0.88 | 0.70 | 0.78 | 1.00 | 0.78 | 0.88 |
Evaluating the Enrichment Process. According to [4], an enriched ontology can be evaluated by comparing it against a predefined reference ontology (gold standard). In this experiment, we evaluate the enriched SEO ontology (cf. Sect. 4). A gold standard ontology has been manually created by ontology experts. By comparing the enriched SEO with the gold standard, OECM achieves 1.00, 0.80, and 0.89 in terms of precision, recall, and F-measure, respectively. This finding confirms the usefulness of our approach for cross-lingual ontology enrichment.
6 Conclusion
We present a fully automated approach, OECM, for building multilingual ontologies. The strength of our contribution lies in building such ontologies from monolingual ones using cross-lingual matching between ontology concepts. Resources in both Indo-European and non-Indo-European languages are used for enrichment in order to illustrate the robustness of our approach. Considering multiple translations of concepts and using semantic similarity measures for selecting the best translation have significantly improved the quality of the matching process. An iterative triple-retrieval process has been developed to determine which information from the source ontology can be added to the target ontology, and where such information should be added. We show the applicability of OECM by presenting a use case for enriching an ontology in the scholarly communication domain. The results of the cross-lingual matching process are promising compared to five state-of-the-art approaches, including the previous version of OECM. Furthermore, evaluating the quality of the enrichment process confirms the validity of our approach. Finally, we propose some linguistic corrections for the Arabic ontologies in the MultiFarm benchmark that are used in our experiments, which considerably enhanced the matching results. In conclusion, our approach provides a springboard for a new way to build multilingual ontologies from monolingual ones. In the future, we intend to additionally consider properties and individuals in the enrichment process. In addition, we aim to apply optimization methods in order to evaluate the efficiency of OECM when enriching very large ontologies.
Acknowledgments
This work has been supported by the BOOST EU project no. 755175. Shimaa Ibrahim and Said Fathalla would like to acknowledge the Ministry of Higher Education (MoHE) of Egypt for providing scholarships to conduct this study.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.