5
Application of Graph Rewriting to Semantic Annotation in a Corpus

Deep syntax is an important step on the path from surface syntax to semantics. First, deep syntax does not include words with no semantic content. The nodes of a semantic graph are therefore very similar to those of the equivalent structure in deep syntax. The main problem to solve in this case is assigning a semantic representation to each word via the use of lexicons. In some cases, a semantic representation may be associated with a multiword expression instead of a single word. Second, by resolving indirect links, deep syntax establishes links between verbs, adjectives, nouns and adverbs and all of their arguments. All modifications are also linked directly to the modified word. The vast majority of links shown in the final graph are already present in the deep structure, albeit in syntactic form and not necessarily in the right direction. The main task in this respect is to ensure correct labeling of these links using labels from the semantic universe. Finally, any residual syntactic elements must be removed from the structure.

Two semantic formalisms, AMR and DMRS, are presented in Chapter 4. The principles described above are applicable to any semantic formalism. We have chosen to apply them via a four-part transformation process, moving from a deep syntax representation to an AMR semantic representation. As we shall see later, the process for a transformation into DMRS is almost identical. The four steps in the transformation process are as follows:

  1. uniformization of the deep dependency structures, removing certain syntactic specificities associated with the chosen format. For example, in DSQ, amalgamated prepositions and pronouns are not broken down (as they are in UD), so they are split at this step;
  2. determination of nodes in the semantic graph. To do this, we separate, as far as possible, words with a one-to-one correspondence to a semantic representation from words without such a correspondence. The latter category includes copulas, which disappear at the semantic level, and multiword expressions, which are represented by single semantic units;
  3. determination of the central arguments of predicates. These include subjects and objects (including indirect objects in some cases). An analysis is carried out by studying each possible syntactic combination of relations around the predicate. A lexicon is required at this stage;
  4. determination of non-core arguments. Once again, a lexicon is used to distinguish elements relating to time, location, cause, etc. At this stage, the final graph is already present in the computed structure and must simply be cleaned to obtain the final form.

We shall now provide a detailed presentation of the conversion system used to move from deep syntax in SEQUOIA format to AMR semantics.

5.1. Main stages in the transformation process

5.1.1. Uniformization of deep syntax

Amalgamated prepositions and pronouns are dissociated at this stage to enable them to be processed in the same way as their simple equivalents. Enumerations are converted into coordinations. For coordinations of prepositional phrases introduced by an identical preposition, the preposition is established as a common factor, as shown below. The left side of the figure shows the rule pattern, while the right side shows the commands (the fourth command codes the fact that a node is semantically empty by adding VOID=Y).

c05unf001

The example below shows an application of this rule for the phrase “par an et par adhérent” (per year and per subscriber). The annotation on the left represents the input graph and the annotation on the right shows the output¹.

c05unf002

5.1.2. Determination of nodes in the semantic graph

As a general rule, lexical words are nodes in both the deep syntax graph and the semantic graph. This rule does not apply to multiword expressions, light verbs or copulas. For multiword expressions, for example, the initial structures contain multiple nodes which are grouped into a single node at the semantic level, as shown in the transformation below for the prepositional locution à partir de (from).

c05unf003

For light verb constructions, the semantic unit we wish to identify is not linked to a single word. We thus need to modify the structure, retaining a single node, which then takes a composed predicate. In Max a envie de partir (Max wants to leave), for example, the two words avoir and envie form a single predicate (to want something). The transformation applied in this case is shown below.

c05unf004

The verb être (to be) is often used as a copula (in UD, it is an explicit copula, while in SEQUOIA, this construction is described using the ats relation). A distinction is made between copulas that simply link a predicate to its deep subject (Max est malade, Max is sick) and those that establish the fact that an individual is an instance of a concept (Max est un professeur, Max is a teacher), or that one concept is more specific than another (un professeur est un enseignant, a professor is a teacher). For example, the two sentences Max est malade and Max est un professeur are handled by the following two rules (the without clause in the first rule ensures that only one rule is applied):

c05unf005

These rules give us the two AMR structures below:

c05unf006

5.1.3. Central arguments of predicates

Words and multiword expressions are associated with predicates while determining their central arguments. Certain transformations for verbs, adjectives, adverbs, prepositions and subordination conjunctions are regular and can be carried out by applying general rules. The system contains rules to transform syntactic relations into semantic relations: suj becomes ARG0, obj and ats become ARG1, and indirect arguments (a_obj, de_obj) become ARG2 or ARG3.

Clearly, there are a number of exceptions to these general rules. For example, in the sentence Max change de projet, while projet is indirectly introduced by the preposition de, it becomes ARG1 as this corresponds to a specific meaning of the verb changer, used as an intransitive verb. In this case, the rule shown below is used with a lexicon (verb_deobj.lp) of around 200 verbs for which de_obj becomes ARG1 in AMR:

c05unf007

More specific rules, such as the one shown above, should take priority over general rules and are thus applied first. Central arguments are therefore established using a set of lexical rules, then by non-lexicalized general rules.
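The ordering of lexical rules before general rules can be illustrated with a minimal Python sketch. It is not the Grew rule set used by the system: the edge representation (triples of source, label, target), the function names and the one-entry stand-in for verb_deobj.lp are assumptions made for illustration, and indirect arguments are always mapped to ARG2 here, whereas the system chooses between ARG2 and ARG3.

```python
# Edges are (source, label, target) triples. All names are illustrative.
GENERAL = {
    "suj": "ARG0",
    "obj": "ARG1",
    "ats": "ARG1",
    "a_obj": "ARG2",   # simplification: the real system may choose ARG3
    "de_obj": "ARG2",  # simplification: the real system may choose ARG3
}

# Hypothetical stand-in for verb_deobj.lp (the real lexicon has about 200 verbs).
VERB_DEOBJ = {"changer"}


def lexical_pass(edges, lemma):
    """Relabel de_obj as ARG1 for verbs listed in the lexicon."""
    out = []
    for src, label, tgt in edges:
        if label == "de_obj" and lemma.get(src) in VERB_DEOBJ:
            label = "ARG1"
        out.append((src, label, tgt))
    return out


def general_pass(edges):
    """Relabel the remaining syntactic relations with the general mapping."""
    return [(s, GENERAL.get(label, label), t) for s, label, t in edges]


def core_arguments(edges, lemma):
    """Lexical rules are applied first, general rules second."""
    return general_pass(lexical_pass(edges, lemma))


example = [("change", "suj", "Max"), ("change", "de_obj", "projet")]
print(core_arguments(example, {"change": "changer"}))
```

Because the lexical pass has already consumed the de_obj edge of changer, the general pass no longer sees it, which is the effect obtained in the system by applying the lexical package first.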

In deep syntax, no distinction is made between complements governed by nouns and modifiers affecting nouns, so a lexicon is always required in order to identify their central arguments.

5.1.4. Non-core arguments of predicates

Once again, a two-stage process is used, with general rules to automatically transform syntactic relations into semantic relations and more specific rules for cases where these general rules do not apply, often involving the use of lexicons. Rules of this latter type are applied first.

As an illustration, let us consider the duration relation used in AMR. Two rules are used to establish this relation.

Rule prep_modif.duration is applied to cases where a duration is indicated by a preposition:

c05unf008

The following rule identifies the construction de N1 à N2, from N1 to N2, where N1 and N2 are nouns indicating a position in time².

c05unf009

5.1.5. Final cleaning

During the final stage, the remaining syntactic dependencies that have no role to play at semantic level are deleted. This is the case for determiners that are not demonstratives, possessives or quantity determiners. Temporary features used in the conversion process are also removed. Finally, certain roles are inverted to ensure that the final structure respects the rooted, acyclic graph structure.

The computation of semantic representations of coordinations forms an important element of the process described above, as this phenomenon results in structure sharing and creates interference with other phenomena. It is therefore particularly important that coordinations are treated at the optimum point in the process. This task is carried out just before final cleaning, avoiding the need to write a specific version of each of the other rules for cases of coordination. This issue will be discussed in greater detail later.

The DSQ_to_AMR system implements this process as a sequential composition of four strategies, each of which is a sequential composition of packages. In total, 41 packages are used, with no repetitions. These packages include a total of 217 rules, of which 77 are lexical rules.
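The notion of a sequential composition of strategies and packages can be sketched in Python as follows. This is only an illustration under simplifying assumptions: graphs are frozensets of (source, label, target) triples, a rule is a function from graph to graph, a package applies its rules until nothing changes, and rewriting is deterministic, which is not true of graph rewriting in general. The helper names relabel and drop are hypothetical.

```python
def apply_package(graph, rules):
    """Apply the rules of one package until none of them changes the graph."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_graph = rule(graph)
            if new_graph != graph:
                graph, changed = new_graph, True
    return graph


def apply_strategy(graph, packages):
    """A strategy is a sequential composition of packages."""
    for rules in packages:
        graph = apply_package(graph, rules)
    return graph


def run_pipeline(graph, strategies):
    """The whole system is a sequential composition of strategies."""
    for packages in strategies:
        graph = apply_strategy(graph, packages)
    return graph


def relabel(old, new):
    """Hypothetical rule factory: rename every edge labeled `old` to `new`."""
    def rule(graph):
        return frozenset((s, new if l == old else l, t) for s, l, t in graph)
    return rule


def drop(label):
    """Hypothetical rule factory: delete every edge carrying `label`."""
    def rule(graph):
        return frozenset(e for e in graph if e[1] != label)
    return rule


graph = frozenset({("mange", "suj", "Max"), ("mange", "obj", "pomme"),
                   ("pomme", "det", "une")})
strategies = [
    [[relabel("suj", "ARG0"), relabel("obj", "ARG1")]],  # role strategy
    [[drop("det")]],                                     # cleaning strategy
]
print(sorted(run_pipeline(graph, strategies)))
```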

5.2. Limitations of the current system

Ambiguity rarely arises in transformations from surface syntax to deep syntax, but is much more common in transformations from DSQ to AMR annotation; however, it is generally lexical in nature. For any given word, there are often multiple possible entries in a lexicon and the right entry must be selected for any given context. In practice, lexicons are already used when detecting the central arguments of predicates. This step thus permits partial clarification of lexical information using the syntactic context described by the central arguments in question.

For example, in French, the verb compter has several meanings: 18, according to the Dubois-Charlier Lexique des Verbes du Français (LVF)³. Consider the sentence il compte sur moi (he counts on me), where compter governs a single complement, introduced by the preposition sur. This information, found in the DSQ annotation, allows us to clarify the meaning and select the lexical entry espérer dans (to rely on)⁴.

Clearly, knowledge of central arguments is not sufficient on its own. In the sentence le livre compte une centaine de pages (the book includes around a hundred pages), the verb compter governs a single complement, a direct object, but this is not enough to allow us to select a meaning; the verb may signify dénombrer (enumerate) or comporter (include). Semantic information must therefore be used to make a selection, for example, the fact that livre is an object. This additional information implies that the meaning of the verb in this example is comporter, to include.
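The partial nature of this clarification can be shown with a small Python sketch. The lexicon below is a hypothetical three-entry fragment built only from the meanings quoted above, not an extract of LVF, and the frame labels ("suj", "obj", "sur") are illustrative names for the complements governed by the verb.

```python
# Hypothetical lexicon fragment: each sense is paired with the set of
# complements it expects. Frame labels are illustrative, not DSQ labels.
LEXICON = {
    "compter": [
        ("espérer dans", frozenset({"suj", "sur"})),
        ("dénombrer", frozenset({"suj", "obj"})),
        ("comporter", frozenset({"suj", "obj"})),
    ],
}


def candidate_senses(lemma, observed_frame):
    """Keep the senses whose expected frame equals the observed one."""
    observed = frozenset(observed_frame)
    return [sense for sense, frame in LEXICON.get(lemma, []) if frame == observed]


# il compte sur moi: the frame is enough to select a single entry.
print(candidate_senses("compter", {"suj", "sur"}))
# le livre compte une centaine de pages: two entries remain.
print(candidate_senses("compter", {"suj", "obj"}))
```

In the second call, the syntactic frame leaves two candidates, so additional semantic information would be needed to choose between them.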

The problem is even more complex for nouns, insofar as the DSQ format makes no distinction between governed complements and modifiers. Even in cases where a distinction can be established using a lexicon, this is not always enough to ensure that syntactic functions are associated with the correct semantic roles. For example, when we say la surveillance de Pierre, we do not know if Pierre is watching or if Pierre is being watched.

The current system is therefore only able to provide partial lexical clarification. Other types of methods (using statistics, for example) are needed to complete the task; this discussion lies outside the scope of this book.

5.3. Lessons in good practice

5.3.1. Decomposing packages

This method has already been discussed in section 3.2.2.4, and can be applied here to the determination of central arguments. Using a single large package of non-ordered rules creates a risk of ambiguity. Take the sentence le juge a confondu l’accusé avec son frère (the judge confused the accused with his brother). The verb confondre has at least two meanings. First, confondre A avec B means taking A for B; in this case, the verb governs two complements, A and B. Second, confondre A means proving that A committed a crime; in this case, the verb governs a single complement, A. Using only one package to determine the central roles, we obtain two solutions, one for each of these meanings. Now, let us break the package down into sub-packages, each designed to process a given number of arguments, and execute these sub-packages, starting with those concerning the predicates with the greatest number of arguments. For example, for verbs, we would begin by processing verbs that govern two complements, then those governing a single complement, then those governing no complements at all. In this way, the first meaning of confondre would be selected, resolving the ambiguity. The same result can be obtained through the use of without clauses, but this increases the size of the rules and makes the system harder to maintain.
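The effect of ordering sub-packages by number of arguments can be sketched in Python for the confondre example. The sense identifiers and frame labels are hypothetical, and the sketch models a rule as "a lexical entry whose complements are all present around the verb", which is a simplification of pattern matching on graphs.

```python
# Hypothetical entries: (lemma, sense identifier, complements required).
ENTRIES = [
    ("confondre", "take_A_for_B", frozenset({"obj", "avec"})),
    ("confondre", "prove_A_guilty", frozenset({"obj"})),
]


def matching(lemma, complements):
    """Entries of `lemma` whose required complements are all present."""
    present = frozenset(complements)
    return [(sense, frame) for l, sense, frame in ENTRIES
            if l == lemma and frame <= present]


def single_package(lemma, complements):
    """One unordered package: every matching entry yields a solution."""
    return [sense for sense, _ in matching(lemma, complements)]


def ordered_subpackages(lemma, complements):
    """Sub-packages run from the largest arity down; the first one that
    applies decides, so entries with fewer arguments are never tried."""
    found = matching(lemma, complements)
    if not found:
        return []
    best = max(len(frame) for _, frame in found)
    return [sense for sense, frame in found if len(frame) == best]


# le juge a confondu l'accusé avec son frère
print(single_package("confondre", {"obj", "avec"}))       # two solutions
print(ordered_subpackages("confondre", {"obj", "avec"}))  # one solution
```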

5.3.2. Ordering packages

The action of a package often interferes with that of another, meaning that their relative order is important. This order determines the form of the rules making up the packages, which may become more or less difficult to write and more or less numerous. There is no general rule to use in defining this order; the key point is to attempt to identify all interferences before treating them on a case-by-case basis.

The transformation of coordinations from deep syntax to semantics offers a clear illustration of this problem. The transformation essentially involves replacing the first conjunct by the coordination conjunction in the coordination head.

This transformation interferes with the determination of core and non-core roles. We have therefore chosen to apply it last, avoiding the need to write specific rules for role determination in cases featuring a coordination. An illustration is given below, featuring a case where two different prepositions introduce the conjuncts of a coordination and cannot be merged into a common factor. In this case, the two conjuncts do not necessarily play the same semantic role, complicating the conversion process. Consider the following examples:

(5.1) [emea-fr-dev_00377]

conservation après reconstitution et avant utilisation

conservation after reconstitution and before use

‘conservation after reconstitution and before use’

(5.2) [emea-fr-test_00146]

Aclasta est contre-indiqué pendant la grossesse et

Aclasta is contraindicated during pregnancy and

chez la femme qui allaite

in women who breastfeed

‘Aclasta is contraindicated during pregnancy and when breastfeeding’

In example (5.1), the two conjuncts in the coordination play the same semantic role, time, in relation to the noun conservation, while in example (5.2), they play two different roles in relation to the participle contre-indiqué: duration for pendant la grossesse and beneficiary for chez la femme qui allaite. In these two examples, the roles are all non-core, but the reasoning would be the same for core roles.

By determining non-core roles before producing the coordination semantics, we are able to treat the role of the first conjuncts without the addition of specific rules. The annotation state of the part of example (5.1) following determination of non-core roles is:

c05unf010

The syntactic dependency dep from conservation to après has been transformed into the role time. Previously, during the determination of core semantic roles, dependencies of the type obj.p from prepositions to their object were also transformed into semantic roles op1.

Now, to produce the coordination semantics, we verify that the second conjunct fulfills the same role as the first. This is the case in our example, as the prepositions avant and après are both of the same type, time. The standard rule for producing coordination semantics can therefore be applied, resulting in the annotation below.

c05unf011

For example (5.2), the same process is applied initially, but when processing the coordination, we see that chez does not have the same semantic type as pendant. The first is of the beneficiary type, whilst the second is a duration. The coordination therefore needs to be separated in order to process the roles of the two conjuncts separately. The figure below shows the final annotation of the portion of the sentence in question.

c05unf012

The task of determining core and non-core semantic roles and that of introducing coordination semantics can be carried out in a different order but this results in an increase in the number of cases requiring specific treatment and consequently in the number of rules needed to treat them. A specific form of many of the rules used to determine core or non-core roles would be needed for cases where the argument is a coordination.
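The two cases discussed for examples (5.1) and (5.2) can be summarized in a short Python sketch. It assumes that the role of the first conjunct has already been determined, that each preposition carries a semantic type, and that the role of the second conjunct can be read directly from its type; the function name and the edge triples are illustrative, not the rules of the system.

```python
def coordination_semantics(head, role, first, conj, second, prep_type):
    """Return semantic edges for a coordination of two prepositional phrases.

    `head -[role]-> first` is assumed to be already computed; `prep_type`
    maps each preposition to its semantic type (an assumed feature).
    """
    if prep_type[first] == prep_type[second]:
        # Same type: the conjunction becomes the head of the coordination.
        return [(head, role, conj), (conj, "op1", first), (conj, "op2", second)]
    # Different types: the coordination is split and each conjunct
    # receives its own role with respect to the head.
    return [(head, role, first), (head, prep_type[second], second)]


# Example (5.1): both prepositions are of type time.
print(coordination_semantics(
    "conservation", "time", "après", "et", "avant",
    {"après": "time", "avant": "time"}))

# Example (5.2): duration versus beneficiary.
print(coordination_semantics(
    "contre-indiqué", "duration", "pendant", "et", "chez",
    {"pendant": "duration", "chez": "beneficiary"}))
```

Only this one function needs to know about coordination; the role determination that produced `role` is the ordinary one, which is the benefit of treating coordinations last.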

EXERCISE 5.1.– Consider the transformations shown in the examples above. We shall consider patterns of the following form in the graphs:

c05unf013

Prepositions prep1 and prep2 are both of the time or location type, indicated by the TYPE feature. Write a rule system to produce a semantic representation of these patterns in AMR, making a distinction between cases where the two prepositions are of the same type and cases where they are different.

EXERCISE 5.2.– Consider the role reversal transformation required to guarantee that the final semantic representation of a sentence in AMR is an acyclic graph with a single root. All DSQ annotations of sentences feature a single root, which is marked. We therefore simply work through the graph from this root, marking the nodes reached in each case. On finding an edge with the correct direction, we mark its target; for edges with the wrong direction, we reverse the edge, replacing the label with the opposite role. Note that this transformation is not necessarily unique.

Write a rule system to carry out this task. The system should be applicable to any given directed, labeled graph, starting from the root. The output should be a rooted acyclic graph, in which reversed edges are labeled e-of instead of e.
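As a point of comparison for the rule system, the traversal can be written procedurally. The Python sketch below is one possible reading of the procedure, not a solution in rule form: it visits the graph breadth-first from the root, ignoring edge direction, and then orients every edge from the node visited earlier to the node visited later, which guarantees an acyclic result with the root as the only node without incoming edges. It assumes a connected graph without self-loops, and the result depends on the visiting order, in line with the remark that the transformation is not unique.

```python
from collections import deque


def root_graph(edges, root):
    """Reverse edges so that all of them point away from `root`.

    `edges` is a list of (source, label, target) triples. Assumes the
    graph is connected and has no self-loops.
    """
    neighbours = {}
    for s, _, t in edges:
        neighbours.setdefault(s, []).append(t)
        neighbours.setdefault(t, []).append(s)

    # Breadth-first visit, recording the order in which nodes are marked.
    order = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for other in neighbours.get(node, []):
            if other not in order:
                order[other] = len(order)
                queue.append(other)

    # Keep edges going from earlier to later nodes, reverse the others.
    result = []
    for s, label, t in edges:
        if order[s] < order[t]:
            result.append((s, label, t))
        else:
            result.append((t, label + "-of", s))
    return result


print(root_graph([("x", "e", "r"), ("x", "f", "y"), ("y", "g", "r")], "r"))
```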

5.4. The DSQ_to_DMRS conversion system

We shall not describe the DSQ_to_DMRS conversion system in detail here, as it follows the same major steps found in the DSQ_to_AMR system. However, certain steps are carried out in a different manner to that encountered in DSQ_to_AMR and we shall focus on these differences. The system is made up of 154 rules, 54 of which are lexical rules, grouped into 35 packages.

5.4.1. Modifiers

One of the main differences between MRS and AMR with consequences for the design of the conversion system concerns modifiers. In AMR, modifiers are given very fine semantic roles in relation to the words they modify; in DMRS, they have the same semantics as predicates, with the modified word as the argument. The relation between the two elements is thus reversed. This relation reversal, especially notable in head changes, often has consequences for the conversion process, which require particular attention. In conversions from DSQ to DMRS format, interference can occur between the inversion of relations for modifiers and head changes for coordinations, where the coordination conjunction becomes the semantic head.

Consider the following example, showing the state of annotation of an expression during the DSQ to DMRS conversion process, just before coordinations are treated.

c05unf014

(5.3) [annodis.er_00154]

Offices de tourisme et Syndicats d’ initiative du Doubs

Offices of tourism and Boards of initiative of Doubs

‘Tourist offices and boards of Doubs’

The deep syntax of the modifiers here, three instances of de, has already been transformed into semantics. They take the form of predicates with two arguments, the first representing the modified noun, and the second the object of the preposition. The modification relation is shown as a conjunction relation EQ between the modified noun and the modifying predicate⁵.

Transforming the deep syntax of the coordination consists mainly of transferring the head from the first conjunct to the coordination conjunction. This normally implies the transfer of all dependencies targeting the head of the first conjunct Offices to the conjunction et, as we consider that these dependencies concern the whole of the coordination. Unfortunately, the reversal of dependencies between modifiers and modified words in this case creates confusion. For example, the dependencies ARG1 and EQ from the preposition de introducing the noun tourisme only concern the first conjunct, whereas the same dependencies from the preposition de introducing the noun Doubs concern the whole coordination. Only the second set should therefore be transferred. A pretreatment process is used before treating coordination, marking dependencies of the first type to prevent them from being transferred. This marking task is based on word order: complements belonging to the first conjunct alone are found after the conjunct, but before the coordination conjunction. Coordination rules can then be applied, taking account of this marking, resulting in the annotation below.

c05unf015
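The word-order criterion used for marking can be sketched in Python. The sketch merges the marking step and the transfer step into one function, and it makes several assumptions for illustration: edges are (source, label, target) triples, the ARG1 and EQ edges go from the modifying predicate to the modified noun, node names such as de_tourisme are invented to tell the instances of de apart, and `position` gives the index of each word in the sentence.

```python
def transfer_to_conjunction(edges, first, conj, position):
    """Move edges targeting the first conjunct onto the conjunction,
    except those coming from words located between the first conjunct
    and the conjunction, which concern the first conjunct alone."""
    out = []
    for s, label, t in edges:
        local = position[first] < position[s] < position[conj]
        if t == first and s != conj and not local:
            out.append((s, label, conj))
        else:
            out.append((s, label, t))
    return out


position = {"Offices": 1, "de_tourisme": 2, "tourisme": 3, "et": 4,
            "Syndicats": 5, "de_Doubs": 8, "Doubs": 10}
edges = [
    ("de_tourisme", "ARG1", "Offices"), ("de_tourisme", "EQ", "Offices"),
    ("de_tourisme", "ARG2", "tourisme"),
    ("de_Doubs", "ARG1", "Offices"), ("de_Doubs", "EQ", "Offices"),
    ("de_Doubs", "ARG2", "Doubs"),
]
for edge in transfer_to_conjunction(edges, "Offices", "et", position):
    print(edge)
```

In this example, the edges from de_tourisme stay on Offices, while the ARG1 and EQ edges from de_Doubs are moved to et.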

5.4.2. Determiners

The transformation illustrated above is not finished, as the determiner le still needs to be treated. This treatment is the second significant difference between MRS and AMR. In AMR, determiners are ignored, whereas in MRS, they are treated as generalized quantifiers. The system therefore needs to identify the kernels of the restriction and body of the quantifier. This is done in a simplified manner by considering that the noun to which the determiner applies is the kernel of the restriction and that the governor of the noun in deep syntax is the kernel of the body. This is demonstrated in the elementary example below.

c05unf016

(5.4) [annodis.er_00461]

Le conseil municipal donne son accord pour cette procédure

The Council City gives its agreement to this procedure

‘City Council agrees to this procedure’

To obtain a full representation of this sentence in DMRS, we simply replace the syntactic dependencies of the determiners with semantic dependencies expressing the restriction and the body. A single rule may be used to do this⁶. The determination of the ARG1 relation corresponding to the restriction is not problematic, as the target will always be the source of the det dependency. For the ARG2 relation that corresponds to the body, the rule indicates that the governor of the noun in deep syntax should be used. However, since the deep syntax dependencies have already been replaced, this governor is harder to identify. We cannot simply take the semantic governor of the noun, as there may be multiple options, as for the word conseil. Some of these governors need to be eliminated using negative patterns; in our example, this allows us to eliminate the municipal option. Applying the rule three times, we obtain the full DMRS below.

c05unf017
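The treatment of one determiner can be sketched procedurally in Python. This is an illustration, not the rule used by the system: edges are (source, label, target) triples, the det dependency is assumed to go from the noun to the determiner, and the negative pattern is approximated by discarding governors that are linked to the noun by an EQ edge, that is, its modifiers. The body is only added when exactly one governor remains.

```python
def quantify(edges, noun, det):
    """Replace `noun -[det]-> det` by restriction and body edges."""
    # Modifiers of the noun are the predicates linked to it by EQ
    # (assumed direction: modifier -> noun).
    modifiers = {s for s, label, t in edges if label == "EQ" and t == noun}
    # Remaining semantic governors of the noun.
    governors = [s for s, label, t in edges
                 if t == noun and label.startswith("ARG") and s not in modifiers]

    out = [e for e in edges if e != (noun, "det", det)]
    out.append((det, "ARG1", noun))              # restriction
    if len(governors) == 1:
        out.append((det, "ARG2", governors[0]))  # body
    return out


edges = [
    ("conseil", "det", "le"),
    ("donne", "ARG1", "conseil"),
    ("municipal", "ARG1", "conseil"),
    ("municipal", "EQ", "conseil"),
]
for edge in quantify(edges, "conseil", "le"):
    print(edge)
```

Here municipal is discarded because of its EQ edge, so donne is selected as the kernel of the body.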

EXERCISE 5.3.– Write a rule system to transform a syntactic dependency of the det type into two semantic dependencies ARG1 and ARG2, expressing the roles of restriction and body in relation to the determiner considered as a generalized quantifier.