"methods.shtml"

Methods

General Strategy

In our work we adopted the merge approach to building the Russian wordnet, i.e. we started with the language-internal structure of RussNet, then coordinated it with the EuroWordNet Top Ontology and linked it to the Inter-Lingual-Index (ILI).

RussNet is by no means a clone or a translation of Princeton WordNet or any other similar resource, although most of the RussNet methodology follows the previous tradition of wordnet construction and makes use of WordNet and EuroWordNet experiences.

Collecting Base Concepts

Usually the starting point for building a wordnet is a list of Base Concepts (BCs), i.e. general word meanings on which more specific meanings depend and which are used most frequently.

Within EuroWordNet, formal criteria for BC identification were postulated. The criteria and procedures we used in RussNet differ slightly from those specified in EuroWordNet:
  1. In selecting Russian BCs, we started with the most frequent words. Words with a relative frequency of at least 120 ipm (instances per million words) were picked out from Frequency Lists for Russian and from Text Corpora.

    In addition, words belonging to the so-called “core of the national mental lexicon” (ядро языкового сознания) were extracted from the Russian Word Association Thesaurus (WAT) and added to the resulting list of words.
  2. We had to take into consideration that the more frequent a word is, the more senses it tends to have. Therefore, at the next stage we examined the set of senses for each word and selected the most frequent ones. For that purpose we employed Text Corpora and the data presented in Word Association Norms, making use of the fact that about 90% of the occurrences of a word in a corpus (or of the responses stimulated by a word in WAT) are associated with one or two of its senses [Hanks 2000; Ovchinnikova, Stern 1989]. These most frequent senses of the most frequently used words constituted the Preliminary List of Russian BCs.
  3. To define relations between words inside and across the semantic fields we applied different methods of linguistic analysis, such as:
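
Steps 1 and 2 above can be illustrated with a minimal sketch. It assumes that raw word counts and per-sense occurrence counts are already available; the function names, the toy counts, the corpus size and the sense labels (дом_1 etc.) are all invented for illustration and are not RussNet data. Only the 120 ipm threshold and the "one or two senses" limit come from the text.

```python
from collections import Counter

IPM_THRESHOLD = 120  # relative frequency threshold from step 1 (instances per million)


def to_ipm(count, corpus_size):
    """Convert a raw count to instances per million tokens."""
    return count * 1_000_000 / corpus_size


def frequent_words(counts, corpus_size, threshold=IPM_THRESHOLD):
    """Keep words whose relative frequency is no less than the threshold."""
    return sorted(w for w, c in counts.items() if to_ipm(c, corpus_size) >= threshold)


def dominant_senses(sense_counts, max_senses=2):
    """Return the most frequent senses (at most max_senses) and the
    share of all occurrences that they account for."""
    total = sum(sense_counts.values())
    top = Counter(sense_counts).most_common(max_senses)
    coverage = sum(c for _, c in top) / total
    return [sense for sense, _ in top], coverage


# Toy data, invented for illustration only.
counts = {"дом": 900, "идти": 450, "гневаться": 3}
corpus_size = 2_000_000

candidates = frequent_words(counts, corpus_size)
senses, coverage = dominant_senses({"дом_1": 70, "дом_2": 22, "дом_3": 8})
```

With these toy numbers the two frequent words pass the threshold, and the two dominant senses of дом cover 92% of its occurrences, which is the kind of distribution step 2 relies on.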

Definition analysis

The following guidelines allow (semi-)automatic processing of explanatory dictionaries in order to determine the semantic relations between literals and synsets.

Derivational analysis

This method of analysis is necessary when there is a wide range of derivational relations between words that belong to the same semantic field. In such cases the semantic nature of a word can be predicted from its morphological structure: some semantic components may receive their own formal representation and appear as separate morphemes. The senses of morphemes may help us to define the meaning of words, to clarify the differences between cognate words, and finally to define the relations between them.

E.g., both prefixes при- and под- have the sense of “adding to, putting to” when they form part of verbs such as присоединить - подсоединить; примешать - подмешать; приколоть - подколоть. Thus they regularly point to a relation of synonymy between the corresponding words.

Another regular means of derivation is the prefixes без-/бес- and не-, which link antonymous pairs such as платный - бесплатный, внимание - невнимание.
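
The two prefix regularities above lend themselves to a simple candidate search over a word list. The sketch below is only an illustration under stated assumptions: the function names and the small vocabulary are hypothetical, matching is done on plain string prefixes without any morphological analysis, and the output is a list of candidates for manual checking, not of confirmed relations.

```python
SYNONYM_PREFIXES = ("при", "под")        # both carry the sense "adding to, putting to"
ANTONYM_PREFIXES = ("без", "бес", "не")  # negating prefixes


def synonym_candidates(vocabulary):
    """Pair words that differ only in the prefix при- vs. под-."""
    vocab = set(vocabulary)
    first, second = SYNONYM_PREFIXES
    pairs = []
    for word in sorted(vocab):
        if word.startswith(first):
            twin = second + word[len(first):]
            if twin in vocab:
                pairs.append((word, twin))
    return pairs


def antonym_candidates(vocabulary):
    """Pair a prefixed word with its unprefixed base if both are attested."""
    vocab = set(vocabulary)
    pairs = []
    for word in sorted(vocab):
        for prefix in ANTONYM_PREFIXES:
            base = word[len(prefix):]
            if word.startswith(prefix) and base in vocab:
                pairs.append((base, word))
    return pairs


# Hypothetical vocabulary built from the examples in the text.
vocabulary = [
    "присоединить", "подсоединить", "примешать", "подмешать", "приколоть",
    "платный", "бесплатный", "внимание", "невнимание",
]

synonyms = synonym_candidates(vocabulary)
antonyms = antonym_candidates(vocabulary)
```

Because приколоть has no под- counterpart in this toy vocabulary, it yields no pair. A purely string-based match of this kind would also over-generate on words that merely begin with не or при, which is why the candidates still need linguistic verification.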

Context analysis

  1. Pure substitution tests help us to identify relations of synonymy and hyponymy/hyperonymy by examining real contexts.

    Pure substitution is only one possible way of applying tests to verify relations. A more general approach involves building test sentences for each relation (for more details see Relations).

  2. Analysing contextual markers, collocations
      ‘Contextual markers’ may detail several related aspects of the word’s environment in a text, including:
    1. Lexical markers: What lexical items is the word associated with?
    2. Semantic markers: What semantic class of lexical items is the word associated with?
    3. Domain markers: What domain do these lexical items belong to?
    4. Grammatical markers: What form(s) does the word appear in?
    5. Syntactic markers: What syntactic structure(s) does the word occur in within a sentence?
    6. Textual markers: Is the word associated with any (position in any) textual organisation, i.e. does it have any textual colligations?

    Mostly we deal with lexical, semantic and syntactic markers. As regards lexical markers, for example, the semantic amalgamation rules stated by V.G. Gak posit a specific type of syntagmatic relation between the lexical items in a collocation (‘semantic concord’), which implies that at least one semantic component is repeated in the meanings of each collocant.

    e.g.: Он уже успокоился, только немного сердился на учителя за этот спектакль.

    советник Арфарра сильно рассердился на меня за соглядатайство (Латынина Ю.)

    Татьяна немного разозлилась, и, разозлившись, тут же поняла, что это уже московская злость (Аксенов В.)

    Раскольников ужасно разозлился; ему вдруг захотелось как-нибудь оскорбить этого жирного франта. (Достоевский Ф.М.)

    vs.

    Это было равносильно измене, и царь Иван сильно разгневался. (Федоров Е.)

    Рассердиться and разозлиться co-occur with various adverbs of degree, while разгневаться is associated with adverbs of high degree only. This provides evidence for the extreme intensity of the emotion in the case of разгневаться and for unspecified intensity in the case of рассердиться and разозлиться, and consequently for {разгневаться} being a hyponym of {рассердиться, разозлиться}.
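
The reasoning in this example can be sketched as a small co-occurrence check. The sketch assumes that (verb, adverb) pairs have already been extracted and lemmatised; here they are copied by hand from the sentences above, with сердился loosely treated as an occurrence of рассердиться. The degree labels in `ADVERB_DEGREE` and the function names are assumptions made for illustration, and a real analysis would need far more corpus evidence than five sentences.

```python
from collections import defaultdict

# Assumed degree classification of the adverbs in the examples.
ADVERB_DEGREE = {"немного": "low", "сильно": "high", "ужасно": "high"}

# (verb lemma, adverb) pairs taken by hand from the example sentences.
observations = [
    ("рассердиться", "немного"),
    ("рассердиться", "сильно"),
    ("разозлиться", "немного"),
    ("разозлиться", "ужасно"),
    ("разгневаться", "сильно"),
]


def degree_profile(pairs):
    """Map each verb to the set of adverb degrees it co-occurs with."""
    profile = defaultdict(set)
    for verb, adverb in pairs:
        profile[verb].add(ADVERB_DEGREE[adverb])
    return dict(profile)


def high_intensity_only(profile):
    """Verbs attested with high-degree adverbs only: hyponym candidates."""
    return sorted(v for v, degrees in profile.items() if degrees == {"high"})


profile = degree_profile(observations)
hyponym_candidates = high_intensity_only(profile)
```

On this data only разгневаться is restricted to high-degree adverbs, which mirrors the conclusion drawn above; the check flags candidates and does not by itself establish hyponymy.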