RussNet: wordnet for Russian

Database Structure


Inherited design principles

Being a wordnet-type lexicon, RussNet is structured along the same lines as Princeton WN, EWN components, and other wordnets:

It is worth noting that relations are not the same for different PoS. Firstly, they are not equally important for structuring different PoS files: unlike nouns and verbs, adjectives do not obviously form hyponymy hierarchies; rather, they are structured around antonymy and relations with the nouns they modify [Fellbaum, 1998]. Secondly, relations may differ in coverage and depth: noun hyponymy hierarchies may have as many as 12 levels, while the depth of the tree for verbs seems never to exceed 5. Thirdly, relations acquire specific features for different PoS: for verbs, troponymy relates a more general synset to more specific ones, and in that respect it resembles hyponymy for nouns; on the other hand, it is a kind of lexical entailment, which in turn has much in common with meronymy for nouns.

Nouns

Semantic fields

  1. Body Part
  2. Human
  3. Time
  4. Place
  5. Artefact

Relations

Noun synsets in RussNet are arranged mostly by hyponymy/hyperonymy and meronymy/holonymy relations.

Derivational_synonymy and Derivational_hyponymy are of great importance for noun literals.

See samples

Verbs

Semantic fields

  1. Possession
  2. Motion
  3. Emotion
    • Emotional States
    • Emotional Relations
  4. Social Relations
  5. Body Functions
  6. Cognition
    • Thinking
    • Knowing
  7. Communication
  8. Contact
  9. Creation
  10. Perception
  11. Existence
  12. Modality
  13. Location

Relations

Verb synsets in RussNet are organized mostly by hyponymy and other types of lexical entailment (causation, presupposition, subevent).

Valency frames

One of the most important differences between RussNet and its prototypes concerns information on the argument structure of verbs. It is generally accepted that the syntactic features of words, especially verbs, are determined by their semantic properties, and that the meaning of a verb outlines the form and semantic features of the words accompanying it.

The term valency frame is used to refer to the semantic and syntactic structure of verb arguments. This characteristic is vital for Russian, as well as for other Slavonic languages [Pala, Sevecek, 1999].

Within RussNet we include the following elements in the description of verb semantics:
  • a list of valency frames for a synset (in terms of the grammatical form, order and necessity of arguments, and the absence or presence of a preposition), specifying which frame fits each member of the synset;
  • semantic features of the verb’s arguments (in terms of Involved_Relations presenting their thematic roles, and Base Concepts indicating the corresponding semantic classes).

E.g., the synset {влюбиться, увлечься} (to fall in love, to become infatuated) is accompanied by the following description:

влюбиться: 1[N1 Agent {человек, лицо3}] + 2[в_N4 Object {человек, лицо3}]

увлечься: 1[N1 Agent {человек, лицо3}] + 2[N5 Object {человек, лицо3}]

A common list of valency frames for a synset is preferable to a separate description for each literal, because it presents the whole paradigm that influences the native speaker.
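The frame notation above can be held in a small data structure. The following is a minimal sketch, not RussNet's actual storage format: the class names `Argument` and `ValencyFrame` and the helper `differing_forms` are hypothetical, and only the two frames quoted above are encoded.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Argument:
    position: int        # order of the argument within the frame
    form: str            # grammatical form as written in the text, e.g. "N1", "в_N4"
    role: str            # thematic role (Involved_Relation)
    base_concept: tuple  # semantic class, given as a synset of literals

@dataclass(frozen=True)
class ValencyFrame:
    arguments: tuple

    def render(self) -> str:
        """Print the frame in the bracket notation used in the text."""
        parts = []
        for a in self.arguments:
            concept = ", ".join(a.base_concept)
            parts.append(f"{a.position}[{a.form} {a.role} {{{concept}}}]")
        return " + ".join(parts)

PERSON = ("человек", "лицо3")

# One shared table for the whole synset, keyed by literal (illustrative only).
synset_frames = {
    "влюбиться": ValencyFrame((Argument(1, "N1", "Agent", PERSON),
                               Argument(2, "в_N4", "Object", PERSON))),
    "увлечься": ValencyFrame((Argument(1, "N1", "Agent", PERSON),
                              Argument(2, "N5", "Object", PERSON))),
}

def differing_forms(frames):
    """Return the argument positions whose grammatical form varies across literals."""
    by_position = {}
    for literal, frame in frames.items():
        for arg in frame.arguments:
            by_position.setdefault(arg.position, {})[literal] = arg.form
    return {pos: forms for pos, forms in by_position.items()
            if len(set(forms.values())) > 1}

print(synset_frames["влюбиться"].render())
print(differing_forms(synset_frames))
```

Keeping the frames of all literals in one table makes the contrast inside the synset explicit: here the two verbs share roles and semantic classes and differ only in the form of the second argument.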

Such a description is also useful because it makes it possible to represent how a hyponym inherits the valency frames of its hyperonym, and how a derivative inherits those of its stem word.

E.g., двигаться (to move) HYPONYM идти (to walk): двигаться has the arguments (a) 1[Source_Direction] + 2[Location], or (b) 1[Target_Direction] + 2[Location], both of which are inherited by its hyponym идти.
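Frame inheritance along the hyponymy link can be sketched as a lookup that walks up to the hyperonym. This is an illustration under assumptions: the `Synset` class is hypothetical, frames are reduced to tuples of role names, and a hyponym is assumed to add its own frames to the inherited ones rather than replace them.

```python
class Synset:
    """A toy synset that knows its hyperonym and its own valency frames."""

    def __init__(self, name, hyperonym=None, frames=None):
        self.name = name
        self.hyperonym = hyperonym
        self.own_frames = list(frames or [])

    def frames(self):
        """Frames inherited from the hyperonym chain, followed by the synset's own."""
        inherited = self.hyperonym.frames() if self.hyperonym else []
        return inherited + [f for f in self.own_frames if f not in inherited]

move = Synset("двигаться", frames=[
    ("Source_Direction", "Location"),   # frame (a)
    ("Target_Direction", "Location"),   # frame (b)
])
walk = Synset("идти", hyperonym=move)   # no frames of its own in this sketch

print(walk.frames())
```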

Another advantage is that such a description helps in distinguishing between senses of polysemous verbs. A verb may have different valency frames associated with different senses, e.g.:

бить1 INVOLVED_PATIENT посуду: {break apart2, break up13, crash7}
бить2 INVOLVED_OBJECT в барабан: {beat22, drum9, thrum1}
бить3 INVOLVED_PATIENT врага: {fight5, have a fight1, struggle4}
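As a rough illustration of how such entries could narrow down the sense of a polysemous verb, the sketch below filters the three senses of бить by relation and, optionally, by argument. The `SENSES` table and `candidate_senses` function are hypothetical, and the exact string match on the argument is a simplification: a real lexicon would presumably compare the argument's semantic class instead.

```python
# (sense, involved relation, example argument, English equivalents), as quoted in the text
SENSES = [
    ("бить1", "INVOLVED_PATIENT", "посуду", ("break apart2", "break up13", "crash7")),
    ("бить2", "INVOLVED_OBJECT", "в барабан", ("beat22", "drum9", "thrum1")),
    ("бить3", "INVOLVED_PATIENT", "врага", ("fight5", "have a fight1", "struggle4")),
]

def candidate_senses(relation, argument=None):
    """Senses compatible with the observed relation and, if given, the argument."""
    return [sense for sense, rel, arg, _ in SENSES
            if rel == relation and (argument is None or arg == argument)]

print(candidate_senses("INVOLVED_OBJECT"))
print(candidate_senses("INVOLVED_PATIENT"))
print(candidate_senses("INVOLVED_PATIENT", "врага"))
```

The relation alone separates бить2 from the other two senses; бить1 and бить3 share the relation and are told apart only by the argument.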

See samples

Adjectives

Semantic fields

  1. Descriptive
    • Size
    • Quality
  2. Relational
  3. Pronominal
  4. Ordinal

Relations

There is no common solution for structuring adjectives in wordnets. In Princeton WN adjectives are organized around pairs of antonyms and nouns they modify (‘value of’ relation). In EuroWordNet adjectives are treated in terms of their references to nouns, and ‘be_in_state’ and ‘state_of’ relations are introduced (e.g. noun colour is linked by the relation ‘be_in_state’ to adjectives coloured or colourless). In GermaNet adjectives are hierarchically structured according to semantic domains.

Following the GermaNet proposal to “make use of hyponymy relations wherever it’s possible” [Naumann, 2000], in RussNet we adopt a more formal approach based on the collocations of adjectives with nouns. Empirical data show that in Russian it is the adjective that predicts the noun (or class of nouns) it collocates with, not vice versa. E.g., долговязый (lanky, strapping) involves a pointer to a human being, i.e. it can collocate with such nouns as мальчик (a boy) or человек (a man).

Thus, the main idea underlying our work is that the hyponymy tree for descriptive adjectives may be built according to that of nouns: if two adjectives from the same semantic field collocate with two nouns linked by hyponymy, we build a hyponymy link between these adjectives.

E.g., высокий (tall) collocates with all nouns denoting entities (objects, human beings, animals, and so on), while долговязый (lanky, strapping) collocates only with words from the lower levels of this hyponymy tree, namely human beings. Hence, высокий (tall) is treated as a hyperonym of долговязый (lanky, strapping).

Thus, by analyzing the co-occurrence of adjectives with nouns, we produce hyponymy structures for adjectives denoting a similar quality.
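The rule can be sketched in a few lines. This is a toy illustration, not the procedure used to build RussNet: the noun hierarchy `NOUN_HYPERONYM`, the English class labels, and the assignment of both adjectives to the field Size are assumptions made for the example.

```python
# Toy noun hierarchy: each noun class points to its hyperonym (assumed for illustration).
NOUN_HYPERONYM = {"human being": "entity", "animal": "entity", "object": "entity"}

def is_ancestor(ancestor, noun):
    """True if `ancestor` lies strictly above `noun` in the noun hyponymy tree."""
    current = NOUN_HYPERONYM.get(noun)
    while current is not None:
        if current == ancestor:
            return True
        current = NOUN_HYPERONYM.get(current)
    return False

# adjective -> (semantic field, most general noun class it collocates with)
ADJECTIVES = {
    "высокий": ("Size", "entity"),
    "долговязый": ("Size", "human being"),
}

def adjective_hyponymy(adjectives):
    """Derive (hyperonym, hyponym) adjective pairs from the noun hierarchy."""
    links = []
    for hyper, (field_a, noun_a) in adjectives.items():
        for hypo, (field_b, noun_b) in adjectives.items():
            if field_a == field_b and is_ancestor(noun_a, noun_b):
                links.append((hyper, hypo))
    return links

print(adjective_hyponymy(ADJECTIVES))
```

The same-field check matters: without it, any two adjectives whose noun classes happen to be related would be linked, whatever quality they denote.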

See samples

Adverbs

Semantic fields

  1. Time

Relations

Adverb synsets are organized mostly by the relations with corresponding adjectives, which is usually accompanied by derivational motivation. Some adverbs are also incorporated into hyponymy relations.

See samples

Definitions in RussNet

In so far as RussNet is designed not only for machine usage, but for human-machine interaction as well, it should contain some additional information that may help users to identify word senses properly, namely definitions.

We suppose that semantic relations themselves (hypernymy, hyponymy, synonymy, antonymy, meronymy etc.), already represented in RussNet structure, could be a reliable base for generating consistent word sense definitions.

It seems reasonable not to look for one common solution for all PoS (i.e. to build definitions according to one standard model, e.g. “genus proximum + differentia specifica”); we should rather choose appropriate types of definitions according to the priority of relations for each part of speech.

We can rank the relations from the viewpoint of their ‘relevance’ for definition generation. “Genus proximum + differentia specifica” is no doubt the most frequently used model. Such definitions include not only references to the nearest superordinate synsets, but also distinguishers that enable us to differentiate between co-ordinate synsets. This information cannot be presented in any other type of semantic description. However, if a noun synset takes part in meronymy relations, it can hardly have a clear hyperonym, so its definition should rather be based on its holonym or meronyms. E.g., голова (head) is clearly defined through its holonym as ‘верхняя часть тела’ (the upper part of the body), though it has a hyperonym. A similar situation holds for the cause relation between verbs, which has priority over other relations: e.g., злить (to anger) cannot be defined other than as ‘вызывать злость, заставлять злиться’ (to cause anger, to make someone angry).
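One way to read this ranking is as a small decision rule that picks a definition model from the relations a synset takes part in. The sketch below is only an interpretation of the paragraph above: the function name, the relation labels, and the returned model names are hypothetical.

```python
def choose_definition_model(pos, relations):
    """Pick a definition model from the PoS and the set of relations of a synset."""
    # For verbs, the cause relation takes priority over other relations.
    if pos == "verb" and "cause" in relations:
        return "cause"
    # Nouns involved in meronymy are defined through their holonym or meronyms.
    if pos == "noun" and ({"holonym", "meronym"} & set(relations)):
        return "holonym/meronym"
    # Default model: nearest superordinate plus distinguishers.
    return "genus + differentia"

print(choose_definition_model("noun", {"hyperonym", "holonym"}))  # e.g. голова
print(choose_definition_model("verb", {"hyperonym", "cause"}))    # e.g. злить
print(choose_definition_model("noun", {"hyperonym"}))
```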

  Send your comments and remarks to russnet@yandex.ru. Last modified: 14 June 2005