.. -*- mode: rst -*-
.. include:: ../definitions.txt

.. _chap-lexicon:

===========
The Lexicon
===========

------------
Introduction
------------

One early approach to grammar took the view that all regular patterns
in language could be captured by systematic rules for different kinds
of representation: sound patterns in phonology, morphological
alternations in morphology, distributional patterns in syntax, and so
on. Everything which failed to obey regular patterns had simply to be
listed. And the lexicon was just such a list. That is, the lexicon was
viewed as a place where idiosyncratic information about words was
stored, information which could not be predicted by some other
component of grammar. This approach is pretty much the one that we
have enshrined in the context free grammars of Chapters chap-parse_
and chap-featgram_, where lexical items are treated as the terminal
symbols of |CFG| productions.
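The division of labour described above can be sketched in a few lines of plain Python. This is a minimal illustration, not NLTK's grammar API: the names ``RULES``, ``LEXICON`` and ``productions()`` and the toy data are hypothetical. Phrasal rules mention only categories, while the lexicon is a bare list of words paired with the categories they belong to. Each word then surfaces as the right-hand side of a terminal production.

```python
# A toy grammar in plain Python (hypothetical data, not NLTK's API):
# phrasal rules over categories, plus a lexicon that simply lists words.
RULES = {
    'S': [['NP', 'VP']],
    'NP': [['Det', 'N']],
    'VP': [['V', 'NP']],
}

# Each word is listed with its categories. Nothing here relates, say,
# "chased" to a stem "chase": every entry is stored independently.
LEXICON = {
    'the': {'Det'},
    'dog': {'N'},
    'cat': {'N'},
    'chased': {'V'},
    'saw': {'V'},
}


def productions(rules, lexicon):
    """Return phrasal productions, then one terminal production per entry."""
    result = []
    for lhs, expansions in rules.items():
        for rhs in expansions:
            result.append(f"{lhs} -> {' '.join(rhs)}")
    for word in sorted(lexicon):
        for category in sorted(lexicon[word]):
            result.append(f"{category} -> '{word}'")
    return result


for production in productions(RULES, LEXICON):
    print(production)
```

The three phrasal productions are printed first, followed by ``N -> 'cat'``, ``V -> 'chased'``, ``N -> 'dog'``, ``V -> 'saw'`` and ``Det -> 'the'``. Every lexical fact has to be stated separately, which is exactly the "list of idiosyncrasies" view of the lexicon.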

Nevertheless, it is clear that there are regularities `within`:em: the
lexicon, even though these regularities are typically open to
exceptions.  In this chapter, we will look in more detail at what
lexical patterns exist and how they can be captured.
Before exploring lexical patterning, however, we need to consider what
kind of information should be expressed in a lexical entry |mdash| what
are the properties of words that we need to represent?

--------------------------------------
Lexical Information and Representation
--------------------------------------

Linguistic Approaches
---------------------

* representing lexical information, redundancy
* lexical rules, hierarchical lexicon
* lexical semantics
* morphology/lexicon interaction
* grammar/lexicon interaction (Levin classes)

Concepts and Ontologies
-----------------------

When we translate a word from one language into another, we often
assume that the two words 'have the same meaning'; for example, `(the)
weather`:lx: translates as `(het) weer`:lx: in Dutch. Finding exact
translation equivalents is sometimes problematic, yet nevertheless it
is commonplace to find 'good enough' translations. So let's assume
that `weather`:lx: and `weer`:lx: have the same meaning. Whatever this
meaning is, it seems to be something that is not itself just a word,
since it is not exclusively in one language or the other. This
motivates us to say that there is a non-linguistic entity, namely a
`concept`:dt:, which stands as the shared meaning for the two words
`weather`:lx: and `weer`:lx:. If we wish to build some kind of formal
representation of conceptual meaning, we need to provide labels for
them. If we are speakers of English, it would be tempting to use
something like `Weather`:lex: as the concept label, whereas if we are
speakers of Dutch, the label `Weer`:lex: would be the obvious
choice. But this invites two pitfalls. First, a naive bystander
might after all think that the concept was no different from the
word, contrary to what we have just argued. Second, if concepts are
non-linguistic, who's to say that one language should have privilege
in choosing the labels for these abstract entities? Despite these
misgivings, it is common to see labels such as `Weather`:lex:, since
an alternative like `C_2455`:math: is not exactly
memorable. Nevertheless, the more opaque identifier has one important
advantage, namely to emphasise that in and of itself, `there is no
inherent meaning to a concept such as`:em: `Weather`:lex: /
`C_2455`:math:. That's to say, merely positing some abstract entity
which acts as a semantic go-between for `weather`:lx: and
`weer`:lx: does not explain the meaning of either word.
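To make the point concrete, here is a minimal sketch of concepts as opaque identifiers. The identifier ``C_2455``, the ISO 639-3 language codes and the names ``LEXICALIZATIONS``, ``CONCEPT_LABELS`` and ``translations()`` are all hypothetical. Words from different languages are linked only by sharing an identifier, and the human-readable label plays no role in the computation.

```python
# Hypothetical concept inventory: an opaque identifier such as "C_2455"
# stands for a language-neutral concept shared by words in two languages.
LEXICALIZATIONS = {
    ('eng', 'weather'): 'C_2455',
    ('nld', 'weer'): 'C_2455',
}

# A mnemonic label is a convenience for humans; the code below never uses it.
CONCEPT_LABELS = {'C_2455': 'Weather'}


def translations(lang, word, target_lang):
    """Return the words in target_lang that share a concept with word."""
    concept = LEXICALIZATIONS.get((lang, word))
    if concept is None:
        return []
    return sorted(
        other_word
        for (other_lang, other_word), other_concept in LEXICALIZATIONS.items()
        if other_lang == target_lang and other_concept == concept
    )


print(translations('eng', 'weather', 'nld'))  # ['weer']
print(translations('nld', 'weer', 'eng'))     # ['weather']
```

Note that the mapping records only *that* `weather`:lx: and `weer`:lx: share a concept; nothing in it says what that concept is.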

So we have postulated an abstract set of entities, i.e., concepts,
which could act as meaning representations for words, but so far, they
themselves are just meaningless symbols. Can we do any better than
this? One approach, which we shall not try to explore here, is to say
that concepts are, or correspond to, psychological entities, and can
be given a more robust characterization within a model of cognitive
information processing. Another approach is to look for the
`connections`:em: between concepts. We already pointed out in Chapter
chap-words_ that WordNet establishes such connections between concepts
(represented as synsets). In particular, concepts are related in terms
of subsumption. For example, the concept `Bird`:lex: subsumes, or is
more general than, the concept `Robin`:lex:. There is a close relation
between concepts and sets. For every concept *C* we can identify its
extension, that is, the set of individuals that fit the concept. These
individuals are also called `instances`:dt: of the
concept. Subsumption then corresponds to the superset-subset relation:
if *C*\ :sub:`1` is subsumed by *C*\ :sub:`2`, then the extension of
*C*\ :sub:`1` is a subset of the extension of *C*\ :sub:`2`. Phrased
differently, every instance *x* of concept *C*\ :sub:`1` is also an instance of concept *C*\
:sub:`2`. Moreover, all instances of *C*\ :sub:`1` inherit attributes
of *C*\ :sub:`2`. Concepts arranged in this way are said to form an
`inheritance hierarchy`:dt:. 
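The extensional view of subsumption translates directly into set operations. In the following minimal sketch, the individuals and the names ``EXTENSION``, ``subsumes()`` and ``is_instance()`` are hypothetical. A concept is represented only by its extension, and subsumption is tested with Python's subset operator ``<=``.

```python
# Hypothetical extensions: the set of individuals that fit each concept.
EXTENSION = {
    'Bird': {'rob', 'penny', 'bill'},
    'Robin': {'rob'},
    'Penguin': {'penny'},
}


def subsumes(general, specific):
    """True if the extension of specific is a subset of that of general."""
    return EXTENSION[specific] <= EXTENSION[general]


def is_instance(individual, concept):
    """True if the individual belongs to the concept's extension."""
    return individual in EXTENSION[concept]


print(subsumes('Bird', 'Robin'))   # True
print(subsumes('Robin', 'Bird'))   # False
print(is_instance('rob', 'Bird'))  # True: every Robin is a Bird
```

One limitation of a purely extensional test like this is that it cannot tell apart two distinct concepts that happen to have exactly the same instances.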

Some aspects of inheritance hierarchies can be straightforwardly
modeled just using Python's class mechanism, as shown in
class-inheritance_.

.. pylisting:: class-inheritance
   :caption: Modelling Concepts with Python Classes

    class Bird(object):
        def __init__(self):
            self.flies = True     # [_default-flies]
            self.laysEggs = True
            self.hasWings = 2

    class Robin(Bird):
        def __init__(self, name=None):
            Bird.__init__(self)   # inherit Bird's default attributes
            self.colourOfBreast = 'red'

    class Penguin(Bird):
        def __init__(self):
            Bird.__init__(self)   # inherit Bird's default attributes
            self.colourOfWings = 'black'
            self.flies = False     # [_defeating-flies]

    >>> rob = Robin()
    >>> rob.colourOfBreast
    'red'
    >>> rob.hasWings
    2
    >>> rob.flies
    True
    >>> penny = Penguin()
    >>> penny.hasWings
    2
    >>> penny.flies
    False

As you can see, Python classes implement `default inheritance`:dt:
|mdash| whereas instances of the ``Robin`` class straightforwardly
inherit the attribute ``flies`` from ``Bird`` (line default-flies_)
with the `default value`:dt: ``True``, the class ``Penguin`` overrides
this default value in line defeating-flies_, and assigns the value
``False`` instead.
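Python's built-in ``issubclass()`` and ``isinstance()`` correspond fairly directly to the subsumption and instance-of relations introduced earlier. The sketch below restates a cut-down version of class-inheritance_ so that it runs on its own. It is an illustration under that simplification, not a replacement for the listing.

```python
class Bird(object):
    def __init__(self):
        self.flies = True       # default value


class Robin(Bird):
    pass                        # inherits Bird.__init__ unchanged


class Penguin(Bird):
    def __init__(self):
        Bird.__init__(self)
        self.flies = False      # overrides the inherited default


rob, penny = Robin(), Penguin()
print(issubclass(Robin, Bird))   # True: Bird subsumes Robin
print(issubclass(Bird, Robin))   # False
print(isinstance(penny, Bird))   # True: every Penguin is a Bird
print(rob.flies, penny.flies)    # True False
```

Here the class hierarchy does double duty. It records which concepts subsume which, and it supplies default attribute values that subclasses may override.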

Python classes also support `multiple inheritance`:dt:, as shown in
listing multiple-inheritance_, which extends class-inheritance_.

.. pylisting:: multiple-inheritance
   :caption: Multiple Inheritance with Python Classes

    class Pet(object):
        def __init__(self):
            self.funToCareFor = True

    class Budgie(Bird, Pet):
        def __init__(self, name=None):
            Bird.__init__(self)
            Pet.__init__(self)
            self.colourOfPlumage = 'yellow'

    >>> bill = Budgie()
    >>> bill.funToCareFor
    True
    >>> bill.laysEggs
    True

Thus, ``bill`` inherits attributes from both ``Bird`` and ``Pet``.
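Multiple inheritance raises the question of what happens when two parents supply conflicting defaults. For class attributes, Python resolves this with the class's method resolution order (MRO), exposed as ``__mro__``. In the following sketch, the attribute ``habitat`` and the class ``HouseBird`` are hypothetical, added purely to create a conflict.

```python
class Bird(object):
    laysEggs = True
    habitat = 'wild'          # hypothetical attribute, for illustration only


class Pet(object):
    funToCareFor = True
    habitat = 'home'          # conflicts with Bird.habitat


class Budgie(Bird, Pet):
    pass


class HouseBird(Pet, Bird):   # same parents, opposite order
    pass


print([cls.__name__ for cls in Budgie.__mro__])
# ['Budgie', 'Bird', 'Pet', 'object']
print(Budgie.habitat)         # wild: Bird precedes Pet in the MRO
print(HouseBird.habitat)      # home: Pet precedes Bird in the MRO
```

In multiple-inheritance_, by contrast, attributes are set on the instance inside ``__init__``. A conflict there would be settled by whichever parent's ``__init__`` runs last (``Pet.__init__`` in ``Budgie``), not by the MRO.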

Computational Approaches
------------------------

* MDRs
* extraction of lexical entries from corpora
* datr?
* inheritance
* kimmo?


Lexical Resources
-----------------

* comlex
* wordnet
* framenet
* celex
* verbnet


Multiword Expressions
---------------------

multiword expressions, collocations, idioms

----------
Conclusion
----------

-------
Summary
-------

---------------
Further Reading
---------------



.. include:: footer.txt
