.. -*- mode: rst -*-
.. include:: ../definitions.txt

.. _chap-featgram:

========================
9. Feature Based Grammar
========================

--------------
 Introduction
--------------

The framework of context-free grammars that we presented in Chapter
chap-parse_ describes syntactic constituents with the help of a
limited set of category labels. These atomic labels are adequate for
talking about the gross structure of sentences. But when we start to
make finer grammatical distinctions it becomes awkward to enrich the
set of categories in a systematic manner. In this chapter we will
address this challenge by decomposing categories using features
(somewhat similar to the key-value pairs of Python dictionaries). 

We will start off by looking at the phenomenon of syntactic agreement;
we will show how agreement constraints can be expressed elegantly using
features, and illustrate how their use in a simple grammar. Feature
structures are a general data structure for representing information
of any kind; we will briefly look at them from a more formal point of
view, and explain how they are made available in |NLTK|. In the
final part of the chapter, we demonstrate that the additional
expressiveness of features opens out a wide spectrum of possibilities
for describing sophisticated aspects of linguistic structure.

--------------------------------
Decomposing Lingustic Categories
--------------------------------

Syntactic Agreement
-------------------


Consider the following contrasts:

.. _thisdog:
.. ex::
   .. ex::
      this dog
   .. ex::
      \*these dog

.. _thesedogs:
.. ex::
   .. ex::
      these dogs
   .. ex::
      \*this dog

|nopar| In English, nouns are usually morphologically marked as being singular
or plural. The form of the demonstrative also varies in a
similar way; there is a singular form `this`:lx: and a plural form `these`:lx:.
Examples thisdog_ and thesedogs_ show that there are constraints on
the realization of demonstratives and nouns within a noun phrase:
either both are singular or both are plural. A similar kind
of constraint is observed with subjects and predicates:

.. _subjpredsg:
.. ex::
   .. ex::
      the dog runs
   .. ex::
      \*the dog run

.. _subjpredpl:
.. ex::
   .. ex::
      the dogs run
   .. ex::
      \*the dogs runs


|nopar| Here again, we can see that morphological properties of the verb co-vary
with morphological properties of the subject noun phrase; this co-variance is
usually termed `agreement`:dt:.  The element which determines the
agreement, here the subject noun phrase, is called the agreement
`controller`:dt:, while the element whose form is determined by
agreement, here the verb, is called the `target`:dt:.
If we look further at verb agreement in English, we will see that
present tense verbs typically have two inflected forms: one for third person
singular, and another for every other combination of person and
number:

.. table:: agreement-paradigm

    +------------+-------------+----------+
    |            |**singular** |**plural**|
    +------------+-------------+----------+
    |**1st per** |*I run*      |*we run*  |
    |            |             |          |
    +------------+-------------+----------+
    |**2nd per** |*you run*    |*you run* |
    |            |             |          |
    +------------+-------------+----------+
    |**3rd per** |*he/she/it   |*they run*|
    |            |runs*        |          |
    +------------+-------------+----------+

    Agreement Paradigm for English Regular Verbs

We can make the role of morphological properties a bit more explicit
as illustrated in runs_ and run_. These representations indicate that
the verb agrees with its subject in person and number. (We use "3" as
an abbreviation for 3rd person, "SG" for singular and "PL" for plural.)

.. _runs:
.. ex::
   .. gloss::
      the | dog       |run-s  
          | dog.3.SG  |run-3.SG  

.. _run:
.. ex::
   .. gloss::
      the | dog-s     |run
          | dog.3.PL  |run-3.PL 

Despite the undoubted interest of agreement as a topic in its own
right, we have introduced it here for another reason: we want to look
at what happens when we try encode agreement constraints in a
context-free grammar.  Suppose we take as our starting point the very
simple CFG in agcfg0_.

.. _agcfg0:
.. ex::
   .. parsed-literal::

     `s`:gc: |rarr| `np vp`:gc:
     `np`:gc: |rarr| `Det n`:gc: 
     `vp`:gc: |rarr| `v`:gc: 

     `Det`:gc: |rarr| 'this'
     `n`:gc: |rarr| 'dog'
     `v`:gc: |rarr| 'runs'

|nopar| agcfg0_ allows us to generate the sentence `this dog runs`:lx:;
however, what we really want to do is also generate `these dogs
run`:lx: while blocking unwanted strings such as `*this dogs run`:lx:
and `*these dog runs`:lx:. The most straightforward approach is to
add new non-terminals and productions to the grammar which reflect our
number distinctions and agreement constraints (we ignore person for the time being):

.. _agcfg1:
.. ex::
   .. parsed-literal::

     `S_SG`:gc: |rarr| `NP_SG VP_SG`:gc:
     `S_PL`:gc: |rarr| `NP_PL VP_PL`:gc:
     `NP_SG`:gc: |rarr| `Det_SG N_SG`:gc: 
     `NP_PL`:gc: |rarr| `Det_PL N_PL`:gc: 
     `VP_SG`:gc: |rarr| `V_SG`:gc: 
     `VP_PL`:gc: |rarr| `V_PL`:gc: 

     `Det_SG`:gc: |rarr| 'this'
     `Det_PL`:gc: |rarr| 'these'
     `N_SG`:gc: |rarr| 'dog'
     `N_PL`:gc: |rarr| 'dogs'
     `V_SG`:gc: |rarr| 'runs'
     `V_PL`:gc: |rarr| 'run'

|nopar| It should be clear that this grammar will do the required
task, but only at the cost of duplicating our previous set of
rules. Rule multiplication is of course more severe if we add in
person agreement constraints.


Using Attributes and Constraints
--------------------------------

We spoke informally of linguistic categories having *properties*; for
example, that a verb has the property of being plural. Let's try to
make this more explicit:

.. _num0:
.. ex::
   .. parsed-literal::

     `N`:gc:\ [`num`:feat: `pl`:fval:\ ]

|nopar| In num0_, we have introduced some new notation which says that the
category `N`:gc: has a `feature`:dt: called `num`:feat: (short for
'number') and that the value of this feature is `pl`:fval: (short for
'plural'). We can add similar annotations to other categories, and use
them in lexical entries:

.. _agcfg2:
.. ex::
   .. parsed-literal::

     `Det`:gc:\ [`num`:feat: `sg`:fval:\ ] |rarr| 'this'
     `Det`:gc:\ [`num`:feat: `pl`:fval:\ ]  |rarr| 'these'
     `N`:gc:\ [`num`:feat: `sg`:fval:\ ] |rarr| 'dog'
     `N`:gc:\ [`num`:feat: `pl`:fval:\ ] |rarr| 'dogs'
     `V`:gc:\ [`num`:feat: `sg`:fval:\ ] |rarr| 'runs'
     `V`:gc:\ [`num`:feat: `pl`:fval:\ ] |rarr| 'run'

|nopar| Does this help at all? So far, it looks just like a slightly more
verbose alternative to what was specified in agcfg1_. Things become
more interesting when we allow *variables* over feature values, and use
these to state constraints. This is illustrated in agcfg3_.

.. _agcfg3:
.. ex::
   .. _srule:
   .. ex::
      .. parsed-literal::

        `S`:gc: |rarr| `NP`:gc:\ [`num`:feat: `?n`:math:\ ] `VP`:gc:\ [`num`:feat: `?n`:math:\ ]

   .. _nprule:
   .. ex::
      .. parsed-literal::

       `NP`:gc:\ [`num`:feat: `?n`:math:\ ] |rarr| `Det`:gc:\ [`num`:feat: `?n`:math:\ ] `N`:gc:\ [`num`:feat: `?n`:math:\ ]

   .. _vprule:
   .. ex::
      .. parsed-literal::

       `VP`:gc:\ [`num`:feat: `?n`:math:\ ] |rarr| `V`:gc:\ [`num`:feat: `?n`:math:\ ]

|nopar| We are using '`?n`:math:' as a variable over values of `num`:feat:; it can
be instantiated either to `sg`:fval: or `pl`:fval:. Its scope is
limited to individual rules. That is, within srule_, for example,
`?n`:math: must be instantiated to the same constant value; we can
read the rule as saying that whatever value `NP`:gc: takes for the feature
`num`:feat:, `VP`:gc: must take the same value. 

In order to understand how these feature constraints work, it's
helpful to think about how one would go about building a tree. Lexical
rules will admit the following local trees (trees of
depth one):

.. ex::
   .. _this:
   .. ex:: 
      .. tree:: (Det[NUM\ sg] this) 
   .. _these:
   .. ex:: 
      .. tree:: (Det[NUM\ pl] these) 

.. ex::
   .. _dog:
   .. ex:: 
      .. tree:: (N[NUM\ sg] dog) 
   .. _dogs:
   .. ex:: 
      .. tree:: (N[NUM\ pl] dogs) 

|nopar| Now nprule_ says that whatever the `num`:feat: values of `N`:gc: and
`Det`:gc: are, they have to be the same. Consequently,  nprule_ will
permit this_ and dog_ to be combined into an `NP`:gc: as shown in
good1_ and it will also allow these_ and dogs_ to be combined, as in
good2_. By contrast,  bad1_ and bad2_ are prohibited because the roots
of their
constituent local trees differ in their values for the `num`:feat: feature.

.. ex::
   .. _good1:
   .. ex::
      .. tree:: (NP[NUM\ pl] (Det[NUM\ sg] this)(N[NUM\ sg] dog))

   .. _good2:
   .. ex::
      .. tree:: (NP[NUM\ pl] (Det[NUM\ pl] these)(N[NUM\ pl] dogs))

.. ex::
   .. _bad1:
   .. ex::
      .. tree:: (NP[NUM\ ...] (Det[NUM\ sg] this)(N[NUM\ pl] dogs))

   .. _bad2:
   .. ex::
      .. tree:: (NP[NUM\ ...] (Det[NUM\ PL] these)(N[NUM\ SG] dog))

Rule vprule_ can be thought of as saying that the `num`:feat: value of the
head verb has to be the same as the `num`:feat: value of the `VP`:gc:
mother. Combined with srule_, we derive the consequence that if the
`num`:feat: value of the subject head noun is `pl`:fval:, then so is
the `num`:feat: value of the `VP`:gc:\ 's head verb.

.. ex::
   .. tree:: (S (NP[NUM\ pl] (Det[NUM\ pl] these)(N[NUM\ pl] dogs))(VP[NUM\ pl] (V[NUM\ pl] run)))

Grammar feat0cfg_ illustrates most of the ideas we have introduced so
far in this chapter, plus a couple of new ones. As you will see, the
format of feature specifications in these productions inserts a ``'='``
between the feature and its value. In our exposition, we will stick to our earlier
convention of just leaving space between the feature and value, except
when we are directly referring to the |NLTK| grammar formalism.

.. _feat0cfg:
.. ex::
.. include:: ../../examples/parse/feat0.cfg
   :literal:

|nopar| You will notice that a feature annotation on a syntactic
category can contain more than one specification; for example,
`v`:gc:\ [`tense`:feat: `pres`:fval:, `num`:feat: `pl`:fval:\
]. In general, there is no upper bound on the number of features we
specify as part of our syntactic categories.

A second point is that we have used feature variables in lexical entries as well
as grammatical rules. For example, `the`:lx: has been assigned the
category `Det`:gc:\ [`num`:feat: `?n`:math:]. Why is this?  Well,
you know that the definite article `the`:lx: can combine with both
singular and plural nouns. One way of describing this would be to add
two lexical entries to the grammar, one each for the singular and
plural versions of `the`:lx:. However, a more elegant solution is to
leave the `num`:feat: value `underspecified`:dt: and letting it agree
in number with whatever noun it combines with.

A final detail about feat0cfg_ is the statement ``%start
S``. This a "directive" which tells the parser to take `s`:gc: as the
start symbol for the grammar.

In general, when we are trying to develop even a very small grammar,
it is convenient to put the rules in a file where they can be edited,
tested and revised. Assuming we have saved feat0cfg_ as a file named
``'feat0.cfg'``, the function ``GrammarFile.read_file()`` allows us to
read the grammar into NLTK, ready for use in parsing.

.. doctest-ignore::
    >>> from nltk_lite.parse import GrammarFile
    >>> from pprint import pprint
    >>> g = GrammarFile.read_file('feat0.cfg')

|nopar| We can inspect the rules and the lexicon using the commands ``print
g.earley_grammar()`` and  ``pprint(g.earley_lexicon())``.

Next, we can tokenize a sentence and use the ``get_parse_list()`` function to
invoke the Earley chart parser.

.. pylisting:: featurecharttrace
   :caption: Trace of Feature-Based Chart Parser

    >>> from nltk_lite import tokenize
    >>> sent = 'Kim likes children'
    >>> tokens = list(tokenize.whitespace(sent))
    >>> tokens 
    ['Kim', 'likes', 'children']
    >>> cp = g.earley_parser(trace=10)
    >>> trees = cp.get_parse_list(tokens)
	      |.K.l.c.|

    Predictor |> . . .| S -> * NP[NUM=?n] VP[NUM=?n] 
    Predictor |> . . .| NP[NUM=?n] -> * N[NUM=?n] 
    Predictor |> . . .| NP[NUM=?n] -> * PropN[NUM=?n] 
    Predictor |> . . .| NP[NUM=?n] -> * Det[NUM=?n] N[NUM=?n] 
    Predictor |> . . .| NP[NUM=pl] -> * N[NUM=pl] 
    Scanner   |[-] . .| [0:1] 'Kim' 
    Completer |[-] . .| NP[NUM=sg] -> PropN[NUM=sg] * 
    Completer |[-> . .| S -> NP[NUM=sg] * VP[NUM=sg] 
    Predictor |. > . .| VP[NUM=?n, TENSE=?t] -> * IV[NUM=?n, TENSE=?t] 
    Predictor |. > . .| VP[NUM=?n, TENSE=?t] -> * TV[NUM=?n, TENSE=?t] NP 
    Scanner   |. [-] .| [1:2] 'likes' 
    Completer |. [-> .| VP[NUM=sg, TENSE=pres] -> TV[NUM=sg, TENSE=pres] * NP 
    Predictor |. . > .| NP[NUM=?n] -> * N[NUM=?n] 
    Predictor |. . > .| NP[NUM=?n] -> * PropN[NUM=?n] 
    Predictor |. . > .| NP[NUM=?n] -> * Det[NUM=?n] N[NUM=?n] 
    Predictor |. . > .| NP[NUM=pl] -> * N[NUM=pl] 
    Scanner   |. . [-]| [2:3] 'children' 
    Completer |. . [-]| NP[NUM=pl] -> N[NUM=pl] * 
    Completer |. [---]| VP[NUM=sg, TENSE=pres] -> TV[NUM=sg, TENSE=pres] NP * 
    Completer |[=====]| S -> NP[NUM=sg] VP[NUM=sg] * 
    Completer |[=====]| [INIT] -> S * 


|nopar| It is important to observe that the parser works directly with
the underspecified productions given by the grammar. That is, the
Predictor rule does not attempt to compile out all admissible feature
combinations before trying to expand the non-terminals on the lefthand
side of a production. However, when the Scanner matches an input word
against a lexical rule that has been predicted, the new edge will
typically contain fully specified features; e.g., the edge
[PropN[`num`:feat: = `sg`:fval:] |rarr| 'Kim', (0, 1)]. Recall from
Chapter chap-parse_ that the Fundamental (or Completer) Rule in
standard CFGs is used to combine an incomplete edge that's expecting a
nonterminal *B* with a following, complete edge whose left hand side
matches *B*. In our current setting, rather than checking for a
complete match, we test whether the expected category *B* will
`unify`:dt: with the lefthand side *B'* of a following complete
edge. We will explain in more detail in Section sec-feat-comp_ how
unification works; for the moment, it is enough to know that as a
result of unification, any variable values of features in *B* will be
instantiated by constant values in the corresponding feature structure
in *B'*, and these instantiated values will be used in the new edge
added by the Completer. This instantiation can be seen, for example,
in the edge 
[`np`:gc:\ [`num`:feat: `sg`:fval:] |rarr| PropN[`num`:feat: `sg`:fval:] |dot|, (0, 1)]
in featurecharttrace_,
where the feature `num`:feat: has been assigned the value `sg`:fval:.


Finally, we can inspect the resulting parse trees (in this case, a
single one).

.. doctest-ignore::
     >>> for tree in trees: print tree
     ... 
     ([INIT]:
       (Start:
	 (S:
	   (NP[NUM=sg]: (PropN[NUM=sg]: 'Kim'))
	   (VP[NUM=sg, TENSE=pres]:
	     (TV[NUM=sg, TENSE=pres]: 'likes')
	     (NP[NUM=pl]: (N[NUM=pl]: 'children'))))))


Terminology
-----------

So far, we have only seen feature values like `sg`:fval: and
`pl`:fval:. These simple values are usually called `atomic`:dt:
|mdash| that is, they can't be decomposed into subparts. A special
case of atomic values are `boolean`:dt: values, that is, values which
just specify whether a property is true or false of a category. For
example, we might want to distinguish `auxiliary`:dt: verbs such as
`can`:lx:, `may`:lx:, `will`:lx: and `do`:lx: with the boolean feature
`aux`:feat:. Then our lexicon for verbs could include entries such as
the following:

.. ex::
      .. parsed-literal::

        `V`:gc:\ [`tense`:feat: `pres`:fval:, `aux`:feat: `+`:math:\ ] |rarr| 'can'
        `V`:gc:\ [`tense`:feat: `pres`:fval:, `aux`:feat: `+`:math:\ ] |rarr| 'may'

        `V`:gc:\ [`tense`:feat: `pres`:fval:, `aux`:feat: `-`:math:\ ] |rarr| 'walks'
        `V`:gc:\ [`tense`:feat: `pres`:fval:, `aux`:feat: `-`:math:\ ] |rarr| 'likes'


A frequently used abbreviation for boolean features allows the value
to be prepended to the feature:

.. ex::
      .. parsed-literal::

        `v`:gc:\ [`tense`:feat: `pres`:fval:, `+aux`:feat:\ ] |rarr| 'can'
        `v`:gc:\ [`tense`:feat: `pres`:fval:, `-aux`:feat:\ ] |rarr| 'walks'

We have spoken informally of attaching 'feature annotations' to
syntactic categories. A more general
approach is to treat the whole category |mdash| that is, the
non-terminal symbol plus the annotation |mdash| as a bundle of
features. Consider, for example, the object we have written as ncat0_.

.. _ncat0:
.. ex::
      .. parsed-literal::

        `n`:gc:\ [`num`:feat: `sg`:fval:\ ] 

|nopar| The syntactic category `n`:gc:, as we have seen before, provides part
of speech information. This information can itself be captured as a
feature value pair, using  `pos`:feat: to represent "part of speech":

.. _ncat1:
.. ex::
      .. parsed-literal::

        [`pos`:feat: `N`:fval:, `num`:feat: `sg`:fval:\ ] 

|nopar| In fact, we  regard ncat1_ as our "official" representation of a
feature-based linguistic category, and
ncat0_ as a convenient abbreviation.
A bundle of feature-value pairs is called a `feature structure`:dt:
or an `attribute value matrix`:dt: (AVM). A feature structure which
contains a specification for the feature `pos`:feat: is a `linguistic
category`:dt:. 

In addition to atomic-valued features, we allow features whose values
are themselves feature structures. For example, we might want to group
together agreement features (e.g., person, number and gender) as a
distinguished part of a category, as shown in agr0_.

.. _agr0:
.. ex::
      .. avm:: 
        [pos = N           ]
        [                  ]
        [agr = [per = 3   ]]
        [      [num = pl  ]]
        [      [gnd = fem ]]


|nopar| In this case, we say that the feature `agr`:feat: has a `complex`:dt: value.

There is no particular significance to the *order* of features in a
feature structure. So agr0_ is equivalent to agr0_.

.. _agr1:
.. ex::
      .. avm::
        [agr = [num = pl  ]]
        [      [per = 3   ]]
        [      [gnd = fem ]]
        [                  ]
        [pos = N           ]

Once we have the possibility of using features like `agr`:feat:, we
can refactor a grammar like feat0cfg_ so that agreement features are
bundled together. A tiny grammar illustrating this point is shown in agr2_.

.. _agr2:
.. ex::
    .. parsed-literal::

      `s`:gc: |rarr| `np`:gc:\ [`agr`:feat: `?n`:fval:\ ] `vp`:gc:\ [`agr`:feat: `?n`:fval:]
      `np`:gc:\ [`agr`:feat: `?n`:fval:] |rarr| `PropN`:gc:\ [`agr`:feat: `?n`:fval:] 
      `vp`:gc:\ [`tense`:feat: `?t`:fval:, `agr`:feat: `?n`:fval:] |rarr| `Cop`:gc:\ [`tense`:feat: `?t`:fval:, `agr`:feat: `?n`:fval:] Adj
      `Cop`:gc:\ [`tense`:feat: pres,  `agr`:feat: [`num`:feat: `sg`:fval:, `per`:feat: 3]] |rarr| 'is' 
      `PropN`:gc:\ [`agr`:feat: [`num`:feat: `sg`:fval:, `per`:feat: 3]] |rarr| 'Kim'
      `Adj`:gc: |rarr| 'happy'


Exercises
---------

#. |easy| What constraints are required to correctly parse strings like `I am
   happy`:lx: and `she is happy`:lx: but not `*you is happy`:lx: or
   `*they am happy`:lx:? Implement two solutions for the present tense
   paradigm of the verb `be`:lx: in English, first taking Grammar
   agcfg1_ as your starting point, and then taking Grammar agr2_
   as the starting point. 

#. |easy| Develop a variant of grammar feat0cfg_ which uses a
   `count`:feat: to make the distinctions shown below:

   .. ex:: 
       .. ex:: The boy sings.
       .. ex:: \*Boy sings.

   .. ex:: 
       .. ex:: The boys sing.
       .. ex:: Boys sing.

   .. ex:: 
       .. ex:: The boys sing.
       .. ex:: Boys sing.

   .. ex::
       .. ex:: The water is precious.
       .. ex:: Water is precious.

#. |soso| Develop a feature-based grammar that will correctly describe the following
   Spanish noun phrases:

    .. ex::
       .. gloss::
	  un                                  | cuadro  | hermos-o
	  INDEF.SG.MASC                       | picture | beautiful-SG.MASC
	  'a beautiful picture'               

       .. gloss::
	  un-os                               | cuadro-s   | hermos-os           
	  INDEF-PL.MASC                       | picture-PL | beautiful-PL.MASC   
	  'beautiful pictures'                  

       .. gloss::
	  un-a                                | cortina | hermos-a
	  INDEF.SG.FEM                        | curtain | beautiful-SG.FEM
	  'a beautiful curtain'     

       .. gloss::
	  un-as                               | cortina-s | hermos-as
	  INDEF.PL.FEM                        | curtain   | beautiful-PL.FEM
	  'beautiful curtains'     


#. |soso| Develop a wrapper for the ``earley_parser`` so that a trace
   is only printed if the input string fails to parse.

.. _sec-feat-comp:

---------------------------------
Computing with Feature Structures
---------------------------------

In this section, we will show how feature structures can be
constructed and manipulated in |NLTK|. We will also discuss the
fundamental operation of unification, which allows us to combine the
information contained in two different feature structures.

Feature Structures in NLTK
--------------------------

Feature structures in NLTK are declared with the
``FeatureStructure()`` constructor. Atomic feature values can be strings or
integers.

    >>> from nltk_lite.featurestructure import *
    >>> fs1 = FeatureStructure(TENSE='past', NUM='sg') 
    >>> print fs1
    [ NUM   = 'sg'   ]
    [ TENSE = 'past' ]

|nopar| We can think of a feature structure as being like a Python dictionary,
and access its values by indexing in the usual way.

    >>> fs1 = FeatureStructure(PER=3, NUM='pl', GND='fem')
    >>> print fs1['GND']
    fem

|nopar| However, we cannot use this syntax to *assign* values to features:

    >>> fs1['CASE'] = 'acc'
    Traceback (most recent call last):
    ...
    KeyError: 'CASE'

|nopar| We can also define feature structures which have complex values, as
discussed earlier.

    >>> fs2 = FeatureStructure(POS='N', AGR=fs1)
    >>> print fs2
    [       [ GND = 'fem' ] ]
    [ AGR = [ NUM  = 'pl'  ] ]
    [       [ PER  = 3     ] ]
    [                        ]
    [ POS = 'N'              ]
    >>> print fs2['AGR']
    [ GND = 'fem' ]
    [ NUM  = 'pl'  ]
    [ PER  = 3     ]
    >>> print fs2['AGR']['PER']
    3

An alternative method of specifying feature structures in NLTK is to
use the ``parse`` method of ``FeatureStructure``. This gives us the
facility to use square bracket notation for embedding one feature
structure within another.

    >>> FeatureStructure.parse("[POS='N', AGR=[PER=3, NUM='pl', GND='fem']]")
    [AGR=[GND='fem', NUM='pl', PER=3], POS='N']


Feature Structures as Graphs
----------------------------

Feature structures are not inherently tied to linguistic objects; they are
general purpose structures for representing knowledge. For example, we
could encode information about a person in a feature structure:

    >>> person01 = FeatureStructure(name='Lee', telno='01 27 86 42 96', age=33)

.. _person01:
.. ex::
      .. avm:: 

        [name = `Lee'            ]
        [telno = 01 27 86 42 96  ]
        [age = 33                ]


It is sometimes helpful to view feature structures as graphs; more
specifically, `directed acyclic graphs`:dt: (DAGs). dag01_ is equivalent to
the AVM person01_.

.. _dag01:
.. ex::
   .. image:: ../images/dag01.png
      :scale: 40

|nopar| The feature names appear as labels on the directed arcs, and feature
values appear as labels on the nodes which are pointed to by the arcs.

Just as before, feature values can be complex:

.. _dag02:
.. ex::
   .. image:: ../images/dag02.png
      :scale: 40

|nopar| When we look at such graphs, it is natural to think in terms of
paths through the graph. A `feature path`:dt: is a sequence of arcs
that can be followed from the root node. We will represent paths in NLTK as
tuples. Thus, ``('address', 'street')`` is a feature path whose value
in dag02_
is the string 'rue Pascal'.

Now let's consider a situation where Lee has a spouse named "Kim", and
Kim's address is the same as Lee's.
We might represent this as dag04_.

.. _dag04:
.. ex::
   .. image:: ../images/dag04.png
      :scale: 40

|nopar| However, rather than repeating the address
information in the feature structure, we can "share" the same
sub-graph between different arcs:

.. _dag03:
.. ex::
   .. image:: ../images/dag03.png
      :scale: 40


|nopar| In other words, the value of the path ``('ADDRESS')`` in dag03_ is
identical to the value of the path ``('SPOUSE', 'ADDRESS')``.  DAGs
such as dag03_ are said to involve `structure sharing`:dt: or
`reentrancy`:dt:. When two paths have the same value, they are said to
be `equivalent`:dt:.

There are a number of notations for representing reentrancy in
matrix-style representations of feature structures. In NLTK, we adopt
the following convention: the first occurrence of a shared feature structure 
is prefixed with an integer in parentheses, such as ``(1)``, and any
subsequent reference to that structure uses the notation
``->(1)``, as shown below.

    >>> fs=FeatureStructure.parse("""[NAME='Lee', ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'], 
    ...                               SPOUSE=[NAME='Kim', ADDRESS->(1)]]""")
    >>> print fs
    [ ADDRESS = (1) [ NUMBER = 74           ] ]
    [               [ STREET = 'rue Pascal' ] ]
    [                                         ]
    [ NAME    = 'Lee'                         ]
    [                                         ]
    [ SPOUSE  = [ ADDRESS -> (1)  ]           ]
    [           [ NAME    = 'Kim' ]           ]


|nopar| This is similar to more conventional displays of AVMs, as shown in
reentrant01_.

.. _reentrant01:
.. ex::
      .. avm:: 

	 [ address = (1) [ number = 74           ] ]
	 [               [ street = 'rue Pascal' ] ]
	 [                                         ]
	 [ name    = 'Lee'                         ]
	 [                                         ]
	 [ spouse  = [ address -> (1)  ]           ]
	 [           [ name    = 'Kim' ]           ]

|nopar| The bracketed integer is sometimes called a `tag`:dt: or a
`coindex`:dt:. The choice of integer is not significant.
There can be any number of tags within a single feature structure.

    >>> fs1 = FeatureStructure.parse("[A='a', B=(1)[C='c'], D->(1), E->(1)]")

.. _reentrant02:
.. ex::
      .. avm::
 
	 [ A = 'a'             ]
	 [                     ]
	 [ B = (1) [ C = 'c' ] ]
	 [                     ]
	 [ D -> (1)            ]
	 [ E -> (1)            ]


.. TODO following AVM doesn't currently parse
..
    |nopar| We can also share empty structures:

	>>> fs2 = FeatureStructure.parse("[A=(1)[], B=(2)[], C->(1), D->(2)]")

    .. _reentrant03:
    .. ex::
	  .. avm:: 

	     [ A = (1) [ ] ]
	     [ B = (2) [ ] ]
	     [ C -> (1)    ]
	     [ D -> (2)    ]



Subsumption and Unification
---------------------------

It is standard to think of feature structures as providing `partial
information`:dt: about some object, in the sense that we can order
feature structures according to how general they are. For example,
fs01_ is more general (less specific) than fs02_, which in turn is more general than fs03_.

.. ex::
   .. _fs01:
   .. ex::
      .. avm::

         [number = 74]

   .. _fs02:
   .. ex::
      .. avm::

         [number = 74          ]
         [street = 'rue Pascal']

   .. _fs03:
   .. ex::
      .. avm::

         [number = 74          ]
         [street = 'rue Pascal']
         [city = 'Paris'       ]

|nopar| This ordering is called `subsumption`:dt:; a more general feature
structure `subsumes`:dt: a less general one. If `FS`:math:\
:subscript:`0` subsumes `FS`:math:\ :subscript:`1` (formally, we write
`FS`:math:\ :subscript:`0` |SquareSubsetEqual| `FS`:math:\
:subscript:`1`), then `FS`:math:\ :subscript:`1` must have all the
paths and path equivalences of `FS`:math:\ :subscript:`0`, and may
have additional paths and equivalences as well. Thus, dag04_ subsumes
dag03_, since the latter has additional path equivalences.. It should
be obvious that subsumption only provides a partial ordering on
feature structures, since some feature structures are
incommensurable. For example, fs04_ neither subsumes nor is subsumed
by fs01_.


.. _fs04:
.. ex::
   .. avm::

         [telno = 01 27 86 42 96]

So we have seen that some feature structures are more specific than
others. How do we go about specialising a given feature structure?
For example, we might decide that addresses should
consist of not just a street number and a street name, but also a
city. That is, we might want to *merge*  graph dag042_ with dag041_ to
yield dag043_.

.. ex::
     .. _dag041:
     .. ex::
	.. image:: ../images/dag04-1.png
	   :scale: 40

     .. _dag042:
     .. ex::
	.. image:: ../images/dag04-2.png
	   :scale: 40

     .. _dag043:
     .. ex::
	.. image:: ../images/dag04-3.png
	   :scale: 40

|nopar| Merging information from two feature structures is called
`unification`:dt: and in NLTK is supported by the ``unify()`` method
defined in the ``FeatureStructure`` class.

    >>> fs1 = FeatureStructure(NUMBER=74, STREET='rue Pascal')
    >>> fs2 = FeatureStructure(CITY='Paris')
    >>> print fs1.unify(fs2)
    [ CITY   = 'Paris'      ]
    [ NUMBER = 74           ]
    [ STREET = 'rue Pascal' ]

Unification is formally defined as a binary operation: `FS`:math:\
:subscript:`0` |SquareIntersection| `FS`:math:\
:subscript:`1`. Unification is symmetric, so 

.. ex::
    `FS`:math:\ :subscript:`0` |SquareIntersection| `FS`:math:\
    :subscript:`1` = `FS`:math:\ :subscript:`1` |SquareIntersection|
    `FS`:math:\ :subscript:`0`.

|nopar| The same is true in NLTK:

    >>> print fs2.unify(fs1)
    [ CITY   = 'Paris'      ]
    [ NUMBER = 74           ]
    [ STREET = 'rue Pascal' ]

.. TODO: also mention commutativity

.. but >>> fs1.unify(fs2) is fs2.unify(fs1)
       False
   only works with repr()

If we unify two feature structures which stand in the subsumption
relationship, then the result of unification is the most specific of
the two:

.. ex::
    If `FS`:math:\ :subscript:`0` |SquareSubsetEqual| `FS`:math:\
    :subscript:`1`,  then `FS`:math:\ :subscript:`0`
    |SquareIntersection| `FS`:math:\ :subscript:`1` = `FS`:math:\
    :subscript:`1` 

|nopar| For example, the result of unifying fs02_ with fs03_ is fs03_.

Unification between `FS`:math:\ :subscript:`0` and `FS`:math:\
:subscript:`1` will fail if the two feature structures share a path |pi|,
but the value of |pi| in `FS`:math:\ :subscript:`0` is a distinct
atom from the value of |pi| in `FS`:math:\ :subscript:`1`. In NLTK,
this is implemented by setting the result of unification to be
``None``.

    >>> fs0 = FeatureStructure(A='a')
    >>> fs1 = FeatureStructure(A='b')
    >>> fs2 = fs0.unify(fs1)
    >>> print fs2
    None

Now, if we look at how unification interacts with structure-sharing,
things become really interesting. First, let's define the |NLTK| version
of dag04_.

    >>> fs0=FeatureStructure.parse("""[NAME=Lee, 
    ...                                ADDRESS=[NUMBER=74, 
    ...                                         STREET='rue Pascal'], 
    ...                                SPOUSE= [NAME=Kim,
    ...                                         ADDRESS=[NUMBER=74, 
    ...                                                  STREET='rue Pascal']]]""")

.. _unification01:
.. ex::
      .. avm:: 

	 [ address = [ number = 74           ]               ]
	 [           [ street = `rue Pascal' ]               ]
	 [                                                   ]
	 [ name    = `Lee'                                   ]
	 [                                                   ]
	 [           [ address = [ number = 74           ] ] ]
	 [ spouse  = [           [ street = `rue Pascal' ] ] ]
	 [           [                                     ] ]
	 [           [ name    = `Kim'                     ] ]

|nopar| What happens when we augment Kim's address with a specification
for `city`:feat:? (Notice that ``fs1`` includes the whole path from the root of
the feature structure down to `city`:feat:.)

    >>> fs1=FeatureStructure.parse("[SPOUSE = [ADDRESS = [CITY = Paris]]]")

unification02_ shows the result of unifying ``fs0`` with ``fs1``:

.. _unification02:
.. ex::
      .. avm:: 

	 [ address = [ number = 74           ]               ]
	 [           [ street = `rue Pascal' ]               ]
	 [                                                   ]
	 [ name    = `Lee'                                   ]
	 [                                                   ]
	 [           [           [ city   = `Paris'      ] ] ]
	 [           [ address = [ number = 74           ] ] ]
	 [ spouse  = [           [ street = `rue Pascal' ] ] ]
	 [           [                                     ] ]
	 [           [ name    = `Kim'                     ] ]

|nopar| By contrast, the result is very different if ``fs1`` is unified with
the structure-sharing version ``fs2`` (also shown as dag03_):

    >>> fs2=FeatureStructure.parse("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
    ...                                SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")

.. _unification03:
.. ex::
      .. avm:: 

	 [               [ city   = `Paris'      ] ]
	 [ address = (1) [ number = 74           ] ]
	 [               [ street = `rue Pascal' ] ]
	 [                                         ]
	 [ name    = `Lee'                         ]
	 [                                         ]
	 [ spouse  = [ address -> (1)  ]           ]
	 [           [ name    = `Kim' ]           ]

|nopar| Rather than just updating what was in effect Kim's "copy" of Lee's address,
we have now updated `both`:em: their addresses at the same time. More
generally, if a unification involves specialising the value of some
path |pi|, then that unification simultaneously specialises the value
of `any path that is equivalent to`:em: |pi|.

As we have already seen, structure sharing can also be stated in |NLTK|
using variables such as ``?x``. 

    >>> fs1=FeatureStructure.parse("[ADDRESS1=[NUMBER=74, STREET='rue Pascal']]")
    >>> fs2=FeatureStructure.parse("[ADDRESS1=?x, ADDRESS2=?x]")
    >>> print fs2
    [ ADDRESS1 = ?x ]
    [ ADDRESS2 = ?x ]
    >>> print fs2.unify(fs1)
    [ ADDRESS1 = (1) [ NUMBER = 74           ] ]
    [                [ STREET = 'rue Pascal' ] ]
    [                                          ]
    [ ADDRESS2 -> (1)                          ]



Exercises
---------

#. |easy| Write a function `subsumes()` which holds of two feature
   structures ``fs1`` and ``fs2`` just in case ``fs1`` subsumes ``fs2``.

#. |soso| Consider the feature structures shown in Listing featstructures_.
   
   .. pylisting:: featstructures

     fs1 = FeatureStructure.parse("[A = (1)b, B= [C ->(1)]]")
     fs2 = FeatureStructure.parse("[B = [D = d]]")
     fs3 = FeatureStructure.parse("[B = [C = d]]")
     fs4 = FeatureStructure.parse("[A = (1)[B = b], C->(1)]")
     fs5 = FeatureStructure.parse("[A = [D = (1)e], C = [E -> (1)] ]")
     fs6 = FeatureStructure.parse("[A = [D = (1)e], C = [B -> (1)] ]")
     fs7 = FeatureStructure.parse("[A = [D = (1)e, F = (2)[]], C = [B -> (1), E -> (2)] ]")
     fs8 = FeatureStructure.parse("[A = [B = b], C = [E = [G = e]]]")
     fs9 = FeatureStructure.parse("[A = (1)[B = b], C -> (1)]")

   Work out on paper what the result is of the following
   unifications. (Hint: you might find it useful to draw the graph structures.)

   * ``fs1`` and ``fs2``
   * ``fs1`` and ``fs3``
   * ``fs4`` and ``fs5``
   * ``fs5`` and ``fs6``
   * ``fs7`` and ``fs8``
   * ``fs7`` and ``fs9``

   Check your answers on the computer.


#. |soso| List two feature structures which subsume [A=?x, B=?x].

#. |soso| Ignoring structure sharing, give an informal algorithm for unifying
   two feature structures. 



---------------------------------
Extending a Feature-Based Grammar
---------------------------------

.. _sec-subcat:

Subcategorization
-----------------

In Chapter chap-parse_, we proposed to augment our
category labels in order to represent different subcategories of
verb. More specifically,
we introduced labels such as `iv`:gc: and `tv`:gc: for intransitive
and transitive verbs respectively.  This allowed us to write rules
like the following:

.. _subcatcfg0:
.. ex::
   .. parsed-literal::

      `vp`:gc: |rarr| `iv`:gc: 
      `vp`:gc: |rarr| `tv np`:gc: 

|nopar| Although it is tempting to think of `iv`:gc: and `tv`:gc: as two
kinds of `v`:gc:, this is unjustified: from a formal point of view,
`iv`:gc: has no closer relationship with `tv`:gc: than it does,
say, with `np`:gc:. As it stands, `iv`:gc: and `tv`:gc: are
unanalyzable nonterminal symbols from a CFG. One unwelcome consequence
is that we do not seem able to say anything about the class of verbs
in general. For example, we cannot say something like "All lexical
items of category `v`:gc: can be marked for tense", since `bark`:lx:,
say, is an item of category `iv`:gc:, not `v`:gc:.

Using features gives us some useful room for manoeuvre but there is no
obvious consensus on how to model subcategorization information. One
approach which has the merit of simplicity is due to Generalized
Phrase Structure Grammar (GPSG). GPSG stipulates that lexical
categories may bear a `subcat`:feat: whose values are integers. This
is illustrated in a modified portion of feat0cfg_, shown in subcatgpsg_.

.. _subcatgpsg:
.. ex::
   ::

     VP[TENSE=?t, NUM=?n] -> V[SUBCAT=0, TENSE=?t, NUM=?n]
     VP[TENSE=?t, NUM=?n] -> V[SUBCAT=1, TENSE=?t, NUM=?n] NP
     VP[TENSE=?t, NUM=?n] -> V[SUBCAT=2, TENSE=?t, NUM=?n] Sbar

     V[SUBCAT=0, TENSE=pres, NUM=sg] -> 'disappears' | 'walks'
     V[SUBCAT=1, TENSE=pres, NUM=sg] -> 'sees' | 'likes'
     V[SUBCAT=2, TENSE=pres, NUM=sg] -> 'says' | 'claims'

     V[SUBCAT=0, TENSE=pres, NUM=pl] -> 'disappear' | 'walk'
     V[SUBCAT=1, TENSE=pres, NUM=pl] -> 'see' | 'like'
     V[SUBCAT=2, TENSE=pres, NUM=pl] -> 'say' | 'claim'

     V[SUBCAT=0, TENSE=past, NUM=?n] -> 'disappeared' | 'walked'
     V[SUBCAT=1, TENSE=past, NUM=?n] -> 'saw' | 'liked'
     V[SUBCAT=2, TENSE=past, NUM=?n] -> 'said' | 'claimed'

|nopar| When we see a lexical category like `v`:gc:\ [`subcat`:feat: 
`1`:fval:\ ], we can interpret the `subcat`:feat: specification as a
pointer to the rule in which `v`:gc:\ [`subcat`:feat: `1`:fval:\ ]
is introduced as the head daughter in a `vp`:gc: expansion rule. By
convention, there is a one-to-one correspondence between
`subcat`:feat: values and rules which introduce lexical heads. It's
worth noting that the choice of integer which acts as a value for
`subcat`:feat: is completely arbitrary |mdash| we could equally well
have chosen 3999, 113 and 57 as our two values in subcatgpsg_.  On this
approach, `subcat`:feat: can *only* appear on lexical categories; it
makes no sense, for example, to specify a `subcat`:feat: value on
`vp`:gc:.

In our third class of verbs above, we have specified a category
`s-bar`:gc:. This is a label for subordinate clauses such as the
complement of `claim`:lx: in the example `You claim that you like
children`:lx:. We require two further rules to analyse such sentences:

.. _sbar:
.. ex::
   ::

     S-BAR -> Comp S
     Comp -> 'that'

|nopar| The resulting structure is the following.

.. _sbartree:
.. ex::
      .. tree::  (S (NP you)(VP (V[-AUX,\ SUBCAT\ 2] claim)(S-BAR (Comp that) (S (NP you)(VP (V[-AUX,\ SUBCAT\ 1] like)(NP children))))))

An alternative treatment of subcategorization, due originally to a framework
known as categorial grammar, is represented in feature-based frameworks such as PATR
and Head-driven Phrase Structure Grammar. Rather than using
`subcat`:feat: values as a way of indexing rules, the `subcat`:feat:
value directly encodes the valency of a head (the list of
arguments that it can combine with). For example, a verb like
`put`:lx: which takes  `np`:gc: and `pp`:gc: complements (`put the
book on the table`:lx:) 
might be represented as subcathpsg0_:

.. _subcathpsg0:
.. ex::  `v`:gc:\ [`subcat`:feat: |langle|\ `np`:gc:, `np`:gc:, `pp`:gc:\ |rangle| ] 

|nopar| This says that the verb can combine with three  arguments. The
leftmost element in the list is the subject `np`:gc:, while everything
else |mdash| an `np`:gc: followed by a `pp`:gc: in this case |mdash| comprises the
subcategorized-for complements. When a verb like `put`:lx: is combined
with appropriate complements, the requirements which are specified in
the  `subcat`:feat: are discharged, and only a subject `np`:gc: is
needed. This category, which corresponds to what is traditionally
thought of as `vp`:gc:, might be represented as follows.

.. _subcathpsg1:
.. ex::  `v`:gc:\ [`subcat`:feat: |langle|\ `np`:gc:\ |rangle| ] 

Finally, a sentence is a kind of verbal category which has *no*
requirements for further arguments, and hence has a `subcat`:feat:
whose value is the empty list. The tree subcathpsg2_ shows how these
category assigments combine in a parse of `Kim put the book on the table`:lx:.

.. _subcathpsg2:
.. ex::
      .. tree:: (V[SUBCAT\ \<\>] (NP Kim)(V[SUBCAT\ \<NP\>](V[SUBCAT\ \<NP,\ NP,\ VP\>] put)<NP the\ book><PP on\ the\ table>))

Heads Revisited
---------------

We noted in the previous section that by factoring subcategorization
information out of the main category label, we could express more
generalizations about properties of verbs. Another property of this
kind is the following: expressions of category `v`:gc: are heads of
phrases of category `vp`:gc:. Similarly (and more informally) `n`:gc:\
s are heads of `np`:gc:\ s,  `a`:gc:\
s (i.e., adjectives) are heads of `ap`:gc:\ s,  and `p`:gc:\
s (i.e., adjectives) are heads of `pp`:gc:\ s. Not all phrases have
heads |mdash| for example, it is standard to say that coordinate
phrases (e.g., `the book and the bell`:lx:) lack heads |mdash|
nevertheless, we would like our grammar formalism to express the
mother / head-daughter
relation where it holds. Now, although it looks as though there is
something in common  between, say, `v`:gc: and `vp`:gc:, this is more
of a handy convention than a real claim, since  `v`:gc: and `vp`:gc:
formally have no more in common than `v`:gc: and `Det`:gc:. 

X-bar syntax (cf. [Chomsky1970RN]_, [Jackendoff1977XS]_) addresses
this issue by abstracting out the notion of `phrasal level`:dt:. It is
usual to recognise three such levels. If `n`:gc: represents the
lexical level, then `n`:gc:\ ' represents the next level up,
corresponding to the more traditional category `Nom`:gc:, while
`n`:gc:\ '' represents the phrasal level, corresponding to the
category `np`:gc:. (The primes here replace the typographically more
demanding horizontal bars of [Chomsky1970RN]_). xbar0_ illustrates a
representative structure.

.. _xbar0:
.. ex::
   .. tree:: (N''(Det a)(N'(N student)(P'' of\ French)))

|nopar| The head of the structure xbar0_ is `n`:gc: while `n`:gc:\ '
and `n`:gc:\ '' are called `(phrasal) projections`:dt: of `n`:gc:. `n`:gc:\ ''
is the `maximal projection`:dt:, and `n`:gc: is sometimes called the
`zero projection`:dt:. One of the central claims of X-bar syntax is
that all constituents share a structural similarity. Using `x`:gc: as
a variable over `n`:gc:, `v`:gc:, `a`:gc: and `p`:gc:, we say that
directly subcategorized `complements`:em: of the head are always
placed as sisters of the lexical head, whereas `adjuncts`:em: are
placed as sisters of the intermediate category, `x`:gc:\ '. Thus, the
configuration of the `p`:gc:\ '' adjunct in xbar1_ contrasts with that
of the complement `p`:gc:\ '' in xbar0_.

.. _xbar1:
.. ex::
   .. tree:: (N''(Det a)(N'(N'(N student))(P'' from\ france)))

The productions in xbar2_ illustrate how bar levels can be encoded
using feature structures.

.. _xbar2:
.. ex::
   .. parsed-literal::

     `s`:gc: |rarr| `n`:gc:\ [`bar`:feat: 2] `v`:gc:\ [`bar`:feat: 2]
     `n`:gc:\ [`bar`:feat: 2] |rarr| `Det n`:gc:\ [`bar`:feat: 1]
     `n`:gc:\ [`bar`:feat: 1] |rarr| `n`:gc:\ [`bar`:feat: 1] `p`:gc:\ [`bar`:feat: 2] 
     `n`:gc:\ [`bar`:feat: 1] |rarr| `n`:gc:\ [`bar`:feat: 0] `p`:gc:\ [`bar`:feat: 2] 


Auxiliary verbs and Inversion
-----------------------------

Inverted clauses |mdash| where the order of subject and verb is
switched |mdash| occur in English interrogatives and also after
'negative' adverbs:

.. _inv1:
.. ex::
   .. _inv1a:
   .. ex::

      Do you like children?

   .. _inv1b:
   .. ex::

      Can Jody walk?

.. _inv2:
.. ex::
   .. _inv2a:
   .. ex::

      Rarely do you see Kim.

   .. _inv2b:
   .. ex::

      Never have I seen this dog.

|nopar| However, we cannot place just any verb in pre-subject position:

.. _inv3:
.. ex::
   .. _inv3a:
   .. ex::

      \*Like you children?

   .. _inv3b:
   .. ex::

      \*Walks Jody?

.. _inv4:
.. ex::
   .. _inv4a:
   .. ex::

      \*Rarely see you Kim.

   .. _inv4b:
   .. ex::

      \*Never saw I this dog.

Verbs which can be positioned inititally in inverted clauses belong to
the class known as `auxiliaries`:dt:, and as well as  `do`:lx:,
`can`:lx: and `have`:lx:  include `be`:lx:, `will`:lx:  and
`shall`:lx:. One way of capturing such structures is with the
following rule:

.. _sinv:
.. ex::
   ::

     S[+inv] -> V[+AUX] NP VP

|nopar| That is, a clause marked as [`+inv`:feat:] consists of an auxiliary
verb followed by a `vp`:gc:. (In a more detailed grammar, we would
need to place some constraints on the form of the `vp`:gc:, depending
on the choice of auxiliary.) invtree_ illustrates the structure of an
inverted clause.

.. _invtree:
.. ex::
      .. tree:: (S[+INV](V[+AUX,\ SUBCAT\ 3] do)(NP you)(VP(V[-AUX,\ SUBCAT\ 1] like)(NP children)))



Unbounded Dependency Constructions
----------------------------------

Consider the following contrasts: 

.. _gap1:
.. ex::
   .. _gap1a:
   .. ex::

      You like Jody.

   .. _gap1b:
   .. ex::

      \*You like.

.. _gap2:
.. ex::
   .. _gap2a:
   .. ex::

      You put the card into the slot.

   .. _gap2b:
   .. ex::

      \*You put into the slot.

   .. _gap2c:
   .. ex::

      \*You put the card.

   .. _gap2d:
   .. ex::

      \*You put.

The verb `like`:lx: requires an `np`:gc: complement, while
`put`:lx: requires both a following `np`:gc: and `pp`:gc:. Examples
gap1_ and gap2_ show that these complements are *obligatory*:
omitting them leads to ungrammaticality. Yet there are contexts in
which obligatory complements can be omitted, as gap3_ and gap4_
illustrate.

.. _gap3:
.. ex::
   .. _gap3a:
   .. ex::

      Kim knows who you like.

   .. _gap3b:
   .. ex::

      This music, you really like.

.. _gap4:
.. ex::
   .. _gap4a:
   .. ex::

      Which card do you put into the slot?

   .. _gap4b:
   .. ex::

      Which slot do you put the card into?

|nopar| That is, an obligatory complement can be omitted if there is an
appropriate `filler`:dt: in the sentence, such as the question word
`who`:lx: in gap3a_, the preposed topic `this music`:lx: in gap3b_, or
the `wh`:lx: phrases `which card/slot`:lx: in gap4_. It is common to
say that sentences like gap3_ |ndash| gap4_ contain `gaps`:dt: where
the obligatory complements have been omitted, and these gaps are
sometimes made explicit using an underscore:

.. _gap5:
.. ex::
   .. _gap5a:
   .. ex::

      Which card do you put __ into the slot?

   .. _gap5b:
   .. ex::

      Which slot do you put the card into __?

|nopar| So, a gap can occur if it is `licensed`:dt: by a filler. Conversely,
fillers can only occur if there is an appropriate gap elsewhere  in
the sentence, as shown by the following examples.

.. _gap6:
.. ex::
   .. _gap6a:
   .. ex::

      \*Kim knows who you like Jody.

   .. _gap6b:
   .. ex::

      \*This music, you really like hip-hop.

.. _gap7:
.. ex::
   .. _gap7a:
   .. ex::

      \*Which card do you put this into the slot?

   .. _gap7b:
   .. ex::

      \*Which slot do you put the card into this one?

The mutual co-occurence between filler and gap leads to gap3_ |ndash|
gap4_ is sometimes termed a "dependency". One issue of considerable
importance in theoretical linguistics has been the nature of the
material that can intervene between a filler and the gap that it
licenses; in particular, can we simply list a finite set of strings
that separate the two? The answer is No: there is no upper bound on
the distance between filler and gap. This fact can be easily
illustrated with constructions involving sentential complements, as
shown in gap8_. 

.. _gap8:
.. ex::
   .. _gap8a:
   .. ex::

      Who do you like __?

   .. _gap8b:
   .. ex::

      Who do you claim that you like __?

   .. _gap8c:
   .. ex::

      Who do you claim that Jody says that you like __?

|nopar| Since we can have indefinitely deep recursion of sentential
complements, the gap can be embedded indefinitely far inside the whole
sentence. This constellation of properties leads to the notion of an
`unbounded dependency construction`:dt:; that is, a filler-gap
dependency where there is no upper bound on the distance between
filler and gap.

A variety of mechanisms have been suggested for handling unbounded
dependencies in formal grammars; we shall adopt an approach due to
Generalized Phrase Structure Grammar that involves something called
`slash categories`:dt:. A slash category is something of the form
`y/xp`:gc:; we interpret this as a phrase of category `y`:gc: which
is somewhere missing a sub-constituent of category `xp`:gc:. For example,
`s/np`:gc: is an `s`:gc: which is missing an `np`:gc:. The use of
slash categories is illustrated in gaptree1_. 
 
.. _gaptree1:
.. ex::
      .. tree:: (S(NP[+WH] who)(S[+INV]\/NP (V[+AUX,\ SUBCAT\ 3] do)(NP[-WH] you)(VP/NP(V[-AUX,\ SUBCAT\ 1] like)(NP/NP e))))

|nopar| The top part of the tree introduces the filler `who`:lx: (treated as
an expression of category `np`:gc:\ [`+wh`:feat:]) together with a
corresponding gap-containing constituent `s/np`:gc:. The gap information is
then "percolated" down the tree via the `vp/np`:gc: category, until it
reaches the category `np/np`:gc:. At this point, the dependency 
is discharged by realizing the gap information as the empty string `e`
immediately dominated by `np/np`:gc:.

Do we need to think of slash categories as a completely new kind of
object in our grammars?  Fortunately, no, we don't |mdash| in fact, we
can accommodate them within our existing feature-based framework. We
do this by treating slash as a feature, and the category to its right
as a value. In other words, our "official" notation for `s/np`:gc:
will be `s`:gc:\ [`slash`:feat: = `NP`:fval:\ ]. Once we have taken this
step, it is straightforward to write a small grammar in NLTK for
analyzing unbounded dependency constructions. feat1cfg_ illustrates
the main principles of slash categories, and also includes rules for
inverted clauses. To simplify presentation, we have omitted any
specification of tense on the verbs.

.. _feat1cfg:
.. ex::
.. include:: ../../examples/parse/feat1.cfg
   :literal:

feat1cfg_ contains one gap-introduction rule, namely

.. ex::
   .. parsed-literal::

      `s[-inv]`:gc: |rarr| `np`:gc: `s/np`:gc: 

In order to percolate the slash feature correctly, we need to add
slashes with variable values to both sides of the arrow in rules which
expand `s`:gc:, `vp`:gc: and `np`:gc:. For example,

.. ex::
   .. parsed-literal::

      `vp/?x`:gc: |rarr| `v`:gc: `s-bar/?x`:gc: 

|nopar| says that a slash value can be specified on the `vp`:gc: mother of a
constituent if the same value is also specified on the `s-bar`:gc:
daughter. Finally, empty_ allows the slash information on `np`:gc: to
be discharged as the empty string.

..  _empty:
.. ex::
   .. parsed-literal::

      `np/np`:gc: |rarr|

Using feat1cfg_, we can parse the string `who do you claim that you
like`:lx:  into the tree shown in gapparse_.

.. _gapparse:
.. ex::
    .. tree:: (S[-INV](NP[+WH] who)(S[SLASH\ NP,+INV](V[+AUX,SUBCAT\ 3] do)(NP[-WH] you)(VP[SLASH\ NP](V[-AUX,SUBCAT\ 2] claim)(S-BAR[SLASH\ NP](Comp that)(S[SLASH\ NP,-INV](NP[-WH] you)(VP[SLASH\ NP](V[-AUX,SUBCAT\ 1] like)(NP[SLASH\ NP] )))))))


Case and Gender in German
-------------------------

Compared with English, German has a relatively rich morphology for
agreement. For example, the definite article in German varies with
case, gender and number, as shown in Table german-def-art_.

.. table:: german-def-art

    +-----------+-----------+-----------+-----------+------------+
    | **Case**  | **Masc**  | **Fem**   |  **Neut** | **Plural** |
    +-----------+-----------+-----------+-----------+------------+
    |  *Nom*    |  der      |  die      |   das     |   die      |
    +-----------+-----------+-----------+-----------+------------+
    |  *Gen*    |  des      |  der      |   des     |   der      |
    +-----------+-----------+-----------+-----------+------------+
    |  *Dat*    |  dem      |  der      |   dem     |   den      |
    +-----------+-----------+-----------+-----------+------------+
    |  *Acc*    |  den      |  die      |   das     |   die      |
    +-----------+-----------+-----------+-----------+------------+

    Morphological Paradigm for the German definite Article

|nopar| Subjects in German take the nominative case, and most verbs
govern their objects in the accusative case. However, there are
exceptions like `helfen`:lx: which govern the dative case.

.. ex::
      .. gloss::
         Die                          | katze                    | sieht                | dem                          | hund            
         the.NOM.FEM.SG               | cat.3.FEM.SG             | see.3.SG             | the.ACC.MASC.SG              | dog.3.MASC.SG   
         'the cat sees the dog'

      .. gloss::
         \*Die                        | katze                    | sieht                |  den                         | hund
         the.NOM.FEM.SG               | cat.3.FEM.SG             | see.3.SG             |  the.DAT.MASC.SG             | dog.3.MASC.SG

.. ex:: 
     .. gloss::
         Die                         | katze                    | hilft                 | den                          | hund            
         the.NOM.FEM.SG              | cat.3.FEM.SG             | help.3.SG             | the.DAT.MASC.SG              | dog.3.MASC.SG   
         'the cat helps the dog'

     .. gloss::
         \*Die                       | katze                    | hilft                 | dem                          | hund
         the.NOM.FEM.SG              | cat.3.FEM.SG             | help.3.SG             | the.ACC.MASC.SG              | dog.3.MASC.SG


The grammar german1cfg_ illustrates the interaction of agreement
(comprising person, number and gender) with case. 

.. _german1cfg:
.. ex::
.. include:: ../../examples/parse/german1.cfg
   :literal:

|nopar| As you will see, the feature `objcase`:feat: is used to
 specify the case which the verb governs on its object.

Exercises
---------

#. |easy| Modify the grammar illustrated in  subcatgpsg_ to
   incorporate a `bar`:feat: feature for dealing with phrasal projections.

#. |easy| Modify the German grammar in german1cfg_ to incorporate the
   treatment of subcategorization presented in sec-subcat_. 

#. |soso| Extend the German grammar in german1cfg_ so that it can
   handle so-called verb-second structures like the following:

   .. ex:: Heute sieht der hund die katze.

#. |hard| Morphological paradigms are rarely completely regular, in
   the sense of every cell in the matrix having a different
   realisation. For example, the present tense conjugation of the
   lexeme `walk`:lex: only has two distinct forms: `walks`:lx: for the
   3rd person singular, and `walk`:lx: for all other combinations of
   person and number. A successful analysis should not require
   redundantly specifying that 5 out of the 6 possible morphological
   combinations have the same realization.  Propose and implement a
   method for dealing with this.
 
#. |hard| So-called `head features`:dt: are shared between the mother
   and head daughter. For example, `tense`:feat: is a head feature
   that is shared between a `vp`:gc: and its head `v`:gc:
   daughter. See [Gazdar1985GPS]_ for more details. Most of the
   features we have looked at are head features |mdash| exceptions are
   `subcat`:feat: and `slash`:feat:. Since the sharing of head
   features is predictable, it should not need to be stated explicitly
   in the grammar rules. Develop an approach which automatically
   accounts for this regular behaviour of head features.  


-------
Summary
-------

* The traditional categories of context-free grammar are atomic
  symbols. An important motivation feature structures is to capture
  fine-grained distinctions which would otherwise require a masssive
  multiplication of atomic categories.

* By using variables over feature values, we can express constraints
  in grammar rules which allow the realization of different feature
  specifications to be inter-dependent.

* Typically we specify fixed values of features at the lexical level
  and constrain the values of features in phrases to unify with the
  corresponding values in their daughters. 

* Feature values are either atomic or complex. A particular subcase of
  atomic value is the Boolean value, represented by convention as [+/-
  `f`:feat:]. 

* Two features can share a value (either atomic or
  complex). Structures with shared values are said to be
  re-entrant. Shared values are represented by numerical indices (or
  tags) in AVMs.

* A path in a feature structure is a tuple of features
  corresponding to the labels on  a sequence of arcs from the root of the graph
  representation.

* Two paths are equivalent if they share a value.

* Feature structures are partially ordered by subsumption.
  `FS`:math:\ :subscript:`0` subsumes `FS`:math:\ :subscript:`1` when
  `FS`:math:\ :subscript:`0` is more general (less informative) than
  `FS`:math:\ :subscript:`1`. 

* The unification of two structures `FS`:math:\ :subscript:`0` and
  `FS`:math:\ :subscript:`1`, if successful, is the feature
  structure `FS`:math:\ :subscript:`2` which contains the combined
  information of both `FS`:math:\ :subscript:`0` and `FS`:math:\
  :subscript:`1`.

* If unification specialises a path |pi| in `FS`:math:, then it also
  specialises every path |pi|\ ' equivalent to |pi|.

* We can use feature structures to build succinct analyses of a wide
  variety of linguistic phenomena, including verb subcategorization,
  inversion constructions, unbounded dependency constructions and case government.

-----------------
 Further Reading
-----------------

The earliest use of features in theoretical linguistics was designed
to capture phonological properties of phonemes. For example, a sound
like /**b**/ might be decomposed into the structure [`+labial`:feat:,
`+voice`:feat:]. An important motivation was to capture
generalizations across classes of segments; for example, that /**n**/ gets
realized as /**m**/ preceding any `+labial`:feat: consonant.
Within Chomskyan grammar, it was standard to use atomic features for
phenomena like agreement, and also to capture generalizations across
syntactic categories, by analogy with phonology.
A radical expansion of the use of features in theortical syntax was
advocated by Generalized Phrase Structure Grammar (GPSG;
[Gazdar1985GPS]_), particularly in the use of features with complex values.

Coming more from the perspective of computational linguistics,
[Kay1984UG]_ proposed that functional aspects of language could be
captured by unification of attribute-value structures, and a similar
approach was elaborated by [Shieber1983FIP]_ within the PATR-II
formalism. Early work in Lexical-Functional grammar (LFG;
[Kaplan1982LFG]_) introduced the notion of an `f-structure`:dt: which
was primarily intended to represent the grammatical relations and
predicate-argument structure associated with a constituent structure
parse.  [Shieber1986IUB]_ provides an excellent introduction to this
phase of research into feature-based grammars.

One conceptual difficulty with algebraic approaches to feature
structures arose when researchers attempted to model negation. An
alternative perspective, pioneered by [Kasper1986LSF]_ and
[Johnson1988AVL]_, argues that grammars involve `descriptions`:em: of
feature structures rather than the structures themselves. These
descriptions are combined using logical operations such as
conjunction, and negation is just the usual logical operation over
feature descriptions. This description-oriented perspective was
integral to LFG from the outset (cf. [Kaplan1989FAL]_, and was also adopted by later
versions of Head-Driven Phrase Structure Grammar (HPSG; [Sag1999ST]_).

Feature structures, as presented in this chapter, are unable to
capture important constraints on linguistic information. For example,
there is no way of saying that the only permissible values for
`num`:feat: are `sg`:fval: and `pl`:fval:, while a specifcation such
as [`num`:feat: `masc`:fval:] is anomalous. Similarly, we cannot say
that the complex value of `agr`:feat: `must`:em: contain
specifications for the features `per`:feat:, `num`:feat: and
`gnd`:feat:, but `cannot`:em: contain a specification such as
[`subcat`:feat: 3].  `Typed feature structures`:dt: were developed to
remedy this deficiency. To begin with, we stipulate that feature
values are always typed. In the case of atomic values, the values just
are types. For example, we would say that the value of `num`:feat: is
the type *num*. Moreover, *num* is the most general type of value for
`num`:feat:. Since types are organized hierarchically, we can be more
informative by specifying the value of `num`:feat: is a `subtype`:dt:
of *num*, namely either *sg* or *pl*.

In the case of complex values, we say that feature structures are
themselves typed. So for example the value of `agr`:feat: will be a
feature structure of type *agr*. We also stipulate that all and only
`per`:feat:, `num`:feat: and `gnd`:feat: are `appropriate`:dt: features for
a structure of type *agr*.  A good early review of work on typed
feature structures is [Emele1990TUG]_. A more comprehensive examination of
the formal foundations can be found in [Carpenter1992LTF]_, while
[Copestake2002ITF]_ focusses on implementing an HPSG-oriented approach
to typed feature structures.

There is a copious literature on the analysis of German within
feature-based grammar frameworks. [Nerbonne1994GHD]_ is a good
starting point for the HPSG literature on this topic, while
[Mueller1999DSK]_ gives a very extensive and detailed analysis of
German syntax in HPSG.

.. include:: footer.txt
