.. -*- mode: rst -*-
.. include:: ../definitions.txt

=====================================
Appendix: Python and NLTK Cheat Sheet
=====================================

------
Python
------

Strings
-------

    >>> x = 'Python'; y = 'NLTK'; z = 'Natural Language Processing'
    >>> x + '/' + y
    'Python/NLTK'
    >>> 'LT' in y
    True
    >>> x[2:]
    'thon'
    >>> x[::-1]
    'nohtyP'
    >>> len(x)
    6
    >>> z.count('a')
    4
    >>> z.endswith('ing')
    True
    >>> z.index('Language')
    8
    >>> '; '.join([x,y,z])
    'Python; NLTK; Natural Language Processing'
    >>> y.lower()
    'nltk'
    >>> z.replace(' ', '\n')
    'Natural\nLanguage\nProcessing'
    >>> print z.replace(' ', '\n')
    Natural
    Language
    Processing
    >>> z.split()
    ['Natural', 'Language', 'Processing']

For more information, type ``help(str)`` at the Python prompt.
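
A few further string methods are often useful when cleaning up text before
processing it. The following is a minimal sketch, not part of the original
session; the input string ``raw`` is made up for illustration, and the code is
written so that it should also run unchanged under Python 3.

```python
# Hypothetical input string, chosen only for illustration.
raw = '  The Natural Language Toolkit (NLTK)  '

cleaned = raw.strip()                   # remove leading/trailing whitespace
words = cleaned.lower().split()         # lowercase, then split on whitespace
has_prefix = cleaned.startswith('The')  # True if the string begins with 'The'
position = cleaned.find('Toolkit')      # index of the substring
missing = cleaned.find('Python')        # -1; index() would raise ValueError here
summary = '%d words, %d characters' % (len(words), len(cleaned))

print(words)
print(summary)
```

Unlike ``index()``, ``find()`` signals a missing substring with ``-1`` instead
of raising an exception, which is convenient inside loops.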

Lists
-----

    >>> x = ['Natural', 'Language']; y = ['Processing']
    >>> x[0]
    'Natural'
    >>> list(x[0])
    ['N', 'a', 't', 'u', 'r', 'a', 'l']
    >>> x + y
    ['Natural', 'Language', 'Processing']
    >>> 'Language' in x
    True
    >>> len(x)
    2
    >>> x.index('Language')
    1

The following methods modify the list in place:

    >>> x.append('Toolkit')
    >>> x
    ['Natural', 'Language', 'Toolkit']
    >>> x.insert(0, 'Python')
    >>> x
    ['Python', 'Natural', 'Language', 'Toolkit']
    >>> x.reverse()
    >>> x
    ['Toolkit', 'Language', 'Natural', 'Python']
    >>> x.sort()
    >>> x
    ['Language', 'Natural', 'Python', 'Toolkit']
    
For more information, type ``help(list)`` at the Python prompt.
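
It can help to contrast operations that change a list in place with those that
return a new list and leave the original untouched. The following is a minimal
sketch under that framing; the list contents are reused from the session above,
and the code should also run unchanged under Python 3.

```python
x = ['Toolkit', 'Language', 'Natural', 'Python']

# These return new lists; x itself is not modified.
ordered = sorted(x)
backwards = x[::-1]
unchanged = list(x)          # a copy taken before the in-place changes below

# These modify x in place.
last = x.pop()               # removes and returns the last item
x.extend(['NLTK', 'Lite'])   # appends every item of another list
x.remove('Language')         # deletes the first matching item

print(ordered)
print(x)
```

A common slip is to write ``x = x.sort()``: the in-place methods ``sort()`` and
``reverse()`` return ``None``, so this discards the list.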

Dictionaries
------------

    >>> d = {'natural': 'adj', 'language': 'noun'}
    >>> d['natural']
    'adj'
    >>> d['toolkit'] = 'noun'
    >>> d
    {'natural': 'adj', 'toolkit': 'noun', 'language': 'noun'}
    >>> 'language' in d
    True
    >>> d.items()
    [('natural', 'adj'), ('toolkit', 'noun'), ('language', 'noun')]
    >>> d.keys()
    ['natural', 'toolkit', 'language']
    >>> d.values()
    ['adj', 'noun', 'noun']

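In the Python version used here, dictionaries are unordered, so the entries
may be displayed in a different order on your system.

For more information, type ``help(dict)`` at the Python prompt.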
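
A typical use of a dictionary in language processing is counting how often
each word occurs. The following is a minimal sketch of that idiom; the word
list is made up for illustration, and the code should also run unchanged under
Python 3.

```python
# Hypothetical word list, chosen only for illustration.
words = ['natural', 'language', 'toolkit', 'natural', 'language', 'natural']

counts = {}
for word in words:
    # get() returns 0 for a word that has not been seen yet
    counts[word] = counts.get(word, 0) + 1

# Iterate over the keys in alphabetical order for reproducible output.
for word in sorted(counts):
    print('%s %d' % (word, counts[word]))

# Sort (word, count) pairs by count, most frequent first.
top = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
```

Sorting the keys before printing sidesteps any dependence on the order in
which the dictionary happens to store its entries.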

Regular Expressions
-------------------

.. note:: to be written
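
As a starting point, the following is a minimal sketch of the most common
functions in Python's standard ``re`` module, applied to the string ``z`` from
the Strings section. The patterns are chosen only for illustration, and the
code should also run unchanged under Python 3.

```python
import re

z = 'Natural Language Processing'

capitals = re.findall(r'[A-Z]', z)       # all non-overlapping matches, as a list
ing_words = re.findall(r'\w+ing\b', z)   # words ending in 'ing'
first = re.search(r'L\w+', z)            # first match object, or None
parts = re.split(r'\s+', z)              # split on runs of whitespace
masked = re.sub(r'[aeiou]', '*', z)      # replace every lowercase vowel

print(capitals)
print(first.group(), first.start())
print(masked)
```

Writing patterns as raw strings (``r'...'``) keeps backslashes such as ``\w``
and ``\b`` from being interpreted by Python before ``re`` sees them.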

----
NLTK
----

Tokenization
------------

    >>> text = '''NLTK, the Natural Language Toolkit, is a suite of program
    ... modules, data sets and tutorials supporting research and teaching in
    ... computational linguistics and natural language processing.'''
    >>> from nltk_lite import tokenize
    >>> list(tokenize.line(text))
    ['NLTK, the Natural Language Toolkit, is a suite of program',
     'modules, data sets and tutorials supporting research and teaching in',
     'computational linguistics and natural language processing.']
    >>> list(tokenize.whitespace(text))
    ['NLTK,', 'the', 'Natural', 'Language', 'Toolkit,', 'is', 'a', 'suite',
     'of', 'program', 'modules,', 'data', 'sets', 'and', 'tutorials',
     'supporting', 'research', 'and', 'teaching', 'in', 'computational',
     'linguistics', 'and', 'natural', 'language', 'processing.']
    >>> list(tokenize.wordpunct(text))
    ['NLTK', ',', 'the', 'Natural', 'Language', 'Toolkit', ',', 'is', 'a',
     'suite', 'of', 'program', 'modules', ',', 'data', 'sets', 'and',
     'tutorials', 'supporting', 'research', 'and', 'teaching', 'in',
     'computational', 'linguistics', 'and', 'natural', 'language',
     'processing', '.']
    >>> list(tokenize.regexp(text, ', ', gaps=True))
    ['NLTK', 'the Natural Language Toolkit', 'is a suite of program\nmodules',
     'data sets and tutorials supporting research and teaching in\ncomputational
     linguistics and natural language processing.']
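
To see roughly what these tokenizers do, the following sketch approximates
three of them with plain Python and the standard ``re`` module. The helper
names ``whitespace_tokens``, ``wordpunct_tokens`` and ``gap_tokens`` are
hypothetical and are not part of NLTK; the word-and-punctuation pattern is an
assumption that reproduces the output shown above, not necessarily the
toolkit's own definition.

```python
import re

text = '''NLTK, the Natural Language Toolkit, is a suite of program
modules, data sets and tutorials supporting research and teaching in
computational linguistics and natural language processing.'''

def whitespace_tokens(s):
    # Hypothetical helper: split on any run of whitespace.
    return s.split()

def wordpunct_tokens(s):
    # Hypothetical helper: runs of word characters, or runs of punctuation.
    return re.findall(r'\w+|[^\w\s]+', s)

def gap_tokens(s, pattern):
    # Hypothetical helper: the pattern describes the gaps between tokens.
    return re.split(pattern, s)

print(whitespace_tokens(text)[:5])
print(wordpunct_tokens(text)[:5])
print(gap_tokens(text, ', ')[:2])
```

The last helper mirrors the ``gaps=True`` idea: the pattern matches the
separators and the tokens are whatever lies between them.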

Stemming
--------

    >>> tokens = list(tokenize.wordpunct(text))
    >>> from nltk_lite import stem
    >>> stemmer = stem.Regexp('ing$|s$|e$')
    >>> for token in tokens:
    ...     print stemmer.stem(token),
    NLTK , th Natural Languag Toolkit , i a suit of program module ,
    data set and tutorial support research and teach in computational
    linguistic and natural languag process .
    >>> stemmer = stem.Porter()
    >>> for token in tokens:
    ...     print stemmer.stem(token),
    NLTK , the Natur Languag Toolkit , is a suit of program modul ,
    data set and tutori support research and teach in comput linguist
    and natur languag process .
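
The behaviour of the regular-expression stemmer can be approximated by
deleting whatever the pattern matches. The following is a minimal sketch under
that assumption; ``regexp_stem`` is a hypothetical helper, not the NLTK
implementation, and the token list is a hand-picked subset of the text above
plus the extra word ``sing``.

```python
import re

def regexp_stem(token, pattern='ing$|s$|e$'):
    # Hypothetical helper: strip any suffix matched by the pattern.
    return re.sub(pattern, '', token)

tokens = ['the', 'Language', 'is', 'modules', 'supporting',
          'linguistics', 'processing', 'sing']
stems = [regexp_stem(token) for token in tokens]

print(' '.join(stems))
```

The results ``th``, ``i`` and ``s`` (from ``sing``) show why such a crude
pattern over-stems short words, and why a rule-based stemmer such as Porter's
is usually preferred.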

Tagging
-------

.. note:: to be written

.. include:: footer.txt
