Web Corpus

This is a collection of texts in diverse, contemporary genres,
gathered by scraping publicly accessible archives of web postings.
The text itself is distributed here, rather than a list of URLs for
individuals to download and clean up (the usual model for web corpora).

overheard: Overheard in New York (partly censored) http://www.overheardinnewyork.com/ (2006)
wine: Fine Wine Diary http://www.finewinediary.com/ (2005-06)
pirates: Movie script from Pirates of the Caribbean: Dead Man's Chest http://www.imsdb.com/ (2006)
singles: Singles ads http://search.classifieds.news.com.au
