| BostonHousing {mlbench} | R Documentation |
Boston Housing Data
Description
Housing data for 506 census tracts of Boston from the 1970
census. The data frame BostonHousing contains the original
data by Harrison and Rubinfeld (1978); the data frame
BostonHousing2 contains the corrected version with additional spatial
information (see references below).
Usage
data("BostonHousing", package = "mlbench")
data("BostonHousing2", package = "mlbench")
Format
The original data are 506 observations on 14 variables,
medv being the target variable:
| crim | per capita crime rate by town |
| zn | proportion of residential land zoned for lots over 25,000 sq.ft |
| indus | proportion of non-retail business acres per town |
| chas | Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) |
| nox | nitric oxides concentration (parts per 10 million) |
| rm | average number of rooms per dwelling |
| age | proportion of owner-occupied units built prior to 1940 |
| dis | weighted distances to five Boston employment centres |
| rad | index of accessibility to radial highways |
| tax | full-value property-tax rate per USD 10,000 |
| ptratio | pupil-teacher ratio by town |
| b | 1000(B - 0.63)^2 where B is the proportion of blacks by town |
| lstat | percentage of lower status of the population |
| medv | median value of owner-occupied homes in USD 1000's |
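The b column stores a transformation of the underlying proportion B rather than B itself. The following is a minimal sketch of the formula as stated in the table; the helper name b_transform is hypothetical and not part of mlbench.

```r
# Hypothetical helper (not part of mlbench): applies the transformation
# stated for the `b` column, 1000 * (B - 0.63)^2, to a proportion B in [0, 1].
b_transform <- function(B) {
  stopifnot(is.numeric(B), all(B >= 0 & B <= 1))
  1000 * (B - 0.63)^2
}

# Evaluate at the two ends of the range and at the minimum (B = 0.63).
b_transform(c(0, 0.63, 1))
```

Because the transformation is quadratic, two different proportions on either side of 0.63 can yield the same value of b, so B cannot always be recovered from b alone.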
The corrected data set has the following additional columns:
| cmedv | corrected median value of owner-occupied homes in USD 1000's |
| town | name of town |
| tract | census tract |
| lon | longitude of census tract |
| lat | latitude of census tract |
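One way to see what the corrections changed is to list the tracts where cmedv differs from medv. This is only a sketch: the helper name changed_medv is hypothetical, and it assumes the data frame carries the town, tract, medv and cmedv columns described above.

```r
# Hypothetical helper: return the rows where the corrected median value
# differs from the original one. Assumes `df` has the columns town, tract,
# medv and cmedv, as described for BostonHousing2.
changed_medv <- function(df) {
  differs <- df$medv != df$cmedv
  df[differs, c("town", "tract", "medv", "cmedv")]
}

# Only run against the real data when mlbench is installed.
if (requireNamespace("mlbench", quietly = TRUE)) {
  # Load into the current environment explicitly.
  utils::data("BostonHousing2", package = "mlbench", envir = environment())
  print(changed_medv(BostonHousing2))
}
```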
Source
The original data were taken from the UCI Repository Of Machine Learning Databases (Blake and Merz 1998) and no longer seem to be available from the UC Irvine Machine Learning Repository (now at https://archive.ics.uci.edu/). The corrected data were taken from Statlib at https://lib.stat.cmu.edu/datasets/. See Statlib and references there for details on the corrections. Both were converted to R format by Friedrich Leisch.
References
Blake CL, Merz CJ (1998). “UCI Repository of Machine Learning Databases.” University of California, Irvine, Department of Information and Computer Science. Formerly available from ‘http://www.ics.uci.edu/~mlearn/MLRepository.html’.
Gilley OW, Pace RK (1996). “On the Harrison and Rubinfeld Data.” Journal of Environmental Economics and Management, 31(3), 403–405. ISSN 0095-0696. doi:10.1006/jeem.1996.0052. [Provided corrections and examined censoring.]
Harrison D, Rubinfeld DL (1978). “Hedonic Housing Prices and the Demand for Clean Air.” Journal of Environmental Economics and Management, 5(1), 81–102. ISSN 0095-0696. doi:10.1016/0095-0696(78)90006-2.
Pace RK, Gilley OW (1997). “Using the Spatial Configuration of the Data to Improve Estimation.” The Journal of Real Estate Finance and Economics, 14, 333–340. doi:10.1023/A:1007762613901. [Added georeferencing and spatial estimation.]
Examples
data("BostonHousing", package = "mlbench")
summary(BostonHousing)
data("BostonHousing2", package = "mlbench")
summary(BostonHousing2)
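The dimensions given under Format can also be checked after loading. The sketch below assumes the corrected data set has 19 columns (the 14 original variables plus the 5 additional ones listed above); the helper name has_shape is hypothetical.

```r
# Hypothetical helper: TRUE if `df` has the expected number of
# observations (rows) and variables (columns).
has_shape <- function(df, n_obs, n_vars) {
  nrow(df) == n_obs && ncol(df) == n_vars
}

# Only run against the real data when mlbench is installed.
if (requireNamespace("mlbench", quietly = TRUE)) {
  utils::data("BostonHousing", package = "mlbench", envir = environment())
  utils::data("BostonHousing2", package = "mlbench", envir = environment())
  # 506 observations on 14 variables, as stated under Format.
  print(has_shape(BostonHousing, 506, 14))
  # Assumption: 14 original columns plus the 5 additional ones.
  print(has_shape(BostonHousing2, 506, 14 + 5))
}
```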