Soybean {mlbench}    R Documentation

Soybean Database

Description

The data set has 19 classes, of which only the first 15 have been used in prior work. The folklore seems to be that the last four classes are unjustified by the data because they have so few examples. There are 35 categorical attributes, some nominal and some ordered. The value “dna” means “does not apply”. Attribute values are encoded numerically: the first value is encoded as 0, the second as 1, and so forth.
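The remark about sparsely populated classes can be checked by tabulating class frequencies. The following is a minimal sketch, not part of mlbench: `rare_classes` is a hypothetical helper, and the toy factor (made-up labels and counts) stands in for `Soybean$Class` so that the block runs without the package installed.

```r
# Hypothetical helper (not part of mlbench): names of the k least frequent
# levels of a factor, ordered from rarest to most common.
rare_classes <- function(y, k = 4) {
  counts <- sort(table(y))  # frequencies in ascending order
  names(counts)[seq_len(min(k, length(counts)))]
}

# Toy stand-in for Soybean$Class; labels and counts are made up.
toy <- factor(rep(c("a", "b", "c", "d"), times = c(10, 1, 7, 2)))
print(rare_classes(toy, k = 2))
```

With mlbench installed, `rare_classes(Soybean$Class)` would list the four least frequent classes. Whether these coincide with the four classes excluded in prior work should be verified rather than assumed.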

Usage

data("Soybean", package = "mlbench")

Format

A data frame with 683 observations on 36 variables: 35 categorical attributes, all encoded numerically, and one nominal variable denoting the class.

[,1] Class the 19 classes
[,2] date apr(0),may(1),june(2),july(3),aug(4),sept(5),oct(6).
[,3] plant.stand normal(0),lt-normal(1).
[,4] precip lt-norm(0),norm(1),gt-norm(2).
[,5] temp lt-norm(0),norm(1),gt-norm(2).
[,6] hail yes(0),no(1).
[,7] crop.hist dif-lst-yr(0),s-l-y(1),s-l-2-y(2),s-l-7-y(3).
[,8] area.dam scatter(0),low-area(1),upper-ar(2),whole-field(3).
[,9] sever minor(0),pot-severe(1),severe(2).
[,10] seed.tmt none(0),fungicide(1),other(2).
[,11] germ 90-100%(0),80-89%(1),lt-80%(2).
[,12] plant.growth norm(0),abnorm(1).
[,13] leaves norm(0),abnorm(1).
[,14] leaf.halo absent(0),yellow-halos(1),no-yellow-halos(2).
[,15] leaf.marg w-s-marg(0),no-w-s-marg(1),dna(2).
[,16] leaf.size lt-1/8(0),gt-1/8(1),dna(2).
[,17] leaf.shread absent(0),present(1).
[,18] leaf.malf absent(0),present(1).
[,19] leaf.mild absent(0),upper-surf(1),lower-surf(2).
[,20] stem norm(0),abnorm(1).
[,21] lodging yes(0),no(1).
[,22] stem.cankers absent(0),below-soil(1),above-s(2),ab-sec-nde(3).
[,23] canker.lesion dna(0),brown(1),dk-brown-blk(2),tan(3).
[,24] fruiting.bodies absent(0),present(1).
[,25] ext.decay absent(0),firm-and-dry(1),watery(2).
[,26] mycelium absent(0),present(1).
[,27] int.discolor none(0),brown(1),black(2).
[,28] sclerotia absent(0),present(1).
[,29] fruit.pods norm(0),diseased(1),few-present(2),dna(3).
[,30] fruit.spots absent(0),col(1),br-w/blk-speck(2),distort(3),dna(4).
[,31] seed norm(0),abnorm(1).
[,32] mold.growth absent(0),present(1).
[,33] seed.discolor absent(0),present(1).
[,34] seed.size norm(0),lt-norm(1).
[,35] shriveling absent(0),present(1).
[,36] roots norm(0),rotted(1),galls-cysts(2).
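
The listing above pairs each level with a zero-based code. As a sketch of how such codes could be mapped back to labels, the block below decodes `precip` using the level order from entry [,4]. It assumes you are working with raw numeric codes; `raw_codes` is hypothetical example input, and the data frame shipped with the package may already store the attributes as labelled factors.

```r
# Levels of `precip` in the order listed under Format: 0, 1, 2.
precip_labels <- c("lt-norm", "norm", "gt-norm")

# Hypothetical raw codes (not taken from the data set); NA marks a missing value.
raw_codes <- c(2, 0, 1, 1, NA)

# Codes start at 0 but R indexes from 1, so shift by one before looking up.
decoded <- factor(precip_labels[raw_codes + 1], levels = precip_labels)
print(decoded)
```

Passing `levels = precip_labels` keeps the documented order even when a level does not occur in the input. Whether a given attribute should be treated as ordered is a separate decision that this sketch does not make.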

Source

These data were taken from the UCI Repository of Machine Learning Databases (Blake and Merz 1998) and were converted to R format by Evgenia Dimitriadou in the late 1990s.

The current version of the UC Irvine Machine Learning Repository Soybean (Large) data set is available from doi:10.24432/C5JG6Z.

References

Michalski RS, Chilausky RL (1980). “Learning by Being Told and Learning from Examples: An Experimental Comparison of the Two Methods for Knowledge Acquisition in the Context of Developing an Expert System for Soybean Disease Diagnosis.” International Journal of Policy Analysis and Information Systems, 4(2), 125–161.

Tan M, Eshelman L (1988). “Using Weighted Networks to Represent Classification Knowledge in Noisy Domains.” In Laird J (ed.), Machine Learning Proceedings 1988, 121–134. Morgan Kaufmann, San Francisco (CA). ISBN 978-0-934613-64-4. doi:10.1016/B978-0-934613-64-4.50018-9.
– IWN recorded a 97.1% classification accuracy
– 290 training and 340 test instances

Fisher DH, Schlimmer JC (1988). “Concept Simplification and Prediction Accuracy.” In Laird J (ed.), Machine Learning Proceedings 1988, 22–28. Morgan Kaufmann, San Francisco (CA). ISBN 978-0-934613-64-4. doi:10.1016/B978-0-934613-64-4.50007-4.
– Notes why this database is highly predictable

Blake CL, Merz CJ (1998). “UCI Repository of Machine Learning Databases.” University of California, Irvine, Department of Information and Computer Science. Formerly available from ‘http://www.ics.uci.edu/~mlearn/MLRepository.html’.

Examples

data("Soybean", package = "mlbench")
summary(Soybean)
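
Beyond `summary()`, a quick structural check against the Format section might look like the sketch below. It is guarded so that it is skipped when mlbench is not installed; `col_classes` is an illustrative name, and no particular output is assumed.

```r
# Runs only if mlbench is installed; otherwise the block is skipped.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("Soybean", package = "mlbench")

  # Dimensions, to compare with the 683 x 36 stated under Format.
  print(dim(Soybean))

  # Storage class of each column (first class only).
  col_classes <- vapply(Soybean, function(col) class(col)[1], character(1))
  print(table(col_classes))

  # Number of distinct class labels, to compare with the 19 classes described.
  print(nlevels(factor(Soybean$Class)))
}
```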

[Package mlbench version 2.1-7 Index]