Efficient multisplitting on numerical data
Numerical data poses a problem for symbolic learning methods, since numerical value ranges inherently need to be partitioned into intervals for representation and handling. An evaluation function is used to approximate the goodness of different partition candidates. Most existing methods for multisplitting on numerical attributes are based on heuristics because of their apparent efficiency advantages. In this paper a class of well-behaved cumulative evaluation functions is characterized, for which efficient discovery of the optimal multisplit is possible by dynamic programming. A single pass through the data suffices to evaluate multisplits of all arities. This class contains many important attribute evaluation functions familiar from symbolic learning research. Empirical experiments indicate that there is no significant difference in efficiency between the methods that produce optimal partitions and those that are based on heuristics. Moreover, it is demonstrated that optimal multisplitting can be beneficial in decision tree learning, in contrast to the widely applied binarization of numerical attributes or heuristic multisplitting.
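The dynamic programming idea can be illustrated with a minimal sketch. It is not the paper's algorithm, which the abstract does not spell out. It assumes that "cumulative" means the score of a partition is the sum of per-interval scores, and it picks training set error (examples outside the majority class of their interval) as one such function. The name `optimal_multisplits` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import groupby


def optimal_multisplits(values, labels, max_arity):
    """Return {arity: (cost, cut_points)} minimizing total training set error.

    Illustrative assumption: the partition cost is the sum of interval costs.
    """
    # Merge examples sharing a value: a cut can only fall between distinct values.
    pairs = sorted(zip(values, labels))
    bins = [(v, Counter(lbl for _, lbl in grp))
            for v, grp in groupby(pairs, key=lambda p: p[0])]
    m = len(bins)

    # prefix[i] holds the class counts of bins[0:i], so any interval's
    # class distribution is a difference of two prefix rows.
    classes = sorted(set(labels))
    prefix = [[0] * len(classes)]
    for _, counts in bins:
        prefix.append([p + counts.get(c, 0) for p, c in zip(prefix[-1], classes)])

    def interval_cost(j, i):
        # Examples in bins[j:i] that are not in the interval's majority class.
        counts = [a - b for a, b in zip(prefix[i], prefix[j])]
        return sum(counts) - max(counts)

    inf = float("inf")
    # best[k][i]: minimum cost of splitting bins[0:i] into exactly k intervals.
    best = [[inf] * (m + 1) for _ in range(max_arity + 1)]
    back = [[0] * (m + 1) for _ in range(max_arity + 1)]
    best[0][0] = 0
    for k in range(1, max_arity + 1):
        for i in range(k, m + 1):
            for j in range(k - 1, i):
                cost = best[k - 1][j] + interval_cost(j, i)
                if cost < best[k][i]:
                    best[k][i] = cost
                    back[k][i] = j

    # One filled table yields the best split of every arity up to max_arity.
    result = {}
    for k in range(1, min(max_arity, m) + 1):
        cuts, i = [], m
        for kk in range(k, 1, -1):
            j = back[kk][i]
            cuts.append((bins[j - 1][0] + bins[j][0]) / 2)  # midpoint cut
            i = j
        result[k] = (best[k][m], sorted(cuts))
    return result


if __name__ == "__main__":
    splits = optimal_multisplits([1, 2, 3, 4, 5, 6], list("aabbaa"), 3)
    for arity, (cost, cuts) in splits.items():
        print(arity, cost, cuts)
```

The table is filled once and the optimal split of each arity is then read back from it. This plain version costs on the order of `max_arity * m^2` interval evaluations for `m` distinct values. Other additive impurity measures could presumably be swapped in for `interval_cost` under the same assumption.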
Bibliographic Reference: Paper presented: First European Symposium on Data Mining and Knowledge Discovery, Trondheim (NO), June 25-27, 1997
Availability: Available from (1) as Paper EN 40564 ORA
Record Number: 199710668 / Last updated on: 1997-06-09
Original language: en
Available languages: en