On the well-behavedness of important attribute evaluation functions
The class of well-behaved evaluation functions makes the handling of numerical attributes simpler and more efficient: for these functions it suffices to examine only the boundary points when searching for the optimal partition. This always holds for binary partitions; for multisplits it holds only if the function is cumulative in addition to being well-behaved. The class of well-behaved evaluation functions is a proper superclass of the convex evaluation functions. Thus a large proportion of the most important attribute evaluation functions are well-behaved. This paper explores the extent and boundaries of the class of well-behaved functions. In particular, we examine gain ratio, the default attribute evaluation function of C4.5, which is known to have problems with numerical attributes. We show that gain ratio is not convex but is still well-behaved with respect to binary partitioning. However, it cannot handle higher-arity partitioning well. Our empirical experiments show that a very simple cumulative rectification of the poor bias of information gain significantly outperforms gain ratio.
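As an illustration of the boundary-point idea in the abstract, the following is a minimal Python sketch and not the paper's implementation. It assumes the usual definition of a boundary point (a cut between two adjacent distinct attribute values whose examples do not all share one class) and the standard gain ratio formula (information gain divided by split information). The function names `boundary_cuts`, `all_cuts`, and `gain_ratio`, and the toy data, are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels; 0 for an empty list."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def all_cuts(values):
    """Midpoints between every pair of adjacent distinct attribute values."""
    keys = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(keys, keys[1:])]


def boundary_cuts(values, labels):
    """Midpoints between adjacent distinct values whose examples
    do not all belong to a single class (assumed boundary-point definition)."""
    classes_at = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    keys = sorted(classes_at)
    return [
        (lo + hi) / 2
        for lo, hi in zip(keys, keys[1:])
        if len(classes_at[lo] | classes_at[hi]) > 1
    ]


def gain_ratio(values, labels, cut):
    """Gain ratio of the binary partition 'value <= cut' versus 'value > cut'."""
    left = [y for v, y in zip(values, labels) if v <= cut]
    right = [y for v, y in zip(values, labels) if v > cut]
    n = len(labels)
    gain = (
        entropy(labels)
        - (len(left) / n) * entropy(left)
        - (len(right) / n) * entropy(right)
    )
    # Split information is the entropy of the partition sizes themselves.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info if split_info > 0 else 0.0


# Toy data, invented for this sketch only.
values = [1, 2, 3, 4, 5, 6]
labels = ["a", "a", "b", "b", "a", "a"]

candidates = boundary_cuts(values, labels)
best_overall = max(all_cuts(values), key=lambda c: gain_ratio(values, labels, c))
```

On this toy data, two of the five possible cut points are boundary points, and the cut that maximizes gain ratio over all candidates is one of them, which is the behaviour the abstract describes for binary partitions. A single example of this kind is only an illustration and does not establish the general result.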
Bibliographic Reference: Paper presented: Sixth Scandinavian Conference on Artificial Intelligence, Helsinki (FI), August 18-20, 1997
Availability: Available from (1) as Paper EN 40731 ORA
Record Number: 199710914 / Last updated on: 1997-07-23
Original language: en
Available languages: en