Tuesday, February 5, 2008

Meta-learning

I just finished reading Rice's seminal paper on algorithm selection [Rice, J.R. (1976). The Algorithm Selection Problem. Advances in Computers, 15:65-118]. For obvious reasons, it does not talk about meta-learning (look at the date!), but meta-learning is clearly one natural approach to solving the algorithm selection problem.
Kate Smith-Miles recently wrote a very nice survey paper (to appear in ACM Computing Surveys) where she uses Rice's framework to review and describe most known attempts at algorithm selection.
Rice does indeed offer a very clean formalism for the problem of algorithm selection: a problem X from some problem space P is mapped, via some feature extraction process, to a representation f(X) in some feature space F, and the selection mapping S takes f(X) to some algorithm Y in some algorithm space A, such that the performance of Y on X (for some adequately chosen performance measure) is in some sense optimal. Hence, as pointed out, "the selection mapping now depends only on the features f(X), yet the performance mapping still depends on the problem X" and, of course, "the determination of the best (or even good) features is one of the most important, yet nebulous, aspects of the algorithm selection process."
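To make the moving parts concrete, here is a minimal sketch of the selection mapping S in Python. All names here (select_algorithm, predicted_perf, etc.) are my own illustration, not Rice's; in particular, implementing S as an argmax over a predicted performance model is just one possible instantiation:

```python
from typing import Callable, List

Features = List[float]  # a point in the feature space F, i.e., f(X)

def select_algorithm(
    features: Features,
    algorithms: List[str],                             # the algorithm space A
    predicted_perf: Callable[[str, Features], float],  # estimated performance of Y on X, from f(X) only
) -> str:
    """The selection mapping S: F -> A, here an argmax over predicted performance."""
    return max(algorithms, key=lambda a: predicted_perf(a, features))

# Hypothetical usage: pick between two learners given a two-feature problem description.
if __name__ == "__main__":
    toy_model = lambda a, f: f[0] if a == "decision_tree" else f[1]
    print(select_algorithm([0.8, 0.3], ["decision_tree", "naive_bayes"], toy_model))
```

Note that, exactly as the quote above says, the selector sees only f(X); the true performance is still a property of the underlying problem X.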
Rice is also quick to point out that "ideally, those problems with the same features would have the same performance for any algorithm being considered." I actually made the same point in a recent paper [Giraud-Carrier, C. (2005). The Data Mining Advisor: Meta-learning at the Service of Practitioners. In Proceedings of the 4th International Conference on Machine Learning Applications, 113-119], where I stated that unless, for all X and X' (X ≠ X'), f(X) = f(X') implies p(X) = p(X') (where p is the performance measure), the meta-training set may be noisy and meta-learning may in turn be sub-optimal.
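As a rough illustration of that condition, the following sketch (data layout and names are mine, not from either paper) flags feature vectors shared by problems with different performance values, i.e., exactly the collisions that make the meta-training set noisy:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FeatureVec = Tuple[float, ...]

def find_feature_collisions(
    meta_examples: List[Tuple[FeatureVec, float]],  # pairs (f(X), p(X))
) -> Dict[FeatureVec, List[float]]:
    """Return feature vectors associated with more than one performance value.

    Any non-empty result means f(X) = f(X') does not imply p(X) = p(X'),
    so the meta-training set carries label noise.
    """
    perfs: Dict[FeatureVec, set] = defaultdict(set)
    for features, perf in meta_examples:
        perfs[features].add(perf)
    return {f: sorted(ps) for f, ps in perfs.items() if len(ps) > 1}

# Hypothetical usage: two problems share features (0.5, 0.1) but differ in performance.
if __name__ == "__main__":
    data = [((0.5, 0.1), 0.90), ((0.5, 0.1), 0.62), ((0.2, 0.9), 0.75)]
    print(find_feature_collisions(data))  # {(0.5, 0.1): [0.62, 0.9]}
```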
Rice's framework naturally covers various forms of selection (e.g., best algorithm, best algorithm for a subclass of problems, etc.) as well as multi-criteria performance measures.
Another important point made by Rice, and often overlooked in the machine learning community, is that "most algorithms are developed for a particular class of problems even though the class is never explicitly defined. Thus the performance of algorithms is unlikely to be understood without some idea of the problem class associated with their development." Foster Provost and I called this the Strong Assumption of Machine Learning in our paper on the justification of meta-learning [Giraud-Carrier, C. and Provost, F. (2005). Towards a Justification of Meta-learning: Is the No Free Lunch Theorem a Show-stopper? In Proceedings of the ICML-05 Workshop on Meta-learning, 12-19]. I (and others) have often argued that delimiting the class of problems on which an algorithm performs well is critical to advances in machine learning.
Anyways, although Rice offers no specific method to solve the algorithm selection problem, the paper is highly relevant and very well written. A must-read for anyone interested in meta-learning.
