Abstract:
In the course of data modeling, many candidate models may be created. Much work has been done on formulating guidelines for model selection. However, by and large, these guidelines are conservative or too specific. Instead of relying on general guidelines, models can be selected for a particular task on the basis of statistical tests. When one model is selected, the others are discarded. Rather than losing these potential sources of information, models can be combined to yield better performance. We review the basics of model selection and combination and discuss their differences. Two examples of opportunistic and principled combinations are presented. The first demonstrates that models of mediocre quality can be combined to yield significantly better performance. The second is the main contribution of the paper; it describes and illustrates a novel heuristic approach, called the SG (k-NN) ensemble, for generating good-quality, diverse models that can improve even excellent-quality models.