While working on adaptive scheduling for heterogeneous architectures JVGP2009, we performed online modeling of program behavior as a function of dataset parameters and CPU/GPU execution. We would like to unify this approach in cM and provide a mechanism to quickly scan a large dataset space, using our probabilistic approach FT2010, FOTP2005 or ANOVA to focus exploration on points with unusual behavior and to quickly reduce the number of dimensions, combined with off-the-shelf decision trees or hybrid multivariate adaptive regression splines (MARS) LCWP2009. We have prototypes ready and plan to add off-the-shelf models from R or Scientific Python to cM. Collaborations and help are welcome!
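
The idea of mapping dataset parameters to a CPU/GPU choice can be illustrated with a minimal sketch. The observations below are synthetic, and the names `fit_stump`, `predict`, and `observations` are hypothetical and not part of cM. A one-level decision tree (a stump) written in plain Python stands in here for the richer decision-tree or MARS models mentioned above.

```python
# Illustrative sketch only: synthetic data and hypothetical names, not cM code.

def fit_stump(samples):
    """Fit a one-level decision tree on (features, label) pairs.

    Returns (feature_index, threshold, label_below, label_above), chosen to
    minimize the number of misclassified training samples.
    """
    best = None
    n_features = len(samples[0][0])
    labels = sorted({label for _, label in samples})
    for f in range(n_features):
        values = sorted({x[f] for x, _ in samples})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            for below in labels:
                for above in labels:
                    errors = sum(
                        1 for x, y in samples
                        if (below if x[f] <= t else above) != y
                    )
                    # Keep the first split with the fewest errors.
                    if best is None or errors < best[0]:
                        best = (errors, f, t, below, above)
    _, f, t, below, above = best
    return f, t, below, above


def predict(stump, x):
    """Return the predicted label for feature vector x."""
    f, t, below, above = stump
    return below if x[f] <= t else above


# Synthetic observations (assumed for illustration):
# (matrix_size, sparsity) -> device that ran faster.
observations = [
    ((64, 0.1), "CPU"),
    ((128, 0.5), "CPU"),
    ((256, 0.2), "CPU"),
    ((1024, 0.4), "GPU"),
    ((2048, 0.1), "GPU"),
    ((4096, 0.5), "GPU"),
]

stump = fit_stump(observations)
print(stump)
print(predict(stump, (512, 0.3)))
```

In practice an off-the-shelf learner, for example scikit-learn's `DecisionTreeClassifier`, would likely replace the hand-written stump; the sketch only shows the shape of the input (dataset parameters) and output (device choice).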