Collective Mind CMD demos

We expect that users at least briefly read Collective Mind intro , user guide and possibly long-term vision paper. Since 1999, when started working on program and architecture auto-tuning, machine learning and co-design, we faced numerous problems including huge optimization spaces, ever changing tools and their interfaces, lack of mechanisms and repositories to preserve and exchange design and optimization knowledge apart from numerous publications where reproducibility is often not even considered. Therefore, we are trying to develop a community-driven integrated framework and repository (cTuning since 2007 and Collective Mind since 2011) to deal with all these problems and enable collaborative, systematic and reproducible research and exprimentation in computer engineering through customizable experimental pipelines while still focusing on top-down auto-tuning (starting from algorithm tuning, high-level source-to-source optimizations, compiler flags, fine-grain compiler transformations, etc), machine learning and crowdsourcing.

Below is a brief description of available command-line based demos for Collective Mind on compiler flag tuning and program dataset behavior exploration. Related scripts can be found in the $CM_ROOT/scripts/2-examples-of-auto-tuning. We currently support various versions of GCC, LLVM, ICC, Open64 and related libraries for Linux, Windows and Android that can easily co-exist with each other in cM repositories, and can be collaboratively extended.

"Build and run" experimental pipeline

Here we describe how to use our current common "build and run" experimental pipeline (sub-directory 1-basic-build-and-run). Associated cM code for this pipeline can be found in $CM_ROOT/default/.cmr/module/ctuning.pipeline.build_and_run_program/module.py . It connects all the related artifacts from repository such as source code, data set, compiler, profiling tools (gprog, Intel vTune, likwid), OS, processor, etc. or chain other modules (such as statistical analysis of characteristics, Pareto frontier filter, etc) to build, run program and measure/collect related characteristics such as execution time, code size, compilation time, correctness of the output, hardware counters, etc. Input is customized through cM json file or command line parameters (which are internally converted to json). Output is also recorded as json format. Internally json is directly converted to python dictionary.

Multi-objective compiler auto-tuning

Dataset exploration

Visualization

Reproducibility

Misc

Collective Mind web-based demos

In 2013, we deprecated our previous cTuning1 public repository (opened in 2008), and opened a new live Collective Mind repository at c-mind.org/repo. Any user can self-register and access the latest public codelets, benchmarks, datasets, packages, models, experimental data shared by the community. Users can also rank existing data or upload their own data (an on-going work to make it more intuitive and user-friendly).

Here we collect some demos at the live c-mind.org server to give users some ideas about what cM can do or help with (note that user can reproduce these actions locally using cM as described in detail in cM user guide.

Platforms, approximate power consumption and costs of platforms (2013/May) used for demos (watts for laptop were measured with the off-the-shelf power meter, cM running susan corners benchmark in a loop, and with max frequency):

	Approximate power consumption (max)	Approximate cost	Description
P1=Samsung Galaxy Y	?	~100 euros	Broadcom BCM21553 ARM11 processor (ARMv6), 0.832 GHz, Memory 290MB
P2=Archos 101IT	~7 Watts	~140 euros	ARM Cortex A8 single core 1GHz, L1=32KB, L2=512KB
P3=Dell Latitude D630	~46Watts	~180 euros	Intel Core2 Centrino T7500 2.2GHz, Merom, L1=32KB 8-way set-associative, L2=4MB, 16-way set associative
P4=Dell Latitude E4300	~50 Watts	~200 euros	Intel Core2 Duo Centrino P9400 2.4GHz, Penryn, L1=32KB, 8-way set associative, L2=6Mb, 24-way set associative, Memory=DDR3 4Gb, Dual 530MHz
P5=Dell Latitude E6320	~52 Watts	~800 euros	Intel Core i5 2540M 2.6GHz, Sandy Bridge, L1=32KB 8-way set associative, L2=256KB 8-way set associative, L3=3MB, 12-way set associative, Memory DDR3 dual channels, 8GB, 665MHz

Program multi-objective auto-tuning with Paretto-like filtering (compiler optimizations)

Note, that above graphs can be easily converted to pdf, eps or png for publications or presentations - just press the associated button (when using Python MatplotLib as Graph engine).

Program crowd-tuning using available Android mobiles or cloud services (compiler optimizations)

As the first public crowd-tuning scenario of computer systems, any self-registered user can participate in systematizing of compiler flag tuning for multiple objectives on their own laptops, desktops and cloud/GRID services using standard cM framework, or using "Collective Mind Node" client on any mobile phone or Tablet with Android >= 2.x. Latest tuning results can be viewed here - currently we validate our past research and use data mining and machine learning to analyse this data and correlate most profitable compiler flags with program and architecture features. This is an on-going work and collaborations are welcome (we plan to continue collaborative development of such scenarios).

Universal program/architecture parameter exploration for modeling and adaptation

We can easily perform tuning of various dimensions in our experimental pipeline. For example, we analyze CPI vs dataset size (or any other dimension in the pipeline that is very useful for data mining) for the ludcmp numerical codelet on several Intel architectures using Intel vTune amplifier (the same can be done with perf):

Note that these graphs motivate our run-time adaptation solution for heterogeneous architectures (servers, supercomputers, clouds) when depending on the code and dataset parameters, it's faster or more power efficient to run them on different processors or with different frequency. It is based on static multi-versioning and dynamic run-time adaptation schemes as described in ^FCOP2005 ^LCWP2009 ^JVGP2009 ^FT2010, and we plan to add these support for adaptive scheduling of CPU/CUDA/OpenCL kernels using cM in the future (see out future possible collaborative projects):

Optimization prediction using collective knowledge about program/architecture properties

Interactive demo of online optimization predictor based on semantic program features (properties) extracted using MILEPOST GCC and combined with nearest-neighbor classifier to match continuously optimized codelets and programs by the crowd

Ctuning-cc/ctuning-fortran/ctuning-cpp demos (cM universal tuning and learning compiler wrapper) to predict compiler optimizations on the fly are available in cM and described in cM user guide.

If you would like to add (or see) more demos, please get in touch!

Contents