Revision as of 07:32, 19 December 2013
Contents
- 1 Collective Mind demos
- 1.1 Program multi-objective auto-tuning with Pareto-like filtering (compiler optimizations)
- 1.2 Program crowd-tuning using available Android mobiles or cloud services (compiler optimizations)
- 1.3 Universal program/architecture parameter exploration for modeling and adaptation
- 1.4 Optimization prediction using collective knowledge about program/architecture properties
- 1.5 Program multi-objective auto-tuning from command line
Collective Mind demos
In 2013, we deprecated our previous cTuning1 public repository (opened in 2008), and opened a new live Collective Mind repository at c-mind.org/repo. Any user can self-register and access the latest public codelets, benchmarks, datasets, packages, models and experimental data shared by the community. Users can also rank existing data or upload their own data.
Here we collect some demos from the live c-mind.org server to give users an idea of what cM can do or help with (note that users can reproduce these actions locally using cM, as described in detail in the cM user guide).
Platforms used for the demos, with their approximate power consumption and cost as of May 2013 (laptop power was measured with an off-the-shelf power meter while cM ran the susan corners benchmark in a loop at maximum frequency):
Platform | Approximate power consumption (max) | Approximate cost | Description |
P1=Samsung Galaxy Y | ? | ~100 euros | Broadcom BCM21553 ARM11 processor (ARMv6), 0.832 GHz, Memory 290MB |
P2=Archos 101IT | ~7 Watts | ~140 euros | ARM Cortex A8 single core 1GHz, L1=32KB, L2=512KB |
P3=Dell Latitude D630 | ~46 Watts | ~180 euros | Intel Core2 Centrino T7500 2.2GHz, Merom, L1=32KB 8-way set associative, L2=4MB 16-way set associative |
P4=Dell Latitude E4300 | ~50 Watts | ~200 euros | Intel Core2 Duo Centrino P9400 2.4GHz, Penryn, L1=32KB 8-way set associative, L2=6MB 24-way set associative, Memory=DDR3 4GB, Dual 530MHz |
P5=Dell Latitude E6320 | ~52 Watts | ~800 euros | Intel Core i5 2540M 2.6GHz, Sandy Bridge, L1=32KB 8-way set associative, L2=256KB 8-way set associative, L3=3MB 12-way set associative, Memory DDR3 dual channels, 8GB, 665MHz |
Program multi-objective auto-tuning with Pareto-like filtering (compiler optimizations)
- Graph: Analysis of execution time variation (susan corners benchmark, Intel i5 processor, high-performance power scheme, 30 repetitions)
- Graph: Analysis of execution time variation (susan corners benchmark, Intel i5 processor, power scheme changed from max to min performance, 30 repetitions)
- Graph: compiler tuning (susan corners benchmark, Samsung Galaxy Y mobile, ARM v6, Sourcery GCC 4.7.2, 100 exploration points with random flags, kernel execution time vs binary size)
- Graph: compiler tuning (susan corners benchmark, Samsung Galaxy Y mobile, ARM v6, Sourcery GCC 4.7.2, 100 exploration points with random flags, kernel execution time vs binary size) - multigraph with reference optimizations separated (-O1, -O2, -O3, -Os, -fast, etc.)
- Graph: compiler tuning (susan corners benchmark, Samsung Galaxy Y mobile, ARM v6, Sourcery GCC 4.7.2, Pareto frontier after 100 exploration points with random flags, kernel execution time vs binary size) - multigraph with reference optimizations separated (-O1, -O2, -O3, -Os, -fast, etc.)
- Graph: compiler tuning (susan corners benchmark, {Samsung Galaxy Y mobile, ARM v6 vs Archos 101 Internet Tablet, ARM v7}, Sourcery GCC 4.7.2, Pareto frontier after 100 exploration points with random flags, kernel execution time vs binary size) - multigraph with reference optimizations separated (-O1, -O2, -O3, -Os, -fast, etc.)
- Graph: compiler tuning (susan corners benchmark, Samsung Galaxy Y mobile, {Sourcery GCC 4.7.2 vs 4.6.3}, Pareto frontier after 100 exploration points with random flags, kernel execution time vs binary size) - multigraph with reference optimizations separated (-O1, -O2, -O3, -Os, -fast, etc.)
- Graph: compiler tuning (susan corners benchmark, Archos 101 Internet Tablet, {Sourcery GCC 4.7.2 vs LLVM 3.1}, 100 exploration points with random flags, kernel execution time vs binary size) - multigraph
Note that the above graphs can easily be converted to PDF, EPS or PNG for publications or presentations: just press the associated button (available when Python Matplotlib is used as the graph engine).
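To make the filtering step behind the "Pareto frontier" graphs concrete, here is a minimal sketch of a strict two-objective Pareto filter over (kernel execution time, binary size) pairs, both minimized. It is an illustration only: the function name `pareto_frontier` and the sample numbers are hypothetical, and the "Pareto-like" filter actually used in cM may differ (for example by tolerating measurement variation).

```python
from typing import List, Tuple

# One exploration point: (kernel execution time in seconds, binary size in bytes).
Point = Tuple[float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep only points that no other point beats on both objectives.

    Both objectives are minimized. Hypothetical helper, not part of cM.
    """
    frontier: List[Point] = []
    best_size = float("inf")
    # After sorting by time (then size), a point is on the frontier only
    # if its binary is strictly smaller than every faster-or-equal point.
    for time, size in sorted(set(points)):
        if size < best_size:
            frontier.append((time, size))
            best_size = size
    return frontier


if __name__ == "__main__":
    # Made-up measurements for illustration only.
    measured = [(1.0, 500), (1.2, 400), (1.1, 600), (0.9, 700), (1.5, 400)]
    print(pareto_frontier(measured))
    # prints [(0.9, 700), (1.0, 500), (1.2, 400)]
```

Sorting first keeps the filter at O(n log n), which matters little for 100 exploration points but helps when results from many users are merged.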
Program crowd-tuning using available Android mobiles or cloud services (compiler optimizations)
As the first public crowd-tuning scenario for computer systems, any self-registered user can help systematize compiler flag tuning for multiple objectives on their own laptops, desktops and cloud/GRID services using the standard cM framework, or using the "Collective Mind Node" client on any mobile phone or tablet with Android >= 2.x. The latest tuning results can be viewed here. We are currently validating our past research and using data mining and machine learning to analyze this data and correlate the most profitable compiler flags with program and architecture features. This is ongoing work and collaborations are welcome (we plan to continue collaborative development of such scenarios).
- Android application: Collective Mind Node to crowdsource auto-tuning (current example - compiler flag tuning)
- Table: most profitable compiler flags found by the community (from mobiles, cloud services, etc)
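As a rough illustration of what one "exploration point with random flags" could look like, the sketch below draws a base optimization level plus a random subset of GCC flags. It is an assumption-laden example: the helper names, the short flag list and the 50% selection probability are hypothetical and do not describe how cM or the Collective Mind Node client actually chooses flags.

```python
import random
from typing import List

# Reference optimization levels and a few real GCC flags, chosen only
# for illustration; the real search space is much larger.
BASE_LEVELS = ["-O1", "-O2", "-O3", "-Os"]
CANDIDATE_FLAGS = [
    "-funroll-loops",
    "-ftree-vectorize",
    "-fomit-frame-pointer",
    "-finline-functions",
    "-fno-strict-aliasing",
]


def random_flag_set(rng: random.Random) -> List[str]:
    """Pick one base level, then switch each candidate flag on with p=0.5."""
    flags = [rng.choice(BASE_LEVELS)]
    flags += [flag for flag in CANDIDATE_FLAGS if rng.random() < 0.5]
    return flags


def exploration_points(count: int, seed: int = 0) -> List[List[str]]:
    """Generate `count` random flag combinations, reproducible via `seed`."""
    rng = random.Random(seed)
    return [random_flag_set(rng) for _ in range(count)]


if __name__ == "__main__":
    for flags in exploration_points(3):
        print("gcc " + " ".join(flags) + " susan.c -o susan")
```

Each generated combination would then be compiled and measured, and the resulting (execution time, binary size) pairs shared for analysis.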
Universal program/architecture parameter exploration for modeling and adaptation
We can easily tune various dimensions of our experimental pipeline. For example, we analyze CPI (cycles per instruction) vs dataset size (or any other pipeline dimension that is useful for data mining) for the ludcmp numerical codelet on several Intel architectures using Intel VTune Amplifier (the same can be done with perf):
- Graph: P5
- {Graph: P3 vs P5}
- {Graph: P3 vs P4} - note the alignment misses that relate to the cache hierarchy
Note that these graphs motivate our run-time adaptation solution for heterogeneous architectures (servers, supercomputers, clouds): depending on the code and dataset parameters, it can be faster or more power efficient to run a program on a different processor or at a different frequency. The solution is based on static multi-versioning and dynamic run-time adaptation schemes as described in FCOP2005, LCWP2009, JVGP2009 and FT2010, and we plan to add this support for adaptive scheduling of CPU/CUDA/OpenCL kernels using cM in the future (see our future possible collaborative projects).
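The idea of static multi-versioning with run-time adaptation can be sketched as a dispatcher that selects one of several pre-built versions of a kernel according to the dataset size. This is a simplified illustration under stated assumptions: the names `make_selector`, `kernel_small` and `kernel_large` and the threshold of 1000 elements are hypothetical, and the schemes in the cited papers select on richer program and architecture features.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence

Kernel = Callable[[List[float]], float]


def kernel_small(data: List[float]) -> float:
    """Hypothetical version tuned for small datasets (plain loop)."""
    total = 0.0
    for value in data:
        total += value
    return total


def kernel_large(data: List[float]) -> float:
    """Hypothetical version tuned for large datasets (built-in sum)."""
    return float(sum(data))


def make_selector(thresholds: Sequence[int],
                  versions: Sequence[Kernel]) -> Callable[[int], Kernel]:
    """Return a function mapping a dataset size to a kernel version.

    versions[i] serves sizes below thresholds[i]; the last version
    serves everything at or above the final threshold.
    """
    if len(versions) != len(thresholds) + 1:
        raise ValueError("need exactly one more version than thresholds")

    def select(size: int) -> Kernel:
        return versions[bisect_right(thresholds, size)]

    return select


if __name__ == "__main__":
    # Assumed switch point; in practice it would come from measurements
    # such as the CPI vs dataset size graphs.
    select = make_selector([1000], [kernel_small, kernel_large])
    data = [1.0, 2.0, 3.0]
    print(select(len(data)).__name__, select(len(data))(data))
    # prints kernel_small 6.0
```

The thresholds play the role of the learned decision boundary; the versions themselves are compiled ahead of time, so the run-time cost is a single table lookup.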
Optimization prediction using collective knowledge about program/architecture properties
Demos of ctuning-cc, ctuning-fortran and ctuning-cpp (the cM universal tuning and learning compiler wrappers), which predict compiler optimizations on the fly, are available in cM and described in the cM user guide.
Program multi-objective auto-tuning from command line
If you would like to add (or see) more demos, please get in touch!