<p style="text-align: center"><span style="font-size:x-large">
'''''In September 2015, we released a brand new Collective Knowledge framework for collaborative, systematic and reproducible computer systems research, and moved all further development to GitHub: [http://github.com/ctuning/ck src], [http://github.com/ctuning/ck/wiki docs]'''''
</span></p>
  
----

'''This page has not been updated since summer 2015 - see this [http://cTuning.org/reproducibility-wiki wiki] instead!'''
<p style="text-align: center"><span style="font-size:x-large">'''<span style="font-family: tahoma,geneva,sans-serif">Collective Mind<br/><span style="font-size:large">''towards collaborative, systematic and reproducible computer engineering''</span></span>'''</span></p>
{| border="0" cellpadding="10" cellspacing="1" width="1118"
|- valign="top"
| width="170" | <p style="text-align: center">[[File:Validate by community.png|Validate by community.png|link=http://cTuning.org/reproducibility]]</p><p style="text-align: center">[[File:CTuning foundation logo1.png|none|CTuning foundation logo1.png|link=http://cTuning.org]]</p>
| rowspan="1" |
We are a group of researchers working with the community on a new methodology, infrastructure and repository for collaborative and reproducible research and experimentation in computer engineering, which grew out of our projects combining performance/energy/size auto-tuning with run-time adaptation, crowdsourcing, big data and predictive analytics ('''see our [http://cTuning.org/history manifesto and history]'''). Our approach, in turn, helped to enable a [http://c-mind.org/reproducibility new publication model] where all research material (code and data artifacts) is shared along with articles so that it can be continuously discussed, validated and improved by the community. To evangelize this community-driven approach and set an example, we have been releasing all our benchmarks, data sets, predictive models and tools with unified interfaces since 2007, first at [http://cTuning.org cTuning.org] and later at [http://c-mind.org/repo c-mind.org/repo]. We use the [http://adapt-workshop.org ADAPT workshop] on self-tuning computing systems to validate our new research and publication model. We hope that it can complement recent academic initiatives on reproducible research at major conferences while focusing more on the [http://c-mind.org/reproducibility technological aspects] of collaborative and reproducible research in computer engineering (rather than just sharing and validating artifacts).
<p style="text-align: center">''This R&D is supported by [http://cTuning.org the cTuning foundation].''</p>
|}

= Our long term vision =

Rapid advances in information technology and all other fields of science have brought dramatic growth in the amount of data to process ("big data"). Scientists, engineers and students are drowning in experimental data and often have to divert their research towards data management, mining and visualization. These tasks require additional interdisciplinary skills, including statistical analysis, machine learning, programming and parallelization, database management and Internet technologies, which few researchers have or can afford to learn alongside their main research work. Multiple frameworks, languages and public data repositories have recently appeared to enable collaborative data analysis and processing, but they either cover very narrow research topics and are too simplistic (just data and code sharing), or are very formal and still require special programming skills, often including object-oriented programming.

Collective Mind technology (cM) attempts to fill this gap by providing researchers and companies with a simple, portable, technology-neutral and practically transparent way to gradually systematize and classify all their data, code and tools. The open-source cM framework and repository fully relies on customizable public or private plugins (mostly written in Python, with support for any other language through the OpenME interface) to gradually describe and classify similar data and code objects, or to abstract the interfaces of ever-changing tools, thus effectively protecting researchers' experimental setups. cM helps to preserve any complex research artifact (a collection of files, benchmarks, codelets, datasets, tools, traces or models) together with a gradually and easily extensible JSON-based meta description that includes classification, properties and either direct or semantic data connections. Furthermore, the meta descriptions of all data can be transparently indexed using the third-party [http://www.elasticsearch.org ElasticSearch], enabling very fast and complex queries. At the same time, all research artifacts can be exposed to any public or workgroup user through unified web services to crowdsource experimentation, ranking, online learning and knowledge management.
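
For illustration, such a meta description can live in a plain JSON file next to the artifact it describes. The sketch below is in Python, and every field name in it is hypothetical; it shows the flavor of the approach, not the actual cM schema:

<pre>
import json

# Hypothetical meta description of a shared benchmark artifact.
# All field names are illustrative; they do not reproduce the real cM schema.
meta = {
    "data_name": "susan corners benchmark",
    "classification": ["benchmark", "image processing"],
    "properties": {
        "language": "C",
        "compile_cmd": "$CC $CFLAGS susan.c -o susan -lm"
    },
    "connections": {
        "datasets": ["image-pgm-0001"],   # semantic link to another artifact
        "tools": ["gcc", "llvm"]
    }
}

# Keeping the description as a plain, schema-free JSON file makes it easy
# to extend gradually, to share, and to index with an external search
# engine such as ElasticSearch.
with open("meta.json", "w") as f:
    json.dump(meta, f, indent=2)
</pre>
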
<p style="text-align: center">[[File:C-mind-picture.png]]</p>

cM uses an agile, top-down methodology originating from physics to represent any experimental scenario and gradually decompose it into connected plugins with associated data, or to compose it from already shared plugins like "research LEGO". This universal structure immediately enables a replay mode for any experiment, making the framework suitable for recent projects on reproducibility of experimental results and for a new publication model where experiments and techniques are validated, ranked and improved by the community. For example, we easily moved all our past R&D on program and architecture multi-objective auto-tuning, co-design and dynamic adaptation to cM plugins, and we are gradually making them available together with all research artifacts at [http://c-mind.org/repo http://c-mind.org/repo]. We hope that cM will be useful to a broad range of researchers and companies, either as an open-source, community-driven solution to systematize their research and experimentation, or possibly as an intermediate step before investing in more complex or commercial knowledge management systems.
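
The decomposition and replay idea can be sketched in a few lines of Python. This is a conceptual illustration only: the plugin names, the pipeline format and the replay logic below are hypothetical and do not correspond to the real cM module interface:

<pre>
import json

# A "plugin" here is just a named stage that maps a state dictionary to a
# new state dictionary (conceptual sketch, not the actual cM interface).
PLUGINS = {
    "compile": lambda s: dict(s, binary=s["source"] + ".bin"),
    "run":     lambda s: dict(s, time_sec=1.23),             # placeholder measurement
    "analyze": lambda s: dict(s, speedup=2.0 / s["time_sec"]),
}

def run_pipeline(stages, state):
    for name in stages:                  # compose plugins like "research LEGO"
        state = PLUGINS[name](state)
    return state

# Record the whole experiment as JSON so that anyone can replay it later.
experiment = {"stages": ["compile", "run", "analyze"],
              "input": {"source": "susan.c"}}
result = run_pipeline(experiment["stages"], experiment["input"])
with open("experiment.json", "w") as f:
    json.dump({"setup": experiment, "result": result}, f, indent=2)

# Replay mode: reload the saved description and rerun the same stages.
with open("experiment.json") as f:
    setup = json.load(f)["setup"]
assert run_pipeline(setup["stages"], setup["input"]) == result
</pre>
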

''Here is [[Reproducibility:Links|our list of links]] to initiatives, publications, tools and techniques related to collaborative and reproducible research, experimentation and development in computer engineering.''

==== Related Collective Mind publications and presentations ====

*Our new open publication model proposal (2014) ([http://dl.acm.org/citation.cfm?id=2618142 ACM pdf], [http://arxiv.org/abs/1406.4020 arXiv pdf]) - it summarizes our practical experience with sharing and reviewing experimental results and research artifacts since 2007; we plan to validate it at our [http://adapt-workshop.org ADAPT'15]
*[http://hal.inria.fr/inria-00436029 GCC Summit publication (2009)] - introducing our vision of reproducible research and describing the cTuning.org framework for collaborative and reproducible program and architecture analysis, optimization and co-design (all code and data for the machine-learning-based compiler MILEPOST GCC have been publicly shared at [http://cTuning.org cTuning.org])
*[http://arxiv.org/abs/1308.2410 INRIA/arXiv technical report (2013)] - introducing the long-term Collective Mind vision; a considerably updated journal version will be available in Fall 2014
*[http://c-mind.org/repo/?view_cid=shared1:dissemination.publication:530e5f456ea259de ACM TACO publication (2012)] - introducing crowdtuning (crowdsourcing auto-tuning)
*[http://c-mind.org/repo/?view_cid=shared1:dissemination.publication:a31e374796869125 IJPP publication (2011)] - introducing our machine-learning-based compiler, cTuning.org and reproducible R&D on program and architecture optimization
*[http://www.slideshare.net/GrigoriFursin/presentation-fursin-hpsc2013fursin1 Long term vision slides] - "Systematizing tuning of computer systems using crowdsourcing and statistics"
*[http://www.slideshare.net/GrigoriFursin/presentation-fursin-hpsc2013fursin2 cM basics slides] - "Collective Mind infrastructure and repository to crowdsource auto-tuning"

= Public repository of knowledge =

''Do not waste your research material - use the Collective Mind framework and repository to describe, run and share your experiments with the community!''

*[http://c-mind.org/repo Beta live Collective Mind repository] (3rd generation, opened in 2013 to substitute the previous cTuning repository and infrastructure available since 2008) - here we have described and shared all our past research developments, codelets, benchmarks, data sets, models, statistical analysis, modeling and online learning plugins and tools to start the top-down analysis and optimization of existing computer systems. We used it as the first practical example to motivate a new publication model where all research artifacts are continuously shared, validated and improved by the community. After many years, the community has finally started moving in this direction, and we even see related initiatives at major conferences including OOPSLA and PLDI. We believe that our project and the community feedback collected since 2006 are complementary and can help with various technological aspects of collaborative and reproducible research in computer engineering.

= Common infrastructure and support tools =

*[[Tools:CM|Collective Mind Infrastructure (cM)]] - plugin-based framework and repository for collaborative, systematic and reproducible research and experimentation
**[[Tools:OpenME|OpenME]] - universal and simple event-based interface to "open up" black-box applications and third-party tools such as GCC, LLVM and Open64, making it possible to monitor, learn and predict fine-grain optimization decisions inside them through external plugins (see the sketch after this list)
**[[Tools:Alchemist|Alchemist]] - OpenME plugin to convert compilers into interactive analysis and optimization toolsets
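
As a rough illustration of the OpenME idea, the Python sketch below shows how a tool can expose an internal decision point as a named event that external plugins subscribe to and may override. All names here are hypothetical; they do not reproduce the real OpenME API:

<pre>
# Conceptual sketch of an event-based "open up" interface in the spirit of
# OpenME; function names and payloads are illustrative, not the real API.
_callbacks = {}

def register(event, handler):
    """An external plugin subscribes to a named decision point."""
    _callbacks.setdefault(event, []).append(handler)

def raise_event(event, payload):
    """The instrumented tool exposes its internal state here; each plugin
    may inspect the payload and override the pending decision."""
    for handler in _callbacks.get(event, []):
        payload = handler(payload)
    return payload

# Example plugin: disable unrolling for large loops, as an auto-tuner
# exploring compiler decisions might do.
register("before_unroll", lambda d: dict(d, unroll=d["size"] < 64))

decision = raise_event("before_unroll", {"loop": "L1", "size": 128, "unroll": True})
print(decision)   # {'loop': 'L1', 'size': 128, 'unroll': False}
</pre>

The point of such an interface is that the tool itself remains a black box: analysis and optimization plugins can be added, swapped or crowdsourced without redesigning or rebuilding the tool.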
 
= Events =

*[http://ctuning.org/cm/wiki/Events%3ATRUST2014 Workshop TRUST 2014 on reproducible research methodologies and new publication models] @ PLDI 2014 (Edinburgh, UK)
*[http://www.occamportal.org/reproduce Workshop REPRODUCE 2014 on reproducible research methodologies and new publication models] @ HPCA 2014 (Orlando, Florida, USA)
*[http://adapt-workshop.org/program.htm Panel on reproducible research methodologies and new publication models] at ADAPT 2014 @ HiPEAC 2014 (Vienna, Austria)
*[http://www.hipeac.net/thematic-session/making-computer-engineering-science Thematic session on making computer engineering a science] @ ACM ECRC 2013 / HiPEAC computing week 2013 (Paris, France)
*[http://www.hipeac.net/thematic-session/collective-characterization-optimization-and-design-computer-systems Thematic session on collective characterization, optimization and design of computer systems] @ HiPEAC spring computing week 2012 (Goteborg, Sweden)
*[http://www.hipeac.net/conference/pisa/speedup Tutorial on Speedup-Test: Statistical Methodology to Evaluate Program Speedups and their Optimisation Techniques] @ HiPEAC 2010 (Pisa, Italy)
*[http://c-mind.org/repo/?view_cid=77154d189d2e226c:0053bdf524fb9a58 Tutorial on cTuning tools for collaborative and reproducible program and architecture characterization and auto-tuning] @ HiPEAC computing systems week 2009 (Infineon, Munich, Germany)
*[http://hal.inria.fr/inria-00436029 Public discussion on collaborative and reproducible analysis, design and optimization of computer systems] @ GCC Summit 2009 (Montreal, Canada)

= Current customized usage scenarios =

Designing novel many-core computer systems has become intolerably complex, ad-hoc, costly and error-prone due to the limitations of available technology, the enormous number of available design and optimization choices, and complex interactions between all software and hardware components. Empirical auto-tuning combined with run-time adaptation and machine learning has demonstrated good potential to address the above challenges for more than a decade, but it is still far from widespread production use due to unbearably long exploration and training times, ever-changing tools and their interfaces, the lack of a common experimental methodology, and the lack of unified mechanisms for knowledge building and exchange apart from publications, where reproducibility of results is often not even considered. Since 1993, we have spent more time preparing and analyzing huge amounts of heterogeneous experiments for self-tuning machine-learning-based computer systems, or trying to validate and reproduce others' research results, than on extending our novel ideas.

In 2007, we decided to start the collaborative systematization and unification of the design and optimization of computer systems, combined with a [http://cTuning.org/cm-journal new publication model] where experimental results are validated by the community. One promising solution is to combine a public repository of knowledge with online auto-tuning, machine learning and crowdsourcing techniques, where the HiPEAC and cTuning communities already have good practical experience. Such a collaborative approach should allow the community to continuously validate, systematize and improve collective knowledge about computer systems, and to extrapolate it to build faster, more power-efficient and reliable computer systems. It can also help to restore the attractiveness of computer engineering, making it a systematic and rigorous discipline rather than "hacking".

We develop the cTuning collaborative research and development infrastructure and repository (the current version is cTuning3, aka Collective Mind), which enables:

*gradual decomposition and parametrization of complex computer systems and experiments into unified and inter-connected Collective Mind modules (components or plugins) with extensible meta-information
*easy co-existence of multiple versions of tools and libraries
*implementation of experimental pipelines with all related artifacts necessary for collaborative and reproducible research and experimentation
*collection and sharing of statistics, benchmarks, codelets, tools, data sets and predictive models from the community
*systematization of optimization, design space exploration and run-time adaptation techniques (co-design and auto-tuning)
*collaborative evaluation and improvement of various data mining, classification and predictive modeling techniques for off-line and on-line auto-tuning
*a new publication model (workshops, conferences, journals) with validation of experimental results by the community

The current cM version includes public benchmarks, datasets, tools, techniques and some statistics from past [http://cTuning.org/lab/people/gfursin Grigori Fursin's research]:

*support for most OSes and platforms (Linux, Android, Windows; servers, cloud nodes, mobiles, laptops, tablets, supercomputers)
*multiple benchmarks (cBench, polybench, SPEC95, SPEC2000, SPEC2006, EEMBC, etc.), hundreds of MILEPOST/CAPS codelets, thousands of cBench datasets
*multiple compilers (GCC, LLVM, Open64, PathScale, Intel, IBM, PGI)
*tools for program and architecture characterization (MILEPOST GCC for semantic features and code patterns; hardware counters for dynamic analysis)
*plugins for powerful visualization and data export in various formats
*an experimental pipeline for universal program and architecture co-design, auto-tuning, performance/energy modeling and machine learning
*the OpenME interface to instrument programs or statically enable adaptive binaries through multi-versioning and decision trees for run-time adaptation/scheduling, easily mixing CPU/CUDA/OpenCL codelets or any other heterogeneous programming models (see the sketch after this list)
*plugins for online auto-tuning and performance model building
*the machine-learning-enabled self-tuning cTuning CC compiler that can wrap any existing compiler while using crowd-tuning and collective knowledge to continuously improve its own behavior
*plugins for universal P2P data exchange through cM web services
*optimization statistics for various ARM, Intel and NVidia chips

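As a minimal sketch of the multi-versioning idea mentioned above (purely illustrative Python, with a hand-written rule standing in for a learned decision tree; the real mechanism targets compiled CPU/CUDA/OpenCL codelets), several versions of a kernel coexist in one program and a decision tree selects one at run time:

<pre>
# Conceptual sketch of static multi-versioning with run-time selection;
# illustrative only, not the actual OpenME mechanism.

def kernel_scalar(data):          # baseline version
    return [x * 2 for x in data]

def kernel_unrolled(data):        # version tuned for larger inputs
    out = []
    for i in range(0, len(data) - 1, 2):
        out += [data[i] * 2, data[i + 1] * 2]
    if len(data) % 2:
        out.append(data[-1] * 2)
    return out

def select_version(n, has_gpu):
    # A tiny hand-written "decision tree"; in practice it would be learned
    # from collected optimization statistics.
    if has_gpu and n > 100000:
        return kernel_unrolled    # a CUDA/OpenCL version would go here
    return kernel_scalar if n < 1000 else kernel_unrolled

data = list(range(5000))
assert select_version(len(data), has_gpu=False)(data) == kernel_scalar(data)
</pre>
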
See the [http://cTuning.org/reproducibility-wiki reproducibility wiki] for further details.
