Collective Mind Framework and tools
- Collective Mind - framework and repository for collaborative and reproducible research
Latest publications and presentations
- arXiv open-access technical report with our long-term vision: "Collective Mind: cleaning up the research and experimentation mess in computer engineering using crowdsourcing, big data and machine learning"
- Long term vision - "Systematizing tuning of computer systems using crowdsourcing and statistics"
- cM basics - "Collective Mind infrastructure and repository to crowdsource auto-tuning"
- Grigori Fursin
- Zbigniew Chamski
Designing, analyzing and optimizing computer systems is now a tedious, messy, ad hoc, costly and error-prone process. The causes are an enormous number of available design and optimization choices, complex interactions between all components, and rapidly changing technology, tools and interfaces.
Auto-tuning, run-time adaptation and machine-learning-based techniques have been investigated for more than a decade to address some of these challenges, but they are still far from widespread production use. This is due not only to unbearably long tuning times and the ever-changing interfaces of analysis and optimization tools, but also to the lack of a common methodology to discover, preserve and share knowledge about the behavior of existing computer systems. Current solutions are mainly proprietary and involve a redesign of the whole SW/HW stack by very large groups (the Liquid Metal project from IBM, SW/HW co-design initiatives from Intel, the SciDAC SUPER project).
When developing the cTuning and Collective Mind technology, our main goal was to create such a holistic methodology, framework and repository for collaborative and systematic research and experimentation on software/hardware co-design. It should not require a redesign of the whole SW/HW stack, and it should evolve easily and transparently together with existing systems and tools. At the same time, we wanted the core of cTuning/Collective Mind to be implemented by just a few engineers using an agile methodology, and to be easily extensible by the community so that various benchmarks, data sets, tools, models, experimental results, etc. can be shared, reproduced and validated.
We believe that in the past 14 years we have found all the missing pieces of the puzzle needed to address the above challenges and enable systematic and reproducible characterization and optimization of computer systems through unified and scalable repositories of knowledge and crowdsourcing. In this approach, multi-objective program and architecture tuning, which balances performance, power consumption, compilation time, code size and any other important metric, is transparently distributed among multiple users while using any available mobile, cluster or cloud computing services. Collected information about program and architecture properties and behavior is continuously processed using statistical and predictive modeling techniques to build, keep and share only useful knowledge at multiple levels of granularity. This gradually growing and systematized knowledge can be used to predict the most profitable program optimizations, run-time adaptation scenarios and architecture configurations depending on user requirements.
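To make the idea of balancing several metrics concrete, the sketch below filters a set of measured configurations down to the Pareto-optimal ones, i.e. those that no other configuration beats on every metric at once. This is only an illustration of multi-objective selection in general, not the method used inside cM. The function names, the metric names and the numbers are hypothetical and invented for this example, and lower values are assumed to be better for every metric.

```python
def dominates(a, b, objectives):
    """Return True if configuration a is at least as good as b on every
    objective and strictly better on at least one (lower is better)."""
    no_worse = all(a[k] <= b[k] for k in objectives)
    strictly_better = any(a[k] < b[k] for k in objectives)
    return no_worse and strictly_better


def pareto_front(points, objectives):
    """Keep only the configurations that no other configuration dominates."""
    return [
        p for p in points
        if not any(dominates(q, p, objectives) for q in points)
    ]


# Invented measurements for four hypothetical configurations (not real data).
measurements = [
    {"config": "A", "time_s": 10.0, "size_kb": 100},
    {"config": "B", "time_s": 5.0, "size_kb": 120},
    {"config": "C", "time_s": 5.0, "size_kb": 150},
    {"config": "D", "time_s": 7.0, "size_kb": 90},
]

front = pareto_front(measurements, objectives=("time_s", "size_kb"))
print([p["config"] for p in front])
```

In this made-up data set, A is dominated by D and C is dominated by B, so only B and D remain as candidate trade-offs between execution time and code size. A user requirement (for example, a code-size limit) would then pick one point from this front.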
Collective Mind Framework (cM) is a public, open-source, plugin-based infrastructure and repository that attempts to implement the above methodology. Motivated by physics, biology and AI sciences, this framework helps researchers gradually expose the tuning choices, properties and characteristics of any tool or application in existing systems, at multiple granularity levels, through plugins ("wrappers"). These wrappers use simple and extensible no-type interfaces and extensible, file-based, JSON-based NoSQL repositories. Such wrappers can be easily combined like research "LEGO" to prepare various exploration, analysis and optimization scenarios. They can also be connected to customizable public or private in-house repositories of shared data (applications, data sets, codelets, micro-benchmarks and architecture descriptions), modules (classification, predictive modeling, run-time adaptation) and statistics about the behavior of computer systems. Collected data can be continuously analyzed and extrapolated using online learning to predict better optimizations or hardware configurations that effectively balance performance, power consumption and other characteristics. We start from top-down analysis and optimization of existing computer systems and, together with the community, gradually increase the complexity until we understand and systematize the behavior of these systems well enough either to quickly predict how to optimize them or to build new, better systems by extrapolating existing knowledge.
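The wrapper and repository ideas described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual cM API: the class `JsonFileRepository`, the functions `compiler_wrapper` and `record_experiment`, the dictionary keys (`return`, `choices`, `flags`, `source`, `tool`) and the one-JSON-file-per-entry layout are all hypothetical names chosen for this example. The wrapper takes a plain dictionary and returns a plain dictionary, so it can be chained with other wrappers without shared type definitions.

```python
import json
from pathlib import Path


class JsonFileRepository:
    """Hypothetical file-based repository: one JSON file per entry."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, entry_id):
        return self.root / f"{entry_id}.json"

    def add(self, entry_id, data):
        # Store the entry as human-readable JSON; no schema is enforced.
        self._path(entry_id).write_text(json.dumps(data, indent=2, sort_keys=True))
        return {"return": 0, "entry_id": entry_id}

    def load(self, entry_id):
        path = self._path(entry_id)
        if not path.exists():
            return {"return": 1, "error": f"entry '{entry_id}' not found"}
        return {"return": 0, "data": json.loads(path.read_text())}


def compiler_wrapper(request):
    """Hypothetical wrapper around a compiler: dictionary in, dictionary out.

    It exposes the tool's tuning choices (here, only flags) and builds the
    command line. It does not execute anything in this sketch.
    """
    flags = list(request.get("choices", {}).get("flags", []))
    command = [request.get("tool", "cc"), *flags, request["source"]]
    return {"return": 0, "choices": {"flags": flags}, "command": command}


def record_experiment(repo, entry_id, request):
    """Combine the wrapper with the repository, like two LEGO bricks."""
    result = compiler_wrapper(request)
    if result["return"] == 0:
        repo.add(entry_id, {"request": request, "result": result})
    return result
```

Because both requests and results are plain JSON-compatible dictionaries, new keys can be added by one wrapper without breaking others that ignore them, which is one plausible reading of the "extensible no-type interface" idea.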
In 2013 we opened the third generation of the live cM repository (which replaced the previous public version opened in 2008) and released all our past research developments, codelets, benchmarks, data sets, models, online learning plugins and tools so that the community can take part in this top-down analysis and optimization of existing computer systems. This methodology should also push forward a new international publication model in computer engineering, in which experimental results are continuously validated and improved by the community.
cM is a community-based, continuously evolving project that uses an agile development methodology. Hence, interfaces and modules may change from time to time to provide the needed functionality. We are very thankful for your understanding, your patience and any help in extending and improving this framework while keeping it clean, simple and easy to use.