NEWS:
- Please join our panel on reproducible research at ADAPT'14 @ HiPEAC 2014 in January 2014, or submit papers to TRUST'14 @ PLDI 2014
- Download the latest stable BSD-licensed Collective Mind framework and repository from SourceForge (1.0.2318beta)
- Access the pilot live Collective Mind repository
Since 2006, we have been working on a common methodology, infrastructure and repository to enable collaborative and reproducible research and experimentation in computer engineering, focusing on auto-tuning, co-design, machine learning and run-time adaptation of computer systems! Such an approach enables a new publication model where all research materials (artifacts) are continuously shared, validated and improved by the community! To set an example, we started collecting, unifying and releasing all benchmarks, data sets, models and tools with unified interfaces, at cTuning.org since 2008 and later at c-mind.org. In spite of the original hostility to this project from the academic community, we are glad to finally see similar initiatives at major conferences! However, our project is complementary and focuses more on the technological aspects of collaborative and reproducible research in computer engineering rather than just on sharing and validating artifacts. If you are interested in this community project, join our events and effort, collaborate, invest or contact Grigori Fursin (project founder) for more details!
Collaborative, systematic and reproducible computer engineering
Rapid advances in information technology and in all other fields of science have brought dramatic growth in the amount of data to process ("big data"). Scientists, engineers and students are drowning in experimental data and often have to divert their research towards data management, mining and visualization. Such work often requires additional interdisciplinary skills including statistical analysis, machine learning, programming and parallelization, database management, and Internet technologies, which few researchers have or can afford to learn in parallel with their main research work. Multiple frameworks, languages and public data repositories have recently appeared to enable collaborative data analysis and processing, but they often either cover very narrow research topics and are too simplistic (just data and code sharing), or are very formal and still require special programming skills, often including object-oriented programming.
Collective Mind technology (cM) attempts to fill this gap by providing researchers and companies with a simple, portable, technology-neutral and practically transparent way to gradually systematize and classify all their data, code and tools. The open-source cM framework and repository rely entirely on customizable public or private plugins (mostly written in Python, with support for any other language through the OpenME interface) to gradually describe and classify similar data and code objects, or to abstract the interfaces of ever-changing tools, thus effectively protecting researchers' experimental setups. cM helps to preserve any complex research artifact (collection of files, benchmarks, codelets, datasets, tools, traces, models) with a gradually and easily extensible JSON-based meta description that includes classification, properties and either direct or semantic data connections.
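As a rough illustration of what a gradually extensible JSON-based meta description could look like, the sketch below builds one in Python and extends it without touching existing keys. Every key name, the artifact identifiers and the `extend_meta` helper are hypothetical and chosen only for this example; they are not the actual cM schema or API.

```python
import json

# Hypothetical meta description for a shared benchmark artifact.
# All key names and identifiers are illustrative assumptions,
# not the actual cM schema.
meta = {
    "name": "example-matmul-benchmark",
    "classification": ["benchmark", "linear-algebra"],
    "properties": {"language": "C", "lines_of_code": 120},
    # Direct connections refer to other artifacts by identifier.
    "connections": {"dataset": "example-matrix-512", "compiler": "example-gcc"},
}


def extend_meta(description, extra):
    """Return a copy of the description with extra keys merged in.

    Nested dictionaries are merged recursively, so existing keys are kept
    and a description can grow gradually over time.
    """
    merged = dict(description)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = extend_meta(merged[key], value)
        else:
            merged[key] = value
    return merged


# Add a new property later without rewriting the original description.
extended = extend_meta(meta, {"properties": {"license": "BSD"}})

# The description round-trips through plain JSON text.
serialized = json.dumps(extended, indent=2, sort_keys=True)
restored = json.loads(serialized)
```

Because the description is plain JSON, it can be stored next to the artifact files and read by tools written in any language.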
Furthermore, meta descriptions of all data can be transparently and easily indexed using third-party ElasticSearch, enabling very fast and complex queries. At the same time, all research artifacts can be exposed to any public or workgroup user through unified web services to crowdsource experimentation, ranking, online learning and knowledge management. cM uses an agile top-down methodology originating from physics to represent any experimental scenario and gradually decompose it into connected plugins with associated data, or to compose it from already shared plugins, similar to "research LEGO". This universal structure immediately enables a replay mode for any experiment, making the framework suitable for recent projects on reproducibility of experimental results and for a new publication model where experiments and techniques are validated, ranked and improved by the community. For example, we easily moved all our past R&D on program and architecture multi-objective auto-tuning, co-design and dynamic adaptation to cM plugins, and we are gradually making them available together with all research artifacts at http://c-mind.org/repo. We hope that cM will be useful to a broad range of researchers and companies, either as an open-source, community-driven solution to systematize their research and experimentation, or possibly as an intermediate step before investing in more complex or commercial knowledge management systems.
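The idea of composing an experimental scenario from connected plugins and then replaying it can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the cM implementation: the `PLUGINS` registry, the `plugin` decorator, the `compile` and `run` plugins, and the `run_scenario` and `replay` functions are all hypothetical names, and the "measurement" is a deterministic stand-in rather than a real program run.

```python
import json

# Hypothetical plugin registry: each plugin maps a state dict to a new state dict.
PLUGINS = {}


def plugin(name):
    """Register a function as a plugin under the given name."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


@plugin("compile")
def compile_program(state):
    # Stand-in for invoking a compiler: record which flags were used.
    return {**state, "binary": "a.out[" + state["flags"] + "]"}


@plugin("run")
def run_program(state):
    # Stand-in for a real measurement: a deterministic fake cost.
    return {**state, "cost": len(state["binary"])}


def run_scenario(steps, state):
    """Run the named plugins in order, passing the state from one to the next."""
    for name in steps:
        state = PLUGINS[name](state)
    return state


def replay(record_json):
    """Re-run a scenario from its recorded JSON description."""
    record = json.loads(record_json)
    return run_scenario(record["steps"], record["input"])


# Describe the scenario as data, run it, then replay it from the saved record.
scenario = {"steps": ["compile", "run"], "input": {"flags": "-O3"}}
record_json = json.dumps(scenario)
result = run_scenario(scenario["steps"], scenario["input"])
replayed = replay(record_json)
```

Since the scenario is described purely as data (plugin names plus an input state), the saved record is enough to repeat the experiment, provided the plugins themselves behave deterministically.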
Related vision publications and presentations
- arXiv open-access technical report with long-term vision "Collective Mind: cleaning up the research and experimentation mess in computer engineering using crowdsourcing, big data and machine learning"
- GCC Summit 2009 publication introducing our vision and cTuning framework for collaborative and reproducible analysis, design and optimization of computer systems
- Long term vision slides - "Systematizing tuning of computer systems using crowdsourcing and statistics"
- cM basics slides - "Collective Mind infrastructure and repository to crowdsource auto-tuning"
Public repository of knowledge
Do not waste your research material - use Collective Mind Framework and Repository to describe, run and share your experiments with the community!
- Beta live Collective Mind repository (3rd generation, opened in 2013 and replacing the previous cTuning repository and infrastructure available since 2008) - we described and shared all our past research developments, codelets, benchmarks, data sets, models, statistical analysis, modeling and online learning plugins and tools to start top-down analysis and optimization of existing computer systems. We used it as the first practical example to motivate a new publication model where all research artifacts are continuously shared, validated and improved by the community. After many years, it seems that the community has finally started moving in this direction, and we even see some related initiatives at major conferences including OOPSLA and PLDI. However, our project is complementary and focuses more on the technological aspects of collaborative and reproducible research in computer engineering rather than just on sharing and validating artifacts.
Common infrastructure and support tools
- Collective Mind Infrastructure - plugin-based framework and repository for collaborative and reproducible research and experimentation
Discussions
- Collective Mind Google group - recent group covering broad aspects of reproducible computer engineering
- Older cTuning Google group - related to auto-tuning and machine learning (now outdated)
- Collective Mind possible extension projects
Events
Upcoming
- Workshop TRUST 2014 on reproducible research methodologies and new publication models @ PLDI 2014 (Edinburgh, UK)
- Workshop REPRODUCE 2014 on reproducible research methodologies and new publication models @ HPCA 2014 (Orlando, Florida, USA)
- Panel on reproducible research methodologies and new publication models at ADAPT 2014 @ HiPEAC 2014 (Vienna, Austria)
Past
- Thematic session on making computer engineering a science @ ACM ECRC 2013 / HiPEAC computing week 2013 (Paris, France)
- Thematic session on collective characterization, optimization and design of computer systems @ HiPEAC spring computing week 2012 (Goteborg, Sweden)
- Tutorial on Speedup-Test: Statistical Methodology to Evaluate Program Speedups and their Optimisation Techniques @ HiPEAC 2010 (Pisa, Italy)
- Tutorial on cTuning tools for collaborative and reproducible program and architecture characterization and auto-tuning @ HiPEAC computing systems week 2009 (Infineon, Munich, Germany)
Collective Mind is a community-based and continuously evolving project that uses an agile development methodology. Hence, interfaces and modules may change from time to time to provide the needed functionality. We are very thankful for your understanding, your patience and any help in extending and improving this framework while keeping it clean, simple and easy to use.