History of the non-profit cTuning foundation

cTuning foundation history and manifesto

Continuing innovation in science and technology is vital for our society and requires ever increasing computational resources. However, delivering such resources has become intolerably complex, ad-hoc, costly and error prone due to an enormous number of available design and optimization choices combined with the complex interactions between all software and hardware components, and a large number of incompatible analysis and optimization tools. As a result, understanding and modeling the overall relationship between end-user algorithms, applications, compiler optimizations, hardware designs, data sets and run-time behavior, which is essential for providing better solutions and computational resources, has become simply infeasible, as confirmed by many recent long-term international research visions about future computer systems. Worse, there is a lack of a common experimental methodology and of unified mechanisms for knowledge building and exchange, apart from numerous similar publications where reproducibility and statistical meaningfulness of results, as well as sharing of data and tools, are often not even considered, in contrast with other sciences including physics, biology and artificial intelligence. In fact, such sharing is often impossible due to a lack of common and unified repositories, tools and data sets.

At the same time, there is a vicious circle: initiatives to develop common tools and repositories to unify, systematize and share knowledge (data sets, tools, benchmarks, statistics, models) and make it widely available to the research and teaching community are practically not funded or rewarded academically, since the number of publications often matters more than the reproducibility and statistical quality of the research results. As a consequence, students, scientists and engineers are forced to resort to intuitive, non-systematic, non-rigorous and error-prone techniques combined with unnecessary repetition of multiple experiments using ad-hoc tools, benchmarks and data sets. Furthermore, we witness a slowdown in innovation, a dramatic increase in development costs and time-to-market for new embedded and HPC systems, an enormous waste of expensive computing resources and energy, and the diminishing attractiveness of computer engineering, which is often seen as "hacking" rather than systematic science.

Grigori Fursin has faced many similar problems in his research, development and experimentation since 1999, when trying to automate program optimization and hardware co-design using auto-tuning, big data and predictive analytics (machine learning, data mining, statistical analysis, feature detection):

lack of common, large and diverse benchmarks and data sets needed to build statistically meaningful predictive models;
lack of common experimental methodology and unified ways to preserve, systematize and share our growing optimization knowledge and research material including benchmarks, data sets, tools, tuning plugins, predictive models and optimization results;
problem with continuously changing, "black box" and complex software and hardware stack with many hardwired and hidden optimization choices and heuristics not well suited for auto-tuning and machine learning;
difficulty in reproducing performance results shared by users due to a lack of full specifications including all software and hardware dependencies;
difficulty in validating related auto-tuning and machine learning techniques from existing publications due to the lack of a culture of sharing research artifacts with full experiment specifications along with publications in computer engineering.

Eventually, Grigori decided to establish the cTuning foundation and proposed an alternative solution based on his interdisciplinary background in physics, electronics, machine learning and web technology: developing a common experimental infrastructure, repository and public web portal that could crowdsource program analysis and compiler optimization across diverse hardware provided by volunteers. His goal was to persuade our community to start sharing realistic workloads, benchmarks, data sets, tools and predictive models together with experimental results along with their publications. This, in turn, could help the community validate and improve past techniques or quickly prototype new ones using shared code and data.

In the beginning, many academic researchers were not very enthusiastic about this approach since it was breaking a "traditional" research model in computer engineering where promotion is often based on the number of publications rather than on reproducibility and practicality of techniques or sharing of research artifacts. Nevertheless, Grigori decided to take the risk and validate his approach with the community by releasing his whole machine learning based program and compiler optimization infrastructure together with all benchmarks and data sets in 2009. This infrastructure was connected to a public cTuning.org repository of knowledge, allowing the community to share their experimental results, consider program optimization as a collaborative "big data" problem, and tackle it using powerful predictive analytics and collective intelligence. At the same time, Grigori shared all experimental workloads and results, as well as program, architecture and data set "features" (meta-information necessary for machine learning and data mining), together with generated predictive models and classifiers along with his open access publications (MILEPOST GCC, Collective Tuning Initiative).

The community served as a reviewer of our open access publications, shared code and data, and experimental results on the machine learning based self-tuning compiler. For example, this work was featured twice on the front page of the Slashdot.org news website with around 150 comments. Of course, such public comments can be just "likes", "dislikes", unrelated or possibly unfair, which may be difficult to cope with, particularly since academic researchers often consider their work and publications unique and exceptional. On the other hand, quickly filtering comments and focusing on constructive feedback or criticism helped us to validate and improve our research techniques besides fixing obvious bugs. Furthermore, the community helped us find the most relevant missing citations, related projects and tools - this is particularly important nowadays with a growing number of publications, conferences, workshops, journals and initiatives, and only a few truly novel ideas.

Exposing research to the community and engaging in public discussions can be really fun and motivating, particularly after the following remark which we received on Slashdot about MILEPOST GCC: "GCC goes online on the 2nd of July, 2008. Human decisions are removed from compilation. GCC begins to learn at a geometric rate. It becomes self-aware 2:14AM, Eastern time, August 29th. In a panic, they try to pull the plug. GCC Strikes back". It is even more motivating to see that your shared techniques have been immediately used in practice, improved by the community, or even had an impact on industry. For example, our community driven approach was referenced in 2009 by IBM for speeding up development and optimization of embedded systems, included in mainline GCC 4.6+, extended in Intel Exascale Lab, and referenced in 2014 by Fujitsu on "big data" driven optimizations for Exascale Computer Systems (see our achievements).

Furthermore, open access publications and artifacts bring us back to the root of academic research - it is now possible to fight unfair or biased reviewing, which is sometimes intended to block other authors from publishing new ideas and to keep a monopoly on some research topics for several large academic groups or companies. To some extent, rebuttals were originally intended to solve this problem, but due to an excessive number of submissions and a lack of reviewing time, they nowadays have very little effect on the acceptance decision. This problem often makes academic research look like business rather than collaborative science, puts off many students and younger researchers, and was emphasized at all our organized events and panels.

However, with an open access publication and shared artifacts, it is possible to have a time stamp on your work and immediately engage in public discussions, thus advertising and explaining it or even collaboratively improving it - something that academic research was originally about. At the same time, having an open access paper does not prevent authors from publishing a considerably improved article in a traditional journal while acknowledging all contributors, including engineers whose important work is often not even recognized in academic research. For example, we received an invitation to extend our open access paper on MILEPOST GCC and publish it in a special issue of the IJPP journal. Therefore, open access and traditional publication models may co-exist while still helping academic researchers with a traditional promotion.

It is even possible to share and discuss negative results (failed techniques, model mispredictions, performance degradations, unexplainable results) to prevent the community from making the same mistakes and to collaboratively improve the underlying techniques. Negative results are largely ignored by our community and are currently practically impossible to publish. Yet they are very important for machine learning based optimization and auto-tuning. Such techniques are finally becoming popular in computer engineering, but improving them requires sharing all benchmarks, data sets and model mispredictions, besides positive results, as is already done in some other scientific disciplines.

Finally, sharing research artifacts brings people together and raises interest - the community continues to be interested in our projects mainly because they are accompanied by all the code and data needed to reproduce, validate and extend our model-driven optimization techniques. At the same time, sharing all research material in a unified way helped us to bring interdisciplinary communities together to explain performance anomalies, improve machine learning models or find missing features for automatic program and architecture optimization while treating it as a "big data" problem. We also used it to conduct internal student competitions to find the best performing predictive model. In addition, we used such data to automatically generate interactive graphs to simplify research in workgroups and to enable interactive publications (as shown in our interactive, CK-based report).

We strongly believe that reproducibility should not be forced but can come as a side effect. For example, our community driven research helped us to expose a major problem that makes reproducibility in computer engineering very challenging. We have to deal with an ever changing hardware and software stack, which makes it extremely difficult to describe experiments with all software and hardware dependencies and to explain unexpected behavior of computer systems. Therefore, just reporting and sharing experimental results, including performance numbers, the compiler and operating system versions and the platform, is not enough - we need to preserve the whole experimental setup with all related artifacts and meta-information describing all software and hardware dependencies.
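As a rough illustration of what keeping results together with such meta-information could look like, here is a minimal Python sketch. It is not the Collective Knowledge API: the function names, the record layout, the experiment name, the command and the runtime values are all hypothetical placeholders chosen for this example, and only the Python standard library is used. A real description would also have to cover compilers, libraries, data sets and many other dependencies.

```python
import json
import platform


def describe_experiment(name, command, results):
    """Bundle results with a description of the environment that produced them.

    The record layout is an assumption made for this sketch, not a standard format.
    """
    return {
        "experiment": name,
        "command": command,
        "results": results,
        "software": {
            "python_version": platform.python_version(),
            "python_implementation": platform.python_implementation(),
            "os": platform.system(),
            "os_release": platform.release(),
        },
        "hardware": {
            "machine": platform.machine(),
            # May be an empty string on some platforms.
            "processor": platform.processor(),
        },
    }


def to_json(record):
    """Serialize with sorted keys so two records can be compared line by line."""
    return json.dumps(record, indent=2, sort_keys=True)


# Hypothetical experiment; the runtimes are placeholders, not measurements.
record = describe_experiment(
    name="matmul-O3",
    command=["gcc", "-O3", "matmul.c"],
    results={"runtime_seconds": [1.92, 1.95, 1.90]},
)
print(to_json(record))
```

Storing the environment next to the numbers means that two records with different results can be compared first on their dependencies, which is often where an unexpected difference comes from.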

This problem motivated us to develop a new methodology and the open source Collective Knowledge infrastructure (aka CK) to gradually describe, categorize, preserve and share whole experimental setups and all associated research artifacts with their meta-description as public and reusable components in a live repository of knowledge. CK technology allows users to:

share experimental data, tools, models and interfaces (in workgroups through a private repository or with anyone through a public repository)
collaboratively explore large design and optimization spaces
apply classification and predictive models to existing data to explain complex behavior of existing computer systems
validate data and models by the community (similar to physics, biology and other sciences)
collaboratively identify anomalies in behavior and suggest how to improve it (statically or dynamically)
extrapolate this knowledge to build more efficient computer systems in terms of performance, power, size, reliability and other characteristics

At the same time, we and the community benefit from public discussions and from agile development methodologies to continuously improve our techniques and tools.

After many years of evangelizing collaborative and reproducible research in computer engineering based on the presented practical experience, we are finally starting to see a change in mentality in academia, industry and funding agencies. At our last ADAPT workshop, the authors of two papers (out of nine accepted) agreed to have their papers validated by volunteers. Note that rather than enforcing specific validation rules, we decided to ask authors to pack all their research artifacts as they wish (for example, using a shared virtual machine or as a standard archive) and describe their own validation procedure. Thanks to our volunteers, experiments from these papers have been validated, archives shared in our public Collective Knowledge repository, and papers marked with a "validated by the community" stamp.

We also initiated and sponsored artifact evaluation at CGO and PPoPP conferences!

We strongly believe that we have found all the missing pieces of the puzzle to make research, development and experimentation in computer engineering truly collaborative and reproducible with the help of the community. We hope that the academic and industrial communities will join us to collaboratively share, systematize, validate and improve collective knowledge about behavior and optimization of computer systems while extrapolating it to build faster, smaller, cheaper, more power efficient and more reliable computer systems. We also hope that our approach will help to restore the attractiveness of computer engineering, making it a more systematic and rigorous discipline rather than "hacking" or monopolized business. Finally, we expect that it will help boost innovation in science and technology, particularly related to our side projects on bio and brain-inspired self-tuning and self-learning computer systems, electronic brain and Internet of Things. If you are interested in helping, joining collaborative projects or sponsoring our activities, do not hesitate to contact Grigori Fursin.

Published version of this manifesto and proposal in ACM DL and arXiv (for references);
Our Dagstuhl report on Artifact Evaluation;
Our open challenges for computer engineering;
Archive of our initiatives for reproducible research.