Making computer engineering a science:
Systematizing program and system analysis and optimization using auto-tuning, machine learning and social networking.
Continuing innovation in science and technology is vital for our society and requires ever-increasing computational resources. However, delivering such resources has become intolerably complex, ad-hoc, costly and error-prone due to the enormous number of available design and optimization choices, the complex interactions between all software and hardware components, and the large number of incompatible analysis and optimization tools. As a result, understanding and modeling the overall relationship between end-user algorithms, applications, compiler optimizations, hardware designs, data sets and run-time behavior, which is essential for providing better solutions and computational resources, has become practically infeasible, as confirmed by many recent long-term international research visions for future computer systems.
Since 1997, based on our interdisciplinary background, we have been engaged in a long and painful process of revisiting the research, development and publication methodology in computer engineering so that it favors collaborative discovery, systematization, sharing and reuse of knowledge. Motivated by practices in physics, biology and AI, we developed the first version of a public repository and infrastructure (cTuning.org) that allows researchers to share data (applications, data sets, codelets and architecture descriptions), modules (classification, predictive modeling, run-time adaptation) and statistics about the behavior of computer systems, either manually or automatically. A common infrastructure and repository lets users quickly reproduce and validate existing results and focus their effort on novel approaches combined with data mining, classification and predictive modeling, rather than spending considerable effort on building new tools with already existing functionality or relying on ad-hoc tuning heuristics. It will also allow conferences and journals to favor publications that can be collaboratively validated by the community.
The current cTuning framework is aging and is undergoing a major upgrade (repository, tools, models, interfaces). We plan to release it in summer 2012.
I have been asked many times why I developed the cTuning technology, particularly since my original background is not in computer engineering, so I decided to write a brief history of cTuning.
Around 1987, when I was young, I came across several books in our family library that heavily influenced my research path: "I, Robot" by Isaac Asimov, "Eye, Brain, and Vision" by D. H. Hubel, and a few Russian books on the architecture of modern computers and on semiconductor neural computers of the future. Since then, I have been extremely interested in robotics and in understanding how the brain and mind work, since it opens up so many exciting opportunities for society.
In 1993, just after high school, I joined the Moscow Institute of Physics and Technology, taking classes in semiconductor electronics, physics, mathematics, statistics and AI, while participating in several R&D projects that tried to build new semiconductor neural networks to mimic object recognition in the cortex. Though we had some success, the major problem I faced was the computer modeling of such networks, which became too time-consuming and unreliable and considerably slowed down my research. By 1995, I was spending most of my time trying to manually optimize my code in different languages including Basic, Pascal, C, C++ and Fortran, writing kernels in assembler and trying random transformations with random parameters, but I was held back by the lack of a common tuning methodology. In 1997, I decided to switch to computer engineering because I realized that without making the computers of that time faster, I would not be able to continue my research on brain modeling and neural computers.
For my MS project in 1997, I managed to get access to a supercomputer and started parallelizing and optimizing my neural network simulation software using MPI, but ever-changing hardware and the lack of unified tools and interfaces again slowed me down. At that time, the Internet and Java were gaining popularity in Russia, so for my MS thesis I proposed to abstract HPC systems and provide unified access to their resources as web services. I finished a working prototype in 1998, which attracted the attention of my colleagues at the University of Edinburgh.
Eventually, in 1999, I enrolled in a PhD program at the University of Edinburgh and proposed a project to systematize program optimization based on empirical auto-tuning combined with predictive modeling, something I had been working on since 1995, though mainly as hacking. However, the lack of common analysis and transformation tools again slowed me down. In fact, I spent the first years of my PhD developing program analysis and polyhedral transformation tools while struggling to reproduce the experimental results and algorithms of numerous papers. Furthermore, my research was again slowed down by the need to use extremely slow architecture simulators, just as I had originally been slowed down by modeling neural networks.
This forced me to search for alternative solutions based on my background in physics. I proposed to characterize program and architecture behavior by their reactions to modifications, even if those modifications break the original semantics. For example, we can remove memory accesses to find out whether a program is memory or CPU bound in a realistic environment, very quickly and without any simulation or hardware-counter sampling. By adding or removing random threads, we can detect whether the code and architecture are affected by cache contention, and so on. Furthermore, this approach helped me reduce optimization search spaces: if the code is memory bound, we should evaluate memory optimizations or reduce the processor frequency to save power; if the code is CPU bound, we should focus on parallelization, and so on. A small sketch of this "reaction to modification" idea is shown below.
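To make the idea concrete, here is a minimal sketch in C (my illustration, not the original cTuning tooling): the same kernel is timed with and without its memory accesses, and if removing the loads makes it dramatically faster, the kernel was probably memory bound. The kernel, array size and 2x threshold are assumptions chosen only for illustration; the code assumes a POSIX system and can be built with, for example, gcc -O2.

    /* Characterize a kernel by its reaction to a semantics-breaking change:
       if removing the memory accesses makes it much faster, it was memory bound. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (16 * 1024 * 1024)   /* 128 MB of doubles (illustrative size) */

    static double elapsed(struct timespec a, struct timespec b) {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) * 1e-9;
    }

    /* Original kernel: sums a large array, so memory traffic dominates. */
    static double kernel_original(const double *x) {
        double s = 0.0;
        for (size_t i = 0; i < N; i++) s += x[i];
        return s;
    }

    /* Modified kernel: same arithmetic, but the loads are removed
       (semantics intentionally broken; only the timing matters). */
    static double kernel_no_loads(const double *x) {
        double s = 0.0, v = x[0];
        for (size_t i = 0; i < N; i++) s += v;
        return s;
    }

    int main(void) {
        double *x = malloc(N * sizeof *x);
        for (size_t i = 0; i < N; i++) x[i] = 1.0;

        struct timespec t0, t1, t2;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        double a = kernel_original(x);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double b = kernel_no_loads(x);
        clock_gettime(CLOCK_MONOTONIC, &t2);

        double t_orig = elapsed(t0, t1), t_mod = elapsed(t1, t2);
        printf("original: %.3fs  without loads: %.3fs  (sums: %g %g)\n",
               t_orig, t_mod, a, b);
        printf("verdict: likely %s bound\n", t_orig > 2.0 * t_mod ? "memory" : "CPU");

        free(x);
        return 0;
    }

The same reaction-based reasoning extends to other modifications (adding or removing threads, changing processor frequency) without requiring simulators or hardware counters.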
At the same time, instead of developing new compilers from scratch, I decided to "open up" existing compilers using a simple event- and plugin-based system that I called the "Interactive Compilation Interface" (ICI). In 2004, I developed ICI for the Open64 and PathScale compilers, which allowed me to access and tune most of the internal transformations such as tiling, unrolling, prefetching, vectorization and array padding while using production-quality compilers. In 2005, I joined INRIA as a postdoc and continued developing ICI for GCC to enable fine-grain transformations, pass reordering and extraction of code properties; we used it in multiple collaborative projects, and it eventually moved to mainline GCC. A minimal plugin-style sketch follows below.
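For illustration, here is a minimal ICI-style hook written against GCC's stock plugin API (available in mainline GCC since version 4.5), which only reports the name of each pass before it runs; the real ICI additionally lets external tools select, reorder and parameterize passes. Exact headers and the opt_pass definition vary between GCC versions, so treat this as a sketch rather than a drop-in plugin.

    /* Minimal GCC plugin sketch: subscribe to the pass-execution event
       and print each pass name, in the spirit of ICI's event hooks. */
    #include <stdio.h>
    #include "gcc-plugin.h"
    #include "plugin-version.h"
    #include "tree-pass.h"

    int plugin_is_GPL_compatible;

    /* Called by GCC before every pass; event_data points to the pass. */
    static void on_pass_execution(void *event_data, void *user_data) {
        struct opt_pass *pass = (struct opt_pass *) event_data;
        if (pass && pass->name)
            fprintf(stderr, "ici-sketch: about to run pass %s\n", pass->name);
    }

    int plugin_init(struct plugin_name_args *plugin_info,
                    struct plugin_gcc_version *version) {
        if (!plugin_default_version_check(version, &gcc_version))
            return 1;  /* plugin built for a different GCC release */
        register_callback(plugin_info->base_name, PLUGIN_PASS_EXECUTION,
                          on_pass_execution, NULL);
        return 0;
    }

Loaded with -fplugin, such a hook gives external tools a view of (and, with more events, control over) the compiler's internal decisions without modifying the compiler itself.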
Finally, I was ready to get back to research on predictive modeling for program and architecture optimization, but faced yet another problem: machine learning requires a huge amount of training data for classification and modeling. Based on my experience in building web services to access supercomputers, I proposed to build an infrastructure for the MILEPOST project (2006-2009) to distribute training and modeling among multiple users and to enable data and model sharing through unified web services: the cTuning1 infrastructure was born. During the MILEPOST project I built the cTuning web portal, open-source tools and the MILEPOST GCC / cTuning CC compiler to transparently collect statistics about the behavior of computer systems from multiple users in the cTuning repository and to build predictive models on the fly. Now, instead of developing various ad-hoc benchmarks and models, we can build the most realistic data set ("big data") accessible by the community to explain anomalies in behavior, improve predictive models, and suggest how to optimize existing programs or build more efficient computer systems. A toy sketch of this flag-prediction idea is given below.
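As a toy illustration of this idea (not the actual MILEPOST models or feature set), the sketch below suggests compiler flags for a new program with a simple nearest-neighbor lookup over static program features collected from previously tuned programs; all feature names, feature values and flag choices are made up, and in practice features would be normalized before comparison.

    /* Toy flag prediction: find the previously tuned program with the most
       similar feature vector and reuse its best flags (1-nearest-neighbor). */
    #include <stdio.h>

    #define NFEATURES 3   /* e.g. basic blocks, memory-op ratio, branch ratio (illustrative) */
    #define NTRAIN    4

    struct sample {
        double features[NFEATURES];
        const char *best_flags;   /* flags that worked best for this program */
    };

    /* Tiny "shared repository" of previously tuned programs (made-up data). */
    static const struct sample train[NTRAIN] = {
        { { 120, 0.70, 0.10 }, "-O3 -funroll-loops" },
        { {  35, 0.20, 0.40 }, "-O2" },
        { { 400, 0.65, 0.05 }, "-O3 -floop-block" },
        { {  60, 0.15, 0.55 }, "-Os" },
    };

    /* Squared Euclidean distance between two feature vectors. */
    static double dist2(const double *a, const double *b) {
        double d = 0.0;
        for (int i = 0; i < NFEATURES; i++) d += (a[i] - b[i]) * (a[i] - b[i]);
        return d;
    }

    /* Return the flags of the nearest training program. */
    static const char *predict(const double *features) {
        int best = 0;
        double best_d = dist2(features, train[0].features);
        for (int i = 1; i < NTRAIN; i++) {
            double d = dist2(features, train[i].features);
            if (d < best_d) { best_d = d; best = i; }
        }
        return train[best].best_flags;
    }

    int main(void) {
        double new_program[NFEATURES] = { 110, 0.60, 0.12 };  /* made-up features */
        printf("suggested flags: %s\n", predict(new_program));
        return 0;
    }

The more programs, data sets and platforms the community contributes to the shared repository, the better such models can become, which is exactly why this training data should be collected collaboratively rather than per project.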
I would like to thank Michael O'Boyle, Olivier Temam and all my colleagues for interesting and sometimes tough discussions, feedback, collaboration and support during the development of the cTuning technology.
I strongly believe that cTuning-like technology is the missing piece of the puzzle for building efficient, adaptive computer systems using big data, predictive modeling and collective intelligence. I am happy that the cTuning technology helped to develop the first machine-learning-enabled compiler and was adopted by the Intel Exascale Lab in France in 2010 to help build future Exascale machines. I currently focus on the development of the new public cTuning3 framework (Collective Mind Infrastructure), which implements new ideas and promotes a new publication model in computer engineering. If you are interested in this technology, join our collaborative effort!