Tools:CM:Specification:V1.0

Disclaimer

The alpha variant of this document has been written by Grigori Fursin very quickly during development of the cM framework. Specification may still be changing until the official release. Please contact us if you notice mistakes or inconsistencies to collaboratively improve this document!

Abbreviations

cM - Collective Mind Framework
CMR - Collective Mind Repository
UOA - cM UID or alias

cM concept and design motivation

Current Collective Mind concept and design accumulates all our past 20 years of R&D experience:

In our past research, we spent most of our time not on analyzing data and checking novel research ideas, but on dealing with ever-changing tools, architectures and huge amounts of heterogeneous data. Therefore, we decided to use wrappers to abstract all tools. These wrappers became cM plugins (modules): for example, source code has an associated module code.source, compiler - ctuning.compiler, binary - code, dataset - dataset, etc.
Various cM modules may have different functions (actions), such as code.source build to build a project, code run to run code, etc. Therefore, to unify access to modules, we use a command line front-end cm, which allows one to access all modules and their functions using cm <name of the module> <action> parameters.
For each module, we may need to store some associated data: for a dataset, the real data set itself (image file, video file, text file, audio, etc); for source code, all the source files including Makefiles, etc. Previously, we used MySQL, but it was time-consuming and complex to extend it for each module (we had to rebuild tables, check all relations, etc) or to keep binary data. Also, if some experimental data goes wrong, it takes a long time to "clean up" and update the repository. Finally, as researchers, we often want direct access to our experimental files, which is why we often keep myriads of csv files, etc. Therefore, for cM, we decided to use our own very simple directory and file based repository: a cM repository can be inside any directory and starts with a .cmr root directory, followed by the UID or alias of the module and then the UID or alias of associated entries.

Another problem that we faced in our past research was dealing with the evolution of our own software. Hence, we decided to provide unique IDs for each module and data entry while allowing high-level aliases, e.g. module code.source has cM UID 45741e3fbcf4024b. We can call high-level modules or data using an alias, but when the module API changes dramatically (not just extended while keeping backwards compatibility), we keep the alias but change the UID! Most cM modules can deal with both UID and alias - this combination is called cM UOA (UID or Alias). Since a repository is also data, it has its own UID. Therefore, any data can be found using either <module UOA>:<data UOA>, in which case it is searched through all available repositories, or using <repo UOA>:<module UOA>:<data UOA>. The unique data identifier in cM is called CID (cM ID) and has the format (<repo UOA>:)<module UOA>:<data UOA>.
Naturally, such design is very flexible but can be slow for search, etc. However, such design is very easy to combine with existing indexing tools. We decided to use ElasticSearch that works directly with JSON and can perform fast search and complex queries. We provided support for on-the-fly indexing of data in cM.
Yet another problem that we had was the use of different frameworks when we wanted to either just run experiments (mobiles, GRID, cloud, supercomputers) or perform analysis or provide web front-end or build graphs, etc. Now, we can use the same framework with various module selection (minimal cM core is only around 500KB).
Interestingly, modules are also entries inside repositories making it possible to continuously evolve framework and models when more "knowledge" is available.
We added module "class" to start gradually classifying all data entries. We can also rank useful data entries (that can be most profitable compiler optimizations or models, etc).
Since format of cM is now open and easily extensible, we can easily combine auto-tuning and expert knowledge (as module ctuning.advice, for example).

Now, we believe that we have a framework that is easy to extend to continue collaborative systematization of characterization, optimization, learning and design of computer systems:

We use gradual top-down decomposition and learning of computer systems (to keep complexity under control, balance our return on investment (analysis/optimization cost vs benefit) and gradually improve our knowledge):

[Figure: cM top-down decomposition and learning]

cM UIDs and aliases

All cM entries (including modules which are treated as cM entries too) have a unique ID to be able to easily share data in P2P environments or dedicated repositories. cM UID is a lowercase string of 16 hexadecimal digits, such as a8aac7d3fec45ca9. Entries are stored as directories with a given UID.

However, for user convenience each entry may have a hardwired alias or display as alias. Hardwired alias means that it will be used for entry's directory name instead of UID. Display as alias means that directory name for a given entry will still be UID but an associated meta data will include a key 'display_as_alias' - this is useful when visualizing data in cM web front-end.

Aliases can have only alphanumeric characters and '_', '-', '.' for cross OS compatibility (since they will be used as directory names on Linux and Windows).

In most cases, cM can automatically find an entry by either UID or alias; this combination is referred to as cM UOA (UID or alias). Otherwise, we use UID or alias explicitly.
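
The rules above (a UID is 16 lowercase hexadecimal digits; an alias may contain only alphanumeric characters and '_', '-', '.') can be checked with a few lines of Python. This is a minimal sketch, not the cM kernel code: the function names are hypothetical, and generating a UID from a random UUID is an assumption, since this document does not say how cM generates UIDs.

```python
import re
import uuid

# Patterns derived from the rules stated in this section.
UID_RE = re.compile(r'[0-9a-f]{16}')
ALIAS_RE = re.compile(r'[A-Za-z0-9_.\-]+')


def gen_uid():
    """Return 16 random lowercase hex digits (assumed generation scheme)."""
    return uuid.uuid4().hex[:16]


def is_uid(s):
    """True if s has the shape of a cM UID."""
    return UID_RE.fullmatch(s) is not None


def is_valid_alias(s):
    """True if s uses only characters allowed in a cM alias."""
    return ALIAS_RE.fullmatch(s) is not None
```

Note that a string shaped like a UID also passes the alias check, so a caller that accepts a UOA still has to decide which interpretation to try first.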

Repository

The core of the cM framework is the adaptive repository, i.e. a repository that can easily evolve during decomposition of complex systems. We decided to build a very simple and portable NoSQL repository that can be installed on any computer system including mobiles, tablets, desktops, HPC servers and cloud services. We decided to use a directory and file based repository with json description that is very portable, scalable and can be easily indexed by third-party indexing services such as ElasticSearch.

The current directory structure of any cM repository is:

.cmr/                                           # Repository directory.
.cmr/UOA of module/                             # First level directory associated with a given module.
.cmr/UOA of module/UOA of data                  # Second level directory with data associated with a given module.
                                                #  This directory can contain any files and directories including traces,
                                                #  benchmarks, data sets, executables, models, archives, pdfs, csv files, etc.
.cmr/UOA of module/UOA of data/.cm/config.json  # Data description: extensible meta data in json that summarizes data content.
                                                #  Whenever data is accessed, only this file is loaded first and depending on the content and module action,
                                                #  other data can be accessed too.
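
Given this layout, reading the meta data of an entry amounts to building one path and parsing one json file. The sketch below assumes the layout shown above and reuses the cm_return and cm_data_obj/cfg keys that appear elsewhere in this document; the function names and the cm_error key are hypothetical, and alias handling is ignored.

```python
import json
import os


def entry_config_path(repo_dir, module_uoa, data_uoa):
    """Path of the meta data file of an entry inside a repository directory."""
    return os.path.join(repo_dir, '.cmr', module_uoa, data_uoa,
                        '.cm', 'config.json')


def load_entry_meta(repo_dir, module_uoa, data_uoa):
    """Load only the json description; other files of the entry stay untouched."""
    path = entry_config_path(repo_dir, module_uoa, data_uoa)
    if not os.path.isfile(path):
        # 'cm_error' is a hypothetical key used here for illustration only.
        return {'cm_return': 1, 'cm_error': 'entry not found: ' + path}
    with open(path, encoding='utf-8') as f:
        return {'cm_return': 0, 'cm_data_obj': {'cfg': json.load(f)}}
```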

Note that when a directory holds UOA entries and an alias is used instead of a UID, this directory will also include a .cm subdirectory that disambiguates UIDs and aliases using 2 simple text files:

alias-a-<hardwired_alias_name> that contains associated UID
alias-u-<UID> that contains hardwired alias name

We need such a structure to speed up access to entries with aliases while avoiding search (i.e. an entry can be loaded directly).
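
The two alias files make both lookups a single file read. The following sketch assumes the files live in the .cm subdirectory of the directory that holds the entries and contain nothing but the UID or the alias; the function names are hypothetical and this is not the cM kernel implementation.

```python
import os


def _read_text(path):
    """Return stripped file content, or None if the file does not exist."""
    if not os.path.isfile(path):
        return None
    with open(path, encoding='utf-8') as f:
        return f.read().strip()


def alias_to_uid(parent_dir, alias):
    """Look up the UID stored in alias-a-<alias>."""
    return _read_text(os.path.join(parent_dir, '.cm', 'alias-a-' + alias))


def uid_to_alias(parent_dir, uid):
    """Look up the hardwired alias stored in alias-u-<UID>."""
    return _read_text(os.path.join(parent_dir, '.cm', 'alias-u-' + uid))


def entry_dir(parent_dir, uoa):
    """Directory of an entry given either its UID or its alias.

    An entry with a hardwired alias is stored under the alias, so a UID
    is first translated through alias-u-<UID>; anything else is used as is.
    """
    alias = uid_to_alias(parent_dir, uoa)
    return os.path.join(parent_dir, alias if alias is not None else uoa)
```

With these helpers no directory scan is needed: a UID is mapped to its directory name by one file read, and an alias is already the directory name.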

Modules

Since we envision that community will be sharing both data and modules, we found an interesting solution to keep modules as data in the repository thus simplifying and unifying the whole cM framework. cM module 'module' provides an abstraction to deal with modules in the framework. Hence all modules are kept in the following directory in the repository:

.cmr/module/UOA of module
.cmr/module/UOA of module/.cm/config.json      # Expose properties, characteristics, choices of the module/data or associated tool.
                        /module.py            # Code of module with actions. Core of cM framework is written in Python 
                                              #  but we are developing API to access cM modules and data from any other language.
                                              #  We hope that community will help with this effort.
                        /c/module.c
                        /cpp/module.cc     
                        /fortran/module.f
                        /php/module.php

Implementation and compatibility

We envision that the cM framework will constantly evolve, hence we decided to implement all cM logic (kernel, access to repository, web server, visualization, predictive modeling, security, etc) as modules inside the cM repository. Instead of providing versions of the framework and all modules, we use the module UID as a version number. If a module evolves with a backwards compatible API, neither its alias nor its UID changes. However, if the API or associated data becomes backwards incompatible, a new module should be created with a new UID, while the alias may remain unless the meaning changed. Internally, modules should call other modules only through UID and not through alias to solve compatibility issues: multiple versions of a module can co-exist, but other modules should be explicitly upgraded to support new modules. This also solves data compatibility - we do not allow mixing data from incompatible versions of a module.

TBD: Version and build number of a module can be provided to keep track of changes.

Single access function

For simplicity and unification, we decided to have one access function to cM that accesses all cM modules and their functions, with a dictionary (which can be directly represented as json) as input and output, to allow simple mixing of all modules together no matter which language they are written in. Furthermore, any external script, program, architecture and tool in any language can be easily and universally connected to cM through the light-weight OpenME interface. For now, only the python and php parts are implemented, while the API for other languages calls the cm command line. We are gradually and collaboratively extending the OpenME interface to support multiple languages.

Basic cM functionality (accessing the repository, dealing with cM UOA, authenticating users, etc) is implemented in the kernel module. This module is always available in all other python modules as cm_kernel.

Here is an example of loading entry os:android-generic-32 in some module:

r=cm_kernel.access({'cm_run_module_uoa':'os',
                   'cm_action':'load',
                   'cm_data_uoa':'android-generic-32'})
if r['cm_return']>0: return r

That's all! If an error happens, the if statement above returns the resulting dictionary to the caller, and the cM framework then deals with it properly. If the command was successful, you can get the data as a dictionary:

data=r['cm_data_obj']['cfg']

Note, that this command is equivalent to cM CMD invocation as:

cm os load cm_data_uoa=android-generic-32
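
The equivalence between the two invocations can be sketched as a small translation from command line arguments to the input dictionary. This is only an illustration: it assumes that every argument after the module and action has the form key=value, and the function name is hypothetical; the real cm front-end may accept more forms.

```python
def cmd_to_dict(argv):
    """Translate ['<module>', '<action>', 'key=value', ...] into an access dictionary."""
    if len(argv) < 2:
        raise ValueError('expected: <module> <action> [key=value ...]')
    d = {'cm_run_module_uoa': argv[0], 'cm_action': argv[1]}
    for arg in argv[2:]:
        key, sep, value = arg.partition('=')
        if not sep or not key:
            raise ValueError('expected key=value, got: ' + arg)
        d[key] = value
    return d
```

For the command above, cmd_to_dict(['os', 'load', 'cm_data_uoa=android-generic-32']) yields the same dictionary that is passed to cm_kernel.access in the Python example.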

cM has several internal parameters such as:

cm_console - controls CMD output and can be 'txt', 'json', 'json_with_indent', 'web' (used for plugins that return html or web data)
cm_user_uoa - user UOA in case authentication is needed

TBD

Framework directory structure

bin/                            # cm and cm.bat - scripts to call kernel module in python

repos/                          # directory with several repositories
repos/default/.cmr              # default cM repository with all modules and data for basic cM functionality
repos/default/.cmr/module/      # directories with all basic cM modules

repos/ctuning/.cmr/             # repository for collective characterization and optimization of computer systems
repos/ctuning-experiments/.cmr/ # repository for temporal experiments during optimization of computer systems

docs/                           # some documentation in text format
logs/                           # logs for web and index servers

scripts/                        # various demo scripts (examples, development, etc)

tmp/                            # directory for temporary files such as generation of plots, etc

Default data entry dictionary

Data in cM is represented as extensible no-schema dictionary that is stored as human-readable json file that can be directly edited if needed. Besides meta-description of the data itself, this dictionary can contain various cM internal keys:

{
"cm_access_control": {                     # Controls access to this entry (only in web services when auth is on)
 "comments_groups": "admin",              # Groups that can comment or rank entries
 "read_groups": "registered",             # Groups that can read entries
 "write_groups": "owner"                  # Groups that can write entries
}, 
"cm_classes_uoa": [                        # List of classes this entry belongs to
 "62a455f8c8042f90"                       
], 
"cm_description": "",                      # General description
"cm_display_as_alias": "",                 # User friendly alias when visualizing this data
"cm_dissemination": {                      # Whether this entry was publicly disseminated
 "publications": [                        #  for example, through publication, etc
   "0c44d9a2db3de3c9"                     
 ]
}, 
"cm_updated": [                            # When entry was updated
 {
   "cm_iso_datetime": "2012-04-12",       # Date and time in python format
   "cm_module_uid": "8a7141c59cd335f5",   # Which module updated this entry
   "cm_user_uoa": "0728a400aa1c86fe"      # Which user updated this entry
 }
], 
"powered_by": {                            # Which version of cM was used to create this entry
 "name": "Collective Mind Engine", 
 "version": "1.0.1977.beta"
}
...                                        # No-schema meta-data for the entry
}

Default data entry description

We deliberately use no-schema data representation in cM to allow researchers to quickly prototype ideas rather than spend valuable time on preparing strictly typed, fixed data structures that may evolve over time or be useless in the future. Only if a prototype is successful and is planned to be released with cM should the user provide a description of the data.

Data description of a cM entry can be found in a data description of an associated module under cm_data_description key. It has the following format:

{
 "<FLAT_KEY1>": <DESC1>,
 "<FLAT_KEY2>": <DESC2>
 ...
}

<FLAT_KEY> describes any key in the dictionary hierarchy. It always starts with #, followed by #key for each dictionary key or @number for each index in a list.

For example, <FLAT_KEY> for key c in dictionary {"a":[{"c":"d"}]} is ##a@0#c
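
A minimal sketch of how such flat keys can be produced by walking a nested dictionary is given below. It follows only the rule and the example stated above and is not the cM kernel flattening function; in particular, how cM treats empty dictionaries and lists is not specified here, and this sketch simply produces no key for them.

```python
def flatten(obj, prefix='#'):
    """Map every leaf value of a nested dict/list to its flat key."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, prefix + '#' + str(key)))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, prefix + '@' + str(index)))
    else:
        flat[prefix] = obj
    return flat


print(flatten({"a": [{"c": "d"}]}))  # {'##a@0#c': 'd'}
```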

<DESC> is the description dictionary of format:

{
 "type": "",                     # Choices are text, textarea, dict, list, url, integer, float, email, UOA or choice
 "desc_text": "",                # Description text that will be visualized
 "default_value": "",            # Default value
 "sort_index": "",               # Sort index when visualizing
 "has_choice": "",               # 'yes' or 'no'
 "choice": [...],                # Choices if type is choice or has_choice==yes

 "cm_module_uoa": "",            # Module UOA if type is UOA

 "disable": "",                  # 'yes' or 'no' - disable during visualization

 "explorable": "",               # if 'yes', this is a tuning dimension that can be explored
 "forbid_disable_at_random": ""  # if 'yes', forbid disabling during random exploration
}

Note: when entry is visualized using cM web services, this description is used to prepare user-friendly visualization of json with various choices or dependencies.

TBD: add all types of keys and dependencies!!!

cM kernel and configuration

Since all functionality in cM is in modules, the most low-level functionality is implemented in the module kernel while associated data entries keep basic cM configuration. When some cM module/function is invoked, cm bootstraps and initializes itself using configuration from cM data entries kernel:local or kernel:default (or from environment variable CM_DEFAULT_CFG (points to json file)). Then, any module has 2 objects:

ini dictionary that includes various cM parameters and configuration
cm_kernel module

The dictionary most useful to users and developers is ini['cfg'], which is the configuration of the invoked module, i.e. the data of entry module:<invoked_module>.

cm_kernel object provides users cM configuration as dictionary in cm_kernel.ini['dcfg'], i.e. data of entry kernel:local or kernel:default or referenced by environment variable as described above.

cm_kernel also provides various bootstrap parameters and all cM low-level functions which are listed together with their API here.

These low level functions include:

loading module and data
generate or check cM UID
creating/deleting/updating data entries
finding entries
creating/deleting aliases
flattening/de-flattening dictionaries
loading/saving json
getting list of all UOAs in a path
unicode printing for console and web (to support Python 3.x in the future)
smart merging of dictionaries
converting string to SHA1
authenticating users
main single access function and variations (remote access through web-service, through CMD, as string, etc)

Note, that later, we provided higher-level and more user-friendly abstraction for repositories (module repo) and for dealing with searching/adding/updating/deleting data (module core) so users will most of the time use core or repo to access repository.

cM module repo

Module repo enables multiple repositories in cM that can be shared (SVN/GIT), remote (using cM web-services), excluded from search, etc. Basically adding or deleting repositories means that an associated repo entry describing repository is created or deleted from cM.

Module repo provides some high level functions to get a list of repositories, perform search for a given entry, download remote repository, copy or clean it, etc, which are described here.

cM module core

Most of the high-level functionality to deal with data entries in cM is implemented in module core. It includes:

adding data entry
loading data entry
updating data entry
deleting data entry
renaming data entry
finding data entry
listing data entries
searching in data entries

All functions are described here.

Note that all these core cM functions are inherited in all other modules. For example, it is possible to list all entries using the same function in all modules:

cm kernel list
cm repo list
cm processor list

We included the cmx tool in cM, which allows performing all of the above functions in a user-friendly manner from CMD. Current functions in cmx include (retrieved using cmx --help):

cM simplified and user friendly command line module to deal with cM repositories

 cmx test                                                        - test cM functionality (alpha)

 cmx repo add/create (repo name)                                 - create repo in the current directory (.cmr)
                                                                    or import existing one;
cmx repo rm/del/delete (-f) (entry name)                        - delete repository reference
cmx repo mv/move/ren/rename (old entry name) (new entry name)   - rename repository reference
cmx repo update/co/checkout (repo name)                         - update shared repository
cmx repo share/commit (repo name)                               - commit to shared repository
cmx repo download (repo name)                                   - download repo from the web

 cmx add <entry name>/CID (param1=value1 ... @file1.json ...)    - add entry to current repository/module
                                                                    local repository by default
cmx update <entry name>/CID (param1=value1 ... @file1.json ...) - update entry in current repository/module
                                                                    local repository by default
cmx rm/del/delete (-f) <entry name>/CID                         - delete entry from the current repository/module
                                                                    local repository by default
cmx svnrm/svndel/svndelete <entry name>/CID                     - delete entry from the current repository/module using SVN
                                                                    local repository by default
cmx ren/rename <old entry name> <new entry name>                - rename entry in the current repository/module
                                                                    local repository by default
cmx mv/move <old CID> <new CID>                                 - move entry
                                                                    local repository by default
cmx cp/copy <old CID> <new CID>                                 - copy entry
                                                                  -k - keep old UID
cmx clean alias1 alias2 ...                                     - remove orphaned aliases

 cmx info/load (CID)                                             - show all info about entry (in json)
                                                                    global repository by default
cmx find (CID)                                                  - show path to the entry
                                                                    global repository by default
cmx list <module>                                               - list data for a given module
                                                                    global repository by default
cmx index (info)                                                - show status of indexing server
cmx index on                                                    - turn on usage of indexing by cM
cmx index off                                                   - turn off usage of indexing by cM
cmx index test                                                  - test indexing server
cmx index flush                                                 - clear the whole index
cmx index CID                                                   - index entry or entries
                                                                    local repository by default
                                                                    data can include patterns * and ?
cmx search (CID key1=value1 value2 ...)                         - search data
                                                                    if indexing server is on, use it,
                                                                    otherwise very slow
                                                                    if indexing is on, can use wildcards * and ?
                                                                    -s turns on progress for internal slow search
                                                                    -t=timeout - sets timeout for local slow search
cmx web_view/wv (CID)                                           - view entry in cM web front-end
cmx web_update/wu (CID)                                         - update entry in cM web front-end

 cmx uid                                                         - generate cM UID

 cmx restore_uid_from_alias <alias>                              - restore UID from alias file

 cmx password                                                    - prepare cM SHA1 password
                                                                   (if -s or --short, output only SHA1 password)

 cmx                                                             - print current CID
                                                                    global repository by default

 cmx help                                                        - help

Additional flags:
 -h, --help, -?                                                 - this help
 -g, --global                                                   - apply command to global repository
 -l, --local                                                    - apply command to local repository
                                                                    (otherwise global by default)
 -c, --class="class1,class2,..."                                  - use classes for the command
                                                                    for example, add or list
                                                                    (it can be both UID or alias)
                                                                    (all classes should be present in an entry)
                                                                    (alias can include wild cards * and ?)
 -q, --quiet                                                    - quiet mode when deleting files, etc
 -f, --full                                                     - add repo to CID if needed
 -j, --json                                                     - return json if applicable

cM packages

When performing long-term experiments, it often happens that various tools evolve and users need either to re-run experiments or even to run experiments with several tools co-existing in the system at the same time. Therefore, we provided a notion of universal packages (module package) in cM to describe the installation process of various tools, dependencies on host OS, target OS, processor, compiler and any other package, and environment variables.

When a package is installed, a new entry with a cM UID is created under the code module, where the installation code will reside. At the same time, an OS script is created in $CM_ROOT/bin with the name cm_code_env_<UID>.sh (for Linux) or in %CM_ROOT%\bin with the name cm_code_env_<UID>.bat (for Windows) that describes all PATHs, LIBs, and any other necessary environment variables for a given package.

Now, various packages can easily co-exist in the system: GCC, LLVM, ICC, Open64 with different versions; different libraries (lapack, blas, magma); various auto-tuning plugins or even codelets targeting CPU/CUDA/OpenCL. This elegantly solves the problem of experiments and reproducibility in heterogeneous and rapidly evolving systems!

Note, that user can manually call the above script to experiment with a given tool, or cM will automatically call such a script in high-level experiment scenarios (for example, in program and architecture tuning and learning pipeline).

Packages can be easily installed using cM web front-end (Usage -> Install packages).

Multiple repositories

There can be any number of repositories in cM. Repositories are represented by module repo in cM. When adding a new repository, an associated entry is created in cM that provides the path or URL of the repository and various meta information, i.e. if the repository is shared in the workgroup using SVN (or GIT), what are the access rights, etc.

It is possible to see all available repositories using cmx list repo.

When accessing entry without specifying a repository, all available local repositories will be searched.
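
The lookup order described above, together with the CID format (<repo UOA>:)<module UOA>:<data UOA>, can be sketched as follows. This is an assumption-laden illustration, not the code of module repo or core: repositories are represented by a hypothetical dictionary mapping a repo UOA to the directory that contains .cmr, the result keys are hypothetical, and alias resolution is left out.

```python
import os


def parse_cid(cid):
    """Split a CID into its parts; the repository part is optional."""
    parts = cid.split(':')
    if len(parts) == 2:
        return {'repo_uoa': None, 'module_uoa': parts[0], 'data_uoa': parts[1]}
    if len(parts) == 3:
        return {'repo_uoa': parts[0], 'module_uoa': parts[1], 'data_uoa': parts[2]}
    raise ValueError('CID must be (<repo UOA>:)<module UOA>:<data UOA>: ' + cid)


def find_entry(cid, repos):
    """Return the entry directory, searching all repositories if none is given."""
    p = parse_cid(cid)
    candidates = list(repos) if p['repo_uoa'] is None else [p['repo_uoa']]
    for repo in candidates:
        if repo not in repos:
            continue
        path = os.path.join(repos[repo], '.cmr', p['module_uoa'], p['data_uoa'])
        if os.path.isdir(path):
            return path
    return None
```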

cM by default includes remote cmind repository that is used for crowd-tuning and P2P data sharing.

Data classification

cM allows gradual classification of any data in the repositories using module class. Classes are added to the data entry description under key cm_classes_uoa (list of classes UOA).

Current classes can be viewed using cmx list class.
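
Selecting entries by class then reduces to a subset test on cm_classes_uoa. The sketch below follows the note in the cmx help that all requested classes should be present in an entry; the function name is hypothetical, and matching of class aliases with wildcards is not covered.

```python
def has_all_classes(entry_cfg, wanted_classes):
    """True if every requested class UOA is listed in the entry description."""
    return set(wanted_classes).issubset(entry_cfg.get('cm_classes_uoa', []))


entry = {'cm_classes_uoa': ['62a455f8c8042f90']}
print(has_all_classes(entry, ['62a455f8c8042f90']))  # True
```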

OpenME: opening up tools and applications for analysis and auto-tuning

This section should be extended

OpenME is an event-based modular framework (C/C++/Fortran/PHP) that allows users to "open up" various tools and applications using just a few lines of code and dynamic plugins, connecting them to cM for analysis, tuning and run-time adaptation. It is an evolution of the Interactive Compilation Interface that Grigori Fursin originally developed for Open64 in 2005 and later for GCC, and that was eventually moved to GCC mainline (>=4.6) after collaboration with Google and Mozilla. OpenME may not be the fastest interface, but due to its simplicity, it targets novice users who want to quickly prototype ideas rather than spend time mastering very complex event-based frameworks.

OpenME has 3 main functions:

openme_init(…) - initialize/load plugin
openme_callback(char* event_name, void* params) - call event
openme_finish(…) - finalize (if needed)

Example: OpenME for LLVM 3.x

This piece of code can control LLVM unrolling on the fly:

Source code: tools/clang/tools/driver/cc1_main.cpp:

#include "openme.h"
…
int cc1_main(const char **ArgBegin, const char **ArgEnd,
             const char *Argv0, void *MainAddr) {

  openme_init("UNI_ALCHEMIST_USE", "UNI_ALCHEMIST_PLUGINS", NULL, 0);
  …
  // Execute the frontend actions.
  Success = ExecuteCompilerInvocation(Clang.get());
  openme_callback("ALC_FINISH", NULL);
  …
}

Source code: lib/Transforms/Scalar/LoopUnrollPass.cpp

#include <cJSON.h>
#include "openme.h"
…
bool LoopUnroll::runOnLoop(Loop *L, LPPassManager &LPM) {

  struct alc_unroll {
    const char *func_name;
    const char *loop_name;
    cJSON *json;
    int factor;
  } alc_unroll;
  …
  alc_unroll.func_name=(Header->getParent()->getName()).data();
  alc_unroll.loop_name=(Header->getName()).data();
  openme_callback("ALC_TRANSFORM_UNROLL_INIT", &alc_unroll);
  …
  // Unroll the loop.
  alc_unroll.factor=Count;
  openme_callback("ALC_TRANSFORM_UNROLL", &alc_unroll);
  Count=alc_unroll.factor;

  if (!UnrollLoop(L, Count, TripCount, UnrollRuntime, TripMultiple, LI, &LPM))
    return false;
  …
}

This code uses the dynamic Alchemist plugin (available in cM as a dynamic library), which is used for fine-grain compiler analysis and tuning.

Example of Alchemist source code:

#include <cJSON.h>
#include <openme.h>

int openme_plugin_init(struct openme_info *ome_info) {
  …
  openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL_INIT", alc_transform_unroll_init);
  openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL", alc_transform_unroll);
  openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL_FEATURES", alc_transform_unroll_features);
  openme_register_callback(ome_info, "ALC_FINISH", alc_finish);
  …
}

extern void alc_transform_unroll_init(struct alc_unroll *alc_unroll){
  …
}

extern void alc_transform_unroll(struct alc_unroll *alc_unroll) {
 …
}
…
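
The interplay between the instrumented tool and the plugin can be modelled in a few lines of Python. This is a conceptual model only, not the OpenME implementation or its API: the class and function names are hypothetical, the parameter structure is represented by a dictionary, and silently ignoring events without a registered handler is an assumption.

```python
class OpenMEModel:
    """Toy event registry: plugins register handlers, the tool raises events."""

    def __init__(self):
        self._callbacks = {}

    def register_callback(self, event_name, func):
        self._callbacks[event_name] = func

    def callback(self, event_name, params=None):
        func = self._callbacks.get(event_name)
        if func is not None:  # assumed: unknown events are ignored
            func(params)


def plugin_init(ome):
    """Plays the role of the plugin initialization shown above."""
    def transform_unroll(params):
        params['factor'] = 4  # the plugin overrides the unroll factor
    ome.register_callback('ALC_TRANSFORM_UNROLL', transform_unroll)


ome = OpenMEModel()
plugin_init(ome)

# Tool side: expose the current decision, let the plugin change it, read it back.
params = {'func_name': 'main', 'loop_name': 'for.body', 'factor': 2}
ome.callback('ALC_TRANSFORM_UNROLL', params)
count = params['factor']
print(count)  # 4
```

The key design point is that the tool passes a mutable structure to the event, so the plugin can both observe a decision (here, the unroll factor) and replace it before the tool acts on it.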

Example of OpenME for OpenCL/CUDA C application

Source code: 2mm.c / 2mm.cu

…
#ifdef OPENME
#include <openme.h>
#endif
…

int main(void) {
…
#ifdef OPENME  
 openme_init(NULL,NULL,NULL,0);  
 openme_callback("PROGRAM_START", NULL);
#endif
…
#ifdef OPENME  
openme_callback("ACC_KERNEL_START", NULL);
#endif

cl_launch_kernel();
/* or, in the CUDA version: */
mm2Cuda(A, B, C, D, E, E_outputFromGpu);

#ifdef OPENME  
openme_callback("ACC_KERNEL_END", NULL);
#endif
…

…
#ifdef OPENME  
openme_callback("KERNEL_START", NULL);
#endif

mm2_cpu(A, B, C, D, E);

#ifdef OPENME  
 openme_callback("KERNEL_END", NULL);
#endif

#ifdef OPENME
 openme_callback("PROGRAM_END", NULL);
#endif
…
}

Example of OpenME for OpenCL/CUDA Fortran application

Source code: matmul.F

PROGRAM MATMULPROG
…

INTEGER*8 OBJ, OPENME_CREATE_OBJ_F      
CALL OPENME_INIT_F(""//CHAR(0), ""//CHAR(0), ""//CHAR(0), 0)      
CALL OPENME_CALLBACK_F("PROGRAM_START"//CHAR(0))

…
CALL OPENME_CALLBACK_F("KERNEL_START"//CHAR(0));      
DO I=1, I_REPEAT       
 CALL MATMUL      
END DO      
CALL OPENME_CALLBACK_F("KERNEL_END"//CHAR(0));

…
CALL OPENME_CALLBACK_F("PROGRAM_END"//CHAR(0))      
END

cM evolution and comparison with previous frameworks

Here is the list and brief comparison of frameworks that Grigori Fursin developed during his past R&D to show evolution and reasoning behind framework/repository internals.

	1993-1997	1997-1999	1999-2004	2005-2006	2007-2009	2010-2011	2012-cur.
Name	-	SCS - SuperComputer Services	EOS - Edinburgh Optimizing Software	FOS - Framework for Continuous Optimization	cTuning1 / MILEPOST	cTuning2 aka Codelet Tuning Infrastructure	cTuning3 aka Collective Mind
Purpose	Modeling of semiconductor neural network-based accelerators and computers	Universal and simple access to supercomputers through web services (similar to cloud)	Performance analysis and auto-tuning framework and repository for large numerical applications; preparation for machine learning based optimizations	Fine-grain plugin-based auto-tuning framework and repository	Collaborative R&D framework and repository for program and architecture analysis and co-optimization using machine learning	Customized repository to automate experimentation at Intel Exascale Lab for codelet characterization and optimization combined with auto-tuning and machine learning based on Grigori's past techniques	Universal plugin-based collaborative R&D infrastructure and repository for systematic and reproducible analysis, optimization and run-time adaptation of computer systems
Support	MIPT grant	MIPT grant	EU FP5 MHAOTEU project and University of Edinburgh	HiPEAC, INRIA	EU FP6 MILEPOST project, IBM, CAPS, ARC (Synopsys), University of Edinburgh, INRIA	Intel/CEA Exascale Lab	INRIA, HiPEAC (academic and industrial partners), cTuning.org community
Publicly available	-	yes	yes	yes	yes, cTuning.org	unlikely (status, May 2013). Since 2012, Grigori has moved all his related R&D to the new public, universal and customizable cTuning3 (Collective Mind) collaborative R&D infrastructure and repository.	yes, cTuning.org and c-mind.org
License	-	GPL	GPL	GPLv2	GPLv2	plans for GPLv3	BSD and LGPL
Fully integrated solution for collaborative and reproducible R&D	-	partial	partial	partial	yes	no (passive repository)	yes
Associated new publication model	-	-	-	-	yes	-	yes - new theme for HiPEAC 2012-2020
Academic community	-	-	-	-	yes	-	yes
Industrial community	-	-	-	-	yes	-	yes
Interesting usages	developed semiconductor neural network and all modeling software used for teaching at MIPT	universal supercomputer web-based service has been used for some time for novice users at Russian Academy of Sciences until more advanced GRID/cloud services became available	testing pre-cTuning concepts used in several academic and industrial projects on auto-tuning of large realistic applications for supercomputer centers	Used by ICT to tune compilers and programs for Loongson processor	Tuned default compiler optimization heuristic of a new GCC version for ARC customers for real-time application with performance/code size constraints; tuning various applications in multiple academic and industrial international projects	NDA	Used for new publication model; prepared for HiPEAC common research and development infrastructure; being tested in several academic and industrial projects
Repository type	csv/xls files	MySQL	MySQL	file-based	MySQL / file-based	flat file-based, very slow - requires indexing	hierarchical file-based
Queries	-	MySQL queries	MySQL queries	-	MySQL queries	Obligatory ElasticSearch indexing	On demand ElasticSearch indexing and queries
Interface to open up tools and applications for analysis and tuning	-	-	-	Interactive Compilation Interface for Open64, GCC and PathScale compilers	Interactive Compilation Interface for GCC that has been included in mainline since GCC 4.6	-	OpenME - universal interface for interactive or online analysis and auto-tuning for any tool (LLVM, GCC, Open64, run-time system, etc) or application
Language	C, C++	MSVC, Perl	Java, C, C++	C	C, C++, PHP	C and Python	Python and OpenME interface to connect to any other language
Web service	-	third-party web server	integrated Java-based server	-	third-party web server and MediaWiki integration	third-party web server	unified and integrated web server plugin
Auto-tuning plugins	-	low-level plugins for basic random tuning	low-level plugins for various auto-tuning strategies	low-level plugins for off-line and on-line analysis, auto-tuning and adaptation	high-level and low-level plugins for off-line and on-line analysis, auto-tuning and adaptation	-	unified and extensible auto-tuning plugins for off-line and on-line tuning of CPU/CUDA/OpenCL codelets and applications
Focused search	-	-	first plugins to focus exploration on areas with high probability of "unusual" behavior	first plugins to focus exploration on areas with high probability of "unusual" behavior	various plugins for probabilistic focused search	-	on-going: unified and extensible exploration and modeling plugins
Program/architecture feature extraction for machine learning	-	-	started	prototypes	plugins for MILEPOST GCC and for collection of hardware counters	-	unified cM plugins for cTuning CC, MILEPOST GCC, Alchemist and other tools to obtain semantic features, code patterns, hardware counters, architecture features, etc
Machine learning plugins	-	-	started	prototypes	plugins for nearest neighbour classifier	-	unified and extensible predictive modeling plugins such as nearest neighbour classifier, SVM, etc
Run-time adaptation plugins	-	-	-	first prototype of static multi-versioning combined with run-time monitoring and adaptation	static multi-versioning and run-time adaptation/scheduling (supporting CPU/CUDA codelets)	-	on-going using OpenME interface and combination of static multiversioning, machine learning and run-time adaptation. We can mix various programming models (CPU/CUDA/OpenCL codelets)
Memory/CPU bound detection plugins	-	-	source-to-source tool	-	-	-	on-going within Alchemist
Documentation	-	partial	partial	partial	full documentation at cTuning.org wiki (and technical report)	some	full DoxyGen and Wiki-based community-based documentation

Contents