Disclaimer

The alpha version of this document was written by Grigori Fursin very quickly during development of the cM framework. The specification may still change until the official release. Please contact us if you notice mistakes or inconsistencies so that we can collaboratively improve this document!

Abbreviations

  • cM - Collective Mind Framework
  • CMR - Collective Mind Repository
  • UOA - cM UID or alias

cM concept and design motivation

The current Collective Mind concept and design accumulate our R&D experience of the past 20 years:

  • In our past research, we spent most of our time not on data analysis and checking novel research ideas, but on dealing with ever-changing tools, architectures and huge amounts of heterogeneous data. Therefore, we decided to use wrappers to abstract all tools. These wrappers became cM plugins (modules): for example, source code has an associated module code.source, a compiler has ctuning.compiler, a binary has code, a dataset has dataset, etc.
  • Various cM modules may have different functions (actions), such as code.source build to build a project, code run to run code, etc. Therefore, to unify access to modules, we use a command-line front-end cm, which allows one to access all modules and their functions using cm <name of the module> <action> parameters.
  • For each module, we may need to store some associated data: for a dataset, we may need to store the real data set (an image, video, text or audio file, etc.); for source code, we may want to keep all the source files, including Makefiles, etc. Previously, we used MySQL, but extending it for each module (rebuilding tables, checking all relations, etc.) or storing binary data was time-consuming and complex. Also, if some experimental data goes wrong, it takes a long time to "clean up" and update the repository. Finally, as researchers, we often want direct access to our experimental files, which is why we often keep myriads of csv files, etc. Therefore, for cM, we decided to use our own very simple directory- and file-based repository: a cM repository can be inside any directory and starts with a .cmr root directory, followed by the UID or alias of a module and then the UID or alias of the associated entries:
[Figure: cM repository structure]
  • Another problem that we faced in our past research was dealing with the evolution of our own software. Hence, we decided to provide unique IDs for each module and data entry while allowing high-level aliases; for example, module code.source has cM UID 45741e3fbcf4024b. We can call high-level modules or data using an alias, but when a module API changes dramatically (not just extended while keeping backward compatibility), we keep the alias but change the UID! Most cM modules can deal with both UIDs and aliases; this combination is called cM UOA (UID or alias). Since a repository is also data, it has its own UID. Therefore, any data can be found either using <module UOA>:<data UOA>, in which case it is searched for in all available repositories, or using <repo UOA>:<module UOA>:<data UOA>. A unique data identifier in cM is called a CID (cM ID) and has the format (<repo UOA>:)<module UOA>:<data UOA>.
  • Naturally, such a design is very flexible but can be slow for search. However, it is very easy to combine with existing indexing tools. We decided to use ElasticSearch, which works directly with JSON and can perform fast search and complex queries, and we added support for on-the-fly indexing of data in cM.
  • Yet another problem we had was the need to use different frameworks depending on whether we wanted to just run experiments (on mobiles, GRID, cloud, supercomputers), perform analysis, provide a web front-end, build graphs, etc. Now, we can use the same framework with different selections of modules (the minimal cM core is only around 500KB).
  • Interestingly, modules are also entries inside repositories, making it possible to continuously evolve the framework and models as more "knowledge" becomes available.
  • We added module "class" to start gradually classifying all data entries. We can also rank useful data entries (for example, the most profitable compiler optimizations or models).
  • Since the cM format is now open and easily extensible, we can easily combine auto-tuning and expert knowledge (for example, as module ctuning.advice).
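As a small illustration of the CID format described above, the following Python sketch splits a CID into its parts. It is not cM's own parser: CID and parse_cid are hypothetical names, and the sketch assumes that UOAs never contain ':' (true for UIDs and for aliases restricted to alphanumerics, '_', '-' and '.').

```python
from typing import NamedTuple, Optional


class CID(NamedTuple):
    """A parsed cM ID; repo_uoa is None when no repository was given."""
    repo_uoa: Optional[str]
    module_uoa: str
    data_uoa: str


def parse_cid(cid: str) -> CID:
    """Parse '(<repo UOA>:)<module UOA>:<data UOA>' (hypothetical helper)."""
    parts = cid.split(':')
    if len(parts) not in (2, 3) or any(p == '' for p in parts):
        raise ValueError(f'not a valid CID: {cid!r}')
    if len(parts) == 2:
        # No repository given: the entry would be searched in all repositories.
        return CID(None, parts[0], parts[1])
    return CID(parts[0], parts[1], parts[2])


if __name__ == '__main__':
    print(parse_cid('os:android-generic-32'))
    print(parse_cid('default:module:code.source'))
```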

Now, we believe that we have a framework that is easy to extend, allowing us to continue the collaborative systematization of characterization, optimization, learning and design of computer systems:

[Figure: cM overall structure]

We use gradual top-down decomposition and learning of computer systems to keep complexity under control, to balance our return on investment (analysis/optimization cost vs. benefit) and to gradually improve our knowledge:

[Figure: cM top-down decomposition and learning]

cM UIDs and aliases

All cM entries (including modules, which are treated as cM entries too) have a unique ID so that data can be easily shared in P2P environments or dedicated repositories. A cM UID is a lowercase string of 16 hexadecimal digits, such as a8aac7d3fec45ca9. Entries are stored as directories named after their UID.
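A minimal sketch of generating and checking UIDs of this form in Python. generate_uid and is_uid are hypothetical helpers, not the cM kernel API, and drawing 64 random bits from the secrets module is our assumption; cM's own generator may work differently.

```python
import re
import secrets

# A cM UID, as described above: exactly 16 lowercase hexadecimal digits.
UID_PATTERN = re.compile(r'[0-9a-f]{16}')


def generate_uid() -> str:
    """Return a random 16-digit lowercase hex string.

    Assumption: 64 random bits; cM's own generator may build UIDs differently.
    """
    return secrets.token_hex(8)  # 8 bytes -> 16 lowercase hex digits


def is_uid(s: str) -> bool:
    """True if s has the format of a cM UID."""
    return UID_PATTERN.fullmatch(s) is not None


if __name__ == '__main__':
    uid = generate_uid()
    print(uid, is_uid(uid))
    print(is_uid('a8aac7d3fec45ca9'), is_uid('android-generic-32'))
```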

However, for user convenience, each entry may have a hardwired alias or a display alias. A hardwired alias is used as the entry's directory name instead of the UID. A display alias means that the directory name of the entry is still its UID, but the associated meta-data includes the key 'cm_display_as_alias'; this is useful when visualizing data in the cM web front-end.

Aliases may contain only alphanumeric characters and '_', '-' and '.' to ensure cross-OS compatibility (since they are used as directory names on both Linux and Windows).
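The alias rule can be checked with a single regular expression. This sketch assumes that "alphanumeric" means ASCII letters and digits; is_valid_alias is a hypothetical name, and rejecting '.' and '..' is an extra safeguard that the rule above does not state.

```python
import re

# Characters allowed in aliases according to the rule above (ASCII assumed).
ALIAS_PATTERN = re.compile(r'[A-Za-z0-9_.-]+')


def is_valid_alias(alias: str) -> bool:
    """Check an alias before it is used as a directory name.

    The '.' and '..' exclusion is an extra safeguard added in this sketch:
    both are reserved directory names on Linux and Windows.
    """
    if alias in ('.', '..'):
        return False
    return ALIAS_PATTERN.fullmatch(alias) is not None


if __name__ == '__main__':
    for name in ('android-generic-32', 'code.source', 'my alias', 'a/b'):
        print(name, is_valid_alias(name))
```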

In most cases, cM can automatically find an entry by either its UID or its alias, which is referred to as cM UOA (UID or alias); otherwise, we refer explicitly to the UID or the alias.

Repository

The core of the cM framework is the adaptive repository, i.e. a repository that can easily evolve during the decomposition of complex systems. We decided to build a very simple and portable NoSQL repository that can be installed on any computer system, including mobiles, tablets, desktops, HPC servers and cloud services. We chose a directory- and file-based repository with JSON descriptions, which is very portable and scalable and can be easily indexed by third-party indexing services such as ElasticSearch.

The current directory structure of any cM repository is:

.cmr/                                           # Repository directory.
.cmr/UOA of module/                             # First-level directory associated with a given module.
.cmr/UOA of module/UOA of data                  # Second-level directory with the data associated with a given module.
                                                #  This directory can contain any files and directories, including traces,
                                                #  benchmarks, data sets, executables, models, archives, pdfs, csv files, etc.
.cmr/UOA of module/UOA of data/.cm/config.json  # Data description: extensible meta-data in JSON that summarizes the data content.
                                                #  Whenever data is accessed, only this file is loaded first; depending on its content
                                                #  and on the module action, other data can be accessed too.
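To make the layout concrete, here is a sketch that saves and loads an entry's meta-data at .cmr/<module UOA>/<data UOA>/.cm/config.json. The helper names are hypothetical, and the sketch ignores alias resolution, locking and indexing, which a real cM kernel would have to handle.

```python
import json
import tempfile
from pathlib import Path


def entry_path(repo_root: str, module_uoa: str, data_uoa: str) -> Path:
    """<repo_root>/.cmr/<module UOA>/<data UOA>, following the layout above."""
    return Path(repo_root) / '.cmr' / module_uoa / data_uoa


def save_entry_meta(repo_root: str, module_uoa: str, data_uoa: str, meta: dict) -> Path:
    """Create the entry directory if needed and write its JSON description."""
    cm_dir = entry_path(repo_root, module_uoa, data_uoa) / '.cm'
    cm_dir.mkdir(parents=True, exist_ok=True)
    cfg = cm_dir / 'config.json'
    cfg.write_text(json.dumps(meta, indent=2, sort_keys=True), encoding='utf-8')
    return cfg


def load_entry_meta(repo_root: str, module_uoa: str, data_uoa: str) -> dict:
    """Read only the meta file; other files in the entry are left untouched."""
    cfg = entry_path(repo_root, module_uoa, data_uoa) / '.cm' / 'config.json'
    return json.loads(cfg.read_text(encoding='utf-8'))


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        save_entry_meta(root, 'os', 'android-generic-32', {'cm_description': 'Android OS'})
        print(load_entry_meta(root, 'os', 'android-generic-32'))
```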

Note that when a directory holds UOA entries and an alias is used instead of a UID, this directory also includes a .cm subdirectory that maps UIDs and aliases to each other using two simple text files:

  • alias-a-<hardwired_alias_name> that contains associated UID
  • alias-u-<UID> that contains hardwired alias name

This structure speeds up access to entries with aliases while avoiding a search (i.e. an entry can be loaded directly).
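The two alias files turn UOA resolution into a couple of file reads instead of a directory scan. The following sketch shows one way to write and read them, assuming that the files contain just the UID or alias text and that any 16-digit lowercase hex string is treated as a UID; add_alias, resolve_uoa and entry_dir are hypothetical names.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional, Tuple

UID_PATTERN = re.compile(r'[0-9a-f]{16}')


def add_alias(module_dir: str, uid: str, alias: str) -> None:
    """Write alias-a-<alias> (holding the UID) and alias-u-<UID> (holding the alias)."""
    cm = Path(module_dir) / '.cm'
    cm.mkdir(parents=True, exist_ok=True)
    (cm / f'alias-a-{alias}').write_text(uid, encoding='utf-8')
    (cm / f'alias-u-{uid}').write_text(alias, encoding='utf-8')


def resolve_uoa(module_dir: str, uoa: str) -> Tuple[str, Optional[str]]:
    """Return (uid, alias) for a UOA without scanning the entry directories."""
    cm = Path(module_dir) / '.cm'
    if UID_PATTERN.fullmatch(uoa):
        u_file = cm / f'alias-u-{uoa}'
        alias = u_file.read_text(encoding='utf-8').strip() if u_file.is_file() else None
        return uoa, alias
    a_file = cm / f'alias-a-{uoa}'
    if not a_file.is_file():
        raise KeyError(f'unknown alias: {uoa}')
    return a_file.read_text(encoding='utf-8').strip(), uoa


def entry_dir(module_dir: str, uoa: str) -> Path:
    """A hardwired alias names the entry directory; otherwise the UID does."""
    uid, alias = resolve_uoa(module_dir, uoa)
    return Path(module_dir) / (alias if alias else uid)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as module_dir:  # plays the role of .cmr/module
        add_alias(module_dir, '45741e3fbcf4024b', 'code.source')
        print(resolve_uoa(module_dir, 'code.source'))
        print(resolve_uoa(module_dir, '45741e3fbcf4024b'))
```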

Modules

Since we envision that the community will share both data and modules, we found an interesting solution: modules are kept as data in the repository, which simplifies and unifies the whole cM framework. The cM module 'module' provides an abstraction for dealing with modules in the framework. Hence, all modules are kept in the following directory of the repository:

.cmr/module/UOA of module
.cmr/module/UOA of module/.cm/config.json     # Exposes properties, characteristics and choices of the module/data or associated tool.
.cmr/module/UOA of module/module.py           # Code of the module with its actions. The cM core is written in Python,
                                              #  but we are developing an API to access cM modules and data from any other language.
                                              #  We hope that the community will help with this effort.
.cmr/module/UOA of module/c/module.c
.cmr/module/UOA of module/cpp/module.cc
.cmr/module/UOA of module/fortran/module.f
.cmr/module/UOA of module/php/module.php

Implementation and compatibility


We envision that the cM framework will constantly evolve, hence we decided to implement all cM logic (kernel, access to the repository, web server, visualization, predictive modeling, security, etc.) as modules inside a cM repository. Instead of providing versions of the framework and of all modules, we use the module UID as a version number. If a module evolves while keeping a backward-compatible API, neither its alias nor its UID changes. However, if the API or the associated data becomes backward incompatible, a new module should be created with a new UID, while the alias may remain unless its meaning has changed. Internally, modules should call other modules only through UIDs and not through aliases, which solves compatibility issues: multiple versions of a module can co-exist, but other modules must be explicitly upgraded to support the new versions. This also solves data compatibility, since we do not allow data from incompatible versions of a module to be mixed.

TBD: Version and build number of a module can be provided to keep track of changes.

Single access function

For simplicity and unification, we decided to have a single access function to cM that accesses all cM modules and their functions, with a dictionary (which can be directly represented as JSON) as both input and output. This allows all modules to be mixed together no matter which language they are written in. Furthermore, any external script, program, architecture or tool in any language can be easily and universally connected to cM through the light-weight OpenME interface. For now, only the Python and PHP parts are implemented, while the API for other languages calls the cm command line. We are gradually and collaboratively extending the OpenME interface to support more languages.
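The dictionary-in/dictionary-out convention can be illustrated with a toy dispatcher. This is a sketch under stated assumptions, not cm_kernel.access itself: the MODULES registry, the register decorator, the toy os_load action and the 'cm_error' key are hypothetical; only cm_run_module_uoa, cm_action, cm_data_uoa, cm_return and cm_data_obj are taken from the example in this section.

```python
import json
from typing import Callable, Dict

Action = Callable[[dict], dict]

# Hypothetical registry: module UOA -> action name -> function(dict) -> dict.
MODULES: Dict[str, Dict[str, Action]] = {}


def register(module_uoa: str, action: str):
    """Decorator that exposes a Python function as <module> <action>."""
    def decorator(fn: Action) -> Action:
        MODULES.setdefault(module_uoa, {})[action] = fn
        return fn
    return decorator


def access(i: dict) -> dict:
    """Single entry point: dictionary in, dictionary out.

    cm_return == 0 means success; 'cm_error' is this sketch's error key.
    """
    actions = MODULES.get(i.get('cm_run_module_uoa', ''))
    if actions is None:
        return {'cm_return': 1, 'cm_error': 'module not found'}
    fn = actions.get(i.get('cm_action', ''))
    if fn is None:
        return {'cm_return': 1, 'cm_error': 'action not found'}
    return fn(i)


@register('os', 'load')
def os_load(i: dict) -> dict:
    # Toy action standing in for a real load from the repository.
    return {'cm_return': 0, 'cm_data_obj': {'cfg': {'name': i.get('cm_data_uoa')}}}


if __name__ == '__main__':
    # Because input and output are plain dicts, they round-trip through JSON.
    request = json.loads('{"cm_run_module_uoa": "os", "cm_action": "load", '
                         '"cm_data_uoa": "android-generic-32"}')
    print(json.dumps(access(request)))
```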

Basic cM functionality, such as accessing the repository, dealing with cM UOAs and authenticating users, is implemented in the kernel module. This module is always available in all other Python modules as cm_kernel.

Here is an example of loading entry os:android-generic-32 from within some module:

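# Inside an action of a cM Python module, where cm_kernel is always available:
r=cm_kernel.access({'cm_run_module_uoa':'os',
                    'cm_action':'load',
                    'cm_data_uoa':'android-generic-32'})
if r['cm_return']>0: return r    # propagate the error dictionary to the caller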

That's all! If an error occurs, the cM framework handles it properly as long as you include the above if statement. If the command was successful, you can get the data as a dictionary:

data=r['cm_data_obj']['cfg']

Note that this call is equivalent to the following cM command-line invocation:

cm os load cm_data_uoa=android-generic-32
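The command-line form maps naturally onto the same dictionary. The sketch below shows one plausible mapping, assuming that the first two positional words are the module and the action and that every key=value argument becomes a dictionary key; cmd_to_dict is a hypothetical name, and the real cm front-end may handle more cases.

```python
import json


def cmd_to_dict(argv: list) -> dict:
    """Map 'cm <module> <action> key=value ...' to an access() dictionary.

    Assumed mapping: the first two positional words become
    cm_run_module_uoa and cm_action; every key=value becomes a key.
    """
    i = {}
    positional = []
    for arg in argv:
        if '=' in arg:
            key, value = arg.split('=', 1)  # values may themselves contain '='
            i[key] = value
        else:
            positional.append(arg)
    if len(positional) > 2:
        raise ValueError(f'unexpected arguments: {positional[2:]}')
    if positional:
        i['cm_run_module_uoa'] = positional[0]
    if len(positional) == 2:
        i['cm_action'] = positional[1]
    return i


if __name__ == '__main__':
    # Equivalent of: cm os load cm_data_uoa=android-generic-32
    print(json.dumps(cmd_to_dict(['os', 'load', 'cm_data_uoa=android-generic-32']), indent=2))
```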

cM has several internal parameters such as:

  • cm_console - controls CMD output and can be 'txt', 'json', 'json_with_indent', 'web' (used for plugins that return html or web data)
  • cm_user_uoa - user UOA in case authentication is needed

TBD

Framework directory structure

bin/                            # cm and cm.bat - scripts to call kernel module in python
repos/                          # directory with several repositories
repos/default/.cmr              # default cM repository with all modules and data for basic cM functionality
repos/default/.cmr/module/      # directories with all basic cM modules
repos/ctuning/.cmr/             # repository for collective characterization and optimization of computer systems
repos/ctuning-experiments/.cmr/ # repository for temporary experiments during optimization of computer systems
docs/                           # some documentation in text format
logs/                           # logs for web and index servers
scripts/                        # various demo scripts (examples, development, etc)
tmp/                            # directory for temporary files such as generation of plots, etc

Default data entry dictionary

Data in cM is represented as an extensible no-schema dictionary, stored as a human-readable JSON file that can be directly edited if needed. Besides the meta-description of the data itself, this dictionary can contain various internal cM keys:

{
"cm_access_control": {                     # Controls access to this entry (only in web services when auth is on)
 "comments_groups": "admin",              # Groups that can comment or rank entries
 "read_groups": "registered",             # Groups that can read entries
 "write_groups": "owner"                  # Groups that can write entries
}, 
"cm_classes_uoa": [                        # List of classes this entry belongs to
 "62a455f8c8042f90"                       
], 
"cm_description": "",                      # General description
"cm_display_as_alias": "",                 # User friendly alias when visualizing this data
"cm_dissemination": {                      # Whether this entry was publicly disseminated
 "publications": [                        #  for example, through publication, etc
   "0c44d9a2db3de3c9"                     
 ]
}, 
"cm_updated": [                            # When entry was updated
 {
   "cm_iso_datetime": "2012-04-12",       # Date and time in python format
   "cm_module_uid": "8a7141c59cd335f5",   # Which module updated this entry
   "cm_user_uoa": "0728a400aa1c86fe"      # Which user updated this entry
 }
], 
"powered_by": {                            # Which version of cM was used to create this entry
 "name": "Collective Mind Engine", 
 "version": "1.0.1977.beta"
}
...                                        # No-schema meta-data for the entry
}

Default data entry description

We deliberately use a no-schema data representation in cM to let researchers quickly prototype ideas rather than spend valuable time preparing strictly typed, fixed data structures that may evolve or become useless in the future. Only if a prototype is successful and is planned to be released with cM does the user need to provide a description of the data.

The data description of a cM entry is stored in the meta-data of its associated module, under the cm_data_description key. It has the following format:

{
 "<FLAT_KEY1>": <DESC1>,
 "<FLAT_KEY2>": <DESC2>
 ...
}

<FLAT_KEY> describes any key in the dictionary hierarchy. It always starts with #, followed, for each level of the hierarchy, by #key for a dictionary key or by @number for an element of a list.

For example, <FLAT_KEY> for key c in dictionary {"a":[{"c":"d"}]} is ##a@0#c
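The flat-key scheme can be expressed as a short recursive walk. In this sketch (flatten and get_by_flat_key are hypothetical names, not cm_kernel functions), only leaf values get flat keys, so empty dictionaries or lists disappear, and dictionary keys containing '#' or '@' cannot be represented unambiguously.

```python
import re


def flatten(d: dict) -> dict:
    """Map every leaf value of a nested dict/list to its cM-style flat key."""
    out = {}

    def walk(obj, prefix: str) -> None:
        if isinstance(obj, dict):
            for key, value in obj.items():
                walk(value, f'{prefix}#{key}')   # dictionary key -> #key
        elif isinstance(obj, list):
            for n, value in enumerate(obj):
                walk(value, f'{prefix}@{n}')     # list element -> @number
        else:
            out[prefix] = obj                     # leaf value

    walk(d, '#')                                  # every flat key starts with '#'
    return out


_TOKEN = re.compile(r'([#@])([^#@]*)')


def get_by_flat_key(d: dict, flat_key: str):
    """Follow a flat key such as '##a@0#c' back into the nested structure."""
    if not flat_key.startswith('#'):
        raise ValueError(f'flat key must start with #: {flat_key!r}')
    obj = d
    for kind, name in _TOKEN.findall(flat_key[1:]):
        obj = obj[int(name)] if kind == '@' else obj[name]
    return obj


if __name__ == '__main__':
    data = {"a": [{"c": "d"}]}
    print(flatten(data))
    print(get_by_flat_key(data, '##a@0#c'))
```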

<DESC> is the description dictionary of format:

{
 "type": "",                      # Choices are text, textarea, dict, list, url, integer, float, email, UOA or choice
 "desc_text": "",                 # Description text that will be visualized
 "default_value": "",             # Default value
 "sort_index": "",                # Sort index when visualizing
 "has_choice": "",                # 'yes' or 'no'
 "choice": [...],                 # Choices if type is choice or has_choice==yes
 "cm_module_uoa": "",             # Module UOA if type is UOA
 "disable": "",                   # 'yes' or 'no': disable during visualization
 "explorable": "",                # If 'yes', this is a tuning dimension that can be explored
 "forbid_disable_at_random": ""   # If 'yes', forbid disabling during random exploration
}

Note: when an entry is visualized using cM web services, this description is used to prepare a user-friendly visualization of the JSON data, with various choices or dependencies.
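As an example of consuming such a description, the sketch below collects the tuning dimensions (entries with "explorable": "yes") in sort_index order. It assumes that sort_index holds numeric text; tuning_dimensions and the example flat keys are hypothetical.

```python
def tuning_dimensions(data_description: dict) -> list:
    """Return flat keys marked explorable == 'yes', ordered by sort_index.

    Assumption: sort_index holds numeric text; keys whose sort_index is
    empty or non-numeric are placed last, in alphabetical order.
    """
    def order(flat_key: str):
        idx = data_description[flat_key].get('sort_index', '')
        try:
            return (0, float(idx), flat_key)
        except (TypeError, ValueError):
            return (1, 0.0, flat_key)

    explorable = [k for k, desc in data_description.items()
                  if desc.get('explorable') == 'yes']
    return sorted(explorable, key=order)


if __name__ == '__main__':
    # Hypothetical description of a module's data (keys invented for illustration).
    desc = {
        '##compiler_flags#unroll': {'type': 'integer', 'explorable': 'yes', 'sort_index': '20'},
        '##compiler_flags#opt_level': {'type': 'choice', 'choice': ['-O1', '-O2', '-O3'],
                                       'explorable': 'yes', 'sort_index': '10'},
        '##cm_description': {'type': 'text'},
    }
    print(tuning_dimensions(desc))
```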

TBD: add all types of keys and dependencies!!!

cM kernel and configuration

Since all functionality in cM is in modules, the lowest-level functionality is implemented in the module kernel, while its associated data entries keep the basic cM configuration. When a cM module/function is invoked, cM bootstraps and initializes itself using the configuration from the cM data entry kernel:local or kernel:default (or from the JSON file pointed to by the environment variable CM_DEFAULT_CFG). Then, every module has access to two objects:

  • ini dictionary that includes various cM parameters and configuration
  • cm_kernel module

The dictionary most useful to users and developers is ini['cfg'], which holds the configuration of the invoked module, i.e. the data of entry module:<invoked_module>.

The cm_kernel object provides the cM configuration as a dictionary in cm_kernel.ini['dcfg'], i.e. the data of entry kernel:local or kernel:default, or of the file referenced by the environment variable as described above.
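A sketch of how such a configuration lookup could work. The order in which CM_DEFAULT_CFG, kernel:local and kernel:default are tried is our assumption (the text does not fix it), as is the assumption that both kernel entries are stored under their hardwired aliases in one repository; kernel_cfg_candidates and load_kernel_cfg are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path


def kernel_cfg_candidates(repo_root: str) -> list:
    """Assumed lookup order: $CM_DEFAULT_CFG, then kernel:local, then kernel:default."""
    candidates = []
    env_file = os.environ.get('CM_DEFAULT_CFG')
    if env_file:
        candidates.append(Path(env_file))
    for entry in ('local', 'default'):
        # Assumes both kernel entries use hardwired aliases as directory names.
        candidates.append(Path(repo_root) / '.cmr' / 'kernel' / entry / '.cm' / 'config.json')
    return candidates


def load_kernel_cfg(repo_root: str) -> dict:
    """Return the first configuration found; it plays the role of ini['dcfg']."""
    for path in kernel_cfg_candidates(repo_root):
        if path.is_file():
            return json.loads(path.read_text(encoding='utf-8'))
    raise FileNotFoundError('no cM kernel configuration found')


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        default = Path(root) / '.cmr' / 'kernel' / 'default' / '.cm'
        default.mkdir(parents=True)
        (default / 'config.json').write_text('{"source": "kernel:default"}')
        print(load_kernel_cfg(root))
```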

cm_kernel also provides various bootstrap parameters and all cM low-level functions which are listed together with their API here.

These low level functions include:

  • loading modules and data
  • generating or checking cM UIDs
  • creating/deleting/updating data entries
  • finding entries
  • creating/deleting aliases
  • flattening/de-flattening dictionaries
  • loading/saving JSON
  • getting the list of all UOAs in a path
  • unicode printing for console and web (to support Python 3.x in the future)
  • smart merging of dictionaries
  • converting strings to SHA1
  • authenticating users
  • the main single access function and its variations (remote access through a web service, through CMD, as a string, etc.)
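Two of these low-level functions are easy to sketch with the Python standard library. The merge semantics here (nested dictionaries merged key by key, all other values replaced) are an assumption about what "smart merging" means, and merge_dicts and sha1_hex are hypothetical names rather than the cm_kernel API.

```python
import copy
import hashlib


def merge_dicts(base: dict, update: dict) -> dict:
    """Return a new dict: `update` merged recursively into a copy of `base`.

    Assumed semantics: nested dicts are merged key by key; any other value,
    including lists, from `update` replaces the value in `base`.
    """
    result = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_dicts(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def sha1_hex(text: str) -> str:
    """SHA1 digest of a UTF-8 encoded string, as 40 hexadecimal digits."""
    return hashlib.sha1(text.encode('utf-8')).hexdigest()


if __name__ == '__main__':
    cfg = {'cm_access_control': {'read_groups': 'registered', 'write_groups': 'owner'}}
    print(merge_dicts(cfg, {'cm_access_control': {'write_groups': 'admin'}}))
    print(sha1_hex('abc'))
```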

Note that we later provided higher-level and more user-friendly abstractions for repositories (module repo) and for searching/adding/updating/deleting data (module core), so most of the time users will use core or repo to access a repository.

cM module repo

Module repo enables multiple repositories in cM that can be shared (SVN/GIT), remote (using cM web services), excluded from search, etc. Adding or deleting a repository simply means creating or deleting the associated repo entry that describes it.

Module repo provides some high-level functions to get a list of repositories, search for a given entry, download a remote repository, copy or clean it, etc., which are described here.

cM module core

High-level functionality for dealing with data entries in cM is implemented in module core. It includes:

  • adding data entry
  • loading data entry
  • updating data entry
  • deleting data entry
  • renaming data entry
  • finding data entry
  • listing data entries
  • searching in data entries

All functions are described here.

Note that all these core cM functions are inherited by all other modules. For example, it is possible to list all entries using the same function in any module:

cm kernel list
cm repo list
cm processor list
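One way to picture this inheritance is ordinary class inheritance plus dispatch by action name. In the sketch below, Core, Processor and run_action are hypothetical, as are the 'entries' and 'cm_error' keys; they only illustrate why the same list action works in every module.

```python
class Core:
    """Hypothetical stand-in for module core: generic actions on entries."""

    def __init__(self, module_uoa: str, entries: dict):
        self.module_uoa = module_uoa
        self.entries = dict(entries)  # data UOA -> meta-data

    def list(self, i: dict) -> dict:
        return {'cm_return': 0, 'entries': sorted(self.entries)}

    def load(self, i: dict) -> dict:
        uoa = i.get('cm_data_uoa')
        if uoa not in self.entries:
            return {'cm_return': 1, 'cm_error': f'entry {uoa!r} not found'}
        return {'cm_return': 0, 'cm_data_obj': {'cfg': self.entries[uoa]}}


class Processor(Core):
    """A specific module adds its own actions and inherits the core ones."""

    def describe(self, i: dict) -> dict:
        return {'cm_return': 0, 'text': f'{len(self.entries)} processor entries'}


def run_action(module: Core, i: dict) -> dict:
    """Dispatch cm_action to a public method, so inherited core actions work everywhere."""
    action = i.get('cm_action', '')
    fn = getattr(module, action, None) if action and not action.startswith('_') else None
    if not callable(fn):
        return {'cm_return': 1, 'cm_error': f'unknown action {action!r}'}
    return fn(i)


if __name__ == '__main__':
    kernel = Core('kernel', {'local': {}, 'default': {}})
    processor = Processor('processor', {'x86-64': {'bits': 64}})
    print(run_action(kernel, {'cm_action': 'list'}))     # like: cm kernel list
    print(run_action(processor, {'cm_action': 'list'}))  # like: cm processor list
```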

We included the cmx tool in cM, which allows users to perform all the above functions in a user-friendly manner from the command line. Current functions in cmx include (retrieved using cmx --help):

cM simplified and user friendly command line module to deal with cM repositories
 cmx test                                                        - test cM functionality (alpha)
 cmx repo add/create (repo name)                                 - create repo in the current directory (.cmr)
                                                                    or import existing one;
 cmx repo rm/del/delete (-f) (entry name)                        - delete repository reference
 cmx repo mv/move/ren/rename (old entry name) (new entry name)   - rename repository reference
 cmx repo update/co/checkout (repo name)                         - update shared repository
 cmx repo share/commit (repo name)                               - commit to shared repository
 cmx repo download (repo name)                                   - download repo from the web
 cmx add <entry name>/CID (param1=value1 ... @file1.json ...)    - add entry to current repository/module
                                                                    local repository by default
 cmx update <entry name>/CID (param1=value1 ... @file1.json ...) - update entry in current repository/module
                                                                    local repository by default
 cmx rm/del/delete (-f) <entry name>/CID                         - delete entry from the current repository/module
                                                                    local repository by default
 cmx svnrm/svndel/svndelete <entry name>/CID                     - delete entry from the current repository/module using SVN
                                                                    local repository by default
 cmx ren/rename <old entry name> <new entry name>                - rename entry in the current repository/module
                                                                    local repository by default
 cmx mv/move <old CID> <new CID>                                 - move entry
                                                                    local repository by default
 cmx cp/copy <old CID> <new CID>                                 - copy entry
                                                                    -k - keep old UID
 cmx clean alias1 alias2 ...                                     - remove orphaned aliases
 cmx info/load (CID)                                             - show all info about entry (in json)
                                                                    global repository by default
 cmx find (CID)                                                  - show path to the entry
                                                                    global repository by default
 cmx list <module>                                               - list data for a given module
                                                                    global repository by default
 cmx index (info)                                                - show status of indexing server
 cmx index on                                                    - turn on usage of indexing by cM
 cmx index off                                                   - turn off usage of indexing by cM
 cmx index test                                                  - test indexing server
 cmx index flush                                                 - clear the whole index
 cmx index CID                                                   - index entry or entries
                                                                    local repository by default
                                                                    data can include patterns * and ?
 cmx search (CID key1=value1 value2 ...)                         - search data
                                                                    if indexing server is on, use it,
                                                                    otherwise very slow
                                                                    if indexing is on, can use wildcards * and ?
                                                                    -s turns on progress for internal slow search
                                                                    -t=timeout - sets timeout for local slow search
 cmx web_view/wv (CID)                                           - view entry in cM web front-end
 cmx web_update/wu (CID)                                         - update entry in cM web front-end
 cmx uid                                                         - generate cM UID
 cmx restore_uid_from_alias <alias>                              - restore UID from alias file
 cmx password                                                    - prepare cM SHA1 password
                                                                   (if -s or --short, output only SHA1 password)
 cmx                                                             - print current CID
                                                                    global repository by default
 cmx help                                                        - help
Additional flags:
 -h, --help, -?                                                 - this help
 -g, --global                                                   - apply command to global repository
 -l, --local                                                    - apply command to local repository
                                                                    (otherwise global by default)
 -c, --class="class1,class2,..."                                  - use classes for the command
                                                                    for example, add or list
                                                                    (it can be both UID or alias)
                                                                    (all classes should be present in an entry)
                                                                    (alias can include wild cards * and ?)
 -q, --quiet                                                    - quiet mode when deleting files, etc
 -f, --full                                                     - add repo to CID if needed
 -j, --json                                                     - return json if applicable

cM packages

When performing long-term experiments, it often happens that various tools evolve, and users need either to re-run experiments or to run them with several versions of a tool co-existing in the system at the same time. Therefore, we introduced the notion of universal packages (module package) in cM to describe the installation process of various tools, their dependencies on host OS, target OS, processor, compiler and any other package, and their environment variables.

When a package is installed, a new entry with a cM UID is created under module code, where the installation code will reside. At the same time, an OS script is created in $CM_ROOT/bin with the name cm_code_env_<UID>.sh (on Linux) or in %CM_ROOT%\bin with the name cm_code_env_<UID>.bat (on Windows); it describes all PATHs, LIBs and any other environment variables necessary for a given package.
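A sketch of generating such an environment script for Linux. The script content (a shebang, a comment and one export per variable) and the write_env_script name are assumptions for illustration; the scripts that cM actually generates may look different, and the Windows .bat variant is not covered.

```python
import tempfile
from pathlib import Path


def write_env_script(cm_root: str, code_uid: str, env: dict) -> Path:
    """Write <cm_root>/bin/cm_code_env_<UID>.sh with one export per variable.

    Hypothetical content: real cM scripts may set variables differently.
    Values are placed in double quotes so that references such as $PATH
    still expand; they are assumed not to contain double quotes,
    backslashes or backticks themselves.
    """
    lines = ['#!/bin/sh', f'# Environment for package entry code:{code_uid}']
    for name, value in env.items():
        lines.append(f'export {name}="{value}"')
    path = Path(cm_root) / 'bin' / f'cm_code_env_{code_uid}.sh'
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text('\n'.join(lines) + '\n', encoding='utf-8')
    return path


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as cm_root:
        script = write_env_script(cm_root, 'a8aac7d3fec45ca9',
                                  {'PATH': '/opt/gcc-4.7/bin:$PATH',
                                   'LD_LIBRARY_PATH': '/opt/gcc-4.7/lib:$LD_LIBRARY_PATH'})
        print(script.read_text())
```

A user could then load such an environment into the current shell with . $CM_ROOT/bin/cm_code_env_<UID>.sh before running the tool.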

Now, various packages can easily co-exist in the system: GCC, LLVM, ICC and Open64 with different versions; different libraries (lapack, blas, magma); various auto-tuning plugins; or even codelets targeting CPU/CUDA/OpenCL. This elegantly addresses the problem of experimentation and reproducibility in heterogeneous and rapidly evolving systems!

Note that a user can manually call the above script to experiment with a given tool, or cM will automatically call such a script in high-level experiment scenarios (for example, in the program and architecture tuning and learning pipeline).

Packages can be easily installed using cM web front-end (Usage -> Install packages).

Multiple repositories

There can be any number of repositories in cM. Repositories are represented by module repo. When a new repository is added, an associated entry is created in cM that provides the path or URL of the repository and various meta-information, e.g. whether the repository is shared in a workgroup using SVN (or GIT), what the access rights are, etc.

It is possible to see all available repositories using cmx list repo.

When an entry is accessed without specifying a repository, all available local repositories are searched.
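A simplified sketch of searching several local repositories for an entry. find_entry is a hypothetical helper; it checks directory names only (UID or hardwired alias) and skips both the alias-a-/alias-u- lookup and any indexing server that cM may use.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def find_entry(repo_roots: Iterable[str], module_uoa: str, data_uoa: str) -> List[Path]:
    """Return the entry directory from every repository that has it, in search order.

    Simplification: matches directory names only, without alias resolution.
    """
    hits = []
    for root in repo_roots:
        candidate = Path(root) / '.cmr' / module_uoa / data_uoa
        if candidate.is_dir():
            hits.append(candidate)
    return hits


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        (Path(b) / '.cmr' / 'os' / 'android-generic-32').mkdir(parents=True)
        print(find_entry([a, b], 'os', 'android-generic-32'))
```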

By default, cM includes the remote cmind repository, which is used for crowd-tuning and P2P data sharing.

Data classification

cM allows gradual classification of any data in the repositories using module class. Classes are added to the data entry description under the key cm_classes_uoa (a list of class UOAs).

Current classes can be viewed using cmx list class.
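The class filter used by cmx -c (all classes must be present in an entry, with * and ? wildcards) could look like the sketch below. It compares class UOAs as plain strings without resolving aliases to UIDs, which is a simplification; has_all_classes and filter_by_classes are hypothetical names.

```python
from fnmatch import fnmatchcase


def has_all_classes(meta: dict, patterns: list) -> bool:
    """True if every pattern matches at least one UOA in meta['cm_classes_uoa']."""
    classes = meta.get('cm_classes_uoa', [])
    return all(any(fnmatchcase(c, p) for c in classes) for p in patterns)


def filter_by_classes(entries: dict, patterns: list) -> list:
    """Return the data UOAs (sorted) whose meta-data carries all requested classes."""
    return sorted(uoa for uoa, meta in entries.items() if has_all_classes(meta, patterns))


if __name__ == '__main__':
    entries = {
        'e1': {'cm_classes_uoa': ['62a455f8c8042f90', 'benchmark']},
        'e2': {'cm_classes_uoa': ['benchmark']},
        'e3': {},
    }
    print(filter_by_classes(entries, ['bench*']))
    print(filter_by_classes(entries, ['bench*', '62a455f8c8042f90']))
```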

OpenME: opening up tools and applications for analysis and auto-tuning

This section should be extended

OpenME is an event-based modular framework (C/C++/Fortran/PHP) that allows users to "open up" various tools and applications with just a few lines of code and dynamic plugins, connecting them to cM for analysis, tuning and run-time adaptation. It is an evolution of the Interactive Compilation Interface, which Grigori Fursin originally developed for Open64 in 2005 and later for GCC, and which was eventually moved to GCC mainline (>=4.6) after collaboration with Google and Mozilla. OpenME may not be the fastest interface, but thanks to its simplicity it targets novice users who want to quickly prototype ideas rather than spend time mastering very complex event-based frameworks.

OpenME has 3 main functions:

  • openme_init(…) - initialize/load plugin
  • openme_callback(char* event_name, void* params) - call event
  • openme_finish(…) - finalize (if needed)
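The event mechanism behind these functions can be sketched in Python for readers unfamiliar with callbacks. This is an analogy, not the OpenME API: EventHub and its methods are hypothetical, and the unroll example only mimics the LLVM snippet below in spirit (a plugin receives a mutable structure and may change the unroll factor).

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class EventHub:
    """Python analogy of openme_register_callback/openme_callback (not the OpenME API)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def register_callback(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    def callback(self, event_name: str, params: Any = None) -> None:
        # In this sketch, events with no registered handler are simply ignored.
        for handler in self._handlers.get(event_name, []):
            handler(params)


if __name__ == '__main__':
    hub = EventHub()

    def force_unroll_by_4(params: dict) -> None:
        params['factor'] = 4  # the "plugin" overrides the compiler's choice

    hub.register_callback('ALC_TRANSFORM_UNROLL', force_unroll_by_4)
    unroll = {'func_name': 'main', 'loop_name': 'for.body', 'factor': 8}
    hub.callback('ALC_TRANSFORM_UNROLL', unroll)
    hub.callback('ALC_FINISH')  # no handler registered: nothing happens
    print(unroll['factor'])
```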

Example: OpenME for LLVM 3.x

The following code allows LLVM loop unrolling to be controlled on the fly:

Source code: tools/clang/tools/driver/cc1_main.cpp:

#include "openme.h"
…
int cc1_main(const char **ArgBegin, const char **ArgEnd,
             const char *Argv0, void *MainAddr) {
  // Initialize OpenME and load the plugins
  openme_init("UNI_ALCHEMIST_USE", "UNI_ALCHEMIST_PLUGINS", NULL, 0);
  …
  // Execute the frontend actions.
  Success = ExecuteCompilerInvocation(Clang.get());
  // Notify the plugins that compilation has finished
  openme_callback("ALC_FINISH", NULL);
  …
}

Source code: lib/Transforms/Scalar/LoopUnrollPass.cpp:

#include <cJSON.h>
#include "openme.h"
…
bool LoopUnroll::runOnLoop(Loop *L, LPPassManager &LPM) {
  // Structure passed to the OpenME plugin
  struct alc_unroll {
    const char *func_name;
    const char *loop_name;
    cJSON *json;
    int factor;
  } alc_unroll;
  …
  alc_unroll.func_name=(Header->getParent()->getName()).data();
  alc_unroll.loop_name=(Header->getName()).data();
  openme_callback("ALC_TRANSFORM_UNROLL_INIT", &alc_unroll);
  …
  // Unroll the loop: the plugin may inspect and change the unroll factor.
  alc_unroll.factor=Count;
  openme_callback("ALC_TRANSFORM_UNROLL", &alc_unroll);
  Count=alc_unroll.factor;
  if (!UnrollLoop(L, Count, TripCount, UnrollRuntime, TripMultiple, LI, &LPM))
    return false;
  …
}

This code relies on the dynamic Alchemist plugin (available in cM as a dynamic library), which is used for fine-grain compiler analysis and tuning.

Example of Alchemist source code:

#include <cJSON.h>
#include <openme.h>

int openme_plugin_init(struct openme_info *ome_info) {
  …
  // Register plugin functions for the events raised by the instrumented compiler
  openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL_INIT", alc_transform_unroll_init);
  openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL", alc_transform_unroll);
  openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL_FEATURES", alc_transform_unroll_features);
  openme_register_callback(ome_info, "ALC_FINISH", alc_finish);
  …
}

extern void alc_transform_unroll_init(struct alc_unroll *alc_unroll) {
  …
}

extern void alc_transform_unroll(struct alc_unroll *alc_unroll) {
  …
}
…

Example of OpenME for OpenCL/CUDA C application

Source code: 2mm.c / 2mm.cu

…
#ifdef OPENME
#include <openme.h>
#endif
…
int main(void) {
  …
#ifdef OPENME
  openme_init(NULL, NULL, NULL, 0);
  openme_callback("PROGRAM_START", NULL);
#endif
  …
#ifdef OPENME
  openme_callback("ACC_KERNEL_START", NULL);
#endif
  cl_launch_kernel();                        /* OpenCL version (2mm.c) */
  /* or */
  mm2Cuda(A, B, C, D, E, E_outputFromGpu);   /* CUDA version (2mm.cu) */
#ifdef OPENME
  openme_callback("ACC_KERNEL_END", NULL);
#endif
  …
  …
#ifdef OPENME
  openme_callback("KERNEL_START", NULL);
#endif
  mm2_cpu(A, B, C, D, E);                    /* CPU version */
#ifdef OPENME
  openme_callback("KERNEL_END", NULL);
#endif
#ifdef OPENME
  openme_callback("PROGRAM_END", NULL);
#endif
  …
}

Example of OpenME for OpenCL/CUDA Fortran application

Source code: matmul.F

      PROGRAM MATMULPROG
      …
      INTEGER*8 OBJ, OPENME_CREATE_OBJ_F
      CALL OPENME_INIT_F(""//CHAR(0), ""//CHAR(0), ""//CHAR(0), 0)
      CALL OPENME_CALLBACK_F("PROGRAM_START"//CHAR(0))
      …
      CALL OPENME_CALLBACK_F("KERNEL_START"//CHAR(0))
      DO I=1, I_REPEAT
        CALL MATMUL
      END DO
      CALL OPENME_CALLBACK_F("KERNEL_END"//CHAR(0))
      …
      CALL OPENME_CALLBACK_F("PROGRAM_END"//CHAR(0))
      END
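Fortran strings carry a length rather than a terminating NUL byte. That is why each event name above is concatenated with `CHAR(0)` before it crosses into C. The sketch below shows a C function that could receive such a call, assuming gfortran's default conventions:

  • the Fortran name `DEMO_CALLBACK_F` maps to the lower-case symbol `demo_callback_f_`;
  • the string is passed by reference;
  • its length follows as a hidden `size_t` argument (gfortran 8 and later).

`DEMO_CALLBACK_F`, `demo_callback_f_` and `demo_last_event` are hypothetical, and this is not OpenME's actual Fortran wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Last event name received, stored as a NUL-terminated C string. */
static char last_event[64];

/* Hypothetical C receiver for Fortran calls such as
     CALL DEMO_CALLBACK_F("KERNEL_START"//CHAR(0))
   Assumes gfortran's default conventions: lower-case symbol with a trailing
   underscore, the string passed by reference, and its length appended as a
   hidden size_t argument (gfortran 8 and later). */
void demo_callback_f_(const char *name, size_t name_len) {
  assert(name != NULL);
  /* Fortran strings are not NUL-terminated: stop at the CHAR(0) supplied by
     the caller, at the declared length, or when the buffer is full. */
  size_t n = 0;
  while (n < name_len && n < sizeof(last_event) - 1 && name[n] != '\0')
    n++;
  memcpy(last_event, name, n);
  last_event[n] = '\0';
}

/* Accessor so C code (or a test) can read the last event name. */
const char *demo_last_event(void) {
  return last_event;
}
```

The receiver relies on the `CHAR(0)` terminator but also respects the hidden length argument. A string without a terminator is therefore still read only up to its declared length. Fortran 2003's `ISO_C_BINDING` module with `BIND(C)` interfaces avoids depending on compiler-specific symbol naming altogether.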

cM evolution and comparison with previous frameworks

Below is a list and brief comparison of the frameworks that Grigori Fursin developed during his past R&D. It shows the evolution of, and the reasoning behind, the framework and repository internals.


For each criterion below, the entries correspond, in order, to seven periods: 1993-1997, 1997-1999, 1999-2004, 2005-2006, 2007-2009, 2010-2011 and 2012-present.
Name
  • 1993-1997: -
  • 1997-1999: SCS - SuperComputer Services
  • 1999-2004: EOS - Edinburgh Optimizing Software
  • 2005-2006: FOS - Framework for Continuous Optimization
  • 2007-2009: cTuning1 / MILEPOST
  • 2010-2011: cTuning2 aka Codelet Tuning Infrastructure
  • 2012-present: cTuning3 aka Collective Mind
Purpose
  • 1993-1997: Modeling of semiconductor neural network-based accelerators and computers
  • 1997-1999 (SCS): Universal and simple access to supercomputers through web services (similar to cloud)
  • 1999-2004 (EOS): Performance analysis and auto-tuning framework and repository for large numerical applications; preparation for machine-learning-based optimizations
  • 2005-2006 (FOS): Fine-grain plugin-based auto-tuning framework and repository
  • 2007-2009 (cTuning1): Collaborative R&D framework and repository for program and architecture analysis and co-optimization using machine learning
  • 2010-2011 (cTuning2): Customized repository to automate experimentation at the Intel Exascale Lab for codelet characterization and optimization, combined with auto-tuning and machine learning based on Grigori's past techniques
  • 2012-present (cM): Universal plugin-based collaborative R&D infrastructure and repository for systematic and reproducible analysis, optimization and run-time adaptation of computer systems
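Support
  • 1993-1997: MIPT grant
  • 1997-1999 (SCS): MIPT grant
  • 1999-2004 (EOS): EU FP5 MHAOTEU project and University of Edinburgh
  • 2005-2006 (FOS): HiPEAC, INRIA
  • 2007-2009 (cTuning1): EU FP6 MILEPOST project, IBM, CAPS, ARC (Synopsys), University of Edinburgh, INRIA
  • 2010-2011 (cTuning2): Intel/CEA Exascale Lab
  • 2012-present (cM): INRIA, HiPEAC (academic and industrial partners), cTuning.org community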
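Publicly available
  • 1993-1997: -
  • 1997-1999 (SCS): yes
  • 1999-2004 (EOS): yes
  • 2005-2006 (FOS): yes
  • 2007-2009 (cTuning1): yes, cTuning.org
  • 2010-2011 (cTuning2): unlikely (status as of May 2013); since 2012, Grigori has moved all his related R&D to the new public, universal and customizable cTuning3 (Collective Mind) collaborative R&D infrastructure and repository
  • 2012-present (cM): yes, cTuning.org and c-mind.org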
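License
  • 1993-1997: -
  • 1997-1999 (SCS): GPL
  • 1999-2004 (EOS): GPL
  • 2005-2006 (FOS): GPLv2
  • 2007-2009 (cTuning1): GPLv2
  • 2010-2011 (cTuning2): plans for GPLv3
  • 2012-present (cM): BSD and LGPL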
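Fully integrated solution for collaborative and reproducible R&D
  • 1993-1997: -
  • 1997-1999 (SCS): partial
  • 1999-2004 (EOS): partial
  • 2005-2006 (FOS): partial
  • 2007-2009 (cTuning1): yes
  • 2010-2011 (cTuning2): no (passive repository)
  • 2012-present (cM): yes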
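Associated new publication model
  • 1993-1997: -
  • 1997-1999 (SCS): -
  • 1999-2004 (EOS): -
  • 2005-2006 (FOS): -
  • 2007-2009 (cTuning1): yes
  • 2010-2011 (cTuning2): -
  • 2012-present (cM): yes (new theme for HiPEAC 2012-2020)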
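Academic community
  • 1993-1997: -
  • 1997-1999 (SCS): -
  • 1999-2004 (EOS): -
  • 2005-2006 (FOS): -
  • 2007-2009 (cTuning1): yes
  • 2010-2011 (cTuning2): -
  • 2012-present (cM): yes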
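Industrial community
  • 1993-1997: -
  • 1997-1999 (SCS): -
  • 1999-2004 (EOS): -
  • 2005-2006 (FOS): -
  • 2007-2009 (cTuning1): yes
  • 2010-2011 (cTuning2): -
  • 2012-present (cM): yes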
Interesting usages
  • 1993-1997: developed a semiconductor neural network and all the modeling software used for teaching at MIPT
  • 1997-1999 (SCS): the universal web-based supercomputer service was used for some time for novice users at the Russian Academy of Sciences until more advanced GRID/cloud services became available
  • 1999-2004 (EOS): testing pre-cTuning concepts; used in several academic and industrial projects on auto-tuning of large realistic applications for supercomputer centers
  • 2005-2006 (FOS): used by ICT to tune compilers and programs for the Loongson processor
  • 2007-2009 (cTuning1): tuned the default compiler optimization heuristic of a new GCC version for ARC customers for real-time applications with performance/code-size constraints; tuning various applications in multiple international academic and industrial projects
  • 2010-2011 (cTuning2): NDA
  • 2012-present (cM): used for the new publication model; prepared for the HiPEAC common research and development infrastructure; being tested in several academic and industrial projects
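Repository type
  • 1993-1997: csv/xls files
  • 1997-1999 (SCS): MySQL
  • 1999-2004 (EOS): MySQL
  • 2005-2006 (FOS): file-based
  • 2007-2009 (cTuning1): MySQL / file-based
  • 2010-2011 (cTuning2): flat file-based; very slow, requires indexing
  • 2012-present (cM): hierarchical file-based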
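Queries
  • 1993-1997: -
  • 1997-1999 (SCS): MySQL queries
  • 1999-2004 (EOS): MySQL queries
  • 2005-2006 (FOS): -
  • 2007-2009 (cTuning1): MySQL queries
  • 2010-2011 (cTuning2): obligatory Elasticsearch indexing
  • 2012-present (cM): on-demand Elasticsearch indexing and queries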
Interface to open up tools and applications for analysis and tuning
  • 1993-1997: -
  • 1997-1999 (SCS): -
  • 1999-2004 (EOS): -
  • 2005-2006 (FOS): Interactive Compilation Interface for Open64, GCC and PathScale compilers
  • 2007-2009 (cTuning1): Interactive Compilation Interface for GCC, included in mainline GCC since version 4.6
  • 2010-2011 (cTuning2): -
  • 2012-present (cM): OpenME, a universal interface for interactive or online analysis and auto-tuning of any tool (LLVM, GCC, Open64, run-time system, etc.) or application
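Language
  • 1993-1997: C, C++
  • 1997-1999 (SCS): MSVC, Perl
  • 1999-2004 (EOS): Java, C, C++
  • 2005-2006 (FOS): C
  • 2007-2009 (cTuning1): C, C++, PHP
  • 2010-2011 (cTuning2): C and Python
  • 2012-present (cM): Python, plus the OpenME interface to connect to any other language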
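Web service
  • 1993-1997: -
  • 1997-1999 (SCS): third-party web server
  • 1999-2004 (EOS): integrated Java-based server
  • 2005-2006 (FOS): -
  • 2007-2009 (cTuning1): third-party web server and MediaWiki integration
  • 2010-2011 (cTuning2): third-party web server
  • 2012-present (cM): unified and integrated web server plugin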
Auto-tuning plugins
  • 1993-1997: -
  • 1997-1999 (SCS): low-level plugins for basic random tuning
  • 1999-2004 (EOS): low-level plugins for various auto-tuning strategies
  • 2005-2006 (FOS): low-level plugins for off-line and on-line analysis, auto-tuning and adaptation
  • 2007-2009 (cTuning1): high-level and low-level plugins for off-line and on-line analysis, auto-tuning and adaptation
  • 2010-2011 (cTuning2): -
  • 2012-present (cM): unified and extensible auto-tuning plugins for off-line and on-line tuning of CPU/CUDA/OpenCL codelets and applications
Focused search
  • 1993-1997: -
  • 1997-1999 (SCS): -
  • 1999-2004 (EOS): first plugins to focus exploration on areas with high probability of "unusual" behavior
  • 2005-2006 (FOS): first plugins to focus exploration on areas with high probability of "unusual" behavior
  • 2007-2009 (cTuning1): various plugins for probabilistic focused search
  • 2010-2011 (cTuning2): -
  • 2012-present (cM): on-going: unified and extensible exploration and modeling plugins
Program/architecture feature extraction for machine learning
  • 1993-1997: -
  • 1997-1999 (SCS): -
  • 1999-2004 (EOS): started
  • 2005-2006 (FOS): prototypes
  • 2007-2009 (cTuning1): plugins for MILEPOST GCC and for collection of hardware counters
  • 2010-2011 (cTuning2): -
  • 2012-present (cM): unified cM plugins for cTuning CC, MILEPOST GCC, Alchemist and other tools to obtain semantic features, code patterns, hardware counters, architecture features, etc.
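Machine learning plugins
  • 1993-1997: -
  • 1997-1999 (SCS): -
  • 1999-2004 (EOS): started
  • 2005-2006 (FOS): prototypes
  • 2007-2009 (cTuning1): plugins for nearest neighbour classifier
  • 2010-2011 (cTuning2): -
  • 2012-present (cM): unified and extensible predictive modeling plugins such as nearest neighbour classifier, SVM, etc.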
Run-time adaptation plugins
  • 1993-1997: -
  • 1997-1999 (SCS): -
  • 1999-2004 (EOS): -
  • 2005-2006 (FOS): first prototype of static multi-versioning combined with run-time monitoring and adaptation
  • 2007-2009 (cTuning1): static multi-versioning and run-time adaptation/scheduling (supporting CPU/CUDA codelets)
  • 2010-2011 (cTuning2): -
  • 2012-present (cM): on-going, using the OpenME interface and a combination of static multi-versioning, machine learning and run-time adaptation; various programming models (CPU/CUDA/OpenCL codelets) can be mixed
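Memory/CPU bound detection plugins
  • 1993-1997: -
  • 1997-1999 (SCS): -
  • 1999-2004 (EOS): source-to-source tool
  • 2005-2006 (FOS): -
  • 2007-2009 (cTuning1): -
  • 2010-2011 (cTuning2): -
  • 2012-present (cM): on-going within Alchemist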
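Documentation
  • 1993-1997: -
  • 1997-1999 (SCS): partial
  • 1999-2004 (EOS): partial
  • 2005-2006 (FOS): partial
  • 2007-2009 (cTuning1): full documentation on the cTuning.org wiki (and a technical report)
  • 2010-2011 (cTuning2): some
  • 2012-present (cM): full Doxygen and wiki-based community documentation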

(C) 2011-2014 cTuning foundation