Contents
- 1 Disclaimer
- 2 Abbreviations
- 3 cM concept and design motivation
- 4 cM UIDs and aliases
- 5 Repository
- 6 Modules
- 7 Implementation and compatibility
- 8 Single access function
- 9 Framework directory structure
- 10 Default data entry dictionary
- 11 Default data entry description
- 12 cM kernel and configuration
- 13 cM module repo
- 14 cM module core
- 15 cM packages
- 16 Multiple repositories
- 17 Data classification
- 18 OpenME: opening up tools and applications for analysis and auto-tuning
- 19 cM evolution and comparison with previous frameworks
Disclaimer
This alpha version of the document was written quickly by Grigori Fursin during development of the cM framework. The specification may still change until the official release. Please contact us if you notice mistakes or inconsistencies so that we can improve this document collaboratively!
Abbreviations
- cM - Collective Mind Framework
- CMR - Collective Mind Repository
- UOA - cM UID or alias
cM concept and design motivation
The current Collective Mind concept and design build on our past 20 years of R&D experience:
- In our past research, we spent most of the time not on data analysis and checking novel research ideas, but on dealing with ever changing tools, architectures and huge amounts of heterogeneous data. Therefore, we decided to use wrappers to abstract all tools. These wrappers became cM plugins (modules), e.g. source code has an associated module code.source, a compiler has ctuning.compiler, a binary has code, a dataset has dataset, etc.
- Various cM modules may have different functions (actions), such as code.source build to build a project or code run to run code. Therefore, to unify access to modules, we use a command line front-end cm, which gives access to all modules and their functions using cm <name of the module> <action> parameters.
- For each module we may need to store some associated data: for a dataset, we may need to store the real data set (image file, video file, text file, audio, etc.); for source code, we may want to keep all the source code files including Makefiles, etc. Previously, we used MySQL, but it was time-consuming and complex to extend it for each module (we had to rebuild tables, check all relations, etc.) or to keep binary data. Also, if some experimental data goes wrong, it takes a long time to "clean up" and update the repository. Finally, as researchers, we often want direct access to our experimental files, which is why we often keep myriads of csv files. Therefore, for cM, we decided to use our own very simple directory and file based repository: a cM repository can be inside any directory and starts with a .cmr root directory, followed by the UID or alias of the module and then the UID or alias of the associated entries.
- Another problem that we faced in our past research was dealing with the evolution of our own software. Hence we decided to provide unique IDs for each module and data entry while allowing high-level aliases, e.g. module code.source has cM UID 45741e3fbcf4024b. We can call high-level modules or data using an alias, but when the module API changes dramatically (not just extended while keeping backwards compatibility), we keep the alias but change the UID! Most of the cM modules can deal with both UID and alias; this combination is called cM UOA (UID or Alias). Since a repository is also data, it has its own UID. Therefore, any data can be found using either <module UOA>:<data UOA>, in which case it is searched through all available repositories, or using <repo UOA>:<module UOA>:<data UOA>. The unique data identifier in cM is called CID (cM ID) and has the format (<repo UOA>:)<module UOA>:<data UOA>.
- Naturally, such design is very flexible but can be slow for search, etc. However, such design is very easy to combine with existing indexing tools. We decided to use ElasticSearch that works directly with JSON and can perform fast search and complex queries. We provided support for on-the-fly indexing of data in cM.
- Yet another problem that we had was the use of different frameworks when we wanted to either just run experiments (mobiles, GRID, cloud, supercomputers) or perform analysis or provide web front-end or build graphs, etc. Now, we can use the same framework with various module selection (minimal cM core is only around 500KB).
- Interestingly, modules are also entries inside repositories making it possible to continuously evolve framework and models when more "knowledge" is available.
- We added module "class" to start gradually classifying all data entries. We can also rank useful data entries (that can be most profitable compiler optimizations or models, etc).
- Since format of cM is now open and easily extensible, we can easily combine auto-tuning and expert knowledge (as module ctuning.advice, for example).
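To make the CID format from the list above concrete, here is a minimal Python sketch that splits a CID string into its parts. The function name parse_cid and the keys of the returned dictionary are illustrative assumptions and are not part of the cM API.

```python
def parse_cid(cid):
    """Split '(<repo UOA>:)<module UOA>:<data UOA>' into its components.

    Hypothetical helper for illustration only; not a cM kernel function.
    """
    parts = cid.split(':')
    if len(parts) == 2:
        # No repository given: the entry would be searched in all repositories
        repo_uoa = None
        module_uoa, data_uoa = parts
    elif len(parts) == 3:
        repo_uoa, module_uoa, data_uoa = parts
    else:
        raise ValueError('CID must have 2 or 3 colon-separated parts: %r' % cid)
    if repo_uoa == '' or not module_uoa or not data_uoa:
        raise ValueError('empty component in CID: %r' % cid)
    return {'repo_uoa': repo_uoa,
            'module_uoa': module_uoa,
            'data_uoa': data_uoa}


print(parse_cid('os:android-generic-32'))
print(parse_cid('default:os:android-generic-32'))
```

A missing repository part is returned as None, so a caller can decide whether to search all available repositories.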
Now, we believe that we have a framework that is easy to extend to continue collaborative systematization of characterization, optimization, learning and design of computer systems.
We use gradual top-down decomposition and learning of computer systems to keep complexity under control, balance our return on investment (analysis/optimization cost vs benefit) and gradually improve our knowledge.
cM UIDs and aliases
All cM entries (including modules which are treated as cM entries too) have a unique ID to be able to easily share data in P2P environments or dedicated repositories. cM UID is a lowercase string of 16 hexadecimal digits, such as a8aac7d3fec45ca9. Entries are stored as directories with a given UID.
However, for user convenience each entry may have a hardwired alias or a display-as alias. A hardwired alias is used as the entry's directory name instead of the UID. A display-as alias means that the directory name of the entry is still the UID, but the associated meta data includes a key 'display_as_alias'; this is useful when visualizing data in the cM web front-end.
Aliases can contain only alphanumeric characters and '_', '-', '.' for cross-OS compatibility (since they are used as directory names on Linux and Windows).
In most cases, cM can automatically find an entry by either UID or alias, which is referred to as cM UOA (UID or alias); otherwise we use UID or alias respectively.
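The rules above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and generating the UID from a random UUID is an assumption, since this document does not specify how cM generates UIDs.

```python
import re
import uuid

# A cM UID is a lowercase string of 16 hexadecimal digits
UID_RE = re.compile(r'[0-9a-f]{16}')
# Aliases may contain only alphanumeric characters and '_', '-', '.'
ALIAS_RE = re.compile(r'[A-Za-z0-9_.-]+')


def gen_uid():
    # Assumption: take 16 hex digits of a random UUID; cM may do this differently
    return uuid.uuid4().hex[:16]


def is_uid(s):
    return UID_RE.fullmatch(s) is not None


def is_valid_alias(s):
    return ALIAS_RE.fullmatch(s) is not None


print(is_uid('a8aac7d3fec45ca9'))            # True
print(is_valid_alias('android-generic-32'))  # True
print(is_valid_alias('my alias'))            # False (space is not allowed)
```

Note that a string of 16 lowercase hex digits passes both checks, so code that accepts a UOA has to decide which interpretation to try first.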
Repository
The core of the cM framework is the adaptive repository, i.e. a repository that can easily evolve during decomposition of complex systems. We decided to build a very simple and portable NoSQL repository that can be installed on any computer system, including mobiles, tablets, desktops, HPC servers and cloud services. We decided to use a directory and file based repository with a json description that is very portable, scalable and can be easily indexed by third-party indexing services such as ElasticSearch.
The current directory structure of any cM repository is:
.cmr/                                           # Repository directory.
.cmr/UOA of module/                             # First level directory associated with a given module.
.cmr/UOA of module/UOA of data                  # Second level directory with data associated with a given module.
                                                # This directory can contain any files and directories including traces,
                                                # benchmarks, data sets, executables, models, archives, pdfs, csv files, etc.
.cmr/UOA of module/UOA of data/.cm/config.json  # Data description: extensible meta data in json that summarizes data content.
                                                # Whenever data is accessed, only this file is loaded first and depending on the content and module action,
                                                # other data can be accessed too.
Note that when a directory has UOA entries and an alias is used instead of a UID, this directory will also include a .cm subdirectory that disambiguates UIDs and aliases using 2 simple text files:
- alias-a-<hardwired_alias_name> that contains associated UID
- alias-u-<UID> that contains hardwired alias name
We need such a structure to speed up access to entries with aliases while avoiding search (i.e. an entry can be loaded directly).
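As an illustration of how the two alias files allow direct loading without search, here is a minimal Python sketch. The function find_entry_dir is hypothetical (it is not the cM kernel API), and it assumes that the .cm subdirectory with the alias files sits inside the module directory, next to the entry directories.

```python
import os


def _read(path):
    with open(path) as f:
        return f.read().strip()


def find_entry_dir(module_dir, uoa):
    """Return (entry_path, uid, alias) for a UID or alias, or None if unknown."""
    cm_dir = os.path.join(module_dir, '.cm')

    # Case 1: uoa is a hardwired alias; alias-a-<alias> contains the UID
    alias_file = os.path.join(cm_dir, 'alias-a-' + uoa)
    if os.path.isfile(alias_file):
        return os.path.join(module_dir, uoa), _read(alias_file), uoa

    # Case 2: uoa is a UID with a hardwired alias; alias-u-<UID> contains the alias
    uid_file = os.path.join(cm_dir, 'alias-u-' + uoa)
    if os.path.isfile(uid_file):
        alias = _read(uid_file)
        return os.path.join(module_dir, alias), uoa, alias

    # Case 3: no hardwired alias; the directory is named by the UID itself
    path = os.path.join(module_dir, uoa)
    if os.path.isdir(path):
        return path, uoa, None
    return None
```

In every case the lookup needs at most two file system checks and one small file read, which is the point of keeping both files.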
Modules
Since we envision that community will be sharing both data and modules, we found an interesting solution to keep modules as data in the repository thus simplifying and unifying the whole cM framework. cM module 'module' provides an abstraction to deal with modules in the framework. Hence all modules are kept in the following directory in the repository:
.cmr/module/UOA of module
.cmr/module/UOA of module/.cm/config.json    # Expose properties, characteristics, choices of the module/data or associated tool.
                         /module.py          # Code of module with actions. Core of cM framework is written in Python
                                             # but we are developing API to access cM modules and data from any other language.
                                             # We hope that community will help with this effort.
                         /c/module.c
                         /cpp/module.cc
                         /fortran/module.f
                         /php/module.php
Implementation and compatibility
We envision that the cM framework will constantly evolve, hence we decided to implement all cM logic (kernel, access to repository, web server, visualization, predictive modeling, security, etc.) as modules inside the cM repository. Instead of providing versions of the framework and all modules, we use the module UID as a version number. If a module evolves with a backwards compatible API, neither the alias nor the UID changes. However, if the API or associated data becomes backwards incompatible, a new module should be created with a new UID, while the alias may remain unless the meaning has changed. Internally, modules should call other modules only through UID and not through alias to solve compatibility issues: multiple versions of a module can co-exist, but other modules must be explicitly upgraded to support new modules. This also solves data compatibility: we do not allow mixing data from incompatible versions of a module.
TBD: Version and build number of a module can be provided to keep track of changes.
Single access function
For simplicity and unification, we decided to have one access function in cM that accesses all cM modules and their functions, with a dictionary (which can be directly represented as json) as input and output. This allows simple mixing of all modules no matter which language they are written in. Furthermore, any external script, program, architecture and tool in any language can be easily and universally connected to cM through the light-weight OpenME interface. For now, only the python and php parts are implemented, while the API for other languages calls the cm command line. We are gradually and collaboratively extending the OpenME interface to support multiple languages.
Basic cM functionality to access the repository, deal with cM UOA, authenticate users, etc. is implemented in the kernel module. This module is always available in all other python modules as cm_kernel.
Here is an example of loading entry os:android-generic-32 in some module:
r=cm_kernel.access({'cm_run_module_uoa':'os',
                    'cm_action':'load',
                    'cm_data_uoa':'android-generic-32'})
if r['cm_return']>0: return r
That's all! If an error happens, the cM framework handles it properly as long as you include the above if statement. If the command was successful, you can get the data as a dictionary:
data=r['cm_data_obj']['cfg']
Note, that this command is equivalent to cM CMD invocation as:
cm os load cm_data_uoa=android-generic-32
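The dictionary-in, dictionary-out convention and its equivalence to the command line form can be illustrated with a toy dispatcher. This is a sketch only and not the cM kernel: the MODULES registry, the os_load action, the cmd_to_dict helper and the cm_error key are assumptions made for this example; only cm_run_module_uoa, cm_action, cm_data_uoa, cm_return and cm_data_obj come from the text above.

```python
def os_load(i):
    # Hypothetical action: pretend to load an entry of module 'os'
    return {'cm_return': 0,
            'cm_data_obj': {'cfg': {'name': i['cm_data_uoa']}}}


# Toy registry standing in for modules stored in the repository
MODULES = {'os': {'load': os_load}}


def access(i):
    """Single entry point: dispatch on module and action, return a dictionary."""
    module = MODULES.get(i.get('cm_run_module_uoa'))
    if module is None:
        return {'cm_return': 1, 'cm_error': 'module not found'}
    action = module.get(i.get('cm_action'))
    if action is None:
        return {'cm_return': 1, 'cm_error': 'action not found'}
    return action(i)


def cmd_to_dict(argv):
    """Turn ['os', 'load', 'key=value', ...] into an input dictionary."""
    i = {'cm_run_module_uoa': argv[0], 'cm_action': argv[1]}
    for arg in argv[2:]:
        key, _, value = arg.partition('=')
        i[key] = value
    return i


r = access(cmd_to_dict(['os', 'load', 'cm_data_uoa=android-generic-32']))
if r['cm_return'] > 0:
    print('error:', r.get('cm_error'))
else:
    print(r['cm_data_obj']['cfg'])
```

Because both the input and the output are plain dictionaries, the same call can be serialized as json and sent through a web service or another language binding without changing the calling convention.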
cM has several internal parameters such as:
- cm_console - controls CMD output and can be 'txt', 'json', 'json_with_indent', 'web' (used for plugins that return html or web data)
- cm_user_uoa - user UOA in case authentication is needed
TBD
Framework directory structure
bin/                              # cm and cm.bat - scripts to call kernel module in python
repos/                            # directory with several repositories
repos/default/.cmr                # default cM repository with all modules and data for basic cM functionality
repos/default/.cmr/module/        # directories with all basic cM modules
repos/ctuning/.cmr/               # repository for collective characterization and optimization of computer systems
repos/ctuning-experiments/.cmr/   # repository for temporal experiments during optimization of computer systems
docs/                             # some documentation in text format
logs/                             # logs for web and index servers
scripts/                          # various demo scripts (examples, development, etc)
tmp/                              # directory for temporary files such as generation of plots, etc
Default data entry dictionary
Data in cM is represented as extensible no-schema dictionary that is stored as human-readable json file that can be directly edited if needed. Besides meta-description of the data itself, this dictionary can contain various cM internal keys:
{
  "cm_access_control": {               # Controls access to this entry (only in web services when auth is on)
    "comments_groups": "admin",        # Groups that can comment or rank entries
    "read_groups": "registered",       # Groups that can read entries
    "write_groups": "owner"            # Groups that can write entries
  },
  "cm_classes_uoa": [                  # List of classes this entry belongs to
    "62a455f8c8042f90"
  ],
  "cm_description": "",                # General description
  "cm_display_as_alias": "",           # User friendly alias when visualizing this data
  "cm_dissemination": {                # Whether this entry was publicly disseminated
    "publications": [                  # for example, through publication, etc
      "0c44d9a2db3de3c9"
    ]
  },
  "cm_updated": [                      # When entry was updated
    {
      "cm_iso_datetime": "2012-04-12", # Date and time in python format
      "cm_module_uid": "8a7141c59cd335f5", # Which module updated this entry
      "cm_user_uoa": "0728a400aa1c86fe"    # Which user updated this entry
    }
  ],
  "powered_by": {                      # Which version of cM was used to create this entry
    "name": "Collective Mind Engine",
    "version": "1.0.1977.beta"
  }
  ...                                  # No-schema meta-data for the entry
}
Default data entry description
We deliberately use a no-schema data representation in cM so that researchers can quickly prototype ideas rather than spend valuable time on preparing strictly typed, fixed data structures that can evolve over time or may be useless in the future. Only if a prototype is successful and is planned to be released with cM does the user need to provide a description of the data.
Data description of a cM entry can be found in a data description of an associated module under cm_data_description key. It has the following format:
{ "<FLAT_KEY1>": <DESC1>, "<FLAT_KEY2>": <DESC2> ... }
<FLAT_KEY> describes any key in the dictionary hierarchy. It always starts with #, followed by #key for a dictionary key or @number for a value in a list.
For example, <FLAT_KEY> for key c in dictionary {"a":[{"c":"d"}]} is ##a@0#c
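The flat key rule can be expressed as a short recursive function. The sketch below is an illustration written for this document and is not the flattening function of the cM kernel; it produces flat keys for leaf values only and silently skips empty dictionaries and lists, which is an assumption.

```python
def flatten(obj, prefix='#'):
    """Map every leaf value of a nested dict/list to its flat key."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            # Dictionary key: append '#key'
            flat.update(flatten(value, prefix + '#' + str(key)))
    elif isinstance(obj, list):
        for number, value in enumerate(obj):
            # List element: append '@number'
            flat.update(flatten(value, prefix + '@' + str(number)))
    else:
        flat[prefix] = obj
    return flat


print(flatten({"a": [{"c": "d"}]}))   # {'##a@0#c': 'd'}
```

The printed result reproduces the example from the text: key c inside the first list element of a gets the flat key ##a@0#c.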
<DESC> is the description dictionary of format:
{
  "type": "",                      # Choices are text, textarea, dict, list, url, integer, float, email, UOA or choice
  "desc_text": "",                 # Description text that will be visualized
  "default_value": "",             # Default value
  "sort_index": "",                # Sort index when visualizing
  "has_choice": "",                # 'yes' or 'no'
  "choice": [...],                 # Choices if type is choice or has_choice==yes
  "cm_module_uoa": "",             # Module UOA if type is uoa
  "disable": "",                   # 'yes' or 'no' - disable during visualization
  "explorable": "",                # if 'yes' this is a tuning dimension that can be explored
  "forbid_disable_at_random": ""   # if 'yes', forbid disabling during random exploration
}
Note: when entry is visualized using cM web services, this description is used to prepare user-friendly visualization of json with various choices or dependencies.
TBD: add all types of keys and dependencies!!!
cM kernel and configuration
Since all functionality in cM is in modules, the most low-level functionality is implemented in the module kernel while associated data entries keep basic cM configuration. When some cM module/function is invoked, cm bootstraps and initializes itself using configuration from cM data entries kernel:local or kernel:default (or from environment variable CM_DEFAULT_CFG (points to json file)). Then, any module has 2 objects:
- ini dictionary that includes various cM parameters and configuration
- cm_kernel module
The dictionary most useful to users and developers is ini['cfg'], which is the configuration of the invoked module, i.e. data of entry module:<invoked_module>.
cm_kernel object provides users cM configuration as dictionary in cm_kernel.ini['dcfg'], i.e. data of entry kernel:local or kernel:default or referenced by environment variable as described above.
cm_kernel also provides various bootstrap parameters and all cM low-level functions which are listed together with their API here.
These low level functions include:
- loading module and data
- generate or check cM UID
- creating/deleting/updating data entries
- finding entries
- creating/deleting aliases
- flattening/de-flattening dictionaries
- loading/saving json
- getting list of all UOAs in a path
- unicode printing for console and web (to support Python 3.x in the future)
- smart merging of dictionaries
- converting string to SHA1
- authenticating users
- main single access function and variations (remote access through web-service, through CMD, as string, etc)
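Two of the listed helpers are easy to illustrate in plain Python. The sketch below is not the cM kernel code and the function names are hypothetical. In particular, what "smart merging" means in cM is not specified here, so the recursive merge below (nested dictionaries are merged, all other values are overwritten) is an assumption.

```python
import hashlib


def merge_dicts(dst, src):
    """Recursively merge src into dst in place and return dst.

    Assumption: nested dictionaries are merged key by key,
    any other value in src overwrites the value in dst.
    """
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            merge_dicts(dst[key], value)
        else:
            dst[key] = value
    return dst


def sha1_hex(text):
    """Convert a string to its SHA1 digest in hexadecimal form."""
    return hashlib.sha1(text.encode('utf-8')).hexdigest()


print(merge_dicts({'a': {'b': 1}, 'c': 2}, {'a': {'d': 3}, 'c': 4}))
# {'a': {'b': 1, 'd': 3}, 'c': 4}
print(sha1_hex('abc'))
# a9993e364706816aba3e25717850c26c9cd0d89d
```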
Note, that later, we provided higher-level and more user-friendly abstraction for repositories (module repo) and for dealing with searching/adding/updating/deleting data (module core) so users will most of the time use core or repo to access repository.
cM module repo
Module repo enables multiple repositories in cM that can be shared (SVN/GIT), remote (using cM web-services), excluded from search, etc. Basically adding or deleting repositories means that an associated repo entry describing repository is created or deleted from cM.
Module repo provides some high level functions to get a list of repositories, perform search for a given entry, download remote repository, copy or clean it, etc, which are described here.
cM module core
Most of the high-level functionality to deal with data entries in cM is implemented in module core. It includes:
- adding data entry
- loading data entry
- updating data entry
- deleting data entry
- renaming data entry
- finding data entry
- listing data entries
- searching in data entries
All functions are described here.
Note that all these core cM functions are inherited in all other modules. For example, it is possible to list all entries using the same function in all modules:
cm kernel list
cm repo list
cm processor list
We included the cmx tool in cM, which allows performing all of the above functions in a user-friendly manner from CMD. Current functions in cmx include (retrieved using cmx --help):
cM simplified and user friendly command line module to deal with cM repositories
cmx test - test cM functionality (alpha)
cmx repo add/create (repo name) - create repo in the current directory (.cmr) or import existing one;
cmx repo rm/del/delete (-f) (entry name) - delete repository reference
cmx repo mv/move/ren/rename (old entry name) (new entry name) - rename repository reference
cmx repo update/co/checkout (repo name) - update shared repository
cmx repo share/commit (repo name) - commit to shared repository
cmx repo download (repo name) - download repo from the web
cmx add <entry name>/CID (param1=value1 ... @file1.json ...) - add entry to current repository/module
    local repository by default
cmx update <entry name>/CID (param1=value1 ... @file1.json ...) - add entry to current repository/module
    local repository by default
cmx rm/del/delete (-f) <entry name>/CID - delete entry from the current repository/module
    local repository by default
cmx svnrm/svndel/svndelete <entry name>/CID - delete entry from the current repository/module using SVN
    local repository by default
cmx ren/rename <old entry name> <new entry name> - rename entry in the current repository/module
    local repository by default
cmx mv/move <old CID> <new CID> - move entry
    local repository by default
cmx cp/copy <old CID> <new CID> - copy entry
    -k - keep old UID
cmx clean alias1 alias2 ... - remove orphaned aliases
cmx info/load (CID) - show all info about entry (in json)
    global repository by default
cmx find (CID) - show path to the entry
    global repository by default
cmx list <module> - list data for a given module
    global repository by default
cmx index (info) - show status of indexing server
cmx index on - turn on usage of indexing by cM
cmx index off - turn off usage of indexing by cM
cmx index test - test indexing server
cmx index flush - clear the whole index
cmx index CID - index entry or entries
    local repository by default
    data can include patterns * and ?
cmx search (CID key1=value1 value2 ...) - search data
    if indexing server is on, use it, otherwise very slow
    if indexing is on, can use wildcards * and ?
    -s turns on progress for internal slow search
    -t=timeout - sets timeout for local slow search
cmx web_view/wv (CID) - view entry in cM web front-end
cmx web_update/wu (CID) - update entry in cM web front-end
cmx uid - generate cM UID
cmx restore_uid_from_alias <alias> - restore UID from alias file
cmx password - prepare cM SHA1 password
    (if -s or --short, output only SHA1 password)
cmx - print current CID
    global repository by default
cmx help - help
Additional flags:
    -h, --help, -? - this help
        (otherwise global by default)
    -g, --global - apply command to global repository
    -l, --local - apply command to local repository
        (otherwise global by default)
    -c, --class="class1,class2,..." - use classes for the command
        for example, add or list
        (it can be both UID or alias)
        (all classes should be present in an entry)
        (alias can include wild cards * and ?)
    -q, --quiet - quiet mode when deleting files, etc
    -f, --full - add repo to CID if needed
    -j, --json - return json if applicable
cM packages
When performing long-term experiments, it often happens that various tools evolve, and users either need to re-run experiments or even run experiments with several tools co-existing in the system at the same time. Therefore, we provided a notion of universal packages (module package) in cM to describe the installation process of various tools, dependencies on host OS, target OS, processor, compiler and any other package, and environment variables.
When a package is installed, a new entry with a cM UID is created under the code module, where the installed code will reside. At the same time, an OS script is created in $CM_ROOT/bin with the name cm_code_env_<UID>.sh (for Linux) or in %CM_ROOT%\bin with the name cm_code_env_<UID>.bat (for Windows) that describes all PATHs, LIBs, and any other necessary environment variables for a given package.
Now, various packages can easily co-exist in the system: GCC, LLVM, ICC, Open64 with different versions; different libraries (lapack, blas, magma); various auto-tuning plugins or even codelets targeting CPU/CUDA/OpenCL. This elegantly solves the problem of experiments and reproducibility in heterogeneous and rapidly evolving systems!
Note, that user can manually call the above script to experiment with a given tool, or cM will automatically call such a script in high-level experiment scenarios (for example, in program and architecture tuning and learning pipeline).
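To show how such per-package environment scripts let several tool versions co-exist, here is a minimal Python sketch that writes one. Only the file naming scheme cm_code_env_<UID>.sh / .bat comes from the text above; the function name, its parameters and the exact content of the generated script are assumptions made for illustration.

```python
import os


def write_env_script(bin_dir, uid, env, windows=False):
    """Write cm_code_env_<UID>.sh (or .bat) that sets the given variables."""
    ext = '.bat' if windows else '.sh'
    path = os.path.join(bin_dir, 'cm_code_env_' + uid + ext)
    lines = []
    for name, value in sorted(env.items()):
        if windows:
            lines.append('set %s=%s' % (name, value))
        else:
            lines.append('export %s="%s"' % (name, value))
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')
    return path
```

For example, write_env_script(bin_dir, 'a8aac7d3fec45ca9', {'PATH': '/opt/gcc/bin:$PATH'}) would create a script that can be sourced in a shell before running an experiment with that particular tool version (the UID and the path here are made up).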
Packages can be easily installed using cM web front-end (Usage -> Install packages).
Multiple repositories
There can be any number of repositories in cM. Repositories are represented by module repo in cM. When adding a new repository, an associated entry is created in cM that provides the path or URL of the repository and various meta information, i.e. if the repository is shared in the workgroup using SVN (or GIT), what are the access rights, etc.
It is possible to see all available repositories using cmx list repo.
When accessing entry without specifying a repository, all available local repositories will be searched.
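A search across several local repositories can be sketched as a simple loop over repository paths. The function below is hypothetical and simplified: it looks up the entry by directory name only (so it does not resolve a UID to a hardwired alias) and treats the presence of .cm/config.json as the sign that an entry exists.

```python
import os


def find_in_repos(repo_paths, module_uoa, data_uoa):
    """Return paths of all entries <module_uoa>:<data_uoa> found in the given repositories."""
    found = []
    for repo in repo_paths:
        entry = os.path.join(repo, '.cmr', module_uoa, data_uoa)
        if os.path.isfile(os.path.join(entry, '.cm', 'config.json')):
            found.append(entry)
    return found
```

Returning a list rather than the first match makes it visible when the same entry exists in more than one repository, in which case the caller can ask for a full CID with an explicit repository.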
cM by default includes remote cmind repository that is used for crowd-tuning and P2P data sharing.
Data classification
cM allows gradual classification of any data in the repositories using module class. Classes are added to the data entry description under key cm_classes_uoa (list of classes UOA).
Current classes can be viewed using cmx list class.
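Filtering entries by class can be sketched as follows. The helper has_classes is hypothetical; it follows the rule from the cmx help text that all requested classes should be present in an entry and that a class may be given with the wildcards * and ?. Matching is done on the strings stored under cm_classes_uoa, without resolving aliases to UIDs.

```python
from fnmatch import fnmatchcase


def has_classes(entry, patterns):
    """True if every requested class pattern matches at least one class of the entry."""
    classes = entry.get('cm_classes_uoa', [])
    return all(any(fnmatchcase(c, p) for c in classes) for p in patterns)


entry = {'cm_classes_uoa': ['62a455f8c8042f90', 'benchmark']}
print(has_classes(entry, ['bench*']))                      # True
print(has_classes(entry, ['62a455f8c8042f90', 'other']))   # False
```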
OpenME: opening up tools and applications for analysis and auto-tuning
This section should be extended
OpenME is an event-based modular framework (C/C++/Fortran/PHP) that makes it possible to "open up" various tools and applications using just a few lines of code and dynamic plugins to connect them to cM for analysis, tuning and run-time adaptation. It is an evolution of the Interactive Compilation Interface that Grigori Fursin originally developed for Open64 in 2005 and later for GCC, and that was eventually moved to GCC mainline (>=4.6) after collaboration with Google and Mozilla. OpenME may not be the fastest interface, but due to its simplicity, it targets novice users who want to quickly prototype ideas rather than spend time on mastering very complex event-based frameworks.
OpenME has 3 main functions:
- openme_init(…) - initialize/load plugin
- openme_callback(char* event_name, void* params) - call event
- openme_finish(…) - finalize (if needed)
Example: OpenME for LLVM 3.x
This piece of code can control LLVM unrolling on the fly:
Source code: tools/clang/tools/driver/cc1_main.cpp:
#include "openme.h"
…
int cc1_main(const char **ArgBegin, const char **ArgEnd,
             const char *Argv0, void *MainAddr) {
  openme_init("UNI_ALCHEMIST_USE", "UNI_ALCHEMIST_PLUGINS", NULL, 0);
  …
  // Execute the frontend actions.
  Success = ExecuteCompilerInvocation(Clang.get());
  openme_callback("ALC_FINISH", NULL);
  …
}
Source code: lib/Transforms/Scalar/LoopUnrollPass.cpp
#include <cJSON.h>
#include "openme.h"
…
bool LoopUnroll::runOnLoop(Loop *L, LPPassManager &LPM) {
  struct alc_unroll {
    const char *func_name;
    const char *loop_name;
    cJSON *json;
    int factor;
  } alc_unroll;
  …
  alc_unroll.func_name=(Header->getParent()->getName()).data();
  alc_unroll.loop_name=(Header->getName()).data();
  openme_callback("ALC_TRANSFORM_UNROLL_INIT", &alc_unroll);
  …
  // Unroll the loop.
  alc_unroll.factor=Count;
  openme_callback("ALC_TRANSFORM_UNROLL", &alc_unroll);
  Count=alc_unroll.factor;
  if (!UnrollLoop(L, Count, TripCount, UnrollRuntime, TripMultiple, LI, &LPM))
    return false;
  …
}
It uses dynamic Alchemist plugin (available in cM as a dynamic library) that is used for fine-grain compiler analysis and tuning.
Example of Alchemist source code:
#include <cJSON.h>
#include <openme.h>
int openme_plugin_init(struct openme_info *ome_info) {
  …
  openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL_INIT", alc_transform_unroll_init);
  openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL", alc_transform_unroll);
  openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL_FEATURES", alc_transform_unroll_features);
  openme_register_callback(ome_info, "ALC_FINISH", alc_finish);
  …
}
extern void alc_transform_unroll_init(struct alc_unroll *alc_unroll) {
  …
}
extern void alc_transform_unroll(struct alc_unroll *alc_unroll) {
  …
}
…
Example of OpenME for OpenCL/CUDA C application
Source code: 2mm.c / 2mm.cu
…
#ifdef OPENME
#include <openme.h>
#endif
…
int main(void) {
  …
#ifdef OPENME
  openme_init(NULL,NULL,NULL,0);
  openme_callback("PROGRAM_START", NULL);
#endif
  …
#ifdef OPENME
  openme_callback("ACC_KERNEL_START", NULL);
#endif
  cl_launch_kernel();
  /* or: mm2Cuda(A, B, C, D, E, E_outputFromGpu); */
#ifdef OPENME
  openme_callback("ACC_KERNEL_END", NULL);
#endif
  …
#ifdef OPENME
  openme_callback("KERNEL_START", NULL);
#endif
  mm2_cpu(A, B, C, D, E);
#ifdef OPENME
  openme_callback("KERNEL_END", NULL);
#endif
#ifdef OPENME
  openme_callback("PROGRAM_END", NULL);
#endif
  …
}
Example of OpenME for OpenCL/CUDA Fortran application
Source code: matmul.F
      PROGRAM MATMULPROG
      …
      INTEGER*8 OBJ, OPENME_CREATE_OBJ_F
      CALL OPENME_INIT_F(""//CHAR(0), ""//CHAR(0), ""//CHAR(0), 0)
      CALL OPENME_CALLBACK_F("PROGRAM_START"//CHAR(0))
      …
      CALL OPENME_CALLBACK_F("KERNEL_START"//CHAR(0));
      DO I=1, I_REPEAT
        CALL MATMUL
      END DO
      CALL OPENME_CALLBACK_F("KERNEL_END"//CHAR(0));
      …
      CALL OPENME_CALLBACK_F("PROGRAM_END"//CHAR(0))
      END
cM evolution and comparison with previous frameworks
Here is the list and brief comparison of frameworks that Grigori Fursin developed during his past R&D to show evolution and reasoning behind framework/repository internals.
| | 1993-1997 | 1997-1999 | 1999-2004 | 2005-2006 | 2007-2009 | 2010-2011 | 2012-cur. |
|---|---|---|---|---|---|---|---|
| Name | - | SCS - SuperComputer Services | EOS - Edinburgh Optimizing Software | FOS - Framework for Continuous Optimization | cTuning1 / MILEPOST | cTuning2 aka Codelet Tuning Infrastructure | cTuning3 aka Collective Mind |
| Purpose | Modeling of semiconductor neural network-based accelerators and computers | Universal and simple access to supercomputers through web services (similar to cloud) | Performance analysis and auto-tuning framework and repository for large numerical applications; preparation for machine learning based optimizations | Fine-grain plugin-based auto-tuning framework and repository | Collaborative R&D framework and repository for program and architecture analysis and co-optimization using machine learning | Customized repository to automate experimentation at Intel Exascale Lab for codelet characterization and optimization combined with auto-tuning and machine learning based on Grigori's past techniques. | Universal plugin-based collaborative R&D infrastructure and repository for systematic and reproducible analysis, optimization and run-time adaptation of computer systems |
| Support | MIPT grant | MIPT grant | EU FP5 MHAOTEU project and University of Edinburgh | HiPEAC, INRIA | EU FP6 MILEPOST project, IBM, CAPS, ARC (Synopsys), University of Edinburgh, INRIA | Intel/CEA Exascale Lab | INRIA, HiPEAC (academic and industrial partners), cTuning.org community |
| Publicly available | - | yes | yes | yes | yes, cTuning.org | unlikely (status, May 2013). Since 2012, Grigori has moved all his related R&D to the new public, universal and customizable cTuning3 (Collective Mind) collaborative R&D infrastructure and repository. | yes, cTuning.org and c-mind.org |
| License | - | GPL | GPL | GPLv2 | GPLv2 | plans for GPLv3 | BSD and LGPL |
| Fully integrated solution for collaborative and reproducible R&D | - | partial | partial | partial | yes | no (passive repository) | yes |
| Associated new publication model | - | - | - | - | yes | - | yes - new theme for HiPEAC 2012-2020 |
| Academic community | - | - | - | - | yes | - | yes |
| Industrial community | - | - | - | - | yes | - | yes |
| Interesting usages | developed semiconductor neural network and all modeling software used for teaching at MIPT | universal supercomputer web-based service was used for some time for novice users at Russian Academy of Sciences until more advanced GRID/cloud services became available | testing pre-cTuning concepts used in several academic and industrial projects on auto-tuning of large realistic applications for supercomputer centers | Used by ICT to tune compilers and programs for Loongson processor | Tuned default compiler optimization heuristic of a new GCC version for ARC customers for real-time application with performance/code size constraints; tuning various applications in multiple academic and industrial international projects | NDA | Used for new publication model; prepared for HiPEAC common research and development infrastructure; being tested in several academic and industrial projects |
| Repository type | csv/xls files | MySQL | MySQL | file-based | MySQL / file-based | flat file-based, very slow - requires indexing | hierarchical file-based |
| Queries | - | MySQL queries | MySQL queries | - | MySQL queries | Obligatory ElasticSearch indexing | On demand ElasticSearch indexing and queries |
| Interface to open up tools and applications for analysis and tuning | - | - | - | Interactive Compilation Interface for Open64, GCC and PathScale compilers | Interactive Compilation Interface for GCC that was included to mainline since GCC 4.6 | - | OpenME - universal interface for interactive or online analysis and auto-tuning for any tool (LLVM, GCC, Open64, run-time system, etc) or application |
| Language | C, C++ | MSVC, perl | Java, C, C++ | C | C, C++, php | C and python | python and OpenME interface to connect to any other language |
| Web service | - | third-party web server | integrated java based server | - | third party web server and mediawiki integration | third party web server | unified and integrated web server plugin |
| Auto-tuning plugins | - | low-level plugins for basic random tuning | low-level plugins for various auto-tuning strategies | low-level plugins for off-line and on-line analysis, auto-tuning and adaptation | high-level and low-level plugins for off-line and on-line analysis, auto-tuning and adaptation | - | unified and extensible auto-tuning plugins for off-line and on-line tuning of CPU/CUDA/OpenCL codelets and applications |
| Focused search | - | - | first plugins to focus exploration on areas with high probability of "unusual" behavior | first plugins to focus exploration on areas with high probability of "unusual" behavior | various plugins for probabilistic focused search | - | on-going: unified and extensible exploration and modeling plugins |
| Program/architecture feature extraction for machine learning | - | - | started | prototypes | plugins for MILEPOST GCC and for collection of hardware counters | - | unified cM plugins for cTuning CC, MILEPOST GCC, Alchemist and other tools to obtain semantic features, code patterns, hardware counters, architecture features, etc |
| Machine learning plugins | - | - | started | prototypes | plugins for nearest neighbour classifier | - | unified and extensible predictive modeling plugins such as nearest neighbour classifier, SVM, etc |
| Run-time adaptation plugins | - | - | - | first prototype of static multi-versioning combined with run-time monitoring and adaptation | static multi-versioning and run-time adaptation/scheduling (supporting CPU/CUDA codelets) | - | on-going using OpenME interface and combination of static multiversioning, machine learning and run-time adaptation. We can mix various programming models (CPU/CUDA/OpenCL codelets) |
| Memory/CPU bound detection plugins | - | - | source-to-source tool | - | - | - | on-going within Alchemist |
| Documentation | - | partial | partial | partial | full documentation at cTuning.org wiki (and technical report) | some | full DoxyGen and Wiki-based community-based documentation |