Contents
- 1 cM user guide
- 1.1 Installation and configuration
- 1.2 Checking cM
- 1.3 Basic usage scenarios
- 1.3.1 Basic cM repository functions through CMD
- 1.3.2 Basic cM repository functions through cM web front-end
- 1.3.3 Searching data (internal cM engine or third-party ElasticSearch engine)
- 1.3.4 Listing data and pruning by cM classes
- 1.3.5 Collective Tuning through cM front-end
- 1.3.6 Installing packages for collaborative experimentation
- 1.4 Advanced experimental scenarios (command line based)
- 1.5 Advanced experimental scenarios (web based)
cM user guide
This user guide is continuously evolving and describes various cM usage scenarios. In the meantime, you may also want to look at the available cM demos and use existing modules and their functionality as examples, or contact us for more details on how to build and use customized repositories and auto-tuning modules.
Installation and configuration
cM can work "out of the box"; however, we strongly suggest performing 2 simple steps (or 3 if you would like to install third-party tools for advanced cM functionality):
1) Set 2 environment variables:
- CM_ROOT - root cM directory
- PATH - add $CM_ROOT/bin (or %CM_ROOT%\bin on Windows)
Now, you should be able to run cm from the command line and see the following:
Collective Mind Use "cm <module> help" or "cm <module> info" for more information.
2) Run the cM configurator: run . ./configure.sh on Unix (configure.bat on Windows) in the cM root directory. It will let you register a cM user at the c-mind.org live repository and set/update several important cM parameters, including web authentication and indexing (most of the time you can just press Enter to keep the default value, except when registering a new cM user):
- checking latest development notes and announcements about cM
- detecting/selecting host OS
- registering a unique cM username at the live c-mind.org/repo repository (very important, to be able to download shared packages, codelets, benchmarks, datasets, models, etc.)
- configuring several cM web server parameters (mainly authentication and local URL/port)
- configuring cM index server (uses ElasticSearch for fast queries and search)
- checking if CURL is installed (to speed up cM index server)
- downloading latest shared codelets, benchmarks, data sets, packages, etc
You can re-run configuration at any time. If you run the cM web server, you can also update the configuration through the cM web front-end at http://localhost:3333?cm_menu=scenarios&cm_submenu=core_configure.
3) Install third-party tools for advanced cM functionality: advanced cM features, such as analyzing variance in experiments, building graphs, and performing data analysis, require additional Python modules and third-party tools:
- Speeding up indexing:
- curl (curl.exe already included in cM for Windows)
- Generating images (qr-code, etc):
- Python package "imaging" (on Linux install using apt-get/yum install python-imaging)
- Performing various numerical analysis and generating graphs (see http://matplotlib.sourceforge.net/users/installing.html, http://www.scipy.org/Installing_SciPy/Linux):
- Python package "python-matplotlib" (on Linux install using apt-get/yum install python-matplotlib)
- Python package "python-numpy" (on Linux install using apt-get/yum install python-numpy)
- Python package "python-scipy" (on Linux install using apt-get/yum install python-scipy)
- Various numerical analysis such as variation of characteristics and normality test:
- R (cM was tested with R 2.14.2)
- R package "normtest"
- Building decision trees:
- graphviz (on Linux install using apt-get/yum install graphviz*)
- Using interactive graphics:
- pygtk2 (on Linux install using apt-get/yum install pygtk2)
- Using both cM web front-end and terminal:
- xterm (cmd.exe is used on Windows)
- Running remote experiments through cM web front-end:
- xauth
If you have problems installing cM, please check the cM knowledge base and/or contact us.
Checking cM
Users can access all cM functionality from the command line using the cm or cmx front-end (accessing repositories, executing experimental pipelines, reproducing experiments, composing new experiments, etc.). To test that the basic cM CMD version is working, run cm from the command line; you should see the following output:
Collective Mind Use "cm <module> help" or "cm <module> info" for more information.
cM also features a built-in, full-featured Python-based web server that lets users access most cM functionality through the user-friendly cM web front-end or expose data to the world through unified cM web services. You can start the web server using cm_web_start.sh in a separate terminal (new terminals will be opened from this root terminal in some usage scenarios) and access the cM front-end at http://localhost:3333
Basic usage scenarios
Basic cM repository functions through CMD
Any module can be accessed from CMD using the following command:
cm <module name or alias> <action> (param1=value1 param2=value2 @file1.json @file2.json -- unparsed string)
However, we suggest using the more user-friendly cmx wrapper for cm, since it was specially designed to have syntax similar to svn, git, etc. You can list all available options using:
cmx --help
Creating a repository in some directory:
cmx repo add abc
where abc is the cM UOA of the repository (UID or alias).
You should now see a .cmr directory in your current path. If you run cmx repo list, you should also see abc among the available repositories.
You can now add some entry xyz, say for the tmp module, with parameter x=y:
cmx add tmp:xyz x=y
Now you can see this entry in your abc repository (by listing all entries for module tmp):
cmx list tmp
Note that you can see UIDs alongside aliases when listing entries using the following command:
cm list tmp show_alias_and_uid=yes
You can load entry xyz:
cmx load tmp:xyz
You will see JSON output with various cM internal info for this entry. User data for this entry is in the dictionary parameter "cm_data_obj" -> "cfg", where you will see "x":"y" alongside other internal information.
Note that you can see the same JSON and edit it directly at:
.cmr/tmp/xyz/.cm/data.json
You can add various files (packages, data sets, codelets, benchmarks, traces) directly to .cmr/tmp/xyz.
You can update entry xyz (say parameter x to z):
cmx update tmp:xyz x=z
Note that load and update now support coarse-grain locking and waiting for remote parallel aggregation of data.
Now, when you load this entry again, you will see "x":"z" in "cm_data_obj"->"cfg"
cmx load tmp:xyz
Finally, you can delete this entry using
cmx rm tmp:xyz
Together with the renaming and copying functions, these are all the basic repository functions you need to know to work with cM. These functions are accessed in a standard way from the cM modules.
Also, note that you can simply archive your .cmr repository, move it to another machine with cM, and then use the same command cmx repo add abc to import the existing repository into the system.
Repositories themselves are treated as entries under the repo module, so you can add and remove them just like any other entries! cM supports remote repositories, which can be shared via SVN (we plan to add GIT and other frameworks later) or downloaded directly using unified cM web services.
Various additional scripts demonstrating cM functionality can be found in $CM_ROOT/scripts/1-examples-of-basic-repo-usage
Basic cM repository functions through cM web front-end
Users can access most of the above cM functionality using the cM web front-end (self-explanatory):
- User-friendly repository browsing. Try this function at c-mind.org.
- Finding entries by their UID. Try this function at c-mind.org.
- Searching entries. Try this function at c-mind.org.
Searching data (internal cM engine or third-party ElasticSearch engine)
There are two types of search in cM.
- A very basic "internal" search that simply searches keys (and possibly values) in all (or pruned) entries. It is very slow, but it does not rely on any external tools, so it can be used for searching through fairly small amounts of data (useful when deployed in cloud services or GRID environments with limited space, where we do not want to analyze huge amounts of data).
- A powerful search using third-party indexing/caching frameworks. It should be used on servers with live repositories or on machines where data analysis is performed, and requires a considerable amount of available memory.
Currently, we use the popular, Apache-licensed ElasticSearch: a Java-based scalable server that is accessible through standard HTTP and directly indexes JSON (that's why it was very easy to add to cM; it is abstracted with the cM "index" module, several functions, and less than 1000 lines of Python code).
The advantage of this approach is that we can index all data on the fly (transparently to cM end users), and we gain access to very fast and powerful queries not available in the basic search. The drawback is that we now depend on a third-party tool and Java (there appear to be similar Python-based index servers, but we leave them for future work).
Examples of search usage:
For both types of search there is only one search function (the idea is that the end user should not need to know whether the slow or fast search is used): "search" in the "core" module.
Its Doxygen description is available at the c-mind live repo.
You can access it locally too, but you will need to regenerate the documentation (you can do this from the cM web front-end, but make sure that Doxygen is installed on the system).
Note that when you use the cM CMD front-end, the command line of param1=value1 param2=value2 @file.json is simply converted to a dictionary (JSON) and passed to the given function. So if you check the Doxygen documentation for a given function, you will know how to either invoke cM from CMD or call the function directly from the modules.
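This CMD-to-dictionary conversion can be sketched in Python as follows (a simplified illustration; the real cM parser may handle more cases):

```python
import json

def cmd_to_dict(args):
    """Convert cM-style CMD arguments into a single input dictionary.

    Simplified sketch: 'key=value' pairs become dictionary entries,
    and '@file.json' arguments are loaded and merged in order.
    """
    d = {}
    for a in args:
        if a.startswith('@'):
            with open(a[1:]) as f:
                d.update(json.load(f))
        elif '=' in a:
            k, v = a.split('=', 1)
            d[k] = v  # values stay strings, as on the command line
    return d

# Roughly what "cm core search key_1=cm_display_as_alias timeout=100" produces:
print(cmd_to_dict(['cm_action=search', 'key_1=cm_display_as_alias', 'timeout=100']))
```

The resulting dictionary is then passed to the module function, which is why the same JSON file can replace the whole command line.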
So, for simplicity, you can try the following (this assumes indexing is OFF):
cm core search key_1=cm_display_as_alias value_1="mibench consumer" show_search_progress=yes timeout=100 cm_console=json_with_indent
It should find several benchmarks related to mibench consumer.
You can add more keys and values (key_2 and value_2, etc.). cM will use AND to find occurrences matching all keys/values.
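Conceptually, this multi-key AND search behaves like the following filter (an illustration of the semantics, not cM's actual implementation):

```python
def match_all(entry, criteria):
    """Return True if, for every (key, value) pair in criteria, the entry
    has that key and the value occurs as a substring of the entry value."""
    return all(key in entry and value in str(entry[key])
               for key, value in criteria)

entries = [
    {'cm_display_as_alias': 'mibench consumer jpeg'},
    {'cm_display_as_alias': 'mibench office stringsearch'},
]
# key_1=cm_display_as_alias value_1="mibench consumer"
criteria = [('cm_display_as_alias', 'mibench consumer')]
print([e for e in entries if match_all(e, criteria)])
```

Adding key_2/value_2 simply appends another pair to the criteria, so only entries matching every pair survive.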
You can also prune the search by repository and module (otherwise it will search through all repositories and modules). However, if you use CMD, you need special characters to pass lists, so we use @@ and ^22^ to pass JSON directly through CMD:
cm core search key_1=cm_display_as_alias value_1="mibench consumer" show_search_progress=yes timeout=100 cm_console=json_after_text "@@{^22^repo_selection^22^:[^22^ctuning^22^], ^22^module_selection^22^:[^22^code.source^22^]}"
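The ^22^ sequences above are hex escapes for the double-quote character (ASCII 0x22). The decoding step can be sketched as follows (whether cM implements it exactly this way is an assumption; this only illustrates the escaping):

```python
import json
import re

def decode_cmd_json(s):
    """Decode a CMD-safe JSON argument: strip the leading '@@' marker and
    replace ^XX^ hex escapes (e.g. ^22^ -> '"') before parsing."""
    if s.startswith('@@'):
        s = s[2:]
    s = re.sub(r'\^([0-9a-fA-F]{2})\^',
               lambda m: chr(int(m.group(1), 16)), s)
    return json.loads(s)

arg = '@@{^22^repo_selection^22^:[^22^ctuning^22^]}'
print(decode_cmd_json(arg))  # {'repo_selection': ['ctuning']}
```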
Naturally, you can just put it all in a JSON file:
{ "cm_action":"search", "key_1":"cm_display_as_alias", "value_1":"mibench consumer", "show_search_progress":"yes", "use_internal_search":"yes", "timeout":"100", "cm_console":"json_after_text", "repo_selection": ["ctuning"], "module_selection": ["code.source"] }
and use
cm core @input.json
Now, if you want to considerably speed this search up, you should use indexing.
First start indexing server in a separate terminal:
cm_start_indexing.sh (or cm_start_indexing.bat if on Windows)
Normally, it should run with default configurations.
Then you should tell cM that you are using indexing. You can turn it on from CMD using:
cmx index on
You can also turn it off at any time using
cmx index off
You can also check status of indexing using
cmx index
If you use the cM web front-end, you will need to restart it to use indexing.
If you use indexing for the first time, you will need to reindex all data (this takes a while, but you only need to do it once):
cmx index *
Now, if everything is working properly, the above search (through all repositories and modules) should take just a few seconds:
cm core search key_1=cm_display_as_alias value_1="mibench consumer" cm_console=json_with_indent
You can also use complex ElasticSearch queries with AND, OR, wildcards, etc. directly (though your modules then become dependent on ElasticSearch):
cm core search cm_es_string="cm_display_as_alias:*milepost*" cm_console=json_with_indent
More info about ElasticSearch queries can be found here.
In fact, cM was designed to minimize search and to access/load most entries directly (this considerably speeds up data processing).
Currently, search is actively used when installing packages or in automatic tuning, where cM finds entries for a given OS, architecture, compiler, benchmark, dataset, etc. that contain the best found combination of flags (for example, in this Android application that connects to the c-mind.org repo to try some random flags, but needs to find/generate flags for its own ARM processor, possibly frequency, etc.).
For example, a user can find packages that support ARM processors using:
cm core search key_1=##cm_dependencies#target_processor value_1=58d15092852a3e90 use_flat_array=yes cm_console=json_with_indent show_search_progress=yes "@@{^22^module_selection^22^:[^22^package^22^]}"
Most packages should have such a JSON key/value (unless they support any processor):
{ "cm_dependencies": { "target_processor":"..." } }
Note that here the target_processor key holds a UID, so it is not possible to search by alias (you need a higher-level plugin that disambiguates this, such as the one pruning experimental results in cM). To find the UID of the processor in the above example, the user needs to do the following:
cm processor list show_alias_and_uid=yes
or load the entry and find cm_uid in the output array:
cmx load processor:generic-arm
Listing data and pruning by cM classes
cM uses the notion of classes to start a gradual classification of computer systems and their behavior (similar to tags or keywords). Classes are entries themselves under the module class. For example, it is possible to list all entries for a given module and then prune the list by classes, i.e.
cm code.source list
or
cm code.source list "@@{^22^cm_classes_uoa^22^:[^22^914178a17c102fb5^22^]}"
where 914178a17c102fb5 is the UID of the class "library". Users can find it by browsing or searching the repository.
This is also useful when a user wants to find packages supporting a given architecture:
cm packages list "@@{^22^cm_classes_uoa^22^:[^22^914178a17c102fb5^22^]}"
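Conceptually, pruning by classes keeps only the entries whose class list contains the requested UID(s). A sketch of that filter (the entry structure is hypothetical, reusing the cm_classes_uoa key shown above):

```python
def prune_by_classes(entries, wanted_uids):
    """Keep entries whose 'cm_classes_uoa' list contains every wanted UID."""
    return [e for e in entries
            if set(wanted_uids) <= set(e.get('cm_classes_uoa', []))]

entries = [
    {'name': 'libjpeg', 'cm_classes_uoa': ['914178a17c102fb5']},  # class "library"
    {'name': 'susan', 'cm_classes_uoa': []},
]
print(prune_by_classes(entries, ['914178a17c102fb5']))
```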
Collective Tuning through cM front-end
Various scenarios are created by combining existing modules just like research "LEGO" bricks.
For now, we omit most of the low-level examples, since it is much easier to use the cM web front-end, which keeps track of all dependencies, etc.
The idea is quite simple: we created a very long experimental pipeline that includes compilation, execution, and profiling of any shared codelet or benchmark with any shared dataset. This pipeline exposes "explorable" or "tunable" properties and monitored characteristics. You can see this pipeline on your local machine here.
Then we collaboratively add various scenarios as modules, such as exploration of various tuning dimensions combined with multi-objective Pareto filters, R normality tests, online focused exploration and tuning, etc., that call this pipeline automatically. For example, you can see the universal exploration scenario (module "ctuning.scenario.exploration") on your local machine [here]. You can also see a specialized compiler tuning scenario (module "ctuning.scenario.compiler.optimizations", and module "ctuning.scenario.compiler.optimizations.crowd" to demo crowdsourcing of compiler flag auto-tuning across mobile systems and cloud services). Eventually, end users at universities or companies can extend this pipeline or add new scenarios according to their needs and R&D projects.
Finally, users can list all experiments or prepare graphs (again, via modules that are collaboratively extended to provide more powerful functionality).
Installing packages for collaborative experimentation
Collective Mind enables easy coexistence of multiple versions of tools and libraries. Go to cM web front end –> Usage –> Install / monitor packages. Then select the Collective Tuning Setup matching your system and install the necessary packages (tools, libraries, etc.). If a package requires a compiler, please select an available compiler in the top-right menu before installation/building.
Advanced experimental scenarios (command line based)
Various demos are available here.
Advanced experimental scenarios (web based)
Compiling and running program using universal pipeline
1) Go to cM web front end –> Usage –> Prepare experiment/tuning repository. Then just click the (Clean and) Copy button.
You should see a new xterm (on Linux) that copies source code from repos/ctuning/.cmr/code.source to repos/ctuning-experiments/.cmr/code.source.
If some old source code is already in ctuning-experiments, it will be deleted, so be careful... The idea is to keep ctuning clean and perform experiments in ctuning-experiments, which is now the working repository. When the pipeline runs, code is compiled and run directly inside an entry in the repository.
2) Now you should be able to run a simple test using the pipeline, provided some default GCC or LLVM is installed on the machine. Let's assume for now that the experiment will also run on your machine rather than on a remote board or mobile device (this is covered later).
Go to cM web front end –> Usage –> Manually tune program/architecture through pipeline. Most of the parameters are preset, but you still need to set a few: a) Choose "Collective Tuning Setup" depending on your platform: the first part is the host machine (where the framework runs and where compilation happens), the second is the target (where execution happens). If you run on Linux, it will likely be Generic Linux64 – Linux64. This is done for the user's convenience, to automatically preset some other parameters in the pipeline.
Note that you have to wait a bit for cM to refresh when you make a selection; by default we use plain HTML for portability. We hope that one day AJAX or some JavaScript can be added collaboratively so the whole page state does not need to be refreshed.
b) Select some program, for example "CBench: automotive susan (copy)".
When you select it, cM automatically updates the possible "command line" and "dataset" (cM has a description of dependencies; this still needs to be documented one day).
You can use default one for now.
c) Choose your compiler, for example GCC.
d) Choose the "Build Script Name". By default it is set to "Build dynamic binary from C sources through assembler", but you should set it either to "Build dynamic binary directly" or, if you use LLVM, to "Build dynamic binary from C sources through LLVM BC".
That should be all.
Now you can scroll to the end of the page and press 'Start'. If everything is correct, the program should be compiled and run with auto-calibration (the program is automatically rerun, adapting the repeat number around the main kernel (if supported), until the execution time is around 5 seconds). By default, cM measures execution time around the whole program, and since some programs with some datasets run for only milliseconds, which is less than the OS-induced variation in execution time, the run is automatically extended to a reasonably long time.
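The auto-calibration idea can be sketched as a loop that grows the repeat count until one timed run exceeds the target time (a hypothetical sketch of the concept; the real pipeline exposes more knobs):

```python
import time

def calibrate_repeats(run_kernel, timer=time.time, target_seconds=5.0,
                      max_repeats=1 << 20):
    """Grow the repeat count around the kernel until one timed run takes
    at least target_seconds, so OS timing noise becomes negligible."""
    repeats = 1
    while True:
        start = timer()
        run_kernel(repeats)
        elapsed = timer() - start
        if elapsed >= target_seconds or repeats >= max_repeats:
            return repeats, elapsed
        repeats *= 2  # could also scale by target_seconds / elapsed
```

The returned repeat count can then be reused for all statistical repetitions of the same experiment.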
At the end, you should see the execution time reported. Note that the pipeline is in fact just another module with JSON as input (which is what the web pipeline is directly converted to) and as output. So you may also set "Save output json to file" to some file name at the very end of the web page; then, at the end of execution, the whole state of the pipeline will be dumped to that file. This also makes results reproducible: if you save the input and output of the pipeline, you can simply reproduce it from the command line using:
cm ctuning.pipeline.build_and_run_program @input.json
You can save the input state (what you selected in the web front-end) by adding a file name in "Save input json to file" at the very end of the web page.
Also, auto-tuning scenarios are basically built on top of this pipeline: they tune exposed choices and monitor characteristics.
Note that cM supports measuring kernels inside programs using OpenME, and all currently shared benchmarks already support it (including numerical codes for CUDA and OpenCL).
3) Some advanced modes (installing packages): since you may have various versions of the same compiler (say, GCC with multiple versions, or LLVM with your own extensions), cM provides 'packages' that allow easy coexistence of many variants of tools or libraries at the same time. Basically, you can install or build various packages or libraries; their instances are installed inside the repository in different entries under the "code" module, which is associated with binaries (tools, applications, libraries).
For example, as a simple test, you may want to install Sourcery GCC 4.7.2 for ARM and run your code on an off-the-shelf Android phone connected to your system through adb:
- Go to cM web front end –> Usage –> Install/monitor packages
- Select in Collective Tuning Setup “Generic Linux64 – Android”
- Find “Sourcery GCC 4.7.2 for Linux ARM” and just click Install.
- You should see a new xterm where the installation happens.
- If it ended with "... successfully installed" or "... successfully built", you can close the xterm and refresh the cM web front-end.
- You should now see Install status "Success" next to "Sourcery GCC 4.7.2 for Linux ARM", along with the entry UID where it was installed. That's all!
4) Now, you can go back to tuning pipeline and refresh it.
- Select in Collective Tuning Setup “Generic Linux64 – Android”
- You should now see the newly installed "Sourcery GCC 4.7.2 for Linux ARM" among the compilers.
- Select it and just press the 'Start' button again.
- The code should now be compiled with Sourcery GCC 4.7.2 for Linux ARM and then run on the Android device using adb (i.e., the pipeline detects that the target is Android and copies datasets and all necessary files to the device automatically). That's all.
Some demos are available at c-mind.org/repo and described here.
Recording/reviewing/reproducing/visualizing experimental results from the pipeline
One of the main intentions of cM is the possibility to collaboratively record, review, visualize and reproduce experimental results.
Recording experimental results:
cM includes a universal ctuning.space module that abstracts any multidimensional exploration and analysis space. Associated entries contain information about the pipeline UOA and the explored points in the space, which always include the pipeline input and output.
When using the program tuning pipeline, one can manually save explored points in a repository by selecting or entering some UOA in the 'Save pipeline input and output in cTuning space' sub-section. Then, at the end of a successful pipeline execution, a point will be recorded in the repository. Additional runs of the pipeline will append points to an already existing ctuning.space entry or create a new one.
Reviewing experimental results:
Naturally, after performing many experiments with different platforms, compilers, codelets, and datasets, a considerable number of experiments may accumulate. We envision that results will either be recorded in multiple cM repositories or pruned by search. A user may view and prune experimental results using the cM web front-end in 'Usage'->'View/prune experimental results from all scenarios'.
For example, suppose a user performed 2 runs of the universal cTuning build-and-run pipeline for the same program/dataset/platform/compiler but with 2 different flag combinations and recorded the results in ctuning.space. Then, by selecting Collective Tuning exploration space in the Exploration scenario module and clicking the Prune button, it is possible to show a table with all available experimental results. Entries in this table can easily be pruned further by compiler, platform, program, dataset, notes, etc. Furthermore, a user will see 3 additional options next to each explored point in the space: Reproduce, View, and Graph, described below.
Note: using indexing will dramatically improve the pruning speed!
Reproducing experimental results:
Recording both pipeline input and output makes it easy to reproduce results. When the Reproduce button is clicked (this can also be done from CMD), cM opens a new terminal and runs the pipeline with the input recorded in the selected entry. After execution, the new pipeline output is compared with the recorded one, and all changes are reported, allowing users to analyze performance, compilation time, and power variability (or any other metrics collaboratively added to the pipeline), reproduce bugs in applications and compilers, etc.
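Comparing a new pipeline output against the recorded one amounts to a recursive dictionary diff, which can be sketched as follows (an illustration of the idea; cM's actual comparison logic and the keys shown are assumptions):

```python
def diff_outputs(recorded, new, prefix=''):
    """Report keys whose values changed between two pipeline outputs."""
    changes = []
    for key in sorted(set(recorded) | set(new)):
        path = f'{prefix}{key}'
        a, b = recorded.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            changes += diff_outputs(a, b, path + '.')  # recurse into sub-dicts
        elif a != b:
            changes.append((path, a, b))
    return changes

recorded = {'execution_time': 5.1, 'binary_size': 14032}
new      = {'execution_time': 5.4, 'binary_size': 14032}
print(diff_outputs(recorded, new))  # [('execution_time', 5.1, 5.4)]
```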
Viewing results:
When clicking the View button, one can view the ctuning.space entry in the cM front-end.
Visualizing results:
After exploring various points, a user will usually want to visualize various dimensions of this space for further analysis, modeling, or mining of interesting correlations. Therefore, cM has a powerful, universal, and extensible capability to visualize spaces. When clicking Graph, a user will see a simple cM page for graph customization, with the possibility to select the graph engine and the dimensions to visualize (the same page can be opened directly from the cM web front-end using 'Usage'->'Build graph of multi-dimensional optimization space').
We currently provide various engines, including 2D column and scatter graphs from Google Web Services and Python MatPlotLib, and a 1D density variation graph from Python MatPlotLib, and we expect users to add more graph engines and capabilities collaboratively. Using Python MatPlotLib requires installing this module but allows exporting graphs to PDF, EPS, and PNG formats to easily add them to publications, presentations, and web sites. Users can also aggregate points from multiple entries, or visualize them as separate graphs by selecting Use multiple graphs, for powerful visual comparisons.
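Selecting dimensions to visualize essentially means extracting two fields from each recorded point before handing them to a graph engine. A minimal sketch (the point keys are hypothetical, and the plotting call is shown only as a comment to keep the example dependency-free):

```python
def extract_dimensions(points, x_key, y_key):
    """Collect (x, y) value lists from points that have both dimensions."""
    xs, ys = [], []
    for p in points:
        if x_key in p and y_key in p:
            xs.append(p[x_key])
            ys.append(p[y_key])
    return xs, ys

points = [
    {'binary_size': 14032, 'execution_time': 5.1},
    {'binary_size': 12890, 'execution_time': 5.6},
    {'compilation_time': 2.0},  # points missing a dimension are skipped
]
xs, ys = extract_dimensions(points, 'binary_size', 'execution_time')
print(xs, ys)
# To plot: import matplotlib.pyplot as plt; plt.scatter(xs, ys); plt.savefig('space.pdf')
```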
Automatically exploring multiple dimensions in cTuning pipeline
We provide a way to mark various parts of the cTuning pipeline as characteristics, properties, system state, coarse-grain choices, and fine-grain choices. This allows developers, or even users, to collaboratively extend or add their own exploration, analysis, and modeling scenarios on top of this (or any other) pipeline.
We created several scenarios based on our past research, as described below. Users can run them using the cM web front-end 'Usage'->'Automatically explore program/architecture tuning choices' and then select the 'Exploration scenario module'. Depending on the selected scenario, additional scenario parameters and choices become available. Various analysis and modeling modules (such as Pareto frontier detection, normality tests for variation of characteristics, or predictive modeling) will also be chained together depending on the exploration scenario.
Note that 'coarse-grain choices' in cM currently allow automatic selection of various programs, command lines, datasets, and compilers to prepare the experimental pipeline. Then 'fine-grain choices' are explored and recorded per given selection of 'coarse-grain choices'.
Current modes for exploration include:
- Fixed, to let the user make choices manually
- Fixed (always first choice), to let cM select the first choice (useful, for example, for datasets: explore only the first dataset instead of many)
- Random/flexible, to let cM make a random selection from either available entries or a range of parameters (specified by start, stop, and step). Flexible mode is reserved for the future, where the search will be focused using probabilistic approaches, PCA, genetic algorithms, etc.
- All (loop #), to let cM loop over the available choices (either UOAs, or a loop with start, stop, and step). Note that each loop has a number: this allows one to explore several dimensions simultaneously if the loops share the same number. Otherwise, loops are explored one after another according to their loop numbers.
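The numbered-loop semantics above (same number: dimensions stepped together; different numbers: nested loops) can be sketched as follows (a conceptual illustration, not cM's implementation):

```python
import itertools

def enumerate_choices(dimensions):
    """dimensions: list of (loop_number, values).

    Dimensions sharing a loop number are stepped together (zipped);
    distinct loop numbers are explored as nested loops (product)."""
    groups = {}
    for loop, values in dimensions:
        groups.setdefault(loop, []).append(values)
    # zip dimensions within each loop group, then take the product across groups
    zipped = [list(zip(*vals)) for _, vals in sorted(groups.items())]
    for combo in itertools.product(*zipped):
        yield tuple(v for group in combo for v in group)

dims = [(1, ['O2', 'O3']),       # loop 1
        (1, ['gcc', 'llvm']),    # loop 1: stepped together with the flags
        (2, ['small', 'large'])] # loop 2: nested inside
print(list(enumerate_choices(dims)))
# [('O2', 'gcc', 'small'), ('O2', 'gcc', 'large'), ('O3', 'llvm', 'small'), ('O3', 'llvm', 'large')]
```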
Also, note that the cTuning pipeline can use the OpenME interface to expose choices and characteristics in any application or in an OpenME-compatible compiler (such as dataset parameters, number of threads in OpenMP or Intel TBB, scheduling policies in OpenMP, or fine-grain tuning such as tiling, unrolling, etc.).
Example: universal exploration
If 'universal exploration' is selected, a user can explore various dimensions in the pipeline for analysis and modeling of various characteristics vs. properties, for example CPI vs. dataset size, or execution time vs. number of threads. Also, a user can regularly run all available benchmarks to test them, or test compilers for bugs, or compare the compilation time or performance benefits of various compilers (see cM demos), etc. At the same time, the user can specify the total number of coarse-grain and fine-grain iterations, the number of statistical repetitions (if more than 7, an additional module checking normality of the variation of all exposed characteristics will be used (requires R)), etc. Experimental results can be viewed, reproduced, pruned, and plotted as described above.
Example: compiler optimizations auto-tuning
We added yet another tuning module to unify the exploration of compiler optimizations. We use a top-down approach, starting from coarse-grain combinations of compiler flags (a problem still not fully solved after so many years of R&D) and later collaboratively adding fine-grain optimizations (support is already available through OpenME-compatible compilers or pragmas and third-party source-to-source transformation tools). We hope that unifying this tuning and supporting any compiler with crowdtuning will help us understand how to build compiler optimization heuristics automatically.
When a user selects the 'compiler optimization' scenario, additional self-explanatory parameters become available. It is possible to select additional modules for statistical analysis of experimental results and for online learning (currently we added Pareto frontier detection). When using the Pareto frontier module, one can select the dimensions for the frontier in 'Init pipeline'->'Tuning objective'. Currently, we prepared several tuning objectives:
- global execution time
- global compilation time (useful when designing and comparing compilers)
- global code size
- global execution time vs code size (useful for embedded systems)
- global execution time vs compilation time (useful for JIT and split compilation)
- global execution time vs compilation time vs code size (embedded systems and JIT/split-compilation)
- global execution time vs energy (currently using CPU frequency - experimental)
Basically, developers can add more modules depending on their tuning objectives. Note that this module is very simple: it detects all points on the frontier, and we hope to extend it to leave only a few optimal points on the frontier.
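Detecting all points on the frontier can be sketched as follows (assuming all objectives are minimized; a simplified illustration of such a module, with hypothetical example values):

```python
def pareto_frontier(points):
    """Return points not dominated by any other point.

    A point p is dominated if some other point q is no worse in every
    objective and strictly better in at least one (all minimized)."""
    def dominates(q, p):
        return (all(qi <= pi for qi, pi in zip(q, p)) and
                any(qi < pi for qi, pi in zip(q, p)))
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# (execution time, code size) for several optimization choices
points = [(5.1, 14032), (5.6, 12890), (5.0, 15000), (5.8, 13500)]
print(pareto_frontier(points))  # [(5.1, 14032), (5.6, 12890), (5.0, 15000)]
```

For the "execution time vs. code size" objective, for instance, each recorded experiment contributes one such point, and only non-dominated ones are kept.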
Experimental results can be viewed, reproduced, pruned and plotted as was described above.