From cTuning.org

cTuning CC V2.5 with MILEPOST GCC 4.4.x

General configuration

 Check ./_set_environment_for_analysis_compiler__milepost_gcc.sh - normally
 all environment variables should already be set properly (check variable CCC_UUID -
 the uuid tool). You have to source this file before using cTuning CC - it tells
 cTuning CC to use MILEPOST GCC for program analysis (extraction of features
 and access to fine-grain optimizations through ICI).

 Importantly, cTuning CC can now use any compiler that supports ICI and cTuning/MILEPOST
 technology for code analysis and characterization. The analysis compiler is configured through
 the following environment variables (using GCC as an example):

   CTUNING_ANALYSIS_CC=gcc
   CTUNING_ANALYSIS_CPP=g++
   CTUNING_ANALYSIS_FORTRAN=gfortran
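
As a minimal sketch, the analysis-compiler variables could be exported and sanity-checked as follows before invoking cTuning CC. The `check_tool` helper is a hypothetical name introduced here for illustration; it is not part of cTuning CC and only reports whether each configured compiler can be found on PATH.

```shell
# Compilers used for program analysis (values taken from the GCC example above).
export CTUNING_ANALYSIS_CC=gcc
export CTUNING_ANALYSIS_CPP=g++
export CTUNING_ANALYSIS_FORTRAN=gfortran

# check_tool is a hypothetical helper, not part of cTuning CC.
# It prints whether the given program can be found on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

for tool in "$CTUNING_ANALYSIS_CC" "$CTUNING_ANALYSIS_CPP" "$CTUNING_ANALYSIS_FORTRAN"; do
  check_tool "$tool"
done
```

A "missing" line suggests that the corresponding variable should point to a full path or that the compiler is not installed.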

 File ./_set_environment_for_plugin_tests.sh sets up environment
 variables for low-level ICI tests and should also be already properly
 set. If you plan to use only high-level cTuning CC, you can skip it.

Compiler configuration

 * The demo directory contains multiple benchmarks (bitcount, bzip2, libvorbis, matmul)
   that show how to use the cTuning Compiler Collection either transparently,
   without Makefile modifications, or explicitly.

   You first need to configure the user-dependent environment variables in
   ___common_environment.sh:

   cTuning CC can use 2 separate compilers: one for analysis, which should support
   ICI and cTuning/MILEPOST technology for program and architecture characterization,
   self-tuning and adaptation (such as MILEPOST GCC), and a user compiler, which can be any compiler
   (GCC, LLVM, ICC, ROSE, Open64, XL, etc) and is driven by the analysis compiler.

   User compiler is defined using the following environment variables (using GCC as an example):

     CTUNING_COMPILER_CC=gcc
     CTUNING_COMPILER_CPP=g++
     CTUNING_COMPILER_FORTRAN=gfortran

   CCC_CTS_USER and CCC_CTS_PASS should be set to the username and password you chose when
   self-registering at http://cTuning.org/wiki/index.php/Special:UserLogin
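
Before running the test script, it can help to confirm that the user-dependent variables are actually set. The following is only a sketch: `require_vars` is a hypothetical helper introduced here, not something shipped with cTuning CC.

```shell
# require_vars is a hypothetical helper, not part of cTuning CC.
# It returns non-zero and names every listed variable that is empty or unset.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

if require_vars CTUNING_COMPILER_CC CTUNING_COMPILER_CPP CTUNING_COMPILER_FORTRAN \
                CCC_CTS_USER CCC_CTS_PASS; then
  echo "configuration looks complete"
else
  echo "edit ___common_environment.sh before running the test script"
fi
```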

   NOW YOU CAN TEST cTuning CC wrapper and communication with the cTuning database
   by invoking __test_milepost_gcc.sh. If everything is installed correctly, you
   should get a response from the cTuning web-service: "Test passed successfully".

   To continue using cTuning CC, you can check the following variables.
   Note that they already have default values, so you do not have to change them
   unless you want to tune cTuning CC:

   CCC_CTS_URL=cTuning.org/wiki/index.php/Special:CDatabase?request=
               - points to the cTuning web-service.

   CCC_CTS_DB=cod_opt_cases - points to the database with optimization cases
              from the community.

   ICI_PLUGIN_VERBOSE=1 - if set to 1, prints additional diagnostic information from ICI plugins.
   ICI_VERBOSE=1 - if set to 1, prints additional diagnostic information from ICI.

   ICI_PROG_FEAT_PASS=fre - sets pass after which to extract static program features.

   CCC_COMPILER_FEATURES_ID=129504539516446542 - sets the ID of the compiler that was used
                            to extract static program features for all programs
                            at cTuning.org. Do not change it unless you really
                            understand what you are doing ;)

   CCC_OPTS="-O3" - sets combination of flags to be used if cTuning prediction web-service
                    did not return optimization flags.

   CCC_OPT_ARCH_USE=1 - if set to 1, cTuning CC will also use architecture-dependent flags
                        (such as -march=athlon64) from cTuning.org. If set to 0, architecture
                        dependent flags will be ignored.
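
The interplay of CCC_OPTS and CCC_OPT_ARCH_USE can be illustrated with a small sketch. It is not the wrapper's actual code: `select_flags` is a hypothetical helper, and treating only `-march=...` as architecture-dependent is an assumption made for this example.

```shell
# select_flags is a hypothetical helper that mimics the documented behaviour:
#   $1 - flags returned by the prediction web-service (may be empty)
# CCC_OPTS is the fallback; CCC_OPT_ARCH_USE keeps or drops -march=... flags.
select_flags() {
  predicted=$1
  result=""
  for flag in $predicted; do
    case "$flag" in
      -march=*)
        # Assumption: only -march=... is treated as architecture-dependent here.
        [ "${CCC_OPT_ARCH_USE:-1}" = "1" ] || continue
        ;;
    esac
    result="${result:+$result }$flag"
  done
  # Fall back to CCC_OPTS when no usable flags remain.
  [ -n "$result" ] || result=${CCC_OPTS:--O3}
  printf '%s\n' "$result"
}

CCC_OPTS="-O3"
CCC_OPT_ARCH_USE=0
select_flags "-O2 -march=athlon64 -funroll-all-loops"   # architecture flag dropped
select_flags ""                                          # falls back to CCC_OPTS
```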

   TIME_THRESHOLD=0.3 - when calculating speedups at cTuning.org, only optimization cases
                        with EXECUTION TIME more than this threshold are considered.

   NOTES= - when not empty, only those optimization cases are returned that have this NOTES value.

   PG_USE=0 - if set to 1, only those optimization cases are returned that have function and other
              level profiling. If unset or set to 0, use only those cases that do not have profiling
              to avoid speedup skewing due to profiling.

   OUTPUT_CORRECT=1 - if set to 1, only those optimization cases are returned that have been
                      checked for correctness by comparing benchmark outputs for the original
                      and transformed program (note that it still does not guarantee that
                      the combination of optimizations is correct, but it helps to filter out
                      obviously wrong cases).

   RUN_TIME=RUN_TIME - sets which execution time to use when calculating speedups
                       (RUN_TIME - overall program execution time,
                        while RUN_TIME USER - only user execution time)

   SORT=012 - when predicting optimizations, the best combinations of optimizations
              are selected from the most similar program. Naturally, that program
              can have flags that improve not only execution time, but also code
              size and compilation time among other parameters. Hence a user can
              suggest an order of sorting speedups by:
               0 - execution time
               1 - code size,
               2 - compilation time
              before returning the top optimization. For example, when setting this variable to
              012 - cTuning returns the optimization case with the highest execution time speedup
              and only then sorts the cases by code size improvement and compilation time speedup;
              102 - cTuning returns the optimization case with the highest code size improvement,
              then execution time speedup and then compilation time;
              201 - cTuning returns the optimization case with the highest compilation time speedup,
              then execution time speedup and only then code size.
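
The ordering described for SORT is a lexicographic sort over the three metrics. The sketch below illustrates that idea with the standard `sort` utility. It is not the implementation used by the cTuning web-service; `rank_cases` and the sample rows are made up for this example.

```shell
# rank_cases is a hypothetical helper: it reads rows of
#   <exec-time speedup> <code-size improvement> <comp-time speedup> <case label>
# on stdin and orders them by the columns named in a SORT-style string.
rank_cases() {
  keys=""
  for digit in $(printf '%s\n' "$1" | fold -w1); do
    column=$((digit + 1))          # SORT digit 0 refers to the first column
    keys="$keys -k${column},${column}nr"
  done
  # $keys is left unquoted on purpose so each -k option becomes its own argument.
  LC_ALL=C sort $keys
}

# Made-up rows for illustration only.
cases='1.18 0.80 1.00 caseA
1.29 0.67 0.80 caseB
1.07 1.02 1.00 caseC'

printf '%s\n' "$cases" | rank_cases 012 | head -n 1   # best execution time first
printf '%s\n' "$cases" | rank_cases 102 | head -n 1   # best code size first
```

With these rows, SORT=012 puts caseB on top and SORT=102 puts caseC on top.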

   CT_OPT_REPORT=1 - when set to 1, cTuning returns all optimization cases sorted according to the SORT
                     environment variable together with the associated optimization ID, so that the user
                     can later force a different optimization case, particularly in multi-objective
                     optimization scenarios.

                     Here is an example of such output:

                        ****************************************************************************
                        Checking program features (and aggregating them if generated) ...

                        Static program features:
                        ft1=9, ft2=2, ft3=1, ft4=0, ft5=4, ft6=1, ft7=0, ft8=2, ft9=1, ft10=0, ft11=0,
                        ft12=0, ft13=5, ft14=0, ft15=0, ft16=8, ft17=0, ft18=0, ft24=27, ft25=13.50, ft19=0,
                        ft39=0, ft20=1, ft21=0, ft33=0, ft21=24, ft35=2, ft22=11, ft23=0, ft34=6, ft36=3,
                        ft37=0, ft38=0, ft40=0, ft41=8, ft42=0, ft43=0, ft44=0, ft45=0, ft46=1, ft48=3, ft47=9,
                        ft49=0, ft51=0, ft50=55, ft52=21, ft53=0, ft54=2, ft55=0, ft26=0, ft27=0, ft28=0, ft29=0,
                        ft30=5, ft31=0, ft32=0

                        Submitting features to the cTuning web-service to predict good optimizations ...

                        cTuning Optimization Report (optimal optimization cases):

                        Distance from most close program (462.libquantum) = 0.639

                        Selected opt. case = 23011215880571251

                        Optimal cases on frontier (averaged speedups):
                        Ex.time:   Code size:   Comp. time:        cTuning opt. case:

                           1.18         0.80           1.00         15423655473087225
                           1.21         0.80           0.80            29686176401405
                           1.25         0.70           0.80          4614589283098526
                           1.29         0.67           0.80         23011215880571251
                           1.25         0.70           0.80         15721270875126789
                           1.26         0.69           0.80         15128754576807000
                           1.29         0.67           1.00         19230939973657069
                           1.07         1.02           1.00          3258730975700728
                           1.21         0.80           1.00         23810155474721838
                           1.24         0.71           1.00          4699569679776380
                           1.26         0.68           0.83         15492934568598271

                        Predicted flags:
                        -O2 -fdelete-null-pointer-checks -fno-tree-pre -funroll-all-loops

                        Invoking command:
                        gcc -O2 -fdelete-null-pointer-checks -fno-tree-pre -funroll-all-loops  bitarray.c
                                 bitcnt_1.c bitcnt_2.c bitcnt_3.c bitcnt_4.c bitcnts.c bitfiles.c bitstrng.c
                                 bstr_i.c loop-wrap.c 
                        ****************************************************************************
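
If such a report is captured as text, the optimization ID can be pulled out for later reuse with the --ct-opt option. The following sketch assumes the report has the layout shown above; the variable names are illustrative, and the command is printed rather than executed.

```shell
# Excerpt of a saved report; in practice this would be the captured
# output of a cTuning CC run (the variable name is illustrative).
report='Distance from most close program (462.libquantum) = 0.639

Selected opt. case = 23011215880571251

Predicted flags:
-O2 -fdelete-null-pointer-checks -fno-tree-pre -funroll-all-loops'

# Pull out the ID on the "Selected opt. case" line.
selected=$(printf '%s\n' "$report" |
  sed -n 's/^ *Selected opt\. case = *\([0-9][0-9]*\).*$/\1/p')

# Print (rather than run) the command that would reuse this case.
echo "ctuning-cc --ct-opt=$selected *.c"
```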

   Multi-objective optimizations:
    When many optimization cases improve execution time, code size and compilation time
    at the same time, the selection of an optimal optimization case depends on the end-user
    usage scenario: improving both execution time and code size is often required for embedded applications,
    improving both compilation and execution time is important for data centers and real-time systems,
    while improving only execution time is common for desktops and supercomputers. Hence, we provide several
    other environment variables to select optimization cases on the frontier of the optimization space:

   DIM=012 - returns optimization cases only on the frontier of all optimization cases.
             For example DIM=01 produces 2D frontier for execution time speedup and code size improvement,
             DIM=02 produces 2D frontier for execution time and compilation time speedups,
             DIM=12 produces 2D frontier for code size improvement and compilation time speedup,
             DIM=012 produces 3D frontier for all constraints.

   CUT=0,0,0 - cuts the frontier of optimization cases on each dimension, i.e. if CUT=0,0,1.2,
               the frontier optimization cases should have compilation time speedup > 1.2;
               if CUT=1,1,1, all optimization cases on the frontier should have execution time
               speedup > 1, code size improvement > 1 and compilation time speedup > 1.

   When using this mode with DIM=012 and CUT=1,1,1, only one optimization case will be returned
   (when using CT_OPT_REPORT=1):

                           1.07         1.02           1.00          3258730975700728

   Note that you have to select such cases manually: cTuning CC will still use
   the top optimization case found before building the frontier, since the choice on the frontier
   really depends on the user scenario.
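
Manual selection from a saved frontier table can be sketched with awk. `filter_frontier` is a hypothetical helper, not part of cTuning CC. It also assumes ">=" as the comparison, because the reported row with a compilation time value of 1.00 is returned for CUT=1,1,1; the comparison actually used by the web-service is not stated here.

```shell
# filter_frontier is a hypothetical helper: it keeps the rows of a frontier
# table whose three speedup columns all reach the thresholds given in CUT.
# Assumption: ">=" is used so that a reported value of 1.00 passes a cut of 1.
filter_frontier() {
  awk -v cut="$1" '
    BEGIN { split(cut, c, ",") }
    NF == 4 && $1 + 0 >= c[1] + 0 && $2 + 0 >= c[2] + 0 && $3 + 0 >= c[3] + 0 {
      print $1, $2, $3, $4
    }'
}

# Four rows copied from the example report above.
frontier='1.18 0.80 1.00 15423655473087225
1.29 0.67 0.80 23011215880571251
1.29 0.67 1.00 19230939973657069
1.07 1.02 1.00 3258730975700728'

printf '%s\n' "$frontier" | filter_frontier 1,1,1
```

For these four rows, only the case 3258730975700728 passes all three thresholds.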

   The following variables are very important for finding optimization cases from similar programs
   for a given architecture (you can find the architecture most similar to yours, together
   with optimization cases, at http://cTuning.org/cdatabase):

   CCC_PLATFORM_ID=2111574609159278179    (example for AMD Athlon 64 3700+)
   CCC_ENVIRONMENT_ID=2781195477254972989 (example for Linux Mandriva 2.6.17-10alchemy)
   CCC_COMPILER_ID=331350613878705696     (example for GCC 4.4.0)

   When compiling large applications, feature extraction can take a very long time
   (speeding it up is part of future work), so a user may want to
   extract features for only a few functions. In this case, the user should add
   the file _ctuning_select_functions.txt to the compilation directory, listing
   only those functions that need to be processed
   (one function per line).
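
A minimal sketch of creating such a file is shown below. The function names and the BUILD_DIR variable are placeholders introduced for this example; use the names of functions from your own program.

```shell
# Directory where the compiler is invoked; defaults to the current directory.
# BUILD_DIR is a variable introduced for this sketch only.
build_dir=${BUILD_DIR:-.}

# Hypothetical function names; replace them with functions from your program.
printf '%s\n' main bit_count > "$build_dir/_ctuning_select_functions.txt"

# Show the result: one function name per line.
cat "$build_dir/_ctuning_select_functions.txt"
```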

 * If you want to test low-level plugins, you can find self-explanatory
   tests in the plugins-ici-2.05/tests directory.

Usage

 * cTuning web-services test:

   ctuning-cc --ct-test *.c

   You can also use test script ./__test_ctuning_web_service_for_ctuning_cc.sh

 * Using optimization cases directly from the Collective Optimization Database
   (referenced by unique ID) - it is useful for multi-objective optimization,
   to share optimization cases within the community or when publishing papers
   and results on program optimization:

   ctuning-cc --ct-opt=11475790782770590 *.c

   You can also use demo script ./__compile_using_ctuning_cc_with_fixed_optimization.sh
   to understand how to configure your own system.

 * Predict good optimizations (execution time, code size, compilation time)
   based on correlation of program features and optimizations using collective optimization
   knowledge (empirical iterative feedback-directed compilation performed by multiple
   users and shared in the Collective Optimization Database):

   ctuning-cc -Oml *.c

   You can also use demo script ./__compile_using_ctuning_cc_with_predicted_optimization.sh
   (or ./__compile_using_ctuning_cc_with_predicted_optimization_tr.sh for transparent
   invocation of this mode without flags through environment variables)
   to understand how to configure your own system.

 * Extract program structure:

   ctuning-cc -O3 --ct-extract-structure *.c

   You can also use demo script ./__extract_program_structure_using_ctuning_cc.sh
   (or __extract_program_structure_using_ctuning_cc_tr.sh for transparent
   invocation of this mode without flags through environment variables)
   to understand how to configure your own system.

 * Extract program features:

   ctuning-cc -O3 --ct-extract-features *.c

   You can also use demo script ./__extract_program_features_using_ctuning_cc.sh
   (or __extract_program_features_using_ctuning_cc_tr.sh for transparent
   invocation of this mode without flags through environment variables)
   to understand how to configure your own system.

 * Some of the above methods can be invoked transparently, without any Makefile modifications,
   using CTUNING_* environment variables. Look at the scripts in the demo directory.

Real usage cases

Nikhil Kapur is trying to use cTuning CC/MILEPOST GCC to optimize Mozilla code. More info is available at his blog.

Yuriy Kashikov has been using cTuning CC/MILEPOST GCC to optimize BerkeleyDB and reported speedups of 1.4 times in comparison with native GCC 4.4.0 (-O3) on several Intel Xeon machines without losing much compilation time or code size.

Grigori Fursin and Abdul Memon have been using cTuning CC/MILEPOST GCC to optimize several audio/video libraries for ARC/AMD/Intel platforms for execution time, code size and compilation time constraints.

CTools:CTuningCC:Usage
