(7 intermediate revisions not shown.) |
Line 3: |
Line 3: |
| {{CMenu:CTools|MilepostGCC}} | | {{CMenu:CTools|MilepostGCC}} |
| | | |
- | = MILEPOST 1.5 GCC 4.4.0 =
| + | * [[CTools:MilepostGCC:Documentation:MILEPOST_V2.1|MILEPOST GCC V2.1 (GCC 4.4.0, 4.4.1, 4.4.2, 4.4.3)]] |
- | | + | * MILEPOST GCC V1.5 & V2.0 - unreleased, internal versions |
- | === License ===
| + | * [[CTools:MilepostGCC:Documentation:MILEPOST_V1.0|MILEPOST GCC V1.0 (GCC 4.4.0)]] |
- | | + | |
- | This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.
| + | |
- | | + | |
- | This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the [http://www.gnu.org/copyleft/gpl.html GNU General Public License for more details].
| + | |
- | | + | |
- | If you found this software useful, you are welcome to reference http://cTuning.org website and these publications {{Ref|FMTP2008}},{{Ref|Fur2009}},{{Ref|FT2009}} in your derivative works.
| + | |
- | | + | |
- | === Authors ===
| + | |
- | | + | |
- | * [http://fursin.net/research Grigori Fursin] (INRIA, France) - original design of the MILEPOST/ICI/cTuning framework
| + | |
- | * Mircea Namolaru (IBM Research Lab, Israel) - feature extractor pass
| + | |
- | | + | |
- | === Framework high-level overview ===
| + | |
- | | + | |
- | <div align="left">http://ctuning.org/wiki/images/img-milepost-gcc-structure1.gif</div>
| + | |
- | | + | |
- | === History ===
| + | |
- | MILEPOST GCC V1.5 (4.4.0) - TBA - Fully updated compiler that includes
| + | |
- | parts of CCC framework and can communicate
| + | |
- | with cTuning web-services to predict good optimization
| + | |
- | cases to improve execution time, code size and compilation time
| + | |
- | using correlation between program features and optimizations.
| + | |
- | | + | |
- | MILEPOST GCC 4.4.0 - 20090629 - New official version of MILEPOST GCC with new ICI v2.0
| + | |
- | and updated static feature extractor.
| + | |
- | | + | |
- | MILEPOST GCC 4.2.2 - 20080613 - Stable MILEPOST GCC version used in most MILEPOST Year 3 experiments.
| + | |
- | | + | |
- | === Requirements ===
| + | |
- | | + | |
- | In order to install MILEPOST GCC, you will need:
| + | |
- | | + | |
- | * C compiler that can compile [http://gcc.gnu.org GCC 4.x].
| + | |
- | * uuid or uuidgen tool to generate unique identifiers.
| + | |
- | * [http://www.php.net PHP] ''(needed to communicate with cTuning web-services)''.
| + | |
- | | + | |
- | === Directory structure ===
| + | |
- | | + | |
- | gcc-4.4.0 - MILEPOST GCC 4.4.0 source directory (core + gfortran)
| + | |
- | | + | |
- | ccc-framework - MILEPOST V1.5 wrapper and necessary tools to communicate
| + | |
- | with cTuning web-services (standard part of CCC framework)
| + | |
- | | + | |
- | src-third-party - Third party support tools
| + | |
- | |
| + | |
- | +-- gmp-4.3.0 - GMP library
| + | |
- | +-- mpfr-2.4.1 - MPFR library
| + | |
- | +-- ppl-0.10.2 - PPL library (for GRAPHITE)
| + | |
- | +-- cloog - CLOOG library (for GRAPHITE)
| + | |
- | +-- XSB - Prolog for machine learning tools (MILEPOST, UNIDAPT, cTuning)
| + | |
- | | + | |
- | plugins-ici-2.0 - Plugins for GCC 4.4.0 ICI 2.0 (see README inside this directory)
| + | |
- | | + | |
- | demo - Demo files for MILEPOST GCC V1.5
| + | |
- | | + | |
- | install - Directory with installed binaries
| + | |
- | | + | |
- | === Installation ===
| + | |
- | | + | |
- | First, check in all scripts that you have the same BUILD_EXT variable
| + | |
- | that points to the install directory! You may have different names
| + | |
- | if you install MILEPOST GCC for several architectures on the shared
| + | |
- | file system ...
| + | |
- | | + | |
- | Invoke:
| + | |
- | ./build_gcc.sh to build GCC with all the third-party tools.
| + | |
- | ./build_ccc.sh to build CCC framework with MILEPOST GCC wrapper.
| + | |
- | | + | |
- | ./build_plugins.sh will build all non-machine learning plugins.
| + | |
- | ./build_plugins_ml.sh will build all machine learning plugins.
| + | |
- | | + | |
- | === General configuration ===
| + | |
- | | + | |
- | Check ./_set_environment_for_milepost_gcc.sh - normally all environment
| + | |
- | variables should be already properly set (check variable CCC_UUID -
| + | |
- | the uuid tool). You have to source this file before using MILEPOST GCC.
| + | |
- | | + | |
- | File ./_set_environment_for_milepost_gcc.sh sets up environment
| + | |
- | variables for low-level ICI tests and should also be already properly
| + | |
- | set. If you plan to use only high-level MILEPOST GCC, you can skip it.
| + | |
- | | + | |
- | === Configuration for demos ===
| + | |
- | | + | |
- | * You can find how to use MILEPOST GCC using bitcount benchmark in the demo directory: /demo/bitcount. | + | |
- | | + | |
- | You need to first configure environment variables in the
| + | |
- | ___common_environment.sh which are user-dependent:
| + | |
- | | + | |
- | CCC_CTS_USER and CCC_CTS_PASS should be set to your username and password when
| + | |
- | self-registering at http://ctuning.org/wiki/index.php/Special:UserLogin
| + | |
- | | + | |
- | NOW YOU CAN TEST MILEPOST GCC wrapper and communication with the cTuning database
| + | |
- | by invoking __test_milepost_gcc.sh. If everything is installed correctly, you
| + | |
- | should get a response from the cTuning web-service: "Test passed successfully".
| + | |
- | | + | |
- | In order to continue using MILEPOST GCC, you can check the following variables:
| + | |
- | Note that they already have default parameters so you do not have to change that
| + | |
- | unless you want to tune MILEPOST GCC:
| + | |
- | | + | |
- | CCC_CTS_URL=cTuning.org/wiki/index.php/Special:CDatabase?request=
| + | |
- | - points to the cTuning web-service.
| + | |
- | | + | |
- | CCC_CTS_DB=cod_opt_cases - points to the database with optimization cases
| + | |
- | from the community.
| + | |
- | | + | |
- | ICI_PLUGIN_VERBOSE=1 - if set to 1, ICI plugins print additional diagnostic information.
| + |
- | ICI_VERBOSE=1 - if set to 1, ICI prints additional diagnostic information.
| + | |
- | | + | |
- | | + | |
- | ICI_PROG_FEAT_PASS=fre - sets pass after which to extract static program features.
| + | |
- | | + | |
- | CCC_COMPILER_FEATURES_ID=129504539516446542 - sets compiler ID which was used
| + | |
- | to extract static program features for all programs
| + | |
- | at cTuning.org. Do not change it unless you really
| + | |
- | understand what you are doing ;) !..
| + | |
- | | + | |
- | CCC_OPTS="-O3" - sets combination of flags to be used if cTuning prediction web-service
| + | |
- | did not return optimization flags.
| + | |
- | | + | |
- | CCC_OPT_ARCH_USE=1 - if set to 1, MILEPOST GCC will also use architecture-dependent flags
| + | |
- | (such as -march=athlon64) from cTuning.org. If set to 0, architecture
| + | |
- | dependent flags will be ignored.
| + | |
- | | + | |
- | TIME_THRESHOLD=0.3 - when calculating speedups at cTuning.org, only optimization cases
| + | |
- | with EXECUTION TIME more than this threshold are considered.
| + | |
- | | + | |
- | NOTES= - when non-empty (<>""), only those optimization cases are returned that have the same NOTES value.
| + | |
- | | + | |
- | PG_USE=0 - if set to 1, only those optimization cases are returned that have function and other
| + | |
- | level profiling. If unset or set to 0, use only those cases that do not have profiling
| + | |
- | to avoid speedup skewing due to profiling.
| + | |
- | | + | |
- | OUTPUT_CORRECT=1 - if set to 1, only those optimization cases are returned that have been
| + | |
- | checked for correctness by comparing benchmark outputs for the original
| + | |
- | and transformed program (note that it still does not guarantee that
| + | |
- | the combination of optimizations is correct, but it helps to reduce
| + | |
- | obvious wrong cases).
| + | |
- | | + | |
- | RUN_TIME=RUN_TIME - sets which execution time to use when calculating speedups
| + | |
- | (RUN_TIME - overall program execution time,
| + | |
- | while RUN_TIME USER - only user execution time)
| + | |
- | | + | |
- | SORT=012 - when predicting optimizations, the best combinations of optimizations
| + | |
- | are selected from the most similar program. Naturally, that program
| + | |
- | can have flags that improve not only execution time, but also code
| + | |
- | size and compilation time among other parameters. Hence a user can
| + | |
- | suggest an order of sorting speedups by:
| + | |
- | 0 - execution time
| + | |
- | 1 - code size,
| + | |
- | 2 - compilation time
| + | |
- | before returning the top optimization. For example, when setting this variable to
| + | |
- | 012 - cTuning returns the optimization case with the highest execution time speedup
| + | |
- | and only then sorts them by code size improvement and compilation time speedup;
| + | |
- | 102 - cTuning returns the optimization case with the highest code size improvement,
| + | |
- | then execution time speedup and then compilation time;
| + | |
- | 201 - cTuning returns the optimization case with the highest compilation time speedup,
| + | |
- | then execution time speedup and only then code size.
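The ordering described for SORT can be read as a lexicographic, best-first sort over the three speedup columns. The following is only a minimal sketch of that reading, not the cTuning implementation: the function name rank_cases is hypothetical, and the four rows are copied from the sample report on this page. Because the report shows rounded, averaged values, ties in the real service may resolve differently.

```python
# Rows copied from the sample report on this page:
# (execution time speedup, code size improvement,
#  compilation time speedup, cTuning optimization case ID).
CASES = [
    (1.18, 0.80, 1.00, 15423655473087225),
    (1.29, 0.67, 0.80, 23011215880571251),
    (1.07, 1.02, 1.00, 3258730975700728),
    (1.26, 0.68, 0.83, 15492934568598271),
]


def rank_cases(cases, sort_order="012"):
    """Return cases best-first, comparing metrics in the order given by SORT.

    Each digit in sort_order selects a column: 0 = execution time,
    1 = code size, 2 = compilation time. Higher values are better.
    This is an assumed interpretation of SORT, not the service's code.
    """
    columns = [int(ch) for ch in sort_order]
    return sorted(cases, key=lambda c: tuple(c[i] for i in columns), reverse=True)


if __name__ == "__main__":
    for order in ("012", "102", "201"):
        best = rank_cases(CASES, order)[0]
        print(f"SORT={order}: top optimization case {best[3]}")
```

With SORT=102 the code-size column is compared first, so the only case with a code size improvement above 1 comes out on top.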
| + | |
- | | + | |
- | CT_OPT_REPORT=1 - when set to 1, cTuning returns all optimization cases sorted according to SORT
| + | |
- | environment variable together with the associated optimization ID so that the user
| + |
- | can later force a different optimization case, particularly in multi-objective
| + |
- | optimization scenarios.
| + | |
- | | + | |
- | Here is an example of such output:
| + | |
- | | + | |
- | ****************************************************************************
| + | |
- | MILEPOST GCC V1.5 (wrapper for GCC to communicate with cTuning web services)
| + | |
- | <BR>
| + | |
- | Invoking collective tuning and machine learning mode ...
| + | |
- | <BR>
| + | |
- | Extracting program static features (-O1) ...
| + | |
- | <BR>
| + | |
- | Aggregating features ...
| + | |
- | <BR>
| + | |
- | Static program features:
| + | |
- | ft1=9, ft2=2, ft3=1, ft4=0, ft5=4, ft6=1, ft7=0, ft8=2, ft9=1, ft10=0, ft11=0,
| + | |
- | ft12=0, ft13=5, ft14=0, ft15=0, ft16=8, ft17=0, ft18=0, ft24=27, ft25=13.50,
| + | |
- | ft19=0, ft39=0, ft20=1, ft21=0, ft33=0, ft21=24, ft35=2, ft22=11, ft23=0, ft34=6,
| + | |
- | ft36=3, ft37=0, ft38=0, ft40=0, ft41=8, ft42=0, ft43=0, ft44=0, ft45=0, ft46=1,
| + | |
- | ft48=3, ft47=9, ft49=0, ft51=0, ft50=55, ft52=21, ft53=0, ft54=2, ft55=0, ft26=0,
| + | |
- | ft27=0, ft28=0, ft29=0, ft30=5, ft31=0, ft32=0
| + | |
- | <BR>
| + | |
- | Submitting features to the cTuning web-service to predict good optimizations ...
| + | |
- | <BR>
| + | |
- | cTuning Optimization Report (optimal optimization cases):
| + | |
- | <BR>
| + | |
- | Distance from most close program (462.libquantum) = 0.639
| + | |
- | <BR>
| + | |
- | Selected opt. case = 23011215880571251
| + | |
- | <BR>
| + | |
- | Optimal cases on frontier (averaged speedups):
| + | |
- | Ex.time: Code size: Comp. time: cTuning opt. case:
| + | |
- | <BR>
| + | |
- | 1.18 0.80 1.00 15423655473087225
| + | |
- | 1.21 0.80 0.80 29686176401405
| + | |
- | 1.25 0.70 0.80 4614589283098526
| + | |
- | 1.29 0.67 0.80 23011215880571251
| + | |
- | 1.25 0.70 0.80 15721270875126789
| + | |
- | 1.26 0.69 0.80 15128754576807000
| + | |
- | 1.29 0.67 1.00 19230939973657069
| + | |
- | 1.07 1.02 1.00 3258730975700728
| + | |
- | 1.21 0.80 1.00 23810155474721838
| + | |
- | 1.24 0.71 1.00 4699569679776380
| + | |
- | 1.26 0.68 0.83 15492934568598271
| + | |
- | <BR>
| + | |
- | Predicted flags:
| + | |
- | -O2 -fdelete-null-pointer-checks -fno-tree-pre -funroll-all-loops
| + | |
- | <BR>
| + | |
- | Invoking command:
| + | |
- | gcc -O2 -fdelete-null-pointer-checks -fno-tree-pre -funroll-all-loops
| + | |
- | bitarray.c bitcnt_1.c bitcnt_2.c bitcnt_3.c bitcnt_4.c
| + | |
- | bitcnts.c bitfiles.c bitstrng.c bstr_i.c loop-wrap.c
| + | |
- | ****************************************************************************
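To reuse the optimization case IDs from such a report (for example with --ct-opt), the frontier rows can be extracted with a small script. This is a rough sketch that assumes the rows look exactly as in the sample above (three decimal numbers followed by an integer ID, separated by whitespace); the real wrapper output may differ, and parse_frontier_rows is a hypothetical helper name.

```python
import re

# One frontier row: three decimal speedups, then an integer case ID.
# The layout is assumed from the sample report, not from a specification.
ROW = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+)\s*$")


def parse_frontier_rows(report_text):
    """Return (exec speedup, code size, comp. time speedup, case ID) tuples."""
    rows = []
    for line in report_text.splitlines():
        match = ROW.match(line)
        if match:
            ex_time, code_size, comp_time, case_id = match.groups()
            rows.append(
                (float(ex_time), float(code_size), float(comp_time), int(case_id))
            )
    return rows


SAMPLE = """\
Optimal cases on frontier (averaged speedups):
Ex.time: Code size: Comp. time: cTuning opt. case:
1.18 0.80 1.00 15423655473087225
1.21 0.80 0.80 29686176401405
Predicted flags:
-O2 -fdelete-null-pointer-checks -fno-tree-pre -funroll-all-loops
"""

if __name__ == "__main__":
    for row in parse_frontier_rows(SAMPLE):
        print(row)
```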
| + | |
- | | + | |
- | Multi-objective optimizations:
| + | |
- | When many optimization cases improve execution time, code size
| + |
- | and compilation time at the same time, the selection of an optimal optimization case depends on end-user
| + | |
- | usage scenarios: improving both execution time and code size is often required for embedded applications,
| + | |
- | improving both compilation and execution time is important for data centers and real-time systems,
| + | |
- | while improving only execution time is common for desktops and supercomputers. Hence, we provided several
| + | |
- | other environment variables to select optimization cases on the frontier of the optimization space:
| + | |
- | | + | |
- | DIM=012 - returns optimization cases only on the frontier of all optimization cases.
| + | |
- | For example DIM=01 produces 2D frontier for execution time speedup and code size improvement,
| + | |
- | DIM=02 produces 2D frontier for execution time and compilation time speedups,
| + | |
- | DIM=12 produces 2D frontier for code size improvement and compilation time speedup,
| + | |
- | DIM=012 produces 3D frontier for all constraints.
| + | |
- | | + | |
- | CUT=0,0,0 - cuts optimization cases frontier on each dimension, i.e. if CUT=0,0,1.2
| + | |
- | the frontier optimization cases should have compilation time speedup > 1.2,
| + | |
- | if CUT=1,1,1, all optimization cases on frontier should have execution time
| + | |
- | speedup > 1, code size improvement > 1 and compilation time speedup > 1.
| + | |
- | | + | |
- | When using this mode with DIM=012 and CUT=1,1,1, only one optimization case will be returned
| + | |
- | (when using CT_OPT_REPORT=1):<BR><BR> 1.07 1.02 1.00 3258730975700728<BR><BR>
| + | |
- | Note that you have to select such cases manually, because MILEPOST GCC will still use
| + |
- | the top optimization case found before building the frontier, since the choice on the frontier really depends on
| + |
- | the user scenario.
| + | |
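The DIM and CUT selection can be illustrated offline on the rows of the sample report. The sketch below is an assumed reading, not the cTuning service code: the frontier is taken to be the set of non-dominated cases over the dimensions listed in DIM, and the cut is applied as "greater than or equal", because the displayed values are rounded and the single case quoted above has a compilation time speedup shown as 1.00. The names dominates, frontier and apply_cut are hypothetical.

```python
# All rows from the sample report on this page:
# (execution time speedup, code size improvement,
#  compilation time speedup, cTuning optimization case ID).
CASES = [
    (1.18, 0.80, 1.00, 15423655473087225),
    (1.21, 0.80, 0.80, 29686176401405),
    (1.25, 0.70, 0.80, 4614589283098526),
    (1.29, 0.67, 0.80, 23011215880571251),
    (1.25, 0.70, 0.80, 15721270875126789),
    (1.26, 0.69, 0.80, 15128754576807000),
    (1.29, 0.67, 1.00, 19230939973657069),
    (1.07, 1.02, 1.00, 3258730975700728),
    (1.21, 0.80, 1.00, 23810155474721838),
    (1.24, 0.71, 1.00, 4699569679776380),
    (1.26, 0.68, 0.83, 15492934568598271),
]


def dominates(a, b, dims):
    """True if a is at least as good as b on every dim and better on one."""
    return all(a[d] >= b[d] for d in dims) and any(a[d] > b[d] for d in dims)


def frontier(cases, dim="012"):
    """Keep only the cases not dominated on the dimensions named in DIM."""
    dims = [int(ch) for ch in dim]
    return [c for c in cases if not any(dominates(o, c, dims) for o in cases)]


def apply_cut(cases, cut="0,0,0"):
    """Drop cases below the per-dimension thresholds given in CUT.

    Uses >= (an assumption) so that rounded values such as 1.00 pass CUT=1.
    """
    limits = [float(x) for x in cut.split(",")]
    return [c for c in cases if all(c[i] >= limits[i] for i in range(3))]


if __name__ == "__main__":
    for case in apply_cut(frontier(CASES, "012"), "1,1,1"):
        print(case)
```

Under these assumptions, DIM=012 with CUT=1,1,1 leaves only the case 3258730975700728 from the sample data.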
- | | + | |
- | The following information is very important for finding optimization cases from similar programs
| + |
- | for a given architecture (you can find the architecture most similar to yours,
| + |
- | with optimization cases, at http://cTuning.org/cdatabase)
| + | |
- | | + | |
- | CCC_PLATFORM_ID=2111574609159278179 (example for AMD Athlon 64 3700+)
| + | |
- | CCC_ENVIRONMENT_ID=2781195477254972989 (example for Linux Mandriva 2.6.17-10alchemy)
| + | |
- | CCC_COMPILER_ID=331350613878705696 (example for GCC 4.4.0)
| + | |
- | | + | |
- | When compiling large applications, feature extraction can take a very long time
| + | |
- | (and this is part of the future work to speed it up), so a user may want to
| + | |
- | extract features only of a few functions. In this case, a user should add
| + | |
- | the file _ctuning_select_functions.txt to the compilation directory where
| + | |
- | only those functions should be listed that need to be processed
| + | |
- | (one function per line).
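The selection file can be written by hand or generated. As a small sketch, assuming only the format stated above (one function name per line in _ctuning_select_functions.txt inside the compilation directory), the helper below writes such a file; write_function_list and the function names in the example are hypothetical.

```python
import tempfile
from pathlib import Path

SELECT_FILE = "_ctuning_select_functions.txt"


def write_function_list(build_dir, functions):
    """Write one function name per line into the selection file.

    build_dir is the directory in which the compiler will be invoked.
    Returns the path of the file that was written.
    """
    path = Path(build_dir) / SELECT_FILE
    path.write_text("".join(f"{name}\n" for name in functions))
    return path


if __name__ == "__main__":
    # Demonstration in a temporary directory; the function names below are
    # placeholders, not names taken from the bitcount sources.
    with tempfile.TemporaryDirectory() as build_dir:
        written = write_function_list(build_dir, ["hot_loop", "count_bits"])
        print(written.read_text(), end="")
```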
| + | |
- | | + | |
- | * If you want to test low-level plugins, you can find self-explanatory tests in plugins-ici-2.0/tests directory.
| + | |
- | | + | |
- | === Usage ===
| + | |
- | | + | |
- | * MILEPOST GCC / cTuning web-services test:
| + | |
- | | + | |
- | milepost-gcc --ct-test *.c
| + | |
- | | + | |
- | You can also use test script ./__test_milepost_gcc
| + | |
- | | + | |
- | * Using optimization cases directly from the Collective Optimization Database (referenced by unique ID) - it is useful for multi-objective optimization, to share optimization cases within the community or when publishing papers and results on program optimization:
| + | |
- | | + | |
- | milepost-gcc --ct-opt=11475790782770590 *.c
| + | |
- | | + | |
- | You can also use demo script ./__compile_using_milepost_gcc_with_fixed_optimization to understand how to configure your own system.
| + | |
- | | + | |
- | * Predict good optimizations (execution time, code size, compilation time) based on correlation of program features and optimizations using collective optimization knowledge (empirical iterative feedback-directed compilation performed by multiple users and shared in the Collective Optimization Database):
| + | |
- | | + | |
- | milepost-gcc -Oml *.c
| + | |
- | | + | |
- | You can also use demo script ./__compile_using_milepost_gcc_with_prediction_optimization
| + | |
- | to understand how to configure your own system.
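When MILEPOST GCC is driven from a build script, the three invocations above differ only in one flag. The sketch below assembles the command line and environment in Python. It assumes that milepost-gcc is on PATH and that the variables exported by the environment script are already present in the calling environment; build_command and build_env are hypothetical helper names, and the source file list is passed explicitly because no shell is involved to expand *.c.

```python
import os
import shutil
import subprocess


def build_command(sources, mode="predict", opt_case=None):
    """Build the milepost-gcc command line for one of the documented modes.

    mode: "test" (--ct-test), "fixed" (--ct-opt=ID) or "predict" (-Oml).
    """
    if mode == "test":
        flag = "--ct-test"
    elif mode == "fixed":
        if opt_case is None:
            raise ValueError("mode 'fixed' needs an optimization case ID")
        flag = f"--ct-opt={opt_case}"
    elif mode == "predict":
        flag = "-Oml"
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ["milepost-gcc", flag, *sources]


def build_env(overrides):
    """Copy the current environment and apply variable overrides."""
    env = dict(os.environ)
    env.update(overrides)
    return env


if __name__ == "__main__":
    cmd = build_command(["bitcnts.c"], mode="fixed", opt_case=11475790782770590)
    env = build_env({"CCC_OPTS": "-O3", "SORT": "012"})
    if shutil.which("milepost-gcc"):
        subprocess.run(cmd, env=env, check=True)
    else:
        # Compiler not installed here: only show what would be executed.
        print("milepost-gcc not found; would run:", " ".join(cmd))
```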
| + | |
- | | + | |