MILEPOST V2.1 GCC 4.4.x with ICI v2.05 (Interactive Compilation Interface) and feature extractor v2.0

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 3 as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

If you find this software useful, you are welcome to reference the http://cTuning.org website and the publications FMTP2008, Fur2009 and FT2009 in your derivative works.

Coordination/research/development

  • Grigori Fursin, UNIDAPT Group, UVSQ, France - original R&D for MILEPOST framework/ICI prototypes/CCC framework/Collective Optimization Database and cTuning.org - After the release of MILEPOST GCC V2.1 I am taking a sabbatical to help create a new Exascale Research Center in France and will have less time to coordinate these developments. I hope that my students and the community will extend this framework if they find it useful...

Development/testing/evaluation

  • V2.1 (2009-2010)
    • Grigori Fursin (UVSQ, France) - new design, MILEPOST GCC wrapper, CCC framework and optimization/prediction services
    • Yuriy Kashnikov (UVSQ, France) - testing/evaluation on Berkeley DB
    • Abdul Wahid Memon (UVSQ, France) - testing/evaluation on cBench
    • Joern Rennecke (UK) - testing/providing support for g++
    • Jeremy Singer (University of Manchester, UK) - adding new static program features
    • Nikhil Kapur - testing on Mozilla/libvorbis
  • beta versions, V1.0 (2006-2009):
    • Grigori Fursin (INRIA, France) - original design of the MILEPOST/ICI/cTuning framework
    • Mircea Namolaru (IBM Research Lab, Israel) - feature extractor pass
    • Cupertino Miranda (INRIA, France) - ICI extensions
    • Zbigniew Chamski (INRIA, France) - ICI extensions

Includes ICI and CCC frameworks - you can find more information about those frameworks in the associated READMEs.

Framework high-level overview

img-milepost-gcc-structure1.gif

History

MILEPOST GCC V2.1 (4.4.x) - 20100315 - Fully updated compiler that includes
                                       parts of CCC framework and can transparently communicate
                                       with cTuning web-services to suggest good optimization
                                       cases to improve/balance execution time, code size and compilation time
                                       using correlation between program features and optimizations.
                                       The MILEPOST GCC wrapper from CCC framework can be easily converted
                                       to work with any other compiler such as LLVM, Open64, Intel compilers, etc
                                       to predict good optimizations based on correlation between program features,
                                       optimization and run-time behavior.
                                       It also allows users to directly and transparently use optimizations
                                       from the Collective Optimization Database (http://cTuning.org/cdatabase)
                                       referenced by a unique optimization ID, which is useful for sharing
                                       profitable optimization cases with the community.
                 
                                       Preliminary experiments show that it is now possible to transparently
                                       recompile standard programs, libraries, the Linux kernel and the whole
                                       Linux system with the new MILEPOST GCC. We are looking for volunteers to evaluate
                                       performance for individual Linux programs, libraries and the kernel.
                                       MILEPOST GCC V2.1 now officially supports C, C++ and Fortran.
MILEPOST GCC V1.5 and V2.0    - Internal development versions of compiler that were not officially released.
MILEPOST GCC 4.4.0 - 20090629 - New official version of MILEPOST GCC with new ICI v2.0
                                and updated static feature extractor.
MILEPOST GCC 4.2.2 - 20080613 - Stable MILEPOST GCC version used in most MILEPOST Year 3 experiments.

Requirements

In order to install MILEPOST GCC, you will need:

  • C compiler that can compile GCC 4.x.
  • uuid or uuidgen tool to generate unique identifiers.
  • PHP (needed to communicate with cTuning web-services).
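
The requirements above can be checked before starting a build. The following is a minimal sketch, assuming the tools are looked up on PATH under the names cc or gcc, uuidgen or uuid, and php; the helper names have_any and check_requirements are hypothetical and not part of MILEPOST GCC.

```shell
#!/bin/bash
# have_any NAME...: succeed if at least one of the named commands is on PATH.
have_any() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      return 0
    fi
  done
  return 1
}

# check_requirements: report every missing prerequisite and fail if any is missing.
# Assumption: the command names below are the usual ones; adjust them to your system.
check_requirements() {
  status=0
  have_any cc gcc       || { echo "missing: C compiler (cc or gcc)" >&2; status=1; }
  have_any uuidgen uuid || { echo "missing: uuidgen or uuid" >&2; status=1; }
  have_any php          || { echo "missing: php" >&2; status=1; }
  return "$status"
}
```

A possible use is `check_requirements && ./_build_all.sh` from the top-level directory.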

Directory structure

gcc-4.4.x                       - MILEPOST GCC 4.4.x source directory (core + g++ + gfortran)
ccc-framework                   - MILEPOST GCC wrapper and necessary tools to communicate
                                  with cTuning web-services (part of CCC framework)
src-third-party                 - Third party support tools
 |     
 +-- gmp-4.3.0                  - GMP library
 +-- mpfr-2.4.1                 - MPFR library
 +-- ppl-0.10.2                 - PPL library (for GRAPHITE)
 +-- cloog                      - CLOOG library (for GRAPHITE)
 +-- XSB                        - Prolog to calculate program features
plugins-ici-2.0x                - Plugins for GCC 4.4.x with ICI (see README inside this directory)
demo                            - Demo files for MILEPOST GCC
 |
 +-- bitcount.c                   - bitcount example written in C from cBench.
 +-- bzip2-1.0.5                  - bzip2 written in C with a few scripts to show how to use MILEPOST GCC 
 |                                  with standard programs without any project changes.
 +-- libvorbis-1.2.3              - standard vorbis library to show how to use MILEPOST GCC 
 |                                  with standard libraries/kernel without any project changes.
 +-- matmul.c                     - simple matmul example written in C.
 +-- matmul.cpp                   - simple matmul example written in C++.
 +-- matmul.fortran               - simple matmul example written in Fortran.
install                         - Directory with installed binaries

Installation

First, check that the BUILD_EXT variable has the same value in all scripts and points to the install directory! You may use different names if you install MILEPOST GCC for several architectures on a shared file system ...

Invoke:

./_build_all.sh to build the whole MILEPOST GCC with all the necessary tools.
This script invokes the following scripts: 
 ./_build_gcc.sh to build GCC with all the third-party tools.
 ./_build_ccc.sh to build CCC framework with MILEPOST GCC wrapper.
 ./_build_plugins.sh to build all non-machine learning plugins.
 ./_build_plugins_ml.sh to build all machine learning plugins.
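
The BUILD_EXT check mentioned at the start of this section can be automated. This is a sketch only, assuming each script assigns the variable on its own line as BUILD_EXT=... or export BUILD_EXT=... (the real scripts may set it differently); the function names are hypothetical.

```shell
#!/bin/bash
# build_ext_values FILE...: print the distinct values assigned to BUILD_EXT.
# Assumption: assignments look like "BUILD_EXT=value" or "export BUILD_EXT=value".
build_ext_values() {
  grep -hE '^[[:space:]]*(export[[:space:]]+)?BUILD_EXT=' "$@" 2>/dev/null \
    | sed -E 's/^[[:space:]]*(export[[:space:]]+)?BUILD_EXT=//' \
    | sort -u
}

# check_build_ext FILE...: fail if the scripts disagree on BUILD_EXT.
check_build_ext() {
  count=$(build_ext_values "$@" | wc -l | tr -d ' ')
  if [ "$count" -gt 1 ]; then
    echo "BUILD_EXT differs between scripts:" >&2
    build_ext_values "$@" >&2
    return 1
  fi
}

# Usage from the top-level directory:
#   check_build_ext ./_build_*.sh && ./_build_all.sh
```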

General configuration

Check ./_set_environment_for_milepost_gcc.sh - normally all environment variables should already be set properly (check the variable CCC_UUID, which names the uuid tool). You have to source this file before using MILEPOST GCC.

File ./_set_environment_for_milepost_gcc.sh sets up environment variables for low-level ICI tests and should also be already properly set. If you plan to use only high-level MILEPOST GCC, you can skip it.
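
Because the environment file has to be sourced rather than executed, it can help to verify afterwards that the expected variables reached the current shell. A minimal sketch, assuming only that CCC_UUID should be non-empty after sourcing; the helper name require_vars is hypothetical.

```shell
#!/bin/bash
# require_vars NAME...: fail if any of the named variables is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "environment variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Typical use from the top-level directory. The file is sourced with "." so
# that its exports stay in the current shell:
#   . ./_set_environment_for_milepost_gcc.sh
#   require_vars CCC_UUID
```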

Configuration for demos

  • The bitcount benchmark in the demo directory (/demo/bitcount) shows how to use MILEPOST GCC.
   First configure the user-dependent environment variables in
   ___common_environment.sh:
   CCC_CTS_USER and CCC_CTS_PASS should be set to the username and password you chose when
   self-registering at http://ctuning.org/wiki/index.php/Special:UserLogin
   You can then test the MILEPOST GCC wrapper and its communication with the cTuning database
   by invoking __test_milepost_gcc.sh. If everything is installed correctly, you
   should get a response from the cTuning web-service: "Test passed successfully".
   The following variables control MILEPOST GCC. They already have default values,
   so you only need to change them if you want to tune MILEPOST GCC:
   CCC_CTS_URL=cTuning.org/wiki/index.php/Special:CDatabase?request= 
               - points to the cTuning web-service.
   CCC_CTS_DB=cod_opt_cases - points to the database with optimization cases
              from the community.
   ICI_PLUGIN_VERBOSE=1 - if set to 1, prints additional diagnostic information from ICI plugins.
   ICI_VERBOSE=1 - if set to 1, prints additional diagnostic information from ICI.
   ICI_PROG_FEAT_PASS=fre - sets pass after which to extract static program features.
   CCC_COMPILER_FEATURES_ID=129504539516446542 - sets the compiler ID which was used
                            to extract static program features for all programs
                            at cTuning.org. Do not change it unless you really
                            understand what you are doing ;) !..
   CCC_OPTS="-O3" - sets combination of flags to be used if cTuning prediction web-service
                    did not return optimization flags.
   CCC_OPT_ARCH_USE=1 - if set to 1, MILEPOST GCC will also use architecture-dependent flags
                        (such as -march=athlon64) from cTuning.org. If set to 0, architecture
                        dependent flags will be ignored.
   TIME_THRESHOLD=0.3 - when calculating speedups at cTuning.org, only optimization cases
                        with EXECUTION TIME more than this threshold are considered.
   NOTES= - when not empty (<>""), only those optimization cases are returned that have this NOTES value.
   PG_USE=0 - if set to 1, only those optimization cases are returned that have function and other
              level profiling. If unset or set to 0, use only those cases that do not have profiling
              to avoid speedup skewing due to profiling.
   OUTPUT_CORRECT=1 - if set to 1, only those optimization cases are returned that have been
                      checked for correctness by comparing benchmark outputs for the original
                      and transformed program (note that it still does not guarantee that
                      the combination of optimizations is correct, but it helps to reduce
                      obvious wrong cases).
   RUN_TIME=RUN_TIME - sets which execution time to use when calculating speedups 
                       (RUN_TIME - overall program execution time, 
                        while RUN_TIME_USER - only user execution time)
   SORT=012 - when predicting optimizations, the best combinations of optimizations
              are selected from the most similar program. Naturally, that program
              can have flags that improve not only execution time, but also code
              size and compilation time among other parameters. Hence a user can
              suggest an order of sorting speedups by:
               0 - execution time
               1 - code size, 
               2 - compilation time 
              before returning the top optimization. For example, when setting this variable to 
              012 - cTuning returns the optimization case with the highest execution time speedup
              and only then sorts them by code size improvement and compilation time speedup;
              102 - cTuning returns the optimization case with the highest code size improvement, 
              then execution time speedup and then compilation time;
              201 - cTuning returns the optimization case with the highest compilation time speedup,
              then execution time speedup and only then code size.
   CT_OPT_REPORT=1 - when set to 1, cTuning returns all optimization cases sorted according to SORT
                     environment variable together with the associated optimization ID so that user 
                     could later force different optimization case, particularly when having multi-objective
                     optimization scenarios.
                     Here is an example of such output:
                        ****************************************************************************
                        MILEPOST GCC V2.1 (wrapper for GCC to communicate with cTuning web services)

                        Invoking collective tuning and machine learning mode ...
                        Extracting program static features (-O1) ...
                        Aggregating features ...
                        Static program features: ft1=9, ft2=2, ft3=1, ft4=0, ft5=4, ft6=1, ft7=0, ft8=2, ft9=1, ft10=0, ft11=0, ft12=0, ft13=5, ft14=0, ft15=0, ft16=8, ft17=0, ft18=0, ft24=27, ft25=13.50, ft19=0, ft39=0, ft20=1, ft33=0, ft21=24, ft35=2, ft22=11, ft23=0, ft34=6, ft36=3, ft37=0, ft38=0, ft40=0, ft41=8, ft42=0, ft43=0, ft44=0, ft45=0, ft46=1, ft48=3, ft47=9, ft49=0, ft51=0, ft50=55, ft52=21, ft53=0, ft54=2, ft55=0, ft26=0, ft27=0, ft28=0, ft29=0, ft30=5, ft31=0, ft32=0
                        Submitting features to the cTuning web-service to predict good optimizations ...
                        cTuning Optimization Report (optimal optimization cases):
                        Distance from most close program (462.libquantum) = 0.639
                        Selected opt. case = 23011215880571251
                        Optimal cases on frontier (averaged speedups):
                        Ex.time:  Code size:  Comp. time:  cTuning opt. case:
                        1.18      0.80        1.00         15423655473087225
                        1.21      0.80        0.80         29686176401405
                        1.25      0.70        0.80         4614589283098526
                        1.29      0.67        0.80         23011215880571251
                        1.25      0.70        0.80         15721270875126789
                        1.26      0.69        0.80         15128754576807000
                        1.29      0.67        1.00         19230939973657069
                        1.07      1.02        1.00         3258730975700728
                        1.21      0.80        1.00         23810155474721838
                        1.24      0.71        1.00         4699569679776380
                        1.26      0.68        0.83         15492934568598271
                        Predicted flags: -O2 -fdelete-null-pointer-checks -fno-tree-pre -funroll-all-loops
                        Invoking command: gcc -O2 -fdelete-null-pointer-checks -fno-tree-pre -funroll-all-loops bitarray.c bitcnt_1.c bitcnt_2.c bitcnt_3.c bitcnt_4.c bitcnts.c bitfiles.c bitstrng.c bstr_i.c loop-wrap.c
                        ****************************************************************************
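
The ordering described for SORT can be reproduced locally on a saved report. The sketch below is an illustration only: it sorts the cases from the report above lexicographically by the columns named in SORT, using the standard sort utility. The function names are hypothetical, and the service's own tie-breaking is not described here, so for SORT=012 the first line printed may differ from the "Selected opt. case" in the report.

```shell
#!/bin/bash
# Optimization cases copied from the report above: execution time speedup,
# code size improvement, compilation time speedup, cTuning opt. case ID.
cases() {
  cat <<'EOF'
1.18 0.80 1.00 15423655473087225
1.21 0.80 0.80 29686176401405
1.25 0.70 0.80 4614589283098526
1.29 0.67 0.80 23011215880571251
1.25 0.70 0.80 15721270875126789
1.26 0.69 0.80 15128754576807000
1.29 0.67 1.00 19230939973657069
1.07 1.02 1.00 3258730975700728
1.21 0.80 1.00 23810155474721838
1.24 0.71 1.00 4699569679776380
1.26 0.68 0.83 15492934568598271
EOF
}

# sort_cases ORDER: read cases on stdin and sort them by the columns named in
# ORDER (0 - execution time, 1 - code size, 2 - compilation time), best first.
sort_cases() {
  order=$1
  keys=""
  while [ -n "$order" ]; do
    digit=${order%"${order#?}"}   # first character of ORDER
    order=${order#?}              # drop that character
    col=$((digit + 1))
    keys="$keys -k$col,${col}nr"
  done
  # $keys is left unquoted on purpose so that it splits into separate options.
  LC_ALL=C sort $keys
}

# Best case when code size matters most (SORT=102):
cases | sort_cases 102 | head -n 1
# prints: 1.07 1.02 1.00 3258730975700728
```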
   Multi-objective optimizations:
    When there are many optimization cases that simultaneously improve execution time, code size
    and compilation time, the selection of an optimal optimization case depends on the end-user 
    usage scenario: improving both execution time and code size is often required for embedded applications, 
    improving both compilation and execution time is important for data centers and real-time systems, 
    while improving only execution time is common for desktops and supercomputers. Hence, we provide several
    other environment variables to select optimization cases on the frontier of the optimization space:
   DIM=012 - returns optimization cases only on the frontier of all optimization cases.
             For example DIM=01 produces 2D frontier for execution time speedup and code size improvement,
             DIM=02 produces 2D frontier for execution time and compilation time speedups,
             DIM=12 produces 2D frontier for code size improvement and compilation time speedup,
             DIM=012 produces 3D frontier for all constraints.
   CUT=0,0,0 - cuts the frontier of optimization cases on each dimension, e.g. if CUT=0,0,1.2
               the frontier optimization cases should have compilation time speedup > 1.2,
               if CUT=1,1,1, all optimization cases on the frontier should have execution time
               speedup > 1, code size improvement > 1 and compilation time speedup > 1.
   When using this mode with DIM=012 and CUT=1,1,1, only one optimization case will be returned
   (when using CT_OPT_REPORT=1):

1.07 1.02 1.00 3258730975700728

Note that you have to select such cases manually: MILEPOST GCC still uses the top optimization case found before building the frontier, since the choice among frontier cases depends on the user scenario.
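
Manual selection of such cases from a saved report can be scripted. The sketch below applies only the CUT thresholds and does not compute the frontier itself. It assumes that the comparison is inclusive (>=), because the sample result above keeps a case whose compilation time speedup is printed as 1.00 under CUT=1,1,1; the function name cut_cases is hypothetical.

```shell
#!/bin/bash
# cut_cases CUT: keep only the cases on stdin whose three speedups reach the
# thresholds given as "exec,size,comp" (same layout as the CUT variable).
# Assumption: thresholds are inclusive (>=).
cut_cases() {
  awk -v cut="$1" '
    BEGIN { split(cut, min, ",") }
    $1 >= min[1] + 0 && $2 >= min[2] + 0 && $3 >= min[3] + 0
  '
}

# Three of the cases from the report, filtered with CUT=1,1,1:
printf '%s\n' \
  "1.29 0.67 1.00 19230939973657069" \
  "1.07 1.02 1.00 3258730975700728" \
  "1.26 0.68 0.83 15492934568598271" | cut_cases 1,1,1
# prints: 1.07 1.02 1.00 3258730975700728
```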
   The following info is very important to find optimization cases from similar programs
   for a given architecture (you can find the architecture most similar to yours,
   together with its optimization cases, at http://cTuning.org/cdatabase):
   CCC_PLATFORM_ID=2111574609159278179    (example for AMD Athlon 64 3700+)
   CCC_ENVIRONMENT_ID=2781195477254972989 (example for Linux Mandriva 2.6.17-10alchemy)
   CCC_COMPILER_ID=331350613878705696     (example for GCC 4.4.0)
   When compiling large applications, feature extraction can take a very long time
   (and this is part of the future work to speed it up), so a user may want to
   extract features only of a few functions. In this case, a user should add
   the file _ctuning_select_functions.txt to the compilation directory where
   only those functions should be listed that need to be processed
   (one function per line).
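
   Creating the function list can be done with a one-line helper. This is a sketch only; the helper name write_function_list and the function names in the example are hypothetical.

```shell
#!/bin/bash
# write_function_list DIR FUNCTION...: create DIR/_ctuning_select_functions.txt
# with one function name per line.
write_function_list() {
  dir=$1
  shift
  printf '%s\n' "$@" > "$dir/_ctuning_select_functions.txt"
}

# Example with hypothetical function names, run in the compilation directory:
#   write_function_list . main bit_count
```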
  • If you want to test low-level plugins, you can find self-explanatory tests in plugins-ici-2.0/tests directory.

Feature extractor

  • Low level pass ml-feat (gcc-4.4.x/gcc/ml-feat.c) is invoked through ICI after a given pass (currently fre). It saves low-level information about the program into an external file that is later processed by the high-level feature extractor.
  • High level feature extractor (plugins-ici-2.05/src-ml/extract-program-static-features.legacy/ml-feat-proc/featlstn.P) is written in Prolog and calculates features based on the low-level information obtained from the ml-feat pass.
V2.0 - featlstn.P - 55 features with removed duplicate feature ft21.
       featlstn1.P - 56 features (move duplicate feature to ft56).
       featlstn2.P - 57-65 features added by Jeremy Singer.
       NOTE: Current cTuning.org prediction web-services, etc are hardwired to work with
             the original feature list featlstn.P. In the future we should change that to
             support any feature list. For example, we plan to add polyhedral program representation
             as a feature set and then use cTuning learning and prediction services directly.
V1.0 - featlstn.P - had two duplicate features ft21 (thanks to Jeremy Singer who reported that bug).
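
The wrapper prints the extracted features on a single "Static program features:" line, in an order that does not follow the feature numbers. A small sketch for inspecting such a line, assuming it has been saved from the wrapper output in the ftN=value, comma-separated form shown in the report example; the function name list_features is hypothetical.

```shell
#!/bin/bash
# list_features: read a "Static program features: ..." line on stdin and print
# one ftN=value pair per line, ordered by feature number.
list_features() {
  sed 's/^Static program features: *//' \
    | tr -d ' ' \
    | tr ',' '\n' \
    | LC_ALL=C sort -t= -k1.3,1n
}

# Example with a shortened feature line:
echo "Static program features: ft1=9, ft24=27, ft25=13.50, ft19=0" | list_features
```

Counting the resulting lines with `wc -l` is a quick way to check which feature list (55, 56 or more features) produced the output.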

Usage

  • MILEPOST GCC / cTuning web-services test:
   milepost-gcc --ct-test *.c
   You can also use test script ./__test_ctuning_web_service_for_milepost_gcc.sh
  • Using optimization cases directly from the Collective Optimization Database (referenced by unique ID) - it is useful for multi-objective optimization, to share optimization cases within the community or when publishing papers and results on program optimization:
   milepost-gcc --ct-opt=11475790782770590 *.c
   You can also use demo script ./__compile_using_milepost_gcc_with_fixed_optimization.sh to understand how to configure your own system.
  • Predict good optimizations (execution time, code size, compilation time) based on correlation of program features and optimizations using collective optimization knowledge (empirical iterative feedback-directed compilation performed by multiple users and shared in the Collective Optimization Database):
   milepost-gcc -Oml *.c
   You can also use demo script ./__compile_using_milepost_gcc_with_prediction_optimization.sh
   to understand how to configure your own system.
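
The three usage modes above differ only in one option, so a build script can select the mode by name. A minimal sketch using only the options listed above; the function name milepost_args and the mode names test, fixed and predict are hypothetical.

```shell
#!/bin/bash
# milepost_args MODE [CASE_ID]: print the milepost-gcc option for a usage mode.
milepost_args() {
  case $1 in
    test)    echo "--ct-test" ;;
    fixed)   # needs the unique ID of an optimization case
             [ -n "${2:-}" ] || { echo "fixed mode needs a case ID" >&2; return 1; }
             echo "--ct-opt=$2" ;;
    predict) echo "-Oml" ;;
    *)       echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Example: compile all C files in the current directory with predicted flags.
#   milepost-gcc $(milepost_args predict) *.c
```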

Demos

Directory demo contains some benchmarks, real programs and libraries written in C, C++ and Fortran that demonstrate the usage of MILEPOST GCC.

  • All directories have file ___common_environment.sh - it is needed to configure MILEPOST GCC, including the cTuning web-services and a username (you can self-register at cTuning.org to access the web-services), the default optimization level, how to balance predicted optimizations to improve execution time, code size and compilation time, and the platform/environment/compiler most similar to the user's own, which is used to predict optimizations.

#!/bin/bash

# Copyright (C) 2007-2010 by Grigori Fursin
#
# http://fursin.net/research
# 
# UNIDAPT Group
# http://unidapt.org

export ICI_PLUGIN_VERBOSE=1
export ICI_VERBOSE=1
export ICI_PROG_FEAT_PASS=fre

#cTuning web-services:
export CCC_CTS_URL=ctuning.org/wiki/index.php/Special:CDatabase?request=
export CCC_CTS_DB=cod_opt_cases
export CCC_CTS_USER=gfursin

#misc parameters - don't change unless you understand them!

#compiler which was used to extract features for all programs to keep at cTuning.org
#do not change it unless you understand what you do ;) ...
export CCC_COMPILER_FEATURES_ID=129504539516446542

#if best combination of flags is not found use flags from CCC_OPTS instead
export CCC_OPTS="-O3"

#use architecture flags from cTuning
export CCC_OPT_ARCH_USE=0

#retrieve opt cases only when execution time > TIME_THRESHOLD
export TIME_THRESHOLD=0.3

#retrieve opt cases only with specific notes
#export NOTES=

#retrieve opt cases only when profile info is !=""
#export PG_USE=1

#retrieve opt cases only when execution output is correct (or not if =0)
export OUTPUT_CORRECT=1

#check user or total execution time
#export RUN_TIME=RUN_TIME_USER
export RUN_TIME=RUN_TIME

#Sort optimization case by speedup (0 - ex. time, 1 - code size, 2 - comp time)
export SORT=012

#produce additional optimization report including optimization space frontiers
export CT_OPT_REPORT=1

#Produce optimization space frontier
#export DIM=01 (2D frontier)
#export DIM=02 (2D frontier)
#export DIM=12 (2D frontier)
#export DIM=012 (3D frontier)
#export DIM=012

#Cut cases when producing frontier (select cases when speedup 0,1 or 2 is more than some threshold)
#export CUT=0,0,1.2
#export CUT=1,0.80,1
#export CUT=0,0,1

#find similar cases from the following platform
export CCC_PLATFORM_ID=2111574609159278179
export CCC_ENVIRONMENT_ID=2781195477254972989
export CCC_COMPILER_ID=331350613878705696
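
The settings above can be adjusted per usage scenario after the common file has been sourced. The following sketch maps the scenarios mentioned in the multi-objective discussion to SORT and DIM values; the scenario names, the function name scenario_env and the chosen values are assumptions made for illustration, not defaults shipped with MILEPOST GCC.

```shell
#!/bin/bash
# scenario_env NAME: print shell commands that set SORT and DIM for a usage scenario.
# Assumption: the mapping below is one plausible reading of the scenarios.
scenario_env() {
  case $1 in
    embedded)   echo "export SORT=012 DIM=01" ;;        # execution time and code size
    datacenter) echo "export SORT=012 DIM=02" ;;        # execution and compilation time
    desktop)    echo "export SORT=012; unset DIM" ;;    # execution time only
    *)          echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}

# Apply a preset after sourcing the common settings:
#   . ./___common_environment.sh
#   eval "$(scenario_env embedded)"
```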
  • All directories also have scripts:
    • __test_ctuning_web_service_for_milepost_gcc.sh to test cTuning web-service for MILEPOST GCC.
    • __compile_using_milepost_gcc.sh to compile program using milepost-gcc as if it's a standard gcc compiler.
    • __compile_using_milepost_gcc_with_fixed_optimization.sh - to compile program using milepost-gcc and transparently use optimizations referenced by the unique ID from the Collective Optimization Database.
    • __compile_using_milepost_gcc_with_predicted_optimization.sh to compile program using milepost-gcc, transparently extract features and attempt to predict good optimizations that improve execution time, code size and compilation time depending on the user optimization scenario (note that it doesn't always work well; improving the prediction, the quality of features and the association between features and optimizations is ongoing R&D).
    • __test_ctuning_web_service_for_milepost_gcc.sh to run a program
  • All directories have a file:
    • _ctuning_select_functions.txt where users can specify which functions to process during optimization prediction (since feature extraction is quite slow at the moment and requires more work/local caching/etc). Users can remove this file to process the whole program.
  • Additionally, directory libvorbis-1.2.3 has scripts:
    • __configure_with_milepost_gcc_and_fixed_optimization.sh - to configure library with MILEPOST GCC flags to substitute optimizations with the ones from Collective Optimization Database referenced by unique ID. The library is then recompiled using make command.
    • __configure_with_milepost_gcc_and_optimization_prediction.sh - to configure library with MILEPOST GCC flags to predict optimizations in an attempt to improve execution time, code size and compilation time depending on user optimization scenario. The library is then recompiled using make command.

cTuning compiler unique IDs (COMPILER_ID)

  • GCC 4.4.0 ICI 2.05 MILEPOST 2.1, COMPILER_ID=53769624635567103
  • GCC 4.4.1 ICI 2.05 MILEPOST 2.1, COMPILER_ID=1396734766204239
  • GCC 4.4.2 ICI 2.05 MILEPOST 2.1, COMPILER_ID=3348758923146401
  • GCC 4.4.3 ICI 2.05 MILEPOST 2.1, COMPILER_ID=455357654976124

Acknowledgments (MILEPOST GCC / Interactive Compilation Interface / cTuning)

  • Fabio Arnone (STMicroelectronics, France)
  • Phil Barnard (ARC, UK)
  • Francois Bodin (CAPS Entreprise, France)
  • Zbigniew Chamski (InfraSoft IT Solutions, Poland)
  • Bjorn Franke (University of Edinburgh, UK)
  • Grigori Fursin (INRIA/UVSQ, France)
  • Taras Glek (Mozilla, USA)
  • Nikhil Kapur
  • Yuriy Kashnikov (UVSQ, France)
  • Abdul Wahid Memon (UVSQ, France)
  • Cupertino Miranda (INRIA, France)
  • Mircea Namolaru (IBM, Israel)
  • Diego Novillo (Google, USA)
  • Sebastian Pop (AMD, USA)
  • Joern Rennecke (UK)
  • Jeremy Singer (University of Manchester, UK)
  • Basile Starynkevitch (CEA, France)
  • Ayal Zaks (IBM, Israel)
  • Other colleagues from IBM, NXP, STMicroelectronics, ARC, CAPS Entreprise, Mozilla, UVSQ ...
