CTools:CCC:Documentation:CCC V2.5

From cTuning.org

CCC V2.5 documentation

This is the first draft of the documentation. It should be updated. Any help is appreciated.

Prerequisites

MySQL client and development headers in /include/mysql/* (not strictly required, but functionality will be reduced).
PHP (not strictly required, but functionality will be reduced; used for iterative compilation and analysis plugins).
PAPI library (not strictly required; used in auxiliary tools).
OProfile (not strictly required; needed for transparent profiling).
R language (not strictly required; needed for some statistical analysis plugins).

Directory structure

/
- README.txt - brief README and GPL license info
- INSTALL.sh - main installation script that calls ccc-configure* scripts one by one
- ccc-configure* - distributed configuration scripts for different tasks (See Installation section)
- ccc-build.cfg - CCC Framework build number
- ccc-build-db.cfg - lowest and highest version numbers of the Collective Optimization Database (COD) that this version is intended to work with (to avoid incompatibilities while both the CCC Framework and COD are evolving).

/src-plat-dep - platform dependent tools and plugins
- /include/ccc - CCC Framework header files
- /lib - auxiliary functions
- /plugins - plugins sources
  - /compilation - iterative feedback-directed compilation plugins
  - /ml-prediction - machine learning prediction plugins
- /tools - low-level tools
  - /ccc-time - replacement for the standard time program that collects profile information about a program, including hardware counter support
  - /ccc-comp - basic tool dealing with program compilation
  - /ccc-run - basic tool dealing with program execution and profiling collection
  - /ccc-db-send-stats-comp - send compilation statistics to COD
  - /ccc-db-send-stats-comp-passes - send info about compiler optimizations (at function-level) to COD
  - /ccc-db-send-stats-prog-feat - send info about program static features (machine learning) to COD
  - /ccc-db-send-stats-run - send execution statistics to COD
  - /milepost-gcc - MILEPOST GCC wrapper to support -ml-e, -ml-c, -ml machine learning optimization flags that invoke ML ICI plugins to extract static program features and query COD web-services to predict good optimizations to improve execution time, code size or both respectively.
- /tools-aux - auxiliary tools, built only if the system supports them
  - /hardware-counters-papi - collecting hardware counters statistics (dynamic program features for machine learning or statistical analysis)

/src-plat-indep - platform independent tools and plugins
- /include/ccc_script_functions.php - library that supports platform independent plugins and deals mostly with COD
- /plugins - plugins and scripts

/cfg - CCC configuration directories for different architectures/environments/compilers
- /default - default configurations including optimizations for several compilers (GCC, Open64, PathScale)

/install - installation directory for (platform-dependent) tools and scripts

/apps - applications converted to work with CCC and scripts to automate iterative compilation

Installation

Installation is performed using the INSTALL.sh script on each platform (architecture/environment) on which iterative compilation experiments will be performed. This script calls the individual ccc-configure-* scripts to configure the following:

ccc-configure-platform-name.sh - Select local name of the platform (the directory with this name will be created in cfg and install directories).

ccc-configure-uuid.sh - Select uuid generator (uuid, uuidgen, etc) to generate unique ID for all data.

ccc-configure-database.sh - Configure Collective Optimization Database access if it is used for experiments. If it is not used, all compilation and execution statistics are recorded in local files and can later be sent to the database if you would like to share your optimization cases.

ccc-configure-database-test.sh - Test COD access using parameters from the previous step.

ccc-configure-platform.sh - Provide architecture info for experiments. You can check whether a similar architecture already exists (view) and use its unique ID, or add new architecture info directly (add). Each architecture has a unique ID so that optimization cases can be shared.

ccc-configure-environment.sh - Provide environment info for experiments. You can check whether a similar environment already exists (view) and use its unique ID, or add new environment info directly (add). Each environment has a unique ID so that optimization cases can be shared.

ccc-configure-compiler.sh - Provide compiler info for experiments. You can check whether a similar compiler already exists (view) and use its unique ID, or add new compiler info directly (add). Each compiler has a unique ID so that optimization cases can be shared.

During compiler installation, the user can configure compiler paths for binaries, libraries, plugins, etc. This information is recorded in the cfg/<platform name>/ccc-env.c.<short compiler name> script, which is invoked during compilation (ccc-comp). This allows multiple versions of the same compiler to co-exist on the system. Information about all compilers with their unique IDs and short names is recorded in the cfg/<platform_name>/ccc-compilers.cfg file.

ccc-configure-runtime-environment.sh - Provide runtime environment info for experiments (such as a VM, architecture simulator, etc.). You can check whether a similar runtime environment already exists (view) and use its unique ID, or add new runtime environment info directly (add). Each runtime environment has a unique ID so that optimization cases can be shared.

During runtime environment installation, the user can configure paths for binaries, libraries, plugins, etc. This information is recorded in the cfg/<platform name>/ccc-env.re.<short runtime environment name> script, which is invoked during execution (ccc-run). This allows multiple versions of the same runtime environment to co-exist on the system. Information about all runtime environments with their unique IDs and short names is recorded in the cfg/<platform_name>/ccc-re.cfg file.

ccc-configure-compile-all-tools.sh - Compile all low-level tools

ccc-configure-compile-all-plugins.sh - Compile and configure all plugins

ccc-configure-compile-all-tools-aux.sh - Compile all auxiliary tools if platform supports them

ccc-configure-update.sh - Check for updates

ccc-configure-set-environment.sh - Set environment based on the information entered in all previous steps. The environment files ccc-env.sh and ccc-env.csh for your platform will be created in the directory cfg/<platform_name>/. You can edit them to correct paths to specific compilers such as GCC with ICI, MILEPOST GCC, LLVM, Open64, PathScale, Testarossa, Intel, etc.

INSTALL.sh has to be invoked for a given platform before performing any experiments. The ccc-configure* scripts can later be invoked individually if needed.
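
As an illustration of the UUID generator selection step (ccc-configure-uuid.sh), here is a minimal sketch that picks the first generator available on the PATH. It assumes the candidates are the uuid and uuidgen commands mentioned above; the function name pick_uuid_generator is hypothetical, and this is not the actual logic of the CCC script.

```shell
#!/bin/sh
# Hypothetical helper (not part of CCC): print the name of the first
# UUID generator command found on PATH, or fail if none is installed.
pick_uuid_generator() {
    for candidate in uuid uuidgen; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if generator=$(pick_uuid_generator); then
    echo "Using UUID generator: $generator"
else
    echo "No UUID generator found (tried: uuid, uuidgen)" >&2
fi
```

The selected command name could then be stored in the platform configuration so that later steps generate IDs in a consistent way.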

Compiler optimization file format

Compiler optimization files (ccc-glob-flags.<local compiler name>.cfg) are located in the cfg directory. Each line describes one optimization, and the first parameter on the line is the optimization type:

1 - optimization flag that takes a parameter
1, <start_parameter>, <end_parameter>, flag
2 - optimization flag that is either on or off
2, flag
3 - select one flag from a list of flags
3, number of flags in the list, flags separated by commas

Example from GCC:

1, 0, 3, -O
1, 1, 64, -fsched-stalled-insns-dep=
2, -m32
2, -m3dnow
3, 2, -fbranch-count-reg, -fno-branch-count-reg
3, 2, -fbranch-target-load-optimize, -fno-branch-target-load-optimize
3, 2, -fbtr-bb-exclusive, -fno-btr-bb-exclusive
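
To make the three line types concrete, the following sketch lists the choices that each line of such a file encodes. It is only an illustration, not the CCC parser: the function name list_flag_choices and the "<off>" marker are made up for this example, and it assumes that for type 1 the numeric parameter is appended directly to the flag (so "1, 0, 3, -O" stands for -O0 to -O3).

```shell
#!/bin/sh
# list_flag_choices <cfg file>
# Print, for every line of a ccc-glob-flags.*.cfg style file, the
# alternatives that the line describes (one output line per input line).
list_flag_choices() {
    awk -F'[ \t]*,[ \t]*' '
    # Strip leading and trailing blanks; this also re-splits the fields.
    { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }

    # Type 1: 1, <start_parameter>, <end_parameter>, flag
    ($1 + 0) == 1 {
        out = ""
        for (p = $2 + 0; p <= $3 + 0; p++)
            out = out (out == "" ? "" : " ") $4 p
        print out
    }

    # Type 2: 2, flag (the flag is either absent or present)
    ($1 + 0) == 2 { print "<off> " $2 }

    # Type 3: 3, number of flags, flag, flag, ...
    ($1 + 0) == 3 {
        out = ""
        for (i = 3; i < 3 + $2; i++)
            out = out (out == "" ? "" : " ") $i
        print out
    }' "$1"
}

# Demo on three lines taken from the GCC example.
demo_cfg=$(mktemp)
printf '%s\n' \
    '1, 0, 3, -O' \
    '2, -m32' \
    '3, 2, -fbranch-count-reg, -fno-branch-count-reg' > "$demo_cfg"
list_flag_choices "$demo_cfg"
rm -f "$demo_cfg"
```

A search plugin would pick one alternative per line to build a complete flag combination.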

We are extending the framework to handle optimization passes and fine-grain optimizations, similar to the outdated FCO framework.

Collective Optimization Database

COD has recently been separated into 2 parts: common and experimental. The common database keeps information about architectures, environments, compilers, programs, compiler flags, architecture configurations, etc., that is, information that can be common to many users. The experimental databases keep local information about optimization cases after iterative feedback-directed compilation. They can contain user-sensitive information and hence should not always be shared. Users can later select interesting optimization cases to share.

More information about the COD web-services/API is available on cTuning.org.
COD structure

Applications

The CCC Framework is intended to automate a large number of iterative compilation experiments. An application has to be slightly modified to work with CCC. For example, the Collective Benchmark is already prepared to be used directly with the latest CCC Framework. More information about the benchmark format is available on cTuning.org. Eventually, we plan to add full support for optimizing applications transparently, without any Makefile modifications.

Low-level tools

There are 3 main low-level tools that abstract the platform away from iterative compilation experiments:

ccc-time

Command line: ccc-time -fe <name_of_executable> -fp <command_line_for_executable> -ft <file to save time>

This program replaces the native time command to profile a program and can potentially support different architectural features such as hardware counters. Normally, users who work with applications already ported to the CCC Framework, such as CBench, will only use ccc-comp and ccc-run or the high-level plugins, and will not use this tool directly.

ccc-comp

Command line: ccc-comp <compiler extension> "Compiler optimization flags" "Additional flags that should not be recorded (not optimization related)"

This tool sources the cfg/<platform name>/ccc-env.c.<compiler extension> script with compiler paths, invokes the Makefile associated with the compiler extension and compiles the program with the specified optimization flags. It then invokes scripts to send statistics to COD. ccc-comp is controlled by multiple environment variables that are described in the following section.

ccc-run

Command line: ccc-run <Dataset> <Base_line_run_param>

If the CCC_RE environment variable is set, this tool first sources the cfg/<platform name>/ccc-env.re.$CCC_RE script with runtime environment paths. It then executes the application with the given dataset number. For the first (baseline) run, base_line_run_param should be set to 1, otherwise to 0. The baseline run is the reference for the subsequent iterative feedback-directed runs: it is used to compare execution time, code size and compilation time improvements and to check the output for correctness. ccc-run is controlled by multiple environment variables that are described in the following section.
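
The two tools are typically combined: one baseline build and run, followed by any number of experimental builds and runs. The sketch below shows this pattern under stated assumptions: explore_flags, run_step and the CCC_DRY_RUN switch are hypothetical names introduced for this example (they are not part of CCC), and in the default dry-run mode the commands are only printed, so the sketch can be tried without CCC installed.

```shell
#!/bin/sh
# CCC_DRY_RUN is a hypothetical switch used only by this example:
# unless it is set to 0, commands are printed instead of executed.
CCC_DRY_RUN=${CCC_DRY_RUN:-1}

run_step() {
    if [ "$CCC_DRY_RUN" != "0" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# explore_flags <compiler extension> <baseline flags> <dataset> <flags>...
explore_flags() {
    compiler=$1
    baseline=$2
    dataset=$3
    shift 3
    # Baseline: compile, then run with Base_line_run_param = 1.
    run_step ccc-comp "$compiler" "$baseline" || return 1
    run_step ccc-run "$dataset" 1 || return 1
    # Experiments: recompile with each combination, run with parameter 0.
    for flags in "$@"; do
        run_step ccc-comp "$compiler" "$flags" || return 1
        run_step ccc-run "$dataset" 0 || return 1
    done
}

explore_flags gcc422 "-O3" 1 "-O2" "-O3 -funroll-loops"
```

Setting CCC_DRY_RUN=0 in a configured CCC environment would execute the same sequence for real.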

Iterative feedback-directed compilation example

The apps directory contains one test directory, CCC-TEST-APP. You can download the whole CBench and its datasets using ccc-admin--get-cbench-from-svn.sh and ccc-admin--get-cbench-datasets-from-svn.sh.

The list of all benchmarks set up for 1 dataset is in the file ccc--bench-list.dataset1.txt. The list of all benchmarks with all datasets is in the file ccc--bench-list.dataset_all.txt. One of those files should be copied to ccc--bench-list.txt, which is the working file with the list of benchmarks to be processed automatically by the scripts. If you want to use the test benchmark, just leave it in ccc--bench-list.txt.

Before performing any experiments, you should create temporary source directories using ccc-admin--create-work-dirs.sh, which copies all src directories to src-tmp directories. Those directories can later be deleted using the ccc-admin--delete-work-dirs.sh script.

You can then invoke the test compile/run script ccc-test--compile-run.sh in one of the tmp directories. This script compiles the application with the -O3 flag and makes a baseline run, and then compiles the program with the -O2 flag and makes an experimental run. The script also shows the environment variables that influence ccc-comp and ccc-run:

#!/bin/bash

# Copyright (C) 2004-2009 by Grigori G.Fursin
#
# http://fursin.net/research
# 
# UNIDAPT Group
# http://unidapt.org

##############################################################
#Record compiler passes (through ICI)
#export CCC_ICI_PASSES_RECORD=1

#Load compiler passes from files or environment (through ICI)
#export CCC_ICI_PASSES_USE=1
#export CCC_ICI_PASSES_OPT_BASE=-O3
#export ICI_PASSES_ALL=...

#Produce verbose output from the ICI plugins
#export ICI_PLUGIN_VERBOSE=1
#export ICI_VERBOSE=1

#Extract program static features (through ICI)
#export CCC_ICI_FEATURES_STATIC_EXTRACT=1
#export ICI_PROG_FEAT_PASS=fre

#Record run-time background info when working in realistic environments 
#to know how other applications interfere with optimizations
#export CCC_RUN_TIME_BACKGROUND="matmul 16Mb array, etc"

#Profile application using hardware counters and PAPI library
#export CCC_HC_PAPI_USE=$CCC_HC_PAPI_LIST
#export CCC_HC_PAPI_USE=PAPI_L1_DCMx,PAPI_L2_DCMx,PAPI_TLB_DMx,PAPI_L1_LDMx,PAPI_L1_STMx,PAPI_L2_LDMx,PAPI_L2_STMx,PAPI_BR_TKNx,PAPI_BR_MSPx,PAPI_TOT_INSx,PAPI_FP_INSx,PAPI_BR_INSx,PAPI_VEC_INSx,PAPI_TOT_CYCx,PAPI_L1_DCHx,PAPI_FP_OPSx

#Profile application using gprof
#export CCC_GPROF=1

#Profile application using oprof
#export CCC_OPROF=1
#export CCC_OPROF_PARAM="--event=CPU_CLK_UNHALTED:6000"

#Perform compilation only (no run).
#export CCC_NO_RUN=1

#Repeat execution a number of times with the same dataset to check execution time variation on the system.
export CCC_RUNS=1

#Use timed-run to kill application if it runs for too long
#The reason is that during iterative compilation some produced binaries
#are corrupt and have infinite loops.
export CCC_TIMED_RUN="timed-run 3000"

#Architecture specific optimization flags
#export CCC_OPT_PLATFORM="-mA7 -ffixed-r12 -ffixed-r16 -ffixed-r17 -ffixed-r18 -ffixed-r19 -ffixed-r20 -ffixed-r21 -ffixed-r22 -ffixed-r23 -ffixed-r24 -ffixed-r25"
#export CCC_OPT_PLATFORM="-mA7"
#export CCC_OPT_PLATFORM="-mtune=itanium2"
#export CCC_OPT_PLATFORM="-march=athlon64"

#Some compilation info that should be standardized and automated 
#(if you use ARCH_CFG and/or ARCH_SIZE, you should set CCC_OPT_PLATFORM to "" or other platform related flag)
export CCC_OPT_PLATFORM="-msse2"
#export CCC_ARCH_CFG="l1_cache=203; l2_cache=35;"
#export CCC_ARCH_SIZE=132

#Some compilation info that should be standardized and automated
#export CCC_OPT_FINE="loop_tiling=10;"
#export CCC_OPT_PAR_STATIC="all_loops=parallelizable;"

#Some run-time info that eventually should be standardized and automated
#export CCC_RUN_POWER=10
#export CCC_RUN_ENERGY=20
#export CCC_PAR_DYNAMIC="no deps"

#HERE YOU CAN SUBSTITUTE PLATFORM/ENVIRONMENT IDS IF YOU WANT TO DO CROSS-COMPILATION/ANALYSIS
#export CCC_PLATFORM_ID=
#export CCC_ENVIRONMENT_ID=

#Select which processor to run application on, in case of multiprocessor system
#export CCC_PROCESSOR_NUM=

#Select runtime environment (VM or simulator)
#export CCC_RUN_RE=llvm25

#For SPEC2006 and ICI ...
export ICI_WORK_DIR=$PWD/

#Baseline run
#export CCC_NOTES="baseline compilation"
ccc-comp gcc422 -O3
#export CCC_NOTES="baseline run"
ccc-run 1 1

#Optimization run
#export CCC_NOTES="opt compilation"
ccc-comp gcc422 -O2
#export CCC_NOTES="opt run"
ccc-run 1 0

Finally, for automatic iterative compilation experiments, you can use the ccc-run--glob-flags.sh script, which has several modes for selecting benchmarks and datasets. It should probably be simplified. To be described; any help is appreciated.

Plugins

Empirical iterative feedback-directed compilation plugins

ccc-run-glob-flags-rnd-uniform

Command-line: <Number of runs> <Compiler name> <Baseline opt> <Rnd seed> <Dataset>

Generate a random combination of compiler flags, where each individual optimization is selected with 50% probability.
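
The uniform selection can be sketched as follows. This is an illustration only, not the plugin's code: pick_uniform is a hypothetical name, and the sketch treats every candidate as a simple on/off flag.

```shell
#!/bin/sh
# pick_uniform <seed> <flag>...
# Print a space-separated combination in which every given flag is
# included independently with probability 0.5.
pick_uniform() {
    seed=$1
    shift
    printf '%s\n' "$@" | awk -v seed="$seed" '
        BEGIN { srand(seed) }
        rand() < 0.5 { out = out (out == "" ? "" : " ") $0 }
        END { print out }'
}

pick_uniform 42 -funroll-loops -fno-inline -ftree-vectorize -fomit-frame-pointer
```

Passing the same seed reproduces the same combination on a given awk implementation, which is one reason for exposing a random seed as a command-line parameter.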

ccc-run-glob-flags-rnd-fixed

Command-line: <Number of runs> <Sequence length> <Compiler name> <Baseline opt> <Rnd seed> <Dataset>

Generate a random combination of compiler flags of the specified length when performing iterative compilation.
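
A fixed-length selection can be sketched with a partial shuffle. The name pick_fixed is hypothetical, and the sketch assumes that a flag appears at most once in a generated sequence; the actual plugin may behave differently.

```shell
#!/bin/sh
# pick_fixed <seed> <length> <flag>...
# Print <length> distinct flags chosen at random from the given list.
pick_fixed() {
    seed=$1
    length=$2
    shift 2
    printf '%s\n' "$@" | awk -v seed="$seed" -v n="$length" '
        BEGIN { srand(seed) }
        { flag[NR] = $0 }
        END {
            if (n + 0 > NR) n = NR
            # Partial Fisher-Yates shuffle: settle the first n positions.
            for (i = 1; i <= n; i++) {
                j = i + int(rand() * (NR - i + 1))
                tmp = flag[i]; flag[i] = flag[j]; flag[j] = tmp
                printf "%s%s", (i > 1 ? " " : ""), flag[i]
            }
            print ""
        }'
}

pick_fixed 42 2 -funroll-loops -fno-inline -ftree-vectorize -fomit-frame-pointer
```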

ccc-run-glob-flags-one-by-one

Command-line: <Ignore first option> <Compiler name> <Baseline opt> <Dataset>

Select all optimizations from the compiler optimization list one by one.
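
A minimal sketch of the one-by-one strategy is shown below. It assumes that each flag is tried on top of the baseline optimization level, which the description above does not state explicitly; one_by_one is a hypothetical name, and the <Ignore first option> parameter is not modeled.

```shell
#!/bin/sh
# one_by_one <baseline opt> <flag>...
# Print one flag combination per line: the baseline level plus one flag.
one_by_one() {
    baseline=$1
    shift
    for flag in "$@"; do
        printf '%s\n' "$baseline $flag"
    done
}

# Each printed line could be passed to ccc-comp as the optimization flags.
one_by_one -O3 -funroll-loops -fno-inline -ftree-vectorize
```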

ccc-run-glob-flags-one-off-rnd

Command-line: "Compiler flags" <Compiler name> <Baseline opt> <Rnd seed> <Time diff tolerance> <Dataset>

Remove flags from the "compiler flags" combination one by one, in random order, at each iterative step, and put a flag back if removing it degrades execution time. This script is needed to find influential flags.
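
The elimination idea can be sketched as follows, under the assumption that a flag is put back when the run without it is slower than the reference by more than the tolerance. All names here are hypothetical: fake_time is a toy stand-in for a real compile-and-run measurement, in which only -funroll-loops makes the program faster.

```shell
#!/bin/sh
# Toy measurement (not a real benchmark): prints a fake execution time.
fake_time() {
    case " $* " in
        *" -funroll-loops "*) echo 5 ;;
        *) echo 10 ;;
    esac
}

# shuffle <seed>: print the lines read from stdin in random order.
shuffle() {
    awk -v seed="$1" 'BEGIN { srand(seed) } { print rand() "\t" $0 }' |
        sort -n | cut -f2-
}

# one_off_rnd <seed> <tolerance> <flag>...
# Print the flags that could not be removed without a slowdown.
one_off_rnd() {
    seed=$1
    tol=$2
    shift 2
    kept="$*"
    ref=$(fake_time $kept)   # reference time with all flags enabled
    for flag in $(printf '%s\n' "$@" | shuffle "$seed"); do
        # Build the candidate combination without the current flag.
        trial=""
        for f in $kept; do
            [ "$f" = "$flag" ] || trial="$trial $f"
        done
        t=$(fake_time $trial)
        # Leave the flag out unless the time got worse by more than tol.
        if awk -v t="$t" -v ref="$ref" -v tol="$tol" \
               'BEGIN { exit !(t <= ref * (1 + tol)) }'; then
            kept=$trial
        fi
    done
    set -- $kept
    printf '%s\n' "$*"
}

one_off_rnd 1 0.05 -fno-inline -funroll-loops -ftree-vectorize
```

With the toy measurement, only -funroll-loops survives regardless of the removal order, because removing any other flag leaves the fake execution time unchanged.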

Data Analysis plugins

All data analysis plugins first invoke ccc--select-db.sh to set working database parameters. If ccc--select-db-local.sh exists in the local directory it will be invoked to automate setting of working database parameters.

Platform-independent plugins use the CCC library src-plat-indep/include/ccc_script_functions.php, which provides most of the low-level functionality. One day it should be fully documented and updated.

get-all-compilers

Report all compiler IDs from COD.

get-all-environments

Report all environment IDs from COD.

get-all-platforms

Report all platform IDs from COD.

get-all-programs

Report all program IDs from COD.

get-compiler-id

Command-line: <compiler name entered during CCC compiler configuration>

Get compiler ID referenced by the compiler name entered during CCC compiler configuration.

get-dataset-id

Command-line: <program ID> <dataset number>

Get dataset ID referenced by the program ID and dataset number (from cBench).

get-global-speedups

Main script to analyze speedups. You can edit it to obtain necessary statistics for your own optimization scenario:

#!/bin/bash

# Copyright (C) 2004-2010 by Grigori Fursin
#
# http://fursin.net/research
# 
# UNIDAPT Group
# http://unidapt.org

STAT_FILE=Stats/_stats__global_speedups
STAT_FILE1=${STAT_FILE}._SUMMARY.txt

# Set database variables
if [ -f "ccc--select-db.sh" ] ; then
 . ./ccc--select-db.sh
fi

# Delete previous statistic (some files are appended)
rm -rf ${STAT_FILE}.*

#export PLAT_ID=110787249241129258
#110787249241129258  AMD Opteron 2218
#708176059411291686  Intel Xeon EM64T

#export CMPLR_ID=329504539516446542
#329504539516446542  gcc 4.4.0

#export ENV_ID=198828472341129319
#198828472341129319  Debian Sid Linux v1.1

#retrieve opt cases only when execution time > TIME_THRESHOLD
#export TIME_THRESHOLD=0.3

#retrieve opt cases only with specific notes
#export NOTES=

#retrieve opt cases only when profile info is !=""
#export PG_USE=1

#retrieve opt cases only when execution output is correct (or not if =0)
#export OUTPUT_CORRECT=1

#Sort optimization case by speedup (0 - ex. time, 1 - code size, 2 - comp time)
#export SORT=012

#Produce optimization space frontier
#export DIM=01 (2D frontier)
#export DIM=02 (2D frontier)
#export DIM=12 (2D frontier)
#export DIM=012 (3D frontier)
#export DIM=012

#Cut cases when building optimization space frontier (select cases when speedup 0,1 or 2 is more than some threshold)
#export CUT=0,0,1.2
#export CUT=1,1,1
#export CUT=0,0,0

#export PROG_ID=4324827713289550
#export DS_NUM=1
#export DS_ID=

#check user or total execution time
#export RUN_TIME=RUN_TIME_USER
#export RUN_TIME=RUN_TIME

export CCC_FILE_TMP=tmp

export CCC_STATS=${STAT_FILE}
php $CCC_PLUGINS/plugins/get_global_speedups.php

The following graphs have been created using this script:

Example of complex optimization search spaces for susan_c from Collective Benchmark and MILEPOST GCC 4.4.0 after randomly selecting about 80 optimization flags:

We can automatically improve execution time of the program by nearly 2 times over the highest GCC optimization level after using CCC framework (we obtain similar results on LLVM, Open64, Intel and IBM compilers). We can also use CCC framework to perform multi-objective optimizations (selecting optimization cases on the optimization space frontier shown by red circles and blue dots) such as:

optimize both execution time and code size (important for optimizing libraries and embedded/mobile computing systems)
optimize just execution time (important for desktop computers and HPC servers/supercomputers)
optimize both execution time and compilation time (important for cloud computing services and real-time systems)
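
For two objectives, selecting the cases on the optimization space frontier amounts to keeping every case that no other case beats on both speedups. The sketch below illustrates this with awk; it is not the implementation used by get_global_speedups.php, the function name frontier is hypothetical, and the input numbers are made up for illustration rather than measured.

```shell
#!/bin/sh
# frontier: read "<exec-time speedup> <code-size speedup> <label>" lines
# from stdin and print the cases that are not dominated by any other case.
# A case is dominated if another case is at least as good on both
# speedups and strictly better on at least one of them.
frontier() {
    awk '
    { x[NR] = $1 + 0; y[NR] = $2 + 0; line[NR] = $0 }
    END {
        for (i = 1; i <= NR; i++) {
            dominated = 0
            for (j = 1; j <= NR; j++) {
                if (x[j] >= x[i] && y[j] >= y[i] &&
                    (x[j] > x[i] || y[j] > y[i])) {
                    dominated = 1
                    break
                }
            }
            if (!dominated) print line[i]
        }
    }'
}

# Illustrative (made-up) optimization cases.
frontier <<'EOF'
1.00 1.00 baseline
1.80 0.70 caseA
1.40 1.10 caseB
1.20 1.05 caseC
0.90 1.30 caseD
EOF
```

Here caseA, caseB and caseD form the frontier: caseA is the fastest, caseD is the smallest, and caseB is a compromise between the two.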

You can also use MILEPOST GCC to correlate program features and behavior to predict good optimizations for unseen programs based on prior learning.

get-global-speedups-by-list

The same as get-global-speedups, but produces speedup statistics for the programs listed in the file list-progs.dat.

plot-bar-graph

Plots bar graphs.

plot-density-graph

Uses an R script to plot speedup density graphs.

Machine learning plugins

ccc-ml-accumulate-features

This plugin accumulates the static program features extracted per function for a given program into a single feature vector using MILEPOST GCC. To be updated.

ccc-ml-predict-best-flag

This plugin queries the ML server to obtain a combination of flags or passes at the global or function level to improve execution time or code size. Deprecated: you should use MILEPOST GCC directly instead to predict good optimizations that balance execution time, code size and compilation time depending on the user's optimization scenario.

Misc

Here you can find projects to extend CCC Framework and plugins. You are welcome to participate or you can submit your projects.