From cTuning.org


Google Summer of Code 2009: Generic function cloning.

Liang Peng (ICT, China)

Building on Run-time Function Adaptation for Statically-Compiled Programs (which is based on function multiversioning) and on FunctionSpecificOpt, we enabled generic function cloning together with low-overhead routines for monitoring program behavior. This will enable fine-grained self-tuning binaries and libraries and increase the performance and portability of statically compiled code, which is particularly important for rapidly evolving hardware and virtual environments.



Function cloning

Description

SIMPLE_IPA_PASS "generic_cloning"

This pass performs generic function cloning. It can create any number of clones of a given function on demand, apply different fine-grained optimizations to each clone, and provides a mechanism to select among the clones at run time based on optimization scenarios. The pass is placed after the "visibility" pass.

It can be triggered either by the GCC option -fapi-clone-function or by the Adapt plugin. We use an ICI event and a plugin to supply the information that function cloning needs. Currently, this information comes from XML files.

New ICI event

ICI event "load_clone_config"

This event is called at the very beginning of function cloning pass.

  • Callback function: load_clone_config (), defined in the ICI plugin adapt.c
This callback gathers the information needed by the function cloning pass. In my implementation, it first obtains the current main input filename using get_feature ("main_input_filename"), then reads the corresponding XML file into the data structure cloneinfo using the Mini-XML library.
  • Key data structure: cloneinfo
Defined in both gcc/highlev-plugin-internal.h and gcc/highlev-plugin.h:
typedef struct {
  /* Number of functions to clone.  */
  int numofclonefun;
  /* List of function names.  */
  char **clone_function_list;
  /* Source file name of each function.  */
  char **function_filename_list;
  /* Number of clones to create for each function.  */
  int *clones;
  /* Name extension used for the clones of each function.  */
  char **clone_extension;
  /* Name of the adaptation function for each function.  */
  char **adaptation_function;
  /* Option list for the clones.  */
  char **clone_option;
  /* External libraries to link.  */
  char *external_libraries;
} cloneinfo;
This structure stores the information obtained by the callback of the ICI event "load_clone_config". It controls which functions are cloned and how they are cloned.
  • ICI event parameter: clone_info
It is a pointer to a cloneinfo structure and is passed as a parameter of the ICI event "load_clone_config".
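To show how the fields line up, the following sketch fills a cloneinfo for the susan_edges() experiment reported below and implements a lookup in the spirit of is_in_clone_list. It is plain C rather than pass code, and the configuration values (the "_clone" extension, the adaptation function name "select_clone", the option string "-O3") are hypothetical; we also assume one option string per cloned function.

```c
#include <stdbool.h>
#include <string.h>

/* Same layout as the cloneinfo structure described above.  */
typedef struct {
  int numofclonefun;
  char **clone_function_list;
  char **function_filename_list;
  int *clones;
  char **clone_extension;
  char **adaptation_function;
  char **clone_option;
  char *external_libraries;
} cloneinfo;

/* The event parameter; in the pass it would be filled by load_clone_config.  */
static cloneinfo *clone_info;

/* Hypothetical configuration: clone susan_edges() in susan.c ten times.
   The extension, adaptation function name and options are made up.  */
static char *example_functions[] = { "susan_edges" };
static char *example_files[] = { "susan.c" };
static int example_clones[] = { 10 };
static char *example_extensions[] = { "_clone" };
static char *example_adaptation[] = { "select_clone" };
static char *example_options[] = { "-O3" };

static cloneinfo example_info = {
  1, example_functions, example_files, example_clones,
  example_extensions, example_adaptation, example_options, "-lselect"
};

/* Sketch of is_in_clone_list: linear search over (function, file) pairs.
   Returns true and stores the index in *nid if the pair is listed.  */
static bool
is_in_clone_list (const char *func_name, const char *file_name, int *nid)
{
  if (clone_info == NULL)
    return false;
  for (int i = 0; i < clone_info->numofclonefun; i++)
    if (strcmp (clone_info->clone_function_list[i], func_name) == 0
        && strcmp (clone_info->function_filename_list[i], file_name) == 0)
      {
        *nid = i;
        return true;
      }
  return false;
}
```

Matching on both the function name and the file name lets two static functions with the same name in different files be configured independently.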

Work flow

For each cgraph node, we check whether or not it needs to be cloned.

If so, we clone it using cgraph_function_versioning(), apply different optimizations to each clone, and insert a selection mechanism into the original function.

Note: To obtain a generic clone and set correct debug information for the clones, cgraph_function_versioning() and some of its callees need to be modified.

Applying different optimizations to clones

Image:Apply_option_to_clone.jpg

One bug in GCC 4.4:

A segmentation fault occurs if we:
declare the function Mibench > automotive_susan_e > susan.c:susan_edges_small() with __attribute__((optimize(3)))
and compile it using ~/ici/install/bin/gcc -O1 -lm -c susan.c

Another bug in GCC 4.4:

A segmentation fault occurs if we:
declare the function Mibench > automotive_susan_e > susan.c:susan_edges() with __attribute__((optimize(3)))
declare the function Mibench > automotive_susan_e > susan.c:susan_edges_small() with __attribute__((optimize(1)))
declare the function Mibench > automotive_susan_e > susan.c:susan_corners_quick() with __attribute__((optimize(3)))
and compile it using ~/ici/install/bin/gcc -O3 -lm -c susan.c

It appears that GCC 4.4 does not handle different optimization settings for functions within the same compilation unit well. We took the following measures to avoid these bugs:

1. The following flags may not be changed:

flag_strict_aliasing;
flag_omit_frame_pointer;
flag_pcc_struct_return;
flag_asynchronous_unwind_tables;

2. We always call init_caller_save () and rebuild optimization_default_node at the end of process_options ().

Selection mechanism and overhead

Image:Selection_mechanism_paper_2.jpg
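As a rough C-level picture of the selection mechanism: after add_call_to_clones runs, the original function first calls an adaptation function that returns a version index, and a switch forwards the original arguments to the chosen clone. The pass builds this in GIMPLE, not C; the names work, work_clone1, work_clone2 and select_version are hypothetical, and we assume that index 0 falls through to the original body. The round-robin policy mirrors the b flavor of the --tflavor option of ici-xml-util.py.

```c
/* Counts how often each version runs, only to make the dispatch observable.  */
static int version_calls[3];

/* Hypothetical clones of work(); the cloning pass would compile each copy
   with its own optimization options.  */
static int
work_clone1 (int x)
{
  version_calls[1]++;
  return x * 2;
}

static int
work_clone2 (int x)
{
  version_calls[2]++;
  return x * 2;
}

/* Hypothetical adaptation function: round-robin over 0 (original), 1, 2.  */
static int
select_version (void)
{
  static int next = 0;
  int v = next;
  next = (next + 1) % 3;
  return v;
}

/* work() after the selection mechanism has been inserted.  */
static int
work (int x)
{
  switch (select_version ())
    {
    case 1:
      return work_clone1 (x);   /* forward the original arguments */
    case 2:
      return work_clone2 (x);
    default:
      break;                    /* 0: run the original body below */
    }
  version_calls[0]++;
  return x * 2;
}
```

Calling work() six times runs the original body and each clone twice; a random policy (flavor r) would only change select_version.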

Overhead

Mibench automotive_susan_e

Time | Average time, O3 | Average time, susan_edges() cloned ten times, O3 on all clones, recurrent selection | Overhead
real | 8.1588 s | 8.282778 s | 1.5195617%
user | 7.6258 s | 7.759111 s | 1.7481576%

With dataset 1, susan_edges() is a hot function of automotive_susan_e, accounting for about 80% of the total execution time.
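The overhead column in the table above is the relative increase in average time over the plain O3 build; recomputing it from the table values gives the same figures:

$$
\text{Overhead} = \frac{T_{\text{clone}} - T_{\text{O3}}}{T_{\text{O3}}} \times 100\%,
\qquad
\frac{8.282778 - 8.1588}{8.1588} \times 100\% \approx 1.5196\%\ (\text{real}),
\qquad
\frac{7.759111 - 7.6258}{7.6258} \times 100\% \approx 1.7482\%\ (\text{user})
$$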

The following figure shows our evaluation of the run-time overhead introduced by our call-switch mechanism. The top 6 hot functions of MiBench were each cloned once, and the original functions and their clones were selected in turn.

Image:Overhead_paper.jpg

Primary functions

  • exec_clone_functions (void)
This is the "execute" function of the gimple_opt_pass pass_clone_functions. It raises the event "load_clone_config" to load the information needed for cloning, clones each function in the clone list using cgraph_function_versioning(), and finally inserts the selection mechanism.
  • add_call_to_clones (struct cgraph_node *orig, int nid)
This function inserts the selection mechanism: it adds calls to the clones, adds a call to the adaptation function, and builds the switch statement.
  • get_arguments (tree tree_list)
This function collects the arguments of the original function into the tree argv.
  • parse_arguments (char *text, unsigned int *argc)
This function parses an option string and returns the number of arguments and the argument vector.
For example, parsing the string "-O3 -fici -fapi-clone-functions" yields argc = 3 and argv = {"-O3", "-fici", "-fapi-clone-functions"}.
  • find_clone_options (char *funcname, int *nid)
This function tries to find the options for a clone in the clone_option list according to the clone's name, and records its id in nid.
  • is_in_clone_list (const char *func_name, const char *file_name, int *nid)
This function checks whether the function func_name in the file file_name needs to be cloned. It returns true if so and false otherwise; nid records the function's index in the list.
  • is_it_clonable (struct cgraph_node *cg_func)
Checks whether the cgraph node cg_func is clonable.
  • is_it_main (struct cgraph_node *cg_func)
Checks whether the cgraph node cg_func corresponds to main or MAIN__.
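A minimal sketch of what parse_arguments might look like, assuming options are separated by blanks and never quoted. The helpers is_blank and free_arguments are hypothetical, the text parameter is made const here, and the real function may allocate memory differently (for example with libiberty's xmalloc inside GCC).

```c
#include <stdlib.h>
#include <string.h>

static int
is_blank (char c)
{
  return c == ' ' || c == '\t' || c == '\n';
}

/* Free a NULL-terminated vector returned by parse_arguments.  */
static void
free_arguments (char **argv)
{
  if (argv == NULL)
    return;
  for (char **a = argv; *a != NULL; a++)
    free (*a);
  free (argv);
}

/* Split TEXT on blanks into a NULL-terminated vector of freshly allocated
   strings and store the number of arguments in *ARGC.  Quotes are not
   handled.  Returns NULL on allocation failure.  */
static char **
parse_arguments (const char *text, unsigned int *argc)
{
  unsigned int n = 0, i = 0;
  const char *p;

  /* First pass: count the tokens.  */
  for (p = text; *p != '\0';)
    {
      while (is_blank (*p))
        p++;
      if (*p == '\0')
        break;
      n++;
      while (*p != '\0' && !is_blank (*p))
        p++;
    }

  char **argv = malloc ((n + 1) * sizeof *argv);
  if (argv == NULL)
    return NULL;

  /* Second pass: copy each token.  */
  for (p = text; i < n; i++)
    {
      while (is_blank (*p))
        p++;
      const char *start = p;
      while (*p != '\0' && !is_blank (*p))
        p++;
      size_t len = (size_t) (p - start);
      argv[i] = malloc (len + 1);
      if (argv[i] == NULL)
        {
          free_arguments (argv);   /* argv[i] is NULL, so the vector is terminated */
          return NULL;
        }
      memcpy (argv[i], start, len);
      argv[i][len] = '\0';
    }
  argv[n] = NULL;
  *argc = n;
  return argv;
}
```

Counting first and copying second keeps the vector exactly sized, and the trailing NULL lets callers iterate without argc, as with the argv of main().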

TODOS

  • deal with functions that happen to share the same name with a clone.
  • add support for pragmas.
  • apply different target-specific optimizations to clones.


Instrumentation

Description

SIMPLE_IPA_PASS "instrumentation"

This pass performs function instrumentation. Currently, it can only add function calls. We can also link external libraries transparently, without modifying Makefiles.

This pass can also be triggered either by the GCC option -fapi-instrument-functions or by the Adapt plugin. Currently, the information comes from XML files through ICI.

New ICI event

ICI event "load_instr_config"

This event is called at the very beginning of function instrumentation pass.

  • Callback function: load_instr_config (), defined in the ICI plugin adapt.c
This callback gathers the information needed by the instrumentation pass. In my implementation, it first obtains the current main input filename using get_feature ("main_input_filename"), then reads the corresponding XML file into the data structure instrinfo using the Mini-XML library.
  • Key data structure: instrinfo
Defined in both gcc/highlev-plugin-internal.h and gcc/highlev-plugin.h:
typedef struct {
  /* Number of functions to instrument.  */
  int numofinstrfun;
  /* List of function names.  */
  char **instrument_function_list;
  /* Source file name of each function.  */
  char **function_filename_list;
  /* Name of the function called at the beginning of each instrumented function.  */
  char **timer1;
  /* Name of the function called at the end of each instrumented function.  */
  char **timer2;
  /* Flag for each function: '1' if it has been cloned.  */
  char *cloned;
} instrinfo;
This structure stores the information obtained by the callback of the ICI event "load_instr_config". It controls whether a function is instrumented and the names of the external calls to insert.
  • ICI event parameter: instr_info
It is a pointer to an instrinfo structure and is passed as a parameter of the ICI event "load_instr_config".

Work flow

Image:Instrument_paper.jpg

We introduce a flag, "cloned", so that the overhead of the call to the selection function is excluded from the measurement.

Primary functions

  • exec_instrument_functions (void)
This is the "execute" function of the instrumentation pass. It raises the event "load_instr_config" to load the information needed for instrumentation, and inserts an external call at the beginning and at the end of every function in instrument_function_list.
  • add_timer_begin (struct cgraph_node *cg_func, char *funname, int cloned)
Inserts an external call named funname at the beginning of the function.
If cloned == '1', the call is inserted after the first GIMPLE statement: because the function has already been cloned, its first statement is the call to the selection function.
  • add_timer_end (struct cgraph_node *cg_func, char *funname)
Inserts an external call named funname at the end of the function.
  • is_in_instrument_list (const char *func_name, const char *file_name, int *nid)
This function checks whether the function func_name in the file file_name needs to be instrumented. It returns true if so and false otherwise; nid records the function's index in the list.
  • is_it_instrumentable (struct cgraph_node *cg_func)
Checks whether cg_func is instrumentable; returns true if so and false otherwise.
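The effect of the cloned flag on where add_timer_begin places its call can be pictured at the C level. In this hypothetical sketch, timer_start and timer_stop stand for the functions named in timer1 and timer2, select_version stands for the adaptation function, and record() only logs the call order. The pass itself inserts GIMPLE calls; we assume here that each function has a single exit point where the timer2 call goes.

```c
#include <string.h>

/* Event trace used only to make the call order observable.  */
static char trace[32];

static void
record (char event)
{
  size_t n = strlen (trace);
  if (n + 1 < sizeof trace)
    {
      trace[n] = event;
      trace[n + 1] = '\0';
    }
}

/* Hypothetical hooks whose names would come from timer1 and timer2.  */
static void timer_start (void) { record ('B'); }
static void timer_stop (void) { record ('E'); }

/* Hypothetical adaptation function; always picks the original body here.  */
static int select_version (void) { record ('S'); return 0; }

static int compute_clone1 (int x) { return x + 1; }

/* cloned == '0': timer1 is the first statement, timer2 precedes the return.  */
static int
compute_plain (int x)
{
  timer_start ();
  int r = x + 1;
  timer_stop ();
  return r;
}

/* cloned == '1': the first statement is already the call to the selection
   function, so timer1 goes right after it and the selection is not timed.  */
static int
compute_cloned (int x)
{
  int v = select_version ();
  timer_start ();
  int r;
  switch (v)
    {
    case 1:
      r = compute_clone1 (x);
      break;
    default:
      r = x + 1;   /* original body */
      break;
    }
  timer_stop ();
  return r;
}
```

compute_plain logs "BE" while compute_cloned logs "SBE": the selection call happens before timing starts, so its cost stays out of the measurement.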

Run-time monitoring routines based on PAPI

Using PAPI and the call sites inserted by the instrumentation pass, we can conveniently measure the IPC, cache misses, and similar metrics of each function or clone execution.
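As an illustration of the post-processing such routines might do, the sketch below turns counter readings taken at the timer1 and timer2 call sites into IPC and a cache miss rate. To stay self-contained it does not call PAPI: the struct fields stand in for values that could be obtained with PAPI_read for preset events such as PAPI_TOT_INS, PAPI_TOT_CYC, PAPI_L1_DCM and PAPI_L1_DCA (availability depends on the processor). The type and function names are hypothetical.

```c
/* Raw counter values read at one instrumentation point.  */
typedef struct {
  long long instructions;    /* e.g. PAPI_TOT_INS */
  long long cycles;          /* e.g. PAPI_TOT_CYC */
  long long cache_misses;    /* e.g. PAPI_L1_DCM */
  long long cache_accesses;  /* e.g. PAPI_L1_DCA */
} counter_sample;

/* Counts for one execution: values read in timer2 minus values read in timer1.  */
static counter_sample
sample_delta (counter_sample begin, counter_sample end)
{
  counter_sample d = {
    end.instructions - begin.instructions,
    end.cycles - begin.cycles,
    end.cache_misses - begin.cache_misses,
    end.cache_accesses - begin.cache_accesses
  };
  return d;
}

/* Instructions per cycle; 0 if no cycles were counted.  */
static double
ipc (counter_sample s)
{
  return s.cycles > 0 ? (double) s.instructions / (double) s.cycles : 0.0;
}

/* Fraction of cache accesses that missed; 0 if there were no accesses.  */
static double
miss_rate (counter_sample s)
{
  return s.cache_accesses > 0
         ? (double) s.cache_misses / (double) s.cache_accesses : 0.0;
}
```

Because the counters are cumulative, taking the difference between the two call sites attributes the counts to one execution of the function or clone.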

Linking external library

Because both the function cloning pass and the instrumentation pass introduce external calls, it is important to be able to link external libraries transparently, without modifying Makefiles. This functionality is now available, provided by Yuri Kashnikoff. Currently, it takes the environment variable ICI_LIBS as input, for example: ICI_LIBS="-Lpath/to/library -lselect".

TODOS

  • provide the ability to add function calls before or after specific instructions, with program variables as arguments.


Working with the Adapt plugin

Registering passes: in toplev.c, we register the cloning pass and the instrumentation pass using the ICI API:

  • register_pass (&pass_clone_functions.pass);
  • register_pass (&pass_instrument_functions.pass);

Adapt plugin: fine-grained optimization tuning, another GSOC project, by Yuanjie Huang from ICT, China.

The Adapt plugin provides support for recording and substituting GCC pass sequences, function-specific optimization tuning, function cloning, and instrumentation. It is controlled via the environment variable ICI_ADAPT_CONTROL: when it is set to 1, compilation information is recorded into XML files; when it is set to 2, the plugin reuses the information from the XML files and tunes the GCC compilation workflow via ICI.
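A plugin could dispatch on ICI_ADAPT_CONTROL roughly as follows. Only the values 1 (record) and 2 (reuse) come from the description above; the enum names are hypothetical, and treating an unset or unknown value as "off" is an assumption. The parsing is kept separate from getenv so it can be exercised without touching the environment.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for the plugin's behaviours.  */
typedef enum { ADAPT_OFF, ADAPT_RECORD, ADAPT_REUSE } adapt_mode;

/* Map the value of ICI_ADAPT_CONTROL to a mode; unset or unknown values
   are treated as "off" (an assumption).  */
static adapt_mode
parse_adapt_control (const char *value)
{
  if (value == NULL)
    return ADAPT_OFF;
  if (strcmp (value, "1") == 0)
    return ADAPT_RECORD;   /* record compilation information into XML files */
  if (strcmp (value, "2") == 0)
    return ADAPT_REUSE;    /* reuse the XML files and tune compilation via ICI */
  return ADAPT_OFF;
}

/* Read the mode from the environment of the compiler process.  */
static adapt_mode
get_adapt_mode (void)
{
  return parse_adapt_control (getenv ("ICI_ADAPT_CONTROL"));
}
```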

Script

ici-xml-util.py: this script runs after compilation information has been recorded and before it is reused. Its input is an INI-format file that describes how to perform function cloning and instrumentation. The script can also generate an external library template.

option -n, --noclone: turn off cloning (default: on)
option -i, --instrumentation: turn on instrumentation (default: off)
option -o, --optimization: turn on function-specific optimization (default: off)
option -t <filename>: generate an external library template
option --tflavor=FLAVOR: flavor of the template's selection function: r for random selection, b for round-robin

Note: Please refer to Scripts for GSOC for more information.

Work flow

Three-step compilation:

1: Create the current XML files describing the compilation flow and information:
$ ./ici-adapt-compile.sh 1
All XML files are put in the directory $ICI_ADAPT_XMLDIR.

2: Modify the XML files, based on an INI-format file, to turn on the function cloning pass and/or the instrumentation pass:
$ ./ici-xml-util.py [OPTION] a.ini $ICI_ADAPT_XMLDIR
The XML files in $ICI_ADAPT_XMLDIR are modified accordingly. (Compile the user-provided external library at this point.)

3: Compile the program, clone functions, and apply optimization flags to the clones:
$ ./ici-adapt-compile.sh 2

TODOS
