From cTuning.org
Current revision (17:58, 13 May 2010)
{{CMenu:CTools|ICI}}
'''This is an on-going project funded by Google Summer of Code'09:'''
* [http://socghop.appspot.com/org/home/google/gsoc2009/gcc GSOC'09 project description]
* [http://gcc-ici.svn.sourceforge.net/viewvc/gcc-ici/branches/ SVN repository]
* [http://groups.google.com/group/ctuning-discussions cTuning development mailing list]
----
It is known that static compilers often fail to deliver portable performance due to necessarily simplistic hardware models, fixed black-box optimization heuristics, the inability to tune applications at a fine-grain level, large optimization spaces, the highly dynamic behavior of systems, and the inability to adapt to varying program and system behavior at run time with low overhead. The possibility of architecture reconfiguration makes the system optimization problem even worse. Iterative feedback-directed compilation combined with statistical and machine learning techniques ([http://unidapt.org/index.php/Dissemination#FUR2004], [http://unidapt.org/index.php/Dissemination#ABCP2006]) is a popular approach to tackling such problems; however, it needs compiler support to access optimizations. Since 2005, the PathScale/Open64 ICI has enabled iterative feedback-directed compilation to select fine-grain transformations and their parameters, such as loop tiling, unrolling, fusion/fission, vectorization, array padding and prefetching, and has shown promising results ([http://unidapt.org/index.php/Dissemination#FCOP2005], [http://unidapt.org/index.php/Dissemination#LCWP2009]). However, the current GCC ICI implementation ([http://unidapt.org/index.php/Dissemination#FMTP2008]) at the moment only allows pass selection, reordering and parameter tuning at the function level.
We would like to enable and unify the selection and parameter tuning of internal compiler optimizations at a fine-grain level (loop or instruction) through ICI. We would like to provide information about all parameters each transformation needs, about optimization heuristic dependencies (e.g. the internal loop unrolling heuristic potentially depends on cache size), and about optimization dependencies (other optimizations or enabling passes that create the data structures needed for a given optimization).
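As a rough illustration of the kind of metadata described above, a per-transformation record could list its tunable parameters, the machine properties its internal heuristic reads, and the passes it depends on. The record type and all names below are hypothetical sketches, not part of the current ICI API:

```python
from dataclasses import dataclass, field

@dataclass
class TransformInfo:
    """Hypothetical metadata record for one fine-grain transformation."""
    name: str                  # transformation identifier, e.g. "loop-unroll"
    parameters: dict           # tunable parameters and their value ranges
    heuristic_deps: list = field(default_factory=list)  # machine properties the internal heuristic reads
    requires: list = field(default_factory=list)        # enabling passes that must run first

# Loop unrolling: its internal heuristic potentially depends on cache sizes,
# and it needs loop discovery and induction-variable analysis to run first.
unroll = TransformInfo(
    name="loop-unroll",
    parameters={"factor": range(1, 17)},
    heuristic_deps=["l1-cache-size", "l2-cache-size"],
    requires=["loop-init", "iv-analysis"],
)

print(unroll.name, unroll.requires)  # loop-unroll ['loop-init', 'iv-analysis']
```

An iterative search driver could then enumerate `parameters` to explore the optimization space, while `requires` tells it which passes must be scheduled before the transformation is legal to apply.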
We need to develop a representation scheme to describe a sequence of optimizations and their parameters applied to a program, and to match it with the program structure. We should probably use XML for that. The idea is to simplify the program structure description and provide an external optimization XML file for different architectures (in a similar way as was already done in the [http://fursin.net/wiki/index.php5?title=Research:Developments:FCO Framework for Continuous Optimizations (2003-2006)]). A tricky part is matching optimizations with a program as it evolves (something like a diff of the program structure against the optimization file) to avoid full re-optimization. On the other hand, we may need to fully re-optimize the program anyway even if it changed only slightly, since a small change can influence all other optimizations in other parts of the program.
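The diff-like matching step could be sketched as follows. The XML schema here is purely illustrative (no such cTuning format is fixed yet): each entry keys an optimization to a code section by function name, loop id and a checksum of that section's structure, so entries whose checksums still match can be reused while the rest are re-optimized.

```python
import xml.etree.ElementTree as ET

# Hypothetical external optimization file for one architecture.
OPT_FILE = """
<optimizations arch="x86_64">
  <opt function="matmul" loop="2" checksum="a41f" pass="loop-tiling" factor="64"/>
  <opt function="matmul" loop="3" checksum="9c02" pass="loop-unroll" factor="8"/>
</optimizations>
"""

# Current program structure as the compiler would extract it:
# loop 3 of matmul was edited, so its checksum no longer matches.
program = {("matmul", "2"): "a41f", ("matmul", "3"): "77d1"}

def match(opt_xml, structure):
    """Diff the optimization file against the program structure, splitting
    entries into reusable ones and stale ones that need re-optimization."""
    reuse, stale = [], []
    for opt in ET.fromstring(opt_xml).iter("opt"):
        key = (opt.get("function"), opt.get("loop"))
        bucket = reuse if structure.get(key) == opt.get("checksum") else stale
        bucket.append((key, opt.get("pass")))
    return reuse, stale

reuse, stale = match(OPT_FILE, program)
print(reuse)  # [(('matmul', '2'), 'loop-tiling')]
print(stale)  # [(('matmul', '3'), 'loop-unroll')]
```

This also makes the trade-off above concrete: a conservative driver would re-optimize only the `stale` entries, while a safe fallback re-optimizes everything whenever `stale` is non-empty.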
When it's done, we would like to combine it with the [[CTools:CCC|Continuous Collective Compilation Framework]] and apply [http://cTuning.org/project-milepost MILEPOST technology] to build machine learning models that substitute the internal optimization heuristics of GCC for each optimization and predict good optimizations automatically for a given architecture, based on the training data collected within the [[CDatabase|Collective Optimization Database]]. We can start with the new loop transformations available in GCC 4.4 through GRAPHITE and gradually add more optimizations.
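A minimal sketch of such a learned heuristic, with a 1-nearest-neighbour predictor standing in for the real model: static program features map to the best optimization decision seen on the most similar training program. The feature set and training data below are invented for illustration; a real model would be trained on runs collected in the Collective Optimization Database.

```python
def predict(features, training):
    """Return the decision of the closest training program (1-NN,
    squared Euclidean distance over the feature vector)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda ex: dist(ex[0], features))[1]

# (loop_count, avg_trip_count, memory_ops_ratio) -> best unroll factor found
training = [
    ((12, 4.0, 0.7), 2),
    ((3, 256.0, 0.2), 8),
    ((5, 64.0, 0.4), 4),
]

# A new program whose features resemble the second training example:
print(predict((4, 200.0, 0.25), training))  # 8
```

The point is only the shape of the interface: the compiler hands the model a feature vector instead of consulting its built-in heuristic, and the model returns the optimization decision.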
'''Who is interested:''' Grigori Fursin, Zbigniew Chamski, Chengyong Wu, Sebastian Pop
'''Who would like to help with implementation:''' Zbigniew Chamski, Sebastian Pop, Grigori Fursin, ...
