(31 intermediate revisions by the same user not shown)
Line 3: Line 3:
 
== Notes<br/> ==
 
== Notes<br/> ==
  
Some cost-aware experiments (execution time, size, energy, compilation time) performed by {{FGG}}. It supports our research on continuous performance tracking, code optimization and compiler benchmarking (regression detection).
+
''This computation species (kernel) is a threshold filter ({{CREF|45741e3fbcf4024b:1db78910464c9d05}}) - it is used in image processing and neuron activation functions (part of artificial neural networks).''
  
This computation species (kernel) is a threshold filter - it is used in image processing and neuron activation functions (part of artificial neural networks).
+
These cost-aware experiments (execution time, size, energy, compilation time) were performed by {{FGG}} using the [http://c-mind.org Collective Mind framework] and artifacts from the [http://c-mind.org/repo public repository] so that they can be reproduced. They support our collaborative research on continuous performance tracking, code optimization and compiler benchmarking (regression detection). If you find any mistakes or would like to extend this page, please help us!
 +
 
 +
cM repositories online:
 +
 
 +
*Part1: [https://github.com/gfursin/cm-experiments-201408xx-fgg-motivation-compiler-flag-tuning-mess-1 download repo from github] , [http://c-mind.org/repo/?cm_menu=browse&browse_module_uoa=ctuning.scenario.compiler.optimizations&browse_repo_uoa=cm-experiments-201408xx-fgg-motivation-compiler-flag-tuning-mess-1 browse repo at cMind] , [http://c-mind.org/repo/?cm_menu=scenarios&cm_submenu=ctuning_space_visualize&ct_module_uoa=ctuning.scenario.compiler.optimizations&ct_repo_uoa=cm-experiments-201408xx-fgg-motivation-compiler-flag-tuning-mess-1 view interactive graphs at cMind]
 +
*Part2: [https://github.com/gfursin/cm-experiments-201408xx-fgg-motivation-compiler-flag-tuning-mess-2 download repo from github] , [http://c-mind.org/repo/?cm_menu=browse&browse_module_uoa=ctuning.scenario.compiler.optimizations&browse_repo_uoa=cm-experiments-201408xx-fgg-motivation-compiler-flag-tuning-mess-2 browse repo at cMind] , [http://c-mind.org/repo/?cm_menu=scenarios&cm_submenu=ctuning_space_visualize&ct_module_uoa=ctuning.scenario.compiler.optimizations&ct_repo_uoa=cm-experiments-201408xx-fgg-motivation-compiler-flag-tuning-mess-2 view interactive graphs at cMind]
 +
*Part3: [https://github.com/gfursin/cm-experiments-201408xx-fgg-motivation-compiler-flag-tuning-mess-3 download repo from github] , [http://c-mind.org/repo/?cm_menu=browse&browse_module_uoa=ctuning.scenario.compiler.optimizations&browse_repo_uoa=cm-experiments-201408xx-fgg-motivation-compiler-flag-tuning-mess-3 browse repo at cMind] , [http://c-mind.org/repo/?cm_menu=scenarios&cm_submenu=ctuning_space_visualize&ct_module_uoa=ctuning.scenario.compiler.optimizations&ct_repo_uoa=cm-experiments-201408xx-fgg-motivation-compiler-flag-tuning-mess-3 view interactive graphs at cMind]
  
 
== Used artifacts<br/> ==
 
== Used artifacts<br/> ==
  
*Datasets:
+
=== Datasets ===
**D1 = grayscale image 1 ({{CREF|8a7141c59cd335f5:c8848a1b1fb1775e}}), size=1536x1536~2.4E6 (pixels or neurons)
+
 
**D2 = grayscale image 2 ({{CREF|8a7141c59cd335f5:0045c9b59e84318b}}), size=1536x1536~2.4E6 (pixels or neurons)
+
*D1 = grayscale image 1 ({{CREF|8a7141c59cd335f5:c8848a1b1fb1775e}}), size=1536x1536~2.4E6 (pixels or neurons)
 +
*D2 = grayscale image 2 ({{CREF|8a7141c59cd335f5:0045c9b59e84318b}}), size=1536x1536~2.4E6 (pixels or neurons)
 +
 
 +
=== Systems ===
 +
*S1 = Dell Laptop Latitude E6320, Processor=P1, Memory = 8Gb, Storage=256Gb (SSD), Max power consumption=52W, Cost (year of purchase 2011)~1200 euros ({{CREF|cb7e6b406491a11c:0d84339816de0271}})
 +
*S2 = Samsung Mobile Galaxy Duos GT-S6312, Processor=P2, Memory = 0.8Gb, Storage=4Gb, Battery=1300 mAh / 3.9V / up to 250 hours, Max power consumption~5W, Cost (year of purchase 2013)~200 euros ({{CREF|cb7e6b406491a11c:a9740acbe06bcd1e}})
 +
*S3 = Polaroid Tablet Executive 9" MID0927, Processor=P3, Memory=1Gb, Storage=16Gb, Battery=3500 mAh / 3.9V / up to 80 hours, Max power consumption~13W, Cost (year of purchase 2014)~100 euros ({{CREF|cb7e6b406491a11c:3419444faf22f3d0}})
 +
*S4 = Semiconductor neural network, PSpice simulation (year of development = 1993-1997)
 +
 
 +
=== Processors ===
 +
 
 +
*P1 = Intel Core i5-2540M, 2.60GHz, 2 cores ({{CREF|54cd38490124ef51:425ae4e3483c82e8}})
 +
*P2 = Qualcomm MSM7625A FFA, ARM Cortex A5, ARMv7, 1 GHz, 1 core ({{CREF|54cd38490124ef51:ae17889f40209ae7}})
 +
*P3 = Allwinner A20 (sun7i), Dual-Core ARM Cortex A7, ARMv7, 1.6GHz, Mali400 GPU, 2 core ({{CREF|54cd38490124ef51:fc020ce2e4d44f3d}})
 +
*P4 = NVidia Quadro NVS 135M, 16 cores, 400MHz, 10Watt, 210 Million transistors (TBD)
 +
 
 +
=== Processor mode ===
 +
*W1 = 32 bit
 +
*W2 = 64 bit
 +
 
 +
=== OSs ===
 +
*O1 = Windows 7 Pro SP1,&nbsp; cost~170 euros ({{CREF|c4d3ce728f46eea2:10c4f7484446b689}})
 +
*O2 = O1, MinGW32
 +
*O2 = OpenSuse 12.1, Kernel 3.1.10, cost=free ({{CREF|c4d3ce728f46eea2:29ce89f1a1446e89}})
 +
*O3 = Android 4.1.2, Kernel 3.4.0, cost=free ({{CREF|c4d3ce728f46eea2:e734c48d5a5824c1}})
 +
*O4 = Android 4.2.2, Kernel 3.3.0, cost=free ({{CREF|c4d3ce728f46eea2:d3e9b97f6994444b}})
  
*Systems:
+
=== Compilers ===
**S1 = Dell Laptop Latitude E6320, Processor=P1, Memory = 8Gb, Storage=256Gb (SSD), Max power consumption=52W, Cost (time of purchase)~1200 euros ({{CREF|cb7e6b406491a11c:0d84339816de0271}})
+
**S2 = Samsung Mobile Galaxy Duos GT-S6312, Processor=P2, Memory = 0.8Gb, Storage=4Gb, Battery=1300 mAh / 3.9V / up to 250 hours, Max power consumption~5W, Cost (time of purchase)~200 euros ({{CREF|cb7e6b406491a11c:a9740acbe06bcd1e}})
+
**S3 = Polaroid Tablet Executive 9", Processor=P3, Memory=1Gb, Storage=16Gb, Battery=3500 mAh / 3.9V / up to 80 hours, Max power consumption~13W, Cost (time of purchase)~100 euros ()
+
  
*Processors:
+
*X1 = '''GCC 4.1.1''', number of optimization flags available~60+130, release date=2006 ({{CREF|cff49b38f4c2395d:ac263305247d3953}})
**P1 = Intel Core i5-2540M, 2.60GHz, 2 cores ({{CREF|54cd38490124ef51:425ae4e3483c82e8}})
+
*X2 = '''GCC 4.4.1''', number of optimization flags available~100+170, release date=2009 ({{CREF|cff49b38f4c2395d:15a583ed9eb54b57}})
**P2 = Qualcomm MSM7625A FFA, ARM Cortex A5, ARMv7, 1 GHz, 1 core ({{CREF|54cd38490124ef51:ae17889f40209ae7}})
+
*X3 = '''GCC 4.4.4''', number of optimization flags available~100+170, release date=2010 ({{CREF|0247b19de472d7d0:03a91d01a54ef7f5}}, {{CREF|cff49b38f4c2395d:15a583ed9eb54b57}})
**P3 = Allwinner A20 (sun7i), Dual-Core ARM Cortex A7, ARMv7, 1.6GHz, Mali400 GPU, 2 core ()
+
*X4 = '''GCC 4.6.3''', number of optimization flags available~120+200 (including polyhedral and lto), release date=2012 ({{CREF|0247b19de472d7d0:fc7b8424bbecc4d1}}, {{CREF|cff49b38f4c2395d:2454492134ed4b73}})
**P4 = NVidia Quadro NVS 135M, 16 cores, 400MHz ()
+
*X5 = '''GCC 4.7.2''', number of optimization flags available~130+210, release date=2012 ({{CREF|0247b19de472d7d0:a1b38095ce254cd2}}, {{CREF|cff49b38f4c2395d:9c1310b41c9a2b38}})
*Processor mode:
+
*X6 = '''GCC 4.8.3''', number of optimization flags available~135+215, release date=2014 ({{CREF|cff49b38f4c2395d:3474da936450dd7a}})
**W1 = 32 bit
+
*X7 = '''GCC 4.9.1''', number of optimization flags available~140+220, release date=2014 ({{CREF|cff49b38f4c2395d:264156bb24190a99}})
**W2 = 64 bit
+
*X8 = '''LLVM 3.1''', number of optimization flags available=TBD, release date=2012 ({{CREF|0247b19de472d7d0:697ad401bfc43c5b}}, {{CREF|cff49b38f4c2395d:ad6b41ddae73bbd4}})
 +
*X9 = '''LLVM 3.4.2''', number of optimization flags available=TBD, release date=2014 ({{CREF|cff49b38f4c2395d:b1e488aa91274cb6}})
 +
*X10 = '''Open64 5.0''', number of optimization flags available=TBD, release date=2011 ({{CREF|0247b19de472d7d0:0599ca2a3b34a6b8}}, {{CREF|cff49b38f4c2395d:48d5baa4569f59a8}})
 +
*X11 = '''PathScale 2.3.1''', number of optimization flags available=TBD, release date=2006 ({{CREF|cff49b38f4c2395d:164c4aaff2b69279}})
 +
*X12 = '''NVidia CUDA Toolkit 5.0''', number of optimization flags available=TBD, release date=2012 ({{CREF|0247b19de472d7d0:89e947f8430eaa37}}, {{CREF|cff49b38f4c2395d:48d5baa4569f59a8}})
 +
*X13 = '''Intel Composer XE 2011''', number of optimization flags available=TBD, release date=2011, cost~800 euros ({{CREF|0247b19de472d7d0:e985f0596b1b1d9e}}, {{CREF|cff49b38f4c2395d:42eab7eefa890ddc}})
 +
*X14 = '''Microsoft Visual Studio 2013''', number of optimization flags available=TBD, release date=2013, cost = has free minimal version ({{CREF|cff49b38f4c2395d:5e35f4112bf996c5}})
  
*OSs:
+
=== Compiler optimization level ===
**O1 = Windows 7 Pro SP1,&nbsp; cost~170 euros ({{CREF|c4d3ce728f46eea2:10c4f7484446b689}})
+
*Y1 = Performance (usually -O3)
**O2 = O1, MinGW32
+
*Y2 = Size (usually -Os)
**O2 = OpenSuse 12.1, Kernel 3.1.10, cost=free ({{CREF|c4d3ce728f46eea2:29ce89f1a1446e89}})
+
*Y3 = -O3 -fmodulo-sched -funroll-all-loops
**O3 = Android 4.1.2, Kernel 3.4.0, cost=free ({{CREF|c4d3ce728f46eea2:e734c48d5a5824c1}})
+
*Y4 = -O3 -funroll-all-loops
**O4 = Android 4.2.2, Kernel 3.3.0, cost=free ()
+
*Y5 = -O3 -fprefetch-loop-arrays
 +
*Y6 = -O3 -fno-if-conversion
 +
*Y7 = Auto-tuning with more than 6 flags (-fif-conversion)
 +
*Y8 = Auto-tuning with more than 6 flags (-fno-if-conversion)
  
*Compilers:
+
=== Number of run-time code repetitions (for example, processing steps in neural networks) ===
**X1 = GCC 4.1.1
+
*R1 = 4000
**X2 = GCC 4.4.1
+
*R2 = 1000
**X3 = GCC 4.4.4
+
*R3 = 400
**X4 = GCC 4.6.3
+
**X5 = GCC 4.7.2
+
**X6 = GCC 4.8.3
+
**X7 = GCC 4.9.1
+
**X8 = LLVM 3.1
+
**X9 = LLVM 3.3.2
+
**X10 = Open64 5.0
+
**X11 = PathScale 2.3.1
+
**X12 = NVidia CUDA Toolkit 5.0
+
**X13 = Microsoft Visual Studio 2013, cost = has free minimal version
+
**X14 = Intel Composer XE 2011, cost =
+
  
*Compiler optimization level:
+
=== Total number of computations (processed neurons or pixels) ===
**O1 = Performance (usually -O3)
+
*T1 ~ 9.6E9
**O2 = Size (usually -Os)
+
*T2 ~ 2.4E9
**
+
*T3 ~ 1.0E9
  
*Number of run-time code repetitions (for example, processing steps in neural networks):
+
=== Costs ===
**R1 = 4000
+
*C1= Execution time
**R2 = 1000
+
*C2 = Energy
**R3 = 400
+
*C3 = Code size
 +
*C4 = Compilation time
 +
*C5 = System size
 +
*C6 = Hardware price
 +
*C7 = Software price
 +
*C8 = (Auto-)tuning price
 +
*C9 = Development time
 +
*C10 = Validation and testing time
  
*Total number of computations (processed neurons or pixels)
+
== Example of continuously evolving advice (combination of decision trees and models)<br/> ==
**T1 ~ 9.6E9
+
**T2 ~ 2.4E9
+
**T3 ~ 1.0E9
+
  
*Costs
+
def advice(i):
**C1= Execution time
+
:
**C2 = Energy
+
:if i['usage_scenario']['need_fastest_code']=='yes':
**C3 = Code size
+
::if i['cost']['can_afford_specialized_hardware']=='yes':
**C4 = Compilation time
+
:::if i['cost']['can_afford_specialized_hardware']=='yes':
**C5 = System size
+
::::i['advice']='S4'
**C6 = Hardware price
+
:::else:
**C7 = Software price
+
::::i['advice']='P4'
**C8 = (Auto-)tuning price
+
::else:
**C9 = Development time
+
:::if i['resources']['p1']['available']=='yes':
**C10 = Validation and testing time
+
::::if i['resources']['p1']['cores']=='1':
 +
:::::if i['cost']['can_afford_auto_tuning']=='yes':
 +
::::::if i['dataset']['feature']=='D2':
 +
:::::::i['advice']='Solution 11'
 +
::::::else:
 +
:::::::i['advice']='Solution 4'
 +
:::::else:
 +
::::::i['advice']='Solution 5'
 +
::::elif i['resources']['p1']['cores']=='2':
 +
:::::if i['cost']['can_afford_auto_tuning']=='yes':
 +
::::::i['advice']='Solution 7'
 +
:::::else:
 +
::::::i['advice']='Solution 6'
 +
:::else:
 +
::::i['advice']='Ask for more resources'
 +
:elif i['usage_scenario']['need_most_energy_efficient_code']=='yes':
 +
::if i['cost']['can_afford_specialized_hardware']=='yes':
 +
:::i['advice']='P4'
 +
::else:
 +
:::i['advice']='P2'
 +
:elif i['usage_scenario']['need_cheapest_hardware']=='yes':
 +
::i['advice']='P3'
 +
:elif i['usage_scenario']['need_smallest_code']=='yes':
 +
::i['advice']='Solution 9'
 +
:elif i['usage_scenario']['need_fastest_compilation_and_good_speed']=='yes':
 +
::i['advice']='X4'<br/>
 +
:
 +
:return i
  
 
== Notes ==
 
== Notes ==

Latest revision as of 13:23, 4 September 2014

Computational species "bw filter simplified less" (CID=45741e3fbcf4024b:1db78910464c9d05)

Notes

This computation species (kernel) is a threshold filter (CID=45741e3fbcf4024b:1db78910464c9d05) - it is used in image processing and neuron activation functions (part of artificial neural networks).

These cost-aware experiments (execution time, size, energy, compilation time) were performed by Grigori Fursin using the Collective Mind framework and artifacts from the public repository so that they can be reproduced. They support our collaborative research on continuous performance tracking, code optimization and compiler benchmarking (regression detection). If you find any mistakes or would like to extend this page, please help us!

cM repositories online:

Used artifacts

Datasets

  • D1 = grayscale image 1 (CID=8a7141c59cd335f5:c8848a1b1fb1775e), size=1536x1536~2.4E6 (pixels or neurons)
  • D2 = grayscale image 2 (CID=8a7141c59cd335f5:0045c9b59e84318b), size=1536x1536~2.4E6 (pixels or neurons)

Systems

  • S1 = Dell Laptop Latitude E6320, Processor=P1, Memory = 8Gb, Storage=256Gb (SSD), Max power consumption=52W, Cost (year of purchase 2011)~1200 euros (CID=cb7e6b406491a11c:0d84339816de0271)
  • S2 = Samsung Mobile Galaxy Duos GT-S6312, Processor=P2, Memory = 0.8Gb, Storage=4Gb, Battery=1300 mAh / 3.9V / up to 250 hours, Max power consumption~5W, Cost (year of purchase 2013)~200 euros (CID=cb7e6b406491a11c:a9740acbe06bcd1e)
  • S3 = Polaroid Tablet Executive 9" MID0927, Processor=P3, Memory=1Gb, Storage=16Gb, Battery=3500 mAh / 3.9V / up to 80 hours, Max power consumption~13W, Cost (year of purchase 2014)~100 euros (CID=cb7e6b406491a11c:3419444faf22f3d0)
  • S4 = Semiconductor neural network, PSpice simulation (year of development = 1993-1997)

Processors

  • P1 = Intel Core i5-2540M, 2.60GHz, 2 cores (CID=54cd38490124ef51:425ae4e3483c82e8)
  • P2 = Qualcomm MSM7625A FFA, ARM Cortex A5, ARMv7, 1 GHz, 1 core (CID=54cd38490124ef51:ae17889f40209ae7)
  • P3 = Allwinner A20 (sun7i), Dual-Core ARM Cortex A7, ARMv7, 1.6GHz, Mali400 GPU, 2 core (CID=54cd38490124ef51:fc020ce2e4d44f3d)
  • P4 = NVidia Quadro NVS 135M, 16 cores, 400MHz, 10Watt, 210 Million transistors (TBD)

Processor mode

  • W1 = 32 bit
  • W2 = 64 bit

OSs

  • O1 = Windows 7 Pro SP1, cost~170 euros (CID=c4d3ce728f46eea2:10c4f7484446b689)
  • O2 = O1, MinGW32
  • O2 = OpenSuse 12.1, Kernel 3.1.10, cost=free (CID=c4d3ce728f46eea2:29ce89f1a1446e89)
  • O3 = Android 4.1.2, Kernel 3.4.0, cost=free (CID=c4d3ce728f46eea2:e734c48d5a5824c1)
  • O4 = Android 4.2.2, Kernel 3.3.0, cost=free (CID=c4d3ce728f46eea2:d3e9b97f6994444b)

Compilers

  • X1 = GCC 4.1.1, number of optimization flags available~60+130, release date=2006 (CID=cff49b38f4c2395d:ac263305247d3953)
  • X2 = GCC 4.4.1, number of optimization flags available~100+170, release date=2009 (CID=cff49b38f4c2395d:15a583ed9eb54b57)
  • X3 = GCC 4.4.4, number of optimization flags available~100+170, release date=2010 (CID=0247b19de472d7d0:03a91d01a54ef7f5, CID=cff49b38f4c2395d:15a583ed9eb54b57)
  • X4 = GCC 4.6.3, number of optimization flags available~120+200 (including polyhedral and lto), release date=2012 (CID=0247b19de472d7d0:fc7b8424bbecc4d1, CID=cff49b38f4c2395d:2454492134ed4b73)
  • X5 = GCC 4.7.2, number of optimization flags available~130+210, release date=2012 (CID=0247b19de472d7d0:a1b38095ce254cd2, CID=cff49b38f4c2395d:9c1310b41c9a2b38)
  • X6 = GCC 4.8.3, number of optimization flags available~135+215, release date=2014 (CID=cff49b38f4c2395d:3474da936450dd7a)
  • X7 = GCC 4.9.1, number of optimization flags available~140+220, release date=2014 (CID=cff49b38f4c2395d:264156bb24190a99)
  • X8 = LLVM 3.1, number of optimization flags available=TBD, release date=2012 (CID=0247b19de472d7d0:697ad401bfc43c5b, CID=cff49b38f4c2395d:ad6b41ddae73bbd4)
  • X9 = LLVM 3.4.2, number of optimization flags available=TBD, release date=2014 (CID=cff49b38f4c2395d:b1e488aa91274cb6)
  • X10 = Open64 5.0, number of optimization flags available=TBD, release date=2011 (CID=0247b19de472d7d0:0599ca2a3b34a6b8, CID=cff49b38f4c2395d:48d5baa4569f59a8)
  • X11 = PathScale 2.3.1, number of optimization flags available=TBD, release date=2006 (CID=cff49b38f4c2395d:164c4aaff2b69279)
  • X12 = NVidia CUDA Toolkit 5.0, number of optimization flags available=TBD, release date=2012 (CID=0247b19de472d7d0:89e947f8430eaa37, CID=cff49b38f4c2395d:48d5baa4569f59a8)
  • X13 = Intel Composer XE 2011, number of optimization flags available=TBD, release date=2011, cost~800 euros (CID=0247b19de472d7d0:e985f0596b1b1d9e, CID=cff49b38f4c2395d:42eab7eefa890ddc)
  • X14 = Microsoft Visual Studio 2013, number of optimization flags available=TBD, release date=2013, cost = has free minimal version (CID=cff49b38f4c2395d:5e35f4112bf996c5)

Compiler optimization level

  • Y1 = Performance (usually -O3)
  • Y2 = Size (usually -Os)
  • Y3 = -O3 -fmodulo-sched -funroll-all-loops
  • Y4 = -O3 -funroll-all-loops
  • Y5 = -O3 -fprefetch-loop-arrays
  • Y6 = -O3 -fno-if-conversion
  • Y7 = Auto-tuning with more than 6 flags (-fif-conversion)
  • Y8 = Auto-tuning with more than 6 flags (-fno-if-conversion)
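
The fixed-flag levels above could be turned into compiler invocations roughly as follows. This is only a sketch: the OPT_LEVELS table, the compile_command helper and the file names kernel.c / kernel.o are hypothetical and not part of the cM repositories. Y1 and Y2 use the "usual" -O3 and -Os, and Y7/Y8 are left out because their flag sets come from auto-tuning.

```python
# Hypothetical mapping from the Y labels to GCC flag lists (assumption, not cM code).
# Y7 and Y8 are omitted: their flags are produced by auto-tuning, not fixed here.
OPT_LEVELS = {
    "Y1": ["-O3"],
    "Y2": ["-Os"],
    "Y3": ["-O3", "-fmodulo-sched", "-funroll-all-loops"],
    "Y4": ["-O3", "-funroll-all-loops"],
    "Y5": ["-O3", "-fprefetch-loop-arrays"],
    "Y6": ["-O3", "-fno-if-conversion"],
}


def compile_command(level, source="kernel.c", output="kernel.o", compiler="gcc"):
    """Return the argument list that compiles `source` at optimization level `level`."""
    return [compiler, *OPT_LEVELS[level], "-c", source, "-o", output]


if __name__ == "__main__":
    # Print one command line per level; nothing is executed.
    for level in sorted(OPT_LEVELS):
        print(level, " ".join(compile_command(level)))
```

The argument list can be handed to subprocess.run unchanged if the compiler is installed.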

Number of run-time code repetitions (for example, processing steps in neural networks)

  • R1 = 4000
  • R2 = 1000
  • R3 = 400

Total number of computations (processed neurons or pixels)

  • T1 ~ 9.6E9
  • T2 ~ 2.4E9
  • T3 ~ 1.0E9
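
The totals above appear to be the dataset size multiplied by the number of repetitions (an assumption based on the arithmetic: 2.4E6 x 4000 = 9.6E9). A small sketch, using the exact 1536x1536 size listed for D1 and D2, shows how the rounded figures relate to the exact products; the pairing of T1..T3 with R1..R3 is assumed from their order.

```python
# Exact number of pixels (or neurons) in D1 and D2: 1536 x 1536.
PIXELS = 1536 * 1536

# Run-time repetitions R1..R3 as listed on this page.
REPETITIONS = {"R1": 4000, "R2": 1000, "R3": 400}


def total_computations(repetitions, size=PIXELS):
    """Total processed pixels/neurons = dataset size x number of repetitions."""
    return size * repetitions


totals = {name: total_computations(r) for name, r in REPETITIONS.items()}

if __name__ == "__main__":
    for name, value in totals.items():
        print(f"{name}: {value:.2e}")
```

The exact products are slightly below T1 and T2 because those use the rounded size 2.4E6.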

Costs

  • C1 = Execution time
  • C2 = Energy
  • C3 = Code size
  • C4 = Compilation time
  • C5 = System size
  • C6 = Hardware price
  • C7 = Software price
  • C8 = (Auto-)tuning price
  • C9 = Development time
  • C10 = Validation and testing time

Example of continuously evolving advice (combination of decision trees and models)

def advice(i):
    # i is a nested dictionary describing the usage scenario, costs,
    # available resources and dataset; the advice is stored in i['advice'].
    if i['usage_scenario']['need_fastest_code']=='yes':
        if i['cost']['can_afford_specialized_hardware']=='yes':
            # Note: this inner check repeats the outer condition,
            # so the 'P4' branch below is never reached.
            if i['cost']['can_afford_specialized_hardware']=='yes':
                i['advice']='S4'
            else:
                i['advice']='P4'
        else:
            if i['resources']['p1']['available']=='yes':
                if i['resources']['p1']['cores']=='1':
                    if i['cost']['can_afford_auto_tuning']=='yes':
                        if i['dataset']['feature']=='D2':
                            i['advice']='Solution 11'
                        else:
                            i['advice']='Solution 4'
                    else:
                        i['advice']='Solution 5'
                elif i['resources']['p1']['cores']=='2':
                    if i['cost']['can_afford_auto_tuning']=='yes':
                        i['advice']='Solution 7'
                    else:
                        i['advice']='Solution 6'
            else:
                i['advice']='Ask for more resources'
    elif i['usage_scenario']['need_most_energy_efficient_code']=='yes':
        if i['cost']['can_afford_specialized_hardware']=='yes':
            i['advice']='P4'
        else:
            i['advice']='P2'
    elif i['usage_scenario']['need_cheapest_hardware']=='yes':
        i['advice']='P3'
    elif i['usage_scenario']['need_smallest_code']=='yes':
        i['advice']='Solution 9'
    elif i['usage_scenario']['need_fastest_compilation_and_good_speed']=='yes':
        i['advice']='X4'

    return i
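
The advice function reads several nested keys, so a query has to provide all of them. The helper below is a sketch of one way to build such a query; make_query and its parameters are hypothetical names introduced for illustration, and only the dictionary keys and the 'yes'/'no' string values are taken from the code above.

```python
# Usage-scenario keys read by advice(), in the order they are tested.
SCENARIOS = (
    "need_fastest_code",
    "need_most_energy_efficient_code",
    "need_cheapest_hardware",
    "need_smallest_code",
    "need_fastest_compilation_and_good_speed",
)


def yes_no(flag):
    """advice() compares against the strings 'yes' and 'no', not booleans."""
    return "yes" if flag else "no"


def make_query(scenario, specialized_hardware=False, auto_tuning=False,
               p1_cores=None, dataset="D1"):
    """Build the nested dictionary that advice() expects (hypothetical helper).

    p1_cores=None means processor P1 is not available.
    """
    if scenario not in SCENARIOS:
        raise ValueError("unknown usage scenario: %s" % scenario)
    return {
        "usage_scenario": {key: yes_no(key == scenario) for key in SCENARIOS},
        "cost": {
            "can_afford_specialized_hardware": yes_no(specialized_hardware),
            "can_afford_auto_tuning": yes_no(auto_tuning),
        },
        "resources": {
            "p1": {
                "available": yes_no(p1_cores is not None),
                # The number of cores is stored as a string, as advice() expects.
                "cores": str(p1_cores or 0),
            }
        },
        "dataset": {"feature": dataset},
    }


query = make_query("need_fastest_code", auto_tuning=True, p1_cores=1, dataset="D2")
# result = advice(query)   # advice() is the function listed above
```

With this query, advice() would follow the single-core, auto-tuning, D2 branch and set i['advice'] to 'Solution 11'.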

Notes

Energy: 1Wh = 3600 joules

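Battery energy (S2): Wh = mAh * V / 1000 = 1300 * 3.9 / 1000 ~ 5Wh, which matches the ~5W maximum power consumption listed for S2 if the battery is drained in about one hour.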
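
The conversion can be checked with a few lines of Python. This is a sketch: the function names are hypothetical, and the S2 and S3 battery figures are taken from the Systems list.

```python
JOULES_PER_WH = 3600  # 1 Wh = 3600 joules


def battery_energy_wh(capacity_mah, voltage_v):
    """Battery energy in watt-hours from capacity (mAh) and nominal voltage (V)."""
    return capacity_mah * voltage_v / 1000


def wh_to_joules(energy_wh):
    """Convert watt-hours to joules."""
    return energy_wh * JOULES_PER_WH


# Battery figures listed for S2 and S3.
s2_wh = battery_energy_wh(1300, 3.9)  # about 5 Wh
s3_wh = battery_energy_wh(3500, 3.9)  # about 13-14 Wh

if __name__ == "__main__":
    for name, wh in (("S2", s2_wh), ("S3", s3_wh)):
        print(f"{name}: {wh:.2f} Wh = {wh_to_joules(wh):.0f} J")
```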


(C) 2011-2014 cTuning foundation