Latest revision as of 15:10, 24 March 2014

Publication from ADAPT'14 workshop.

Notes on reproducibility:

Roofline-aware DVFS for GPUs
=============

Date: 16-Oct-2013

Author: Cedric Nugteren (http://www.cedricnugteren.nl)

Description: This repository is an online appendix to the
scientific article "Roofline-aware DVFS for GPUs".

Benchmarks
=============

Three types of CUDA benchmarks are tested:
*   Benchmarks from PolyBench/GPU
*   Benchmarks from Parboil (requires Parboil datasets
   to be installed in ~/software/parboil-2.5/datasets/)
*   Two artificial micro-benchmarks
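
Since the Parboil benchmarks depend on the datasets being present at the path given above, a quick check before starting a simulation run can save time. The following is a minimal sketch; the function name `parboil_datasets_present` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Dataset location as stated in the benchmark list above.
PARBOIL_DATASETS = Path("~/software/parboil-2.5/datasets/")


def parboil_datasets_present(path=PARBOIL_DATASETS):
    """Return True if the dataset directory exists (after expanding '~')."""
    return path.expanduser().is_dir()


if __name__ == "__main__":
    if parboil_datasets_present():
        print("Parboil datasets found.")
    else:
        print("Parboil datasets missing: expected", PARBOIL_DATASETS.expanduser())
```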

Experimental setup
=============

GPGPU-Sim version 3.2.1 + GPUWattch

(commit 72aaaf6b11b38121d946469f26d85315ff794f29)

Configuration for GPGPU-Sim
-------------

*   Clock frequencies:

       -gpgpu_clock_domains XXX:YYY:XXX:ZZZ

   XXX is the halved core frequency (600-500-400-300).
   YYY is the full core frequency (1200-1000-800-600).
   ZZZ is the memory frequency (900-750-600-450).

*   DRAM latencies:

       -dram_latency XXX

   XXX is the DRAM latency in core clock cycles, reduced
   when scaling down the core frequency to keep the latency
   (in seconds) constant (100-83-76-50).
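
The two GPGPU-Sim options above can be derived from the full core frequency and the memory frequency. The sketch below is illustrative only: the function names are hypothetical, and pairing every core setting with every memory setting is an assumption, since this README does not state which combinations were simulated. The latency is scaled proportionally from 100 cycles at the highest core setting.

```python
from itertools import product

CORE_FULL = (1200, 1000, 800, 600)  # YYY values listed above
MEMORY = (900, 750, 600, 450)       # ZZZ values listed above
BASE_CORE = 1200                    # highest core setting
BASE_DRAM_LATENCY = 100             # latency in cycles at the highest core setting


def clock_domains(core_full, memory):
    """Build the clock option; XXX is half of the full core frequency."""
    half = core_full // 2
    return f"-gpgpu_clock_domains {half}:{core_full}:{half}:{memory}"


def dram_latency(core_full):
    """Scale the cycle count with the core clock so the latency in seconds is constant."""
    cycles = round(BASE_DRAM_LATENCY * core_full / BASE_CORE)
    return f"-dram_latency {cycles}"


def all_options():
    """Assumption: every core setting combined with every memory setting."""
    return [
        f"{clock_domains(core, mem)} {dram_latency(core)}"
        for core, mem in product(CORE_FULL, MEMORY)
    ]


if __name__ == "__main__":
    for line in all_options():
        print(line)
```

Proportional scaling yields 67 cycles at the 800 setting, whereas the list above gives 76; the files in the configurations folder remain the reference for the values that were actually simulated.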

Configuration for GPUWattch
-------------

*   Memory configuration:

       <param name="mc_clock" value="XXX"/>
       <param name="peak_transfer_rate" value="YYY"/>

   XXX is the doubled memory clock or the halved effective
   clock (1800-1500-1200-900). YYY is the bandwidth per
   memory controller (28800-24000-19200-14400).

*   Clock frequencies:

       <param name="target_core_clockrate" value="XXX"/>
       <param name="clockrate" value="XXX"/>
       <param name="NOC_A" value="XXX" />

   XXX is either the halved or the full core clock frequency,
   depending on the parameter. These parameters appear in
   various places in the configuration settings.

*   Memory power parameters:

       <param name="MEM_RD" value="XXX" />
       <param name="MEM_WR" value="YYY" />
       <param name="MEM_PRE" value="ZZZ" />

   XXX, YYY, and ZZZ are scaled with the core clock rate
   to obtain correct memory power characteristics. The need
   for this manual scaling has been acknowledged to be a bug
   in the simulator and will be repaired in the next version.
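
The GPUWattch memory configuration values follow directly from the memory frequency. In the sketch below, the factor of 16 between `peak_transfer_rate` and `mc_clock` is inferred from the numbers listed above (28800 / 1800 = 16, and likewise for the other three settings). It is not taken from the GPUWattch documentation, and the function name is hypothetical. The memory power parameters are left out because their baseline values are not given in this README.

```python
MEMORY = (900, 750, 600, 450)    # memory frequencies used for GPGPU-Sim
TRANSFER_RATE_PER_MC_CLOCK = 16  # inferred from the listed values (assumption)


def memory_params(memory):
    """Return the two GPUWattch <param> lines for one memory frequency."""
    mc_clock = 2 * memory  # the doubled memory clock
    peak_transfer_rate = TRANSFER_RATE_PER_MC_CLOCK * mc_clock
    return [
        f'<param name="mc_clock" value="{mc_clock}"/>',
        f'<param name="peak_transfer_rate" value="{peak_transfer_rate}"/>',
    ]


if __name__ == "__main__":
    for memory in MEMORY:
        print("\n".join(memory_params(memory)))
```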

Contents of the repository
=============

*   *benchmark_code*

   Folder containing CUDA source code made suitable for
   the GPGPU-Sim simulator.

*   *configurations*

   All the GPGPU-Sim and GPUWattch configuration files.

*   *results*

   Folder containing the graphs as they appear in the
   article plus more detailed graphs. It also contains
   a processed database extracted from simulation data.

*   *simulation_data*

   The raw simulation output from GPGPU-Sim and GPUWattch.

*   *process.r*

   An R-script to process the raw simulation data and
   output a database in CSV format (in results folder).

*   *graph.r*

   An R-script to generate plots based on the database
   generated by the process.r script.

*   *README*

   This file.

###################################################
