Publication from ADAPT'14 workshop.
Notes on reproducibility:
Roofline-aware DVFS for GPUs
=============
Date: 16-Oct-2013
Author: Cedric Nugteren (http://www.cedricnugteren.nl)
Description: This repository is an online appendix to the
scientific article "Roofline-aware DVFS for GPUs"
Benchmarks
=============
Three types of CUDA benchmarks are tested:
* Benchmarks from PolyBench/GPU
* Benchmarks from Parboil (requires Parboil datasets
to be installed in ~/software/parboil-2.5/datasets/)
* Two artificial micro-benchmarks
Experimental setup
=============
GPGPU-Sim version 3.2.1 + GPUWattch
(commit 72aaaf6b11b38121d946469f26d85315ff794f29)
Configuration for GPGPU-Sim
-------------
* Clock frequencies:
-gpgpu_clock_domains XXX:YYY:XXX:ZZZ
XXX is the halved core frequency (600-500-400-300).
YYY is the full core frequency (1200-1000-800-600).
ZZZ is the memory frequency (900-750-600-450).
* DRAM latencies:
-dram_latency XXX
XXX is the DRAM latency in core clock cycles, reduced
when scaling down the core frequency to keep the latency
(in seconds) constant (100-83-76-50).
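As a rough illustration of how the values above fit into the two GPGPU-Sim options, the following Python sketch builds the option strings for a chosen core and memory frequency. The helper names (`clock_domains_option`, `dram_latency_option`) and the lookup table are hypothetical and not part of the repository. The sketch assumes that the DRAM latencies pair up with the full core frequencies by their position in the lists, and it copies the latency values verbatim rather than recomputing them.

```python
# Hypothetical helpers, not part of the repository: they only restate the
# values listed above as GPGPU-Sim option strings.

# Full core frequency (MHz) -> DRAM latency (core clock cycles).
# Assumption: the two lists above pair up by position. The latencies are
# copied verbatim, not recomputed.
DRAM_LATENCY_BY_CORE_MHZ = {1200: 100, 1000: 83, 800: 76, 600: 50}

# Memory frequencies (MHz) listed above.
MEMORY_MHZ = (900, 750, 600, 450)


def clock_domains_option(core_mhz, mem_mhz):
    """Build the XXX:YYY:XXX:ZZZ string, where XXX is half the core clock."""
    if core_mhz not in DRAM_LATENCY_BY_CORE_MHZ:
        raise ValueError(f"unlisted core frequency: {core_mhz}")
    if mem_mhz not in MEMORY_MHZ:
        raise ValueError(f"unlisted memory frequency: {mem_mhz}")
    half = core_mhz // 2
    return f"-gpgpu_clock_domains {half}:{core_mhz}:{half}:{mem_mhz}"


def dram_latency_option(core_mhz):
    """Look up the DRAM latency that belongs to a core frequency."""
    return f"-dram_latency {DRAM_LATENCY_BY_CORE_MHZ[core_mhz]}"


if __name__ == "__main__":
    print(clock_domains_option(1200, 900))
    print(dram_latency_option(1200))
```

The core and memory frequencies are kept as separate arguments because the text above does not say which memory frequency goes with which core frequency.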
Configuration for GPUWattch
-------------
* Memory configuration:
<param name="mc_clock" value="XXX"/>
<param name="peak_transfer_rate" value="YYY"/>
XXX is the doubled memory clock or the halved effective
clock (1800-1500-1200-900). YYY is the bandwidth per
memory controller (28800-24000-19200-14400).
* Clock frequencies:
<param name="target_core_clockrate" value="XXX"/>
<param name="clockrate" value="XXX"/>
<param name="NOC_A" value="XXX" />
XXX is either the halved or the full core clock frequency,
depending on the place in the configuration settings.
* Memory power parameters:
<param name="MEM_RD" value="XXX" />
<param name="MEM_WR" value="YYY" />
<param name="MEM_PRE" value="ZZZ" />
XXX, YYY, and ZZZ are scaled with the core clock rate
to obtain correct memory power characteristics. This
has been acknowledged to be a bug in the simulator and
will be repaired in the next version.
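For the memory configuration item above, the listed numbers are consistent with a simple relation: `mc_clock` is twice the memory frequency used for GPGPU-Sim, and `peak_transfer_rate` is 16 times `mc_clock`. The Python sketch below renders the two `<param>` lines from a memory frequency under that reading. The function name `memory_params` is hypothetical, and the factor 16 is inferred from the listed values (28800 / 1800), not taken from the GPUWattch documentation.

```python
# Hypothetical helper, not part of the repository or of GPUWattch.

# Assumption: inferred from the listed values (28800 / 1800 = 16); the
# README itself does not state this factor.
TRANSFER_RATE_PER_MC_CLOCK = 16


def memory_params(mem_mhz):
    """Return the two GPUWattch <param> lines for one memory frequency."""
    mc_clock = 2 * mem_mhz  # "the doubled memory clock"
    peak_transfer_rate = TRANSFER_RATE_PER_MC_CLOCK * mc_clock
    return [
        f'<param name="mc_clock" value="{mc_clock}"/>',
        f'<param name="peak_transfer_rate" value="{peak_transfer_rate}"/>',
    ]


if __name__ == "__main__":
    for mem_mhz in (900, 750, 600, 450):
        print("\n".join(memory_params(mem_mhz)))
```

The memory power parameters (MEM_RD, MEM_WR, MEM_PRE) are left out on purpose, since their baseline values are not given here.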
Contents of the repository
=============
* *benchmark_code*
Folder containing CUDA source code made suitable for
the GPGPU-Sim simulator.
* *configurations*
All the GPGPU-Sim and GPUWattch configuration files.
* *results*
Folder containing the graphs as they appear in the
article plus more detailed graphs. It also contains
a processed database extracted from simulation data.
* *simulation_data*
The raw simulation output from GPGPU-Sim and GPUWattch.
* *process.r*
An R-script to process the raw simulation data and
output a database in CSV format (in results folder).
* *graph.r*
An R-script to generate plots based on the database
generated by the process.r script.
* *README*
This file.
###################################################