Publication from ADAPT'14 workshop.
 
Notes on reproducibility:

Roofline-aware DVFS for GPUs
=============

Date: 16-Oct-2013

Author: Cedric Nugteren (http://www.cedricnugteren.nl)

Description: This repository is an online appendix to the
scientific article "Roofline-aware DVFS for GPUs".


Benchmarks
=============

Three types of CUDA benchmarks are tested:
*    Benchmarks from PolyBench/GPU
*    Benchmarks from Parboil (requires Parboil datasets
    to be installed in ~/software/parboil-2.5/datasets/)
*    Two artificial micro-benchmarks


Experimental setup
=============

GPGPU-Sim version 3.2.1 + GPUWattch

(commit 72aaaf6b11b38121d946469f26d85315ff794f29)

Configuration for GPGPU-Sim
-------------

*    Clock frequencies:

        -gpgpu_clock_domains XXX:YYY:XXX:ZZZ

    XXX is the halved core frequency (600-500-400-300).
    YYY is the full core frequency (1200-1000-800-600).
    ZZZ is the memory frequency (900-750-600-450).

*    DRAM latencies:

        -dram_latency XXX

    XXX is the DRAM latency in core clock cycles, reduced
    when scaling the core frequency to keep the latency
    (in seconds) constant (100-83-76-50).
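
The two GPGPU-Sim settings above can be composed mechanically for each
operating point. The following is a minimal sketch, not part of the
repository: the function name `gpgpu_sim_flags` is hypothetical, and it
assumes that the listed DRAM latencies pair with the listed full core
frequencies in the order given.

```python
# Frequencies in MHz and latencies in core clock cycles, copied from the
# lists above. Pairing latency with core frequency by position is an
# assumption of this sketch.
MEM_MHZ = (900, 750, 600, 450)
DRAM_LATENCY_BY_CORE_MHZ = {1200: 100, 1000: 83, 800: 76, 600: 50}


def gpgpu_sim_flags(core_mhz, mem_mhz):
    """Return the two GPGPU-Sim options for one core/memory operating point."""
    if core_mhz not in DRAM_LATENCY_BY_CORE_MHZ:
        raise ValueError(f"unlisted core frequency: {core_mhz}")
    if mem_mhz not in MEM_MHZ:
        raise ValueError(f"unlisted memory frequency: {mem_mhz}")
    half_core = core_mhz // 2  # XXX: the halved core frequency
    return [
        # XXX:YYY:XXX:ZZZ as described above
        f"-gpgpu_clock_domains {half_core}:{core_mhz}:{half_core}:{mem_mhz}",
        f"-dram_latency {DRAM_LATENCY_BY_CORE_MHZ[core_mhz]}",
    ]


if __name__ == "__main__":
    for flag in gpgpu_sim_flags(1200, 900):
        print(flag)
```

The sketch takes the core and memory frequencies as separate arguments,
so any listed core frequency can be combined with any listed memory
frequency; only the DRAM latency depends on the core frequency.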

Configuration for GPUWattch
-------------

*    Memory configuration:

        <param name="mc_clock" value="XXX"/>
        <param name="peak_transfer_rate" value="YYY"/>

    XXX is the doubled memory clock or the halved effective
    clock (1800-1500-1200-900). YYY is the bandwidth per
    memory controller (28800-24000-19200-14400).

*    Clock frequencies:

        <param name="target_core_clockrate" value="XXX"/>
        <param name="clockrate" value="XXX"/>
        <param name="NOC_A" value="XXX" />

    XXX is either the halved or full core clock frequency
    in various places in the configuration settings.

*    Memory power parameters:

        <param name="MEM_RD" value="XXX" />
        <param name="MEM_WR" value="YYY" />
        <param name="MEM_PRE" value="ZZZ" />

    XXX, YYY, and ZZZ are scaled with the core clock rate
    to obtain correct memory power characteristics. The need
    for this scaling has been acknowledged as a bug in the
    simulator and will be repaired in the next version.
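
The GPUWattch memory configuration values listed above follow directly
from the memory frequency. The following is a minimal sketch, not part
of the repository: the function name `gpuwattch_memory_params` is
hypothetical, and the factor of 16 between `peak_transfer_rate` and
`mc_clock` is only the ratio observed in the listed values
(28800/1800, 24000/1500, and so on), not a documented formula.

```python
# Ratio between the listed peak_transfer_rate and mc_clock values.
# Observed from the numbers above; an assumption, not a GPUWattch rule.
TRANSFER_RATE_PER_MC_CLOCK = 16


def gpuwattch_memory_params(mem_mhz):
    """Return the two memory <param> lines for one memory frequency (MHz)."""
    mc_clock = 2 * mem_mhz  # XXX: the doubled memory clock
    peak_transfer_rate = TRANSFER_RATE_PER_MC_CLOCK * mc_clock  # YYY
    return [
        f'<param name="mc_clock" value="{mc_clock}"/>',
        f'<param name="peak_transfer_rate" value="{peak_transfer_rate}"/>',
    ]


if __name__ == "__main__":
    for mem_mhz in (900, 750, 600, 450):
        print("\n".join(gpuwattch_memory_params(mem_mhz)))
```

The memory power parameters (MEM_RD, MEM_WR, MEM_PRE) are left out of
the sketch because their base values are not given here.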


Contents of the repository
=============

*    *benchmark_code*

    Folder containing CUDA source code made suitable for
    the GPGPU-Sim simulator.

*    *configurations*

    All the GPGPU-Sim and GPUWattch configuration files.

*    *results*

    Folder containing the graphs as they appear in the
    article plus more detailed graphs. It also contains
    a processed database extracted from simulation data.

*    *simulation_data*

    The raw simulation output from GPGPU-Sim and GPUWattch.

*    *process.r*

    An R-script to process the raw simulation data and
    output a database in CSV format (in results folder).

*    *graph.r*

    An R-script to generate plots based on the database
    generated by the process.r script.

*    *README*

    This file.

###################################################

