| Index | Performance Counter | Description |
| 1 | I$-hits (IC_H) | Counts the number of instruction cache hits, counted per warp instruction |
| 2 | I$-misses (IC_M) | Counts the number of instruction cache misses, counted per warp instruction |
| 3 | D$-read hits (DC_RH) | Counts the number of data cache read hits, counted per memory access |
| 4 | D$-read misses (DC_RM) | Counts the number of data cache read misses, counted per memory access |
| 5 | D$-write hits (DC_WH) | Counts the number of data cache write hits, counted per memory access |
| 6 | D$-write misses (DC_WM) | Counts the number of data cache write misses, counted per memory access |
| 7 | T$-hits (TC_H) | Counts the number of texture cache hits, counted per memory access |
| 8 | T$-misses (TC_M) | Counts the number of texture cache misses, counted per memory access |
| 9 | C$-hits (CC_H) | Counts the number of constant cache hits, counted per memory access |
| 10 | C$-misses (CC_M) | Counts the number of constant cache misses, counted per memory access |
| 11 | Shared Memory Accesses (SHRD_ACC) | Counts the number of shared memory accesses, counted per memory access |
| 12 | Register File Reads (REG_R) | Counts the number of register file reads in all instructions, counted per thread |
| 13 | Register File Writes (REG_W) | Counts the number of register file writes in all instructions, counted per thread |
| 14 | Non Register File Operands (NON_REG_OPs) | Counts the number of non-register-file operands (e.g., immediates), counted per thread |
| 15 | SFU accesses (SFU_ACC) | Counts all instructions that exercise the SFU pipeline (this also includes multiplication/division), counted per thread |
| 16 | SP accesses (SP_ACC) | Counts all instructions that exercise the SP pipeline with integer operands, counted per thread |
| 17 | FPU accesses (FPU_ACC) | Counts all instructions that exercise the SFU pipeline with floating-point operands, counted per thread |
| 18 | Total number of instructions (TOT_INST) | Counts all decoded instructions, counted per warp |
| 19 | W/O operand instructions (FP_INT) | Counts all instructions without operands (e.g., call/return) |
| 20 | DRAM reads (DRAM_RD) | Counts the DRAM read accesses, counted per memory access |
| 21 | DRAM writes (DRAM_WR) | Counts the DRAM write accesses, counted per memory access |
| 22 | DRAM precharge (DRAM_PRE) | Counts the DRAM precharge accesses, counted per memory access |
| 23 | L2$-read hits (L2_RH) | Counts the number of L2 data cache read hits, counted per memory access |
| 24 | L2$-read misses (L2_RM) | Counts the number of L2 data cache read misses, counted per memory access |
| 25 | L2$-write hits (L2_WH) | Counts the number of L2 data cache write hits, counted per memory access |
| 26 | L2$-write misses (L2_WM) | Counts the number of L2 data cache write misses, counted per memory access |
| 27 | Pipeline Duty Cycle (PIPE) | Ratio of the number of committed instructions to the maximum (peak) number of committed instructions, counted per thread |
| 28 | Interconnect flit SIMT-to-Mem (NOC_A) | Counts the number of flits traveling from the SIMT cluster to the memory partition |
| 29 | Interconnect flit Mem-to-SIMT (NOC_A) | Counts the number of flits traveling from the memory partition to the SIMT cluster |
| 30 | Idle Core (IDLE_CORE_N) | Counts the average number of idle cores over the cycles of each sample |
| Index | GPGPU-Sim Configuration Option | Description |
| 1 | power_simulation_enabled | Enables the power model simulator. If enabled, an output file is generated that includes the detailed power coefficients for the simulated configuration and the average/maximum/minimum total power breakdowns for each kernel |
| 2 | gpuwattch_xml_file | The GPUWattch (McPAT) XML configuration file name; by default it is gpuwattch.xml. For the GTX480 configuration in gpgpusim.config, this is set to gpuwattch_gtx480.xml |
| 3 | gpu_stat_sample_frequency | Determines the sampling frequency (in number of GPGPU-Sim core cycles) used in the power calculations. The performance counters are reset before each sample, accumulated during the sampling period, and finally passed to the power model (McPAT) at the end of each sample |
| 4 | power_trace_enabled | If enabled, produces two output files that detail the power breakdown values and the accumulated performance counter values for each sample |
| 5 | power_per_cycle_dump | Dumps detailed power data for each sample |
| 6 | steady_power_levels_enabled | If enabled, tracks the steady-state power level throughout the execution and reports the start/end values with the average power recorded for each component. The steady state is defined by the -steady_state_definition option |
| 7 | steady_state_definition | Takes two values. The first value determines the allowed deviation within the steady state, and the second value determines the minimum number of samples required to consider a run of samples a steady-state power level |
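
The two values of steady_state_definition can be read as the parameters of a simple interval detector over the per-sample power trace. The following is a minimal sketch of that idea, not the GPUWattch implementation: the names `SteadyInterval` and `find_steady_intervals` are hypothetical, and it assumes the allowed deviation is a percentage of the running average of the current interval, which the option description does not state explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record for one steady-state interval (not a GPGPU-Sim type).
struct SteadyInterval {
  std::size_t first_sample;  // index of the first sample in the interval
  std::size_t last_sample;   // index of the last sample (inclusive)
  double avg_power;          // mean power over the interval
};

// Scans a per-sample power trace and reports every run of samples that stays
// within max_dev_percent of the run's running average and spans at least
// min_samples samples. The percentage interpretation is an assumption.
std::vector<SteadyInterval> find_steady_intervals(
    const std::vector<double>& power, double max_dev_percent,
    std::size_t min_samples) {
  assert(max_dev_percent >= 0.0);
  std::vector<SteadyInterval> intervals;
  std::size_t start = 0;  // first sample of the current run
  double sum = 0.0;       // sum of the samples in the current run

  // Iterate one step past the end so the final run is also closed.
  for (std::size_t i = 0; i <= power.size(); ++i) {
    const std::size_t n = i - start;  // samples currently in the run
    bool fits = false;
    if (i < power.size()) {
      if (n == 0) {
        fits = true;  // the first sample always opens a run
      } else {
        const double avg = sum / static_cast<double>(n);
        fits = std::fabs(power[i] - avg) <=
               std::fabs(avg) * max_dev_percent / 100.0;
      }
    }
    if (fits) {
      sum += power[i];
      continue;
    }
    // The run [start, i - 1] has ended; keep it only if it is long enough.
    if (n > 0 && n >= min_samples) {
      intervals.push_back({start, i - 1, sum / static_cast<double>(n)});
    }
    if (i < power.size()) {  // the deviating sample starts a new run
      start = i;
      sum = power[i];
    }
  }
  return intervals;
}
```

With a deviation of 5 and a minimum of 3 samples, a trace that hovers near 10 W for four samples and then jumps to 20 W yields a single interval covering those first four samples; shorter runs are discarded.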
| Index | Output File Name | Configuration | Description |
| 1 | gpgpusim_power_report_(date&time).log | -power_simulation_enabled 1 | Includes the detailed power coefficients for this configuration and the average/maximum/minimum total power and their breakdowns for the different components for each kernel |
| 2 | gpgpusim_power_trace_(date&time).log.gz | -power_trace_enabled 1 | A compressed file that contains a detailed average power breakdown trace in comma-separated format |
| 3 | gpgpusim_metric_trace_(date&time).log.gz | -power_trace_enabled 1 | A compressed file that contains a detailed performance counter trace in comma-separated format |
| 4 | gpgpusim_steady_state_tracking_report_(date&time).log.gz | -steady_power_levels_enabled 1 | Reports the steady-state power level throughout the execution, with the start/end values of each interval and the average power recorded for each component during that interval, in comma-separated format |
| Index | File Name | Directory | Description |
| 1 | power_stat.cc/h | src/gpgpu-sim/ | These files contain the main structures used for recording GPGPU-Sim performance counters: power_core_stat_t (for all core-related counters) and power_mem_stat_t (for all memory-related counters), which are contained in the wrapper power_stat_t object. The core and mem stat structures contain multiple counter pointer arrays with 2 locations per counter (e.g., unsigned *m_counter[2]): [0] points to the counter with the current value, and [1] points to the previously sampled value. The difference, [0]-[1], is used to get the per-sample estimated power in McPAT. |
| 2 | gpgpu_sim_wrapper.cc/h | src/gpuwattch/ | These files contain the gpgpu_sim_wrapper class, which holds all of the McPAT structures (such as Processor, ParseXML, etc.), manages the power output files, and passes the GPGPU-Sim performance counters (described in power_stat.cc/h (1)) into McPAT. The gpgpu_sim_wrapper structure is used in power_interface.cc/h (3) to separate the McPAT structures and interface from GPGPU-Sim. |
| 3 | power_interface.cc/h | src/gpgpu-sim/ | These files interface GPGPU-Sim with McPAT via two main functions: init_mcpat() and mcpat_cycle(). init_mcpat() is called from gpgpu_sim::init() in gpu-sim.cc and, through the gpgpu_sim_wrapper object, initializes all of the power-related structures in GPGPU-Sim and McPAT. Similarly, mcpat_cycle() is called from gpgpu_sim::cycle() in gpu-sim.cc and passes all of the performance counters to McPAT (through the gpgpu_sim_wrapper object). |
| 4 | gpgpu_sim.verify | src/gpuwattch/ | This file is distributed with our modified version of McPAT to ensure the correct McPAT version is used with GPGPU-Sim. |
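
The two-slot counter layout described for power_stat.cc/h can be illustrated with a small stand-alone class. This is a sketch under stated assumptions rather than the actual GPGPU-Sim code: `SampledCounter`, `increment`, and `sample_delta` are hypothetical names, and the counter is assumed to hold one value per core. Only the `m_counter[2]` convention ([0] current, [1] previously sampled, difference [0]-[1] per sample) is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a single counter inside power_core_stat_t.
class SampledCounter {
 public:
  explicit SampledCounter(std::size_t num_cores)
      : m_current(num_cores, 0u), m_previous(num_cores, 0u) {
    m_counter[0] = m_current.data();   // [0] -> current values
    m_counter[1] = m_previous.data();  // [1] -> values at the last sample
  }

  // The raw pointers refer to this object's own storage, so copying is disabled.
  SampledCounter(const SampledCounter&) = delete;
  SampledCounter& operator=(const SampledCounter&) = delete;

  // Called by the (hypothetical) simulator whenever the event occurs on a core.
  void increment(std::size_t core, unsigned amount = 1) {
    assert(core < m_current.size());
    m_counter[0][core] += amount;
  }

  // Returns the number of events since the previous sample, summed over all
  // cores, and records the current values as the new "previous" snapshot.
  unsigned sample_delta() {
    unsigned total = 0;
    for (std::size_t core = 0; core < m_current.size(); ++core) {
      total += m_counter[0][core] - m_counter[1][core];
      m_counter[1][core] = m_counter[0][core];
    }
    return total;
  }

 private:
  std::vector<unsigned> m_current;   // backing storage for m_counter[0]
  std::vector<unsigned> m_previous;  // backing storage for m_counter[1]
  unsigned* m_counter[2];
};
```

Keeping the previous snapshot alongside the running total lets the cumulative counter remain available for end-of-run statistics while each sample still sees only the activity of its own period.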