What is AerialVision

AerialVision is a GPU performance analysis tool. Its main features are a Time Lapse View (Visualizer in version 1.0) where various metrics can be graphed as a function of cycle count, and a Source Code View where various metrics are displayed in correlation to the corresponding lines of PTX or CUDA.

If you use figures generated by AerialVision in your published papers, please cite:

Aaron Ariel, Wilson W. L. Fung, Andrew Turner, Tor M. Aamodt, Visualizing Complex 
Dynamics in Many-Core Accelerator Architectures, In Proceedings of the IEEE International 
Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 164-174, 
White Plains, NY, March 28-30, 2010.

Using AerialVision

To begin using AerialVision, follow the instructions in the GPGPU-Sim README file. Next, follow the instructions in v3.x/README to configure your setup_environment script. Then do the following:

source v3.x/setup_environment
./v3.x/bin/aerialvision.py

This should load the startup screen shown in [ref].

Startup Screen in AerialVision

</figure>

A Walkthrough Example

We first present a walkthrough example of various AerialVision features. In this manual, we have used GPGPU-Sim trace files corresponding to the MUMmerGPU CUDA application. However, users may follow along using whichever application they choose.

We begin by loading the startup screen by typing ‘python aerialvision.py’ into the command line. Once the startup screen loads up, the next step is to submit all of the relevant files.

By default, you should now be looking at the fields inside the File Inputs for Time Lapse View tab. In this tab, we submit files that will enable us to use the time lapse visualizing capabilities of the Time Lapse View. These files are by default in the form gpgpusim_visualizer__*.log.gz. We submit files by clicking the Browse button (if you've submitted the file before you can click on the Recent Files button), and then clicking Add File once the file’s path is in the Add Input File text field. Finally, choose the appropriate resolution and parsing options (described in another section of this manual). [ref] depicts what your screen should look like at this point. Notice that you can submit numerous files for visualizing into this tab; however, for the purposes of this walkthrough we have limited it to one.

File Inputs for Time Lapse View

</figure>

Now click on the File Inputs for Source Code View tab. In this tab we submit files that present statistics corresponding to each line of PTX or CUDA source. Before clicking the Add Files button, it is necessary to insert the file paths to three distinct files required by this part of AerialVision. The file that goes in the Add CUDA Source Code File text field is the appropriate CUDA kernel source code file. This file should end with a *.cu extension. In the case of MUMmerGPU, this file is named mummergpu_kernel.cu (not to be confused with mummergpu.cu). The file that goes in the Add Corresponding PTX File text field is the appropriate PTX file generated by the CUDA compiler or extracted from the binary by GPGPU-Sim. In the case of MUMmerGPU, this file is named _1.ptx. Finally, the file that goes in the Add Corresponding Stat File text field is generated by GPGPU-Sim and is by default named gpgpu_inst_stats.txt. This file should be located in the same folder as the one you launched GPGPU-Sim from. Once you have filled the three text fields, click the green Add Files button. [ref] depicts what your screen should look like at this point. You can now launch AerialVision by clicking the Submit button at the bottom. Note that it is not necessary to fill both the File Inputs for Time Lapse View and File Inputs for Source Code View tabs, as the two parts of AerialVision can be used independently of each other.

File Inputs for Source Code View

</figure>

Once the Submit button has been clicked, AerialVision will parse the input files for the data inside. This process may take a minute to several minutes depending on the length of the trace files submitted to AerialVision. Your screen should now look something like [ref].

Time Lapse View Tab (New Plot Creation)

</figure>

We will now create our first plot using AerialVision. The plots that we produce in this tutorial will differ from the ones that you produce unless you are also simulating MUMmerGPU. To start, let’s investigate whether off-chip memory latency affects performance for our application.

We first need to Choose a File by double clicking on the trace file that we want to extract data from. These files can be found in [ref]. Double clicking on one of the files should turn the appropriate section of the ‘Options Chosen’ list green.

From this file the first thing we'd like to plot is Instructions Per Cycle (IPC). We do this by selecting globalInsn from the list of Y Vars (the performance metric tables later in this manual list all variables and their descriptions), and then clicking the dy/dx checkbox. We click the dy/dx checkbox because the globalInsn variable contains a running count of instructions executed, and in order to plot IPC we must take the derivative. For this particular plot, we will select ‘Line’ from the options available for Type of Graph (the same tables give the recommended type of plot for each of the available performance metrics).
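The relationship between the running instruction count and IPC can be sketched in a few lines of Python. This is only an illustration of the idea behind the dy/dx checkbox, not AerialVision's own code; the function name and the sample values are hypothetical.

```python
def ipc_from_running_count(cycles, insn_counts):
    """Finite-difference derivative of a running instruction count.

    cycles      -- cycle count at each sample (strictly increasing)
    insn_counts -- cumulative instructions executed at each sample
    Returns one IPC value per interval between consecutive samples.
    """
    if len(cycles) != len(insn_counts):
        raise ValueError("cycles and insn_counts must have the same length")
    ipc = []
    for i in range(1, len(cycles)):
        d_cycles = cycles[i] - cycles[i - 1]
        d_insn = insn_counts[i] - insn_counts[i - 1]
        ipc.append(float(d_insn) / d_cycles)
    return ipc


# Hypothetical samples, not taken from a real trace.
cycles = [0, 1000, 2000, 3000]
global_insn = [0, 1500, 2500, 4500]
print(ipc_from_running_count(cycles, global_insn))  # [1.5, 1.0, 2.0]
```

Plotting the raw counter would only show a steadily rising line; the differences between consecutive samples are what reveal phases of high and low throughput.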

We now have the option of selecting the number of subplots that we'd like to add. For this particular example, we will choose 2. In order to do this, slide the Add Subplot slider to the 2 position and click Submit.

In the Subplot0 tab we will once again be plotting IPC, only this time on a per shader basis. We do this by first selecting a file as we did earlier. The Y Var that we need to select this time is named shaderInsn, so double click on it and once again check the dy/dx checkbox. This time, the Type of Graph that we are going to select is the Parallel Intensity Plot. You have now chosen all of the necessary options for Subplot0.

In the Subplot1 tab we will be plotting the memory latency. Once again select the appropriate file as we did earlier. The Y Var that we need to select this time is named averagemflatency. This time do not check the dy/dx checkbox, and choose the Line plot from the Type of Graph options. You may now press the green GraphMe! button. If you have followed this walkthrough correctly, all of the fields in the Options Chosen list should be green. After clicking the green GraphMe! button, your screen should look something like [ref].

Time Lapse Plots

</figure>

We now see how the latency affects the IPC. The top plot in your screen is the IPC for all of the shaders added up, the middle plot displays the IPC for each individual shader, and lastly the bottom plot displays the average latency per cycle.

We will now demonstrate how to use the Source Code View of AerialVision. After clicking on the Source Code View tab, your screen should look like [ref].

Source Code View

</figure>

For this example, we will plot CUDA source line statistics rather than PTX line statistics. Therefore we must first choose the appropriate CUDA source file. Do this by clicking the appropriate file under the Cuda C header. This should turn the File: field under Chosen Data from red to green.

Next, we need to choose the appropriate PTX statistic aggregation method from under Available Functions (this is explained in another section of this manual). Choose the Sum option, and then choose exposed_latency from the Available Stats list. If you have followed the instructions correctly, all fields under Chosen Data should be green and you are now ready to plot. Click the green Show Data button at the bottom. Your screen should now look something like [ref].

Statistics per line of source code

</figure>

You can now explore which lines of CUDA source are causing the greatest amount of thread stalls due to waiting time on memory requests. Right click on the largest bar to scroll the text to that specific line. In our particular example, after right clicking on the bar corresponding to line 111, the text scrolls down to that line and we see that this large amount of DRAM traffic is being caused by a memory access.

This walkthrough is not intended to be a comprehensive presentation of all AerialVision features. It is simply intended to introduce the user to some of the basic AerialVision functionality. A far more detailed understanding can be gained by reading the rest of this manual.

Detailed Description of AerialVision Features

Startup Screen

In the startup screen, you can select all relevant input files to either time lapse view or source code view of AerialVision by switching between the tabs found in the upper left. To select a file, you may type the path to the input file or click ‘Browse’ for a file selection dialog box to fill in the path. Click ‘Add File’ to add the file to the list of input files.

The time lapse view takes in trace files generated by GPGPU-Sim. By default, the naming scheme for the trace files is: gpgpusim_visualizer__*.log.gz (can be changed with the option -visualizer_outputfile in GPGPU-Sim). The source code viewer requires the user to input a CUDA source file, a PTX source file (generated from the CUDA source file), and a file generated by GPGPU-Sim containing the statistics for each line of PTX source code. The default name for this statistics file is gpgpu_inst_stats.txt (can be changed with the option -ptx_line_stats_filename in GPGPU-Sim).
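When several simulations have been run from the same directory, it can help to list the available trace files before adding them in the startup screen. The sketch below uses only the Python standard library and assumes the default naming scheme given above; the helper names are hypothetical, and because the internal layout of the log is not described here, it simply prints the first few raw lines of each file.

```python
import glob
import gzip
import os


def list_trace_files(directory):
    """Return trace files in `directory` that follow the default naming scheme."""
    pattern = os.path.join(directory, "gpgpusim_visualizer__*.log.gz")
    return sorted(glob.glob(pattern))


def peek_trace(path, max_lines=5):
    """Return up to `max_lines` raw text lines from a gzip-compressed trace."""
    lines = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            lines.append(line.rstrip("\n"))
            if len(lines) >= max_lines:
                break
    return lines


# Inspect whatever trace files exist in the current directory (possibly none).
for trace_path in list_trace_files("."):
    print(trace_path)
    for raw_line in peek_trace(trace_path):
        print("    " + raw_line)
```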

You may also select the screen resolution of your desktop so that AerialVision can optimize the widget layout. The two checkboxes at the bottom control whether AerialVision parses CFLog data (a visualization of how threads traverse through a program over time) and whether it converts the parsed data from PTX line numbers to CUDA source code line numbers (this requires a PTX file to be selected in the ‘File Input for Source Code Viewer’ tab). Checking the ‘Skip CFLOG parsing’ checkbox decreases the load time of AerialVision, but the CFLog visualization will no longer be available. Checking the ‘Convert CFLOG to CUDA source line’ checkbox increases the load time of AerialVision, but allows you to visualize this metric with respect to CUDA source line numbers instead of PTX line numbers.

Once you have selected all the input files and specified all the options, press the ‘Submit’ button at the bottom and AerialVision will start after it finishes parsing the input files. It should be noted that the more files that the user inputs (and the larger the files) the longer it will take for AerialVision to parse.

Time Lapse View Tab - Creating New Figures

[ref] depicts the Time Lapse View tab in the middle of creating a new figure. This is what you will see by default when AerialVision starts. To create additional plots, click ‘Add Tab’ in the Control Panel. A description of each boxed section is given below.

To create a single plot in the new figure, specify the options in regions A, B, and C. Use the widgets in regions D, E, and F to attach additional subplots to the main plot. The text box in region G displays the options chosen in green. Once all the options are set, click the ‘GraphMe!’ button in region H to generate the new figure. Here is the description of each labeled region in [ref].

A: A list of loaded trace files. Choose one by double clicking on it. A list of statistics that are available for graphing in the trace file will then appear in the boxes below.

B: A list of statistics available for plotting should now appear under the ‘X Vars’ and ‘Y Vars’ headings. ‘X Vars’ currently consists only of the global clock cycle, whereas ‘Y Vars’ consists of different performance counters. Choose one of each by double clicking on them. Use the ‘dy/dx’ checkbox to take the derivative of the y-axis variable before it is graphed.

C: Displays the types of plots available for the chosen variables. The following types of plots are supported by this release of AerialVision:

Line Plot – A standard line plot.
The Parallel Intensity Plot – Essentially a color map. This plot is particularly useful for displaying performance counters that are collected for multiple units (e.g. instruction count for each shader core).
Stacked Bar – A standard stacked bar chart that shows the component breakdown of each sample of a performance counter.

D: Use this slider to select the number of subplots that you would like to graph along with the main plot. Press ‘Submit’ when you’ve selected the desired number. The ‘Cancel’ button is used to remove the subplots that have already been specified.

E: Use these tabs to switch between the options available for each subplot.

F: Select the configuration for each subplot here. The options are exactly the same as those that were available for the main plot above (A,B,C).

Important note: In this documentation, we refer to labels (A,B,C) as defining the configuration for the PLOT while label (F) defines the configuration for the SUBPLOTS.

G: This list displays the options you have chosen. A green highlight is representative of an option that has been selected. A red highlight is representative of an option that still needs to be chosen. You will not be able to graph anything until there is no red in this list.

H: Once there are no red fields in the Options Chosen list (directly above this button), click this button to generate the figure.

I: The ‘favourites’ button is useful when there are certain combinations of plots that you use frequently. The button is only enabled after you have selected a trace file for the figure (which unfortunately limits the favourites to a combination of plots from a single trace only). Clicking the button will show you a list of the available favourites. Generate a favourite figure by double clicking it. Instructions describing how to save a plot combination to the ‘favourites’ can be found in section 4b.

J: In the control panel there are three buttons:

1. Rem Tab: Deletes the current tab you are viewing
2. Add Tab: Adds a new tab with a new plot. You may define a name for this tab by entering a string of characters in the field beside this button
3. Manage Files: Use this feature to refresh, remove, or add a file to the time lapse view.

K: Use these tabs to switch between Time Lapse View and Source Code View.

L: Exits the program.

Time Lapse View Tab - Customizing an Existing Figure

Time Lapse View Tab with a Generated Figure

</figure>

Buttons

[ref] shows AerialVision with a generated figure. The buttons below the figure canvas provide ways for the user to customize a figure after it is generated. Here is a zoomed-in view of the buttons at the bottom right of the window, followed by a description of the functionality of each button: <figure id="fig:timelapse_buttons">

Time Lapse View Tab - Buttons

</figure>

Edit Labels: Clicking the Edit Labels button opens up the above dialog box. The user can modify the axis labels, the title (color map label for parallel intensity plots as they have no title), their font sizes, and the font sizes of the axis tick labels.

Time Lapse View Tab - Edit Labels

</figure>

Change Binning (for Parallel Intensity Plots only): Clicking the Change Binning button opens up the above dialog box. The user can use this dialog box to increase or decrease the density of y-axis or x-axis tick labels (for Parallel Intensity Plots only). First, select the plot that you would like to edit from the list on the left. Then, click on the appropriate button on the right to perform the modification. Click Increase Binning and Decrease Binning to increase or decrease the density of the tick labels. Click Remove Binning to remove all the tick labels completely from the specific axis of the plot.

Time Lapse View Tab - Change Binning

</figure>

Change Colormap Max/Min: Clicking the Change Colormap Max/Min button opens up the above dialog box. The user can use this dialog box to specify the max/min values that map to the extrema of the color map in each Parallel Intensity Plot in the figure. This feature is useful if the user would like to increase the color resolution over a particular range of data. The drop-down lists on the right allow the user to specify the color map for each Parallel Intensity Plot. Click Submit Changes to apply the new color mapping settings. Alternatively, the user can normalize the color mapping among all the plots in a figure (so that the same value maps to the same color in different plots) with the Normalize all Subplots button. By default, AerialVision sets the extrema of the color map to the max/min values of the plotted data (regardless of the previous settings). So, to return to the original color mapping, simply submit the default values.
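The effect of changing the colormap limits can be illustrated with a small Python sketch. It is not AerialVision's implementation; the function names and sample data are hypothetical. A value is clipped to the chosen range and mapped to a fraction between 0 and 1, which a color map then turns into a color. Narrowing the range spreads the colors over fewer values, and using one shared range for every plot makes equal values share a color.

```python
def to_colormap_fraction(value, vmin, vmax):
    """Map `value` to [0, 1], clipping anything outside [vmin, vmax]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    clipped = min(max(value, vmin), vmax)
    return (clipped - vmin) / float(vmax - vmin)


def shared_limits(plots):
    """Return one (min, max) pair covering the data of every plot.

    Each plot is a list of rows, and each row is a list of numbers.
    """
    all_values = [v for plot in plots for row in plot for v in row]
    return min(all_values), max(all_values)


# Hypothetical data for two Parallel Intensity Plots.
plot_a = [[0, 2], [4, 8]]
plot_b = [[1, 16]]

vmin, vmax = shared_limits([plot_a, plot_b])
print(vmin, vmax)                             # 0 16
print(to_colormap_fraction(4, vmin, vmax))    # 0.25 with the shared range
print(to_colormap_fraction(4, 0, 8))          # 0.5 with a narrower range
print(to_colormap_fraction(12, 0, 8))         # 1.0, clipped to the maximum
```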

Time Lapse View Tab - Change Colormap Max/Min

</figure>

Refresh Input Files: The Refresh Input Files button reloads the plotted data of a figure from the trace files and redraws the whole figure to reflect any changes in the data. This allows the user to reload trace files into AerialVision as GPGPU-Sim writes to them, visualizing the behaviour of a benchmark run even before the simulation finishes. Note: Only the trace files that are currently being plotted will be reloaded using this feature. To reload other trace files, use the Manage Files feature described in another section of this documentation.

Add to Favorites: Use this feature to save the configuration of commonly used plot combinations for quick reuse at a later time (potentially with a different trace file). Fill in the two entry fields in the above dialog box to give this combination of plots a title and a description, then click ‘Submit’ to save the configuration as a favourite that can be recalled with the ‘Favourite’ button.

Time Lapse View Tab - Add to Favorites

</figure>

Dy/Dx: Use this feature to take the derivative of the data in a plot after the figure is generated. In the dialog box (shown above), check the particular plot with data that needs to be differentiated and re-plotted and click Submit Changes.

Time Lapse View Tab - Dy/Dx

</figure>

Current Constraints When Customizing an Existing Figure

When modifying an existing figure, it is necessary to make changes to the figure in the following order:

If desired, modify the colormap configuration first.
If desired, modify the binning configuration second.
If desired, modify the edit labels configuration last.

The reason for this requirement is that a change made out of order may reset the configuration of a different feature to its default value. Modifying your figure in the order specified ensures that the process of customizing an existing figure goes as smoothly as possible. This constraint will be resolved in a future release of AerialVision.

Navigation Toolbar

The following image depicts the navigation tool bar at the bottom-left corner of the figure canvas once the figure has been generated. The functionality of each button is outlined in this section. <figure id="fig:timelapse_nav">

Time Lapse View Tab - Navigation Toolbar

</figure>

Home: Use this button to return all plots in the figure to their default views.

Go Back/Go Forward: Use this button to go back/forward to the last view of the most recently changed plot.

Pan: To pan around a particular plot, click the button to switch to pan mode, then hold the left-mouse button down on the plot and drag in the direction you would like to view.

Zoom: To zoom-in on a particular plot, click this button to switch to zoom mode and then make a box around the area with the left-mouse button that you would like to enlarge. Use the right-mouse button for zoom-out.

Change Spacing: Clicking this button will show the following dialog box. Use the sliders to adjust the spacing for all plots in the current figure.
- Left: Adjust the distance from the left border of the figure canvas.
- Bottom: Adjust the distance from the bottom border of the figure canvas.
- Right: Adjust the distance from the right border of the figure canvas.
- Top: Adjust the distance from the top border of the figure canvas.
- Wspace: Currently has no effect.
- Hspace: Adjust the spacing between plots.

Save: Save the current figure as an image. Various extensions are available.

Manage Files

The following dialog box shows up when the Manage Files button is clicked. It can be used to manage (Add, Remove, Refresh) the current set of trace files that are available for visualization in AerialVision. All changes are added to the list on the bottom and will not take effect until the Submit Changes button is clicked. If you are unsatisfied with the list of changes created, simply click the Omit Changes button to discard all.

Manage Files

</figure>

Source Code View Tab - Performance Metric Selection

[ref] depicts the startup configuration for the ‘Source Code View’ tab. The user can choose the source code file to be viewed and the performance metric to be displayed along with it. A description of each labeled region is included below.

Source Code Viewer (Performance Metric Selection)

</figure>

A: Select either a PTX or CUDA C++ source file to be viewed with the source code viewer.

B: In GPGPU-Sim, performance metrics are collected per line of PTX and then mapped to the appropriate line of CUDA source code. Therefore the user needs to specify how the data per line of PTX should be combined to form the data for a line of CUDA source code. For instance, the execution count for each line of CUDA source code should be the maximum execution count among its corresponding PTX source lines. On the other hand, the total DRAM traffic generated by each line of CUDA source code should be the sum of the DRAM traffic generated by its corresponding PTX source lines. In the current release of AerialVision, the ‘ratio’ option takes the ratio of the ‘sum’ of the two performance metrics chosen. A future release of AerialVision will provide the flexibility of choosing whether the ‘max’ or ‘sum’ should be used for a performance metric when calculating the ratios. Table 1 depicts the recommended configurations for the available performance metrics. Table 2 depicts a few example ratios that may be insightful to the user.

C: Choose the performance metrics to be displayed alongside the selected source code.

D: This list displays which options you have chosen. A green highlight is representative of an option that has been selected. A red highlight is representative of an option that still needs to be chosen. You will not be able to plot anything until there is no red in this list.
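The ‘max’, ‘sum’, and ‘ratio’ options described in region B can be sketched as follows. This is an illustration under stated assumptions rather than AerialVision's code: the function names, the dictionary layout, and the PTX-to-CUDA line mapping are all hypothetical.

```python
def aggregate_per_cuda_line(ptx_stats, ptx_to_cuda, method):
    """Combine per-PTX-line values into per-CUDA-line values.

    ptx_stats   -- {ptx_line: value}
    ptx_to_cuda -- {ptx_line: cuda_line}
    method      -- "max" or "sum"
    """
    combine = {"max": max, "sum": sum}[method]
    grouped = {}
    for ptx_line, value in ptx_stats.items():
        cuda_line = ptx_to_cuda[ptx_line]
        grouped.setdefault(cuda_line, []).append(value)
    return {line: combine(values) for line, values in grouped.items()}


def ratio_per_cuda_line(numerator, denominator, ptx_to_cuda):
    """Ratio of the per-CUDA-line sums, skipping lines with a zero denominator."""
    num = aggregate_per_cuda_line(numerator, ptx_to_cuda, "sum")
    den = aggregate_per_cuda_line(denominator, ptx_to_cuda, "sum")
    return {line: float(num[line]) / den[line] for line in num if den.get(line)}


# Hypothetical data: PTX lines 10 and 11 come from CUDA line 111.
ptx_to_cuda = {10: 111, 11: 111, 12: 112}
count = {10: 64, 11: 64, 12: 32}
dram_traffic = {10: 256, 11: 0, 12: 16}

print(aggregate_per_cuda_line(count, ptx_to_cuda, "max"))         # {111: 64, 112: 32}
print(aggregate_per_cuda_line(dram_traffic, ptx_to_cuda, "sum"))  # {111: 256, 112: 16}
print(ratio_per_cuda_line(dram_traffic, count, ptx_to_cuda))      # {111: 2.0, 112: 0.5}
```

Note that the ratio uses the sum of the execution counts (128 for line 111) rather than the maximum, which matches the current behaviour of the ‘ratio’ option described above.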

Recommended Statistic Amalgamation Method for Performance Metrics
Performance Metric	Recommended Operation on Particular Statistic	GPGPU-Sim v3.x Support
Count	Max	Yes
Latency	Sum	Yes
Dram_Traffic	Sum	Yes
Smem_bk_conflicts	Sum	Yes
Smem_warp	Sum	Yes
Gmem_access_generated	Sum	Yes
Gmem_warp	Sum	Yes
Exposed_latency	Sum	No

Source Code View Tab - Navigating Through Source Code with Performance Metrics

[ref] shows the Source Code Viewer tab with a CUDA source file opened. The user can view the source code (labeled region B) alongside the chosen performance metric (labeled region A). The viewer includes a navigation graph that plots the performance metric for each line of CUDA source code (labeled region C). The user can quickly jump to the line of source code with the data of interest by right-clicking on the graph. The navigation graph also has a toolbar (labeled region D) that allows the user to customize the look of the graph, as with figures in the Time Lapse View tab.

Source Code Viewer

</figure>

Description of Available Performance Metrics

The current release of AerialVision contains numerous performance metrics that can be plotted in a variety of ways. We aim here to give the user a comprehensive understanding of each of the performance metrics. A deeper understanding of the available performance metrics should greatly increase the efficiency of the user and consequently speed up the analytical process. Upon loading the program, several principal performance metrics should be plotted first. These principal performance metrics illustrate the very top-level behavior of the benchmark (i.e. whether the benchmark was memory bound or not). A list of these top-level performance metrics can be found in the table below. A description of each of them can be found directly thereafter. There are many other performance metrics in AerialVision. Table 4 contains a list of these performance metrics, followed by a description of each of them. It is important to note that in the current release of AerialVision, all variables are plotted as a function of cycle count.

General Behavior Performance Metrics. dy/dx = Select dy/dx for this metric. PI Plot = View this metric with Parallel Intensity Plot.
General Behavior Performance Metrics	dy/dx	PI Plot	GPGPU-Sim v3.x Support	Description
averagemflatency	No	No	Yes	The average amount of cycles that threads waiting for off-chip memory requests have been stalled for
dramUtil	No	Yes	Yes	The percent of the full capacity that each DRAM channel is being utilized
globalInsn	Yes	No	Yes	A running count of the total number of instructions that have executed. Taking the ‘dy/dx’ of this variable (described in previous sections of this manual) results in the more commonly known instructions per cycle metric
Ldmemlatdist	na	na	No	A breakdown of memory read access (load) latency to reflect the amount of time that the memory accesses are spending on each part of the GPU microarchitecture during the sampling period. The components of the breakdown are: FQPUSHED: Time spent in the memory fetch queue (currently unused); ICNT_PUSHED: Time spent in the interconnect buffer to DRAM; ICNT_INJECTED: Time spent traversing the interconnect to DRAM; ICNT_AT_DEST: Time spent waiting inside the interconnect output buffer; DRAMQ: Time spent inside the DRAM controller to have the access processed; DRAM_OUTQ: Time spent at the DRAM output queue (currently unused); 2SH_ICNT_PUSHED: Time spent in the interconnect buffer to the Shader Cores; 2SH_ICNT_INJECTED: Time spent traversing the interconnect back to the Shader Cores; 2SH_ICNT_AT_DEST: Time spent waiting inside the interconnect output buffer; 2SH_FQ_POP: Time spent at the memory interface into the Shader Cores; RETURN_Q: Time spent at the return queue to the writeback stage inside a Shader Core; WRITEBACK: Time spent waiting for the writeback to the register file (always 0).
shaderInsn	Yes	Yes	Yes	A running count of the total number of instructions that have executed in each shader. This performance metric is best viewed using the ‘Parallel Intensity Plot’, which significantly improves the clarity of visualization for this metric. Taking the ‘dy/dx’ of this variable (described in previous sections of this manual) results in the more commonly known instructions per cycle metric, viewed on a per shader basis.
STmemlatdist	na	na	No	A breakdown of memory write access (store) latency to reflect the amount of time that the memory accesses are spending on each part of the GPU microarchitecture during the sampling period. See Ldmemlatdist for the description of each component in the breakdown
WarpDivergenceBreakdown	na	na	Yes	A breakdown of the number of warps issued for execution during the sampling period according to the number of active threads in the warp. For example, component W1:4 includes all warps with one to four active threads. Category W0 denotes the idle cycles in each SM when all the warps in the SM are waiting for data from off-chip memory, whereas category Fetch Stalled denotes the cycles when the fetch stage of an SM stalls, preventing any warp from being issued that cycle.
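The grouping used by WarpDivergenceBreakdown can be illustrated with a short Python sketch. The manual only states that W1:4 covers warps with one to four active threads; the sketch assumes that the remaining components follow the same width of four, which is an assumption rather than something documented here. The cycle-based categories W0 and Fetch Stalled are not derived from an active thread count, so the sketch does not produce them. The function names are hypothetical.

```python
from collections import Counter


def warp_category(active_threads, bucket_size=4):
    """Label an issued warp by its active thread count, e.g. 3 -> 'W1:4'.

    Assumes every component spans `bucket_size` thread counts, as W1:4 does.
    """
    if active_threads < 1:
        raise ValueError("an issued warp has at least one active thread")
    low = ((active_threads - 1) // bucket_size) * bucket_size + 1
    high = low + bucket_size - 1
    return "W%d:%d" % (low, high)


def breakdown(active_thread_counts):
    """Count issued warps per component for one sampling period."""
    return Counter(warp_category(n) for n in active_thread_counts)


# Hypothetical active thread counts of five warps issued in one period.
print(warp_category(3))   # W1:4
print(warp_category(32))  # W29:32
print(breakdown([32, 32, 3, 1, 8]))
```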

Other Performance Metrics. dy/dx = Select dy/dx for this metric. PI Plot = View this metric with Parallel Intensity Plot.
Other Performance Metrics	dy/dx	PI Plot	GPGPU-Sim v3.x Support	Description
cacheMissRate_constL1_all	Yes	Yes	No	Cache miss rate of the L1 constant cache in each shader core. All accesses that miss the cache are accounted.
cacheMissRate_constL1_noMgHt	Yes	Yes	No	Cache miss rate of the L1 constant cache in each shader core. This cache miss rate discards any misses that are merged into an in-flight access and therefore not generating extra memory accesses.
cacheMissRate_globalL1_all	Yes	Yes	No	Cache miss rate of the L1 data cache (serves global and local memory space) in each shader core. All accesses that miss the cache are accounted.
cacheMissRate_globalL1_noMgHt	Yes	Yes	No	Cache miss rate of the L1 data cache in each shader core. This cache miss rate discards any misses that are merged into an in-flight access and therefore not generating extra memory accesses.
cacheMissRate_textureL1_all	Yes	Yes	No	Cache miss rate of the L1 texture cache in each shader core. All accesses that miss the cache are accounted.
cacheMissRate_textureL1_noMgHt	Yes	Yes	No	Cache miss rate of the L1 texture cache in each shader core. This cache miss rate discards any misses that are merged into an in-flight access and therefore not generating extra memory accesses.
CFLOG	No	Yes	Yes	All performance metrics that begin with ‘CFLOG’ correspond to visualizing how many threads are executing each PTX instruction or line of CUDA source. This variable can be used to associate observed performance behavior with lines of code.
dramAveMRQS	No	Yes	Yes	The average number of requests inside the memory controller of each DRAM channel. This metric is best visualized using the parallel intensity map.
dramCMD	No	Yes	Yes	The maximum number of commands the memory controller of each DRAM channel can send in each sampling period.
dramconst_acc_r	Yes	Yes	Yes	The number of memory read accesses sent to each DRAM channel that are generated by access to constant memory space. This metric is best visualized using the parallel intensity map. When plotting using the parallel intensity plot, the ticks on the y-axis are in the form <#1.#2>. The first number corresponds to the particular DRAM channel and the second number corresponds to the particular memory bank in that DRAM channel.
dramEff	No	Yes	Yes	The percentage of the full capacity of each DRAM channel that is utilized when there is a pending request at the DRAM channel.
dramglobal_acc_r	Yes	Yes	Yes	The number of memory read accesses sent to each DRAM channel that are generated by access to global memory space.
dramglobal_acc_w	Yes	Yes	Yes	The number of memory write accesses sent to each DRAM channel that are generated by access to global memory space.
dramlocal_acc_r	Yes	Yes	Yes	The number of memory read accesses sent to each DRAM channel that are generated by access to local memory space.
dramlocal_acc_w	Yes	Yes	Yes	The number of memory write accesses sent to each DRAM channel that are generated by access to local memory space.
dramNACT	No	Yes	Yes	The total number of row activation commands sent by the memory controller of each DRAM channel in each sampling period.
dramNOP	No	Yes	Yes	The total number of NOP commands sent by the memory controller of each DRAM channel in each sampling period.
dramNPRE	No	Yes	Yes	The total number of precharge commands sent by the memory controller of each DRAM channel in each sampling period.
dramNREQ	No	Yes	Yes	The total number of read/write request commands sent by the memory controller of each DRAM channel in each sampling period.
dramtexture_acc_r	Yes	Yes	Yes	The number of memory read accesses sent to each DRAM channel that are generated by access to texture memory space.
globalCompletedThreads	Yes	No	No	The total number of threads that have finished executing.
globalProcessedWrites	Yes	No	No	The total number of memory writes processed by the DRAM memory subsystem.
globalSentWrites	Yes	No	No	The total number of memory writes sent by the shader cores.
gpu_stall_by_MSHRwb	Yes	No	No	The total number of pipeline stalls caused by yielding to MSHR writebacks (i.e. writing data from DRAM to register file).
L1ConstMiss	Yes	No	No	The total number of L1 constant cache misses across all shader cores.
L1ReadMiss	Yes	No	No	The total number of L1 data cache (serves global and local memory space) read misses across all shader cores.
L1TextMiss	Yes	No	No	The total number of L1 texture cache misses across all shader cores.
L1WriteMiss	Yes	No	No	The total number of L1 data cache (serves global and local memory space) write misses across all shader cores.
L2ReadHit	Yes	No	No	The total number of read hits across all L2 cache banks.
L2ReadMiss	Yes	No	No	The total number of read misses across all L2 cache banks.
L2WriteHit	Yes	No	No	The total number of write hits across all L2 cache banks.
L2WriteMiss	Yes	No	No	The total number of write misses across all L2 cache banks.
shaderWarpDiv	Yes	Yes	No	The total number of warp divergences that occurred in each shader core.
shdrctacount	No	Yes	Yes	The total number of CTAs assigned to each shader core.

Extending AerialVision - Adding Variables

Open the file lexyacc.py.

Inside the function parseMe(), there is a dictionary that is initialized with a long list of <variablename> : vc.variable(<#1>,<#2>,<#3>,<#4>).

Here, <variablename> will be the Python object corresponding to the new data that you are entering into AerialVision’s Time Lapse View. This object is initialized with the four arguments explained below.

<#1>: The metric name that appears in the visualizer log generated by GPGPU-Sim (see visualizer.cc).
<#2>: This number can currently be between 1 and 4. To make these instructions easier to read, we'll call this number the variable "type":

1 = The variable should be plotted with one 'y' value for every 'x' value (X0,Y0).

2 = The variable should be plotted with multiple 'y' values for every 'x' value (X0,Y0,Y1,Y2, Y3...YN).

3 = The variable should be plotted with a stacked bar plot (i.e. it is a breakdown of an arbitrary metric).

4 = The variable should be plotted with multiple 'y' values for every 'x' value (X0,Y0,Y1,Y2, Y3...YN). The difference from 2 is that the ‘y’ values are organized in 2 levels of hierarchy (we use this to represent banks within a DRAM channel).

<#3>: This number is a Boolean:

1 = Variable resets at the start of every kernel.

0 = Variable does not reset at kernel boundaries.

<#4>: How this data is organized in the visualizer log:

scalar = Each sample contains a single value.

impVec = Each sample contains an array of values for a set of units.

stackbar = Each sample contains an array of values representing a breakdown of a metric.

idxVec = Each sample contains a unit ID with a single value.

idx2DVec = Each sample contains a 2D unit ID with a single value.
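The four arguments described above can be summarized with a self-contained Python sketch. It does not import AerialVision: `Variable` is a hypothetical stand-in for vc.variable, the field names are invented for readability, and the two metric names are placeholders rather than metrics that GPGPU-Sim emits. The real dictionary inside parseMe() may differ in detail, so treat this as a checklist of legal values and not as the actual code to paste.

```python
from collections import namedtuple

# Hypothetical stand-in for vc.variable(<#1>, <#2>, <#3>, <#4>).
Variable = namedtuple(
    "Variable", ["log_name", "plot_type", "resets_per_kernel", "organization"]
)

VALID_PLOT_TYPES = {1, 2, 3, 4}
VALID_ORGANIZATIONS = {"scalar", "impVec", "stackbar", "idxVec", "idx2DVec"}


def make_variable(log_name, plot_type, resets_per_kernel, organization):
    """Build an entry after checking each argument against the rules above."""
    if plot_type not in VALID_PLOT_TYPES:
        raise ValueError("type must be between 1 and 4")
    if resets_per_kernel not in (0, 1):
        raise ValueError("reset flag must be 0 or 1")
    if organization not in VALID_ORGANIZATIONS:
        raise ValueError("unknown organization: %r" % (organization,))
    return Variable(log_name, plot_type, resets_per_kernel, organization)


# Placeholder entries: a single-valued counter and a per-unit counter.
variables = {
    "myScalarCounter": make_variable("myScalarCounter", 1, 0, "scalar"),
    "myPerShaderCounter": make_variable("myPerShaderCounter", 2, 1, "impVec"),
}

for name, var in sorted(variables.items()):
    print(name, var)
```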
