Computer Architecture
For this assignment, the simplest suitable data structure is a two-dimensional array, declared large enough to hold the largest cache size to be tested. The program declares a maximum cache size of 4096 x 4096, which is equivalent to 16 mebibytes.
Program Design
The provided program is written in C because C compiles directly to a native executable, which typically gives more accurate timings on most operating systems. If another language is more convenient for you, you may adapt the sample program to it, but consider the effect on the measurements. Java programs compile to bytecode, which is interpreted by the JVM (Java Virtual Machine), and this additional execution overhead may affect the timings that are collected. Python is similar to Java in that it is an interpreted language, although it is possible to produce an executable file using an add-on utility (your option).
The output of the program is directed to a text file so you have a record of the timing information your program generates. The output is also printed on the monitor so that you can follow the program’s execution. Each output entry consists of the cache size, the stride, and the time for the read/write of the array at that cache size and stride. The program is set up to write the data to a comma-separated text file in which each row represents a cache size and each column a stride value. This format makes it possible to import the data into Excel, which makes the analysis part of this assignment somewhat easier.
Program Structure
There are two nested loops: the outer loop steps through the cache sizes (from 1K to 16M) and the inner loop steps through the strides for each cache size (from 1 to half the cache size). Within the inner loop are two do loops. The first performs repeated read/writes to the matrix. The second repeats the same loop without accessing the matrix, in order to capture the overhead of the loop itself. The difference between the two times gives the data access time, which is averaged over the number of accesses per stride. This is represented by the variable loadtime in the program.
This program takes a long time to run because it loops on each combination of cache size and stride for 20 seconds. Even on a fast computer, the run time can be more than 1.5 hours, so allow enough time for the program to complete execution.
Observations and Analysis (What To Do For This Assignment)
Run the program on a computer system to generate a complete sequence of memory access timings. Once the program has completed, you will need to analyze the results. Using Excel or a comparable spreadsheet, import your data and create graphs from it. A sample graph, as presented in the textbook on page 152, is shown below. Use the graph you create from your own results as a reference for your analysis and conclusions.
Review the results and see whether you can use them to answer the following questions:
At what cache size and stride level do significant changes in access times occur?
Do these timing changes correlate to typical cache sizes or changes in stride?
Is it possible to determine the cache sizes of the different levels based on the produced data?
What in your data doesn’t make sense? What questions arise from this data?
Compare your data against the actual cache information for the system. For Windows-based systems, there is a freeware product called CPU-Z which will report detailed CPU information including cache. On Unix and Linux systems, /proc/cpuinfo or lscpu will provide similar information. On a Mac, MacCPUID is a tool for displaying detailed information about the microprocessor. Both CPU-Z and MacCPUID are free and can be downloaded from the Internet. You may also refer to the published specifications for the processor, which are available online.
If time allows, run the program on a second, different system and compare the results. Are they similar or different? How are they different?