You will implement functions which operate on matrices and vectors – for namely two forms of matrix multiplication. i) Dense generalized matrix multiplication (GEMV) and ii) Sparse vector – dense vector matrix multiplication dot (SDOT). Both of these operations are critical in a number of application domains
Note: Although the spec is quite long, please make sure to read through the whole thing as it contains a lot of important information about the functions provided, running Venus, and testing your code.
wget https://www2.cs.sfu.ca/~alaa/courses/cmpt295/fall2022/assets/distrib/Venus/jvm/venus.jar
# We can also encourage you to use the online version.
# Do not check it in under any circumstance or you may fail the travis test.
# https://www2.cs.sfu.ca/~alaa/courses/cmpt295/fall2022/assets/distrib/Venus/
In C dense matrices are stored as a one-dimensional vector in row-major order. One way to think about it is that we can create a 1D vector from a 2D matrix by concatenating together all the rows in the matrix. Alternatively, we could concatenate all the columns together instead, which is known as column-major order.
For a more in-depth look at row-major vs. column-major order, see this Wikipedia page.
The stride of a vector is the number of memory locations between consecutive elements of our vector, measured in the size of our vector’s elements. If our stride is n, then the memory addresses of our vector elements are n * sizeof(element) bytes apart.
So far, all the arrays/vectors we’ve worked with have had stride 1, meaning there is no gap betwen consecutive elements. Now, to do the row * column
dot products with our row-major matrices that we’ll need for matrix multiplication, we will need to consider vectors with varying strides. Specifically, we’ll need to do this when considering a column vector in a flattened, row-major representation of a 2D matrix
Let’s take a look at a practical example. We have the vector int *a
with 3 elements.
*(a)
, *(a + 1)
, and *(a + 2)
, in other words a[0]
, a[1]
, and a[2]
.*(a)
, *(a + 4)
, and *(a + 8)
or in other words a[0]
, a[4]
, and a[8]
.To summarize in C code, to access the i
th element of a vector int *a
with stride s
, we use *(a + i * s)
, or a[i * s]
. We leave it up to you to translate this memory access into RISC-V.
For a closer look at strides in vectors/arrays, see this Wikipedia page.
All the code you write will be in RISC-V, and will be run in Venus. There are two ways to run your code with Venus, either through the web interface or from the command line using a Java .jar
file. We recommend that you do most of your development locally with the .jar
file, and only use the web interface for debugging or making changes to one file at a time.
We will be testing all of your code on RISC-V calling conventions, as described in lecture/lab/discussion. All functions that overwrite registers that are preserved by convention must have a prologue and epilogue where they save those register values to the stack at the start of the function and restore them at the end.
Follow the calling conventions. It is extremely important for this project, as you’ll be writing functions that call your other functions, and maintaining the abstraction barrier provided by the conventions will make your life a lot easier.
We’ve provided # Prologue
and # Epilogue
comments in each function as a reminder. Note that depending on your implementation, some functions won’t end up needed a prologue and epilogue. In these cases, feel free to delete/ignore the comments we’ve provided.
For an closer look at RISC-V calling conventions, refer here.
addi sp, sp, -8
and addi sp, sp, 8
?sw s0, 4 (sp)
Source | Description |
---|---|
test_dot.s, dot.s | Driver for dot product, TODO: implement dot product |
test_gemv.s, gemv.s | TODO: Driver for matmul, implement matmul |
test_read_matrix.s, read_matrix.s | TODO: Driver for dense matrices and vector, read the matrix |
test_sdot.s, sdot.s | TODO: Driver for sparse dot product, implement sparse dot product |
main.s | Main starting file for end-to-end gemv |
In the test_files
subdirectory, you’ll find several RISC-V files to test your code with. There is a test file corresponding to every function you’ll have to write.
DO NOT MODIFY THE INPUTS WHEN COMMITTING THE FILES TO GIT. IT WILL FAIL THE REFERENCE CHECKS
# e.g., listing of inputs for part 1. mnist
$ ls inputs/
bin txt
Each network folder contains a bin
and txt
subfolder. The bin
subfolder contains the binary files that you’ll run main.s
on, while the txt
subfolder contains the plaintext versions for debugging and calculating the expected output.
Within the bin
and txt
subfolders, you’ll find files for m
and v
which define the matrix and vector respectively.
$ ls inputs/mnist/bin/
m.bin m.coo v0.bin v1.bin v2.bin v3.bin v4.bin v5.bin v6.bin v7.bin v8.bin
To aid in the process of testing and debugging, create an output folder after you clone.
# DO NOT CHECK IN out/. IT MIGHT BREAK THE AUTOGRADERS
mkdir -p out/gemv/mnist
mkdir -p out/gemv/simple
The test_[*].out represent the output of each sub-function.
$ ls ./ref/
gemv/ test_gemv.out test_sdot.out
spmv/ test_read_coo.out
test_dot.out test_read_matrix.out
$ ls ref/gemv/*
ref/gemv/mnist:
v0.trace v2.trace v4.trace v6.trace v8.trace
v1.trace v3.trace v5.trace v7.trace
ref/gemv/simple:
simple0.trace simple1.trace simple2.trace
We include the reference outputs for each of the inputs. The traces display the out array of each of the stages in main. (look for the jal print_array
calls in main.s). They include the outputs from each stage of main for verification.
Dense Matrix - Dense Vector Multiplication (GEMV)
Expected completion time (within 7 days)
Some of these test files have been provided for you in their entirety, and you’ll only have to edit them to change their inputs. Other ones just include starter code, and have been left for you to fill out. You will implement.
In dot.s
, implement the dot
function to compute the dot product of two integer vectors. The dot product of two vectors a and b is defined as dot(a, b) = $\sum_{i=0}^{n-1} a_i \times b_i = a_0 * b_0 + a_1 * b_1 + \cdots + a_{n-1} * b_{n-1}$, where $a_i$ is the ith element of a.
Notice that this function for gemv the stride is always 1 for each of the two vectors, make sure you’re considering this when calculating your memory addresses. We’ve described strides in more detail in the background section above, which also contains a detailed example on how stride affects memory addresses for vector elements.
Also note that we do not expect you to handle overflow when multiplying. This means you won’t need to use the mulh
instruction.
For a closer look at dot products, see this Wikipedia page.
This time, you’ll need to fill out test_dot.s
, using the starter code and comments we’ve provided. Overall, this test should call your dot product on two vectors in static memory, and print the result (285 for the sample input).
$ java -jar venus.jar ./test_files/test_dot.s
## See the screen output
## To validate
$ java -jar venus.jar ./test_files/test_dot.s > ./out/test_dot.out
$ diff ./out/test_dot.out ./ref/test_dot.out
# If diff report any lines, then check your output
By default, in the starter code we’ve provided, v0
and v1
point to the start of an array of the integers 1 to 9, continuous in memory. Let’s assume we set the length and stride of both vectors to 9 and 1 respectively. We should get the following:
v0 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
v1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
dot(v0, v1) = 1 * 1 + 2 * 2 + ... + 9 * 9 = 285 ### Task 2: Matrix-Vector Multiplication (GEMV)
Now that we have a dot product function that can take in varying strides, we can use it to do matrix multiplication. In gemv.s
, implement the gemv
function to compute the product of a dense matrix and dense vector.
The multiplication of matrix M
and dense vector V
results in the output vector Output[] = M.V, where Output[i]
is equal to the dot product of the ith row of A and the vector v
. Note that if we let the dimensions of M be $r \times c$, and the dimensions of B be $c \times 1$), then the dimensions of C must be (r * 1).
// A[n][m], B[m][k] .
// M - Base address, V - Base Address
//
for (i = 0; i<r;i++)
row_address = M+i*m*sizeof(int)
// dot_product(&v0[0],&v1[0],length of vectors,stride of v0,stride of v1)
Result[i] = dot_product(row_address,V,c,1,1)
Documentation on the function has been provided in the comments. A pointer to the output matrix is passed in as an argument, so you don’t need to allocate any memory for it in this function. Additionally, note that M
is the left matrix, and V
is the dense vector.
Note that since we’re taking the dot product of the rows of M
with vector V
, the length of the two must be the same. If this is not the case, you should exit the program with exit code 2
. This code has been provided for you under the mismatched_dimensions
label at the bottom of the file, so all you need to do is jump to that label when the a row of M
and a column of V
do not have the same length.
A critical part of this function, apart from having a correctly implemented dot product, is passing in the correct row and vector to the dot product function, along with their corresponding strides. Since our matrices are in row-major order, all the elements of any single row are contiguous to each other in memory, and have stride 1.
For a closer look at matrix-vector multiplication, see this Wikipedia page.
Fill out the starter code in test_gemv.s
to test your matrix multiplication function. The completed test file should let you set the values and dimensions for two matrices in .data
as 1D vectors in row-major order. When ran, it should print the result of your matrix multiplication.
Note that you’ll need to allocate space for an output matrix as well. The starter code does this by creating a third matrix in static memory.
$ java -jar venus.jar ./test_files/test_gemv.s
## See the screen output
## To validate
$ java -jar venus.jar ./test_files/test_gemv.s > ./out/test_gemv.out
$ diff ./out/test_gemv.out ./ref/test_gemv.out
# If diff report any lines, then check your output
# The inputs are matrix and vector
# |1, 2, 3| |1|
# |4, 5, 6| * |2| = |14, 32, 50|
# |7, 8, 9| |3|
For test_matrix, we have provided the sample output (./ref/test_gemv.out
).
IF YOU TRY OUT OTHER INPUTS, YOU CAN CHECK THE RESULT USING WOLFRAM ALPHA. BUT DO NOT CHECK IN THE CHANGED INPUTS
In this part, you will implement functions to read matrices from the binary files. Then, you’ll write a main function putting together all of the functions you’ve written so far into an MNIST classifier, and run it using pre-trained weight matrices that we’ve provided.
Our matrix files come in two forms: binary and plaintext. The binary format is the one we will be using for all our programs as it is easier to read. The plain text version has been provided to help with debugging The usage is as follows:
We recommend the xxd
command to open the binary file (DO NOT USE YOUR EDITOR). You can find it’s man page here, but its default functionality is to output the raw bits of the file in a hex representation.
The first 8 bytes of the binary file represent two 4 byte integers. These integers are the number of rows and columns of the matrix. Every 4 following bytes represents an integer that is an element of the matrix, in row-major order. In this case each of the 4 bytes represents a vallue of the pixel There are no gaps between elements.. It is important to note that the bytes are in little-endian order. This means the least significant byte is placed at the lowest memory address. For files, the start of the file is considered the lower address
. This relates to how we read files into memory, and the fact that the start/first element of an array is usually at the lowest memory address.
$ xxd ./inputs/mnist/bin/m.bin
# hit q to exit the viewer
Vectors also use the same format except; the number of columns
second parameter in format is hardcoded to 1
We’ve included a python script called convert.py
to convert between the text and binary formats
python txt2bin.py file.bin file.txt --to-ascii
to go from binary to plaintextpython txt2bin.py file.txt file.bin --to-binary
to go from plaintext to binaryFor example, let’s say the plaintext example in the previous section is stored in file.txt
. We can run python convert.py file.txt file.bin --to-binary
to convert it to a binary format.
In read_matrix.s
, implement the read_matrix
function which uses the file operations we described above to read a binary matrix file into memory. If any file operation fails or doesn’t return the expected number of bytes, exit the program with exit code 1
. The code to do this has been provided for you, simply jump to the eof_or_error
label at the end of the file.
Recall that the first 8 bytes contains the two 4 byte dimensions of the matrix, which will tell you how many bytes to read from the rest of the file. Additionally, recall that the binary matrix file is already in row-major order.
You’ll need to allocate memory for the matrix in this function as well. This will require calls to malloc
, which is in util.s
and also described in the background section above.
Finally, note that RISC-V only allows for a0
and a1
to be return registers. Here we will violate that rule and our function needs to return three values: We return the 3 pointers in a0,a1,a2. a1 is a pointer to an integer, we will set it to the number of row. a2 is a pointer to an integer, we will set it to the number of columns
Testing this function is a bit different from testing the others, as the input will need to be a properly formatted binary file that we can read in.
We’ve provided a skeleton for test_read_matrix.s
, which will read the file test_input.bin
, and then print the output. The file test_input.bin
is the binary format of the plaintext matrix file test_input.txt
. To change the input file read by the test you’ll need to edit test_input.txt
first, then run the convert.py
script with the --to-binary
flag to update the binary.
From the root directory, it should look something like this: python convert.py --to-binary test_files/test_input.txt test_files/test_input.bin
After this, you can run the test again, and it’ll read your updated test_input.bin
.
Another thing to note is that you’ll need to allocate space for two integers, and pass in those memory addresses as arguments to read_matrix
. You can do this either with malloc
.
$ java -jar venus.jar ./test_files/test_read_matrix.s
## See the screen output
## To validate
$ java -jar venus.jar ./test_files/test_read_matrix.s > ./out/test_read_matrix.out
$ diff ./out/test_read_matrix.out ./ref/test_read_matrix.out
# If diff report any lines, then check your output
In main_gemv.s
, implement the main
function. This will bring together everything you’ve written so far, and create a basic sequence of functions that will allow you to perform gemv using the matrices and vector we’ve provided. You may need to malloc space when reading in matrices and computing the layers of the network, but remember to always free all data allocated at the end of this process. More information about the free
function is available in utils.s
and the background section above.c
Note that for THIS PROJECT/FUNCTION ONLY, we will NOT require you to follow RISC-V calling convention by preserving saved registers in the main
function. This is to make testing the main function easier, and to reduce its length. Normally, main
functions do follow convention with a prologue and epilogue.
The filepaths for the v
, m
will all be passed in on the command line. RISC-V handles command line arguments in the same way as C, at the start of the main function registers a0
and a1
will be set to argc
and argv
respectively.
We will call main.s
in the following way: java -jar venus.jar main.s <VECTOR_PATH> <MATRIX_PATH>
Note that this means the pointer to the string VECTOR_PATH
will be located at index 1 of argv
, MATRIX_PATH
at index 2, and so on.
If the number of command line arguments is different from what is expected, your code should exit with exit code 3. This will require a call to a helper function in utils.s
. Take a look at the starter code for gemv
, read_matrix
, and write_matrix
for hints on how to do this.
Read Matrix: Read the inputs vector
matrix
by making calls to read_matrix
. The paths to the matrices are passes in the command-line. Remember to store them and pass two integer pointers as arguments.
Next, you’ll invoke the gemv operation and display the result.
Note that when calling gemv, you should the hardcode the number of columns in the vector to 1.
Apart from MNIST, we’ve provided several smaller input networks for you to run your main
function on. simple0
, simple1
, and simple2
are all smaller matrices that will be easier to debug.
# Testing simple1.
java -jar venus.jar ./test_files/main_gemv.s ./inputs/simple1/bin/v.bin ./inputs/simple1/bin/m.bin -ms -1
# To validate
java -jar venus.jar ./test_files/main_gemv.s ./inputs/simple1/bin/v.bin ./inputs/simple1/bin/m.bin > ./out/gemv/simple/simple1.trace
python3 part2_tester.py gemv/simple/simple1
Note that these files cover a variety of dimensions.
All the files for testing the mnist network are contained in inputs/mnist
. There are both binary and plaintext versions of m0
. and 9 input files (inputs/mnist/bin/v[0-9]*.bin).
To test on the first input file for example, run the following:
# Testing mnist. you can try inputs from v[0-9].
java -jar venus.jar ./test_files/main_gemv.s ./inputs/mnist/bin/v0.bin ./inputs/mnist/bin/m.bin -ms -1
# To validate
java -jar venus.jar ./test_files/main_gemv.s ./inputs/mnist/bin/v0.bin ./inputs/mnist/bin/m.bin > ./out/gemv/mnist/v0.trace -ms -1
# a python script to validate your output
python3 part2_tester.py gemv/mnist/v0
#
# (Note that we run with the `-ms -1` flag, as real-dataset inputs are large and we need to increase the max instructions Venus will run)
To check the final output. We have also included a python script that implements MNIST.
python3 gemv.py ./inputs/mnist/bin/v0.bin ./inputs/mnist/bin/m.bin
# if scipy is not installed, you can install it with
$ conda activate
$ pip install --user scipy numpy
You can check the printed output against the references for each of the input files in the ref/mnist
.
Sparse Matrix - Dense vector multiplication (SPMV)
SPMV considers sparse matrices which only stores the non-zeros in the matrix. Sparse matrices can be represented in memory using a variety of formats. Unlike dense matrices, which are commonly represented as contiguous column-major or row-major arrays, sparse matrices present an additional degree of information, which allows for memory-efficient representations. In the coordinate format ,sometimes known as ``edge list’’ or COO, the non-zero elements are represented as a list of triplets [row, col, non-zero value]. The matrix/vector is described here as a single array of triplets (Array of Structs). In COO format, we first store all the non-zeros of row 0, followed by row-1 etc. This will come in handy when we perform multiplication since we can stream over the COO array and process a non-zero at a time.
In dense dot product we iterate over every element. In sparse we only iterate over the non zeros. The key difference is that we need to use match up the coordinates of the sparse vector with the dense vector (and output) and reference the non-zero positions. For instance, in the above figure notice how C[0] (the red element) is generated. We identify the coordinates of the non-zero elements in matrix A ([0,0], [0,3]). We need to reference elements 0 and 3 from the dense vector B ([0] and [3]) corresponding to the columns of the non-zeros in row 0. We then perform the multiplication A[0,0]B[0] + A[0,3]B[3].
This time, you’ll need to fill out test_sdot.s
, using the starter code and comments we’ve provided. Overall, this test should print out a single value matrix. Note that vector v0 is stored in COO format and v1 in dense row-major format.
Check yourself
Do not proceed without answering above question
If you could not answer then you will not able to complete next steps
If you cannot answer, visualize previous bullet before trying again
By default, in the starter code we’ve provided, m
point to the start of a sparse vector shown in the figure above with values 1,3,5,6,7,9.
We have stored in the COO format as an array of structs
coo matrix[6] = {\{0,0,1},{0,2,3},{0,4,5},{0,5,6},{0,6,7},{0,8,9}};
# In assembly and memory this would be laid out as
# vector v0 COO format: 0 0 1 0 2 3 0 4 5 0 5 6 0 6 7 0 8 9
# Actual vector laid out in dense format: [1 0 3 0 5 6 7 0 9]
# Dense vector v1 is [1 2 3 4 5 6 7 8 9]
# Sparse vector * dense vector = [1*1 + 3*3 + 5*5 + 6*6 + 7*7 + 9*9] = [210]
# The output of test_sdot should be **210**
$ java -jar venus.jar ./test_files/test_sdot.s
## See the screen output
## To validate
$ java -jar venus.jar ./test_files/test_sdot.s > ./out/test_sdot.out
$ diff ./out/test_sdot.out ./ref/test_sdot.out
# If diff report any lines, then check your output
$ bash ./scripts/localci.sh
# if you see SUCCESS and *.log.sucess then you passed. You can also check your *_Grade.json to see your tentative grade.
# If you see FAILED, then inspect *.log.failed. Check the failed section to see what tests failed.
Remember from the testing framework section that these sanity tests are not comprehensive, and you should rely on your own tests to decide whether your code is correct. Your score will be determined mostly by hidden tests that will be ran after the submission deadline has passed.
We will also be limiting the number of submissions you can make to the travis-ci. Each test can take up to 5-6 minutes. To give the 150-200 students a fair chance. For any given 2 hour period, you’re limited to 6 submissions.
Overall, there aren’t that many edge cases for this project. We’re mainly testing for correctness, RISC-V calling convention, and exiting with the correct code in the functions where you’ve been instructed to.
Test | Points |
---|---|
test_dot | 10 |
test_gemv | 10 |
test_read_matrix | 10 |
simple[0,1,2] | 15 (5pts each) |
mnist/[v0-8] | 80 (10pts each) |
test_sdot | 30 |
unable to venusbackend.assembler.AssemblerError: Could not find the library:
Check the path of the imported files.unable to create ./out/....trace
: Check if the out/mnist, out/simple0, out/simple1 and out/simple2 folders exist.File read errors
: Note that the file paths we specify ./
are relative and assume you are in the top folder. You should include complete path when in web-based venus
In web-based venus make sure you are at the top of the folder to invoke in the same manner as described here.
File paths if in root repo.
If you are in a different folder (e.g., test_files then relative path will be ../
)Ran for more than max allowed steps!
: Venus will automatically terminate if your program runs for too many steps. This is expected for large MNIST sized inputs, and you can workaround it with the -ms
flag. If you’re getting this for small inputs, you might have an infinite loop.Attempting to access uninitialized memory between the stack and heap.
: Your code is trying to read or write to a memory address between the stack and heap pointers, which is causing a segmentation fault. Check that you’re allocating enough memory, and that you’re accessing the correct addresses.Assembler errors
: You cannot have any .word and .data sections in your included files.# Sample execution only
cd $REPO
java -jar venus.jar -cc --retToAllA ./test_files/main_gemv.s ./inputs/simple1/bin/v.bin ./inputs/simple1/bin/m.bin -ms -1 > ./out/gemv/simple1.trace
# You forgot to save and restore a register that is expected. Check your prologue and epilogue in gemv.s
Save register s8 not correctly restored before return! Expected 0x10008030, Actual 0x00000000. ../gemv.s:91 ret
# You have not written to t3 but are trying to use it. Initialize or see what t3 should be.
# t3 is being used in line 150
150 Usage of unset register t3! ./test_files/main_gemv.s
TAs are not magicians to say what’s wrong with your code by just staring at assembly for 5-6 minutes. Since TA hours are limited per student we will be imposing the following rules for TA help.
TA help is available only for the following:
simple inputs and test_ filesYou have to run venus in command line and show there are calling convention errors and no return errors
# important venus options
-cc, --callingConvention Runs the calling convention checker.
--retToAllA If this flag is enabled, the calling convention checker will assume all `a` register can be used to return values. If this is not there, then it will assume only a0 will be returned.
You should print the registers of interest (e.g., loop counters, based addresses)
Print int
Print arrrayYou should preferrably have your program loaded to browser based venus for a TA OH.
If you do not follow the above instructions, TAs will simply direct you to follow these steps and move to the next student.
This assignment has been modified the CMPT 295 instructor.