Member-only story

How to Reduce II in HLS: Part 4

Mohammad Hosseinabady
1 min readMay 24, 2021

This week’s problem is the traditional matrix-vector multiplication kernel used in several applications such as machine learning and image processing.

This figure shows the software-oriented code that describes the kernel.

The code consists of a two-level loop nest that reads the matrix A and vector

and generates the output elements located in

If we assume n is 4096 and m is 2048, then this figure shows the synthesis report after synthesising the code with Vitis 2020.2.

As can be seen, the first loop is not pipelined, but the inner loop is pipelined with the initiation interval of 1.

After executing the code on Ultra96v2 using the Vitis-2020.2 software tool, the execution time would be 683.491 ms.

Now the question is, “How can we improve the performance?”

Please follow the solution in the video.

Originally published at http://highlevel-synthesis.com on May 24, 2021.

Create an account to read the full story.

The author made this story available to Medium members only.
If you’re new to Medium, create a new account to read this story on us.

Or, continue in mobile web

Already have an account? Sign in

Mohammad Hosseinabady
Mohammad Hosseinabady

Written by Mohammad Hosseinabady

Designing digital systems and accelerating functions with HLS for FPGAs are fun.

No responses yet

Write a response