Intel® FPGA SDK for OpenCL™ Standard Edition: Programming Guide

ID 683342
Date 4/22/2019
Public

5.2.4. Loop Concurrency (max_concurrency Pragma)

You can use the max_concurrency pragma to limit the concurrency of a loop in your kernel. The concurrency of a loop is the number of iterations of that loop that can be in progress at one time. By default, the Intel® FPGA SDK for OpenCL™ tries to maximize the concurrency of loops so that your kernel runs at peak throughput.

The max_concurrency pragma applies to single work-item kernels (that is, single-threaded kernels) in which loops are pipelined. Refer to the Single Work-Item Kernel versus NDRange Kernel section of the Intel® FPGA SDK for OpenCL™ Standard Edition Best Practices Guide for information on loop pipelining, and on kernel properties that drive the offline compiler's decision on whether to treat a kernel as single-threaded.

The max_concurrency pragma enables you to control the on-chip memory resources required to implement your loop. To achieve simultaneous execution of loop iterations, the offline compiler must create independent copies of any memory that is private to a single iteration. The greater the permitted concurrency, the more copies the compiler must make.
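The relationship between concurrency and memory copies can be sketched as simple arithmetic. The helper below is hypothetical (it is not part of the SDK) and assumes one full copy of the private array per concurrent iteration. The footprint that the offline compiler actually reports may differ, because it depends on how the memory is mapped to on-chip RAM blocks.

```c
#include <stddef.h>

/* Hypothetical helper, not an SDK function: rough number of bytes needed
   for a loop's private array when `concurrency` iterations are in flight,
   assuming one independent copy of the array per iteration. */
size_t private_copy_bytes(size_t elems, size_t elem_size, size_t concurrency)
{
    return elems * elem_size * concurrency;
}
```

For example, under this assumption a private array of 1024 four-byte integers needs 32768 bytes at a concurrency of 8, but only 4096 bytes at a concurrency of 1.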

The kernel's HTML report (report.html) provides the following information pertaining to loop concurrency:

  • Maximum concurrency that the offline compiler has chosen

    This information is available in the Loop Analysis report. A message in the Details pane reports that the maximum number of simultaneous executions has been limited to N.

  • Impact on memory usage

    This information is available in the Area Analysis report. A message in the Details pane reports that the offline compiler has created N independent copies of the memory to enable simultaneous execution of N loop iterations.

If you want to trade some performance for physical memory savings, apply #pragma max_concurrency <N> to the loop, as shown below. When you apply this pragma, the offline compiler limits the number of simultaneously executed loop iterations to N. The number of independent copies of loop memories is also reduced to N.

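// At most one iteration in flight, so only one copy of arr is needed.
// Here N is the loop bound, not the pragma argument.
#pragma max_concurrency 1
for (int i = 0; i < N; i++) {
  int arr[M];  // private to a single iteration
  // Doing work on arr
}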
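To show the pragma in context, the following is a minimal sketch of a complete single work-item kernel that uses an intermediate concurrency value. The kernel name sum_rows, its arguments, and the array size M are hypothetical and chosen only for illustration. The shim at the top is also an assumption: it lets the same source compile as plain C for a quick functional check on the host, where the pragma is simply ignored.

```c
/* Shim so the sketch also compiles as plain C; an OpenCL compiler
   defines __OPENCL_VERSION__ and skips these definitions. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

#define M 16  /* hypothetical size of the per-iteration private array */

/* Hypothetical single work-item kernel: writes one sum per outer iteration. */
__kernel void sum_rows(__global const int *restrict in,
                       __global int *restrict out,
                       int n)
{
    /* Allow at most two outer iterations in flight, so at most
       two independent copies of arr are required. */
    #pragma max_concurrency 2
    for (int i = 0; i < n; i++) {
        int arr[M];  /* private to a single outer iteration */

        /* Load one row of the input into the private array. */
        for (int j = 0; j < M; j++) {
            arr[j] = in[i * M + j];
        }

        /* Reduce the private array to a single value. */
        int sum = 0;
        for (int j = 0; j < M; j++) {
            sum += arr[j];
        }
        out[i] = sum;
    }
}
```

The pragma does not change the results that the kernel computes. It changes only how many iterations of the outer loop can overlap, and therefore how many copies of arr the offline compiler needs to create. The Loop Analysis and Area Analysis reports remain the authoritative source for the concurrency and memory usage that the compiler actually chose.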