Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 12/19/2022
Public
Document Table of Contents

4.5. Aligning a Struct with or without Padding

A properly aligned struct helps the Intel® FPGA SDK for OpenCL™ Offline Compiler generate the most efficient hardware.
A proper struct alignment means that the alignment can be evenly divided by the struct size.
Important: Ensure a 4-byte alignment for the data structures. struct alignments smaller than four bytes result in larger and slower hardware. Hardware efficiency increases with the increasing alignment. In the following example, the Pixel_s structure is only one-byte aligned but the Pixel structure is four-byte aligned due to the presence of a four-byte not_used integer:
typedef struct {
	char r,g,b,alpha;
} Pixel_s;

typedef union {
     Pixel_s p;
     int not_used;
} Pixel;
You can also use the aligned attribute to force a 4-byte alignment, as shown in the following example code:
typedef struct {
       char r,g,b,alpha;
} __attribute__((aligned(4))) Pixel;

The offline compiler conforms with the ISO C standard that requires the alignment of a struct to satisfy all of the following criteria:

  • The alignment must be an integer multiple of the lowest common multiple between the alignments of all struct members.
  • The alignment must be a power of two.

You may set the struct alignment by including the aligned(N) attribute in your kernel code. Without an aligned attribute, the offline compiler determines the alignment of each struct in an array of struct based on the size of the struct. Consider the following example:

__kernel void test (struct mystruct* A,
                    struct mystruct* B)
{
    A[get_global_id(0)] = B[get_global_id(0)];
}

If the size of mystruct is 101 bytes, each load or store access is 1-byte aligned. If the size of mystruct is 128 bytes, each load or store access is 128-byte aligned, which generates the most efficient hardware.

When the struct fields are not aligned within the struct, the offline compiler inserts padding to align them. Inserting padding between struct fields affects hardware efficiency in the following manner:

  • Increases the size of the struct
  • Might affect the alignment

To prevent the offline compiler from inserting padding, include the packed attribute in your kernel code. The aforementioned ISO C standard applies when determining the alignment of a packed or unpacked struct. Consider the following example:

struct mystruct1
{
    char a;
    int b;
};

The size of mystruct1 is 8 bytes. Therefore, the struct is 8-byte aligned, resulting in efficient accesses in the kernel. Now consider another example:

struct mystruct2
{
    char a;
    int b;
    int c;
};

The size of mystruct2 is 12 bytes and the struct is 4-byte aligned. Because the struct fields are padded and the struct is unaligned, accesses in the kernel are inefficient.

Following is an example of a struct that includes the packed attribute:

struct __attribute__((packed)) mystruct3
{
    char a;
    int b;
    int c;
};

The size of mystruct3 is 16 bytes. Because mystruct3 is aligned and there is no padding between struct fields, accesses in this kernel are more efficient than accesses in mystruct3.

To include both the aligned(N) and packed attributes in a struct, consider the following example:

struct __attribute__((packed)) __attribute__((aligned(16))) mystruct5
{
    char a;
    int b;
    int c;
};

The size of mystruct5 is 9 bytes. Because of the aligned(16) attribute, the struct is stored at 16-byte aligned addresses in an array. Because mystruct5 is 16-byte aligned and has no padding, accesses in this kernel is efficient.

For more information about struct alignment and the aligned(N) and packed attributes, refer to the following documents:

  • Section 6.11.1 of the OpenCL Specification version 1.2
  • Disabling Insertion of Data Structure Padding section of the Intel® FPGA SDK for OpenCL™ Programming Guide
  • Specifying the Alignment of a Struct section of the Intel® FPGA SDK for OpenCL™ Programming Guide