Intel® Advisor User Guide

ID 766448
Date 3/22/2024
Public

Loop Markup to Minimize Analysis Overhead

Issue

Running your target application with Intel® Advisor can take substantially longer than running it without Intel® Advisor. The overhead added to your application execution time depends on the accuracy level and the analyses you choose for a perspective. For example:

Runtime overhead by analysis (target application runtime with Intel® Advisor compared to runtime without Intel® Advisor):

  • Survey: 1.1x longer

  • Characterization: 2 - 55x longer

  • Dependencies: 5 - 100x longer

  • Memory Access Patterns (MAP): 5 - 20x longer
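
As a rough worked example of what the multipliers above mean in practice, the sketch below scales a baseline runtime by the lower and upper bound for each analysis. The 60-second baseline and the helper name scaled are illustrative assumptions, not values or tools from Intel® Advisor.

```shell
#!/bin/sh
# scaled is a hypothetical helper: it multiplies a baseline runtime in seconds
# by an overhead multiplier and prints the result rounded to whole seconds.
scaled() {
    awk -v base="$1" -v mult="$2" 'BEGIN { printf "%.0f\n", base * mult }'
}

BASELINE=60   # assumed runtime without Intel Advisor, in seconds

echo "Survey:                 about $(scaled "$BASELINE" 1.1) s"
echo "Characterization:       $(scaled "$BASELINE" 2) to $(scaled "$BASELINE" 55) s"
echo "Dependencies:           $(scaled "$BASELINE" 5) to $(scaled "$BASELINE" 100) s"
echo "Memory Access Patterns: $(scaled "$BASELINE" 5) to $(scaled "$BASELINE" 20) s"
```

With these assumptions, a one-minute application could take anywhere from 5 to 100 minutes under a Dependencies analysis, which is why restricting the deeper analyses to a few loops matters.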

Solutions

Use the following techniques to skip uninteresting loops and analyze only interesting loops.

Select Loops by ID

Goal: Minimize collection overhead.

Applicable analyses: Characterization with Trip Counts and FLOP collection enabled, Dependencies, Memory Access Patterns.

Use when...

  • You want to perform a deeper analysis on only a few loops.

  • CLI environment: You cannot identify source file/line numbers, such as when you are analyzing a target application for which you do not have access to source code.

Note: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command line options, add them after the executable name.

Prerequisites:

  1. Run a Survey analysis.

  2. advisor CLI environment: Identify the loop IDs for the loops of interest.

    advisor --report=survey --project-dir=./advi_results -- ./myApplication

    The first column of the report lists the loop IDs.
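
Collecting the IDs by hand is error-prone when the report is long. The following is a minimal sketch of pulling them out of the first column with awk. It assumes that the report is whitespace-separated text with a numeric ID at the start of each loop row. The helper name loop_ids_matching and the sample rows are made up for illustration and do not reproduce the real report layout.

```shell
#!/bin/sh
# loop_ids_matching is a hypothetical helper: it reads report text on stdin and
# prints a comma-separated list of the numeric IDs found in the first column of
# every row that contains the given substring.
loop_ids_matching() {
    awk -v pat="$1" '
        $1 ~ /^[0-9]+$/ && index($0, pat) > 0 {
            ids = (ids == "" ? "" : ids ",") $1
        }
        END { print ids }'
}

# Made-up, simplified rows standing in for a Survey report.
sample_report() {
    printf '%s\n' \
        'ID  Function Call Sites and Loops      Self Time' \
        '5   [loop in solve at solver.cpp]      1.2s' \
        '10  [loop in main at main.cpp]         0.8s' \
        '12  [loop in solve_step at solver.cpp] 0.3s'
}

# Build a value for the --select option from the rows that mention "solve".
echo "--select=$(sample_report | loop_ids_matching solve)"
```

With a real project, the output of the advisor --report=survey command would take the place of sample_report, after checking that the rows match the assumed layout.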

TIP:

Intel® Advisor reports tend to be very wide. Do one of the following to generate readable reports:

  • Set your console width appropriately to avoid line wrapping.

  • Pipe your report using the appropriate truncation command if you care only about the first few report columns.
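
One way to truncate each report row is the standard cut utility. This is only a sketch: the helper name first_chars, the width, and the sample row are illustrative assumptions.

```shell
#!/bin/sh
# first_chars is a hypothetical helper that keeps only the first N characters
# of every input line, so that wide report rows do not wrap.
first_chars() {
    cut -c "1-$1"
}

# Demonstration on a made-up wide row.
printf '%s\n' '5   [loop in solve at solver.cpp]   1.2s   Scalar   many more columns' \
    | first_chars 34

# With a real project, the report command would be piped the same way, e.g.:
#   advisor --report=survey --project-dir=./advi_results -- ./myApplication | first_chars 120
```

An interactive alternative is a pager that chops long lines instead of wrapping them, such as less -S.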

After performing the prerequisites, do one of the following:

  • For Vectorization and CPU Roofline: Mark the loop(s) of interest by enabling the associated checkbox on the Survey Report.

    Then run a Characterization with Trip Counts and FLOP collection enabled, Dependencies, or Memory Access Patterns analysis.

  • For Offload Modeling: Go to Project Properties > Performance Modeling and enter the CLI action option --select=<string> in the Other parameters field. For example, --select=5,10,12.

  • Mark the loop(s) of interest using the CLI action option --select=<string> (recommended) or --mark-up-list=<string> when running a Characterization with Trip Counts and FLOP collection enabled, Dependencies, or Memory Access Patterns analysis. For example, with the --select option:

    advisor --collect=tripcounts --flop --project-dir=./advi_results --select=5,10,12 -- ./myApplication

    The selection applies only to this --collect command. To analyze the same loops with a Dependencies or Memory Access Patterns analysis, pass the option again when you run that analysis.

NOTE:

There are two ways to select loops in the CLI environment:

  • The advisor CLI action options --mark-up-list=<string> and --select=<string> merely simulate enabling a GUI checkbox when used with the --collect action. They are active only for the duration of the --collect command.

  • The same options used with the advisor CLI action --mark-up-loops actually enable a GUI checkbox. They remain active beyond the duration of the --mark-up-loops command and apply to all downstream analyses, such as Characterization with Trip Counts and FLOP collection enabled, Dependencies, and Memory Access Patterns.
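
The difference between the two approaches can be sketched as two command sequences. The script below is a dry run by default: ADVISOR is a hypothetical switch that only prints each command unless you set ADVISOR=advisor in the environment. The loop IDs are the example values used above, and --collect=map is the CLI name of the Memory Access Patterns collection.

```shell
#!/bin/sh
# ADVISOR is a hypothetical switch: by default the commands are only printed.
# Set ADVISOR=advisor in the environment to actually run them.
ADVISOR="${ADVISOR:-echo advisor}"
PROJECT=./advi_results
APP=./myApplication
LOOPS=5,10,12

# Transient selection: the option must be repeated on every --collect command.
transient_selection() {
    $ADVISOR --collect=dependencies --select="$LOOPS" --project-dir="$PROJECT" -- "$APP"
    $ADVISOR --collect=map --select="$LOOPS" --project-dir="$PROJECT" -- "$APP"
}

# Persistent selection: mark the loops once, then run later analyses without --select.
persistent_selection() {
    $ADVISOR --mark-up-loops --select="$LOOPS" --project-dir="$PROJECT" -- "$APP"
    $ADVISOR --collect=dependencies --project-dir="$PROJECT" -- "$APP"
    $ADVISOR --collect=map --project-dir="$PROJECT" -- "$APP"
}

transient_selection
persistent_selection
```

The persistent form is convenient when several analyses should look at the same loops. The transient form avoids leaving a selection behind in the project.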

Select Loops by Source File/Line Number

Goal: Minimize collection overhead.

Applicable analyses: Characterization with Trip Counts and FLOP collection enabled, Dependencies, Memory Access Patterns.

Use when...

  • You want to perform a deeper analysis on only a few loops.

  • CLI environment: You are analyzing a target application for which you have access to source code and can identify source file/line numbers.

Note: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command line options, add them after the executable name.

Prerequisites:

  1. Run a Survey analysis.

  2. advisor CLI environment: If necessary, identify the source file and line number for the loops of interest.

    advisor --report=survey --project-dir=./advi_results -- ./myApplication

After performing the prerequisites, do one of the following:

  • For Vectorization and CPU Roofline: Mark the loop(s) of interest by enabling the associated checkbox on the Survey report.

    Then run a Characterization with Trip Counts and FLOP collection enabled, Dependencies, or Memory Access Patterns analysis.

  • For Offload Modeling: Go to Project Properties > Performance Modeling and enter the CLI action option --select=<string> in the Other parameters field. For example, --select=foo.cpp:34,bar.cpp:192.

  • Mark the loop(s) of interest using the CLI action option --select=<string> (recommended) or --mark-up-list=<string> for a Characterization with Trip Counts and FLOP collection enabled, Dependencies, or Memory Access Patterns analysis. For example, with the --select option:

    advisor --collect=tripcounts --flop --project-dir=./advi_results --select=foo.cpp:34,bar.cpp:192 -- ./myApplication

  • Mark the loop(s) of interest using the advisor CLI action --mark-up-loops and action option --select=<string>. For example:

    advisor --mark-up-loops --select=foo.cpp:34,bar.cpp:192 --project-dir=./advi_results -- ./myApplication

    Then run a Characterization with Trip Counts and FLOP collection enabled, Dependencies, or Memory Access Patterns analysis.

NOTE:
  • There is essentially no difference between selecting loops by ID and selecting loops by source file/line in the GUI environment. The difference is in the advisor CLI environment:

    • The advisor CLI action option --mark-up-list=<string> merely simulates enabling a GUI checkbox; therefore, the selection persists only for the duration of the --collect command.

    • The advisor CLI action --mark-up-loops with the action option --select=<string> actually enables a GUI checkbox; therefore, the selection persists beyond the duration of the --mark-up-loops command and applies to downstream analyses, such as Characterization with Trip Counts and FLOP collection enabled, Dependencies, and Memory Access Patterns.

  • If you use the --mark-up-loops CLI action to mark up loops, you can append or remove source file/line numbers for analyses run after it using the advisor CLI action options --append=<string> and --remove=<string>, respectively.
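
A hedged sketch of adjusting a persistent selection follows. It assumes that --append and --remove are passed to the --mark-up-loops action, which the text above does not state explicitly, so verify this against the CLI reference before relying on it. ADVISOR is a hypothetical dry-run switch that only prints the commands by default, and baz.cpp:77 is a made-up location.

```shell
#!/bin/sh
# ADVISOR is a hypothetical switch: by default the commands are only printed.
# Set ADVISOR=advisor in the environment to actually run them.
ADVISOR="${ADVISOR:-echo advisor}"
PROJECT=./advi_results
APP=./myApplication

adjust_selection() {
    # Mark two loops persistently.
    $ADVISOR --mark-up-loops --select=foo.cpp:34,bar.cpp:192 --project-dir="$PROJECT" -- "$APP"
    # Assumption: --append and --remove belong to the --mark-up-loops action.
    $ADVISOR --mark-up-loops --append=baz.cpp:77 --project-dir="$PROJECT" -- "$APP"
    $ADVISOR --mark-up-loops --remove=foo.cpp:34 --project-dir="$PROJECT" -- "$APP"
    # Under that assumption, this run would analyze bar.cpp:192 and baz.cpp:77.
    $ADVISOR --collect=dependencies --project-dir="$PROJECT" -- "$APP"
}

adjust_selection
```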

Select Loops by Criteria

Goal: Minimize collection overhead.

Applicable analyses: Dependencies, Memory Access Patterns.

Use when you want to perform a deeper analysis on loops chosen by criteria instead of by human input, such as when you are running the Intel® Advisor with a collection preset or using automated scripts.

To implement this in the advisor CLI environment, either run commands similar to the following one by one from the command line, or put them in a script and run the script to execute them automatically. Use the --select (recommended) or --loops option to select loops by criteria.

Note: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command line options, add them after the executable name.

For example, to analyze loop-carried dependencies in loops/functions that have the Assumed dependency present issue, use one of the following:

  • Example 1:

    advisor --collect=survey --project-dir=./advi_results -- ./myApplication
    advisor --collect=dependencies --loops="scalar,has-issue" --project-dir=./advi_results -- ./myApplication
    
  • Example 2:

    advisor --collect=survey --project-dir=./advi_results -- ./myApplication
    advisor --collect=dependencies --select="scalar,has-issue" --project-dir=./advi_results -- ./myApplication
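
For automated runs, the two steps can be wrapped in a small script. The sketch below is one possible shape, not an official workflow. The function name analyze_by_criteria and the ADVISOR dry-run switch are hypothetical, and the script only prints the commands unless ADVISOR=advisor is set in the environment.

```shell
#!/bin/sh
set -e   # stop at the first failing step, so Dependencies never runs after a failed Survey

# ADVISOR is a hypothetical switch: by default the commands are only printed.
ADVISOR="${ADVISOR:-echo advisor}"

analyze_by_criteria() {
    # $1 = selection criteria, $2 = project directory,
    # remaining arguments = application executable and its own options
    criteria=$1
    project=$2
    shift 2
    $ADVISOR --collect=survey --project-dir="$project" -- "$@"
    $ADVISOR --collect=dependencies --select="$criteria" --project-dir="$project" -- "$@"
}

analyze_by_criteria "scalar,has-issue" ./advi_results ./myApplication
```

Passing the application and its options as trailing arguments keeps them after the executable name, as the note above requires.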
    

Select Loops by Markup Algorithm

Goal: Minimize collection overhead.

Applicable analyses: Characterization with Trip Counts and FLOP collection enabled, Dependencies, Memory Access Patterns.

NOTE:
This is only applicable to the Offload Modeling perspective.

Use --select=r:markup=<algorithm> when you want to perform a deeper analysis on loops chosen by a predefined markup algorithm based on the programming model used and/or the estimated offload profitability.

If you analyze an application that runs on a CPU, use the gpu_generic algorithm. This algorithm selects all potentially profitable loops/functions for additional analyses to collect more data and make sure they can be safely offloaded.

If you analyze code regions that are already offloaded and use a specific programming model, use one of the following algorithms:

  • omp - Select OpenMP* loops.

  • icpx -fsycl - Select SYCL loops.

  • ocl - Select OpenCL™ loops.

  • daal - Select Intel® oneAPI Data Analytics Library loops.

  • tbb - Select Intel® oneAPI Threading Building Blocks loops.
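
The choice of algorithm can be sketched as a simple lookup that builds the option value. The helper name markup_for and its input labels (cpu, openmp, opencl, daal, tbb) are hypothetical. Only the algorithm names come from the text above, and the SYCL case is omitted from this sketch.

```shell
#!/bin/sh
# markup_for is a hypothetical helper that maps a label describing the code
# under analysis to one of the markup algorithm names listed above.
markup_for() {
    case "$1" in
        cpu)    echo gpu_generic ;;   # application that runs on a CPU
        openmp) echo omp ;;
        opencl) echo ocl ;;
        daal)   echo daal ;;
        tbb)    echo tbb ;;
        *)      echo "unknown programming model: $1" >&2
                return 1 ;;
    esac
}

# Build the option for an application that already uses OpenMP offload.
echo "--select=r:markup=$(markup_for openmp)"
```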

Note: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command line options, add them after the executable name.

For example, to run Offload Modeling and analyze potentially profitable code regions in detail:

  • Example 1. Use the --select=r:markup=<algorithm> option with the --collect action to select loops only for that specific analysis.

    advisor --collect=survey --project-dir=./advi_results --static-instruction-mix -- ./myApplication
    advisor --collect=tripcounts --project-dir=./advi_results --flop --cache-simulation=single  --target-device=xehpg_512xve --stacks --data-transfer=light  -- ./myApplication
    advisor --collect=dependencies --filter-reductions --loop-call-count-limit=16 --select markup=gpu_generic --project-dir=./advi_results -- ./myApplication
    advisor --collect=projection --project-dir=./advi_results

  • Example 2. Use the --select=r:markup=<algorithm> option with the --mark-up-loops action in a separate step to select loops for all analyses run after this command.

    advisor --collect=survey --project-dir=./advi_results --static-instruction-mix -- ./myApplication
    advisor --collect=tripcounts --project-dir=./advi_results --flop --cache-simulation=single  --target-device=xehpg_512xve --stacks --data-transfer=light  -- ./myApplication
    advisor --mark-up-loops --project-dir=./advi_results --select markup=gpu_generic -- ./myApplication
    advisor --collect=dependencies --filter-reductions --loop-call-count-limit=16 --project-dir=./advi_results -- ./myApplication
    advisor --collect=projection --project-dir=./advi_results

NOTE:
Currently, there is no GUI equivalent of the markup algorithms. The gpu_generic algorithm is used by default.