Tuning Recipes

These recipes explore typical application performance problems that you can detect with Intel® VTune™ Profiler or its predecessor, Intel® VTune™ Amplifier. Use the guidance in these recipes to optimize performance.

Recipe

Description

Cache-Related Latency Issues in Segmented Cache Environment

Use Cache Allocation Technology (CAT) to address cache-related latency issues (cache misses) when a cache is split between cores.

False Sharing

Profile a memory-bound linear_regression application with the Microarchitecture Exploration and Memory Access analyses in Intel® VTune™ Profiler to detect false sharing.

Frequent DRAM Accesses

Profile a memory-bound matrix application with the Microarchitecture Exploration and Memory Access analyses in Intel® VTune™ Profiler. Understand the cause of frequent DRAM accesses.

Poor Port Utilization

Profile a core-bound matrix application with the Microarchitecture Exploration analysis. Understand the cause of poor port utilization.

Page Faults

Identify and measure the impact of page faults on target application performance. Use the Microarchitecture Exploration, System Overview, and Memory Access analyses in Intel® VTune™ Profiler.

Instruction Cache Misses

Profile a front-end-bound application with the Microarchitecture Exploration analysis in Intel® VTune™ Profiler. Use profile-guided optimization (PGO) to reduce instruction cache (ICache) misses.

Inefficient Synchronization

Locate inefficient synchronization in your code by running the Advanced Hotspots analysis with stack collection enabled.

Inefficient TCP/IP Synchronization

Locate inefficient TCP/IP synchronization in your code by running the Locks and Waits analysis in Intel® VTune™ Profiler with task collection enabled.

OS Thread Migration

Identify OS thread migration on a NUMA architecture with the Hotspots analysis in Intel® VTune™ Profiler.

OpenMP* Imbalance and Scheduling Overhead

Detect and fix common parallel bottlenecks in OpenMP programs, such as imbalance on barriers and scheduling overhead.

Processor Cores Underutilization: OpenMP* Serial Time

Identify the fraction of serial execution in an application parallelized with OpenMP. Find additional opportunities for parallelization and improve the scalability of the application.

Scheduling Overhead in Intel® Threading Building Blocks (Intel® TBB) Apps

Detect and fix scheduling overhead for an Intel® TBB application.

PMDK Application Overhead

Detect and fix memory access overhead in an application based on the Persistent Memory Development Kit (PMDK).
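To see why the serial fraction targeted by the OpenMP Serial Time recipe limits scalability, it can help to plug a measured run into Amdahl's law. The sketch below is a general model, not a VTune Profiler feature, and the function names are hypothetical. It assumes you have measured the serial time and the total elapsed time of one run on a known number of threads, and that the parallel part of that run scaled perfectly (it ignores imbalance and scheduling overhead, which the other OpenMP recipes address).

```python
def serial_fraction_from_run(serial_time_s: float, elapsed_time_s: float, n_threads: int) -> float:
    """Estimate the Amdahl serial fraction from one multithreaded run.

    Assumption (hypothetical model): the parallel part of the run scaled
    perfectly, so its single-thread work equals its wall time * n_threads.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    if elapsed_time_s <= 0 or not 0.0 <= serial_time_s <= elapsed_time_s:
        raise ValueError("need 0 <= serial_time_s <= elapsed_time_s and elapsed_time_s > 0")
    parallel_wall_s = elapsed_time_s - serial_time_s
    single_thread_total_s = serial_time_s + parallel_wall_s * n_threads
    return serial_time_s / single_thread_total_s


def amdahl_speedup(serial_fraction: float, n_threads: int) -> float:
    """Upper bound on speedup when only the parallel fraction scales with n_threads."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_threads)


if __name__ == "__main__":
    # Example run: 2 s of serial time out of 5 s elapsed on 4 threads.
    f = serial_fraction_from_run(2.0, 5.0, 4)
    for n in (4, 16, 64):
        print(f"{n:3d} threads: speedup <= {amdahl_speedup(f, n):.2f}")
```

With these example numbers the serial fraction is 1/7, so the speedup bound approaches 7 no matter how many threads are added; under this model, shrinking serial time raises the ceiling, while adding cores does not.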