These recipes explore typical application performance problems that you can detect with Intel® VTune™ Profiler or its predecessor, Intel® VTune™ Amplifier. Use the guidance in these recipes to optimize performance.
Recipe | Description |
---|---|
Cache-Related Latency Issues in Segmented Cache Environment | Use Cache Allocation Technology (CAT) to handle cache-related latency issues (cache misses) when you split a cache between cores. |
False Sharing | Profile a memory-bound linear_regression application using the Microarchitecture Exploration and Memory Access analyses in Intel® VTune™ Profiler. |
Frequent DRAM Accesses | Profile a memory-bound matrix application using the Microarchitecture Exploration and Memory Access analyses in Intel® VTune™ Profiler. Understand the cause for frequent DRAM accesses. |
Poor Port Utilization | Profile a core-bound matrix application using the Microarchitecture Exploration analysis. Understand the cause for poor port utilization. |
Page Faults | Identify and measure the impact of page faults on target application performance. Use the Microarchitecture Exploration, System Overview, and Memory Access analyses in Intel® VTune™ Profiler. |
Instruction Cache Misses | Profile a front-end-bound application using the Microarchitecture Exploration analysis in Intel® VTune™ Profiler. Use a profile-guided optimization (PGO) compiler option to reduce instruction cache (ICache) misses. |
Inefficient Synchronization | Locate inefficient synchronization in your code by running the Advanced Hotspots analysis with the stack collection enabled. |
Inefficient TCP/IP Synchronization | Locate inefficient TCP/IP synchronization in your code by running the Locks and Waits analysis in Intel® VTune™ Profiler, with the task collection enabled. |
OS Thread Migration | Identify OS thread migration on NUMA architectures with the Hotspots analysis in Intel® VTune™ Profiler. |
OpenMP* Imbalance and Scheduling Overhead | Detect and fix common parallel bottlenecks in OpenMP programs, such as imbalance on barriers and scheduling overhead. |
Processor Cores Underutilization: OpenMP* Serial Time | Identify the fraction of serial execution in an application parallelized with OpenMP. Find additional opportunities for parallelization, and improve the scalability of the application. |
Scheduling Overhead in Intel® Threading Building Blocks (Intel® TBB) Apps | Detect and fix scheduling overhead for an Intel® TBB application. |
PMDK Application Overhead | Detect and fix memory access overhead in an application based on the Persistent Memory Development Kit (PMDK). |
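
As a rough illustration of the kind of problem the False Sharing recipe targets, the sketch below contrasts per-thread counters packed into adjacent memory with counters padded to separate cache lines. It is not taken from the recipe itself: the names `PackedCounter`, `PaddedCounter`, and `count_in_parallel` are hypothetical, and the 64-byte cache line size is an assumption that holds on many x86 processors but should be checked for your target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (typical on x86, not guaranteed).
constexpr std::size_t kCacheLine = 64;

// Packed layout: neighboring counters can land on the same cache line,
// so writes from different threads may invalidate each other's line.
struct PackedCounter {
    std::uint64_t value = 0;
};

// Padded layout: alignas forces each counter onto its own cache line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

static_assert(sizeof(PaddedCounter) == kCacheLine,
              "each padded counter should occupy one full cache line");

// Each thread increments only its own counter. The result is the same
// for both layouts; only the memory traffic differs.
template <typename Counter>
std::uint64_t count_in_parallel(unsigned num_threads, std::uint64_t iterations) {
    assert(num_threads > 0);
    std::vector<Counter> counters(num_threads);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, iterations] {
            for (std::uint64_t i = 0; i < iterations; ++i) {
                counters[t].value += 1;
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }

    std::uint64_t total = 0;
    for (const auto& counter : counters) {
        total += counter.value;
    }
    return total;
}
```

Calling `count_in_parallel<PackedCounter>` and `count_in_parallel<PaddedCounter>` with the same arguments returns the same total. Any run-time difference between the two depends on the hardware and on compiler optimizations, which is why a profiler is the right tool to confirm whether false sharing is actually present.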