Math Kernel Library Developer Guide

Getting Started with Intel Optimized HPCG

To start working with the benchmark, follow these steps:

  1. On a cluster file system, unpack the Intel Optimized HPCG package to a directory accessible by all nodes. Read and accept the license as indicated in the readme.txt file included in the package.

  2. Change the directory to hpcg/bin.

  3. Determine which prebuilt version of the benchmark is best for your system, or follow the QUICKSTART instructions to build a version of the benchmark for your MPI implementation.

  4. Ensure that the Intel® oneAPI Math Kernel Library, Intel C/C++ Compiler, and MPI run-time environments are set up properly. You can do this using the scripts vars.sh, compilervars.sh, and mpivars.sh that are included in those distributions.

  5. Run the chosen version of the benchmark.

    • The Intel AVX and Intel AVX2 optimized versions perform best with one MPI process per socket and one OpenMP* thread per core, skipping simultaneous multithreading (SMT) threads. To get this placement, set the affinity to KMP_AFFINITY=granularity=fine,compact,1,0. For example, for a 128-node cluster with two Intel® Xeon® Processor E5-2697 v4 per node (256 sockets with 18 cores each), run the executable as follows:
      #> mpiexec.hydra -n 256 -ppn 2 env OMP_NUM_THREADS=18 KMP_AFFINITY=granularity=fine,compact,1,0 ./bin/xhpcg_avx2 -n192
      
    • The Intel® Xeon® Phi processor optimized version performs best with four MPI processes per processor and two threads per processor core, with SMT turned on. For example, for a 128-node cluster with one Intel® Xeon® Phi processor 7250 per node (512 MPI processes in total, four per node), run the executable as follows:
      #> mpiexec.hydra -n 512 -ppn 4 env OMP_NUM_THREADS=34 KMP_AFFINITY=granularity=fine,compact,1,0 ./bin/xhpcg_knl -n160

  6. When the benchmark completes execution, which usually takes a few minutes, find the YAML file with official results in the current directory. The performance rating of the benchmarked system is in the last section of the file:

    HPCG result is VALID with a GFLOP/s rating of: [GFLOP/s]
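
The launch parameters in step 5 follow from simple arithmetic on the cluster shape, and the rating in step 6 can be read from the results file with a one-line filter. The following is a minimal sketch of both, not part of the benchmark package: the function names hpcg_launch_cmd and hpcg_rating and their argument order are hypothetical, and the second helper assumes that the numeric rating follows the colon on the line shown above. The first helper only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not shipped with the benchmark.

# Print an mpiexec.hydra command line for a given cluster shape.
# Arguments: nodes, MPI processes per node, cores per node,
#            OpenMP threads per core, executable, local problem size.
hpcg_launch_cmd() {
    nodes=$1; ppn=$2; cores_per_node=$3; threads_per_core=$4; exe=$5; size=$6
    # Total number of MPI processes across the cluster (value for -n).
    total_procs=$((nodes * ppn))
    # Each MPI process gets an equal share of the cores on its node.
    omp_threads=$((cores_per_node / ppn * threads_per_core))
    echo "mpiexec.hydra -n $total_procs -ppn $ppn env OMP_NUM_THREADS=$omp_threads KMP_AFFINITY=granularity=fine,compact,1,0 $exe -n$size"
}

# Print the GFLOP/s rating found in a results file.
# Argument: path to the YAML file written by the benchmark.
# Assumption: the value follows "GFLOP/s rating of:" on the same line.
hpcg_rating() {
    sed -n 's|.*GFLOP/s rating of:[[:space:]]*||p' "$1" | tail -n 1
}

# Example: 128 nodes, one MPI process per socket (2 per node),
# 2 x 18 cores per node, one OpenMP thread per core.
hpcg_launch_cmd 128 2 36 1 ./bin/xhpcg_avx2 192
```

For a system with SMT turned on, pass the number of threads you want on each core as the fourth argument. To read the rating, pass the path of the YAML file from the current directory to hpcg_rating.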
