This page shows the supported scenarios for multi-card and multi-tile programming with the Intel® oneAPI Level Zero (Level Zero) Backend.
Root-devices
In this programming model, Intel GPUs are represented as SYCL* GPU devices, or root-devices. You can find your root-device with the sycl-ls tool, for example:
$ sycl-ls
[opencl:gpu:0] Intel® OpenCL HD Graphics, Intel® UHD Graphics 630 [0x3e92] 3.0 [21.49.21786]
[opencl:cpu:1] Intel® OpenCL, Intel® Core™ i7-8700K CPU @ 3.70GHz 2.1 [2020.11.11.0.03_160000]
[ext_oneapi_level_zero:gpu:0] Intel® Level-Zero, Intel® UHD Graphics 630 [0x3e92] 1.2 [1.2.21786]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]
sycl-ls shows the devices and platforms of all the SYCL backends that are seen by the SYCL runtime. The example above shows the CPU (managed by the OpenCL™ backend) and two GPU entries that correspond to the same physical GPU, managed by the OpenCL™ backend and the Level Zero backend respectively. You can filter the observable root-devices with the SYCL_DEVICE_FILTER environment variable, for example:
$ SYCL_DEVICE_FILTER=ext_oneapi_level_zero sycl-ls
Warning: SYCL_DEVICE_FILTER environment variable is set to level_zero. To see the correct device id, please unset SYCL_DEVICE_FILTER.
[ext_oneapi_level_zero:gpu:0] Intel® Level-Zero, Intel® UHD Graphics 630 [0x3e92] 1.2 [1.2.21786]
If there are multiple GPUs in a system, they are seen as multiple root-devices. On Linux*, you will see multiple SYCL root-devices of the same SYCL platform (representing a Level Zero driver). On Windows* you will see root-devices of multiple different SYCL platforms (Level Zero drivers).
You can use the CreateMultipleRootDevices=N and NEOReadDebugKeys=1 environment variables to emulate multiple GPU cards. For example:
$ CreateMultipleRootDevices=2 NEOReadDebugKeys=1 SYCL_DEVICE_FILTER=ext_oneapi_level_zero sycl-ls
Warning: SYCL_DEVICE_FILTER environment variable is set to ext_oneapi_level_zero. To see the correct device id, please unset SYCL_DEVICE_FILTER.
[ext_oneapi_level_zero:gpu:0] Intel® Level-Zero, Intel® UHD Graphics 630 [0x3e92] 1.2 [1.2.21786]
[ext_oneapi_level_zero:gpu:1] Intel® Level-Zero, Intel® UHD Graphics 630 [0x3e92] 1.2 [1.2.21786]
Sub-devices
Some Intel GPU hardware is composed of multiple tiles; such a root-device can be partitioned into sub-devices, each corresponding to a physical tile. For example:
try {
  vector<device> SubDevices = RootDevice.create_sub_devices<
      cl::sycl::info::partition_property::partition_by_affinity_domain>(
      cl::sycl::info::partition_affinity_domain::next_partitionable);
} catch (const cl::sycl::exception &e) {
  // Partitioning by affinity domain is not supported by this device.
}
Each call to create_sub_devices returns the same sub-devices in the same persistent order. Use the ZE_AFFINITY_MASK environment variable to control which devices and sub-devices are exposed by the Level Zero driver; for example, ZE_AFFINITY_MASK=0.0 exposes only the first tile of the first card.
The CreateMultipleSubDevices=N and NEOReadDebugKeys=1 environment variables can be used to emulate multiple tiles of a GPU.
Contexts
Contexts are used for resource isolation and sharing. A SYCL context may consist of one or multiple devices. Both root-devices and sub-devices can be within a single context, but they must belong to the same SYCL platform. A SYCL kernel_bundle created against a context with multiple devices is built for each of the root-devices in the context. For a context that consists of multiple sub-devices of the same root-device, only a single build (for that root-device) is needed.
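As a rough sketch of the single-build case, the example below creates one context over all sub-devices of a root-device and builds a kernel_bundle for that context. It assumes a recent DPC++ compiler that provides the <sycl/sycl.hpp> header and the SYCL 2020 kernel_bundle API, and a GPU that supports partitioning by affinity domain; names are illustrative.

#include <sycl/sycl.hpp>
#include <vector>
using namespace sycl;

int main() {
  device RootDevice(gpu_selector{});
  std::vector<device> SubDevices = RootDevice.create_sub_devices<
      info::partition_property::partition_by_affinity_domain>(
      info::partition_affinity_domain::next_partitionable);

  // All sub-devices come from the same root-device, hence the same SYCL platform.
  context C(SubDevices);

  // A single build (for the common root-device) serves every sub-device in C.
  auto Bundle = get_kernel_bundle<bundle_state::executable>(C);
  (void)Bundle;
  return 0;
}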
Unified Shared Memory (USM)
There are three ways to allocate memory with USM: device allocations (malloc_device), host allocations (malloc_host), and shared allocations (malloc_shared).
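As a quick illustration of the three allocation kinds, here is a minimal sketch assuming a recent DPC++ compiler and a GPU queue; sizes and names are illustrative.

#include <sycl/sycl.hpp>
using namespace sycl;

int main() {
  queue Q(gpu_selector{});

  // Device allocation: accessible only on the device; fastest for kernels,
  // but data must be copied explicitly to and from the host.
  int *Dev = malloc_device<int>(1024, Q);

  // Host allocation: resides on the host and is accessed by the device
  // directly (typically over PCIe); no explicit copies are needed.
  int *Host = malloc_host<int>(1024, Q);

  // Shared allocation: may migrate between host and device as needed.
  int *Shared = malloc_shared<int>(1024, Q);

  free(Dev, Q);
  free(Host, Q);
  free(Shared, Q);
  return 0;
}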
Buffers
SYCL buffers are also created against a context and, under the hood, are mapped to Level Zero USM allocations; the allocation for a given device in the context is made lazily, when the buffer is first used on that device.
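For illustration, a minimal sketch of a buffer in use, assuming a recent DPC++ compiler (names and sizes are illustrative): the buffer is tied to the queue's context, and its data is written back to the host vector when the buffer is destroyed.

#include <sycl/sycl.hpp>
#include <vector>
using namespace sycl;

int main() {
  queue Q(gpu_selector{});
  std::vector<int> Data(1024, 0);
  {
    // The buffer belongs to Q's context; the backing device allocation is
    // created when the kernel below first uses the buffer on that device.
    buffer<int, 1> Buf(Data.data(), range<1>(Data.size()));
    Q.submit([&](handler &cgh) {
      auto Acc = Buf.get_access<access::mode::read_write>(cgh);
      cgh.parallel_for(range<1>(Data.size()), [=](id<1> I) { Acc[I] += 1; });
    });
  } // Buffer destruction waits for the kernel and copies the data back to Data.
  return 0;
}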
Queues
A SYCL queue is always attached to a single device in a potentially multi-device context. Four example scenarios, ordered from most to least performant, are shown below:
vector<device> SubDevices = ...;
for (auto &D : SubDevices) {
  // Each queue is in its own context; no data sharing across them.
  auto Q = queue(D);
  Q.submit([&](handler &cgh) {...});
}
vector<device> SubDevices = ...;
auto C = context(SubDevices);
for (auto &D : SubDevices) {
  // All queues share the same context; data can be shared across queues.
  auto Q = queue(C, D);
  Q.submit([&](handler &cgh) {...});
}
// The queue is attached to the root-device; the driver distributes work to the sub-devices, if any.
auto D = device(gpu_selector{});
auto Q = queue(D);
Q.submit([&](handler &cgh) {...});
auto P = platform(gpu_selector{});
auto RootDevices = P.get_devices();
auto C = context(RootDevices);
for (auto &D : RootDevices) {
  // The context spans multiple root-devices; data can be shared across
  // the cards, but sharing requires explicit copying.
  auto Q = queue(C, D);
  Q.submit([&](handler &cgh) {...});
}
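To illustrate the explicit copying mentioned in the comment above, here is a minimal, hedged sketch: it assumes at least two Level Zero root-devices placed in one context, allocates USM device memory on each card, and transfers the data between them with queue::memcpy (device indices, sizes, and names are illustrative).

#include <sycl/sycl.hpp>
using namespace sycl;

int main() {
  auto P = platform(gpu_selector{});
  auto RootDevices = P.get_devices();
  if (RootDevices.size() < 2)
    return 0;  // The sketch needs at least two cards.
  auto C = context(RootDevices);
  queue Q0(C, RootDevices[0]);
  queue Q1(C, RootDevices[1]);

  int *Src = malloc_device<int>(1024, Q0);  // Lives on card 0.
  int *Dst = malloc_device<int>(1024, Q1);  // Lives on card 1.

  // Both allocations belong to the same context, so an explicit copy
  // between the two cards can be submitted to either queue.
  Q0.memcpy(Dst, Src, 1024 * sizeof(int)).wait();

  free(Src, C);
  free(Dst, C);
  return 0;
}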