How I Got TensorFlow and PyTorch Working on an Intel Arc A770 GPU

Recently I replaced my Jankinator 1000 with an Intel Arc A770 16GB card. While this card has 16GB of memory versus 24GB, it’s a lot faster and, well, it is not an NVIDIA card. Plus, I can do things like mixed precision processing and modify batch sizes to cope with the loss of 8GB. I will spare you my thoughts on certain companies and their monopolies in Deep Learning systems.

I thought I would write up this post since, well, some Intel documentation is a bit scattered and is not the easiest to follow. Plus some of it will end up messing with your system if you are not careful. For reference, I’m running Linux Mint 22, which is based on Ubuntu 24.04 Noble.

So for the standard disclaimer, these instructions got everything working for me.  I make absolutely no guarantees that they will work for you.  If your house burns down or a portal opens and Cthulhu appears, don’t blame me.

Drivers

First off, make sure you are running kernel version 6.8.0-41 or later. A patch that went into some earlier kernels caused a regression in the compute engines on the Arc GPUs (https://github.com/intel/compute-runtime/issues/726). This has been fixed in more recent kernels on Ubuntu. If you are on Linux Mint like I am (which I highly recommend), you should have the latest HWE kernel. If not, go ahead and install it first.
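If you would rather check the kernel version from a script than eyeball it, here is a minimal sketch. It assumes an Ubuntu-style release string such as 6.8.0-41-generic, and the helper names are just ones I made up for illustration.

```python
import platform
import re

MINIMUM = (6, 8, 0, 41)  # kernel 6.8.0-41, the version named above


def parse_release(release):
    """Turn a release string such as '6.8.0-41-generic' into (6, 8, 0, 41)."""
    numbers = re.findall(r"\d+", release)[:4]
    # Pad short strings like '6.11' with zeros so tuples compare cleanly.
    return tuple(int(n) for n in numbers) + (0,) * (4 - len(numbers))


def kernel_is_new_enough(release, minimum=MINIMUM):
    """Compare the parsed release against the minimum, element by element."""
    return parse_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()
    verdict = "OK" if kernel_is_new_enough(release) else "too old, upgrade first"
    print(f"{release}: {verdict}")
```

Other distributions format their release strings differently, so treat the result as a hint rather than a guarantee.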

Next we will follow Intel’s documentation at https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/experimental/install_for_arc_gpu.md. Note that it does not currently mention Ubuntu 24.04, but trust me, it works. We will mostly follow their steps to properly install the drivers, just with a few changes.

Core Drivers

First set up the gpg key for the Intel repo.

sudo apt-get install -y gpg-agent wget
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg

Now we add the Intel GPU repository. We differ from their instructions here because, while the Noble repository is not mentioned, trust me, it is there.

echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu noble unified" | sudo tee /etc/apt/sources.list.d/intel-gpu-noble.list
sudo apt-get update

Next you will need to install the proper packages.

sudo apt install intel-opencl-icd libze1 intel-level-zero-gpu-raytracing intel-media-va-driver-non-free libmfx1 libmfxgen1 libvpl2 libegl-mesa0 libegl1 libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo clinfo

This is another difference from the Intel documentation. libze1 has replaced intel-level-zero-gpu, although intel-level-zero-gpu-raytracing is still around. Also libegl1-mesa seems to have been renamed to libegl1 except for the dev package.

You should probably reboot now so the new drivers get loaded. Note that the intel-media-va-driver-non-free driver contains some extra functionality that the fully open source version does not.
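Once the machine is back up, a quick sanity check is to see whether the kernel exposed a render node for the card and whether your user can open it. This is only a sketch. It assumes the usual /dev/dri/renderD* naming, and the hint about the render group is my assumption about how Ubuntu-based systems assign permissions.

```python
import glob
import os


def render_nodes(dev_dir="/dev/dri"):
    """Return the DRM render nodes (renderD128, renderD129, ...) under dev_dir."""
    return sorted(glob.glob(os.path.join(dev_dir, "renderD*")))


def accessible(path):
    """True if the current user can open the node for reading and writing."""
    return os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    nodes = render_nodes()
    if not nodes:
        print("No render nodes found; the GPU driver may not be loaded.")
    for node in nodes:
        # Assumption: access is normally granted through the 'render' group.
        state = "accessible" if accessible(node) else "no permission (check the render group)"
        print(f"{node}: {state}")
```

If you have more than one GPU (an integrated one, for example), you will see more than one node, and this check does not tell you which one is the Arc.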

oneAPI

Now we go ahead and follow their instructions for setting up the Intel oneAPI repository.

wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | sudo gpg --dearmor --output /usr/share/keyrings/oneapi-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt-get update

Here we also stray a bit from their documentation. I have found we need to install a LOT of packages from oneAPI to make sure everything works, including their TensorFlow and PyTorch extensions.

sudo apt install intel-oneapi-runtime-dpcpp-cpp intel-oneapi-runtime-mkl intel-oneapi-common-oneapi-vars-2024.2 intel-oneapi-common-licensing-2024.2 intel-oneapi-common-vars intel-oneapi-dpcpp-ct-2024.2 intel-oneapi-mkl-cluster-2024.2 intel-oneapi-tbb-common-devel-2021.13 intel-oneapi-compiler-shared-2024.2 intel-oneapi-dal-2024.6 intel-oneapi-mkl-sycl-include-2024.2 intel-oneapi-mkl-sycl-2024.2 intel-oneapi-ippcp-common-devel-2021.12 intel-basekit-getting-started-2024.2 intel-oneapi-tlt-2024.2 intel-oneapi-advisor intel-oneapi-icc-eclipse-plugin-cpp-2024.2 intel-oneapi-openmp-common-2024.2 intel-oneapi-compiler-dpcpp-cpp-runtime-2024.2 intel-oneapi-mkl-sycl-vm-2024.2 intel-oneapi-mkl-devel-2024.2 intel-oneapi-mkl-sycl-devel-common-2024.2 intel-oneapi-ippcp-common-2021.12 intel-oneapi-dal-common-2024.6 intel-oneapi-mpi-devel-2021.13 intel-oneapi-ipp-2021.12 intel-oneapi-openmp-2024.2 intel-oneapi-dev-utilities-eclipse-cfg-2024.2 intel-oneapi-ipp-common-2021.12 intel-oneapi-tbb-2021.13 intel-oneapi-mkl-cluster-devel-common-2024.2 intel-oneapi-ipp-devel-2021.12 intel-oneapi-compiler-shared-runtime-2024.2 intel-oneapi-compiler-dpcpp-eclipse-cfg-2024.2 intel-oneapi-mkl-sycl-devel-2024.2 intel-oneapi-mkl-classic-include-2024.2 intel-oneapi-vtune intel-oneapi-mkl-core-2024.2 intel-oneapi-mkl-cluster-devel-2024.2 intel-oneapi-mkl-sycl-stats-2024.2 intel-oneapi-compiler-shared-common-2024.2 intel-oneapi-diagnostics-utility-2024.2 intel-oneapi-mkl-sycl-sparse-2024.2 intel-basekit-env-2024.2 intel-oneapi-libdpstd-devel-2022.6 intel-oneapi-mkl-sycl-blas-2024.2 intel-oneapi-dev-utilities-2024.2 intel-oneapi-dal-devel-2024.6 intel-oneapi-tbb-common-2021.13 intel-oneapi-dnnl-2024.2 intel-oneapi-mpi-2021.13 intel-oneapi-compiler-dpcpp-cpp-2024.2 intel-oneapi-mkl-classic-devel-2024.2 intel-oneapi-mkl-core-common-2024.2 intel-oneapi-ipp-common-devel-2021.12 libssl-dev intel-oneapi-mkl-core-devel-2024.2 intel-oneapi-mkl-sycl-lapack-2024.2 intel-oneapi-mkl-core-devel-common-2024.2 intel-oneapi-compiler-dpcpp-cpp-common-2024.2 intel-oneapi-mkl-sycl-data-fitting-2024.2 intel-oneapi-tbb-devel-2021.13 intel-oneapi-dpcpp-cpp-2024.2 intel-oneapi-dal-common-devel-2024.6 intel-oneapi-tcm-1.1 intel-oneapi-dpcpp-ct-eclipse-cfg-2024.2 intel-oneapi-ccl-devel-2021.13 intel-oneapi-dnnl intel-oneapi-compiler-cpp-eclipse-cfg-2024.2 intel-oneapi-mkl-sycl-rng-2024.2 intel-oneapi-mkl-classic-include-common-2024.2 intel-oneapi-ippcp-2021.12 intel-oneapi-dnnl-devel-2024.2 intel-oneapi-ccl-2021.13 intel-oneapi-mkl-sycl-dft-2024.2 intel-oneapi-dpcpp-debugger-2024.2 intel-oneapi-ippcp-devel-2021.12 intel-basekit-2024.2

Yes that is a lot of packages, but you will need them eventually.

Now add these statements in your .bashrc to ensure that all the necessary environment variables are set:

# Intel stuff
source /opt/intel/oneapi/setvars.sh
source /opt/intel/oneapi/compiler/latest/env/vars.sh
source /opt/intel/oneapi/mkl/latest/env/vars.sh

Save your .bashrc file and now whenever you open a terminal you should see something like this:

:: initializing oneAPI environment ...
bash: BASH_VERSION = 5.2.21(1)-release
args: Using "$@" for setvars.sh arguments:
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vtune -- latest
:: oneAPI environment initialized ::
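If a script or cron job misbehaves later, it is often because it ran in a shell that never sourced those files. Here is a small sketch that reports which oneAPI variables are missing from the current environment. The list of variable names is my assumption about what the scripts export on a typical install, so adjust it to match what you actually see in your shell.

```python
import os

# Assumption: names I expect setvars.sh and the component vars.sh scripts
# to export. Check with `env | grep -i oneapi` and edit as needed.
EXPECTED = ("ONEAPI_ROOT", "SETVARS_COMPLETED", "CMPLR_ROOT", "MKLROOT")


def missing_oneapi_vars(env, expected=EXPECTED):
    """Return the expected variable names that are unset or empty in env."""
    return [name for name in expected if not env.get(name)]


if __name__ == "__main__":
    missing = missing_oneapi_vars(os.environ)
    if missing:
        print("Missing oneAPI variables:", ", ".join(missing))
    else:
        print("oneAPI environment looks initialized.")
```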

If you want, you can run clinfo to make sure OpenCL is working on the Arc.

bmaddox@sdf1:~$ clinfo
Number of platforms 2
Platform Name Intel(R) OpenCL
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 3.0 LINUX
Platform Profile FULL_PROFILE
......
Platform Name Intel(R) OpenCL Graphics
Number of devices 1
Device Name Intel(R) Arc(TM) A770 Graphics
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 3.0 NEO

I’ve abbreviated a lot of output, but you should see the Arc listed as a device under one of the platforms after running clinfo.
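Since the full clinfo output is very long, a few lines of Python can pull out just the device names for you. This is a rough sketch that assumes the "Device Name" label shown in the output above, and the function names are my own.

```python
import shutil
import subprocess


def device_names(clinfo_text):
    """Collect the value of every 'Device Name' line in clinfo output."""
    names = []
    for line in clinfo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Device Name"):
            names.append(stripped[len("Device Name"):].strip())
    return names


def has_arc(clinfo_text):
    """True if any reported device name mentions 'Arc'."""
    return any("Arc" in name for name in device_names(clinfo_text))


if __name__ == "__main__":
    if shutil.which("clinfo") is None:
        print("clinfo is not installed")
    else:
        output = subprocess.run(["clinfo"], capture_output=True, text=True).stdout
        print("Devices:", device_names(output))
        print("Arc found" if has_arc(output) else "Arc not found")
```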

Congratulations!  The hardest part is now over.  Now it is time to get TensorFlow and PyTorch working with the Intel Arc GPU.

TensorFlow

Now we need to install Anaconda/Miniconda. This is because the most recent version of Python that the Intel TensorFlow and PyTorch extensions support is Python 3.11, while Ubuntu 24.04 (and so Mint 22) ships with Python 3.12. You can find instructions on how to install conda on their websites.

Once you have it installed, we will first work on the Intel TensorFlow extension.

conda create -n "tensorflowintel" python=3.11

or whatever you want to call your virtual conda environment. Activate that environment with:

conda activate tensorflowintel
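Before installing anything, it does not hurt to confirm that the activated environment really picked up the Python version you asked for. A tiny sketch follows. The 3.11 ceiling comes from what I described above and may change with newer releases of the extensions.

```python
import sys

# Assumption: 3.11 is the newest Python the extensions support right now.
NEWEST_SUPPORTED = (3, 11)


def python_is_supported(version, newest=NEWEST_SUPPORTED):
    """True if (major, minor) is Python 3 and no newer than `newest`."""
    major_minor = tuple(version[:2])
    return (3, 0) <= major_minor <= newest


if __name__ == "__main__":
    current = sys.version_info
    status = "fine" if python_is_supported(current) else "too new for the Intel extensions"
    print(f"Python {current.major}.{current.minor} is {status}")
```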

Next we need to install the TensorFlow extension and TensorFlow itself.

pip install 'tensorflow==2.15.0'
pip install --upgrade 'intel-extension-for-tensorflow[xpu]'

Make sure you specify the [xpu] at the end or else everything will end up using the CPU.

Now we verify that the Intel TensorFlow extension works. Run python to get into an interpreter and then type in the following:

(tensorflowintel) bmaddox@sdf1:~$ python
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf

Now, after running the import statement, you will see a lot of output. Ignore anything that mentions CUDA since of course we are not going to install CUDA without an NVIDIA card. You will see something like this:

2024-09-08 12:01:20.128862: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-09-08 12:01:20.401696: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-08 12:01:20.401756: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-08 12:01:20.451490: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-08 12:01:20.557915: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-09-08 12:01:20.559102: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-08 12:01:21.568560: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-09-08 12:01:23.797012: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /tensorflow/core/bfc_allocator_delay. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2024-09-08 12:01:23.801616: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /xla/service/gpu/compiled_programs_count. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2024-09-08 12:01:23.814748: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /jax/pjrt/pjrt_executable_executions. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2024-09-08 12:01:23.814784: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /jax/pjrt/pjrt_executable_execution_time_usecs. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2024-09-08 12:01:24.959105: I itex/core/wrapper/itex_gpu_wrapper.cc:38] Intel Extension for Tensorflow* GPU backend is loaded.
2024-09-08 12:01:24.959972: I external/local_xla/xla/pjrt/pjrt_api.cc:67] PJRT_Api is set for device type xpu
2024-09-08 12:01:24.960001: I external/local_xla/xla/pjrt/pjrt_api.cc:72] PJRT plugin for XPU has PJRT API version 0.33. The framework PJRT API version is 0.34.
2024-09-08 12:01:25.106392: I external/intel_xla/xla/stream_executor/sycl/sycl_gpu_runtime.cc:134] Selected platform: Intel(R) Level-Zero
2024-09-08 12:01:25.106772: I external/intel_xla/xla/stream_executor/sycl/sycl_gpu_runtime.cc:159] number of sub-devices is zero, expose root device.
2024-09-08 12:01:25.107555: I external/xla/xla/service/service.cc:168] XLA service 0xac38370 initialized for platform SYCL (this does not guarantee that XLA will be used). Devices:
2024-09-08 12:01:25.107570: I external/xla/xla/service/service.cc:176] StreamExecutor device (0): Intel(R) Arc(TM) A770 Graphics, <undefined>
2024-09-08 12:01:25.109696: I itex/core/devices/gpu/itex_gpu_runtime.cc:130] Selected platform: Intel(R) Level-Zero
2024-09-08 12:01:25.110088: I itex/core/devices/gpu/itex_gpu_runtime.cc:155] number of sub-devices is zero, expose root device.
2024-09-08 12:01:25.110521: I external/intel_xla/xla/pjrt/se_xpu_pjrt_client.cc:97] Using BFC allocator.
2024-09-08 12:01:25.110541: I external/xla/xla/pjrt/gpu/gpu_helpers.cc:106] XLA backend allocating 14602718822 bytes on device 0 for BFCAllocator.
2024-09-08 12:01:25.112748: I external/local_xla/xla/pjrt/pjrt_c_api_client.cc:119] PjRtCApiClient created.

Pay attention to the last few lines. They should show that the Arc is detected and available. Next verify by running this:

>>> gpus = tf.config.list_physical_devices('XPU')
>>> for gpu in gpus:
...     print("Name:", gpu.name, " Type:", gpu.device_type)
...
Name: /physical_device:XPU:0 Type: XPU
>>>

If you run into any issues, you may have to import the Intel TensorFlow extension explicitly to make sure everything works (you will need it anyway if you are modifying existing sources).

>>> import intel_extension_for_tensorflow as itex
>>> print(itex.__version__)
2.15.0.1
>>>

Congratulations! You now have a virtual environment set up to work with TensorFlow. You can probably get this to work with existing source by making sure to downgrade the version of TensorFlow you use to the above and install the Intel extension. Then change references to GPU to XPU to make sure everything is using the Intel card.
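To give an idea of what the GPU to XPU change looks like in practice, here is a minimal sketch that runs a matrix multiplication on the Arc when TensorFlow reports an XPU device and falls back to the CPU otherwise. The helper names are hypothetical, and I use the fully qualified /device:XPU:0 form of the device string to be on the safe side.

```python
def choose_tf_device(device_types):
    """Prefer the Intel XPU when TensorFlow reports one, otherwise use the CPU."""
    return "/device:XPU:0" if "XPU" in device_types else "/device:CPU:0"


def run_matmul():
    """Multiply two random matrices on the chosen device and report placement."""
    import tensorflow as tf  # imported lazily so the helper above works without TF

    types = [d.device_type for d in tf.config.list_physical_devices()]
    device = choose_tf_device(types)
    with tf.device(device):
        a = tf.random.normal((1024, 1024))
        b = tf.matmul(a, a)
    return device, b.device


if __name__ == "__main__":
    try:
        requested, placed = run_matmul()
        print("requested:", requested)
        print("placed on:", placed)
    except ImportError:
        print("TensorFlow is not installed in this environment")
```

If the "placed on" line mentions XPU, the work really ran on the card. Existing code that hard-codes GPU device strings needs the same kind of substitution.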

PyTorch

Since we went through everything to get the drivers and TensorFlow working, we can now look at using the Intel PyTorch extension.  Note, I have found that it’s better to keep TensorFlow and PyTorch in separate environments.  That way you will be less likely to run into issues.

First off a couple of notes.  I am purposely not posting links to these sites because you should not go there.  Pain and sorrow will only come to you if you do.  If you go to the PyTorch website, they will mention rebuilding PyTorch so that it supports Intel XPU devices.  Do NOT do this.  Intel also has a website out there that mentions adding another repository to install PyTorch and some additional drivers.  Do NOT do this either.  Doing so will likely break everything.  Yes it is a little fragile at the moment, that is the whole reason I am writing this 🙂

We will again create a Python 3.11 environment using conda.

conda create -n "pytorchintel" python=3.11

Activate this environment with

conda activate pytorchintel

Now we install PyTorch into this environment:

python -m pip install torch==2.1.0.post3 torchvision==0.16.0.post3 torchaudio==2.1.0.post3 intel-extension-for-pytorch==2.1.40+xpu oneccl_bind_pt==2.1.400+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

Again run the Python interpreter and run the following to verify everything is working in this environment:

(pytorchintel) bmaddox@sdf1:~$ python3
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> torch.xpu.is_available()
True

That is it!  You are now done!

As with TensorFlow, you will need to make some code changes for PyTorch to work. Instead of sending the model and data to “cuda”, you will need to change those calls so they look like this:

model = model.to('xpu')
data = data.to('xpu')
model = ipex.optimize(model)
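Putting those three lines into context, here is a small self-contained sketch that runs a toy model on the Arc and falls back to the CPU when the extension or the card is not available. The tiny Linear model and the helper names are placeholders of mine, not anything from Intel's documentation.

```python
def choose_torch_device(xpu_available):
    """Use the Intel XPU when it is available, otherwise fall back to the CPU."""
    return "xpu" if xpu_available else "cpu"


def run_inference():
    """Run one forward pass of a toy model on the chosen device."""
    import torch

    try:
        # Importing the extension is what makes the 'xpu' device usable.
        import intel_extension_for_pytorch as ipex
    except ImportError:
        ipex = None

    xpu_ok = ipex is not None and hasattr(torch, "xpu") and torch.xpu.is_available()
    device = choose_torch_device(xpu_ok)

    model = torch.nn.Linear(16, 4).eval().to(device)  # placeholder model
    data = torch.randn(8, 16).to(device)
    if device == "xpu":
        model = ipex.optimize(model)  # let the extension optimize the eval-mode model

    with torch.no_grad():
        output = model(data)
    return device, tuple(output.shape)


if __name__ == "__main__":
    try:
        device, shape = run_inference()
        print("ran on:", device, "output shape:", shape)
    except ImportError:
        print("PyTorch is not installed in this environment")
```

Keeping the device name in one variable like this means the same script still runs on a machine without the Arc.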

Conclusion

While it is a bit fragile now, I have had good luck with using my Arc A770 for deep learning and computer vision tasks. Things like Stable Diffusion using OpenVINO work REALLY well. Other things work once you remove and install a few packages after installing their requirements and make some minor code modifications. Intel has some really good documentation available online on porting code to their XPU devices, and I highly suggest reading it before you start trying to run existing TensorFlow and PyTorch applications.