More Fun with the RTX 2060

So I recently wiped my system and upgraded to Linux Mint Cinnamon 20. I tend to wipe and install on major releases since I do a lot of customization.

Anyway, I wanted to set CUDA back up along with tensorflow-gpu since I have stuff I wanted to do. I recreated my virtual environment and found Tensorflow 2.2.0 had been released. Based on this I found it still needs CUDA 10.1. No worries, went through and put CUDA 10.1, cuDNN, and TensorRT back on my system and everything was working.

I noticed with 2.2.0 that I was getting the dreaded RTX CUDA_ERROR_OUT_OF_MEMORY errors for pretty much anything I did. So I fixed it and figured I’d post this in case it helps anyone else out down the road. You need to add this in so that the GPU memory can grow and use mixed precision with the RTX (which also helps to run things on the TPUs in the RTX series).

from tensorflow import config as tfc
from tensorflow.keras.mixed_precision import experimental as mixed_precision
...
gpus = tfc.experimental.list_physical_devices("GPU")
tfc.experimental.set_memory_growth(gpus[0], True)
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_policy(policy)

If you’re having more out of memory errors on your RTX, give this a shot. You can read more about Tensorflow and mixed precision here.

Fun with Linux and a RTX 2060

Or…. how to spend a day hitting your head into your desk.

Or…. machine learning is easy, right? 🙂

For a while now I have been wanting to upgrade my video card in my desktop so I could actually use it to do machine/deep learning tasks (ML). Since I put together a Frankenstein gaming computer for my daughters out of some older parts, I finally justified getting a new card by saying I would then give them my older nVidia card. After a lot of research, I decided the RTX 2060 was a good balance of how much money I felt like spending versus something that would actually be useful (plus the series comes with dedicated Tensor Cores that work really fast with fp16 data types).

So after buying the card and installing it, the first thing I wanted to do was to get Tensorflow 2.1 to work with it. Now, I already had CUDA 10.2 and the most up-to-date version of TensorRT installed, and knew that I’d have to custom compile Tensorflow’s pip version to work on my system. What I did not know was just how annoying this would turn out to be. My first attempt was to follow the instructions from the Tensorflow web site, including applying this patch that fixes the nccl bindings to work with CUDA 10.2.

However, all of my attempts failed. I had random compiler errors crop up during the build that I have never had before and could not explain. Tried building it with Clang. Tried different versions of GCC. Considered building with a Catholic priest present to keep the demons at bay. No dice. Never could complete a build successfully on my system. This was a bit to be expected since a lot of people online have trouble getting Tensorflow 2.1 and CUDA 10.2 to play nice together.

I finally broke down and downgraded CUDA on my system to 10.1. I also downgraded TensorRT so it would be compatible with the version of CUDA I now had. Finally I could do a pip install tensorflow-gpu inside my virtual environment and and it worked and it ran and I could finally run my training on my GPU with fantastic results.

Almost.

I kept getting CUDNN_STATUS_INTERNAL_ERROR messages every time I tried to run a Keras application on the GPU. Yay. After some Googling, I found this link and apparently there’s an issue with Tensorflow and the RTX line. To fix it, you have to add this to your Python code that uses Keras/Tensorflow:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

...
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

FINALLY! After several days of trying to custom compile Tensorflow for my system, giving up, downgrading so I could install via pip, running into more errors, etc, I now have a working GPU-accelerated Tensorflow! As an example, running the simple additionrnn.py example from Keras, I went from taking around 3.5 seconds per epoch on my Ryzen 7 2700X processor (where I had compiled Tensorflow for CPU only to take advantage of the additional CPU instructions) to taking under 0.5 seconds on the GPU. I’m still experimenting, and modifying some things to use fp16 so I can take advantage of the Tensor Cores in the GPU.