The other day I was compiling the latest OpenCV on my computer and had planned on doing what I normally do when it’s done: run checkinstall to build a .deb for it because I like to keep all my files under package management. OpenCV finished compiling fairly quickly (it’s nice when you can do a make -j 16) and I then ran checkinstall.
It crashed while it was running and left a half-installed Debian package of OpenCV on my system. “No problem” I thought, I’ll just uninstall the deb and do a normal make install. Sometimes checkinstall crashes so I didn’t think anything was out of the ordinary. Since I usually put it in /opt/opencv4 it would still be self contained at least.
I noticed a little bit later that my system was acting oddly. Some things wouldn’t run, I couldn’t sudo any more, etc. I rebooted as a first check to see if it was just something random going on. And that’s when my system rebooted to a text mode login prompt. “Huh, maybe the card/drivers didn’t initialize fully I’ll just reboot again.” Nope, no joy still the text login.
I tried to login only to watch the process pause after I typed my password, and then came back up the login prompt. “Odd, maybe I’ll see if it’s something weird and try another virtual console.” Nope, no joy there. Tried to ssh into it, no joy there either. I was worried my SSD was going out. It’s not that old, but still a worry.
So I used my laptop to make a bootable Mint installer and plugged that in and tried to boot. The graphics screen was corrupted and had to use safe mode to log in. “Holy crap, is my graphics card messed up along with the hard drive?” I was worried about this because a new power supply I bought a while back had nuked my old motherboard so had to replace hardware in my system. (That’s a story for another day).
I could still get a GUI when I booted into safe mode from the thumb drive so assumed the open source drivers on the latest Mint installer just didn’t like my card unless I did safe mode.. I did a SMART test to make sure nothing was wrong with the drive. That worked so I ran a fsck to check the integrity of the drive. I then went to set up a chroot to the hard drive so I could run debsums to make sure the packages hadn’t gotten randomly corrupted. And then I noticed a problem.
I couldn’t set up the chroot to work. I kept getting an error about /bin/bash not existing. I checked the /bin directory on the hard drive and sure enough, it was empty save for a broken link to some part of the JDK. “That’s odd, there were no drive errors but /bin is empty.” I thought about things for a moment and it randomly did an ls -ld on the root of the hard drive but didn’t see anything at first.
Then hit it me: “Wait a minute, /bin is supposed to be a link to /usr/bin these days.” I realized that for whatever reason, it looked like checkinstall had replaced the link for /bin with an actual /bin and had randomly placed a link in there for the jdk. I deleted the directory and replaced the link to /usr/bin and rebooted. Boom, system booted normally. Well, mostly normally. CUDA had somehow disappeared from the drive and I had to reinstall it (didn’t use the packages from nVidia since they want to downgrade my video drivers so just did a local install). I ran debsums to check and everything verified properly.
The moral of the story is, it’s good to have debugging skills and know how your computer is supposed to work!