7. Reference materials (Driver/library)
Some drivers and libraries are hardware dependent and may not work in certain version combinations. Following the steps below therefore does not fully guarantee a working setup.
7.1. How to check versions
Refer to Recommended system requirements to confirm whether the installed driver and library versions are supported.
Docker
$ docker version
->If it fails,
(1) start docker.
$ service docker start
(2) if a different version is installed, install the supported version.
docker-compose
$ docker-compose --version
->If it fails,
$ systemctl enable docker
Nvidia-driver
$ nvidia-smi
->If it fails, the GPU card of your MANUFACIA server may not be supported by the recommended CUDA version.
CUDA
$ ls -l /usr/local | grep cuda
nvidia-docker
$ nvidia-docker -v
If the installed driver or library versions differ from the supported environment, uninstall them first and then install the recommended versions.
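The checks above can be collected into one script. The following is a minimal sketch, not part of MANUFACIA: the helper name report_tool is hypothetical, and the script assumes a POSIX shell. It reports each tool as not found, failing, or present with the first line of its version output. The reported versions still have to be compared by hand against Recommended system requirements.

```shell
#!/bin/sh
# report_tool NAME [ARGS...] (hypothetical helper):
# run "NAME ARGS" and print the first line of its output,
# or a note if the tool is missing or the command fails.
report_tool() {
    tool=$1
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not found"
        return 1
    fi
    if out=$("$@" 2>&1); then
        echo "$tool: $(printf '%s\n' "$out" | head -n 1)"
    else
        echo "$tool: installed, but the command failed"
        return 2
    fi
}

# "docker version" contacts the daemon, so it fails when Docker is not started.
report_tool docker version --format '{{.Server.Version}}' || true
report_tool docker-compose --version || true
report_tool nvidia-smi --query-gpu=driver_version --format=csv,noheader || true
report_tool nvidia-docker -v || true
# CUDA is checked by its installation directory, as in the command above.
ls -d /usr/local/cuda* 2>/dev/null || echo "cuda: no /usr/local/cuda* directory found"
```

A line such as "docker: installed, but the command failed" corresponds to case (1) above, where Docker needs to be started first.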
7.2. How to uninstall
CUDA
$ sudo apt purge 'cuda*'
$ sudo apt purge 'nvidia-cuda-*'
$ sudo apt purge 'libcuda*'
$ sudo apt autoremove
$ sudo reboot
Docker
$ sudo yum remove docker-ce
$ sudo apt-get remove docker docker-engine docker.io containerd runc
$ sudo rm -rf /var/lib/docker
$ sudo apt autoremove
Docker Compose
$ sudo rm /usr/local/bin/docker-compose
Nvidia-Driver
$ sudo apt-get purge 'nvidia-*'
$ sudo apt-get purge 'libnvidia*'
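Before purging, it can help to see which CUDA and Nvidia packages are actually installed. The following is a minimal sketch under stated assumptions: the function name list_gpu_packages is hypothetical, the server uses dpkg (Debian/Ubuntu), and the name prefixes simply mirror the purge patterns above. It only lists packages and removes nothing.

```shell
#!/bin/sh
# list_gpu_packages (hypothetical helper): read "dpkg -l" output on stdin
# and print the names of installed packages (status "ii") whose names
# start with one of the prefixes used by the purge commands.
list_gpu_packages() {
    awk '$1 == "ii" && $2 ~ /^(nvidia|libnvidia|cuda|libcuda)/ { print $2 }'
}

# Only query the package database when dpkg is available.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | list_gpu_packages
fi
```

Running the same script again after the purge and reboot should print nothing if the removal was complete.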
7.3. How to install
7.3.2. Install CUDA and Nvidia-Driver
To use the GPU version of MANUFACIA on a server with a GPU board, both CUDA and the Nvidia driver must be installed on the server. The shell commands required for the installation are described below.
7.3.2.1. Updating MANUFACIA from v2.0/v2.1/v2.1.2
CUDA 10.x should already be installed. Use the following commands to remove the old packages and resolve library dependencies.
# sudo apt-get purge 'nvidia-*'
# sudo apt-get purge 'cuda-*'
7.3.2.2. Find available drivers for the server with GPU board
The following command will show the list of available drivers.
# ubuntu-drivers devices
If the ubuntu-drivers command is not found, install it with the following command.
# apt install ubuntu-drivers-common
Example of the list printed by the command ubuntu-drivers devices:
vendor: NVIDIA Corporation
driver: nvidia-driver-460-third-party free recommended
driver: nvidia-driver-415-third-party free
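The driver marked "recommended" can be picked out of that list automatically. The following is a minimal sketch; the function name recommended_driver is hypothetical and the parsing assumes the line layout shown in the example above. The driver that Ubuntu recommends is not necessarily the version MANUFACIA supports, so the result should still be checked against Recommended system requirements.

```shell
#!/bin/sh
# recommended_driver (hypothetical helper): read "ubuntu-drivers devices"
# output on stdin and print the nvidia-driver-NNN name from the first
# line that is marked "recommended".
recommended_driver() {
    sed -n 's/.*\(nvidia-driver-[0-9][0-9]*\).*recommended.*/\1/p' | head -n 1
}

# Only query the hardware when ubuntu-drivers is installed.
if command -v ubuntu-drivers >/dev/null 2>&1; then
    ubuntu-drivers devices | recommended_driver
fi
```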
7.3.2.3. Install Nvidia-Driver
The following command installs the requested Nvidia driver on the server.
# sudo apt-get install nvidia-driver-455
If the driver dependencies cannot be resolved, use the command below.
# sudo aptitude install nvidia-driver-455
7.3.2.4. Install CUDA 11.1
The following command installs CUDA 11.1 on the server.
# sudo apt install cuda-11-1
or
# sudo apt-get install cuda-11-1
The following command repairs broken or incomplete package installations so that CUDA 11 can be installed.
# sudo apt --fix-broken install
Add the following lines to the configuration file .bashrc in the home directory, using any editor or other tool. Then reboot the server with “sudo reboot”.
# set PATH for cuda 11.1 installation
if [ -d "/usr/local/cuda-11.1/bin/" ]; then
export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
fi
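After the reboot, the two variables can be checked in a new login shell. The following is a minimal sketch; the function name path_contains and the variable CUDA_DIR are hypothetical, and the directory /usr/local/cuda-11.1 is taken from the lines above. It only tests whether the expected directories appear as complete entries in PATH and LD_LIBRARY_PATH.

```shell
#!/bin/sh
# path_contains DIR LIST (hypothetical helper): succeed if DIR is one of
# the colon-separated entries of LIST.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

CUDA_DIR=/usr/local/cuda-11.1

if path_contains "$CUDA_DIR/bin" "$PATH"; then
    echo "PATH: ok"
else
    echo "PATH: $CUDA_DIR/bin is missing"
fi

if path_contains "$CUDA_DIR/lib64" "${LD_LIBRARY_PATH:-}"; then
    echo "LD_LIBRARY_PATH: ok"
else
    echo "LD_LIBRARY_PATH: $CUDA_DIR/lib64 is missing"
fi
```

A "missing" line usually means that .bashrc was not reloaded, that /usr/local/cuda-11.1/bin does not exist, or that a path in the added lines names a different CUDA version.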
The following commands show the CUDA and Nvidia driver versions. Check that they are the expected ones.
# docker exec -it manufacia_app_1 bash
:~/rb# cat public/docker-build-conf.txt
[Output example]
BASE_IMAGE: nvidia/cuda:9.2-base-ubuntu18.04-sha256:e2caae08d28e7026f7ce5334a7375c306b2279fc5a47ef031be4262e0e4e394a
COMPOSE_FILE: /nix/store/b35biq54rnby9aabnsz8rga84fpfbqrm-docker-compose.gpu.yaml
CUDA_VERSION: 11.1
DOCKER_IMAGE: manufacia:2.1.3-gpu
RAILS_ENV: production
USE_GPU: true
:~/rb# nvidia-smi
[Output example] (The CUDA version below is not necessarily the same as the one above.)
+-------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00 Driver Version: 455.32.00 CUDA Version: 11.1 |
|-------------------------------+----------------------+------------------+
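The two version numbers can also be extracted from the nvidia-smi header for comparison with the expected values. The following is a minimal sketch; the function name smi_field is hypothetical and the parsing assumes the header layout shown in the output example above.

```shell
#!/bin/sh
# smi_field LABEL (hypothetical helper): read nvidia-smi output on stdin
# and print the version number that follows "LABEL:" in the header,
# for example LABEL = "Driver Version" or "CUDA Version".
smi_field() {
    sed -n "s/.*$1: *\([0-9][0-9.]*\).*/\1/p" | head -n 1
}

# Only query the GPU when nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    smi_out=$(nvidia-smi 2>&1) || true
    echo "driver: $(printf '%s\n' "$smi_out" | smi_field 'Driver Version')"
    echo "cuda:   $(printf '%s\n' "$smi_out" | smi_field 'CUDA Version')"
fi
```

As noted above, the CUDA version reported by nvidia-smi does not have to match CUDA_VERSION in docker-build-conf.txt, so a difference between the two is not an error by itself.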
After installing drivers or libraries, reboot the system, start Docker, and then confirm that NVIDIA Docker works properly.
$ sudo reboot
$ service docker start