Commit 4db13c5

Create Setting Up Docker and Nvidia Container Toolkit on Ubuntu AWS.md
1 parent df265a2 commit 4db13c5

1 file changed

+141
-0
lines changed

@@ -0,0 +1,141 @@
## Setting up Docker and NVIDIA Container Toolkit on Ubuntu Instances

Once you have an instance running, you may need Docker for container testing.

The information below is taken from the [AWS drivers page](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html) and the [NVIDIA Container Toolkit installation page](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html), condensed to be easier to follow.

**Note:** These steps are written for Ubuntu instances on AWS.

### Installing NVIDIA GRID drivers and Disabling Nouveau

1. Connect to your Linux instance. gcc and make are required; step 6 installs them if they are not already present.

2. Update your package cache so the instance sees the latest available package versions.

$ sudo apt-get update -y

3. Upgrade the linux-aws package to receive the latest version.

$ sudo apt-get upgrade -y linux-aws

4. Reboot your instance to load the latest kernel version.

$ sudo reboot

5. Reconnect to your instance after it has rebooted.

6. Install gcc, make, and the kernel headers package for the kernel version you are currently running.

$ sudo apt-get install -y gcc make linux-headers-$(uname -r)

7. Disable the nouveau open source driver for NVIDIA graphics cards.
Add nouveau to the /etc/modprobe.d/blacklist.conf blacklist file. Copy the following code block and paste it into a terminal.

$ cat << EOF | sudo tee --append /etc/modprobe.d/blacklist.conf
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
EOF

8. Edit the /etc/default/grub file and add the following line:

GRUB_CMDLINE_LINUX="rdblacklist=nouveau"

9. Rebuild the GRUB configuration.

$ sudo update-grub

10. Download the GRID driver installation utility using the following command. This requires the AWS CLI, configured with credentials that allow read access to S3.

$ aws s3 cp --recursive s3://ec2-linux-nvidia-drivers/latest/ .

11. Multiple versions of the GRID driver are stored in this bucket. You can list all of the available versions using the following command.

$ aws s3 ls --recursive s3://ec2-linux-nvidia-drivers/

12. Add permissions to run the driver installation utility using the following command.

$ chmod +x NVIDIA-Linux-x86_64*.run

13. Run the self-install script to install the GRID driver that you downloaded. For example:

$ sudo /bin/sh ./NVIDIA-Linux-x86_64*.run

14. When prompted, accept the license agreement and specify the installation options as required (you can accept the default options).

15. Confirm that the driver is functional. The output of the following command lists the installed NVIDIA driver version and details about the GPUs.

$ nvidia-smi -q | head

16. If you are using NVIDIA vGPU software version 14.x or greater on G4dn, G5, or G5g instances, disable GSP with the following commands. For more information on why this is required, see NVIDIA’s documentation.

$ sudo touch /etc/modprobe.d/nvidia.conf

$ echo "options nvidia NVreg_EnableGpuFirmware=0" | sudo tee --append /etc/modprobe.d/nvidia.conf

17. Reboot the instance.

$ sudo reboot

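Steps 7 and 16 append lines with `tee --append`, so running them a second time leaves duplicate entries in the config files. The following is a minimal sketch of an idempotent alternative. The helper name `append_once` is hypothetical and not part of the AWS instructions; the example writes to a scratch file rather than to `/etc/modprobe.d`.

```shell
#!/bin/sh
# append_once LINE FILE
# Hypothetical helper: appends LINE to FILE only if FILE does not
# already contain an identical line.
append_once() {
    line=$1
    file=$2
    # -x matches the whole line, -F treats LINE as a fixed string, -q is quiet.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstration against a scratch file. For files under /etc/modprobe.d
# the function would have to run with root privileges.
scratch=$(mktemp)
append_once "blacklist nouveau" "$scratch"
append_once "blacklist nouveau" "$scratch"   # second call changes nothing
cat "$scratch"
rm -f "$scratch"
```

The whole-line, fixed-string match means that a commented-out or partially matching entry does not count as already present.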
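After the final reboot it can be useful to confirm that the blacklist from steps 7 to 9 took effect. The sketch below checks whether a kernel module appears in `/proc/modules`, where the first field of each line is the module name. The function name `module_loaded` is hypothetical.

```shell
#!/bin/sh
# module_loaded NAME [FILE]
# Returns 0 if NAME is listed as a loaded module. FILE defaults to
# /proc/modules; the match is on the exact module name, so "nvidia"
# does not match "nvidia_drm".
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

if [ -r /proc/modules ]; then
    if module_loaded nouveau; then
        echo "nouveau is still loaded; recheck the blacklist and GRUB steps"
    else
        echo "nouveau is not loaded"
    fi
fi
```

The optional second argument exists only so the function can be exercised against a sample file; on an instance, call it with the module name alone.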
### Installing NVIDIA Container Toolkit and Docker

1. Set up the package repository and GPG key:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

2. Install the NVIDIA Container Toolkit:

$ sudo apt-get update

$ sudo apt-get install -y nvidia-container-toolkit

3. Docker-CE on Ubuntu can be set up using Docker’s official convenience script:

$ curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker

4. Configure the Docker daemon to recognize the NVIDIA Container Runtime:

$ sudo nvidia-ctk runtime configure --runtime=docker

5. Restart the Docker daemon so that it picks up the new runtime configuration:

$ sudo systemctl restart docker

6. At this point, a working setup can be tested by running a base CUDA container:

$ sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

This should result in a console output similar to:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

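Step 1 of this section builds the repository URL from a `distribution` string, obtained by sourcing `/etc/os-release` and concatenating its `ID` and `VERSION_ID` fields (on Ubuntu 20.04 this gives `ubuntu20.04`). The sketch below isolates that expression so it can be inspected before running the full pipeline. The function name `distribution_id` and its optional file argument are hypothetical.

```shell
#!/bin/sh
# distribution_id [FILE]
# Prints ID followed by VERSION_ID from an os-release style file,
# mirroring the $(. /etc/os-release; echo $ID$VERSION_ID) expression.
distribution_id() {
    # Source the file in a subshell so ID and VERSION_ID do not leak
    # into the calling shell.
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

if [ -r /etc/os-release ]; then
    distribution_id
fi
```

If the printed value does not correspond to a directory published in the NVIDIA repository, the `curl` for the `.list` file in step 1 will not return a usable package source, so checking the string first can save a confusing `apt-get update` failure.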
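If the test container in step 6 fails with an unknown-runtime error, one thing to check is whether step 4 actually registered the runtime. `nvidia-ctk runtime configure --runtime=docker` writes an `nvidia` entry that points at `nvidia-container-runtime` into the Docker daemon configuration, normally `/etc/docker/daemon.json`. The sketch below is a rough text check for that entry; the function name `has_nvidia_runtime` is hypothetical, and the match is a plain `grep`, not a JSON parse.

```shell
#!/bin/sh
# has_nvidia_runtime [FILE]
# Returns 0 if FILE (default /etc/docker/daemon.json) mentions
# nvidia-container-runtime. A missing or unreadable file counts as
# "not configured".
has_nvidia_runtime() {
    grep -q 'nvidia-container-runtime' "${1:-/etc/docker/daemon.json}" 2>/dev/null
}

if has_nvidia_runtime; then
    echo "Docker daemon config references the NVIDIA runtime"
else
    echo "NVIDIA runtime not found in /etc/docker/daemon.json"
fi
```

A positive result only shows that the file was written; the daemon still needs the restart from step 5 before the runtime is usable.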
Done! You can now use Docker to run your applications in a container on your AWS instance.
