TensorFlow on CentOS 7 with NVIDIA GPUs
First, boot up a GPU instance (ami-31490d51).
It's CentOS 7, Amazon style (whatever that means).
sudo yum -y update
sudo yum install -y kernel-devel kernel-headers gcc make
sudo yum install -y epel-release
sudo yum install -y --enablerepo=epel dkms
sudo reboot
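This updates your installed packages, then installs a bunch of kernel-ness: the kernel headers and build tools needed to compile the NVIDIA kernel module, plus dkms from EPEL.
At this point, the installed kernel may not be the same as the running kernel, because the update can pull in a newer kernel than the one you booted. (Bummer, man.)
To ensure consistency, we also reboot.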
Compare your running kernel with the installed source:
uname -r
rpm -q kernel-devel
If they match, you're good!
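The comparison above can also be scripted. This is a minimal sketch, not a tested part of the procedure: `kernel_devel_matches` is a hypothetical helper name, and it assumes `rpm -q kernel-devel` prints names of the form `kernel-devel-<version>`, one per installed version.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if any installed kernel-devel package
# matches the running kernel release.
# $1 = output of `uname -r`, $2 = output of `rpm -q kernel-devel`
kernel_devel_matches() {
    running=$1
    # rpm may list several kernel-devel versions, one per line.
    for pkg in $2; do
        if [ "${pkg#kernel-devel-}" = "$running" ]; then
            return 0
        fi
    done
    return 1
}

# On the instance itself (needs rpm, so the check is guarded):
if command -v rpm >/dev/null 2>&1; then
    if kernel_devel_matches "$(uname -r)" "$(rpm -q kernel-devel)"; then
        echo "kernel and kernel-devel match"
    else
        echo "mismatch: reboot into the updated kernel, then check again"
    fi
fi
```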
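Locally, copy your boto credentials to the instance (the file IP holds the instance's address):
scp ~/.boto ec2-user@$(cat IP):
Then, on the instance: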
aws s3 cp s3://val314159-gpu/install-files . --recursive
sudo sh NVIDIA-Linux-x86_64-367.35.run
Accept all the legalese when prompted.
(When it asks about the 32-bit compatibility libraries, say no; you don't need them.)
nvidia-smi
[ec2-user@ip-172-31-3-130 ~]$ nvidia-smi
Sun Jul 17 20:30:06 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35                 Driver Version: 367.35                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   32C    P0     1W / 125W |      0MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[ec2-user@ip-172-31-3-130 ~]$
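To check the driver from a script rather than by eye, the version can be pulled out of that banner. A minimal sketch: `driver_version` is a hypothetical helper, and it assumes the banner keeps the `Driver Version: X` form shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read nvidia-smi output on stdin and print the
# driver version found in the "Driver Version: X" banner field.
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# On the instance (guarded so the snippet is harmless elsewhere):
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "driver: $(nvidia-smi | driver_version)"
fi
```

If this prints nothing after `driver:`, the kernel module probably did not load, and the kernel/kernel-devel check is the first thing to revisit.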
sudo sh cuda_7.5.18_linux.run
Press space about 45 times to page through the license, then answer the prompts in order: accept, y, y, y, y, RET, y, y, RET.
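The prompts can probably be skipped with the installers' non-interactive flags. This sketch is an assumption, not what was run for these notes: it selects only the CUDA toolkit (no samples, no bundled driver), which may differ from the interactive answers above, so check each installer's `--help` first. `run` and `install_unattended` are hypothetical names, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of an unattended install. With DRY_RUN=1 (the default here) the
# commands are only printed; set DRY_RUN=0 on the instance to run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_unattended() {
    # Driver: --silent suppresses the prompts, --accept-license accepts
    # the EULA without asking.
    run sudo sh NVIDIA-Linux-x86_64-367.35.run --silent --accept-license
    # CUDA: --silent needs the components named explicitly; only the
    # toolkit is selected because the driver is installed separately.
    run sudo sh cuda_7.5.18_linux.run --silent --toolkit
}

install_unattended
```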
wget 'https://doc-00-6s-docs.googleusercontent.com/docs/securesc/rmchn5rrn9h0vrd4d7vnrua4a8oqe4bc/nqmk91e589k17pa0vl52hcvpubdc2o88/1468728000000/13951467387256278872/09470983940600970531/0Bz7KyqmuGsilT0J5dmRCM0ROVHc?e=download&nonce=j3uirr35c8dtk&user=09470983940600970531&hash=1tedlt945gjfq5n8a3m3k5jf6aukktp7'
==============
The whole procedure, start to finish:
sudo yum -y update
sudo yum install -y kernel-devel kernel-headers gcc make
sudo yum install -y epel-release
sudo yum install -y --enablerepo=epel dkms
sudo reboot
wget https://s3.amazonaws.com/val314159-gpu/install-files/nvidia_cuda_stuff.tar
tar xvf nvidia_cuda_stuff.tar
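Before running anything, it may be worth confirming that the extraction produced the files the next steps use. A minimal sketch: `missing_installers` is a hypothetical helper, and the file list is simply the three names used by the commands in these notes (that the tarball contains exactly these is an assumption).

```shell
#!/bin/sh
# Hypothetical helper: print each installer file that is missing from
# directory $1 (default: current directory). The names are the ones the
# commands in this document use.
missing_installers() {
    dir=${1:-.}
    for f in cuda_7.5.18_linux.run \
             NVIDIA-Linux-x86_64-367.35.run \
             cudnn-7.0-linux-x64-v4.0-prod.tgz; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

missing=$(missing_installers .)
if [ -n "$missing" ]; then
    echo "missing after extraction:"
    echo "$missing"
fi
```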
sudo sh cuda_7.5.18_linux.run
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
export PATH=$PATH:/usr/local/cuda/bin
sudo sh NVIDIA-Linux-x86_64-367.35.run
tar xvzf cudnn-7.0-linux-x64-v4.0-prod.tgz
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
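One way to confirm the header landed where CUDA expects it is to read the version back out of it. A minimal sketch: `cudnn_version` is a hypothetical helper, and it assumes `cudnn.h` defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` as plain integer macros.

```shell
#!/bin/sh
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cudnn.h file.
# Assumes the header defines the three macros as plain integers.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}

header=/usr/local/cuda/include/cudnn.h
if [ -f "$header" ]; then
    echo "cuDNN header version: $(cudnn_version "$header")"
fi
```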
nvidia-smi
sudo bash -c 'echo export LD_LIBRARY_PATH=/usr/local/cuda/lib64 >>/etc/bashrc'
sudo bash -c 'echo export PATH=\$PATH:/usr/local/cuda/bin >>/etc/bashrc'
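Running those two `echo ... >>/etc/bashrc` lines twice leaves duplicate entries behind. A minimal sketch of an append that is safe to repeat; `append_once` is a hypothetical helper, demonstrated here on a scratch file rather than on `/etc/bashrc`.

```shell
#!/bin/sh
# Hypothetical helper: append line $1 to file $2 unless an identical
# line is already there, so re-running the setup does not pile up copies.
append_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Demonstrated on a scratch file; on the instance the target would be
# /etc/bashrc and the function would need to run as root.
rc=$(mktemp)
append_once 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64' "$rc"
append_once 'export PATH=$PATH:/usr/local/cuda/bin' "$rc"
append_once 'export PATH=$PATH:/usr/local/cuda/bin' "$rc"   # no-op, already present
cat "$rc"
rm -f "$rc"
```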
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0-cp27-none-linux_x86_64.whl
sudo pip install --upgrade $TF_BINARY_URL
sudo reboot
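Once the instance is back up, a quick import test shows whether the wheel installed cleanly. A minimal sketch: `tf_version` is a hypothetical helper, and it assumes `python` on the PATH is the Python 2.7 that the cp27 wheel was installed into.

```shell
#!/bin/sh
# Hypothetical helper: print the installed TensorFlow version, or
# "not installed" if the import fails. $1 = Python interpreter to use.
tf_version() {
    "${1:-python}" -c 'import tensorflow as tf; print(tf.__version__)' 2>/dev/null \
        || echo "not installed"
}

echo "tensorflow: $(tf_version python)"
```

A failed import right after a successful pip install usually points at the CUDA libraries not being found, so `LD_LIBRARY_PATH` is the first thing to check.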