Guide · ~5 min read

Getting started

Everything you need to request, connect to, and use a GPU node in the AI Lab. Read this once before you submit your first request.

00

Before you start

What you need to know and have on your machine.

This is a self-serve compute environment — there is no babysitting. You should be comfortable with the following before requesting access:

  • Linux & SSH. You can navigate a remote shell, edit files with vim/nano, manage processes (top, htop, ps, kill), and read logs.
  • Docker. You know what an image vs a container is, can write a Dockerfile or use a prebuilt one, and can mount volumes. Workloads run inside containers — no global pip installs.
  • Your ML framework. You know how to run training/inference in PyTorch, TensorFlow, JAX, or whatever you brought.
  • Git. Your code lives in a repo. You'll clone it on the node and push results back.
If any of these are unfamiliar
Don't request access yet — go learn the basics first. The lab admins won't tutor you on Linux or Docker. There are great free resources online; pick one and spend a weekend.
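
Not sure where you stand? A rough self-check (the host and repo URL are placeholders; any machine you can reach works):

$ ssh you@some-remote-host       # comfortable once you're in a remote shell?
$ docker run --rm hello-world    # can you pull and run a container?
$ git clone <your-repo-url>      # is your code in a repo you can clone?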
01

Install Tailscale

The GPU nodes are on a private network. You'll connect via Tailscale.

All three GPU nodes live behind Tailscale (private CGNAT IPs in the 100.64.0.0/10 range). You cannot reach them without Tailscale running on your machine.

macOS
$ brew install tailscale

Linux
$ curl -fsSL https://tailscale.com/install.sh | sh

After installing, run tailscale up and sign in with the account the admin links to your email. Once you're on the tailnet, the node IPs in your approval email will be reachable directly.

$ tailscale ping 100.64.0.1
# pong via direct in 12ms — you're connected.
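
If the ping fails, run tailscale status first; it confirms you're logged in and lists every peer your machine can see (the node names and output below are illustrative, not the real ones):

$ tailscale status
# 100.64.0.1   gpu-node-01   linux   idle
# 100.64.0.2   gpu-node-02   linux   idle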
02

Submit a request

Tell the admin what you want to do and for how long.

Create an account with your university email, then submit a request describing your project. Good briefs get approved faster — be specific:

Example brief

Thesis project — fine-tune YOLOv8 on a custom 30K-image fruit detection dataset. PyTorch 2.4 + CUDA 12.1. Training fits on a single GPU. Estimated 2 weeks of compute, mostly overnight runs.

  • Mention framework + version (PyTorch, TF, JAX, …)
  • Mention dataset size + storage needs
  • Mention how long you actually need the node
  • Don't ask for "always-on" — request realistic windows
03

Connect via SSH

On approval, you'll receive an email with credentials.

The email contains your node, IP, username, password, and an expiry date. Connect from your terminal:

$ ssh student01@100.64.0.1

On first connection you'll be asked to verify the host fingerprint — type yes. Change your password immediately:

$ passwd
Pro tip — set up SSH keys
After your first password login, copy your SSH public key over with ssh-copy-id. Way faster than typing a 14-char password every time.
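
A minimal sketch, using the example username and IP from this guide (yours are in the approval email):

$ ssh-keygen -t ed25519               # skip if you already have a keypair
$ ssh-copy-id student01@100.64.0.1    # enter your password one last time
$ ssh student01@100.64.0.1            # key auth from here on, no prompt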
04

Run your workload (Docker)

Containerize. The host stays clean. You stay sane.

Run everything inside Docker. The nodes have nvidia-container-toolkit installed, so you can pass GPUs into containers with --gpus all:

# interactive PyTorch shell with all GPUs
$ docker run --gpus all -it --rm \
    -v $PWD:/workspace -w /workspace \
    pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime

# or run a script and exit
$ docker run --gpus all --rm \
    -v $PWD:/workspace -w /workspace \
    pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime \
    python train.py
  • nvidia-smi on the host shows your processes — admins watch this
  • Mount your work directory with -v $PWD:/workspace
  • Keep large datasets in /data (shared, read-only mount)
  • Save final artifacts to ~ — get them before your access expires (see the sketch below)
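
Putting the list above together, a hypothetical full run: dataset mounted read-only from /data, artifacts written under your home directory (the dataset path, script name, and flags are made up):

$ docker run --gpus all --rm \
    -v /data/fruit-30k:/data:ro \
    -v $HOME/artifacts:/artifacts \
    -v $PWD:/workspace -w /workspace \
    pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime \
    python train.py --data-dir /data --output-dir /artifacts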
05

Rules of the road

Be a good neighbor. Other students need the node too.

  • No shared accounts. The credentials are yours alone. If a friend needs access, they request their own.
  • Stay in your time window. When your access expires, your account is removed. Plan around it — checkpoint often.
  • Don't fill the disk. Clean up after yourself. Failed runs, stale Docker images, model checkpoints you don't need — gone.
  • Don't hog all GPUs. If others are using the node, scope to specific GPUs with CUDA_VISIBLE_DEVICES or --gpus '"device=0"' (see the sketches after this list).
  • No mining, no scraping. Obvious. Instant ban + report to your department.
  • Report problems. If something is broken — driver, network, disk — tell an admin. Don't silently struggle.
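
Two sketches for the rules above: pinning a run to a single GPU, and reclaiming disk space (standard Docker commands; image tag as in step 04):

# pin a container to GPU 0 only, leaving the rest free for others
$ docker run --gpus '"device=0"' --rm \
    -v $PWD:/workspace -w /workspace \
    pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime \
    python train.py

# check what's eating the disk, then drop stopped containers and unused images
$ docker system df
$ docker system prune
$ docker image prune -a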

Ready? Submit your first request.

Create account →