★ Learn by doing, not by watching

Become the engineer people trust during an outage.

ShellQuest teaches the mental models behind real infrastructure work — from Linux and DNS to Windows Server, storage, virtualisation, OpenStack and SRE — through quests, puzzles and labs.

12 tracks · Interactive puzzles · A real incident lab · Skill diagnostic
incident // prod-web-03
⚠ Users can't SSH into prod-web-03
$ ssh deploy@prod-web-03
deploy@prod-web-03: Permission denied (publickey).
$ sudo tail -2 /var/log/auth.log
sshd[7781]: Authentication refused:
bad ownership or modes for directory /home/deploy
$ ls -ld /home/deploy
drwxrwxrwx 4 deploy deploy 4096 /home/deploy

A real failure mode. Investigate it in the puzzle.
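
The log line above matches sshd's StrictModes behaviour: public-key logins are refused when the home directory is writable by group or others, or is owned by someone other than the user or root. The following is a minimal Python sketch of that rule, not OpenSSH's own code. The helper name `is_safe_for_sshd` is hypothetical, and the check runs against a throwaway temporary directory rather than a real home directory. It assumes a POSIX system.

```python
import os
import stat
import tempfile


def is_safe_for_sshd(path: str, uid: int) -> bool:
    """Approximate sshd's StrictModes rule for a single path.

    Safe means: owned by the login user (or root) and not writable
    by group or others. Hypothetical helper, not OpenSSH's own code.
    """
    st = os.stat(path)
    owned_ok = st.st_uid in (uid, 0)
    writable_by_others = bool(st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))
    return owned_ok and not writable_by_others


# Reproduce the incident on a throwaway directory instead of a real home.
with tempfile.TemporaryDirectory() as home:
    me = os.stat(home).st_uid
    os.chmod(home, 0o777)   # same mode as the drwxrwxrwx listing above
    broken = is_safe_for_sshd(home, me)
    os.chmod(home, 0o755)   # one common safe fix: drop group/other write
    fixed = is_safe_for_sshd(home, me)

print(broken, fixed)
```

The real check also walks `~/.ssh` and `authorized_keys`, so a full diagnosis would apply the same test to each of those paths.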

Why ShellQuest

Most infrastructure learning doesn't stick

Videos you forget, certs that test recall, tutorials that go stale, and nothing that teaches you to think under pressure.

The usual way

  • 📺 Passive videos you forget by Friday
  • 🧩 Scattered blog posts with no path
  • 📜 Certification cramming, not real skill
  • 🪫 Too shallow, or boring after ten minutes
  • 🚫 Never lets you actually break things safely

The ShellQuest way

  • ⚡ Short lessons with one clear mental model each
  • 🧠 Interactive puzzles that reward reasoning
  • 🖥️ Realistic labs — incident practice without prod
  • 🗺️ Visual explainers: request paths, failures, scale
  • 🎯 Daily practice and spaced repetition that lands
Learn

Tracks for the whole stack

From your first command to production incident response — beginner to advanced in every domain.

Linux

Beginner → Advanced

From first commands to kernel-depth troubleshooting.

Permissions · systemd · journalctl · Disk & inodes
View track →

Windows Server

Beginner → Advanced

AD, Group Policy, Event Viewer and PowerShell, demystified.

Active Directory · Kerberos · NTFS ACLs · GPO
View track →

Networking

Beginner → Advanced

Packets to load balancers — and how to debug the path.

TCP/IP · Subnetting · HTTP · Load balancing
View track →

DNS

Beginner → Advanced

The first domino in most outages. Master it.

Resolvers · Record types · TTL & caching · Split-horizon
View track →

Storage

Beginner → Advanced

Disks to disaster recovery, RPO/RTO and restore testing.

RAID · LVM · Snapshots · Backups
View track →

Virtualisation

Intermediate → Advanced

Hypervisors, vCPUs, overcommit and live migration.

Hypervisors · vCPU scheduling · Memory overcommit · Snapshots
View track →

OpenStack

Intermediate → Advanced

Run a private cloud — Nova, Neutron, Cinder, Keystone, Glance.

Keystone · Nova · Neutron · Cinder
View track →

PowerShell

Beginner → Advanced

Objects, not text — the pipeline that runs Windows.

Object pipeline · Where/Select/ForEach · Remoting · Error handling
View track →

Bash

Beginner → Advanced

Glue the system together — without the footguns.

Pipes & redirection · Quoting · Exit codes · Defensive scripting
View track →

Python for Sysadmins

Beginner → Advanced

When Bash isn't enough — automation that scales.

subprocess · APIs · Idempotency · Retries & backoff
View track →

SRE

Intermediate → Advanced

SLOs, error budgets, golden signals and incident response.

SLI/SLO/SLA · Error budgets · Golden signals · Incident response
View track →

Interview Prep

All levels

High-signal infra fundamentals, asked the way interviewers ask them.

DNS · Permissions · Troubleshooting judgement · Networking
View track →
Practise

Puzzles that feel like real incidents

No multiple-choice trivia. Investigate with real commands, read the output, find the root cause, choose the safe fix.

Signature feature

The Linux Black Box Lab

Realistic incident practice without touching production.

A production Linux web server is responding slowly and sometimes timing out. nginx is technically running, but users are complaining. CPU isn't high. Disk usage 'looks normal' at first glance. SSH works. Your job: investigate safely, narrow the uncertainty, identify the root cause, and choose the safest fix — without touching production blindly.

  • df and du can disagree, and the gap is usually a deleted file held open.
  • Deleted files keep their blocks until every process closes them.
  • High load does not always mean high CPU; IO wait is load too.
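
The "deleted file held open" behaviour can be reproduced safely on any POSIX system. The sketch below stands in for the lab's scenario: the 1 MB temporary file plays the role of a runaway log, and the variable names are illustrative only.

```python
import os
import tempfile

# Create a file and keep it open, like a service holding its log file.
fd, path = tempfile.mkstemp()
log = os.fdopen(fd, "wb")
log.write(b"x" * 1_000_000)   # stand-in for a runaway debug.log
log.flush()

os.unlink(path)               # what `rm` does: removes the name only

name_gone = not os.path.exists(path)   # du can no longer see the file
info = os.fstat(log.fileno())          # the open descriptor still works
links_left = info.st_nlink             # no directory entry points at it
size_still_held = info.st_size         # the data is still on disk

log.close()                   # only now can the filesystem free the blocks
print(name_gone, links_left, size_still_held)
```

This is why restarting or signalling the process that holds the descriptor releases the space, while deleting the file a second time cannot.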
prod-web-03 // investigate
$ df -h /
/dev/sda1 50G 50G 0 100% /
$ du -sh /* | sort -h | tail -1
3.2G /var # only ~5G total… where's the space?
$ top
%Cpu: 9 id, 84.3 wa load avg 26.4
$ lsof | grep deleted
gunicorn 3992 ... debug.log (deleted) 42G
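
The `wa` figure that top reports comes from cumulative CPU counters the Linux kernel exposes in `/proc/stat`. As a rough sketch of that arithmetic, the hypothetical helper `iowait_percent` below compares two `cpu` lines and returns the iowait share. The sample values are made up to resemble the incident above, not captured from a real host.

```python
def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two `cpu` lines from /proc/stat."""

    def fields(line: str) -> list:
        # Layout: cpu user nice system idle iowait irq softirq steal ...
        # Keep user..steal; guest time is already counted inside user/nice.
        return [int(v) for v in line.split()[1:9]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


# Made-up samples: the box is mostly waiting on disk and doing little real work.
sample_1 = "cpu 1000 0 500 9000 2000 0 0 0 0 0"
sample_2 = "cpu 1050 0 520 9090 2840 0 0 0 0 0"

wa = iowait_percent(sample_1, sample_2)
print(f"{wa:.1f}% iowait")
```

On Linux, tasks blocked in uninterruptible IO also count toward the load average, which is how load can climb while the CPUs are doing little actual work.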
Know your level

Find your infrastructure level in 10 minutes

Twenty questions across Linux, Windows, DNS, networking, storage, scripting and troubleshooting judgement. We'll place you and recommend where to start.

Helpdesk Explorer · Junior Sysadmin · Production Operator · Infrastructure Engineer · SRE Candidate · Greybeard Wizard
Read

Field notes for infrastructure engineers

Start your quest to become dangerously good at infrastructure.

Join the waitlist for early access, or jump straight into a puzzle and see how you think under pressure.