Allen Institute for AI

Seattle, WA · $146,880 - $220,320

Lead Software Engineer, AI Infrastructure

About the Role

You are a visionary leader who occupies the space between high-level software orchestration and low-level system performance. You are motivated by the idea that world-class infrastructure should be a catalyst for public good, not a proprietary secret. You understand that in the world of frontier AI, the software and the hardware are a single, inseparable organism.

Responsibilities

  • Strategic Leadership: Develop the roadmap for managing large-scale HPC systems
  • Full-Stack Ownership: Lead the design and delivery of critical systems, from the Beaker job scheduler to the execution runtime
  • System Automation: Build innovative tooling and software-defined infrastructure
  • Performance Optimization: Conduct root-cause analysis on complex distributed system failures
  • Mentorship & Culture: Foster a high-performance culture by reviewing code and design docs

Requirements

  • 10+ years of professional experience developing business-critical software and operating large-scale compute infrastructure
  • Deep Linux Expertise: Expert-level knowledge of Linux internals and container runtimes
  • Distributed Systems Mastery: Designing, debugging, and optimizing high-scale distributed systems
  • HPC Foundations: Experience with Kubernetes or Slurm and high-performance networking (NCCL and InfiniBand)

Benefits

  • Medical, dental, and vision coverage
  • 401(k) plan
  • $125/month commuting or internet expenses
  • $200/month fitness and wellbeing expenses
  • 20 vacation days, 10 sick days, 7 personal days