NVIDIA

Santa Clara, CA, US · $184,000

Machine Learning Engineer, GeForce G-Assist


Position Overview

At NVIDIA, we're building GeForce G-Assist — an on-device AI assistant that combines Small Language Models (SLMs), retrieval systems, and hybrid cloud capabilities to deliver responsive, context-aware assistance inside the GeForce ecosystem. We work closely across engineering and product teams to ensure G-Assist performs reliably in real-world scenarios.

Responsibilities

  • Evaluate and improve the Small Language Models used in GeForce G-Assist, with an emphasis on accuracy, robustness, and conversational reliability, focusing on how models behave in production, not just on benchmarks.
  • Identify and mitigate conversation and context contamination, including state drift, prompt leakage, and retrieval cross-talk.
  • Work with SLM and VLM architectures to support text and multimodal interactions. Collaborate on hybrid architectures that combine local SLMs with cloud-based models, reasoning across the full system, from model behavior to runtime performance.
  • Optimize local inference using llama.cpp, including quantization, memory usage, and performance tuning. Read, write, and optimize C/C++ code in performance-critical paths.
  • Design and integrate retrieval-augmented generation (RAG) systems that ground responses in system and user context. Support agentic AI workflows, enabling planning, tool use, and multi-step execution.
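To illustrate the kind of inference optimization the role involves: local runtimes such as llama.cpp shrink model memory footprints by storing weights in low-bit formats. The sketch below shows plain symmetric int8 quantization, a deliberate simplification (llama.cpp's block-wise K-quant schemes are considerably more elaborate); all function names here are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats to [-127, 127] with a single scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid a zero scale
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the quantized values."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.02]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
# Each recovered weight lies within one quantization step of the original.
```

Real deployments refine this with per-block scales and mixed precisions, trading a small accuracy loss for roughly 4x less memory than fp32.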

Requirements & Skills

  • 8+ years of relevant experience in system software or a related field, with an M.S. or higher degree in Computer Science,