Senior Site Reliability Engineer
Location: Santa Clara
Posted on: June 9, 2021
NVIDIA is looking for a Senior Site Reliability Engineer to work
in IPP (Infrastructure, Planning and Process). IPP is a global
organization within NVIDIA. This group works with various other
groups within NVIDIA Software such as Graphics Processors, Mobile
Processors, Deep Learning, Artificial Intelligence and Driverless
Cars to cater to their infrastructure needs. IPP's cloud services
run almost half a million automated jobs per day on thousands
of servers, supporting the productivity of thousands of NVIDIA's
software engineers worldwide.
The cloud hosts a heterogeneous mix of machines and devices with
various operating systems (Windows/Linux/Android) and a multitude of
hardware platforms, including both NVIDIA GPUs and Tegra processors. Are you
passionate about infrastructure and looking for sophisticated,
relevant issues, ready to build the next generation of cloud
services, design creative solutions, mine through data to uncover
real problems and fix them? We are excited to have a fun-loving
person like you.
What you'll be doing:
- Develop frameworks and scripts to automate workflows and
deployments in the cloud environment.
- Deploy and maintain a large farm of machines using the latest
Configuration Management & Infrastructure Automation tools (Chef,
- Develop extensive monitoring systems to provide a fast, reliable and
real-time pulse of the various infrastructure subsystems (Zabbix,
- Participate in on-call & rotational L1 support for
round-the-clock monitoring of the infrastructure.
- Solve complex problems involving infrastructure scaling,
capacity and planning. Analyze and debug operating system,
networking, configuration and performance problems.
- Assist in roll-out and deployment of new development features
aimed at supporting the latest NVIDIA hardware and
What we need to see:
- Bachelor's or Master's degree in Computer Science or Software
Engineering, or equivalent experience.
- Ability to debug and analyze source code to triage, root-cause
and resolve issues in the infrastructure. Work closely with the
development team in improving the build and test systems.
- Familiarity with the setup and maintenance of Linux and Windows hosts and
popular open source applications such as Nginx, Apache HTTP Server, Apache
Tomcat and MySQL server.
- Hands-on experience with coding languages like Java, Python,
- Experience with version control systems like Perforce and Git.
- 8+ years of experience working in large-scale
enterprise production systems.
Ways to stand out from the crowd:
- Experience with cloud (AWS, Azure) & virtualization technologies
like VMs or containers (Docker, Kubernetes).
- Background with configuration management tools used for
- Experience with GPUs, driver development and CUDA
NVIDIA is leading the way in groundbreaking developments in
Artificial Intelligence, High-Performance Computing and
Visualization. The GPU, our invention, serves as the visual cortex
of modern computers and is at the heart of our products and
services. Our work opens new universes to explore, enables
outstanding creativity and discovery, and powers what were once
science fiction inventions from artificial intelligence to
NVIDIA is looking for phenomenal people like you to help us
accelerate the next wave of artificial intelligence. If you are
creative and passionate about developing cloud services, we want to
hear from you! NVIDIA is widely considered to be one of the technology
world's most desirable employers, and we have some of the most
forward-thinking and hardworking people in the world working for
us. If you're creative and passionate about developing cloud
services, we want you on our team!
NVIDIA is committed to fostering a diverse work environment and
proud to be an equal opportunity employer. As we highly value
diversity in our current and future employees, we do not
discriminate (including in our hiring and promotion practices) on
the basis of race, religion, color, national origin, gender, gender
expression, sexual orientation, age, marital status, veteran
status, disability status or any other characteristic protected by
Keywords: Nvidia, Santa Clara, Senior Site Reliability Engineer, Other, Santa Clara, California