SENIOR SITE RELIABILITY ENGINEER
Company: NVIDIA
Location: Santa Clara
Posted on: September 26, 2024
Job Description:
NVIDIA is looking for a Senior Site Reliability Engineer to work
in IPP (Infrastructure, Planning and Process), a global
organization within NVIDIA. This group works with various other
groups within NVIDIA Software, such as Graphics Processors, Mobile
Processors, Deep Learning, Artificial Intelligence and Driverless
Cars, to cater to their infrastructure needs. IPP's cloud services
run almost half a million automated jobs per day on thousands of
servers, helping thousands of NVIDIA software engineers worldwide
work more efficiently. The cloud hosts a heterogeneous mix of
machines and devices running various operating systems
(Windows/Linux/Android) on a multitude of hardware platforms,
including both NVIDIA GPUs and Tegra processors. Are you passionate
about infrastructure and looking for sophisticated, relevant
problems? Are you ready to build the next generation of cloud
services, craft creative solutions, and mine through data to
uncover real problems and fix them? We would be excited to have
you on board.

What you'll be doing:
- Develop frameworks and scripts to automate workflows and
  deployments in a private cloud environment that houses several
  compute servers with NVIDIA GPUs.
- Focus specifically on building and stabilizing our ESXi, KVM
  and Hyper-V virtualization infrastructure.
- Deploy and maintain a large farm of machines using the latest
  configuration management and infrastructure automation tools
  (Chef, Ansible, Terraform).
- Develop extensive monitoring systems that provide a fast,
  reliable, real-time pulse of the various infrastructure
  subsystems (Zabbix, BigPanda, Grafana).
- Participate in on-call and rotational L1 support for
  round-the-clock monitoring and remediation of the infrastructure
  (PagerDuty).
- Tackle sophisticated problems involving infrastructure scaling
  and capacity planning.
- Analyze and debug operating system, networking, configuration
  and performance problems.
- Assist in the rollout and deployment of new development features
  aimed at supporting the latest NVIDIA hardware and technologies.

What we need to see:
- Bachelor's or Master's degree in Computer Science or Software
  Engineering, or equivalent experience.
- 6+ years of professional experience, including proven
  experience working with large-scale enterprise production
  systems.
- Ability to debug and analyze source code to triage, root-cause
  and resolve issues in the infrastructure.
- Ability to work closely with the platform engineering team to
  understand hardware setups.
- Familiarity with setting up and maintaining Linux and Windows
  hosts.
- Hands-on coding experience in Python or Go, and Unix shell
  proficiency. Knowledge of Java and C.
- Experience with version control systems such as Perforce and
  Git.

Ways to stand out from the crowd:
- Experience with VM and hardware virtualization technologies
  such as VMware, KVM, Hyper-V, Docker and Kubernetes.
- Background in automating bare-metal and VM provisioning.
- Experience supporting GPUs, embedded device development, driver
  development and CUDA/TensorRT applications.
- Development experience with Chef, Ansible and infrastructure
  orchestration.

NVIDIA is looking for engineers like you to help us accelerate the
next wave of artificial intelligence. NVIDIA is widely considered
to be one of the technology world's most desirable employers. If
you are creative and passionate about developing cloud services,
we want you on our team!

The base salary range is 164,000 USD to 327,750 USD. Your base
salary will be determined based on your location, experience, and
the pay of employees in similar positions. You will also be
eligible for equity and benefits
(https://www.nvidia.com/en-us/benefits/).
NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and
proud to be an equal opportunity employer. As we highly value
diversity in our current and future employees, we do not
discriminate (including in our hiring and promotion practices) on
the basis of race, religion, color, national origin, gender,
gender expression, sexual orientation, age, marital status,
veteran status, disability status or any other characteristic
protected by law.