Sr Director of Platform & Site Reliability Engineering
Location: Santa Clara
Posted on: February 17, 2021
Company: Software Services, Santa Clara, CA
Remarks: Site Reliability and Platform Engineering are key functions in IT, and
this highly visible senior leadership role will be responsible for
infrastructure platforms and application support strategy, roadmap,
and technical implementation of the IT Transformation programs.
- Manage Compute Platform as a service with end-to-end
responsibility for delivering and supporting the on-prem and cloud
compute platforms (GCP, AWS), VMware, Kubernetes, Terraform,
Ansible, CI/CD, Artifactory, etc. for continuously deploying
- Own automation for delivery of Platform services using
Infrastructure as Code. Build standard playbooks for Platform which
can be consumed across multiple teams in the organization.
- Lead delivery of Cloud Infrastructure strategies aligned with
business objectives with a focus on mass Application movements into
the Cloud involving design, implementation and Infrastructure
- Build a high-performing team of Cloud Platform SMEs and
platform leads while mentoring traditional platform SMEs on cloud
computing best practices, technology, and adoption.
- Build and manage an SRE function that owns application
availability and performance, managing both through automation and
proactive/predictive alerts, backed by a strong data analytics
toolset to identify areas for improvement.
- Implement comprehensive service monitoring to ensure uptime and
performance, including synthetic, real-user, system, and application
performance monitoring, dashboards, etc.
- Define, measure, and meet key Service Level Objectives
including availability, performance, incidents and chronic
- Own end-to-end availability and performance of mission-critical
services and build automation to prevent problem recurrence;
eventually automate response to all non-exceptional service
- Partner with application and business stakeholders to ensure
a high-quality product is developed and released into production.
Establish and periodically update the Release Policy, which governs
the release process and details release categories, release
activities, roles and responsibilities, exceptions, etc.
- Work closely with Enterprise Architecture and Information
Security to specify and document solutions and practices.
- Keep abreast of evolving threats/risks and industry trends, and
work to implement best practices in the organization.
- BA/BS degree in Computer Science or related technical field, or
equivalent practical experience.
- 10+ years of hands-on technical experience combined with strong
management and communication skills.
- Solid understanding of Windows, Linux, networking, TCP/IP,
routing, switching, firewalls, load balancers and other
- Solid understanding of modern cloud technologies and developer
family of products: GKE, Istio, Serverless, Cloud Build, Monitoring
and Logging, as well as microservices, DevSecOps, etc.
- Experience running revenue-generating applications in a public
cloud and IaaS, including real-world experience with at least one
public cloud provider: AWS, Google Cloud, or Microsoft Azure.
- Experience building, scaling, and running production operations
for heterogeneous applications.
- Strong troubleshooting experience and skillset to resolve
incidents across multiple domains.
- Ability to nurture and support a strong operations culture:
customer/service focus; excellent technology; high-quality
implementations; self-motivated innovation and
- Demonstrated ability to establish and maintain
metrics-based process improvement.
- Demonstrated ability to develop strong alliances with those
outside of your immediate organization.
- Experience in building and managing strong technical
- Excellent communications, organization, and time management