Brainbridge - Senior Azure Site Reliability Engineer
We are looking for a highly skilled Azure Site Reliability Engineer (SRE). In this role, you will be instrumental in ensuring the reliability, scalability, and performance of our cloud-based services. You will work closely with cross-functional teams to design and implement best practices, optimize infrastructure, and drive operational excellence across our Azure environments.
Key responsibilities
Automation & CI/CD: Design and maintain automation frameworks for deployment, scaling, and environment management.
Monitoring & Maintenance: Implement and manage monitoring tools to ensure system health and proactively resolve issues.
Incident Management: Respond to incidents, perform root cause analysis, and implement preventive measures.
Performance Optimization: Analyze and enhance system performance for scalability and efficiency.
Capacity Planning: Forecast system needs and ensure infrastructure readiness for growth.
Collaboration: Partner with development teams to embed reliability into the CI/CD lifecycle.
Documentation: Maintain detailed documentation of system architecture, configurations, and procedures.
Tool Development: Build internal tools to streamline operations and improve reliability.
Security: Enforce and monitor security controls across all systems.
SLO Management: Define and track Service Level Objectives (SLOs) to meet business reliability targets.
On-call Support: Participate in a rotating on-call schedule for 24/7 support of critical systems.
Required qualifications
Experience: Minimum 5 years in a Site Reliability Engineer or DevOps role, with a strong focus on Microsoft Azure.
Languages:
English: C1 level
Bonus: French or Dutch (B1 level)
Technical skills
Proven experience in:
Azure Cloud Services (VMs, Storage, Networking, etc.)
Infrastructure as Code (IaC): Terraform, ARM templates, or Bicep
Monitoring & Logging: Azure Monitor, Application Insights, Log Analytics, Grafana, Splunk, Elastic Stack
Scripting: Python, Azure CLI, PowerShell
Containerization: Docker, Kubernetes
Cloud Networking: VPNs, NSGs, load balancing, hub & spoke models
Azure Governance & Cost Management