Certified Remote
Published: Nov 14, 2025
EPAM Systems is seeking a Senior Site Reliability Engineer with expertise in Azure to drive the reliability, scalability, and performance of our cloud-based systems. In this role, you will collaborate with development and operations teams to automate infrastructure, monitor services, and resolve incidents efficiently for global clients.
As a Senior Site Reliability Engineer (SRE) focused on Azure at EPAM Systems, you will play a critical role in ensuring the reliability and efficiency of our clients' cloud infrastructure. EPAM is a leading digital transformation services provider, and you will work on innovative projects across various industries, using Azure to build scalable, resilient systems.
Your primary responsibilities will include designing, implementing, and maintaining monitoring solutions that identify and mitigate issues before they impact users. You will automate deployment processes, optimize resource utilization, and contribute to chaos engineering practices to enhance system resilience. You will also work with software engineers, product managers, and other stakeholders to define service level objectives (SLOs) and error budgets.
In this remote position based in Brazil, you will have the opportunity to work with cutting-edge Azure technologies, including cloud-native services, AI-driven operations, and security best practices. EPAM fosters a culture of continuous learning and innovation, providing you with the tools and support to advance your career in cloud reliability engineering.
If you are passionate about making systems more reliable and efficient in a dynamic, fast-paced environment, join our team and contribute to world-class digital solutions.
The employer recommends obtaining this certification to validate your skills and enhance your application.
Note: You can still apply for this position without the certification, but having it will make your profile stand out and may be required to move forward in the hiring process.