Certified Remote
Published: Oct 7, 2025
Join ECS as a Mid-level Site Reliability Engineer, where you'll ensure the reliability, scalability, and performance of our critical infrastructure systems. Collaborate with cross-functional teams to optimize cloud environments and implement proactive monitoring solutions for seamless operations.
At ECS, we are at the forefront of delivering innovative technology solutions to federal and commercial clients. As a Mid-level Site Reliability Engineer, you will play a pivotal role in maintaining the uptime and efficiency of our distributed systems. Your responsibilities will include designing and implementing automated solutions to detect, diagnose, and resolve system issues before they impact users. You will work closely with software engineers to integrate reliability practices into the software development lifecycle, leveraging tools like Terraform for infrastructure as code and Ansible for configuration management.
Key aspects of the role include proactive capacity planning, incident response, and post-mortem analysis to continuously improve system resilience. You will monitor application performance using telemetry and take part in on-call rotations to ensure 24/7 availability. This position offers the opportunity to work on cutting-edge projects in a certified remote environment while supporting mission-critical applications.
If you are passionate about building reliable systems at scale and thrive in a collaborative setting, ECS provides the platform to advance your career in site reliability engineering.
The employer recommends obtaining this certification to validate your skills and enhance your application.
Note: You can still apply for this position without the certification, but having it will make your profile stand out and may be required to move forward in the hiring process.