Digital

Senior Product Reliability Engineer - Iguazio

Job ID: 93043

This role is pivotal in ensuring the reliability and efficiency of Iguazio's products and requires a proactive, skilled professional dedicated to continuous improvement and excellence in client service.

  • Tel Aviv


Do you want to do work that matters, alongside supportive leaders who will help you grow faster than you ever thought possible? Are you a creative problem-solver who is energized by challenges? You’ve come to the right place.

Who You'll Work With

Iguazio is a leading AI and machine learning company based in Herzliya. Iguazio is recognized for helping enterprises deliver analytics at scale across the organization. Its advanced Machine Learning Operations (MLOps) pipeline enables clients to streamline and manage their AI applications from initial concept all the way to production in a simplified, automated manner.
The Dev Support Team investigates and solves complex, deep-rooted problems, finds smart and elegant workarounds (written in Python and Bash), and characterizes and builds support tools and utilities.

Your impact within our firm

In this role, you will develop and maintain effective communication channels with both internal teams and external clients to ensure seamless information flow.
You will gain an in-depth understanding of Iguazio's technology and infrastructure, identifying and mastering the most intricate areas.
You will take full ownership of customers’ technical product issues from initial troubleshooting through to resolution, including writing Root Cause Analyses (RCAs) and communicating findings.
You will also develop supportability tools for internal teams and external clients that make Iguazio's products easier to use and maintain.
You will lead problem-solving efforts across the entire technology stack in collaboration with clients and internal teams, and act as a consultant on infrastructure best practices with a focus on reliability and scalability.
You will lead and manage Iguazio's deployments across cloud platforms (AWS, Azure, and Google Cloud).
You will work closely with developers, product managers, and the architecture team to ensure cohesive operations, while developing and providing comprehensive procedures for operating and maintaining Iguazio’s platform.
You will create and lead initiatives to improve support processes across the company.

Your qualifications and skills

  • 5+ years of experience in troubleshooting and recovery procedures, with hands-on experience in L3/development support
  • Extensive work experience with Docker and orchestration systems (e.g., Kubernetes), cloud services (AWS, Azure, Google Cloud), and Linux environments
  • Strong troubleshooting skills at a developer level, with proficiency in writing scripts (Bash, Python)
  • Excellent verbal and written English communication and presentation skills
  • Familiarity with MLOps technologies and implementations
  • An academic degree in Computer Science, Industrial Engineering, or a related field is an advantage

Please review the additional requirements regarding essential job functions of McKinsey colleagues.
Job Skill Group - CSS Associate
Job Skill Code - ENGS - Senior Software Engineer
Function - Technology
Industry - High Tech
Post to LinkedIn - Yes
Posted to LinkedIn Date - Tue Nov 05 00:00:00 GMT 2024
LinkedIn Posting City - Tel Aviv
LinkedIn Posting State/Province -
LinkedIn Posting Country - Israel
LinkedIn Job Title - Senior Product Reliability Engineer - Iguazio
LinkedIn Function - Engineering;Information Technology
LinkedIn Industry - Computer Software;Information Technology and Services
LinkedIn Seniority Level - Mid-Senior level