Site Reliability Engineer (SRE)

FLUIX AI is a rapidly growing enterprise B2B SaaS startup based in the San Francisco Bay Area. We build solutions for data centers and facilities that leverage the latest advances in Machine Learning (ML) and Artificial Intelligence (AI). Our mission is to use AI to solve the world’s inefficiencies, starting with the world’s most important buildings. Facilities that provide the world with communication, data, food, manufactured goods, and more are ultimately inefficient and require real-time, dynamic optimization. With A.I.M.I., our Artificial Intelligence for Managing Infrastructure platform, we will usher in a new age of automation and optimization for facilities.

We are seeking a skilled Site Reliability Engineer to join our growing team. The ideal candidate will help ensure the reliability, scalability, and performance of our hybrid (cloud and on-prem) platform while supporting our AI/ML infrastructure. You will work closely with our engineering, data science, and operations teams to build and maintain robust systems that support our cutting-edge solutions. Your expertise in ML and AI, your experience integrating with GenAI model providers, and your familiarity with data centers and manufacturing sites will be crucial to the success of our platform.

Who you’ll work closely with

Abhi Sastri, Founder & CEO

Chase Overcash, CTO

What you’ll do

  • Design, implement, and maintain scalable systems while optimizing performance, ensuring high availability and disaster recovery, and assisting with codebase refactoring for modular deployment.

  • Develop and maintain automation tools that streamline operations, eliminate repetitive tasks, and improve system reliability.

  • Collaborate with engineering and data science teams to integrate ML and AI models into production environments, while working with the GenAI community to ensure seamless integration and high performance of cutting-edge models within our technology stack.

  • Identify areas for improvement and drive initiatives to enhance system reliability and performance, while staying updated on industry trends and advancements in SRE practices, ML, and AI technologies.

  • Respond to incidents and resolve them quickly to minimize impact, conduct post-incident reviews, and implement improvements to prevent recurrence.

  • Create and manage multiple cloud instances (dev, staging, test), optimize cloud infrastructure and data center operations, and ensure the security and compliance of both infrastructure and applications.

Your background

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).

  • Proven experience as a Site Reliability Engineer or in a similar role in a SaaS environment, with a strong background in managing and optimizing cloud infrastructure (AWS preferred, or GCP or Azure), experience with ML and AI technologies including GenAI model integration, and familiarity with data center operations and manufacturing site integrations.

  • Proficiency in programming and scripting languages (e.g., Python, Go, Bash), experience with containerization and orchestration tools (Docker, Kubernetes), a strong understanding of networking, security, and performance optimization, and knowledge of CI/CD pipelines and DevOps practices.

  • Excellent problem-solving skills with attention to detail, strong communication and collaboration abilities, and the capacity to thrive in a fast-paced, dynamic startup environment.

www.fluix.ai

Full-time | San Francisco

Benefits

  • Competitive Salary: Attractive compensation package, including equity options.

  • Benefits: Comprehensive health, dental, and vision insurance, along with other standard benefits.

  • Work Environment: A dynamic and collaborative San Francisco Bay Area work environment.

  • Growth Opportunities: Opportunities for professional growth and development, with the chance to shape the future of technology in the industry.

© Copyright 2024, All Rights Reserved by FLUIX.AI
