- Posted 02 May 2023
- Salary $185k - $200k Base
- Location Redwood City
- Job type Full Time
- Discipline Software Engineering
- Reference 224931
Staff Software Engineer - Backend Distributed Systems
Job description
At Snorkel AI, we’re redefining how people and organizations build AI applications. Snorkel started as a research project in the Stanford AI Lab in 2016, creating a higher-level interface to machine learning through programmatically labeled and managed training data. From deploying in some of the world’s largest and most sophisticated tech organizations, to empowering scientists, doctors, and journalists — we’ve seen firsthand how this approach democratizes and accelerates AI. Now, we’re building Snorkel Flow to bring our technology to everyone!
Building Snorkel Flow requires outstanding engineers and technologies across the stack, including scalable data pipelines, elegant and intuitive interfaces (both visual and programmatic), state-of-the-art ML modeling techniques, and best practices for seamless deployment. Modern AI approaches require large labeled training datasets to learn from. While traditional approaches typically rely on armies of human annotators to label by hand, Snorkel Flow empowers users to programmatically label and build training data sets to drive a radically faster, more flexible, and higher quality end-to-end AI development process. Snorkel Flow is an end-to-end development platform, complete with a GUI and powerful programmatic interfaces for driving the development process for full AI application workflows: from preprocessing, to programmatic training data creation, to ML model training, to analysis, and deployment. It's the data-first platform for enterprise AI.
Overview
Snorkel AI is looking for a staff distributed systems architect and backend engineer. The company’s flagship product is a cloud-based enterprise software platform used by data scientists and ML engineers. Snorkel products are used by large enterprises to solve their most impactful problems in today’s data-centric AI world.
You will be part of the backend team that is building a scalable and reliable distributed system that empowers users to solve their most pressing needs in a data-centric AI world. The team has a variety of technical backgrounds, from machine learning PhDs to full-stack engineers who are building large-scale production systems. You will become one of these pragmatic, high-output, product-focused engineers.
Location
Hybrid schedule with 1 or 2 days per week in our Redwood City HQ.
Main Responsibilities
Prototype, optimize, and maintain scalable back-end services that will power new ML development workflows
Design extensible and testable interfaces between internal services including the underlying storage and data models
Own the architecture, design, development, and operations of large-scale systems designed for AI/ML tasks, including data management systems, data engineering workflow systems, and distributed compute systems, and connect them to the front-end components
Work with customers to understand their product use cases, desired capabilities, and scale requirements, and translate those into engineering specifications and code
Be an engaged team player in a customer-focused cross-functional environment where you will feel excited to take on whatever is most impactful for the company and product
Work a hybrid schedule with one or two days per week in our Redwood City HQ and work remotely with "No Meeting" Tuesdays and Thursdays
Must haves
Bachelor's degree in Computer Science or related field
4+ years of experience delivering distributed systems and services in a production setting for cloud-native applications
Ability to design and build efficient, scalable data storage and retrieval systems for AI/ML tasks
Strong communication and coding skills with emphasis on designing for scale and robustness
Proactive and positive attitude to lead, learn, troubleshoot, and take ownership of both multi-quarter feature development and immediate debugging and unblocking of customers
Nice to haves
8+ years of professional software engineering experience
Experience architecting and developing production web-scale systems (monitoring, telemetry, performance, reliability, triage, and debugging)
Strong development and debugging skills in Python
Experience developing enterprise software products for machine learning and/or data science applications
Experience with distributed compute frameworks and/or deep learning frameworks
Experience building and maintaining large-scale, distributed, high-performance data pipelines