Staff Software Engineer - AI/ML | Redwood City

Job description

At Snorkel AI, we’re redefining how people and organizations build AI applications. Snorkel started as a research project in the Stanford AI Lab in 2016, creating a higher-level interface to machine learning through programmatically labeled and managed training data. From deploying in some of the world’s largest and most sophisticated tech organizations, to empowering scientists, doctors, and journalists — we’ve seen firsthand how this approach democratizes and accelerates AI. Now, we’re building Snorkel Flow to bring our technology to everyone!

Building Snorkel Flow requires outstanding engineers and technologies across the stack, including scalable data pipelines, elegant and intuitive interfaces (both visual and programmatic), state-of-the-art ML modeling techniques, and best practices for seamless deployment. Modern AI approaches require large labeled training datasets to learn from. While traditional approaches typically rely on armies of human annotators to label data by hand, Snorkel Flow empowers users to programmatically label and build training datasets, driving a radically faster, more flexible, and higher-quality end-to-end AI development process. Snorkel Flow is an end-to-end development platform, complete with a GUI and powerful programmatic interfaces for driving full AI application workflows: from preprocessing and programmatic training data creation to ML model training, analysis, and deployment. It's the data-first platform for enterprise AI.

Overview
As a Staff AI/ML Engineer, you'll build systems to power large-scale machine learning and foundation model (e.g., large language model) workloads. You’ll work closely with other engineers, product managers, and field team members to ensure that Snorkel Flow users working with different data modalities (e.g., text, PDF, image) and different use cases can build high-quality training datasets, integrate with the latest foundation model technology to build and adapt models, and take advantage of state-of-the-art error analysis and development automation.

Location
Hybrid schedule with 1 or 2 days per week in our Redwood City HQ.

Main Responsibilities

Own the architecture, design, development, and operation of large-scale systems for AI/ML tasks, including distributed compute systems, data management systems, data engineering workflow systems, and end-user experiences
Recognize and act on opportunities to integrate the latest foundation model and related technologies to power user workflows
Prototype, optimize, and maintain scalable back-end services that will power new ML and foundation model development workflows
Design extensible and testable interfaces between internal services, including the underlying storage and data models
Be an engaged team player in a customer-focused, cross-functional environment where you are excited to take on whatever is most impactful for the company and product
Work a hybrid schedule: one or two days per week at our Redwood City HQ and the rest remotely, with "No Meeting" Tuesdays and Thursdays

Must haves

4+ years of experience delivering distributed and ML systems and services in a production setting for cloud-native applications
Experience with distributed compute frameworks and deep learning frameworks
Ability to design and build efficient scalable data storage, compute, and retrieval systems for AI/ML tasks
Strong communication and coding skills with emphasis on designing for scale and robustness
Experience owning the delivery of large multi-person projects

Nice to haves

8+ years of professional software engineering experience
Experience architecting and developing production web-scale systems (monitoring, telemetry, performance, reliability, triage, and debugging)
Strong development and debugging skills in Python
Experience working with foundation models (e.g., large language models)
Experience developing enterprise software products for machine learning and/or data science applications