Protege

Forward-Deployed Data Scientist (Media Curation & Delivery)

Protege Remote Today

Company Overview:

We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time-intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI — and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

Role Overview:

We’re looking for a Forward-Deployed Data Scientist (Media Curation & Delivery) to bridge the gap between Protege’s media catalog and our customers’ AI data needs.

In this role, you’ll partner with our sales, product, and account management teams to design and deliver custom media datasets for AI model training and evaluation. You’ll become an expert on Protege’s growing catalog of audio and video content, from long-form assets with title-level metadata to clip-level content generated using TwelveLabs embeddings.

Your job: understand what customers are building, identify the content that best fits their needs, and assemble high-quality sample sets and final deliveries that meet their technical and conceptual specs.

This is a deeply hands-on, technical role — you’ll use SQL and internal tools to explore our catalog, iterate with AI models, and perform human-in-the-loop curation. You’ll help shape how Protege delivers the right data to accelerate the world’s best AI teams.

Key Responsibilities:

Curate and Deliver Media Datasets

  • Work with Sales and Account Management to interpret customer requirements and translate them into curation strategies

  • Query and analyze Protege’s media catalog (SQL, internal APIs, and metadata tools) to identify relevant content

  • Use AI tools and TwelveLabs embeddings to surface and refine clip-level content

  • Conduct iterative sample reviews with customers — gathering feedback, refining selections, and ensuring final packages meet spec

Be the Catalog Expert

  • Develop a deep understanding of Protege’s media catalog structure, metadata, and growth patterns

  • Track and analyze content coverage, diversity, and modality mix; identify gaps relative to customer demand

  • Partner with Product and Partnerships to feed back catalog insights that inform sourcing priorities

Operate at the Intersection of Product, Data, and Customer

  • Collaborate cross-functionally to ensure content packaging aligns with technical, ethical, and licensing requirements

  • Develop methods, scripts, or internal tools that make curation more efficient and scalable

  • Support the evolution of Protege’s delivery platform — helping define how internal users and customers search, sample, and export data

Human-in-the-Loop Media Search & Curation

  • Work closely with embedding-based systems to iterate between algorithmic selection and human review

  • Define best practices for embedding queries, relevance evaluation, and content diversity

  • Push for operational excellence and quality assurance at every stage

About You:

  • 4–7 years of experience in data science, media analytics, or technical curation roles

  • Strong proficiency in SQL; you’re comfortable writing complex queries to slice large datasets and generate insights

  • Comfortable working with media metadata, embeddings, and unstructured content

  • Experience collaborating with sales, account management, or customer success teams on technically nuanced deliverables

  • Strong analytical instincts — you enjoy exploring data, pattern-matching, and translating findings into action

  • Detail-oriented with a high standard for data quality and usability

  • Excellent communicator who can navigate between technical depth and customer-friendly clarity

  • Thrives in ambiguous, fast-moving environments with a mix of structure and creativity

  • You treat those around you with kindness

Bonus if you have these attributes:

  • Familiarity with video/audio processing, embeddings, or multimodal AI workflows

  • Prior experience curating or packaging datasets for machine learning

  • Background in content analysis, recommendation systems, or information retrieval

Why Protege:

  • Be the connective tissue between Protege’s platform, our data, and our customers

  • Build datasets that directly power the next generation of AI models

  • Operate at the cutting edge of multimodal data — where human judgment meets machine intelligence

  • Competitive compensation, equity, and benefits package
