At Pareto.AI, we’re on a mission to enable top talent around the world to participate in the development of cutting-edge AI models.
In the coming years, AI models will transform how we work and create thousands of new AI training jobs for skilled talent around the world. We’ve joined forces with top AI and crowd researchers at Anthropic, Character.AI, Imbue, Stanford, and the University of Pennsylvania to build a fair and ethical platform where AI developers collaborate with domain experts to train bespoke AI models.
We're looking for a Technical Partnerships Lead to own the end-to-end strategy and execution of data acquisition that powers frontier AI model training. This role sits at the intersection of AI research, partnerships, and growth—you'll work closely with our technical teams to understand what data we need, then figure out how to get it.
This is a zero-to-one role requiring technical fluency to understand what makes high-quality training data for frontier AI systems, combined with creative problem-solving to source it from everywhere: research labs, niche startups, enterprises, and specialist communities. The right person gets excited about the challenge of finding complex, realistic data in unconventional places—someone who can think strategically about where valuable data exists, build trust across wildly different industries, and execute relentlessly.
If you thrive at the intersection of technical depth, creative sourcing, and operational excellence, we'd love to hear from you.
Owns data acquisition for frontier AI training
Build and scale our data partnership portfolio from the ground up
Source niche, complex, realistic datasets that push the boundaries of what AI models can learn
Deliver coverage across diverse domains and data types that our models need
Finds data where others don't look
Identify where valuable training data exists and who controls it—from academic institutions to failed startups to enterprise archives
Build creative sourcing strategies tailored to each data type and domain
Navigate IP sensitivities and organizational constraints to unlock access to high-value datasets
Closes deals across wildly different industries
Negotiate partnerships with academic researchers, enterprise legal teams, bankrupt companies, frontier AI labs, and specialist communities
Structure creative agreements (buy, license, revenue-share, co-creation) that work for each partner's constraints
Build trust and credibility quickly with stakeholders from completely different professional worlds
Translates between technical and commercial
Partner with our Applied AI and Research teams to understand what makes training data valuable (realism, complexity, edge cases, diversity)
Communicate credibly with technical stakeholders about data quality and model requirements
Balance technical needs with commercial realities (cost, timeline, IP risk)
Builds what doesn't exist yet
Create processes, frameworks, and systems as you learn what works
Develop prioritization logic that accounts for data quality, strategic fit, cost, and speed
Establish metrics and reporting for data operations (pipeline health, cost efficiency, portfolio coverage)
You likely have:
2-5 years of experience in growth, business development, partnerships, or entrepreneurial roles building from zero to scale
Track record of closing 20-30+ partnerships or deals, ideally across different industries or stakeholder types
Demonstrated ability to scale impact—whether through systems, creativity, or hustle
Technical fluency to grasp complex concepts quickly and have credible conversations with researchers
Comfort with ambiguity and ability to build from scratch
Strong communication skills, with the ability to adapt to academic researchers, corporate lawyers, and startup founders
We'd be especially excited if you have:
Experience in the AI/ML ecosystem (data platforms, research labs, AI startups)
Active presence in the AI research community (X/Twitter, conferences, researcher networks)
Experience with data licensing, content acquisition, or marketplace supply-side growth
Background in negotiating non-standard deals (IP-sensitive, complex structures, regulated industries)
Track record of creative sourcing—finding things in unconventional places
Network of relationships with data providers, research institutions, or specialist communities
You won't thrive here if:
You need clear processes and well-defined roadmaps to operate
You're uncomfortable with frequent rejection or high ambiguity
You've only worked in one industry and haven't demonstrated adaptability
You prefer optimizing existing systems over building new ones
The data challenge: You're not sourcing commodity datasets. You're finding complex, realistic, frontier training data that doesn't exist in standard marketplaces. This requires creative thinking about where data lives and how to access it.
The relationship challenge: In a single week, you might negotiate with a bankruptcy trustee, a university IRB committee, a Fortune 500 legal team, and a frontier AI researcher. Each requires completely different approaches.
The building challenge: You're creating this function from scratch. What works for sourcing medical data won't work for legal data. You'll need to experiment, learn fast, and build tailored approaches.
The impact: The data partnerships you build directly determine what our models can learn and how we differentiate in the market. High leverage, high visibility.