Note: We are recruiting on behalf of our valued client. This opportunity is for a position with their organization, not with People Culture Talent. We're excited to help connect talented professionals with this exceptional team!
The open AI evaluation platform redefining how the world's leading AI labs measure model performance is seeking a Data Scientist with expertise in experimentation, causal inference, and retention analytics to drive data-informed decision-making and optimize user engagement. In this role, you will design and analyze experiments (A/B tests, quasi-experiments), develop measurement frameworks for key metrics (DAU, WAU, MAU, retention), and provide actionable insights to improve product growth and user retention. Proficiency in PySpark is highly desirable to handle large-scale datasets efficiently.
Experimentation & Causal Inference
Design, implement, and analyze A/B tests, multi-armed bandit experiments, and quasi-experimental studies to measure the impact of product changes.
Apply causal inference techniques (e.g., difference-in-differences, propensity score matching, synthetic control, regression discontinuity) to estimate treatment effects in non-randomized settings.
Collaborate with product, engineering, and marketing teams to define hypotheses, success metrics, and statistical power requirements.
Ensure statistical rigor (e.g., controlling for bias, applying multiple-testing corrections, reporting confidence intervals).
Retention & Engagement Analytics
Develop and refine retention measurement frameworks (e.g., cohort analysis, survival analysis, churn prediction).
Define and track core engagement metrics (DAU, WAU, MAU, rolling retention, N-day retention) and diagnose trends.
Identify key drivers of retention through segmentation, funnel analysis, and predictive modeling.
Work with growth teams to optimize onboarding, engagement loops, and monetization strategies.
Data Infrastructure & Scalable Analytics
Build and maintain scalable data pipelines (using PySpark, SQL, or big data tools) to process and analyze large datasets.
Develop automated dashboards and reports (e.g., Tableau, Looker, Metabase) to monitor experiment performance and retention trends.
Ensure data quality and consistency in metric definitions across teams.
Optimize queries and computations for performance and cost efficiency in distributed systems (e.g., Databricks, AWS EMR, GCP BigQuery).
Cross-Functional Collaboration
Partner with product managers, engineers, and marketers to translate business questions into data-driven analyses.
Present findings and recommendations to executive stakeholders in clear, actionable formats.
Mentor junior data scientists and analysts on best practices in experimentation and retention analytics.
Qualifications
3+ years of experience in data science, analytics, or experimentation (or equivalent in academic research).
Strong background in statistics and causal inference (hypothesis testing, Bayesian methods, experimental design).
Hands-on experience with SQL and Python (Pandas, NumPy, SciPy, StatsModels, Scikit-learn).
Proficiency in experimentation tools (e.g., Optimizely, Statsig, Eppo, or custom in-house systems).
Experience defining and analyzing retention metrics (DAU/WAU/MAU, cohort retention, churn).
Familiarity with big data tools (PySpark, Hadoop, or similar distributed computing frameworks).
Preferred Qualifications
Expertise in PySpark for large-scale data processing and analytics.
Experience with time-series forecasting, survival analysis, or uplift modeling.
Knowledge of ML for retention (e.g., propensity models, clustering, recommendation systems).
Experience with data visualization tools (Tableau, Looker, Plotly, Matplotlib/Seaborn).
Background in growth analytics, product analytics, or marketing analytics.
Advanced degree (MS/PhD) in Statistics, Economics, Computer Science, or a quantitative field.
Experience with reinforcement learning or bandit algorithms for dynamic experimentation.
Knowledge of MLOps or productionizing models (e.g., MLflow, Airflow, Docker).
Their openings span more than one career level. The salary for this role starts at $200k and can range up to $400k USD, plus equity. The salary offered depends on factors such as work experience, transferable skills, business needs and impact, and market demand.
Comprehensive health, dental, vision, and additional support programs.
The opportunity to work on cutting-edge AI with a small, mission-driven team.
A culture that values transparency, trust, and community impact.
Visa sponsorship available.
This fast-growing startup is redefining what "better" means in AI. Built by researchers from UC Berkeley's SkyLab and backed by Felicis, Andreessen Horowitz, Kleiner Perkins, Lightspeed, and the University of California, this open evaluation platform has become the definitive source for understanding how AI models actually perform in the real world.
With over a million daily users and the trust of every major AI lab — including OpenAI, Google, and Anthropic — their crowdsourced benchmarks and human preference data power the decisions shaping the future of artificial intelligence. Their leaderboards aren't just influential; they're the industry standard.
Behind the platform is a team of researchers, engineers, and builders from UC Berkeley, Google, Stanford, DeepMind, and beyond — people who seek truth, move fast, and care deeply about craftsmanship and impact. They're building a company where deep expertise meets curiosity, and where the work genuinely matters.