Odyssey is an AI lab pioneering general-purpose world models—a new form of multimodal intelligence unlocking entirely new consumer, enterprise, and intelligence applications. World models are the next major frontier in AI, and Odyssey is leading the way with breakthrough models like Odyssey-2 Pro.
We need a deeply experienced Data Architect to take full ownership of our data practice. This is a crucial technical leadership position focused on architecture, strategy, and execution. You should be an expert with serious, hands-on data engineering chops, capable of defining the long-term architectural vision while still diving into the code. Success in this role requires a complete understanding of the data lifecycle: from partnering with Operations to source data, to designing robust data recipes, to ensuring the resulting data assets are optimized for our world models.
Define and implement the long-term technical architecture for our data platform, ensuring scalability, reliability, and support for high-volume, multimodal datasets.
Take ownership of the end-to-end data lifecycle, from sourcing and acquisition to delivery for machine learning model training.
Design and build robust data processing pipelines, including data recipes for cleaning, feature engineering, and normalization, specifically addressing the complexity of inputs required for world models.
Develop and manage the data curation system, including flexible metadata schemas, evolving labels, and modular tagging pipelines, to allow researchers to dynamically categorize, resample, and select high-quality training data.
Work closely with ML Research and Engineering teams to understand immediate and future data requirements, translating research needs into actionable data infrastructure and acquisition strategies.
Lead the integration of sophisticated signals and quality filtering into the data flow, such as VLM analysis, pose estimation, and aesthetic scoring, to ensure training datasets meet high quality standards.
Drive the strategy for data acquisition, evaluating the trade-offs between acquisition methods while balancing budget constraints and quality requirements.
You live and breathe data, with a strong belief in data quality and diversity as a primary lever for optimizing model performance.
8+ years building data platforms, with a focus on data architecture and engineering.
Experience supporting ML teams, specifically preparing and optimizing data for model training.
Great at designing and building reliable, high-volume data pipelines (ETL/ELT).
Expert in cloud data warehousing and lakehouse architectures (e.g., Snowflake, Databricks, BigQuery, AWS S3/Redshift).
Proficient with modern data processing frameworks (e.g., Spark, Flink, Kafka) and various databases (NoSQL, graph, relational).
Knows how to set up practical data governance, quality checks, and metadata management.
A strong technical leader who can set a clear technical direction and mentor other engineers.
Experienced with complex data types (images, video, text) and signal processing.
Degree in Computer Science, Engineering, or a related field.