Since 2005, MDCalc has been an essential part of the clinician’s workflow, helping achieve better patient outcomes. Actively used by more than 65% of physicians worldwide, MDCalc is the most broadly used point-of-care medical reference for clinical decision tools and content, and one of only four references used by more than 50% of US HCPs. These evidence-based tools and content are used by millions of medical professionals globally, support 50+ specialties, and cover 200+ patient conditions.
To continue accelerating this growth, we are expanding the Engineering team with a Senior Data Engineer who will help build and scale the data infrastructure that powers decision-making across the company. This is an opportunity for an experienced data engineer who enjoys working close to product and business teams, building reliable data systems, and transforming complex data into actionable insights.
This role will help define how data moves through MDCalc’s platform, designing the pipelines and architecture that enable reliable analytics, product insights, and data-driven decision making across the organization.
As a Senior Data Engineer at MDCalc, you will design, build, and maintain the data pipelines and infrastructure that support analytics, product insights, and operational decision-making across the company. A key part of this role is managing how data moves across systems, shaping and transforming it through robust ETL/ELT pipelines so it can be reliably used by downstream analytics, product, and business applications.
You will work closely with product, engineering, and business stakeholders to ensure data is reliable, accessible, and structured for effective use. This includes building programmatic data pipelines, primarily in Python, to extract, transform, and deliver data across MDCalc’s systems and data platform.
You will also contribute to the architecture of MDCalc’s data platform, helping define how data is structured and delivered across the organization. As a senior individual contributor, you will help establish best practices for data modeling, pipeline development, and data governance.
Responsibilities include, but are not limited to, the following:
Design, build, and maintain scalable data pipelines and ELT/ETL workflows that support analytics, operational reporting, and business intelligence use cases
Build programmatic data pipelines (primarily in Python) that extract data from application and third-party systems, transform it into usable formats, and deliver it to downstream data platforms and consumers
Own and improve core data models and transformations to ensure data is accurate, well-structured, and easy for stakeholders to use
Partner with Product, Engineering, and Analytics teams to understand data needs and translate them into reliable data solutions
Develop and maintain systems that move data across the platform, ensuring it is properly shaped, structured, and available for downstream analysis and product use cases
Help shape and maintain the architecture of MDCalc’s modern data stack, including warehousing, orchestration, transformation, and monitoring
Improve data quality, observability, and reliability through testing, validation, and proactive monitoring practices
Support the ingestion and integration of data from a variety of application, product, and third-party sources
Establish and reinforce best practices around data governance, documentation, naming conventions, and maintainability
Identify and drive opportunities to improve performance, scalability, and efficiency across our data systems
Design efficient data workflows that query, transform, and deliver datasets to downstream systems and stakeholders
Contribute to technical direction and architectural decisions as a senior member of the team
Serve as a thought partner to teammates and cross-functional stakeholders on how to best leverage data across the business
5+ years of experience in data engineering
Strong SQL skills and experience building and optimizing data models for analytical use cases
Experience building and maintaining reliable data pipelines in a modern cloud data environment
Strong proficiency in Python or a comparable programming language commonly used in data engineering
Experience building programmatic ETL/ELT pipelines using Python or similar tools to move and transform data across systems
Experience working with data warehouses such as Snowflake
Experience with transformation and orchestration tools such as dbt, Airflow, Dagster, or similar tools
Strong understanding of data architecture, data modeling, and pipeline design best practices
Ability to operate independently, prioritize effectively, and drive work forward in a fast-moving environment
Ability to make a true difference in medicine: MDCalc is the most broadly used medical reference, relied on by more than 65% of physicians worldwide
Medical, Dental, & Vision coverage, with option to extend to your dependents
Company-sponsored short-term insurance
Fully paid 8-week parental leave after 6 months of employment
Company-sponsored 401(k) after 3 months of employment
Unlimited vacation for salaried roles - we trust you to take the time you need
Tri-annual company offsites to connect, reflect, and plan together
Monthly work-from-home stipend
Hybrid work environment with a great team office in Greenwich Village, NYC
A culture of fun and motivated team members who believe in a greater mission here at