There are two broad categories of data science roles:
- Structured data (80-90%): predictive analytics, forecasting, likelihood of diagnosis, cost estimation. Typically batch processing.
- Text data: NLP, deep learning, GenAI, building low-latency systems. Typically real-time processing.
Levels:
- Data Scientist (3-7 years)
Core skills:
- Databricks
- PySpark (an important skill, as most of the feature engineering involves big data)
- Azure
- ML and deep learning libraries: PyTorch, TensorFlow
- Monitoring
- Ability to vibe code using GitHub Copilot, Claude Code or similar tools
NLP-specific skills:
- Prompt engineering
- Context engineering
- Agentic frameworks
Good-to-have skills:
- Deployment
- Containerization