Data science roles, workflow, tools, Python setup
Jupyter, variables, data types, notebook hygiene
Start Python notebook
Control flow, functions, errors, modules
Lists, dictionaries, file handling, reusable code
Build Python practice notebook
NumPy arrays and vectorised thinking
pandas DataFrames, CSV/Excel/JSON I/O
Load first analysis dataset
Filtering, grouping, aggregation, joins
Missing values, duplicates, type conversion, dates
Create cleaning log
Visualisation with Matplotlib and Seaborn
Chart selection, EDA commentary, insight writing
Build exploratory charts
Notebook storytelling and reproducibility
Project review and AI-assisted documentation
Submit Exploratory Analysis Notebook
Descriptive statistics, spread, distributions
Sampling, bias, probability, uncertainty
Start statistical report
Confidence intervals and experiment thinking
Hypothesis testing, p-values, business significance
Draft test interpretation
Correlation, causation, regression interpretation
A/B readout lab and stakeholder memo writing
Add statistical recommendation
Probability, Bayesian thinking, risk communication
Statistical report QA and presentation
Submit Statistical Decision Report
SQL SELECT, WHERE, GROUP BY, HAVING, CASE
Joins, subqueries, CTEs, validation
Start analytics case-study schema
Window functions, date and string functions
Business questions, cohorts, segmentation analysis
Write case-study queries
EDA workflow and metric validation
Dashboard or visual story build lab
Create analytics narrative
SQL style, documentation, insight presentation
Project review and AI-assisted executive summary
Submit Analytics Case Study Pack
ML workflow, problem framing, target definition
Train/test split, leakage, baselines, metrics
Start modelling dataset
Preprocessing, imputation, encoding, scaling
Pipelines and cross-validation
Build reusable ML pipeline
Linear regression and regularisation concepts
Regression metrics and error analysis
Train first regression benchmark
Classification and logistic regression
Confusion matrix, precision, recall, F1
Train first classifier
Decision trees and random forests
Feature importance and business explanation
Compare tree-based model
Gradient boosting and model tuning
Hyperparameter search and validation strategy
Improve benchmark model
Thresholding, calibration, and trade-offs
Model comparison summary for stakeholders
Draft benchmark report
Supervised learning project lab
Presentation, feedback, and model QA
Submit Prediction Model Benchmark
Clustering, similarity, k-means
Cluster profiling and validation
Start segmentation project
Hierarchical clustering and PCA concepts
Dimensionality reduction and visualisation
Add segment exploration
Feature engineering for behavioural data
Anomaly detection and stability checks
Profile final segments
Bias, proxy variables, ethics, when not to use ML
Recommendation writing and project review
Submit Segmentation Project
Advanced feature engineering and imbalance
Sampling strategies and robust validation
Start explainable ML report
Ensembles, boosting, stacking concepts
Time-aware validation and model drift thinking
Improve advanced model
Explainability: permutation importance, SHAP concepts
Model cards and fairness review
Draft model card
Model risk, pilot-readiness, stakeholder narrative
Explainable ML presentation and feedback
Submit Explainable ML Report
NLP workflow and text preprocessing
TF-IDF, embeddings, text classification
Start NLP prototype
Similarity search and text insight use cases
Evaluation and error analysis for text models
Train text model
Neural network foundations and deep learning literacy
Training loops, overfitting, regularisation
Compare neural baseline
Transformers, attention, foundation models
NLP prototype review and AI-assisted summary
Submit NLP Insight Prototype
GenAI for data science workflows
Prompt design, structured outputs, safe data handling
Start LLM assistant
Function calling and data science tool workflows
AI helpers for SQL, Python, metrics, and model docs
Add tool workflow
Embeddings and vector search
RAG architecture and document Q&A
Add retrieval workflow
RAG evaluation, citations, prompt injection risks
Human review, confidence, refusal, and escalation
Evaluate LLM assistant
Responsible AI governance and AI-use declarations
Security, fairness, accuracy, transparency checklist
Draft governance note
LLM assistant demo and project review
Presentation, feedback, and risk review
Submit LLM Data Science Assistant
MLOps foundations, packaging, dependency files
APIs, batch scoring, Streamlit or Flask demos
Start capstone repository
Model versioning and experiment tracking concepts
Testing, validation, and reproducibility checks
Build capstone pipeline
Monitoring concepts, drift, retraining triggers
Production-readiness checklist
Add monitoring plan
Capstone scoping and data acquisition review
Cleaning, EDA, feature engineering lab
Prepare capstone dataset
Capstone modelling and benchmark comparison
Model tuning, explainability, and risk review
Choose final capstone model
Deployment demo build lab
Model card, executive summary, and GitHub README
Package capstone demo
Portfolio storytelling and interview defence
CV, LinkedIn, GitHub optimisation
Prepare final presentation
Capstone demo day
Mock interview, feedback, and next-step plan
Submit End-to-End Data Science Capstone