CI/CD for Models

Learning Objectives

By the end of this lesson, you will be able to:

Explain what CI/CD means in the context of ML models (not just apps).
Describe the core stages: code linting, data and feature tests, model training, evaluation, and deployment gates.
Sketch a minimal CI/CD pipeline for a Flow‑style model (e.g., a classification or regression model).
Identify when a CI/CD setup is over‑engineered for a given lab.

Introduction

In software engineering, CI/CD (Continuous Integration / Continuous Delivery) is the practice of:

automatically building, testing, and (optionally) deploying code on every change.

For ML, CI/CD for models extends this idea: you automate data checks, training, evaluation, and deployment decisions so that models are treated as reproducible artifacts, not one‑off scripts.

In Flow‑style labs, you will not always need a GitHub Actions–style CI/CD monster.
But you will benefit from understanding the pattern:

push a change → run tests → train/evaluate → optionally deploy.

This lesson focuses on the mental model and light‑weight practices, not vendor‑specific YAML files.

What CI/CD for Models Means

Continuous Integration (CI)

CI for ML usually means:

Every time you push code or data changes, the system:
1. Lints and checks your code (style, tests, import safety).
2. Validates datasets or features (schema, nulls, expected ranges).
3. Optionally trains and evaluates the model on a small or sampled dataset.

The goal is early feedback: you want to know fast if a change breaks:

the data pipeline,
feature logic, or
basic model behavior.

Continuous Delivery (CD)

CD is the idea that:

Once the CI checks pass, the system can deliver the model (or artifacts) to a staging or production environment, subject to extra gates.

Delivery can be:

Automatic (e.g., if metrics pass).
Manual approval (human‑in‑the‑loop).

For ML, “delivery” often means:

registering a new model version in a registry,
updating a model server or API endpoint, or
flipping a feature flag that routes traffic to the new model.

The key idea: you push the change; the pipeline decides what to do with it.

Typical Stages in a CI/CD Pipeline for Models

Here is a minimal flow you can adapt to Flow‑style labs:

1. Code and Data Validation

When code is pushed:

Run linters (flake8, ruff, black‑style checks) and unit tests.
Validate that:
- importable modules exist,
- config files are valid,
- environment variables or paths are present.
Validate data/features:
- schema didn’t change unexpectedly.
- no new columns or removed columns.
- no large spikes in missing values.

These checks are fast and rely on small samples or metadata.

2. Model Training (Optional, On Demand)

In many real‑world setups:

CI trains a model on every push or PR, but on a small or sampled dataset to keep latency low.
Full training on large data is triggered separately or overnight.

In a Flow‑style context, you can:

run a quick training run to ensure the training script still works,
without waiting for a full epoch‑heavy train.

3. Evaluation and Validation

After training (or on a test‑only run), evaluate with:

metrics: accuracy, precision, recall, F1, MSE, etc.
bounds: ensure performance doesn’t drop below a threshold.
consistency checks: features + model → stable predictions.

You can encode:

“model cannot be promoted if F1 drops below 0.7.”
“data cannot be merged if nulls exceed 5%.”

This turns evaluation into automation‑driven gates.

4. Model Registration and Artifact Storage

If the model passes:

Save the trained model (e.g., pickle, joblib, or a standard format) to a versioned store.
Register metadata:
- commit hash,
- dataset version,
- hyperparameters,
- metrics on the test set.

This is your model registry pattern in simple form.

5. Deployment Gates

Finally, decide what to do with the model:

Staging
- Deploy to a staging endpoint and run canary or A/B tests.
Production
- Only promote if the model beats the current one or meets safety checks.

In practice you might:

swap a symbolic link to the latest model file,
update a config file used by a Flask/FastAPI model service, or
update a feature flag that routes traffic.

Why This Matters for Flow Engineers

For ML‑backed services in the Flow Initiative, CI/CD for models:

Turns models into reproducible artifacts (not fragile notebooks).
Reduces risk of breaking production with a bad train or data change.
Makes it easier to roll back to a previous model when something breaks.
Builds trust with stakeholders who need to see repeatable processes.

In African‑centric and public‑goods contexts, this is particularly important when:

Models influence governance, education, or reward‑based systems.
Errors are more than theoretical; they can affect people’s outcomes.
You want transparent, verifiable processes for updates.

What You Can Do in a Flow‑Style Lab

Even without a full CI/CD platform, you can adopt CI/CD‑style disciplines:

Version control everything (code, data pointers, configs).
Wrap training in a script (e.g., train.py) that outputs a model artifact.
Add a run_evaluation.py script that:
- loads the model,
- runs it on a test set,
- prints metrics, and
- exits with a non‑zero code if metrics are too low.

Then, in a more “real” setting, you would:

wire this script into GitHub Actions / GitLab CI / Jenkins,
and gate model deployment on its exit status.

For learning, that local‑style discipline is the core.

Common Anti‑Patterns

1. “No‑op” CI

Having a CI job that only runs echo "OK" or a trivial test.
It gives an illusion of automation but provides no guardrails.

2. Training Everything in CI

Letting CI run full training on full‑size data for every PR.
This makes the pipeline slow and frustrating.

3. No Model Versioning

Overwriting a single model.pkl file.
You cannot reproduce or roll back old models.

4. No Clear Deployment Gates

Nobody knows what triggers a “production update.”
People update models manually, inconsistently.

Practical Exercises

Exercise 1: Sketch a Minimal CI/CD Flow

Pick a small Flow‑style supervised learning lab:

Sketch a CI/CD pipeline as a list:
- code/data validation,
- quick training,
- evaluation,
- model storage,
- deployment gate.

Write this as a markdown note, not as YAML.

Exercise 2: Write a Local “CI” Script

Create a simple script (e.g., ci_model.py):

It lints your code,
runs a small test‑only evaluation of the model on a held‑out set,
and prints metrics plus a pass/fail message.

Think of this as a “stand‑in” for a real CI job.

Exercise 3: Design a Model Versioning Scheme

For the same lab:

Decide how to version models (e.g., model_v1.pkl, model_v2.pkl, or model_{timestamp}_{hash}.pkl).
Write down:
- how you would store metadata along with each model,
- how you would choose which version to serve in production.

Self‑Assessment

Rate yourself from 1 to 5:

I can explain what CI/CD for models means.
I can sketch a pipeline that includes data, training, evaluation, and deployment gates.
I can see how to turn a local training script into a CI‑style automation.
I can avoid typical anti‑patterns like opaque or over‑heavy CI jobs.

Action item: write a short note in your lab repo describing a simple CI/CD‑style setup you would like to apply to your next Flow‑style ML project.

Next Steps

Read 02-monitoring-model-performance.md next to learn how to track models in production and feed that back into CI/CD.
Use this CI/CD‑thinking habit whenever you design a model‑backed service.
Treat model CI/CD as a first‑class engineering practice, not a side‑product of ML.

This lesson gives Flow Initiative trainees an intermediate‑level understanding of CI/CD for ML models, focusing on how to treat models as versioned artifacts, automate training and evaluation, and gate deployment decisions in a disciplined way.

Learning Objectives​

Introduction​

What CI/CD for Models Means​

Continuous Integration (CI)​

Continuous Delivery (CD)​

Typical Stages in a CI/CD Pipeline for Models​

1. Code and Data Validation​

2. Model Training (Optional, On Demand)​

3. Evaluation and Validation​

4. Model Registration and Artifact Storage​

5. Deployment Gates​

Why This Matters for Flow Engineers​

What You Can Do in a Flow‑Style Lab​

Common Anti‑Patterns​

1. “No‑op” CI​

2. Training Everything in CI​

3. No Model Versioning​

4. No Clear Deployment Gates​

Practical Exercises​

Exercise 1: Sketch a Minimal CI/CD Flow​

Exercise 2: Write a Local “CI” Script​

Exercise 3: Design a Model Versioning Scheme​

Self‑Assessment​

Next Steps​