ML Pipelines
An ML pipeline is an automated, reproducible sequence of steps that transforms raw data into a deployed model. Pipelines make ML workflows repeatable, scalable, and maintainable.
Why ML Pipelines?
Manual workflows don’t scale: data collection → preprocessing → training → evaluation → deployment done ad-hoc is error-prone and not reproducible.
Automation: pipelines trigger automatically when new data arrives or on a schedule.
Reproducibility: each pipeline run is a documented, versioned artifact.
Collaboration: teams can work on modular components without interfering.
CI/CD for ML: pipelines integrate with software release practices.
Pipeline Components
A typical ML pipeline:
Data Ingestion
→ Data Validation
→ Feature Engineering
→ Training
→ Evaluation
→ Model Validation (performance gate)
→ Push to Registry
→ Deployment
Each step is a component with well-defined inputs and outputs (data artifacts, model artifacts).
TFX (TensorFlow Extended)
Google’s production ML pipeline framework.
Components:
ExampleGen: ingest data; convert to TFRecord.StatisticsGen: compute dataset statistics.SchemaGen: infer schema from statistics.ExampleValidator: validate data against schema.Transform: feature engineering (scalers, encoders).Trainer: train the model.Evaluator: compute metrics; compare to baseline.Pusher: push to serving if evaluation passes.
Metadata: TFX stores artifact metadata in ML Metadata (MLMD), a database of all inputs, outputs, and executions. Enables lineage queries.
Orchestration: TFX pipelines run on Airflow, Kubeflow Pipelines, or Vertex AI Pipelines.
Kubeflow Pipelines
Kubernetes-native pipeline orchestration. Pipelines are defined as Python functions with the @component decorator; KFP compiles them to Argo Workflows YAML.
@kfp.component
def train(dataset: Input[Dataset], model: Output[Model]):
...
@kfp.pipeline
def my_pipeline():
data = ingest()
trained = train(dataset=data.output)
evaluate(model=trained.output)
Artifacts: large artifacts (datasets, models) stored in GCS/S3; small parameters passed as JSON.
UI: visualize pipeline DAGs, step status, logs, artifact lineage.
Metaflow
Netflix’s ML pipeline framework. Python-native; steps are class methods.
from metaflow import FlowSpec, step
class TrainFlow(FlowSpec):
@step
def start(self):
self.next(self.preprocess)
@step
def preprocess(self):
...
self.next(self.train)
@step
def train(self):
...
self.next(self.end)
@step
def end(self):
pass
if __name__ == "__main__":
TrainFlow()
Runs locally or on AWS/GCP. Auto-versioning of all inputs and outputs. Time-travel: inspect any past run’s artifacts.
ZenML
Framework-agnostic ML pipelines with a focus on portability. Pipelines are defined as Python functions; steps can run on any orchestrator (local, Airflow, Kubeflow) and stack (AWS, GCP, Azure).
Continuous Training
Trigger-based retraining: retrain when drift is detected, performance drops, or new labeled data is available.
Pipeline as code: the pipeline DAG is in version control; changes to training logic trigger a new pipeline run.
Automated evaluation gate: a step in the pipeline compares the new model’s metrics to the current production model. Only promotes the new model if it improves by at least a threshold.
Rollback: maintain the previous model version; roll back automatically if the new model degrades production metrics.
Pipeline Testing
Unit tests: test individual components (feature transformers, model constructors).
Integration tests: run the full pipeline on a small subset of data. Ensure all components connect and produce correct artifacts.
Smoke tests: verify that training runs without errors and produces a model with reasonable metrics.
Regression tests: verify that metrics match a stored baseline within tolerance.