Writing Index

Articles on applied AI, data science, product systems, and engineering

A crawlable index of practical notes by Yangming Li, covering LLM systems, AI system architecture, NLP, MLOps, statistical learning, experimentation, and product thinking. The dedicated AI Engineering column collects the posts closest to production AI systems.

Production AI

AI Engineering

Column

AI Engineering index

LLM systems, AI system architecture, MLOps, evaluation, deployment, and production AI infrastructure.

2026 · RAG Evaluation

RAG evaluation guide with metrics and test sets

How to evaluate retrieval quality, generation faithfulness, citations, no-answer behavior, and production reliability.

2026 · Copilot Testing

Copilot agent golden test set

Build reusable Copilot agent cases, rubrics, schema checks, regression gates, and release decisions.

2026 · Agent Evaluation

AI agent evaluation is more important than prompting

Why production AI agents need custom eval sets, trajectory checks, calibrated judges, regression tests, and business-ready metrics.

2026 · Agent Evaluation

Testing and evaluating Copilot agents

Schema-first evaluation, custom graders, release gates, human review, and monitoring for reliable Copilot Studio agents.

2025 · LLM Systems

Model Context Protocol guide

How AI applications connect to external data, tools, and services.

2025 · AI Systems

Agentic AI systems with n8n

A technical guide to agent architecture, system integration, and production constraints.

Models and text

Applied AI and ML

2026 · Applied ML

A 90% accurate churn model can still lose money

Why high-accuracy churn prediction can improve renewal rate while losing revenue, and how uplift thinking fixes the target.

2026 · Applied ML

Beyond A/B Testing: Uplift modeling in industry

A practical guide to CATE, meta-learners, Qini, AUUC, decile lift, online experiments, and incremental ROI.

2026 · Causal Inference

Causal inference for product analytics

How product teams connect experiments, observational data, ATE, CATE, uplift, guardrails, and decisions.

2026 · Trustworthy ML

Model calibration in Python

Reliability diagrams, expected calibration error, confidence bins, decision thresholds, and monitoring checks.

2026 · RAG Evaluation

RAG evaluation Python example and scorecard

A layered guide with retrieval metrics, a synthetic golden dataset, framework comparison, and monitoring checks.

2026 · Agent Evaluation

AI agent evaluation is more important than prompting

A practical essay on moving agent development from demo-driven iteration to evaluation-driven production readiness.

2026 · Agent Evaluation

Testing and evaluating Copilot agents

Moving a Copilot agent from a demo to a reviewed, monitored, production-ready AI system.

2025 · LLM Systems

Model Context Protocol guide

How AI applications connect to external data, tools, and services.

2025 · AI Systems

Agentic AI systems with n8n

A technical guide to agent architecture, system integration, and production constraints.

2025 · NLP

CMU Advanced NLP course notes

Study notes on foundational and advanced natural language processing topics.

2025 · ML

Random Forest explained

An ensemble learning guide with practical machine learning context.

2026 · Deep Learning

PyTorch review

A compact review of tensors, shapes, broadcasting, reshaping, and distributions.

2025 · Trustworthy AI

Trustworthy machine learning

Notes on transparency, fairness, privacy, robustness, and accountability.

Data systems

Data platforms and MLOps

2024 · MLOps

MLOps essential skills

Model management and deployment foundations for production ML work.

2024 · Data Engineering

Databricks lakehouse guide

Delta Lake, MLflow, Unity Catalog, and data platform implementation notes.

2024 · Containers

Docker for machine learning

Using Docker for reproducible data science and ML environments.

2024 · Cloud Native

Kubernetes guide

Container orchestration concepts and implementation patterns.

Product systems

Product and experimentation

2026 · Retention

A 90% accurate churn model can still lose money

Beyond A/B Testing: Uplift modeling in industry

A/B testing sample size in Python

Causal inference for product analytics

High-impact value propositions

What makes products successful?

A/B test engineering guide

Jira for agile project management