Databricks lakehouse guide
Data engineering, MLflow, Delta Lake, and platform patterns.
A guide to Yangming Li's writing on data products, AI data products, analytics systems, experimentation, data engineering, and decision-support workflows.
A data product is not a dashboard with a nicer name. It is a workflow that turns data into a decision, action, or repeatable operating habit. Yangming Li's writing connects data engineering, statistics, product design, and AI systems because data products usually fail at the boundaries: unclear users, weak definitions, stale data, missing ownership, or outputs that do not fit the next step.
Start with the product strategy articles for the user and workflow lens, then read the Databricks, MLOps, statistical testing, and A/B testing notes for implementation depth.
A useful data product has a named user, a decision cadence, a trusted data source, a clear metric definition, and a feedback loop. For AI data products, it also needs model behavior that can be reviewed and monitored. A ranking model, forecast, summary, theme detector, or recommendation is valuable only if the user understands what it means and what to do next.
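The five properties above can be treated as a checklist that a team fills in before building anything. The sketch below is one possible way to do that; the class name, field names, and example values are all hypothetical and are not taken from the articles.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DataProductSpec:
    """Hypothetical checklist covering the five properties named above."""
    named_user: str
    decision_cadence: str
    trusted_source: str
    metric_definition: str
    feedback_loop: str


def missing_properties(spec: DataProductSpec) -> list[str]:
    """Return the names of properties that were left blank."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]


# Illustrative values only; the feedback loop is deliberately left empty.
spec = DataProductSpec(
    named_user="regional pricing analyst",
    decision_cadence="weekly",
    trusted_source="sales.orders_daily",
    metric_definition="net revenue = gross revenue - refunds",
    feedback_loop="",
)
print(missing_properties(spec))
```

A blank field here is a prompt for a conversation, not a validation failure: the point is to notice early that nobody has said how users will report wrong outputs.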
The product surface matters. A user should be able to see evidence, compare options, inspect exceptions, and correct errors. A data product should make uncertainty visible when the data does not support a confident action. It should also preserve enough metadata for later audits: when the data refreshed, which model or rule generated the output, and what changed after human review.
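As a rough illustration of the audit metadata described above, an output can carry its refresh time, its generator, and its review history with it. This is a minimal sketch under assumed names (`AuditedOutput`, `ReviewEvent`, and the example identifiers are hypothetical), not a description of any particular system.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReviewEvent:
    """One human correction (hypothetical shape)."""
    reviewer: str
    old_value: str
    new_value: str
    reviewed_at: datetime


@dataclass
class AuditedOutput:
    """An output plus the three audit facts listed in the text."""
    value: str
    data_refreshed_at: datetime  # when the underlying data last refreshed
    generated_by: str            # which model or rule produced the value
    reviews: list[ReviewEvent] = field(default_factory=list)

    def correct(self, reviewer: str, new_value: str, when: datetime) -> None:
        """Record what changed during human review, then apply the change."""
        self.reviews.append(ReviewEvent(reviewer, self.value, new_value, when))
        self.value = new_value

    def audit_trail(self) -> list[str]:
        """Render the audit metadata as human-readable lines."""
        lines = [
            f"data refreshed at {self.data_refreshed_at.isoformat()}",
            f"generated by {self.generated_by}",
        ]
        for r in self.reviews:
            lines.append(f"{r.reviewer} changed {r.old_value!r} to {r.new_value!r}")
        return lines


# Illustrative values only.
output = AuditedOutput(
    value="high",
    data_refreshed_at=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    generated_by="risk_rule_v3",
)
output.correct("reviewer_a", "medium", datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc))
print("\n".join(output.audit_trail()))
```

Keeping the old value in each review event, rather than overwriting it, is what makes the later audit possible.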
Good data products usually separate ingestion, transformation, semantic definitions, serving, application logic, and measurement. The data platform might include a lakehouse, warehouse, dbt-style transformations, feature pipelines, or MLflow-style experiment tracking. The product layer should not hide data quality issues; it should expose freshness, coverage, and known limitations where those signals affect decisions.
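To make freshness and coverage concrete, the serving layer can compute both and hand them to the product surface alongside the numbers themselves. The sketch below assumes rows arrive as plain dictionaries and that a maximum acceptable age is agreed per product; the function name and the 24-hour limit are illustrative assumptions.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def quality_signals(
    rows: list[dict],
    key: str,
    last_refresh: datetime,
    now: datetime,
    max_age: timedelta,
) -> dict:
    """Compute coverage of one field and staleness of the whole dataset."""
    total = len(rows)
    present = sum(1 for row in rows if row.get(key) is not None)
    age = now - last_refresh
    return {
        "coverage": present / total if total else 0.0,  # share of rows with a value
        "age_hours": age.total_seconds() / 3600,
        "is_stale": age > max_age,
    }


# Illustrative data: one of four rows is missing its revenue value.
rows = [
    {"region": "north", "revenue": 120.0},
    {"region": "south", "revenue": 95.5},
    {"region": "east", "revenue": None},
    {"region": "west", "revenue": 110.0},
]
signals = quality_signals(
    rows,
    key="revenue",
    last_refresh=datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    now=datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc),
    max_age=timedelta(hours=24),
)
print(signals)
```

The product layer can then decide how to show these signals, for example a banner when `is_stale` is true, instead of silently rendering an old number.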
For AI data products, the architecture should keep model outputs versioned and reproducible. If a summary, risk label, or extracted field later changes a workflow, teams need to know which prompt, model, schema, source data, and reviewer decision produced it. That is a product requirement, not only a logging detail.
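One way to keep outputs traceable is to store a deterministic fingerprint of the inputs next to each output, together with the reviewer decision. The sketch below uses a SHA-256 hash over a canonical JSON payload; that choice, the function names, and the example identifiers are assumptions made for illustration.

```python
import hashlib
import json


def provenance_fingerprint(
    prompt: str, model: str, schema_version: str, source_snapshot: str
) -> str:
    """Hash the inputs that produced an output into a stable identifier."""
    payload = json.dumps(
        {
            "prompt": prompt,
            "model": model,
            "schema_version": schema_version,
            "source_snapshot": source_snapshot,
        },
        sort_keys=True,          # canonical key order keeps the hash stable
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def output_record(value: str, reviewer_decision: str, **inputs: str) -> dict:
    """Bundle an output with its inputs, fingerprint, and review outcome."""
    return {
        "value": value,
        "reviewer_decision": reviewer_decision,
        "inputs": inputs,
        "fingerprint": provenance_fingerprint(**inputs),
    }


# Illustrative values only; none of these identifiers come from a real system.
record = output_record(
    "high risk",
    "accepted",
    prompt="Classify the contract risk level.",
    model="example-model-v1",
    schema_version="2",
    source_snapshot="contracts_snapshot_001",
)
print(record["fingerprint"])
```

Two records with the same fingerprint were produced from the same prompt, model, schema, and source snapshot, so a changed output with an unchanged fingerprint points at nondeterminism rather than at a changed input.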
Evaluation for a data product should combine data checks, product checks, and decision checks. Data checks ask whether the source is fresh, complete, and semantically stable. Product checks ask whether the user can interpret the output, find evidence, and take the next action. Decision checks ask whether the product actually changes a repeated workflow in a useful direction.
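The three kinds of checks can be reported side by side so that a passing data layer does not mask a failing product or decision layer. This is a minimal sketch; the `Check` shape and the example check names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

LAYERS = ("data", "product", "decision")


@dataclass(frozen=True)
class Check:
    layer: str    # one of LAYERS
    name: str
    passed: bool


def failures_by_layer(checks: list[Check]) -> dict[str, list[str]]:
    """Group the names of failed checks under their evaluation layer."""
    report: dict[str, list[str]] = {layer: [] for layer in LAYERS}
    for check in checks:
        if check.layer not in report:
            raise ValueError(f"unknown layer: {check.layer}")
        if not check.passed:
            report[check.layer].append(check.name)
    return report


# Illustrative results: the data is fine, but users cannot find the evidence.
checks = [
    Check("data", "source refreshed on schedule", True),
    Check("data", "required fields are populated", True),
    Check("product", "evidence is findable", False),
    Check("decision", "weekly review uses the output", True),
]
print(failures_by_layer(checks))
```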
For an AI data product, add model-specific checks: ground truth quality, confidence behavior, drift, reviewer corrections, and examples that should become test cases. A product can look successful in page views while still failing to improve the decision. The more useful metric is whether users trust the output enough to act, and whether the system gives them enough context to challenge it when needed.
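Two of these checks are easy to quantify. The sketch below computes a reviewer correction rate and a population stability index (PSI) over binned prediction counts. PSI is a commonly used drift heuristic chosen here for illustration; the text does not prescribe a specific drift measure, and the bin counts are made up.

```python
from __future__ import annotations

import math


def correction_rate(reviewed: int, corrected: int) -> float:
    """Share of reviewed outputs that a human reviewer changed."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    return corrected / reviewed


def population_stability_index(
    expected_counts: list[int], actual_counts: list[int], eps: float = 1e-6
) -> float:
    """PSI between a baseline and a current distribution over the same bins."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    score = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        # Floor each share at eps so empty bins do not produce log(0).
        e_share = max(expected / expected_total, eps)
        a_share = max(actual / actual_total, eps)
        score += (a_share - e_share) * math.log(a_share / e_share)
    return score


# Illustrative counts per score bin: baseline week versus current week.
baseline = [50, 50]
current = [90, 10]
print(round(population_stability_index(baseline, current), 3))
print(correction_rate(reviewed=200, corrected=30))
```

Corrected examples are also natural candidates for the test cases mentioned above, since each one is a labeled instance where the model and a reviewer disagreed.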
One implementation example is an experiment dashboard that connects feature flags, assignments, metrics, guardrails, and decision notes. Another is a document analytics tool that extracts structured fields, validates them, and lets reviewers correct outputs before they enter a reporting layer. A third is a model monitoring surface that combines data drift, prediction drift, human overrides, and incident examples. Each example depends on the same foundation: stable definitions, traceable outputs, and a product surface that respects how decisions are actually made.
Data products can create false confidence when they compress messy reality into a single score. They can also slow teams down when every answer requires a custom analysis. The useful middle ground is a product that standardizes repeated decisions while keeping enough context for human judgment. That means clear definitions, simple navigation, traceable data, and a willingness to show uncertainty rather than hide it.
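As one illustration of showing uncertainty instead of a single score, a rate can be presented with an interval and labeled uncertain when that interval straddles the decision threshold. The sketch uses the Wilson score interval with z = 1.96, which is an assumed choice for this example and not a method named in the text.

```python
from __future__ import annotations

import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (roughly 95% when z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


def present_rate(successes: int, n: int, threshold: float) -> dict:
    """Return the estimate, its interval, and whether it clears the threshold."""
    low, high = wilson_interval(successes, n)
    if low > threshold:
        status = "above threshold"
    elif high < threshold:
        status = "below threshold"
    else:
        status = "uncertain"  # the interval straddles the threshold
    return {"estimate": successes / n, "low": low, "high": high, "status": status}


# Same observed rate (60%), very different amounts of evidence.
small_sample = present_rate(6, 10, threshold=0.5)
large_sample = present_rate(600, 1000, threshold=0.5)
print(small_sample["status"], large_sample["status"])
```

Both samples would collapse to the same single score of 0.6, which is exactly the false confidence the paragraph warns about.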
The strongest data products also make maintenance visible. Owners should know which tables, jobs, metrics, and model outputs power the product, and users should know when a number is fresh enough to trust. That operational clarity is part of the user experience, especially when the same metric is reused in planning, experimentation, and executive reporting. It prevents debates about data lineage from replacing the actual decision.
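A small dependency map is often enough to answer which products and metrics are affected when a table goes stale. The sketch below walks a hand-written upstream map; the asset names and the map itself are hypothetical, and a real platform would usually source this from its own lineage metadata.

```python
from __future__ import annotations

from collections import deque

# Hypothetical dependency map: each asset lists the upstream assets it reads from.
UPSTREAM = {
    "exec_report": ["metric.net_revenue"],
    "experiment_dashboard": ["metric.net_revenue", "metric.conversion"],
    "metric.net_revenue": ["table.orders", "table.refunds"],
    "metric.conversion": ["table.sessions", "table.orders"],
}


def affected_by(asset: str, upstream: dict[str, list[str]]) -> list[str]:
    """Return every asset that depends, directly or indirectly, on `asset`."""
    # Invert the map so each asset points at its direct consumers.
    downstream: dict[str, list[str]] = {}
    for child, parents in upstream.items():
        for parent in parents:
            downstream.setdefault(parent, []).append(child)

    # Breadth-first walk over consumers, visiting each asset once.
    seen: set[str] = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print(affected_by("table.refunds", UPSTREAM))
```

With this in place, a stale `table.refunds` immediately identifies the metric and the two surfaces whose owners and users should be told.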
Data engineering, MLflow, Delta Lake, and platform patterns.
Decision-support statistics and interpretation trade-offs.
Experiment infrastructure for product measurement.
Product thinking for applied AI systems and data workflows.