Machine Learning & NLP

A guide to Yangming Li's writing on statistical machine learning, natural language processing, topic modeling, sentiment analysis, trustworthy ML, and practical text analytics.

Start here

This site treats machine learning and NLP as applied systems work. The goal is to understand not only a model family but also when it is useful, how to evaluate it, how it fails, and how it connects to a product or decision workflow. Start with the trustworthy machine learning notes for the reliability lens, then read the NLP notes, the BERT fine-tuning article, and the statistical testing guide for implementation context.

The recurring theme is that useful models need measurement. A classifier needs error analysis and calibration, not just accuracy. A topic model needs human interpretation and stability checks, not just clusters. A text analytics workflow needs sampling, review, and monitoring because language changes over time.
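To make the difference between accuracy and calibration concrete, here is a minimal sketch of expected calibration error (ECE) using equal-width confidence bins. The function name, the bin count, and the input layout (one top-class confidence and one correct/incorrect flag per prediction) are assumptions chosen for illustration, not something the site's articles prescribe.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    # Group predictions into equal-width confidence bins.
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 lands in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Each bin contributes in proportion to how many predictions it holds.
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

Two classifiers can share the same accuracy and still differ here: one that is right 75% of the time and reports 0.75 confidence scores near zero, while one that is right 75% of the time but reports 0.99 confidence scores about 0.24.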

Statistical ML and NLP focus

Yangming Li's machine learning writing spans classic ML, deep learning, NLP, and evaluation. The site includes practical notes on random forests, generalized linear models, deep neural networks, BERT sentiment analysis, CMU NLP study notes, trustworthy machine learning, and uncertainty quantification for LLMs. The common thread is applied judgment: knowing what the model is estimating, what assumptions are being made, and how errors would affect a real workflow.

For text systems, the useful questions are often operational. What labels are reliable enough to train against? Which examples are ambiguous? How will the system handle new vocabulary, domain-specific phrasing, multilingual text, or short complaint snippets? How should reviewers correct the model, and how should those corrections become training or evaluation data?

Healthcare NLP and complaint theme discovery

Healthcare NLP and complaint theme discovery are good examples of why text analytics needs careful framing. A workflow might help summarize themes, route issues, or identify emerging categories, but it should not pretend that unsupervised clusters are facts. Topic modeling can reveal candidate themes; human reviewers still need to name, merge, split, and validate those themes. Classifiers can support triage; monitoring should watch for drift, underrepresented categories, and phrases that change meaning across contexts.
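One simple signal for the drift mentioned above is the share of tokens in recent text that never appeared in the reference data. The sketch below is only a rough proxy, and it assumes a naive lowercase regex tokenizer; the function names are hypothetical, and a real system would likely reuse the tokenizer of the deployed model.

```python
import re
from typing import Iterable, List, Set

_TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Naive tokenizer (an assumption): lowercase alphanumeric runs."""
    return _TOKEN.findall(text.lower())


def build_vocabulary(texts: Iterable[str]) -> Set[str]:
    """Collect every token seen in the reference (training-time) texts."""
    vocab: Set[str] = set()
    for text in texts:
        vocab.update(tokenize(text))
    return vocab


def out_of_vocabulary_rate(texts: Iterable[str], vocab: Set[str]) -> float:
    """Fraction of tokens in `texts` that are absent from `vocab`."""
    total = 0
    unseen = 0
    for text in texts:
        for token in tokenize(text):
            total += 1
            if token not in vocab:
                unseen += 1
    return unseen / total if total else 0.0
```

Tracking this rate per week or per source gives reviewers a prompt to look at new phrasing. It does not catch phrases whose meaning changes while the words stay the same, so it complements sampling and review and does not replace them.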

A practical architecture would keep raw text, normalized text, model outputs, reviewer labels, theme definitions, and evaluation examples separate. That separation makes audits easier and keeps the product from confusing a model suggestion with a verified truth.
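As a rough illustration of that separation, the sketch below keeps each kind of record in its own type and only reports a theme as verified when a reviewer label exists. All class names, field names, and the `resolve_theme` helper are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RawComment:
    comment_id: str
    text: str  # stored as received, never edited in place


@dataclass(frozen=True)
class NormalizedComment:
    comment_id: str
    text: str
    normalizer_version: str  # lets an audit reproduce the cleaning step


@dataclass(frozen=True)
class ModelSuggestion:
    comment_id: str
    theme_id: str
    confidence: float
    model_version: str


@dataclass(frozen=True)
class ReviewerLabel:
    comment_id: str
    theme_id: str
    reviewer_id: str


def resolve_theme(
    suggestion: Optional[ModelSuggestion],
    label: Optional[ReviewerLabel],
) -> Tuple[Optional[str], str]:
    """Return (theme_id, status); a reviewer label always outranks a suggestion."""
    if label is not None:
        return label.theme_id, "verified"
    if suggestion is not None:
        return suggestion.theme_id, "suggested"
    return None, "unlabeled"
```

Because the status travels with the theme, downstream reports can show suggested themes differently from verified ones and avoid merging the two.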

Evaluation checklist

Define the task as classification, extraction, ranking, summarization, clustering, or review support.
Measure performance by slice: class, source, time period, text length, language, and reviewer confidence.
Use confusion matrices, calibration checks, stability tests, and representative error examples.
Track label quality, missing classes, drift, and changes in user vocabulary.
Keep human-readable explanations and reviewer corrections connected to the evaluation set.
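The slice and confusion-matrix items in the checklist can be sketched with a few lines of standard-library Python. The record layout `(slice_name, true_label, predicted_label)` and both function names are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# (slice_name, true_label, predicted_label); this layout is an assumption.
Record = Tuple[str, str, str]


def accuracy_by_slice(records: Sequence[Record]) -> Dict[str, Tuple[float, int]]:
    """Map each slice to (accuracy, example count) so small slices stay visible."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    for slice_name, truth, pred in records:
        totals[slice_name] += 1
        if truth == pred:
            hits[slice_name] += 1
    return {name: (hits[name] / totals[name], totals[name]) for name in totals}


def confusion_counts(records: Sequence[Record]) -> Counter:
    """Count (true_label, predicted_label) pairs across all slices."""
    return Counter((truth, pred) for _, truth, pred in records)


records = [
    ("short", "negative", "negative"),
    ("short", "negative", "positive"),
    ("long", "positive", "positive"),
]
print(accuracy_by_slice(records))
print(confusion_counts(records).most_common())
```

Returning the example count next to each accuracy is deliberate: a slice with perfect accuracy on a single example should not be read the same way as one measured on thousands.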

Related implementation examples

A complaint theme discovery workflow might begin with unsupervised clustering or embeddings to surface candidate groups. The implementation should then move quickly into human review: name each theme, attach examples, mark ambiguous comments, and decide whether the theme is stable enough to become a label. A supervised classifier should be trained only after those labels have operational meaning.
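One way to put a number on "stable enough" is to rerun the clustering or topic model with a different seed or data sample and compare the top words of each theme across runs. The sketch below uses Jaccard overlap; the function names and the 0.6 threshold are illustrative assumptions, and the top-word lists are expected to come from whatever model the workflow already uses.

```python
from typing import List, Sequence, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_matches(
    run_a: Sequence[Sequence[str]],
    run_b: Sequence[Sequence[str]],
) -> List[Tuple[int, int, float]]:
    """For each topic in run_a, find the most similar topic in run_b."""
    if not run_b:
        raise ValueError("run_b must contain at least one topic")
    results = []
    for i, words_a in enumerate(run_a):
        set_a = set(words_a)
        scored = [(j, jaccard(set_a, set(words_b))) for j, words_b in enumerate(run_b)]
        j, score = max(scored, key=lambda pair: pair[1])
        results.append((i, j, score))
    return results


def stable_topics(
    run_a: Sequence[Sequence[str]],
    run_b: Sequence[Sequence[str]],
    threshold: float = 0.6,  # hypothetical cutoff, tune per project
) -> List[int]:
    """Indices of run_a topics whose best match in run_b meets the threshold."""
    return [i for i, _, score in best_matches(run_a, run_b) if score >= threshold]
```

A theme that fails this check is not necessarily wrong, but it is a reasonable candidate for reviewers to merge, split, or hold back before it becomes a training label.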

A sentiment analysis workflow has a different shape. It needs representative labels, clear handling for neutral or mixed sentiment, evaluation by slice, and a plan for drift. A BERT or PEFT-based model may be appropriate when context matters, but the product still needs a fallback for low-confidence text and a review process for examples that carry business risk.
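The fallback for low-confidence text can be as simple as a routing rule applied to the model's class probabilities. The sketch below is model-agnostic and assumes the probabilities arrive as a dictionary; the function name and both thresholds are hypothetical and would need tuning against reviewed examples.

```python
from typing import Dict, Tuple


def route_prediction(
    probabilities: Dict[str, float],
    min_confidence: float = 0.8,  # hypothetical threshold
    min_margin: float = 0.2,      # hypothetical gap between top two classes
) -> Tuple[str, str]:
    """Return ("auto" or "review", top_label) for one prediction."""
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    top_label, top_p = ranked[0]
    runner_up_p = ranked[1][1] if len(ranked) > 1 else 0.0
    # Send to human review when the model is unsure or two classes are close,
    # which is a common pattern for mixed or neutral sentiment.
    if top_p < min_confidence or (top_p - runner_up_p) < min_margin:
        return "review", top_label
    return "auto", top_label
```

The margin check matters for mixed sentiment: a prediction split 0.48 to 0.45 between positive and negative is routed to review even though a top label exists.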

A trustworthy ML review should ask whether the model is understandable enough for its use, whether groups are harmed differently, whether data handling is appropriate, and whether monitoring can detect degradation before users lose trust. These questions are part of the system, not a final compliance pass.

Trade-offs and limitations

ML and NLP systems are only as useful as their measurement and feedback loop. More complex models may improve representation but increase cost, latency, and debugging difficulty. Simpler models may be easier to explain but miss subtle context. Topic models can discover patterns but can also create unstable or misleading themes. LLMs can summarize and reason over text but require grounding, uncertainty handling, and review for sensitive workflows.

The practical compromise is to match the model to the review burden. If the workflow needs transparent categories, a simpler classifier with strong error analysis may beat a larger model. If the workflow needs rich language understanding, a transformer or LLM can help, but the product should budget for evaluation data, monitoring, and reviewer feedback from the start.

Related writing

Trustworthy machine learning

BERT sentiment analysis

Advanced NLP notes

Healthcare AI analytics