🔬 WaferDefectClassifier

Automated wafer map defect classification with a 3-stage AI review pipeline —
ResNet18 confidence gate → local VLM briefing → self-contained HTML report

✨ Highlights


📦 Dataset	WM-811K: 811,457 wafer maps, 9 classes (8 defect patterns + none)
🏆 Best model	ResNet18 — macro-F1 0.9109 on held-out test set
⚡ Auto-acceptance	96.6 % of samples cleared without human review
🤖 VLM triage	Qwen2.5-VL-3B-Instruct runs locally — no API calls
📄 Report	Single self-contained HTML file, no server required
🔁 Modes	Full · Mock (layout test) · Disabled (scores only)

Motivation

Semiconductor wafer manufacturing involves hundreds of interdependent process steps. When a defect pattern appears on a wafer map, its spatial arrangement often points to the process step or piece of equipment that caused the failure. Accurate, automated classification enables faster root-cause analysis, reduces engineer workload, and tightens process-control feedback loops, which directly affects yield and cost.

This project builds a configurable deep-learning pipeline on the WM-811K benchmark that goes beyond classification: uncertain predictions are automatically routed to a local VLM which writes a structured briefing card for the human reviewer.

Model Results

All models were trained on a 70/15/15 stratified split of WM-811K, using focal loss plus weighted sampling to counter the severe class imbalance (none ×985 vs. Near-Full ×149).
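A stratified split keeps each class's share roughly constant across train, validation, and test, which matters when Near-Full has so few samples. The sketch below is a pure-Python illustration under that assumption; the name stratified_split, the seeded shuffle, and the rounding rule are hypothetical, and preprocess_data.py may well use scikit-learn or round differently.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable],
    fractions: Tuple[float, float, float] = (0.70, 0.15, 0.15),
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, val, test) index lists, splitting each class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, val, test = [], [], []
    for label in sorted(by_class, key=str):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * fractions[0])
        n_val = round(len(idxs) * fractions[1])
        train.extend(idxs[:n_train])
        val.extend(idxs[n_train:n_train + n_val])
        test.extend(idxs[n_train + n_val:])  # remainder (~fractions[2]) goes to test
    return train, val, test
```

With 100 none and 20 Near-Full labels, this yields 84/18/18 indices, and 14 of the 20 Near-Full maps land in train.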

Test-set macro-F1

Model	Params	Macro-F1	Accuracy
ResNet18 ⭐	11 M	0.9109	~97 %
ConvNeXt-Tiny	28 M	—	—
ViT-Tiny	5.7 M	0.8366	—
SimpleCNN	0.3 M	—	—

ResNet18 is the production model used by the review pipeline.

Per-class F1 (ResNet18)

Class	F1	Notes
Center	~0.97	Strong spatial signal
Donut	~0.95	Distinctive ring pattern
Edge-Ring	~0.98	High contrast at edge
Edge-Loc	~0.89	Confused with Loc
Loc	~0.86	Hardest class
Near-Full	~0.91	Rare (149 samples)
Random	~0.90	Diffuse pattern
Scratch	~0.95	Linear feature
none	~0.99	Majority class
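The headline metric is macro-F1: the unweighted mean of per-class F1 scores, so a rare class such as Near-Full weighs as much as none. A minimal sketch of the computation follows; per_class_f1 and macro_f1 are hypothetical helper names, and in practice scikit-learn's f1_score with average="macro" does the same job.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 = 2*TP / (2*TP + FP + FN) for every class seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but wrongly
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in set(y_true) | set(y_pred):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1: every class counts equally, however rare."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

For example, with 8 none and 2 Near-Full maps where one Near-Full map is predicted as none, accuracy is 0.90 but macro-F1 is (16/17 + 2/3) / 2 ≈ 0.80.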

Review policy (cost-sensitive sweep)

total_cost = |reviewed| + k × |missed_errors|      k ∈ {1, 5, 10}

k	Optimal policy	Cost	vs. baseline (conf=0.85/margin=0.20)
1	conf=0.50, margin=OFF	542	−503 (−48 %)
5	conf=0.85, margin=OFF	1,701	tied (the margin gate adds nothing)
10	conf=0.95, margin=OFF	2,305	−216 (−9 %)

Finding: in this sweep, adding a margin gate never lowered cost below confidence-only gating at any k.
The default is therefore --margin-thresh 0.00 (disabled).
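Reading the table, the baseline costs are 542 + 503 = 1,045 at k = 1 and 2,305 + 216 = 2,521 at k = 10. The cost function itself can be sketched for a confidence-only gate as follows; the helper names review_cost and best_threshold are hypothetical, the margin dimension swept by threshold_sweep.py is deliberately left out, and the toy numbers are illustrative, not WM-811K results.

```python
from typing import Sequence


def review_cost(top1_probs: Sequence[float], correct: Sequence[bool],
                conf_thresh: float, k: float) -> float:
    """total_cost = |reviewed| + k * |missed_errors| for a confidence-only gate."""
    reviewed = sum(1 for p in top1_probs if p < conf_thresh)  # sent to a human
    missed_errors = sum(  # wrong predictions that were auto-accepted
        1 for p, ok in zip(top1_probs, correct) if p >= conf_thresh and not ok
    )
    return reviewed + k * missed_errors


def best_threshold(top1_probs: Sequence[float], correct: Sequence[bool],
                   thresholds: Sequence[float], k: float) -> float:
    """Candidate threshold with the lowest total cost (ties go to the first one)."""
    return min(thresholds, key=lambda t: review_cost(top1_probs, correct, t, k))
```

As in the table, a larger k makes missed errors expensive and pushes the optimal confidence threshold upward.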

Quick Start

1. Clone & install

git clone https://github.com/yz847zzz/WaferDefectClassifier.git
cd WaferDefectClassifier
python -m venv .venv
# Windows
.venv\Scripts\activate
# Linux / macOS
source .venv/bin/activate

pip install -r requirements.txt
# For VLM triage (Stage 2):
pip install transformers accelerate qwen-vl-utils

2. Configure paths

cp .env.example .env
# Edit .env — set TORCH_HOME and HF_HOME to a drive with ≥ 15 GB free

3. Prepare data

# Download LSWMD.pkl from Kaggle and place at data/raw/LSWMD.pkl
# https://www.kaggle.com/datasets/qingyi/wm811k-wafer-map
python scripts/preprocess_data.py --config configs/resnet18.yaml

4. Train

python scripts/train.py --config configs/resnet18.yaml

5. Run the full review pipeline

# Full pipeline — ResNet18 gate + VLM briefing + HTML report
python scripts/run_pipeline.py

# Fast first-pass — no VLM, 2-column cards
python scripts/run_pipeline.py --disable-vlm

# Layout test — mock VLM, no GPU needed for Stage 2
python scripts/run_pipeline.py --mock-vlm --max-review 50

Open outputs/reports/review_queue.html in any browser.

Review Pipeline

A 3-stage system that converts raw wafer maps into a priority-sorted HTML review queue.

 Input
 ──────────────────────────────────────────────────────────────────────────
  .npy array (N, H, W) float32  |  folder of .npy files  |  test split

                         │
                         ▼
 ┌───────────────────────────────────────────────────────────────────────┐
 │  STAGE 1 — NNFilter  (ResNet18 confidence gate)                       │
 │                                                                       │
 │   Accept condition:  top1_prob  ≥  conf_thresh  (default 0.85)        │
 │                 AND  margin     ≥  margin_thresh (default 0.00 = OFF) │
 │                                                                       │
 │   ┌──────────────────────┐       ┌───────────────────────────────┐    │
 │   │  AUTO-ACCEPTED       │       │  UNCERTAIN → Stage 2          │    │
 │   │  96.6 % of samples   │       │  3.4 % of samples             │    │
 │   └──────────────────────┘       └───────────────────────────────┘    │
 └───────────────────────────────────────────────────────────────────────┘
                                             │
                                             ▼  (capped at --max-review)
 ┌───────────────────────────────────────────────────────────────────────┐
 │  STAGE 2 — VLMTriage  (Qwen2.5-VL-3B-Instruct, local GPU)             │
 │                                                                       │
 │   Per sample the VLM receives:                                        │
 │     • Color-coded wafer map image  (320 × 320 px)                     │
 │     • ResNet18 top-3 predictions + confidence scores                  │
 │     • Flag reason  (low confidence / margin)                          │
 │                                                                       │
 │   VLM writes a structured briefing card:                              │
 │     DESCRIPTION    what it sees in the image                          │
 │     PATTERN        name of the visible spatial pattern                │
 │     AGREEMENT      agree / partial / disagree with ResNet18           │
 │     BEST_CLASS     which of the top-3 looks most plausible            │
 │     RECOMMENDATION accept / review_carefully / reject                 │
 │     REASONING      one-sentence justification                         │
 │                                                                       │
 │   Priority:  HIGH (disagree) · MEDIUM (partial) · LOW (agree)         │
 │                                                                       │
 │  ⚠  The VLM is ADVISORY ONLY — it does not override ResNet18.         │
 │     Human reviewers make the final decision.                          │
 └───────────────────────────────────────────────────────────────────────┘
                                             │
                                             ▼
 ┌───────────────────────────────────────────────────────────────────────┐
 │  STAGE 3 — ReportBuilder  (self-contained HTML)                       │
 │                                                                       │
 │   • Priority-sorted queue: HIGH → MEDIUM → LOW                        │
 │   • 3-column cards: [wafer image] [score bars] [VLM briefing]         │
 │   • All images base64-embedded — open in any browser, offline         │
 │   • Sticky header: thresholds · counts · timestamp                    │
 └───────────────────────────────────────────────────────────────────────┘
                                             │
                                             ▼
                         outputs/reports/review_queue.html
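Stage 1's accept rule can be sketched in a few lines. Assumptions: margin is taken to be top1_prob minus top2_prob, and the function name gate and its return shape are illustrative, not the actual NNFilter API. The sketch also shows why margin_thresh = 0.00 switches the margin gate off: top1 is always at least top2, so a margin of zero or more can never fail.

```python
from typing import List, Sequence, Tuple


def gate(probs: Sequence[Sequence[float]],
         conf_thresh: float = 0.85,
         margin_thresh: float = 0.0) -> Tuple[List[int], List[Tuple[int, str]]]:
    """Split samples into auto-accepted indices and (index, flag reason) pairs.

    Accept only if top1_prob >= conf_thresh AND margin >= margin_thresh,
    where margin is assumed to be top1_prob - top2_prob.
    """
    accepted: List[int] = []
    uncertain: List[Tuple[int, str]] = []
    for i, p in enumerate(probs):
        ranked = sorted(p, reverse=True)
        top1 = ranked[0]
        top2 = ranked[1] if len(ranked) > 1 else 0.0
        reasons = []
        if top1 < conf_thresh:
            reasons.append("low confidence")
        if top1 - top2 < margin_thresh:  # never true when margin_thresh == 0.0
            reasons.append("margin")
        if reasons:
            uncertain.append((i, " / ".join(reasons)))  # routed to Stage 2
        else:
            accepted.append(i)
    return accepted, uncertain
```

For probabilities [[0.95, 0.03, 0.02], [0.60, 0.35, 0.05], [0.86, 0.13, 0.01]], the defaults accept samples 0 and 2 and flag sample 1 for low confidence.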

Pipeline modes

Flag	Stage 2	Layout	Use case
(none)	Qwen2.5-VL-3B	3-column	Production
`--mock-vlm`	Placeholder text	3-column + `[MOCK]` badge	CI / layout testing
`--disable-vlm`	Skipped	2-column	Fast triage / no GPU
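The three modes differ only in the third card column, and the report stays self-contained because each image is inlined as a base64 data URI. The sketch below assumes a hypothetical render_card helper and invented CSS class names; it is not ReportBuilder's actual markup. Passing briefing=None mimics --disable-vlm (2 columns), and mock=True mimics --mock-vlm (3 columns plus a [MOCK] badge).

```python
import base64
import html
from typing import Dict, Optional


def render_card(png_bytes: bytes, scores: Dict[str, float],
                briefing: Optional[str] = None, mock: bool = False) -> str:
    """Return one review card as an HTML fragment with the wafer image inlined."""
    # Data URI: the PNG travels inside the HTML, so no server or side files are needed
    src = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    bars = "".join(
        f'<div class="bar" style="width:{p * 100:.0f}%">{html.escape(name)} {p:.2f}</div>'
        for name, p in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    )
    cols = [f'<div class="col"><img src="{src}" alt="wafer map"></div>',
            f'<div class="col">{bars}</div>']
    if briefing is not None:  # full or mock mode adds the VLM briefing column
        badge = '<span class="badge">[MOCK]</span> ' if mock else ""
        cols.append(f'<div class="col">{badge}{html.escape(briefing)}</div>')
    return f'<div class="card cols-{len(cols)}">{"".join(cols)}</div>'
```

Escaping the briefing text matters because VLM output is free-form and could contain characters such as < or &.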

CLI reference

python scripts/run_pipeline.py [options]

Input:
  --input-npy FILE    .npy array (N,H,W) float32
  --indices FILE      index subset of --input-npy
  --input-dir DIR     folder of per-sample .npy files
  (none)              demo mode: preprocessed test split

Thresholds:
  --conf-thresh F     auto-accept threshold (default 0.85)
  --margin-thresh F   margin gate (default 0.00 = disabled)
  --max-review N      VLM sample cap (default 200)

VLM mode:
  --disable-vlm       skip VLM, ResNet18 scores only
  --mock-vlm          placeholder cards, no Qwen loading

Output:
  --out FILE          HTML path (default outputs/reports/review_queue.html)
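For readers wiring the pipeline into their own tooling, here is an argparse sketch that mirrors the documented flags and defaults. It is an assumption about how run_pipeline.py parses its options, not a copy of it; input_mode is a hypothetical helper showing the demo-mode fallback when no input flag is given.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="run_pipeline.py")
    g = p.add_argument_group("Input")
    g.add_argument("--input-npy", metavar="FILE", help=".npy array (N,H,W) float32")
    g.add_argument("--indices", metavar="FILE", help="index subset of --input-npy")
    g.add_argument("--input-dir", metavar="DIR", help="folder of per-sample .npy files")
    g = p.add_argument_group("Thresholds")
    g.add_argument("--conf-thresh", type=float, default=0.85, metavar="F",
                   help="auto-accept threshold")
    g.add_argument("--margin-thresh", type=float, default=0.00, metavar="F",
                   help="margin gate (0.00 = disabled)")
    g.add_argument("--max-review", type=int, default=200, metavar="N",
                   help="VLM sample cap")
    g = p.add_argument_group("VLM mode")
    g.add_argument("--disable-vlm", action="store_true", help="skip VLM, ResNet18 scores only")
    g.add_argument("--mock-vlm", action="store_true", help="placeholder cards, no Qwen loading")
    g = p.add_argument_group("Output")
    g.add_argument("--out", default="outputs/reports/review_queue.html", metavar="FILE",
                   help="HTML path")
    return p


def input_mode(args: argparse.Namespace) -> str:
    """Hypothetical helper: report which input source the flags select."""
    if args.input_npy:
        return "npy"
    if args.input_dir:
        return "dir"
    return "demo"  # no input flag: preprocessed test split
```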

All Scripts

Script	Purpose
`preprocess_data.py`	Parse WM-811K pickle → float32 `.npy` arrays + stratified splits
`train.py`	Training loop with focal loss, LR scheduler, best-F1 checkpointing
`evaluate.py`	Per-class metrics, confusion matrix, multi-model comparison plots
`error_analysis.py`	Misclassification grids, CSV, markdown report (ResNet18)
`review_policy_analysis.py`	Confidence/margin histogram analysis, ROC-style curves
`threshold_sweep.py`	110-config Pareto sweep over (conf, margin) pairs
`cost_optimization.py`	Cost-function optimization for k = 1, 5, 10
`run_pipeline.py`	End-to-end: NNFilter → VLMTriage → HTML report
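train.py relies on focal loss to keep the abundant, easy none maps from dominating the gradient. Below is the standard multi-class form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), as a single-sample sketch; gamma = 2.0 and the optional per-class alpha weights are assumptions, since this README does not state the project's values. A real PyTorch trainer would compute log(p_t) from log_softmax over logits and average over the batch.

```python
import math
from typing import Optional, Sequence


def focal_loss(probs: Sequence[float], target: int, gamma: float = 2.0,
               alpha: Optional[Sequence[float]] = None) -> float:
    """Multi-class focal loss for one sample given its predicted probabilities."""
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    alpha_t = 1.0 if alpha is None else alpha[target]  # optional per-class weight
    # (1 - p_t)**gamma shrinks the loss of confidently correct (easy) samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 and no alpha, this reduces to ordinary cross-entropy; at gamma = 2, a sample predicted correctly with p_t = 0.9 contributes about 1 % of its cross-entropy loss.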

Project Structure

WaferDefectClassifier/
├── configs/                  YAML training configs per model
│   ├── simple_cnn.yaml
│   ├── resnet18.yaml
│   ├── convnext_tiny.yaml
│   └── vit_tiny.yaml
├── data/                     raw/, processed/, splits/  (not tracked)
├── scripts/                  CLI entry points (see table above)
├── src/
│   ├── datasets/             WaferDataset, CLASS_NAMES, data loading
│   ├── models/               SimpleCNN, ResNet18, ConvNeXt, ViT + factory
│   ├── training/             Trainer, focal loss, Evaluator
│   ├── explainability/       Grad-CAM, attention rollout (stubs)
│   ├── agents/
│   │   └── vlm_reviewer.py   LocalVLMReviewer — prompt, call, parse
│   ├── pipeline/
│   │   ├── nn_filter.py      Stage 1 — ResNet18 confidence gate
│   │   ├── vlm_triage.py     Stage 2 — VLM briefing + MockVLMTriage
│   │   └── report_builder.py Stage 3 — self-contained HTML renderer
│   └── utils/                config, paths, seed, logging, metrics, viz
├── tests/                    pytest unit tests (no full dataset needed)
├── .env.example              template for TORCH_HOME / HF_HOME
└── requirements.txt
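The parse step in vlm_reviewer.py turns the VLM's free text into the briefing fields, and the report then sorts cards HIGH → MEDIUM → LOW. A sketch of that logic follows, assuming the model emits one "FIELD: value" line per field; that output format, the helper names, and the choice to escalate unparseable output to HIGH are all assumptions rather than LocalVLMReviewer's actual behavior.

```python
import re
from typing import Dict, List

FIELDS = ("DESCRIPTION", "PATTERN", "AGREEMENT", "BEST_CLASS", "RECOMMENDATION", "REASONING")
PRIORITY = {"disagree": "HIGH", "partial": "MEDIUM", "agree": "LOW"}
ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
# Assumed output format: one "FIELD: value" line per field
_FIELD_LINE = re.compile(r"^\s*(" + "|".join(FIELDS) + r")\s*:\s*(.*?)\s*$")


def parse_briefing(text: str) -> Dict[str, str]:
    """Extract the briefing fields; missing fields stay empty strings."""
    card = {name: "" for name in FIELDS}
    for line in text.splitlines():
        match = _FIELD_LINE.match(line)
        if match:
            card[match.group(1)] = match.group(2)
    # Unknown or missing AGREEMENT is escalated to HIGH (conservative assumption)
    card["PRIORITY"] = PRIORITY.get(card["AGREEMENT"].lower(), "HIGH")
    return card


def sort_queue(cards: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """HIGH -> MEDIUM -> LOW; sorted() is stable, so ties keep their input order."""
    return sorted(cards, key=lambda c: ORDER[c["PRIORITY"]])
```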

Dataset

WM-811K: 811,457 real production wafer maps from TSMC. The labeled subset covers 9 classes: 8 defect pattern types plus none.

Class	Train count	Notes
none	~103 K	Background / no defect
Center	~4.3 K
Donut	~555
Edge-Loc	~5.2 K
Edge-Ring	~9.7 K
Loc	~3.6 K
Near-Full	~104	Rarest class
Random	~866
Scratch	~432

Download

Go to https://www.kaggle.com/datasets/qingyi/wm811k-wafer-map
Download LSWMD.pkl
Place at data/raw/LSWMD.pkl
Run python scripts/preprocess_data.py --config configs/resnet18.yaml

Environment Notes

Keep all large files (dataset, model weights, library caches) on the drive you configure; do not let them default to C:.

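This project loads TORCH_HOME and HF_HOME from .env before torch or transformers is imported, because both libraries resolve their cache directories from these variables: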

Variable	Default	What it caches
`TORCH_HOME`	`E:/cache/torch`	PyTorch hub weights
`HF_HOME`	`E:/cache/huggingface`	Qwen2.5-VL-3B (~7 GB) + tokenizer

Edit .env to point these at any drive with ≥ 15 GB free.
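The "load .env first" step can be sketched without extra dependencies. This is a hypothetical minimal loader (the project may well use python-dotenv's load_dotenv instead); like load_dotenv's default, it does not overwrite variables that are already set unless override=True. Call it at the top of an entry script, before any import of torch or transformers.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Copy KEY=VALUE lines from a .env file into os.environ; return the parsed pairs."""
    parsed: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed  # no .env: library defaults apply
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, malformed lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("\"'")
        parsed[key] = value
        if override or key not in os.environ:  # keep existing variables by default
            os.environ[key] = value
    return parsed
```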

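Windows note: all YAML configs set num_workers=0. On Windows, Python multiprocessing uses the spawn start method, so DataLoader worker processes need an if __name__ == "__main__": guard and a picklable dataset; loading data in the main process sidesteps those problems.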

Roadmap

Phase	Status	Feature
1	✅	Project skeleton, config utilities, dataset loader
2	✅	Model zoo: SimpleCNN · ResNet18 · ConvNeXt-Tiny · ViT-Tiny
3	✅	Training loop, checkpointing, LR scheduler
4	✅	Evaluation: confusion matrix, per-class metrics, multi-model comparison
5	✅	Class imbalance: focal loss + weighted sampler
6	✅	Error analysis: misclassification grids, CSV, markdown report
7	✅	Cost-sensitive review policy + 110-config threshold sweep
8	✅	End-to-end pipeline: NNFilter → VLMTriage → HTML report
9	🔲	Grad-CAM / attention rollout for explainability
10	🔲	Similar-case retrieval (embedding + ANN index)
11	🔲	Active-learning loop: reviewer corrections → retraining
12	🔲	Diffusion-based rare-class augmentation
13	🔲	Transfer to SEM / optical defect inspection

License

MIT — see LICENSE for details.
