yz847zzz/WaferDefectClassifier

🔬 WaferDefectClassifier

Automated wafer map defect classification with a 3-stage AI review pipeline —
ResNet18 confidence gate → local VLM briefing → self-contained HTML report



✨ Highlights

📦 Dataset: WM-811K, 811,457 wafer maps, 9 classes (8 defect patterns plus none)
🏆 Best model: ResNet18, macro-F1 0.9109 on the held-out test set
✅ Auto-acceptance: 96.6 % of samples cleared without human review
🤖 VLM triage: Qwen2.5-VL-3B-Instruct runs locally, no API calls
📄 Report: single self-contained HTML file, no server required
🔁 Modes: Full · Mock (layout test) · Disabled (scores only)

🗂️ Table of Contents

  1. Motivation
  2. Model Results
  3. Quick Start
  4. Review Pipeline
  5. All Scripts
  6. Project Structure
  7. Dataset
  8. Environment Notes
  9. Roadmap

Motivation

Semiconductor wafer manufacturing involves hundreds of interdependent process steps. When a defect pattern appears on a wafer map, its spatial arrangement often points to the process step or equipment that caused the failure. Accurate, automated classification enables faster root-cause analysis, reduces engineer workload, and tightens process-control feedback loops, all of which affect yield and cost.

This project builds a configurable deep-learning pipeline on the WM-811K benchmark that goes beyond classification: uncertain predictions are automatically routed to a local vision-language model (VLM), which writes a structured briefing card for the human reviewer.


Model Results

All models were trained on a 70/15/15 stratified split of WM-811K, using focal loss and weighted sampling to handle severe class imbalance (the none class is about 985 times larger than Near-Full, which has only 149 samples).
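
To illustrate the two imbalance remedies named above, here is a minimal sketch in plain Python, so it runs without PyTorch. It shows the focal loss for a single sample, FL = -(1 - p_t)^gamma * log(p_t), and inverse-frequency sample weights of the kind a weighted sampler could draw from. The gamma value, the function names, and the inverse-frequency scheme are assumptions made for illustration; they are not taken from the project's training code.

```python
import math
from collections import Counter


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    probs  -- predicted class probabilities (should sum to 1)
    target -- index of the true class
    gamma  -- focusing parameter (assumed value; gamma=0 gives cross-entropy)
    """
    p_t = max(probs[target], 1e-12)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def inverse_frequency_weights(labels):
    """Per-sample weights proportional to 1 / (size of the sample's class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


# An easy sample (p_t = 0.95) contributes far less than a hard one (p_t = 0.30).
easy = focal_loss([0.95, 0.03, 0.02], target=0)
hard = focal_loss([0.30, 0.60, 0.10], target=0)

# Toy labels: "none" dominates, "Near-Full" is rare.
labels = ["none"] * 8 + ["Near-Full"] * 2
weights = inverse_frequency_weights(labels)
```

With these weights each class carries the same total sampling mass, so a sampler drawing in proportion to them would see rare and common classes equally often.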

Test-set macro-F1

| Model | Params | Macro-F1 | Accuracy |
|---|---|---|---|
| ResNet18 | 11 M | 0.9109 | ~97 % |
| ConvNeXt-Tiny | 28 M | | |
| ViT-Tiny | 5.7 M | 0.8366 | |
| SimpleCNN | 0.3 M | | |

ResNet18 is the production model used by the review pipeline.
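
Macro-F1 is the unweighted mean of the per-class F1 scores, so a rare class counts as much as the majority class. The sketch below uses the standard definition with made-up toy labels (not project data) to show how accuracy can look healthy while macro-F1 exposes a class that is never predicted.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: 8 majority samples all correct, 2 rare samples both missed.
y_true = ["none"] * 8 + ["Loc"] * 2
y_pred = ["none"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

Here accuracy is 0.8, while macro-F1 is well below one half because the F1 of the missed class is zero.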

Per-class F1 (ResNet18)

| Class | F1 | Notes |
|---|---|---|
| Center | ~0.97 | Strong spatial signal |
| Donut | ~0.95 | Distinctive ring pattern |
| Edge-Ring | ~0.98 | High contrast at edge |
| Edge-Loc | ~0.89 | Confused with Loc |
| Loc | ~0.86 | Hardest class |
| Near-Full | ~0.91 | Rare (149 samples) |
| Random | ~0.90 | Diffuse pattern |
| Scratch | ~0.95 | Linear feature |
| none | ~0.99 | Majority class |

Review policy (cost-sensitive sweep)

total_cost = |reviewed| + k × |missed_errors|      k ∈ {1, 5, 10}

| k | Optimal policy | Cost | vs. baseline (conf=0.85, margin=0.20) |
|---|---|---|---|
| 1 | conf=0.50, margin=OFF | 542 | −503 (−48 %) |
| 5 | conf=0.85, margin=OFF | 1,701 | tied (margin = dead weight) |
| 10 | conf=0.95, margin=OFF | 2,305 | −216 (−9 %) |

Finding: for all three values of k, adding the margin gate never improved on confidence-only gating.
The default is therefore --margin-thresh 0.00 (disabled).
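
The cost formula above can be evaluated with a few lines of Python. This is a minimal sketch for a confidence-only gate, not the code in cost_optimization.py. It assumes that a "missed error" is a wrong prediction that was auto-accepted; the function names and the toy data are hypothetical.

```python
def review_cost(confidences, correct, conf_thresh, k):
    """total_cost = |reviewed| + k * |missed_errors| for a confidence-only gate.

    confidences -- top-1 probability per sample
    correct     -- True where the model's top-1 prediction is right
    conf_thresh -- samples below this value are sent to review
    k           -- cost of one missed error relative to one review
    """
    reviewed = sum(c < conf_thresh for c in confidences)
    # Assumption: a missed error is a wrong prediction that was auto-accepted.
    missed = sum(c >= conf_thresh and not ok for c, ok in zip(confidences, correct))
    return reviewed + k * missed


def best_threshold(confidences, correct, thresholds, k):
    """Return the threshold with the lowest total cost (ties: first listed)."""
    return min(thresholds, key=lambda t: review_cost(confidences, correct, t, k))


# Toy data (made up): six predictions, two of them wrong.
conf = [0.99, 0.97, 0.90, 0.70, 0.60, 0.55]
ok = [True, True, False, True, False, True]
grid = [0.50, 0.85, 0.95]

costs_k1 = {t: review_cost(conf, ok, t, k=1) for t in grid}
costs_k10 = {t: review_cost(conf, ok, t, k=10) for t in grid}
```

On this toy data the cheapest threshold moves from 0.50 at k=1 to 0.95 at k=10, the same direction as the sweep results in the table: the more a missed error costs, the stricter the gate should be.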


Quick Start

1. Clone & install

git clone https://github.com/yz847zzz/WaferDefectClassifier.git
cd WaferDefectClassifier
python -m venv .venv
# Windows
.venv\Scripts\activate
# Linux / macOS
source .venv/bin/activate

pip install -r requirements.txt
# For VLM triage (Stage 2):
pip install transformers accelerate qwen-vl-utils

2. Configure paths

cp .env.example .env
# Edit .env — set TORCH_HOME and HF_HOME to a drive with ≥ 15 GB free

3. Prepare data

# Download LSWMD.pkl from Kaggle and place at data/raw/LSWMD.pkl
# https://www.kaggle.com/datasets/qingyi/wm811k-wafer-map
python scripts/preprocess_data.py --config configs/resnet18.yaml

4. Train

python scripts/train.py --config configs/resnet18.yaml

5. Run the full review pipeline

# Full pipeline — ResNet18 gate + VLM briefing + HTML report
python scripts/run_pipeline.py

# Fast first-pass — no VLM, 2-column cards
python scripts/run_pipeline.py --disable-vlm

# Layout test — mock VLM, no GPU needed for Stage 2
python scripts/run_pipeline.py --mock-vlm --max-review 50

Open outputs/reports/review_queue.html in any browser.


Review Pipeline

A 3-stage system that converts raw wafer maps into a priority-sorted HTML review queue.

 Input
 ──────────────────────────────────────────────────────────────────────────
  .npy array (N, H, W) float32  |  folder of .npy files  |  test split

                         │
                         ▼
 ┌───────────────────────────────────────────────────────────────────────┐
 │  STAGE 1 — NNFilter  (ResNet18 confidence gate)                       │
 │                                                                       │
 │   Accept condition:  top1_prob  ≥  conf_thresh  (default 0.85)        │
 │                      margin     ≥  margin_thresh (default 0.00 = OFF) │
 │                                                                       │
 │   ┌──────────────────────┐       ┌───────────────────────────────┐    │
 │   │  AUTO-ACCEPTED       │       │  UNCERTAIN → Stage 2          │    │
 │   │  96.6 % of samples   │       │  3.4 % of samples             │    │
 │   └──────────────────────┘       └───────────────────────────────┘    │
 └───────────────────────────────────────────────────────────────────────┘
                                             │
                                             ▼  (capped at --max-review)
 ┌───────────────────────────────────────────────────────────────────────┐
 │  STAGE 2 — VLMTriage  (Qwen2.5-VL-3B-Instruct, local GPU)            │
 │                                                                       │
 │   Per sample the VLM receives:                                        │
 │     • Color-coded wafer map image  (320 × 320 px)                     │
 │     • ResNet18 top-3 predictions + confidence scores                  │
 │     • Flag reason  (low confidence / margin)                          │
 │                                                                       │
 │   VLM writes a structured briefing card:                              │
 │     DESCRIPTION    what it sees in the image                          │
 │     PATTERN        name of the visible spatial pattern                │
 │     AGREEMENT      agree / partial / disagree with ResNet18           │
 │     BEST_CLASS     which of the top-3 looks most plausible            │
 │     RECOMMENDATION accept / review_carefully / reject                 │
 │     REASONING      one-sentence justification                         │
 │                                                                       │
 │   Priority:  HIGH (disagree) · MEDIUM (partial) · LOW (agree)        │
 │                                                                       │
 │  ⚠  The VLM is ADVISORY ONLY — it does not override ResNet18.        │
 │     Human reviewers make the final decision.                          │
 └───────────────────────────────────────────────────────────────────────┘
                                             │
                                             ▼
 ┌───────────────────────────────────────────────────────────────────────┐
 │  STAGE 3 — ReportBuilder  (self-contained HTML)                       │
 │                                                                       │
 │   • Priority-sorted queue: HIGH → MEDIUM → LOW                        │
 │   • 3-column cards: [wafer image] [score bars] [VLM briefing]         │
 │   • All images base64-embedded — open in any browser, offline         │
 │   • Sticky header: thresholds · counts · timestamp                    │
 └───────────────────────────────────────────────────────────────────────┘
                                             │
                                             ▼
                         outputs/reports/review_queue.html
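
The Stage 1 accept condition in the diagram can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the contents of nn_filter.py: the margin is taken to be the top-1 minus the top-2 probability (the README does not define it), the cap simply keeps the first max_review uncertain samples, and the function names are hypothetical.

```python
def gate(probs, conf_thresh=0.85, margin_thresh=0.00):
    """Return (accepted, reason) for one vector of class probabilities.

    Margin is assumed to be top-1 minus top-2 probability. With
    margin_thresh=0.00 the margin check can never fail, which matches
    the "0.00 = OFF" default.
    """
    ranked = sorted(probs, reverse=True)
    top1, top2 = ranked[0], ranked[1]
    if top1 < conf_thresh:
        return False, "low confidence"
    if top1 - top2 < margin_thresh:
        return False, "low margin"
    return True, "auto-accepted"


def split_batch(batch, max_review=200, **thresholds):
    """Partition sample indices into accepted and uncertain (capped) lists."""
    accepted, uncertain = [], []
    for i, probs in enumerate(batch):
        ok, _ = gate(probs, **thresholds)
        (accepted if ok else uncertain).append(i)
    # Assumption: the cap keeps the first max_review uncertain samples.
    return accepted, uncertain[:max_review]


batch = [
    [0.97, 0.02, 0.01],  # confident
    [0.60, 0.35, 0.05],  # below conf_thresh
    [0.88, 0.10, 0.02],  # confident at the default thresholds
]
accepted, uncertain = split_batch(batch)
```

Only the second sample is routed onward at the default thresholds; raising margin_thresh to 0.80 would also flag the third one.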

Pipeline modes

| Flag | Stage 2 | Layout | Use case |
|---|---|---|---|
| (none) | Qwen2.5-VL-3B | 3-column | Production |
| --mock-vlm | Placeholder text | 3-column + [MOCK] badge | CI / layout testing |
| --disable-vlm | Skipped | 2-column | Fast triage / no GPU |
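
Whichever mode produces the Stage 2 text, the six card fields and the priority rule from the diagram can be handled by a small line-based parser. The sketch below assumes the model replies with one "FIELD: value" line per field; that reply format, the fallback to HIGH for unrecognised agreement values, and all names are assumptions for illustration rather than the behaviour of vlm_reviewer.py.

```python
import re

FIELDS = ("DESCRIPTION", "PATTERN", "AGREEMENT", "BEST_CLASS",
          "RECOMMENDATION", "REASONING")
PRIORITY = {"disagree": "HIGH", "partial": "MEDIUM", "agree": "LOW"}
ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def parse_briefing(text):
    """Extract 'FIELD: value' lines into a dict; missing fields become ''."""
    card = {name: "" for name in FIELDS}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Z_]+)\s*:\s*(.*)", line)
        if match and match.group(1) in card:
            card[match.group(1)] = match.group(2).strip()
    # Unrecognised agreement values fall back to HIGH so a human sees them
    # first (a conservative choice made for this sketch).
    card["PRIORITY"] = PRIORITY.get(card["AGREEMENT"].lower(), "HIGH")
    return card


def sort_queue(cards):
    """Order cards HIGH -> MEDIUM -> LOW, keeping input order within a level."""
    return sorted(cards, key=lambda c: ORDER[c["PRIORITY"]])


reply = """DESCRIPTION: Dense failures along the lower-left edge.
PATTERN: localized edge cluster
AGREEMENT: partial
BEST_CLASS: Edge-Loc
RECOMMENDATION: review_carefully
REASONING: The cluster touches the edge but does not form a ring."""

card = parse_briefing(reply)
```

Because the card is advisory only, the parser records the VLM's view and a priority but never changes the ResNet18 prediction.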

CLI reference

python scripts/run_pipeline.py [options]

Input:
  --input-npy FILE    .npy array (N,H,W) float32
  --indices FILE      index subset of --input-npy
  --input-dir DIR     folder of per-sample .npy files
  (none)              demo mode: preprocessed test split

Thresholds:
  --conf-thresh F     auto-accept threshold (default 0.85)
  --margin-thresh F   margin gate (default 0.00 = disabled)
  --max-review N      VLM sample cap (default 200)

VLM mode:
  --disable-vlm       skip VLM, ResNet18 scores only
  --mock-vlm          placeholder cards, no Qwen loading

Output:
  --out FILE          HTML path (default outputs/reports/review_queue.html)

All Scripts

| Script | Purpose |
|---|---|
| preprocess_data.py | Parse WM-811K pickle → float32 .npy arrays + stratified splits |
| train.py | Training loop with focal loss, LR scheduler, best-F1 checkpointing |
| evaluate.py | Per-class metrics, confusion matrix, multi-model comparison plots |
| error_analysis.py | Misclassification grids, CSV, markdown report (ResNet18) |
| review_policy_analysis.py | Confidence/margin histogram analysis, ROC-style curves |
| threshold_sweep.py | 110-config Pareto sweep over (conf, margin) pairs |
| cost_optimization.py | Cost-function optimization for k = 1, 5, 10 |
| run_pipeline.py | End-to-end: NNFilter → VLMTriage → HTML report |

Project Structure

WaferDefectClassifier/
├── configs/                  YAML training configs per model
│   ├── simple_cnn.yaml
│   ├── resnet18.yaml
│   ├── convnext_tiny.yaml
│   └── vit_tiny.yaml
├── data/                     raw/, processed/, splits/  (not tracked)
├── scripts/                  CLI entry points (see table above)
├── src/
│   ├── datasets/             WaferDataset, CLASS_NAMES, data loading
│   ├── models/               SimpleCNN, ResNet18, ConvNeXt, ViT + factory
│   ├── training/             Trainer, focal loss, Evaluator
│   ├── explainability/       Grad-CAM, attention rollout (stubs)
│   ├── agents/
│   │   └── vlm_reviewer.py   LocalVLMReviewer — prompt, call, parse
│   ├── pipeline/
│   │   ├── nn_filter.py      Stage 1 — ResNet18 confidence gate
│   │   ├── vlm_triage.py     Stage 2 — VLM briefing + MockVLMTriage
│   │   └── report_builder.py Stage 3 — self-contained HTML renderer
│   └── utils/                config, paths, seed, logging, metrics, viz
├── tests/                    pytest unit tests (no full dataset needed)
├── .env.example              template for TORCH_HOME / HF_HOME
└── requirements.txt

Dataset

WM-811K contains 811,457 real production wafer maps from TSMC. The labeled maps fall into 9 classes: 8 defect patterns plus none.

| Class | Train count | Notes |
|---|---|---|
| none | ~103 K | Background / no defect |
| Center | ~4.3 K | |
| Donut | ~555 | |
| Edge-Loc | ~5.2 K | |
| Edge-Ring | ~9.7 K | |
| Loc | ~3.6 K | |
| Near-Full | ~104 | Rarest class |
| Random | ~866 | |
| Scratch | ~432 | |

Download

  1. Go to https://www.kaggle.com/datasets/qingyi/wm811k-wafer-map
  2. Download LSWMD.pkl
  3. Place at data/raw/LSWMD.pkl
  4. Run python scripts/preprocess_data.py --config configs/resnet18.yaml
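
The preprocessing step produces stratified 70/15/15 splits. As a rough idea of what "stratified" means here, the sketch below splits indices class by class using only the standard library. It is not the logic of preprocess_data.py; the seed, the rounding rule, and the function name are assumptions, and the toy label counts are made up.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split sample indices into train/val/test, class by class.

    Each class is shuffled and cut separately, so rare classes appear in
    every split in roughly the requested proportions.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Toy labels with a WM-811K-like imbalance (counts are made up).
labels = ["none"] * 200 + ["Near-Full"] * 20
train, val, test = stratified_split(labels)
```

A plain random split of the same data could easily leave the validation or test set with no Near-Full samples at all, which would make the per-class F1 for that class meaningless.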

Environment Notes

All large files must stay on the drive you configure — do not cache to C:.

This project sets TORCH_HOME and HF_HOME from .env before any import:

| Variable | Default | What it caches |
|---|---|---|
| TORCH_HOME | E:/cache/torch | PyTorch hub weights |
| HF_HOME | E:/cache/huggingface | Qwen2.5-VL-3B (~7 GB) + tokenizer |

Edit .env to point these at any drive with ≥ 15 GB free.
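
One way to apply these variables before the heavy imports is a small loader that runs at the very top of each entry point. The sketch below uses only the standard library and is an assumption about how this could be done; the project may use a different mechanism (for example a dotenv package), and load_env is a hypothetical name.

```python
import os


def load_env(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Blank lines and lines starting with '#' are skipped. Variables that are
    already set in the environment are left unchanged.
    """
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            os.environ.setdefault(key.strip(), value.strip().strip('"'))


# Called first, so TORCH_HOME and HF_HOME are in place before any library
# that reads them is imported.
load_env()

# Heavy imports would follow here, for example:
# import torch
# import transformers
```

Using setdefault means a value exported in the shell takes precedence over the one in .env, which is convenient on shared machines.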

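Windows note: num_workers=0 is set in all YAML configs. On Windows, DataLoader workers are started with the multiprocessing spawn method, which requires a picklable dataset and an if __name__ == "__main__": guard around the entry point; loading in the main process avoids those failure modes at the cost of slower data loading.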


Roadmap

| Phase | Status | Feature |
|---|---|---|
| 1 | ✅ | Project skeleton, config utilities, dataset loader |
| 2 | ✅ | Model zoo: SimpleCNN · ResNet18 · ConvNeXt-Tiny · ViT-Tiny |
| 3 | ✅ | Training loop, checkpointing, LR scheduler |
| 4 | ✅ | Evaluation: confusion matrix, per-class metrics, multi-model comparison |
| 5 | ✅ | Class imbalance: focal loss + weighted sampler |
| 6 | ✅ | Error analysis: misclassification grids, CSV, markdown report |
| 7 | ✅ | Cost-sensitive review policy + 110-config threshold sweep |
| 8 | ✅ | End-to-end pipeline: NNFilter → VLMTriage → HTML report |
| 9 | 🔲 | Grad-CAM / attention rollout for explainability |
| 10 | 🔲 | Similar-case retrieval (embedding + ANN index) |
| 11 | 🔲 | Active-learning loop: reviewer corrections → retraining |
| 12 | 🔲 | Diffusion-based rare-class augmentation |
| 13 | 🔲 | Transfer to SEM / optical defect inspection |

License

MIT — see LICENSE for details.
