A compact Node.js + TypeScript service that periodically checks HardverApró, records search results in MongoDB, detects new or changed listings, and sends concise email notifications. The project highlights practical concerns like polite scraping, change detection heuristics, durable storage, and templated notification rendering — all in a small, easy-to-follow codebase.
HardverApró (hardverapro.hu) is a Hungarian classifieds/marketplace site focused on computer hardware, electronics and related items. This project targets that site as a realistic example domain for scraping and monitoring listings.
Prerequisites: Node.js (tested on 20.x), Docker (optional).
- Install dependencies: `npm install`
- Build TypeScript: `npm run tsc`
- Run locally: `npm start`

Build and run using the provided Docker files in `docker/`:

    docker build -t hardverapro-notifier:0.1 -f docker/Dockerfile .
    docker run -d --name hardverapro-notifier --network mongo-network -p 55555:3000 \
      --cap-add=SYS_TIME --cap-add=SYS_NICE --restart unless-stopped hardverapro-notifier:0.1

or

    cd docker
    docker compose up -d --build

The application periodically executes configured searches on HardverApró, stores results in MongoDB, compares with previous runs, and sends email notifications for new or changed items. It's designed as a small, focused service suitable for a portfolio or interview demo.
- src/app.ts — application entrypoint and server startup
- src/services — core services: scraping, DB repositories, email sending
- src/services/scraping — scraper abstractions and site-specific scrapers
- src/services/db — MongoDB repositories and utilities
- src/view — email/web rendering and controllers
- docker/ — Dockerfile, compose, and container scripts
- package.json — scripts and dependencies
- AGENTS.md — developer/agent guidance
At a glance:
- Scheduler: a cron job or internal scheduler wakes up according to configured intervals and enqueues a search job.
- Scraping: site-specific scrapers fetch HTML and parse it into structured `SearchResult` objects. Scrapers include simple retry/backoff and basic caching to avoid hitting the site too aggressively.
- Normalize & persist: parsed results are normalized and stored in MongoDB via repository classes. Each run is timestamped so previous snapshots remain available.
- Diffing & detection: repository logic computes diffs between the latest snapshot and the most recent previous snapshot. Items are classified as `new`, `removed`, or `updated` (field-level comparisons are used where useful).
- Rendering & notification: when noteworthy changes are found, the renderer composes a friendly email (templated Handlebars HTML + plaintext fallback) listing changes and links.
- Delivery & retries: the email sender attempts to deliver via configured SMTP; transient failures are retried with backoff and permanent failures get logged for manual inspection.
- Observability: success/failure events and summary logs are recorded; errors include context to help debugging (search query, last successful run timestamp).
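
The retry-with-backoff behaviour mentioned for scraping and email delivery could look roughly like the sketch below. It is not taken from the codebase: `withRetry`, `backoffDelayMs`, the `isTransient` predicate, and the default delays are hypothetical names and values chosen for illustration.

```typescript
// Hypothetical helper; none of these names or defaults come from the project.
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at capMs. */
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

/**
 * Runs `task`, retrying up to `maxRetries` times while `isTransient(error)` is true.
 * Permanent errors, and the last transient one, are rethrown so the caller can log them.
 */
async function withRetry<T>(
  task: () => Promise<T>,
  isTransient: (error: unknown) => boolean,
  maxRetries = 3,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      if (attempt >= maxRetries || !isTransient(error)) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; a scraper or email sender would normally rely on the default.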
Compact flow (visual):
Scraper Trigger → Scraper → Normalizer → DB Snapshot → Diff Engine → Renderer → Email Sender
Key implementation notes:
- Scheduler: implemented via cron inside the container, or by the started process itself for local runs.
- Scraper: separate classes per site, returning a unified `SearchResult` model.
- Diffing: lightweight comparison (ID + selected fields) to avoid noise and focus on meaningful changes.
- Emails: HTML templates live under `src/view/email/templates/`; include subject, summary, and per-item sections.
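
To make the "ID + selected fields" comparison concrete, here is a minimal sketch of how two snapshots might be diffed. The `SearchResult` fields shown, the `COMPARED_FIELDS` list, and the `diffSnapshots` function are assumptions for illustration; the real model and repository logic may differ. The `added` bucket corresponds to items classified as `new`.

```typescript
// Hypothetical shape: the project's real SearchResult model may have other fields.
interface SearchResult {
  id: string;
  title: string;
  price: number;
  url: string;
}

interface UpdatedItem {
  before: SearchResult;
  after: SearchResult;
  changedFields: (keyof SearchResult)[];
}

interface SnapshotDiff {
  added: SearchResult[];
  removed: SearchResult[];
  updated: UpdatedItem[];
}

// Only these fields count as a meaningful change (an assumption of this sketch).
const COMPARED_FIELDS: (keyof SearchResult)[] = ["title", "price"];

function diffSnapshots(previous: SearchResult[], latest: SearchResult[]): SnapshotDiff {
  const previousById = new Map<string, SearchResult>();
  for (const item of previous) previousById.set(item.id, item);

  const latestIds = new Set<string>();
  const diff: SnapshotDiff = { added: [], removed: [], updated: [] };

  for (const item of latest) {
    latestIds.add(item.id);
    const before = previousById.get(item.id);
    if (!before) {
      diff.added.push(item); // ID not seen in the previous snapshot
      continue;
    }
    const changedFields = COMPARED_FIELDS.filter((field) => before[field] !== item[field]);
    if (changedFields.length > 0) {
      diff.updated.push({ before, after: item, changedFields });
    }
  }
  for (const item of previous) {
    if (!latestIds.has(item.id)) diff.removed.push(item); // ID disappeared
  }
  return diff;
}

// Example data is made up; example.invalid is a placeholder host.
const previous: SearchResult[] = [
  { id: "a1", title: "RTX 3060 12GB", price: 95000, url: "https://example.invalid/a1" },
  { id: "b2", title: "Ryzen 5 5600", price: 40000, url: "https://example.invalid/b2" },
];
const latest: SearchResult[] = [
  { id: "a1", title: "RTX 3060 12GB", price: 89000, url: "https://example.invalid/a1" },
  { id: "c3", title: "32GB DDR4 kit", price: 25000, url: "https://example.invalid/c3" },
];
const diff = diffSnapshots(previous, latest);
// diff.added holds c3, diff.removed holds b2, diff.updated holds a1 (price changed)
```

Restricting the comparison to a short field list is what keeps cosmetic changes, such as a reordered result page, from triggering notifications.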
Why this matters (talking points for interviews):
- Demonstrates system design: separated concerns, durable storage, idempotent jobs, and retry policies.
- Shows practical engineering: scraping etiquette (rate limits, caching), change detection heuristics, and templated notification delivery.

Configuration and development notes:
- Database: the default connection expects MongoDB at `mongodb:27017`. Update `src/config.json` and keep it aligned with `src/config-example.json` (config is validated at startup).
- For development, run `npm run tsc -- -w` in one terminal and `node dist/app.js` in another to iterate quickly.
- You can enable debug logging by setting the `LOG_LEVEL` environment variable to `'debug'`.
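
One way to keep `src/config.json` aligned with `src/config-example.json` is to check that every key in the example also exists in the real config. The sketch below shows that idea only; it is not the project's actual startup validation, and `findMissingKeys` as well as the `mongo` and `smtp` keys in the sample objects are hypothetical.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isPlainObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/** Returns dotted paths that exist in `example` but are missing from `actual`. */
function findMissingKeys(example: Json | undefined, actual: Json | undefined, prefix = ""): string[] {
  if (!isPlainObject(example)) return []; // leaf values are not compared, only keys
  const missing: string[] = [];
  for (const key of Object.keys(example)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (!isPlainObject(actual) || !(key in actual)) {
      missing.push(path);
    } else {
      missing.push(...findMissingKeys(example[key], actual[key], path));
    }
  }
  return missing;
}

// Sample objects with made-up keys; the real config files may look different.
const exampleConfig: Json = {
  mongo: { uri: "mongodb://mongodb:27017" },
  smtp: { host: "", port: 587 },
};
const localConfig: Json = {
  mongo: { uri: "mongodb://localhost:27017" },
  smtp: { host: "smtp.example.invalid" },
};

const missingKeys = findMissingKeys(exampleConfig, localConfig);
// missingKeys is ["smtp.port"], so a startup check could refuse to run and name the key
```

A startup check along these lines would load both JSON files, call the function, and exit with a clear message when the returned list is not empty.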