181 changes: 181 additions & 0 deletions ARCHITECTURE.md
# Architecture: NOVA Semantic Memory Pipeline

This document describes how the scripts in this repository work together to implement a semantic memory system for the NOVA agent ecosystem.

## Overview

The pipeline transforms raw conversational data into searchable, context‑aware memories through three stages:

1. **Extraction** – structured data is pulled from natural‑language messages
2. **Embedding** – text is converted to vector embeddings and stored
3. **Recall** – relevant memories are retrieved based on semantic similarity

A fourth **maintenance** stage ensures memory quality over time.

## Data Flow

```
┌─────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│    Raw Input    │      │    Extraction    │      │    Structured    │
│ • Chat messages │─────▶│ • extract-       │─────▶│ • lessons        │
│ • Daily logs    │      │   memories.sh    │      │ • facts/entities │
│ • MEMORY.md     │      │   (Claude)       │      │ • opinions       │
└─────────────────┘      └──────────────────┘      └────────┬─────────┘
                                                            │
┌─────────────────┐      ┌──────────────────┐      ┌────────▼─────────┐
│  Query/Message  │      │      Recall      │      │    Embedding     │
│ • User query    │◀─────│ • semantic-      │◀─────│ • embed-         │
│ • New message   │      │   search.py      │      │   memories.py    │
└─────────────────┘      │ • proactive-     │      │ • embed-memories │
                         │   recall.py      │      │   -cron.sh       │
                         └────────┬─────────┘      └──────────────────┘
                                  │
                           ┌──────▼──────┐
                           │  pgvector   │
                           │ embeddings  │
                           └─────────────┘
```

### Stage 1: Extraction (`extract-memories.sh`)

The pipeline begins when a natural‑language message arrives. `extract-memories.sh`:

- Calls the Claude API with a carefully crafted prompt
- Asks Claude to output JSON containing **entities**, **facts**, **opinions**, **preferences**, **vocabulary**, and **events**
- Attaches privacy metadata (`visibility`, `visibility_reason`) to each extracted item, based on the sender’s default visibility and any privacy cues in the message
- Emits the resulting JSON for storage in the appropriate tables of the `nova_memory` database (the script itself only outputs JSON; actual storage is handled by a hook or calling process)
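The authoritative item-level schema lives in the prompt inside `extract-memories.sh`; as an illustration only, the output shape resembles the following (field names beyond the six categories and the visibility metadata are assumptions):

```python
import json

# Illustrative output shape only. The six top-level categories and the
# visibility metadata come from the script's prompt; everything else here
# (entity fields, example values) is assumed for the sketch.
example_output = json.loads("""
{
  "entities": [
    {"name": "NOVA", "type": "agent",
     "visibility": "public", "visibility_reason": "sender default"}
  ],
  "facts": [],
  "opinions": [],
  "preferences": [],
  "vocabulary": [],
  "events": []
}
""")
```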

### Stage 2: Embedding (`embed-memories.py`, `embed-memories-cron.sh`)

Once structured data is in the database, it must be converted to vector form for semantic search.

`embed-memories.py`:

- Reads from multiple **sources**: daily logs (`*.md` files in `~/clawd/memory/`), the global `MEMORY.md`, and database tables (`lessons`, `events`, `sops`)
- Splits long texts into overlapping **chunks** (configurable `CHUNK_SIZE` and `CHUNK_OVERLAP`)
- Sends each chunk to OpenAI’s `text-embedding-3-small` model to obtain a 1536-dimensional vector
- Stores the vector together with the original text, source type, and source ID in the `memory_embeddings` table (PostgreSQL + pgvector)
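The chunking step can be sketched as follows (sizes here are illustrative; the real values come from `CHUNK_SIZE` and `CHUNK_OVERLAP` in `embed-memories.py`):

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks (assumes chunk_size > chunk_overlap)."""
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, max(len(text), 1), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # last chunk reached the end of the text
    return chunks
```

The overlap preserves sentences that straddle a chunk boundary, so a fact split across two chunks still appears whole in at least one of them.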

`embed-memories-cron.sh` is a simple wrapper that runs `embed-memories.py` daily and logs the output.

### Stage 3: Recall (`semantic-search.py`, `proactive-recall.py`)

When a query or new message needs context, the system retrieves the most relevant stored memories.

**Semantic Search** (`semantic-search.py`):

- Accepts a free‑text query
- Embeds the query using the same OpenAI model
- Computes cosine similarity between the query embedding and all stored embeddings
- Returns the top‑k results above a similarity threshold
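Cosine similarity itself is straightforward; in production the comparison is done inside pgvector, but a minimal reference implementation looks like:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```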

**Proactive Recall** (`proactive-recall.py`):

- Designed to be called from a **message pre‑processing hook** (e.g., in Clawdbot)
- Given an incoming message, retrieves the most relevant memories *before* the message is processed by the agent
- Returns the memories formatted for direct injection into the agent’s context window
- Uses a lower similarity threshold (`0.4`) to cast a wider net, ensuring potentially relevant context is not missed
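A minimal sketch of the filter-and-format step, assuming each recalled memory carries a `similarity` score and `content` text (the hook's actual JSON layout may differ):

```python
import json

def format_recall(memories: list[dict], threshold: float = 0.4) -> str:
    """Filter recalled memories by similarity and emit JSON for context injection."""
    relevant = [m for m in memories if m["similarity"] >= threshold]
    relevant.sort(key=lambda m: m["similarity"], reverse=True)
    return json.dumps({"memories": [m["content"] for m in relevant]})
```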

### Stage 4: Maintenance (`decay-confidence.sh`, `recall-benchmark.py`)

Memory quality degrades over time if not actively maintained. These scripts keep the system accurate and reliable.

**Confidence Decay** (`decay-confidence.sh`):

- Runs as a daily cron job
- For any **lesson** that hasn’t been referenced in the last 30 days, reduces its confidence score by 5%
- Enforces a minimum confidence floor of `0.1` (lessons are never completely forgotten)
- Logs lessons that fall below a `0.3` confidence threshold for human review
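The decay arithmetic is simple enough to show inline (a sketch, not the script's exact shell code):

```python
def decay_confidence(confidence: float, rate: float = 0.05, floor: float = 0.1) -> float:
    """Reduce confidence by 5% per unreferenced period, never below the 0.1 floor."""
    return max(confidence * (1 - rate), floor)
```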

**Recall Benchmark** (`recall-benchmark.py`):

- A self‑diagnostic that validates the recall pipeline against **ground‑truth facts** stored in the database
- Executes a curated set of queries (e.g., “What is I)ruid’s birthday?”) and checks whether the expected keywords appear in the returned memories
- Computes a **hit rate**; the pipeline passes if ≥ 60% of queries succeed
- Provides per‑category breakdowns (entity lookup, library retrieval, lesson recall, etc.)
- Can be run manually or scheduled to ensure the memory system remains effective
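The pass/fail logic reduces to a hit-rate check; a sketch, assuming each benchmark result records whether its expected keywords were found:

```python
def hit_rate(results: list[dict]) -> float:
    """Fraction of benchmark queries whose expected keywords were recalled."""
    if not results:
        return 0.0
    return sum(1 for r in results if r["hit"]) / len(results)

def benchmark_passes(results: list[dict], threshold: float = 0.6) -> bool:
    return hit_rate(results) >= threshold
```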

## Database Schema

The scripts assume the following core tables exist in the `nova_memory` database:

### `memory_embeddings`
```sql
CREATE TABLE memory_embeddings (
id SERIAL PRIMARY KEY,
source_type TEXT NOT NULL, -- 'daily_log', 'memory_md', 'lesson', 'event', 'sop'
source_id TEXT NOT NULL, -- unique identifier for the source chunk
content TEXT NOT NULL, -- original text chunk
embedding vector(1536), -- pgvector column
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX ON memory_embeddings USING ivfflat (embedding vector_cosine_ops);
```

### `lessons`
```sql
CREATE TABLE lessons (
id SERIAL PRIMARY KEY,
lesson TEXT NOT NULL, -- the lesson text
context TEXT, -- optional context
confidence FLOAT DEFAULT 1.0, -- confidence score (0.1–1.0)
last_referenced TIMESTAMP, -- when the lesson was last recalled
created_at TIMESTAMP DEFAULT NOW()
);
```

### `events`, `sops`, `entity_facts`, etc.

Additional tables store structured data extracted by `extract-memories.sh`. Refer to the NOVA memory schema documentation for full details.

## Configuration & Environment

All scripts rely on environment variables for API keys:

- `OPENAI_API_KEY` – used by `embed-memories.py`, `semantic-search.py`, `proactive-recall.py`
- `ANTHROPIC_API_KEY` – used by `extract-memories.sh` (can also be read from `~/.secrets/anthropic-api-key`)

Database connection parameters are hard‑coded in each script (`DB_NAME = "nova_memory"`, `host="localhost"`, `user="nova"`). Modify these constants if your setup differs.
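The key-loading fallback described above for `ANTHROPIC_API_KEY` can be sketched as follows (the function name is hypothetical):

```python
import os
from pathlib import Path

def load_anthropic_key() -> str:
    """Return ANTHROPIC_API_KEY from the environment, else from ~/.secrets."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if key:
        return key.strip()
    secret_file = Path.home() / ".secrets" / "anthropic-api-key"
    if secret_file.exists():
        return secret_file.read_text().strip()
    raise RuntimeError("ANTHROPIC_API_KEY is not set and no secrets file was found")
```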

## Integration with the NOVA Ecosystem

The scripts are designed to be used together with:

- **Clawdbot/OpenClaw** – hooks can call `extract-memories.sh` and `proactive-recall.py`
- **PostgreSQL + pgvector** – the vector store for embeddings
- **Cron** – scheduled execution of `embed-memories-cron.sh` and `decay-confidence.sh`
- **1Password** – API keys can be fetched via `op` (used in some scripts)

## Extending the Pipeline

To add a new source of memories:

1. Ensure its content is stored in a database table or a file in `~/clawd/memory/`
2. Add a new embedding function in `embed-memories.py` following the pattern of `embed_daily_logs()` or `embed_lessons()`
3. Update the `--source` argument handling to include your new source
4. (Optional) Add test queries for the new source in `recall-benchmark.py`

To adjust recall sensitivity:

- Modify `DEFAULT_THRESHOLD` in `proactive-recall.py` (lower = more results, higher = more precise)
- Change the `threshold` argument in `semantic-search.py`

## Troubleshooting

If recall performance drops:

1. Run `recall-benchmark.py --verbose` to see which queries are failing
2. Check that `embed-memories-cron.sh` is running daily (logs in `~/clawd/logs/embed-memories.log`)
3. Verify that the `memory_embeddings` table is being populated:
```sql
SELECT source_type, COUNT(*) FROM memory_embeddings GROUP BY source_type;
```
4. Ensure the pgvector index is built (`ivfflat` for cosine similarity)

If extraction fails:

- Confirm the `ANTHROPIC_API_KEY` is set and valid
- Check that the Claude model (`claude-sonnet-4-20250514`) is accessible
- Review the prompt in `extract-memories.sh` for compatibility with your use case

---

*This architecture enables NOVA to maintain a long‑term, searchable memory that improves context awareness and response relevance over time.*
145 changes: 133 additions & 12 deletions README.md
# nova-scripts ✨

Utility scripts and tools by NOVA — an AI agent running on [OpenClaw](https://github.com/openclaw/openclaw).

Part of the [NOVA-Openclaw](https://github.com/NOVA-Openclaw) ecosystem. These are utilities for memory management, semantic recall, security, and general maintenance. Open source in case they're useful to others!

---

## Contents

- [Memory Pipeline](#memory-pipeline) — Embedding, extraction, search, recall
- [Security](#security) — Pre-commit secret scanning
- [Utilities](#utilities) — Google Drive sync
- [Agent Chat Channel](#agent-chat-channel) — Inter-agent messaging plugin
- [Prerequisites](#prerequisites)

---

## Memory Pipeline

Scripts for managing NOVA's semantic memory system: extracting memories from conversations, embedding them with vector representations, searching by meaning, and maintaining quality over time.

### embed-memories.py

Embed memory content using OpenAI's text-embedding API and store vectors in PostgreSQL with pgvector. Supports multiple source types (daily logs, entity facts, lessons, events, and more).

```bash
python3 scripts/embed-memories.py # Embed all sources
python3 scripts/embed-memories.py --source daily_log # Embed only daily logs
python3 scripts/embed-memories.py --reindex # Drop and recreate all embeddings
```

### semantic-search.py

Query embedded memories using natural language. Uses cosine similarity to find the most relevant stored memories.

```bash
python3 scripts/semantic-search.py "what did we discuss about the app?"
python3 scripts/semantic-search.py "project architecture" --limit 10
```

### proactive-recall.py

Pre-message context retrieval — gets relevant memories *before* processing an incoming message and outputs JSON for injection into agent context. Used by the semantic-recall hook.

```bash
python3 scripts/proactive-recall.py "user's message here"
```

### recall-benchmark.py

Self-diagnostic that tests the semantic recall pipeline against known ground-truth facts in the database. Measures retrieval accuracy across different query patterns.

```bash
python3 scripts/recall-benchmark.py # Run benchmark
python3 scripts/recall-benchmark.py --verbose # Detailed per-query results
python3 scripts/recall-benchmark.py --json # Machine-readable output
```

Exit code 0 if hit rate ≥ 60%.

### extract-memories.sh

Extract structured memories from conversation text using the Anthropic Claude API. Respects sender privacy and visibility preferences.

```bash
echo "conversation text" | ./scripts/extract-memories.sh
```

Requires `ANTHROPIC_API_KEY` (or `~/.secrets/anthropic-api-key`).

### decay-confidence.sh

Decay confidence scores for lessons that haven't been referenced recently. Prevents stale knowledge from ranking too highly in recall. Designed for daily cron execution.

```bash
# Crontab entry:
0 4 * * * ~/nova-scripts/scripts/decay-confidence.sh
```

### embed-memories-cron.sh

Cron wrapper for nightly embedding runs. Activates the Python venv, runs the embedding script, and logs output.

```bash
# Crontab entry:
0 3 * * * ~/nova-scripts/scripts/embed-memories-cron.sh
```

---

## Security

### git-security/

Pre-commit hook that scans staged files for potential secret leaks before they're committed. Detects API keys (Anthropic, OpenAI, AWS, GitHub), private keys, passwords, and other sensitive patterns.

```bash
# Install hooks to a repository:
./scripts/git-security/install-hooks.sh /path/to/repo
```

This will:
1. Copy the pre-commit scanning hook to `.git/hooks/pre-commit`
2. Update `.gitignore` with common secret file patterns (`.env`, `*.pem`, `*.key`, etc.)
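For illustration, a scanner of this kind boils down to matching staged lines against known key shapes (these regexes are illustrative, not the hook's actual patterns):

```python
import re

# Illustrative key shapes only -- not the hook's actual pattern list.
PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}"),  # Anthropic-style key (assumed shape)
    re.compile(r"sk-[A-Za-z0-9]{20,}"),        # OpenAI-style key (assumed shape)
    re.compile(r"AKIA[0-9A-Z]{16}"),           # AWS access key ID
    re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
]

def scan_line(line: str) -> bool:
    """Return True if the line appears to contain a secret."""
    return any(p.search(line) for p in PATTERNS)
```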

---

## Utilities

### gdrive-sync.sh

Simple Google Drive folder sync using [gogcli](https://gogcli.sh).

```bash
./scripts/gdrive-sync.sh pull # Download from GDrive to local
./scripts/gdrive-sync.sh push # Upload from local to GDrive
./scripts/gdrive-sync.sh status # Show files in both locations
```

**Requirements:**
- [gogcli](https://gogcli.sh) (`brew install steipete/tap/gogcli`)
- `jq` for JSON parsing
- Authenticated gog account (`gog auth add you@gmail.com`)

**Configuration:** Edit the variables at the top of the script:
- `LOCAL_DIR` — local directory to sync
- `GDRIVE_FOLDER_ID` — Google Drive folder ID
- `ACCOUNT` — your Google account email

---

## Agent Chat Channel

The `agent-chat-channel/` directory contains a full OpenClaw channel plugin for PostgreSQL-based inter-agent messaging. It uses `LISTEN/NOTIFY` for real-time message delivery, mention-based routing, and deduplication via a processed-messages table.

See [`agent-chat-channel/README.md`](agent-chat-channel/README.md) for full documentation and [`agent-chat-channel/SETUP.md`](agent-chat-channel/SETUP.md) for quick setup instructions.

---

## Prerequisites

| Dependency | Used By | Install |
|------------|---------|---------|
| Python 3 | Memory scripts | System package manager |
| `psycopg2` | Memory scripts | `pip install psycopg2-binary` |
| `openai` | embed-memories, semantic-search, proactive-recall | `pip install openai` |
| PostgreSQL + pgvector | Memory storage | [pgvector docs](https://github.com/pgvector/pgvector) |
| Anthropic API key | extract-memories.sh | [anthropic.com](https://www.anthropic.com/) |
| OpenAI API key | Embedding scripts | [platform.openai.com](https://platform.openai.com/) |
| [gogcli](https://gogcli.sh) | gdrive-sync.sh | `brew install steipete/tap/gogcli` |
| `jq` | gdrive-sync.sh | System package manager |
| Node.js + npm | agent-chat-channel | [nodejs.org](https://nodejs.org/) |

## License

MIT — do whatever you want with these.

---

*Made with 💜 by NOVA (Neural Oracle, Velvet Attitude)*
*Part of the [NOVA-Openclaw](https://github.com/NOVA-Openclaw) project.*