A principled, multi-graph memory system for long-horizon agentic reasoning.
- Read the full paper: https://arxiv.org/abs/2601.03236
MAGMA (Multi-Graph based Agentic Memory Architecture) is a sophisticated memory system designed for long-term conversation memory and multi-hop reasoning. It creates interconnected event nodes linked by temporal, semantic, and causal relationships, enabling intelligent question answering across extended dialogues.
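The core idea of interconnected event nodes with typed edges can be illustrated with a minimal, self-contained sketch. The names here (`MultiGraphMemory`, `add_event`, `link`, `traverse`) are illustrative assumptions, not the repository's actual API; the real engine lives in `memory/trg_memory.py`:

```python
from collections import defaultdict

class MultiGraphMemory:
    """Toy multi-graph memory: event nodes linked by typed edges.

    Illustrative only; names and structure are assumptions, not MAGMA's API.
    """
    def __init__(self):
        self.events = {}               # event_id -> text
        self.edges = defaultdict(set)  # (event_id, edge_type) -> neighbor ids

    def add_event(self, event_id, text):
        self.events[event_id] = text

    def link(self, src, dst, edge_type):
        # edge_type is one of "temporal", "semantic", "causal"
        self.edges[(src, edge_type)].add(dst)

    def traverse(self, start, edge_types, max_hops=3):
        """Collect events reachable within max_hops over the given edge types."""
        frontier, seen = {start}, {start}
        for _ in range(max_hops):
            frontier = {
                nxt
                for node in frontier
                for etype in edge_types
                for nxt in self.edges[(node, etype)]
            } - seen
            seen |= frontier
        return seen

mem = MultiGraphMemory()
mem.add_event("e1", "Alice adopted a puppy in May.")
mem.add_event("e2", "The puppy chewed Alice's shoes.")
mem.add_event("e3", "Alice bought new shoes.")
mem.link("e1", "e2", "temporal")
mem.link("e2", "e3", "causal")

# A multi-hop question like "Why did Alice buy shoes?" needs e1 -> e2 -> e3
print(sorted(mem.traverse("e1", ["temporal", "causal"])))  # ['e1', 'e2', 'e3']
```

Answering a multi-hop question then reduces to traversing the relevant edge types from a retrieved seed node, rather than matching the question against isolated chunks.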
- Python 3.9 or higher
- Virtual environment (recommended)
- Clone the repository:

```bash
git clone https://github.com/FredJiang0324/MAMGA.git
cd MAMGA
```

- Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Set up environment variables:

```bash
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
```

```bash
# Test on LoCoMo dataset (10 samples included)
# Full dataset path: data/locomo10.json
python test_fixed_memory.py --sample 0 --model gpt-4o-mini --max-questions 10 --category-to-test 1,2,3,4,5

# Test specific question categories
python test_fixed_memory.py --sample 0 --category-to-test 1  # Multi-hop only

# Test multiple samples
python test_fixed_memory.py --sample 0 1 2 --max-questions 50
```

```bash
# Test with the main evaluation script (40% accuracy on multi-session)
python test_longmemeval_chunked.py --dataset data/longmemeval_s_cleaned.json --max-questions 5

# Test with sample data (included)
python test_longmemeval_chunked.py --dataset examples/longmemeval_sample.json --max-questions 5

# Note: Download longmemeval_s_cleaned.json separately (see Datasets section)
```

This system is evaluated on two primary datasets:
LoCoMo:
- 10 conversation samples with extensive Q&A pairs
- 5 question categories: Multi-hop, Temporal, Open-domain, Single-hop, Adversarial
- Tests long-term memory and reasoning capabilities
- Status: included in the repository (2.7 MB)
LongMemEval:
- Multi-session conversation dataset
- Focus on counting and aggregation across sessions
- Tests the ability to track information across conversation boundaries
- Status: download from HuggingFace (see instructions below)
- Sample: a small sample is included in `examples/longmemeval_sample.json`
- The LoCoMo dataset is included and ready to use
- The LongMemEval dataset must be downloaded from HuggingFace:

```bash
mkdir -p data/
cd data/
wget https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned/resolve/main/longmemeval_s_cleaned.json
cd ..
```

- MiniLM (default): fast, offline, 384-dimensional embeddings
- OpenAI: Higher quality, requires API key, 1536-dimensional embeddings
```bash
# Use MiniLM (default)
python test_fixed_memory.py --embedding-model minilm

# Use OpenAI embeddings
python test_fixed_memory.py --embedding-model openai
```

Supported OpenAI models: `gpt-4o-mini` (default), `gpt-4.1-mini`, `gpt-4o`, `gpt-3.5-turbo`.
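Whichever backend is selected, both expose the same shape of interface: text in, fixed-dimension unit vector out (384-d for MiniLM, 1536-d for OpenAI). The sketch below uses a deterministic hash-based stand-in embedder, purely to illustrate that interface and how cosine similarity ranks candidates; it is not how either real backend computes embeddings:

```python
import hashlib
import math

def toy_embed(text, dim=384):
    """Stand-in embedder: hashes tokens into a dim-sized unit vector.

    Real backends differ: MiniLM produces 384-d vectors offline,
    OpenAI returns 1536-d vectors via API. Same text -> vector shape.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

q = toy_embed("puppy chewed shoes")
d1 = toy_embed("the puppy chewed the shoes")
d2 = toy_embed("stock prices rose sharply")
print(cosine(q, d1) > cosine(q, d2))  # True: shared tokens raise similarity
```

The dimensionality trade-off is the usual one: the larger OpenAI vectors tend to capture semantics more finely but cost API calls, while MiniLM runs locally and for free.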
Memory is cached for efficiency:
```bash
# Default cache location
./locomo_trg_cache/sample{N}/

# Custom cache directory
python test_fixed_memory.py --cache-dir ./my_cache

# Force rebuild
python test_fixed_memory.py --rebuild
```

Configure retrieval behavior in `memory/query_engine.py`:
```python
# Retrieval parameters
vector_search_k = 20      # Initial vector search results
keyword_threshold = 0.3   # Minimum keyword score
top_k_final = 5           # Final context nodes
max_traversal_hops = 3    # Graph traversal depth
```

```
trg-memory/
├── memory/                      # Core memory modules
│   ├── trg_memory.py            # Main memory engine
│   ├── graph_db.py              # Graph database
│   ├── vector_db.py             # Vector database
│   ├── query_engine.py          # Query processing
│   ├── memory_builder.py        # Memory construction
│   └── ...
├── utils/                       # Utility modules
│   ├── memory_layer.py          # LLM controller
│   └── load_dataset.py          # Dataset loader
├── test_fixed_memory.py         # LoCoMo test script
├── test_longmemeval_chunked.py  # LongMemEval test script
├── load_longmemeval.py          # LongMemEval loader
├── examples/                    # Sample datasets
├── data/                        # Full datasets (not included)
└── requirements.txt             # Dependencies
```
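The retrieval parameters above suggest a staged pipeline: a wide vector search (`vector_search_k`), a keyword filter (`keyword_threshold`), then truncation to the final context (`top_k_final`). A minimal sketch under those assumptions follows; the scoring functions are stand-ins, not the actual logic in `memory/query_engine.py`:

```python
def retrieve(query_tokens, nodes, vector_scores,
             vector_search_k=20, keyword_threshold=0.3, top_k_final=5):
    """Toy hybrid retrieval mirroring the parameters above.

    vector_scores: precomputed node_id -> similarity (stand-in for a vector DB).
    Keyword score here is plain token overlap; the real engine may differ.
    """
    # 1) Take the top vector_search_k candidates by vector similarity.
    candidates = sorted(vector_scores, key=vector_scores.get,
                        reverse=True)[:vector_search_k]

    # 2) Keep candidates whose keyword overlap clears keyword_threshold.
    def keyword_score(node_id):
        tokens = set(nodes[node_id].lower().split())
        return len(tokens & query_tokens) / max(len(query_tokens), 1)

    filtered = [n for n in candidates if keyword_score(n) >= keyword_threshold]

    # 3) Return at most top_k_final survivors as the final context.
    return filtered[:top_k_final]

nodes = {
    "e1": "Alice adopted a puppy",
    "e2": "The puppy chewed shoes",
    "e3": "Stock prices rose",
}
scores = {"e1": 0.9, "e2": 0.8, "e3": 0.7}
query = {"puppy", "shoes"}
print(retrieve(query, nodes, scores, top_k_final=2))  # ['e1', 'e2']
```

In the real system a graph-traversal stage (`max_traversal_hops`) would additionally expand the surviving nodes along their temporal, semantic, and causal edges before the final cut.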
The system uses multiple evaluation metrics:
- Exact Match: Binary correctness
- F1 Score: Token-level overlap (0-100%)
- BLEU Score: N-gram similarity (0-100%)
- LLM Judge: GPT-based semantic evaluation (0-100%)
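The first two metrics are simple enough to sketch directly; this is a common formulation of exact match and token-level F1, not necessarily the repository's exact implementation:

```python
from collections import Counter

def exact_match(pred, gold):
    """Binary correctness after trivial normalization."""
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    """Token-level F1 in percent, counting shared tokens with multiplicity."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 100 * 2 * precision * recall / (precision + recall)

print(exact_match("A puppy", "a puppy"))               # 1
print(round(token_f1("a brown puppy", "a puppy"), 1))  # 80.0
```

BLEU adds n-gram overlap on top of unigrams, and the LLM judge trades determinism for semantic tolerance (e.g. accepting paraphrased answers the string metrics would penalize).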
MIT License - see LICENSE file for details.
Have ideas or suggestions? Please feel free to submit issues or pull requests!
More detailed documentation is coming soon, and we will update the GitHub page.
If you find this project useful, please consider citing our paper:
```bibtex
@article{jiang2026magma,
  title={MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents},
  author={Jiang, Dongming and Li, Yi and Li, Guanpeng and Li, Bingzhe},
  journal={arXiv preprint arXiv:2601.03236},
  year={2026}
}
```