ExifData Analytics is a toolbox for the analytical evaluation of EXIF metadata from AI-generated images. It provides insights into image generation parameters and trends, and helps you understand the usage patterns of the different models and settings used in AI image generation.
Analytics
- Metadata Extraction and Normalization
- Database management for metadata entries with integrity checks
- Parameter Analysis (Sampler, CFG scale, Size, Model, VAE, Denoising strength)
- Keyword analysis using keywords provided by the user, a `.txt` file of keywords, or the default keyword list
- TF-IDF Analysis for important terms in metadata
- Visualization of parameter frequencies and TF-IDF scores
Advanced Analytics
- Correlation analysis (parameter co-occurrence patterns)
- Prompt effectiveness analysis (essential vs variable tags)
- Parameter recommendations (3-strategy recommendation engine)
- Dataset comparison (side-by-side statistical analysis)
- Trend detection (temporal parameter evolution)
Performance
- Advanced analytics: ~3 seconds for 5,000 images
- Metadata extraction: ~260 images/second (tested on a 12-core/24-thread CPU with SSD and 32 GB RAM: 25,000 images extracted and written to the database in 1:38 min)
The project consists of four main scripts:

- `DB_filler.py`: Executes ExifTool and manages user input for data extraction; transfers the data to the database operations script.
- `parameter_statistic.py`: Performs statistical evaluation of the data from the database; analyzes various parameters and generates visualizations and text reports.
- `parameter_statistic_DB.py`: Handles database operations, including inserting and updating metadata entries; provides functions for data retrieval and database management.
- `advanced_analytics.py`: Advanced correlation and trend analysis, parameter recommendations based on historical usage, dataset comparison tools, and prompt effectiveness analysis.
- Install Python 3
- Install ExifTool by Phil Harvey (https://exiftool.org/install.html) and make sure it is in your PATH and accessible from the command line
- Run `pip install -r requirements.txt`
You can change the settings in `config.ini` or use the default values:

- Change the logging directory
- Choose another database name
- Increase or decrease the batch size for processing: the number of images to process in each batch. Adjust based on memory availability and processing-speed requirements.
- Increase or decrease the number of worker threads. Adjust based on available CPU cores and desired parallelism.

Example:

```
batch_size = 100
max_workers = 24
```
- `enable_blocklist` - Enable/disable security blocklist (default: true, NOT recommended to disable)
- `custom_blocked_paths` - Add custom sensitive paths to block (comma-separated)
- `allow_network_paths_without_confirmation` - Skip network path warnings for automation (default: false)
- `continue_on_error` - Continue processing if some files fail (default: true)
- `exiftool_timeout` - Maximum time to wait per file in seconds (default: 30)
- `max_file_size_mb` - Skip files larger than this to avoid hanging (default: 100, 0 = unlimited)
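As a minimal sketch of how settings like these might be consumed, the snippet below reads the options with `configparser` and fans batches out to a thread pool. The defaults mirror the values described above; the function names (`load_settings`, `process_in_batches`) are illustrative, not the project's actual API.

```python
import configparser
from concurrent.futures import ThreadPoolExecutor

# Hypothetical defaults mirroring the config options described above.
DEFAULTS = {
    "batch_size": "100",
    "max_workers": "24",
    "continue_on_error": "true",
    "exiftool_timeout": "30",
    "max_file_size_mb": "100",
}

def load_settings(path="config.ini"):
    """Read config.ini, falling back to the defaults for missing keys."""
    parser = configparser.ConfigParser(defaults=DEFAULTS)
    parser.read(path)  # a missing file is silently skipped
    section = parser["DEFAULT"]
    return {
        "batch_size": section.getint("batch_size"),
        "max_workers": section.getint("max_workers"),
        "continue_on_error": section.getboolean("continue_on_error"),
        "exiftool_timeout": section.getint("exiftool_timeout"),
        "max_file_size_mb": section.getint("max_file_size_mb"),
    }

def process_in_batches(images, handle_batch, batch_size, max_workers):
    """Split the image list into batches and process them on a thread pool."""
    batches = [images[i:i + batch_size] for i in range(0, len(images), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle_batch, batches))
```

Larger batches reduce per-batch overhead but hold more metadata in memory at once, which is why both knobs are exposed.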
- In the command prompt or terminal, run `python DB_filler.py` and follow the prompts to enter the directory containing your images and to specify whether subfolders should be included.
- After the metadata extraction is complete, run `python parameter_statistic.py`. This generates statistics and plots based on your image metadata.
- The results, statistics, plots, and logs are saved in the project directory: plots in the `output_plots` folder and logs in `LOG_files`.
`parameter_statistic_DB.py` manages the SQLite database operations and provides functions for database maintenance, optimization, and retrieval of statistics.

`python parameter_statistic_DB.py`

Key functions:
- `database_stats`: Retrieves general statistics about the database contents.
- `optimize_database`: Optimizes the database for better performance.
- `metadata_sample`: Gets a sample of image metadata.
- `check_database_integrity`: Verifies the integrity of the database.
- `backup_database`: Creates a backup of the database in the specified directory.
- `database_size`: Returns the current size of the database file.
- `last_modified`: Retrieves the last modification date of the database.
- `clear_database`: Removes all records from the database.
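Three of these maintenance functions can be sketched directly on top of Python's `sqlite3` module. This is an illustrative version using SQLite's built-in `PRAGMA integrity_check` and online backup API, not the project's actual implementation:

```python
import os
import sqlite3

def check_database_integrity(db_path):
    """Run SQLite's built-in integrity check; returns True if the DB is sound."""
    with sqlite3.connect(db_path) as conn:
        (result,) = conn.execute("PRAGMA integrity_check").fetchone()
    return result == "ok"

def backup_database(db_path, backup_path):
    """Copy the database using SQLite's online backup API (safe while open)."""
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(backup_path)
    with dst:
        src.backup(dst)
    src.close()
    dst.close()

def database_size(db_path):
    """Return the size of the database file in bytes."""
    return os.path.getsize(db_path)
```

The backup API is preferable to copying the file directly because it produces a consistent snapshot even while other connections are writing.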
After metadata extraction, run advanced analytics for deeper insights:

`python advanced_analytics.py --all`
- Correlation Analysis - Discover parameter co-occurrence patterns

  `python advanced_analytics.py --correlations`

  Outputs: Top parameter combinations, model + sampler frequencies
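The core of this kind of co-occurrence counting can be sketched with `collections.Counter` over parameter tuples. The record fields below (`sampler`, `cfg`, `denoise`) are assumed from the parameters this README names; the function is illustrative:

```python
from collections import Counter

def top_combinations(records, keys, n=3):
    """Count how often each combination of the given parameter keys co-occurs
    and return the n most common combinations with their share of all records."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    total = sum(counts.values())
    return [(combo, count, 100.0 * count / total)
            for combo, count in counts.most_common(n)]

# Toy records with the fields the reports mention (Sampler, CFG, Denoise).
records = [
    {"sampler": "Heun", "cfg": 5.5, "denoise": 0.37},
    {"sampler": "Heun", "cfg": 5.5, "denoise": 0.37},
    {"sampler": "Euler a", "cfg": 6.0, "denoise": 0.47},
]
```

With these toy records, `top_combinations(records, ["sampler", "cfg", "denoise"])` ranks `("Heun", 5.5, 0.37)` first with a two-thirds share.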
- Prompt Effectiveness - Analyze prompt patterns and tag usage

  `python advanced_analytics.py --prompt-analysis`

  Outputs: Essential tags (>90% frequency), variable tags, prompt template
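The essential/variable split can be sketched as a frequency threshold over comma-separated prompt tags. The thresholds match the report (>90% essential, <50% variable); the function itself is a hypothetical stand-in for the project's analysis:

```python
from collections import Counter

def classify_tags(prompts, essential_threshold=0.9, variable_threshold=0.5):
    """Split comma-separated prompt tags into essential tags (appearing in more
    than 90% of prompts) and variable tags (appearing in fewer than 50%)."""
    tag_counts = Counter()
    for prompt in prompts:
        # A set per prompt, so a tag repeated in one prompt counts once.
        tags = {t.strip() for t in prompt.split(",") if t.strip()}
        tag_counts.update(tags)
    total = len(prompts)
    essential = [t for t, c in tag_counts.items() if c / total > essential_threshold]
    variable = [t for t, c in tag_counts.items() if c / total < variable_threshold]
    return sorted(essential), sorted(variable)
```

Tags in the middle band (50-90%) belong to neither list, which is what makes the extracted "prompt template" stable: it contains only the essential tags plus placeholders for the variable ones.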
- Parameter Recommendations - Get suggestions based on historical usage

  `python advanced_analytics.py --recommend "sampler=Heun,model=YourModel"`

  Outputs: Most-used combinations, similar settings, experimental suggestions
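The "most-used combinations" strategy can be sketched as: filter the history by the fixed constraints, then rank the most common values of the remaining parameters among the matching records. This is a simplified illustration, not the project's three-strategy engine:

```python
from collections import Counter

def recommend(records, constraints, fields, n=3):
    """Filter historical records by fixed constraints (e.g. sampler=Heun) and
    rank the most-used values of the remaining fields among the matches."""
    matching = [r for r in records
                if all(r.get(k) == v for k, v in constraints.items())]
    return {f: Counter(r[f] for r in matching).most_common(n) for f in fields}
```

For example, given a history where `sampler=Heun` was mostly paired with CFG 5.5, `recommend(records, {"sampler": "Heun"}, ["cfg"])` suggests 5.5 first.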
- Dataset Comparison - Compare two filtered subsets statistically

  `python advanced_analytics.py --compare "model=ModelA" "model=ModelB"`

  Outputs: Side-by-side parameter distributions, usage percentages
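A side-by-side distribution for one parameter can be sketched as two per-value (count, percentage) maps merged over the union of observed values. Field names are assumptions; the helper is illustrative:

```python
from collections import Counter

def compare_distributions(dataset1, dataset2, field):
    """Build side-by-side value distributions (count and percentage) for one
    parameter field across two filtered subsets."""
    def dist(records):
        counts = Counter(r[field] for r in records)
        total = len(records)
        return {v: (c, 100.0 * c / total) for v, c in counts.items()}
    d1, d2 = dist(dataset1), dist(dataset2)
    rows = []
    # Union of values, so a value missing from one subset still gets a row.
    for value in sorted(set(d1) | set(d2)):
        rows.append((value, d1.get(value, (0, 0.0)), d2.get(value, (0, 0.0))))
    return rows
```

Comparing percentages rather than raw counts is what makes the report meaningful when the two subsets differ in size.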
- Trend Detection - Analyze parameter evolution over time

  `python advanced_analytics.py --trends`

  Outputs: Quarterly parameter evolution, drift detection
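Quarterly evolution can be sketched by bucketing records into (year, quarter) and taking the dominant value per bucket. The `timestamp` field and ISO date format are assumptions for illustration:

```python
from collections import Counter, defaultdict
from datetime import datetime

def quarterly_top(records, field):
    """Group records by (year, quarter) of their timestamp and return the most
    common value of `field` in each quarter, with its share of that quarter."""
    by_quarter = defaultdict(Counter)
    for r in records:
        ts = datetime.fromisoformat(r["timestamp"])
        quarter = (ts.year, (ts.month - 1) // 3 + 1)
        by_quarter[quarter][r[field]] += 1
    evolution = {}
    for quarter, counts in sorted(by_quarter.items()):
        value, count = counts.most_common(1)[0]
        evolution[quarter] = (value, 100.0 * count / sum(counts.values()))
    return evolution
```

A change in the dominant value between consecutive quarters is the simplest signal of the parameter drift the report describes.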
All reports are saved to the `advanced_analytics/` directory.
Performance: ~3 seconds for 5,000 images
- `keyword_analysis.txt`

  ```
  Keyword: Searched Keyword
  Count: 0
  Top Models: Model Name
  Top Samplers: Sampler Name
  Keyword: Second Searched Keyword
  Count: 0
  Top Models: Model Name
  Top Samplers: Sampler Name
  ...
  ```

- `prompt_word_counts.txt`

  ```
  Analysis for prompt words:
  prompt1: 154
  prompt2: 146
  prompt3: 134
  ...
  ```

- `negative_prompt_word_counts.txt`

  ```
  Analysis for negative prompt words:
  negative prompt1: 154
  negative prompt2: 146
  negative prompt3: 134
  negative prompt4: 130
  ...
  ```

- `parameter_counts.txt` (always covers the full DB)

  ```
  Analysis for Sampler:
  Sampler
  Sampler Name 65
  Sampler Name 55
  Sampler Name 29
  ...
  Analysis for CFG scale:
  CFG scale
  6 35
  4 29
  8 29
  ...
  Analysis for Size:
  Size
  576x768 149
  768x768 5
  768x576 3
  ...
  Analysis for Model:
  Model
  Model Name 30
  Model Name 28
  Model Name 22
  ...
  Analysis for VAE:
  VAE
  VAE Name 124
  VAE Name 30
  VAE Name 14
  ...
  Analysis for Denoising strength:
  Denoising strength
  0.35 154
  0.47 129
  0.72 97
  ...
  ```

- `tfidf_analysis.txt`

  ```
  TF-IDF Analysis of Prompts:
  prompt1: 47.638000568343095
  prompt2: 31.389964942854668
  prompt3: 24.169336638845234
  ...
  ```
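The scores in `tfidf_analysis.txt` can be approximated with a pure-Python sketch: term frequency summed over all prompts, weighted by inverse document frequency. This assumes a standard tf × idf weighting; the actual script may weight or smooth differently:

```python
import math
from collections import Counter

def tfidf_scores(documents):
    """Compute a corpus-level TF-IDF score per term: term frequency summed over
    all documents, weighted by inverse document frequency."""
    docs = [doc.lower().split() for doc in documents]
    n_docs = len(docs)
    # Document frequency: in how many prompts does each term occur at all?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    scores = Counter()
    for tokens in docs:
        tf = Counter(tokens)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term]) + 1.0  # +1 keeps ubiquitous terms nonzero
            scores[term] += count * idf
    return scores
```

High scores go to terms that are both frequent overall and concentrated in a subset of prompts, which is why they surface "important" terms rather than just common ones.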
- `correlation_analysis.txt`

  ```
  Top Parameter Combinations (Sampler + CFG + Denoising):
  1. DPM++ 2M Karras | CFG:5.5 | Denoise:0.37   1721 (33.7%)
  2. Heun | CFG:5.5 | Denoise:0.37              1703 (33.3%)
  3. Euler a | CFG:5.5 | Denoise:0.37           1669 (32.7%)
  Top Model + Sampler Combinations:
  1. ModelA + DPM++ 2M Karras   878 (17.2%)
  2. ModelB + DPM++ 2M Karras   860 (16.8%)
  ...
  ```

- `prompt_effectiveness.txt`

  ```
  ESSENTIAL TAGS (>90% frequency):
  masterpiece    5110 (100.0%)
  solo           5110 (100.0%)
  best quality   5110 (100.0%)
  VARIABLE TAGS (<50% frequency):
  bunker   259 (5.1%)
  attic    170 (3.3%)
  PROMPT TEMPLATE:
  [essential tags], [VARIABLE LOCATION]
  ```

- `trend_analysis.txt`

  ```
  Parameter Evolution:
  Q1 (Earliest): Euler a (65%) + ModelA (100%)
  Q2:            Heun (66%) + ModelA (100%)
  Q3:            Euler a (66%) + ModelB (99%)
  Q4 (Latest):   Heun (66%) + ModelB (100%)
  Insight: Systematic A/B testing detected
  ```

- `comparison_ModelA_vs_ModelB.txt`

  ```
  Dataset 1 (ModelA): 2563 images
  Dataset 2 (ModelB): 2547 images
  Sampler Distribution:   Dataset 1     Dataset 2
  DPM++ 2M Karras         878 (34.3%)   860 (33.8%)
  Heun                    856 (33.4%)   847 (33.3%)
  Euler a                 829 (32.3%)   840 (33.0%)
  ```
Advanced Analytics Module
- Added `advanced_analytics.py` for in-depth parameter analysis
- Correlation analysis: Discover parameter co-occurrence patterns
- Prompt effectiveness: Analyze tag usage and extract templates
- Parameter recommendations: Get suggestions based on historical data
- Dataset comparison: Compare filtered subsets statistically
- Trend detection: Track parameter evolution over time
Security Enhancements
- Path validation (blocklist + whitelist)
- Configurable security settings in `config.ini`
- Custom blocked paths support
- Network path detection and warnings
Error Handling Improvements
- Robust file validation (magic bytes, size limits, permissions)
- Categorized error reporting (8 error types)
- Continue-on-error mode with detailed statistics
- ExifTool timeout protection
- Processing summary with success/failure breakdown
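The timeout protection and continue-on-error accounting can be combined as in the sketch below, which assumes ExifTool is invoked as a subprocess; the function names and status categories are illustrative, not the project's actual code:

```python
import subprocess
from collections import Counter

def extract_with_timeout(cmd, timeout=30):
    """Run an extraction command; return (status, stdout). The status strings
    stand in for the categorized error reporting described above."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout", ""
    except FileNotFoundError:
        return "tool_missing", ""
    if proc.returncode != 0:
        return "tool_error", proc.stdout
    return "ok", proc.stdout

def process_files(commands, timeout=30, continue_on_error=True):
    """Run each command, tallying successes and failures per category for a
    final processing summary."""
    summary = Counter()
    for cmd in commands:
        status, _ = extract_with_timeout(cmd, timeout)
        summary[status] += 1
        if status != "ok" and not continue_on_error:
            break
    return summary
```

The per-file timeout is what keeps one corrupt or enormous file from stalling an entire batch run.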
Configuration
- `[Security]` section for path protection settings
- `[ErrorHandling]` section for robust processing
Thanks to Phil Harvey for his awesome EXIF data tool: https://exiftool.org
