ExifData_Analytics

ExifData Analytics is a toolbox designed for the analytical evaluation of EXIF metadata from AI-generated images. This project provides the ability to gain insights into image generation parameters and trends. It's useful for understanding the usage patterns of different models and settings used in AI image generation.

Key Features

Analytics

Metadata Extraction and Normalization
Database Management for metadata entries with integrity checks
Parameter Analysis (Sampler, CFG scale, Size, Model, VAE, Denoising strength)
Keyword Analysis using keywords entered by the user, keywords from a .txt file, or the default keyword list
TF-IDF Analysis for important terms in metadata
Visualization of parameter frequencies and TF-IDF scores

Advanced Analytics

Correlation analysis (parameter co-occurrence patterns)
Prompt effectiveness analysis (essential vs variable tags)
Parameter recommendations (3-strategy recommendation engine)
Dataset comparison (side-by-side statistical analysis)
Trend detection (temporal parameter evolution)

Performance

Advanced analytics: ~3 seconds for 5,000 images
Metadata extraction: ~260 images/second. Measured on a 12-core, 24-thread CPU with an SSD and 32 GB RAM: 25,000 images extracted and written to the database in 1 min 38 s.

Project Structure

The project consists of four main scripts:

DB_filler.py:
- Responsible for executing ExifTool and managing user input for data extraction.
- Transfers data to the database operations script.
parameter_statistic.py:
- Performs statistical evaluation of the data from the database.
- Analyzes various parameters and generates visualizations and text reports.
parameter_statistic_DB.py:
- Handles database operations, including inserting and updating metadata entries.
- Provides functions for data retrieval and database management.
advanced_analytics.py:
- Advanced correlation and trend analysis
- Parameter recommendations based on historical usage
- Dataset comparison tools
- Prompt effectiveness analysis

How to Use

Install Python 3.

Install ExifTool by Phil Harvey (https://exiftool.org/install.html) and make sure it is on your PATH and callable from the command line.

Install the Python dependencies: pip install -r requirements.txt

Configuration

You can change the settings in config.ini or use the default values:

Basic Settings

Change the logging directory.
Choose another database name.
Increase or decrease the batch size, i.e. the number of images processed in each batch. Adjust it based on available memory and the processing speed you need.
Increase or decrease the number of worker threads. Adjust it based on the available CPU cores and the desired parallelism.
Example:
```
batch_size = 100
max_workers = 24
```
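How the two values interact can be sketched as follows. This is a minimal illustration and not the code in DB_filler.py: `chunked`, `process_batch`, and `process_all` are hypothetical names, and the per-batch work is only a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def process_batch(paths: List[str]) -> int:
    """Placeholder for the per-batch work (hypothetical); returns files handled."""
    return len(paths)


def process_all(paths: Sequence[str], batch_size: int = 100, max_workers: int = 24) -> int:
    """Spread the batches over a pool of worker threads and sum the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(process_batch, chunked(paths, batch_size)))
```

A larger batch_size means fewer, heavier tasks and more memory per task; a larger max_workers only helps while there are enough batches and CPU cores to keep the threads busy.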

Security Settings

enable_blocklist - Enable/disable security blocklist (default: true, NOT recommended to disable)
custom_blocked_paths - Add custom sensitive paths to block (comma-separated)
allow_network_paths_without_confirmation - Skip network path warnings for automation (default: false)

Error Handling Settings

continue_on_error - Continue processing if some files fail (default: true)
exiftool_timeout - Maximum time to wait per file in seconds (default: 30)
max_file_size_mb - Skip files larger than this to avoid hanging (default: 100, 0=unlimited)
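
Reading these settings with their documented defaults could look like the sketch below. It assumes the keys live in the [Security] and [ErrorHandling] sections under exactly the names listed above; `load_settings`, the sample paths, and the returned dictionary layout are hypothetical.

```python
import configparser

SAMPLE_CONFIG = """
[Security]
enable_blocklist = true
custom_blocked_paths = /srv/private, /mnt/secrets
allow_network_paths_without_confirmation = false

[ErrorHandling]
continue_on_error = true
exiftool_timeout = 30
max_file_size_mb = 100
"""


def load_settings(text: str) -> dict:
    """Parse config text; missing keys fall back to the documented defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw_paths = parser.get("Security", "custom_blocked_paths", fallback="")
    return {
        "enable_blocklist": parser.getboolean("Security", "enable_blocklist", fallback=True),
        # Comma-separated list, with empty entries dropped
        "custom_blocked_paths": [p.strip() for p in raw_paths.split(",") if p.strip()],
        "allow_network_paths": parser.getboolean(
            "Security", "allow_network_paths_without_confirmation", fallback=False),
        "continue_on_error": parser.getboolean("ErrorHandling", "continue_on_error", fallback=True),
        "exiftool_timeout": parser.getint("ErrorHandling", "exiftool_timeout", fallback=30),
        "max_file_size_mb": parser.getint("ErrorHandling", "max_file_size_mb", fallback=100),
    }


settings = load_settings(SAMPLE_CONFIG)
```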

Image Metadata Extraction Script

In the command prompt or terminal, run:
```
python DB_filler.py
```
Follow the prompts to enter the directory containing your images and specify if subfolders should be included.
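
For orientation, a call to ExifTool from Python might be sketched like this. It is not taken from DB_filler.py: the function names are hypothetical, and the timeout here covers the whole call rather than a single file. `-json` and `-r` are standard ExifTool options for JSON output and recursion into subfolders.

```python
import json
import subprocess
from typing import List


def build_exiftool_command(directory: str, recursive: bool) -> List[str]:
    """Build an ExifTool call that prints metadata as JSON."""
    command = ["exiftool", "-json"]
    if recursive:
        command.append("-r")  # descend into subfolders
    command.append(directory)
    return command


def extract_metadata(directory: str, recursive: bool = False, timeout: int = 30) -> list:
    """Run ExifTool and return one dict per file (requires exiftool on PATH)."""
    result = subprocess.run(
        build_exiftool_command(directory, recursive),
        capture_output=True, text=True, timeout=timeout,
    )
    # ExifTool prints a JSON array; an empty directory produces no output
    return json.loads(result.stdout) if result.stdout.strip() else []
```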

Metadata Analysis Script

After the metadata extraction is complete, run:
```
python parameter_statistic.py
```
This will generate statistics and plots based on your image metadata.
The statistics and text reports are saved in the project directory, plots in the output_plots folder, and logs in LOG_files.

Database Management Script

This script manages the SQLite database and provides functions for database maintenance, optimization, and retrieval of statistics.
```
python parameter_statistic_DB.py
```
Key Functions:
database_stats: Retrieves general statistics about the database contents.
optimize_database: Optimizes the database for better performance.
metadata_sample: Retrieves a sample of image metadata.
check_database_integrity: Verifies the integrity of the database.
backup_database: Creates a backup of the database in the specified directory.
database_size: Returns the current size of the database file.
last_modified: Retrieves the last modification date of the database.
clear_database: Removes all records from the database.
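
The README does not show how these functions are implemented. The sketch below only illustrates standard sqlite3 features that could back such operations (PRAGMA integrity_check, VACUUM, and the Connection.backup API); the helper names and the table layout are hypothetical.

```python
import os
import sqlite3
import tempfile


def check_integrity(conn: sqlite3.Connection) -> bool:
    """Return True when SQLite's built-in integrity check reports 'ok'."""
    return conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"


def optimize(conn: sqlite3.Connection) -> None:
    """Rebuild the database file to reclaim free pages."""
    conn.commit()  # VACUUM cannot run inside an open transaction
    conn.execute("VACUUM")


def backup(conn: sqlite3.Connection, target_path: str) -> None:
    """Copy the live database into target_path using the SQLite backup API."""
    target = sqlite3.connect(target_path)
    try:
        conn.backup(target)
    finally:
        target.close()


# Demonstration on a throwaway database with a hypothetical table layout
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "demo.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE metadata (file TEXT PRIMARY KEY, sampler TEXT)")
conn.execute("INSERT INTO metadata VALUES ('a.png', 'Heun')")
conn.commit()

integrity_ok = check_integrity(conn)
optimize(conn)
backup_path = os.path.join(workdir, "backup.db")
backup(conn, backup_path)
db_size_bytes = os.path.getsize(db_path)
conn.close()

backup_conn = sqlite3.connect(backup_path)
row_count = backup_conn.execute("SELECT COUNT(*) FROM metadata").fetchone()[0]
backup_conn.close()
```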

Advanced Analytics Script

After metadata extraction, run advanced analytics for deeper insights:
```
python advanced_analytics.py --all
```
Correlation Analysis - Discover parameter co-occurrence patterns
```
python advanced_analytics.py --correlations
```
Outputs: Top parameter combinations, model+sampler frequencies
Prompt Effectiveness - Analyze prompt patterns and tag usage
```
python advanced_analytics.py --prompt-analysis
```
Outputs: Essential tags (>90% frequency), variable tags, prompt template
Parameter Recommendations - Get suggestions based on historical usage
```
python advanced_analytics.py --recommend "sampler=Heun,model=YourModel"
```
Outputs: Most-used combinations, similar settings, experimental suggestions
Dataset Comparison - Compare two filtered subsets statistically
```
python advanced_analytics.py --compare "model=ModelA" "model=ModelB"
```
Outputs: Side-by-side parameter distributions, usage percentages
Trend Detection - Analyze parameter evolution over time
```
python advanced_analytics.py --trends
```
Outputs: Quarterly parameter evolution, drift detection

All reports saved to: advanced_analytics/ directory

Performance: ~3 seconds for 5,000 images
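
The filter strings passed to --recommend and --compare follow a key=value pattern. How such a string could be turned into a record filter is sketched below; `parse_filter` and `matches` are hypothetical helpers, not functions from advanced_analytics.py, and the sketch assumes values contain no commas.

```python
from typing import Dict


def parse_filter(expression: str) -> Dict[str, str]:
    """Turn 'sampler=Heun,model=YourModel' into a dict of key/value pairs."""
    filters: Dict[str, str] = {}
    for part in expression.split(","):
        if not part.strip():
            continue
        key, sep, value = part.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {part!r}")
        filters[key.strip().lower()] = value.strip()
    return filters


def matches(record: Dict[str, str], filters: Dict[str, str]) -> bool:
    """True when the record carries every requested key with the requested value."""
    return all(record.get(key) == value for key, value in filters.items())
```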

Data Examples

keyword_analysis.txt

Keyword: Searched Keyword
Count: 0
Top Models: Model Name
Top Samplers: Sampler Name

Keyword: Second Searched Keyword
Count: 0
Top Models: Model Name
Top Samplers: Sampler Name

...

prompt_word_counts.txt

Analysis for prompt words:
prompt1: 154
prompt2: 146
prompt3: 134

...

negative_prompt_word_counts.txt

Analysis for negative prompt words:
negative prompt1: 154
negative prompt2: 146
negative prompt3: 134
negative prompt4: 130

...

parameter_counts.txt (always computed over the full database)

Analysis for Sampler:
Sampler
Sampler Name  65
Sampler Name  55
Sampler Name  29
...


Analysis for CFG scale:
CFG scale
6     35
4     29
8     29
...

Analysis for Size:
Size
576x768    149
768x768      5
768x576      3
...

Analysis for Model:
Model
Model Name  30
Model Name  28
Model Name  22
...


Analysis for VAE:
VAE
VAE Name  124
VAE Name   30
VAE Name   14
...

Analysis for Denoising strength:
Denoising strength
0.35  154
0.47  129
0.72   97
...
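
Counts like these can be produced by parsing each image's generation-settings line and tallying the values. The sketch assumes the settings are stored as comma-separated `Key: value` pairs, as Stable Diffusion web UIs commonly write them; the README does not specify the format, and the function names and sample lines are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable

PARAMETERS = ("Sampler", "CFG scale", "Size", "Model", "VAE", "Denoising strength")


def parse_settings(line: str) -> Dict[str, str]:
    """Extract the tracked parameters from a 'Key: value, Key: value' line."""
    found = {}
    for name in PARAMETERS:
        # Anchor on line start or a preceding comma so 'Model' does not match 'Model hash'
        match = re.search(rf"(?:^|,\s*){re.escape(name)}:\s*([^,]+)", line)
        if match:
            found[name] = match.group(1).strip()
    return found


def count_parameters(lines: Iterable[str]) -> Dict[str, Counter]:
    """Tally how often each value occurs per parameter."""
    counts = {name: Counter() for name in PARAMETERS}
    for line in lines:
        for name, value in parse_settings(line).items():
            counts[name][value] += 1
    return counts


sample_lines = [
    "Steps: 20, Sampler: Heun, CFG scale: 6, Size: 576x768, Model hash: abc123, Model: ModelA, Denoising strength: 0.35",
    "Steps: 20, Sampler: Heun, CFG scale: 4, Size: 576x768, Model: ModelB",
    "Steps: 30, Sampler: Euler a, CFG scale: 6, Size: 768x768, Model: ModelA",
]
counts = count_parameters(sample_lines)
```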

tfidf_analysis.txt

TF-IDF Analysis of Prompts:
prompt1: 47.638000568343095
prompt2: 31.389964942854668
prompt3: 24.169336638845234
    
...
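
To make the scores easier to interpret, here is one common TF-IDF formulation written in plain Python: term frequency per prompt, a smoothed inverse document frequency, and the per-prompt scores summed per term. Whether parameter_statistic.py uses this exact variant, or tokenizes on commas, is an assumption; `tfidf_totals` and the sample prompts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_totals(prompts: List[str]) -> Dict[str, float]:
    """Sum per-prompt TF-IDF scores for every comma-separated term."""
    documents = [[t.strip().lower() for t in p.split(",") if t.strip()] for p in prompts]
    n_docs = len(documents)
    # Document frequency: in how many prompts each term appears
    doc_freq = Counter(term for doc in documents for term in set(doc))
    totals: Dict[str, float] = {}
    for doc in documents:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1  # smoothed IDF
            totals[term] = totals.get(term, 0.0) + tf * idf
    return totals


prompts = [
    "masterpiece, solo, bunker",
    "masterpiece, solo, attic",
    "masterpiece, forest",
]
scores = tfidf_totals(prompts)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because the scores are summed over all prompts, a term that appears everywhere can still rank high even though its IDF is low.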

Plot Example

Advanced Analytics Examples

correlation_analysis.txt

Top Parameter Combinations (Sampler + CFG + Denoising):
1. DPM++ 2M Karras | CFG:5.5 | Denoise:0.37    1721 (33.7%)
2. Heun | CFG:5.5 | Denoise:0.37               1703 (33.3%)
3. Euler a | CFG:5.5 | Denoise:0.37            1669 (32.7%)

Top Model + Sampler Combinations:
1. ModelA + DPM++ 2M Karras    878 (17.2%)
2. ModelB + DPM++ 2M Karras    860 (16.8%)
...
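
A co-occurrence count of this kind can be sketched with a Counter over value tuples. This is an illustration under assumptions, not the code in advanced_analytics.py: `top_combinations`, the record layout, and the sample records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_combinations(records: List[Dict[str, str]], keys: Tuple[str, ...], limit: int = 3):
    """Return (combination, count, percent) for the most frequent value tuples."""
    combos = Counter(
        tuple(r[k] for k in keys) for r in records if all(k in r for k in keys)
    )
    total = sum(combos.values())
    return [(combo, n, round(100 * n / total, 1)) for combo, n in combos.most_common(limit)]


records = [
    {"Model": "ModelA", "Sampler": "Heun"},
    {"Model": "ModelA", "Sampler": "Heun"},
    {"Model": "ModelB", "Sampler": "Euler a"},
    {"Model": "ModelB"},  # skipped: no sampler recorded
]
result = top_combinations(records, ("Model", "Sampler"))
```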

prompt_effectiveness.txt

ESSENTIAL TAGS (>90% frequency):
  masterpiece           5110 (100.0%)
  solo                  5110 (100.0%)
  best quality          5110 (100.0%)

VARIABLE TAGS (<50% frequency):
  bunker                259 (5.1%)
  attic                 170 (3.3%)

PROMPT TEMPLATE:
  [essential tags], [VARIABLE LOCATION]
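
The split into essential and variable tags can be sketched with the thresholds shown above (more than 90% and less than 50% of prompts). The comma-based tokenization, `classify_tags`, and the sample prompts are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def classify_tags(prompts: List[str], essential: float = 0.9, variable: float = 0.5) -> Dict[str, List[str]]:
    """Group tags by the share of prompts they appear in."""
    tag_sets = [{t.strip().lower() for t in p.split(",") if t.strip()} for p in prompts]
    doc_freq = Counter(tag for tags in tag_sets for tag in tags)
    n = len(tag_sets)
    result: Dict[str, List[str]] = {"essential": [], "variable": []}
    for tag, count in sorted(doc_freq.items()):
        share = count / n
        if share > essential:
            result["essential"].append(tag)
        elif share < variable:
            result["variable"].append(tag)
    return result


prompts = [
    "masterpiece, solo, best quality, bunker",
    "masterpiece, solo, best quality, attic",
    "masterpiece, solo, best quality, bunker",
    "masterpiece, solo, best quality, forest",
]
tags = classify_tags(prompts)
```

Tags between the two thresholds (here `bunker`, at exactly 50%) fall into neither group.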

trend_analysis.txt

Parameter Evolution:

Q1 (Earliest): Euler a (65%) + ModelA (100%)
Q2:            Heun (66%) + ModelA (100%)
Q3:            Euler a (66%) + ModelB (99%)
Q4 (Latest):   Heun (66%) + ModelB (100%)

Insight: Systematic A/B testing detected
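
One way to obtain such quarters is to sort the images by timestamp, cut the sorted list into four equal parts, and report the most frequent value in each part. That the script defines Q1 to Q4 this way is an assumption; `quarter_leaders` and the sample rows are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def quarter_leaders(rows: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """rows are (ISO timestamp, value); returns (top value, percent) per quarter."""
    ordered = [value for _, value in sorted(rows)]
    n = len(ordered)
    leaders = []
    for q in range(4):
        chunk = ordered[q * n // 4:(q + 1) * n // 4]
        if not chunk:
            continue  # fewer than four rows: some quarters stay empty
        value, count = Counter(chunk).most_common(1)[0]
        leaders.append((value, round(100 * count / len(chunk))))
    return leaders


rows = [
    ("2025-01-05", "Euler a"), ("2025-01-12", "Euler a"), ("2025-01-20", "Heun"),
    ("2025-02-03", "Heun"), ("2025-02-11", "Heun"), ("2025-02-19", "Euler a"),
    ("2025-03-02", "Euler a"), ("2025-03-09", "Euler a"), ("2025-03-23", "Heun"),
    ("2025-04-01", "Heun"), ("2025-04-08", "Heun"), ("2025-04-15", "Heun"),
]
leaders = quarter_leaders(rows)
```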

comparison_ModelA_vs_ModelB.txt

Dataset 1 (ModelA): 2563 images
Dataset 2 (ModelB): 2547 images

Sampler Distribution:
                    Dataset 1      Dataset 2
DPM++ 2M Karras     878 (34.3%)    860 (33.8%)
Heun                856 (33.4%)    847 (33.3%)
Euler a             829 (32.3%)    840 (33.0%)
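
A side-by-side distribution like this can be sketched by filtering the records twice and formatting counts with their share of each subset. The helpers `distribution` and `compare`, the record layout, and the sample records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def distribution(records: List[Dict[str, str]], key: str) -> Dict[str, str]:
    """Format each value's count and its percentage within the subset."""
    counts = Counter(r[key] for r in records if key in r)
    total = sum(counts.values())
    return {value: f"{n} ({100 * n / total:.1f}%)" for value, n in counts.items()}


def compare(records: List[Dict[str, str]], field: str, value_a: str, value_b: str,
            key: str) -> Dict[str, Tuple[str, str]]:
    """Compare the distribution of `key` between two subsets selected by `field`."""
    subset_a = [r for r in records if r.get(field) == value_a]
    subset_b = [r for r in records if r.get(field) == value_b]
    dist_a, dist_b = distribution(subset_a, key), distribution(subset_b, key)
    empty = "0 (0.0%)"
    return {v: (dist_a.get(v, empty), dist_b.get(v, empty))
            for v in sorted(set(dist_a) | set(dist_b))}


records = [
    {"Model": "ModelA", "Sampler": "Heun"},
    {"Model": "ModelA", "Sampler": "Heun"},
    {"Model": "ModelA", "Sampler": "Euler a"},
    {"Model": "ModelB", "Sampler": "Heun"},
    {"Model": "ModelB", "Sampler": "Euler a"},
]
table = compare(records, "Model", "ModelA", "ModelB", "Sampler")
```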

Update History

Version 1.1 - February 2026

Advanced Analytics Module

Added advanced_analytics.py for in-depth parameter analysis
Correlation analysis: Discover parameter co-occurrence patterns
Prompt effectiveness: Analyze tag usage and extract templates
Parameter recommendations: Get suggestions based on historical data
Dataset comparison: Compare filtered subsets statistically
Trend detection: Track parameter evolution over time

Security Enhancements

Path validation (blocklist + whitelist)
Configurable security settings in config.ini
Custom blocked paths support
Network path detection and warnings

Error Handling Improvements

Robust file validation (magic bytes, size limits, permissions)
Categorized error reporting (8 error types)
Continue-on-error mode with detailed statistics
ExifTool timeout protection
Processing summary with success/failure breakdown

Configuration

[Security] section for path protection settings
[ErrorHandling] section for robust processing

Acknowledgements

Thanks to Phil Harvey for his awesome exif data tool https://exiftool.org
