Confuzu/ExifData_Analytics

ExifData_Analytics

ExifData Analytics is a toolbox for analyzing EXIF metadata from AI-generated images. It gives insight into image generation parameters and trends, and helps you understand the usage patterns of the models and settings used in AI image generation.

Key Features

Analytics

  • Metadata Extraction and Normalization
  • Database Management for metadata entries with integrity checks
  • Parameter Analysis (Sampler, CFG scale, Size, Model, VAE, Denoising strength)
  • Keyword Analysis using keywords entered by the user, read from a txt file, or taken from the default keyword list
  • TF-IDF Analysis for important terms in metadata
  • Visualization of parameter frequencies and TF-IDF scores

Advanced Analytics

  • Correlation analysis (parameter co-occurrence patterns)
  • Prompt effectiveness analysis (essential vs variable tags)
  • Parameter recommendations (3-strategy recommendation engine)
  • Dataset comparison (side-by-side statistical analysis)
  • Trend detection (temporal parameter evolution)

Performance

  • Advanced analytics: ~3 seconds for 5,000 images
  • Metadata extraction: ~260 images/second. Tested on a 12-core/24-thread CPU with an SSD and 32 GB RAM: metadata for 25k images was extracted and written to the database in 01:38 min.
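
As a quick sanity check of the extraction figure, the reported run can be converted into a throughput. This is only back-of-the-envelope arithmetic, assuming "01:38 min" means 1 minute 38 seconds and "25k" means exactly 25,000 images.

```python
# Back-of-the-envelope check of the reported extraction throughput.
# Assumption: "01:38 min" is 1 minute 38 seconds, "25k" is 25,000 images.
images = 25_000
elapsed_seconds = 1 * 60 + 38

throughput = images / elapsed_seconds
print(f"{throughput:.0f} images/second")  # prints "255 images/second"
```

The result is in the same range as the ~260 images/second quoted above.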

Project Structure

The project consists of four main scripts:

  1. DB_filler.py:

    • Responsible for executing ExifTool and managing user input for data extraction.
    • Transfers data to the database operations script.
  2. parameter_statistic.py:

    • Performs statistical evaluation of the data from the database.
    • Analyzes various parameters and generates visualizations and text reports.
  3. parameter_statistic_DB.py:

    • Handles database operations, including inserting and updating metadata entries.
    • Provides functions for data retrieval and database management.
  4. advanced_analytics.py:

    • Advanced correlation and trend analysis
    • Parameter recommendations based on historical usage
    • Dataset comparison tools
    • Prompt effectiveness analysis

How to Use

  1. Install Python 3.
  2. Install ExifTool by Phil Harvey (https://exiftool.org/install.html) and make sure it is on your PATH and accessible from the command line.
  3. Install the Python dependencies:
    pip install -r requirements.txt
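
Before running the scripts, it can help to confirm that ExifTool really is reachable from the command line. The following is a minimal sketch, not part of the project: the function names find_exiftool and exiftool_version are hypothetical, and it relies only on the standard library plus ExifTool's -ver option.

```python
import shutil
import subprocess


def find_exiftool(executable="exiftool"):
    """Return the full path to ExifTool if it is on PATH, else None."""
    return shutil.which(executable)


def exiftool_version(path):
    """Ask ExifTool for its version string using the -ver option."""
    result = subprocess.run(
        [path, "-ver"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    path = find_exiftool()
    if path is None:
        print("ExifTool not found on PATH; see https://exiftool.org/install.html")
    else:
        print(f"ExifTool {exiftool_version(path)} found at {path}")
```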

Configuration

You can change the settings in config.ini or use the default values:

Basic Settings

  • Change the logging directory.

  • Choose another database name.

  • Increase or decrease the batch size, i.e. the number of images processed in each batch.
    Adjust based on memory availability and processing speed requirements.

  • Increase or decrease the number of worker threads.
    Adjust based on available CPU cores and desired parallelism.

  • Example:

    batch_size = 100
    max_workers = 24
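
For illustration, settings like these can be read with Python's standard configparser module. This is a sketch under assumptions: the section name "Processing", the helper name load_processing_settings, and the fallback values are hypothetical and may differ from the project's actual config.ini layout.

```python
import configparser

# Assumption: the section name "Processing" is hypothetical.
SAMPLE_CONFIG = """
[Processing]
batch_size = 100
max_workers = 24
"""


def load_processing_settings(config_text, section="Processing"):
    """Parse batch_size and max_workers, falling back to assumed defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {
        "batch_size": parser.getint(section, "batch_size", fallback=100),
        "max_workers": parser.getint(section, "max_workers", fallback=4),
    }


if __name__ == "__main__":
    print(load_processing_settings(SAMPLE_CONFIG))
```

Using fallback values means a missing section or key does not raise an error, which matches the idea of "use the default values".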
    

Security Settings

  • enable_blocklist - Enable/disable security blocklist (default: true, NOT recommended to disable)
  • custom_blocked_paths - Add custom sensitive paths to block (comma-separated)
  • allow_network_paths_without_confirmation - Skip network path warnings for automation (default: false)
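
To make the blocklist idea concrete, here is a minimal sketch of how a comma-separated custom_blocked_paths value could be applied. It is not the project's implementation: parse_blocked_paths and is_blocked are hypothetical names, and the sketch compares POSIX-style paths textually, whereas real code would likely resolve symlinks and handle Windows paths as well.

```python
from pathlib import PurePosixPath


def parse_blocked_paths(raw):
    """Split a comma-separated custom_blocked_paths value into clean entries."""
    return [PurePosixPath(p.strip()) for p in raw.split(",") if p.strip()]


def is_blocked(candidate, blocked_paths):
    """True if candidate equals, or lies inside, any blocked directory."""
    candidate = PurePosixPath(candidate)
    return any(candidate == b or b in candidate.parents for b in blocked_paths)


if __name__ == "__main__":
    blocked = parse_blocked_paths("/etc, /home/user/.ssh")
    for path in ("/etc/passwd", "/home/user/images"):
        print(path, "blocked" if is_blocked(path, blocked) else "allowed")
```

Comparing path components rather than string prefixes avoids false matches such as /etcetera being treated as part of /etc.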

Error Handling Settings

  • continue_on_error - Continue processing if some files fail (default: true)
  • exiftool_timeout - Maximum time to wait per file in seconds (default: 30)
  • max_file_size_mb - Skip files larger than this to avoid hanging (default: 100, 0=unlimited)
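
The three settings above could interact roughly as follows. This is a hedged sketch rather than the project's code: exceeds_size_limit and extract_metadata are hypothetical helpers, and the only ExifTool feature used is its -json output option.

```python
import json
import os
import subprocess


def exceeds_size_limit(path, max_file_size_mb=100):
    """True if the file is larger than the limit; 0 means unlimited."""
    if max_file_size_mb == 0:
        return False
    return os.path.getsize(path) > max_file_size_mb * 1024 * 1024


def extract_metadata(path, exiftool_timeout=30, max_file_size_mb=100):
    """Return ExifTool's JSON metadata for one file, or None if skipped or failed."""
    if exceeds_size_limit(path, max_file_size_mb):
        return None
    try:
        result = subprocess.run(
            ["exiftool", "-json", path],
            capture_output=True,
            text=True,
            check=True,
            timeout=exiftool_timeout,  # seconds to wait for this file
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, FileNotFoundError):
        # continue_on_error-style behaviour: give up on this file and move on
        return None
    return json.loads(result.stdout)[0]  # -json prints a list with one dict per file
```

Returning None instead of raising lets a caller count failures and keep going, in the spirit of continue_on_error = true.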

Image Metadata Extraction Script

  • In the command prompt or terminal, run:
    python DB_filler.py
    
  • Follow the prompts to enter the directory containing your images and specify if subfolders should be included.
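
The batch_size and max_workers settings from the Configuration section suggest a batched, multi-threaded extraction loop. The sketch below shows one way such a loop could look; chunked and process_in_batches are hypothetical names, and the worker passed in stands for whatever per-image extraction function is used.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_in_batches(paths, worker, batch_size=100, max_workers=24):
    """Apply worker to every path, one batch at a time, using a thread pool."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in chunked(paths, batch_size):
            # pool.map preserves input order within each batch
            results.extend(pool.map(worker, batch))
    return results


if __name__ == "__main__":
    demo_paths = [f"image_{i}.png" for i in range(5)]
    print(process_in_batches(demo_paths, str.upper, batch_size=2, max_workers=2))
```

Processing batch by batch bounds how many results are held in flight at once, which is why batch size trades memory against speed.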

Metadata Analysis Script

  • After the metadata extraction is complete, run:
    python parameter_statistic.py
    
  • This will generate statistics and plots based on your image metadata.
  • All output is saved in the project directory: statistics as text reports, plots in the output_plots folder, and logs in LOG_files.

Database Management Script

  • Manages the SQLite database and provides functions for database maintenance, optimization, and retrieval of statistics.

    python parameter_statistic_DB.py
    

    Key Functions:

  • database_stats: Retrieves general statistics about the database contents.

  • optimize_database: Optimizes the database for better performance.

  • metadata_sample: Retrieves a sample of image metadata.

  • check_database_integrity: Verifies the integrity of the database.

  • backup_database: Creates a backup of the database in the specified directory.

  • database_size: Returns the current size of the database file.

  • last_modified: Retrieves the last modification date of the database.

  • clear_database: Removes all records from the database.
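
For readers curious what such maintenance operations typically involve, here is a minimal sketch using Python's built-in sqlite3 module. The function names below are hypothetical and deliberately differ from the project's; the sketch only shows standard SQLite mechanisms (PRAGMA integrity_check, VACUUM, and the online backup API) and is not the project's implementation.

```python
import os
import sqlite3
from contextlib import closing


def check_integrity(db_path):
    """Run SQLite's built-in integrity check; 'ok' means no problems found."""
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute("PRAGMA integrity_check").fetchone()[0]


def optimize(db_path):
    """Rebuild the database file to reclaim free pages."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("VACUUM")


def backup(db_path, backup_dir):
    """Copy the database into backup_dir (must differ from the source folder)."""
    os.makedirs(backup_dir, exist_ok=True)
    target = os.path.join(backup_dir, os.path.basename(db_path))
    with closing(sqlite3.connect(db_path)) as src, closing(sqlite3.connect(target)) as dst:
        src.backup(dst)  # SQLite online backup API
    return target


def size_in_bytes(db_path):
    """Return the size of the database file on disk."""
    return os.path.getsize(db_path)
```

Using the backup API instead of a plain file copy gives a consistent snapshot even if another connection is writing at the same time.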

Advanced Analytics Script

  • After metadata extraction, run advanced analytics for deeper insights:

    python advanced_analytics.py --all
  • Correlation Analysis - Discover parameter co-occurrence patterns

    python advanced_analytics.py --correlations

    Outputs: Top parameter combinations, model+sampler frequencies

  • Prompt Effectiveness - Analyze prompt patterns and tag usage

    python advanced_analytics.py --prompt-analysis

    Outputs: Essential tags (>90% frequency), variable tags, prompt template

  • Parameter Recommendations - Get suggestions based on historical usage

    python advanced_analytics.py --recommend "sampler=Heun,model=YourModel"

    Outputs: Most-used combinations, similar settings, experimental suggestions

  • Dataset Comparison - Compare two filtered subsets statistically

    python advanced_analytics.py --compare "model=ModelA" "model=ModelB"

    Outputs: Side-by-side parameter distributions, usage percentages

  • Trend Detection - Analyze parameter evolution over time

    python advanced_analytics.py --trends

    Outputs: Quarterly parameter evolution, drift detection
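
To illustrate what "parameter co-occurrence" means in the correlation analysis above, the following sketch counts how often parameter values appear together. It assumes each image's metadata is available as a plain dict; top_combinations is a hypothetical helper, not the script's actual function.

```python
from collections import Counter


def top_combinations(records, keys, n=3):
    """Return the n most common value combinations for keys as (combo, count, percent)."""
    combos = Counter(tuple(record.get(key) for key in keys) for record in records)
    total = sum(combos.values())
    return [
        (combo, count, 100.0 * count / total)
        for combo, count in combos.most_common(n)
    ]


if __name__ == "__main__":
    # Hypothetical records; real ones would come from the database.
    records = [
        {"Model": "ModelA", "Sampler": "Heun"},
        {"Model": "ModelA", "Sampler": "Heun"},
        {"Model": "ModelB", "Sampler": "Euler a"},
    ]
    for combo, count, percent in top_combinations(records, ("Model", "Sampler")):
        print(" + ".join(combo), count, f"({percent:.1f}%)")
```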

All reports are saved to the advanced_analytics/ directory.

Performance: ~3 seconds for 5,000 images

Data Examples

  • keyword_analysis.txt

    Keyword: Searched Keyword
    Count: 0
    Top Models: Model Name
    Top Samplers: Sampler Name
    
    Keyword: Second Searched Keyword
    Count: 0
    Top Models: Model Name
    Top Samplers: Sampler Name
    
    ...
    
    
  • prompt_word_counts.txt

    Analysis for prompt words:
    prompt1: 154
    prompt2: 146
    prompt3: 134
    
    ...    
    
    
  • negative_prompt_word_counts.txt

    Analysis for negative prompt words:
    negative prompt1: 154
    negative prompt2: 146
    negative prompt3: 134
    negative prompt4: 130
    
    ...
    
    
  • parameter_counts.txt (always computed over the full database)
    Analysis for Sampler:
    Sampler
    Sampler Name  65
    Sampler Name  55
    Sampler Name  29
    ...
    
    
    Analysis for CFG scale:
    CFG scale
    6     35
    4     29
    8     29
    ...
    
    Analysis for Size:
    Size
    576x768    149
    768x768      5
    768x576      3
    ...
    
    Analysis for Model:
    Model
    Model Name  30
    Model Name  28
    Model Name  22
    ...
    
    
    Analysis for VAE:
    VAE
    VAE Name  124
    VAE Name   30
    VAE Name   14
    ...
    
    Analysis for Denoising strength:
    Denoising strength
    0.35  154
    0.47  129
    0.72   97
    ...
    
  • tfidf_analysis.txt
    TF-IDF Analysis of Prompts:
    prompt1: 47.638000568343095
    prompt2: 31.389964942854668
    prompt3: 24.169336638845234
        
    ...
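
Scores like those in tfidf_analysis.txt can be understood as TF-IDF weights summed over all prompts. The sketch below is one possible formulation in pure Python and is not necessarily the one used by parameter_statistic.py: it assumes comma-separated prompt tags and uses idf = ln(N / df) + 1, so that terms present in every prompt still get a non-zero score. The name summed_tfidf is hypothetical.

```python
import math
from collections import Counter


def summed_tfidf(prompts):
    """Return {term: TF-IDF summed over all prompts}, tokenising on commas."""
    docs = [[t.strip().lower() for t in p.split(",") if t.strip()] for p in prompts]
    n_docs = len(docs)
    # Document frequency: in how many prompts does each term occur?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            scores[term] += tf * idf
    return dict(scores)


if __name__ == "__main__":
    demo = ["masterpiece, solo, bunker", "masterpiece, solo, attic"]
    for term, score in sorted(summed_tfidf(demo).items(), key=lambda kv: -kv[1]):
        print(f"{term}: {score}")
```

Rare terms receive a higher weight per occurrence, while frequent terms accumulate score across many prompts, so the summed value reflects both effects.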
    

Plot Example

[Image: CFG scale_counts plot]

Advanced Analytics Examples

  • correlation_analysis.txt

    Top Parameter Combinations (Sampler + CFG + Denoising):
    1. DPM++ 2M Karras | CFG:5.5 | Denoise:0.37    1721 (33.7%)
    2. Heun | CFG:5.5 | Denoise:0.37               1703 (33.3%)
    3. Euler a | CFG:5.5 | Denoise:0.37            1669 (32.7%)
    
    Top Model + Sampler Combinations:
    1. ModelA + DPM++ 2M Karras    878 (17.2%)
    2. ModelB + DPM++ 2M Karras    860 (16.8%)
    ...
    
  • prompt_effectiveness.txt

    ESSENTIAL TAGS (>90% frequency):
      masterpiece           5110 (100.0%)
      solo                  5110 (100.0%)
      best quality          5110 (100.0%)
    
    VARIABLE TAGS (<50% frequency):
      bunker                259 (5.1%)
      attic                 170 (3.3%)
    
    PROMPT TEMPLATE:
      [essential tags], [VARIABLE LOCATION]
    
  • trend_analysis.txt

    Parameter Evolution:
    
    Q1 (Earliest): Euler a (65%) + ModelA (100%)
    Q2:            Heun (66%) + ModelA (100%)
    Q3:            Euler a (66%) + ModelB (99%)
    Q4 (Latest):   Heun (66%) + ModelB (100%)
    
    Insight: Systematic A/B testing detected
    
  • comparison_ModelA_vs_ModelB.txt

    Dataset 1 (ModelA): 2563 images
    Dataset 2 (ModelB): 2547 images
    
    Sampler Distribution:
                        Dataset 1      Dataset 2
    DPM++ 2M Karras     878 (34.3%)    860 (33.8%)
    Heun                856 (33.4%)    847 (33.3%)
    Euler a             829 (32.3%)    840 (33.0%)
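
The essential/variable split shown in the prompt_effectiveness.txt example can be reproduced with a simple frequency threshold. This is a minimal sketch assuming comma-separated tags and the thresholds from the report headings (>90% and <50%); classify_tags is a hypothetical name, not the script's function.

```python
from collections import Counter


def classify_tags(prompts, essential_above=0.90, variable_below=0.50):
    """Split tags into essential and variable lists by the share of prompts using them."""
    tag_sets = [
        {t.strip().lower() for t in p.split(",") if t.strip()} for p in prompts
    ]
    total = len(tag_sets)
    # Count each tag once per prompt, so the frequency is a share of prompts.
    freq = Counter(tag for tags in tag_sets for tag in tags)
    essential = sorted(t for t, c in freq.items() if c / total > essential_above)
    variable = sorted(t for t, c in freq.items() if c / total < variable_below)
    return essential, variable


if __name__ == "__main__":
    demo = [
        "masterpiece, solo, bunker",
        "masterpiece, solo, attic",
        "masterpiece, solo, attic",
    ]
    print(classify_tags(demo))
```

Tags that fall between the two thresholds (here "attic", used in two of three prompts) end up in neither list.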
    

Update History

Version 1.1 - February 2026

Advanced Analytics Module

  • Added advanced_analytics.py for in-depth parameter analysis
  • Correlation analysis: Discover parameter co-occurrence patterns
  • Prompt effectiveness: Analyze tag usage and extract templates
  • Parameter recommendations: Get suggestions based on historical data
  • Dataset comparison: Compare filtered subsets statistically
  • Trend detection: Track parameter evolution over time

Security Enhancements

  • Path validation (blocklist + whitelist)
  • Configurable security settings in config.ini
  • Custom blocked paths support
  • Network path detection and warnings

Error Handling Improvements

  • Robust file validation (magic bytes, size limits, permissions)
  • Categorized error reporting (8 error types)
  • Continue-on-error mode with detailed statistics
  • ExifTool timeout protection
  • Processing summary with success/failure breakdown

Configuration

  • [Security] section for path protection settings
  • [ErrorHandling] section for robust processing

Acknowledgements

Thanks to Phil Harvey for his awesome exif data tool https://exiftool.org
