
NickS785/SierraPy


SierraPy

A Python library for Sierra Chart data parsing and analysis, providing efficient tools for reading and processing financial market data.

Features

  • Fast SCID Reading: Memory-mapped I/O for high-performance reading of Sierra Chart intraday (.scid) files
  • DLY File Support: Parse Sierra Chart daily (.dly) files with automatic format detection
  • Ticker Management: Smart file management with front month identification for futures contracts
  • Data Export: Export to CSV and Parquet formats with optimized performance
  • Real-time Data: DTC client for connecting to Sierra Chart servers (optional)

Installation

From Source

git clone https://github.com/NickS785/sierrapy.git
cd sierrapy
pip install -e .

Dependencies

  • Python >= 3.8
  • numpy >= 1.20.0
  • pandas >= 1.3.0
  • pyarrow >= 5.0.0 (optional, for Parquet export)

Quick Start

Reading SCID Files

import sierrapy

# Open a SCID file with automatic format detection
reader = sierrapy.FastScidReader("/path/to/ESU25-CME.scid")

with reader.open() as r:
    # Get basic info
    print(f"Records: {len(r)}")
    print(f"Columns: {r.columns()}")

    # Read as pandas DataFrame
    df = r.to_pandas()
    print(df.head())

    # Read specific time range (timestamps in epoch milliseconds)
    start_ms = 1726000000000  # 2024-09-10 20:26:40 UTC
    end_ms = 1726086400000    # 2024-09-11 20:26:40 UTC
    df_filtered = r.to_pandas(start_ms=start_ms, end_ms=end_ms)
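`start_ms` and `end_ms` are epoch milliseconds. As a minimal sketch using only the standard library, ISO-8601 UTC strings like the ones in the CLI example can be converted to that form. `iso_to_epoch_ms` is a hypothetical helper, not part of sierrapy:

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch_ms(text: str) -> int:
    """Convert an ISO-8601 string such as '2024-09-10T00:00:00Z' to epoch milliseconds.

    Hypothetical helper, not part of sierrapy. Naive inputs are assumed to be UTC.
    """
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 onwards
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids float rounding errors
    return (dt - UNIX_EPOCH) // timedelta(milliseconds=1)


start_ms = iso_to_epoch_ms("2024-09-10T00:00:00Z")
end_ms = iso_to_epoch_ms("2024-09-11T00:00:00Z")
```

With these inputs, `start_ms` is `1725926400000` (midnight UTC on 2024-09-10).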

Managing Multiple SCID Files by Ticker

# Manage all SCID files in a directory
manager = sierrapy.ScidTickerFileManager("/path/to/scid/files")

# Get available tickers
tickers = manager.get_tickers()
print(f"Available tickers: {tickers}")

# Get front month data for ES
front_data = manager.get_front_month_data("ES")
print(front_data.head())

# Get file statistics
stats = manager.get_file_statistics("ES")
print(f"ES files: {stats['file_count']}, Total size: {stats['total_size_mb']:.1f} MB")
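A rough sketch of how SCID files could be grouped by root ticker and summarized, assuming file names follow the ROOT + month code + two-digit year + `-EXCHANGE` pattern used in this README (e.g. `ESU25-CME.scid`). `group_by_ticker` and `file_statistics` are hypothetical helpers that only mimic the shape of the manager's output; they are not its implementation:

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Assumed naming: ROOT + month code + 2-digit year + "-" + EXCHANGE, e.g. "ESU25-CME"
SYMBOL_RE = re.compile(
    r"^(?P<root>[A-Z0-9]+?)(?P<month>[FGHJKMNQUVXZ])(?P<year>\d{2})-(?P<exchange>[A-Z]+)$"
)


def group_by_ticker(folder) -> Dict[str, List[Path]]:
    """Map each root ticker to its .scid files (hypothetical helper)."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(Path(folder).glob("*.scid")):
        match = SYMBOL_RE.match(path.stem)
        if match:  # ignore files that do not follow the naming pattern
            groups[match.group("root")].append(path)
    return dict(groups)


def file_statistics(paths: List[Path]) -> dict:
    """Count files and sum their sizes in MiB."""
    total = sum(p.stat().st_size for p in paths)
    return {"file_count": len(paths), "total_size_mb": total / (1024 * 1024)}
```

The non-greedy root group lets two-letter roots such as `NQ` or `NG` parse correctly even though their second letter is also a valid month code.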

Reading DLY Files

import pandas as pd

# Manage daily files
dly_manager = sierrapy.TickerFileManager("/path/to/dly/files")

# Get front month daily data
daily_data = dly_manager.get_front_month_data("CL")
print(daily_data.head())

# Get continuous series across all contracts
continuous = dly_manager.get_continuous_series("CL",
                                             start_date=pd.Timestamp("2024-01-01"),
                                             end_date=pd.Timestamp("2024-12-31"))

Data Export

# Export SCID data to CSV
with sierrapy.FastScidReader("/path/to/data.scid").open() as reader:
    reader.export_csv("/path/to/output.csv")

    # Export to Parquet with compression
    stats = reader.export_to_parquet_optimized(
        "/path/to/output.parquet",
        compression="zstd",
        chunk_records=2_000_000
    )
    print(f"Compression ratio: {stats['compression_ratio']:.2f}")

Front Month Logic

The "front month" is chosen as follows:

  1. Discard contracts that have already expired relative to the reference date
  2. Among the remaining contracts, pick the one closest to expiry
  3. If several contracts share that expiry, prefer the one with the largest file size (most data)

This logic works well for:

  • Energy contracts (CL, NG, etc.): Expire in the month before the delivery month
  • Financial contracts (ES, NQ, etc.): Expire in delivery month
  • Other contracts: Default to delivery month expiry
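The selection rules above can be sketched in plain Python. This is a simplified illustration, not sierrapy's code: it assumes symbols like `CLZ24-NYM`, treats the roots in the hypothetical `ENERGY_ROOTS` set as expiring the month before delivery, and approximates every expiry as the last calendar day of the expiry month, whereas real exchange calendars usually expire earlier in the month:

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

MONTH_CODES = "FGHJKMNQUVXZ"  # futures month codes, January..December
ENERGY_ROOTS = {"CL", "NG", "HO", "RB"}  # hypothetical: roots that expire the month before delivery


@dataclass
class Contract:
    root: str
    delivery_year: int
    delivery_month: int
    size_bytes: int


def parse_symbol(symbol: str, size_bytes: int) -> Contract:
    """Split e.g. 'CLZ24-NYM' into root 'CL', December, 2024 (assumes 20xx years)."""
    code = symbol.split("-")[0]
    root, month_code, yy = code[:-3], code[-3], code[-2:]
    return Contract(root, 2000 + int(yy), MONTH_CODES.index(month_code) + 1, size_bytes)


def approx_expiry(c: Contract) -> date:
    year, month = c.delivery_year, c.delivery_month
    if c.root in ENERGY_ROOTS:
        # Energy contracts expire in the month before delivery
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    # Simplification: use the last calendar day of the expiry month
    return date(year, month, calendar.monthrange(year, month)[1])


def front_month(contracts: List[Contract], ref: date) -> Optional[Contract]:
    live = [c for c in contracts if approx_expiry(c) >= ref]
    if not live:
        return None
    # Nearest expiry first; on a tie the larger file (more data) wins
    return min(live, key=lambda c: (approx_expiry(c), -c.size_bytes))
```

Sorting on the tuple `(expiry, -size)` folds the tie-break rule into a single `min()` call.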

Asynchronous Front-Month Reader

For workflows that require stitching contracts over long horizons, AsyncScidReader orchestrates roll schedules (rolling one month before expiry) and loads multiple .scid files concurrently:

import asyncio
import sierrapy


async def load_continuous_series():
    reader = sierrapy.AsyncScidReader("/path/to/scid/folder")

    # Build a roll schedule (one month before expiry) and load front-month data
    df = await reader.load_front_month_series(
        "CL",
        start="2024-01-01",
        end="2024-12-31",
        volume_per_bar=50_000,  # bucket trades into ~50k volume bars while loading
        resample_rule="15T",  # optionally resample the stitched series to 15-minute bars
    )

    # Load multiple raw files concurrently
    raw = await reader.load_scid_files([
        "/path/to/CLU24-NYM.scid",
        "/path/to/CLZ24-NYM.scid",
    ], volume_per_bar=25_000)

    return df, raw


continuous_df, raw_files = asyncio.run(load_continuous_series())
print(continuous_df.head())
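Concurrent loading of blocking file reads is commonly done by running each read in a thread pool and awaiting them together. A minimal sketch with `asyncio`, where `load_file_blocking` is a hypothetical stand-in for a real reader and a semaphore caps how many files are read at once; this is not `AsyncScidReader`'s implementation:

```python
import asyncio
from pathlib import Path
from typing import Iterable, List


def load_file_blocking(path) -> int:
    """Hypothetical stand-in for a blocking reader; returns the number of bytes read."""
    return len(Path(path).read_bytes())


async def load_many(paths: Iterable, max_concurrency: int = 4) -> List[int]:
    loop = asyncio.get_running_loop()
    semaphore = asyncio.Semaphore(max_concurrency)  # cap simultaneous reads

    async def load_one(path):
        async with semaphore:
            # Run the blocking read in the default thread pool so the event loop stays free
            return await loop.run_in_executor(None, load_file_blocking, path)

    # gather() returns results in the same order as the inputs
    return list(await asyncio.gather(*(load_one(p) for p in paths)))
```

Usage: `sizes = asyncio.run(load_many(["/path/to/CLU24-NYM.scid", "/path/to/CLZ24-NYM.scid"]))`.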

Tail stitching and effective expiry

AsyncScidReader.load_front_month_continuous now performs tail stitching by default. When the scheduled contract runs out of bars before its roll window ends, the reader fills the gap with the next available contract (for example, using GCZ head bars after GCV runs dry). The helper also caps each contract's effective expiry at the timestamp of its final bar so metadata never indicates an expiry later than the observed data. The behaviour can be disabled by passing allow_tail=False.
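The roll, tail-stitching and expiry-capping rules can be approximated with pandas. In this sketch (hypothetical helpers, not the library's code) each contract is front month from the previous roll until one month before its expiry; when its bars end early, the next contract's bars fill the rest of that window, and `effective_expiry` caps the reported expiry at the last observed bar:

```python
from typing import Dict, List, Tuple

import pandas as pd

# (symbol, window start, roll timestamp); the contract is front month on [start, roll)
Window = Tuple[str, pd.Timestamp, pd.Timestamp]


def build_roll_schedule(expiries: Dict[str, pd.Timestamp], start: pd.Timestamp) -> List[Window]:
    """Roll each contract one month before its expiry (hypothetical helper)."""
    schedule = []
    for symbol, expiry in sorted(expiries.items(), key=lambda kv: kv[1]):
        roll = expiry - pd.DateOffset(months=1)
        if roll > start:  # skip contracts that roll before the requested start
            schedule.append((symbol, start, roll))
            start = roll
    return schedule


def stitch(schedule: List[Window], frames: Dict[str, pd.DataFrame],
           allow_tail: bool = True) -> pd.DataFrame:
    """Concatenate each contract's window; optionally fill early gaps from the next contract."""
    pieces = []
    for i, (symbol, start, roll) in enumerate(schedule):
        df = frames[symbol]
        part = df[(df.index >= start) & (df.index < roll)]
        pieces.append(part)
        if allow_tail and i + 1 < len(schedule):
            # Bars after the scheduled contract's last bar (or the whole window if it has none)
            last_bar = part.index[-1] if len(part) else start - pd.Timedelta(1, unit="ns")
            nxt = frames[schedule[i + 1][0]]
            pieces.append(nxt[(nxt.index > last_bar) & (nxt.index < roll)])
    return pd.concat(pieces).sort_index()


def effective_expiry(scheduled: pd.Timestamp, df: pd.DataFrame) -> pd.Timestamp:
    """Cap the reported expiry at the contract's final observed bar."""
    return min(scheduled, df.index[-1]) if len(df) else scheduled
```

Because the tail only covers bars strictly after the last scheduled bar and strictly before the roll, the stitched series never contains duplicate timestamps.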

File Format Support

SCID Files

  • V10_40B: Modern format with int64 microseconds since 1899-12-30 (40-byte records)
  • V8_44B: Legacy format with float64 days since 1899-12-30 (40-byte records)
  • Automatic format detection
  • Header support (typically 56 bytes)
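The V10 layout can be described as a NumPy structured dtype and memory-mapped directly. This sketch assumes the standard Sierra Chart intraday record (int64 microsecond timestamp, four float32 prices, four uint32 counters) and reads the header size from the `HeaderSize` field that follows the 4-byte `SCID` magic; `open_scid`, `to_unix_ms` and `legacy_days_to_unix_ms` are hypothetical names, not sierrapy functions:

```python
import numpy as np

# Assumed V10 record: int64 microseconds since 1899-12-30, 4 x float32 prices, 4 x uint32 counters
SCID_RECORD = np.dtype([
    ("dt_us", "<i8"),
    ("open", "<f4"), ("high", "<f4"), ("low", "<f4"), ("close", "<f4"),
    ("num_trades", "<u4"), ("volume", "<u4"), ("bid_volume", "<u4"), ("ask_volume", "<u4"),
])
assert SCID_RECORD.itemsize == 40

SCID_HEADER_BYTES = 56
# Days between Sierra Chart's epoch (1899-12-30) and the Unix epoch (1970-01-01)
EPOCH_OFFSET_DAYS = 25_569
EPOCH_OFFSET_US = EPOCH_OFFSET_DAYS * 86_400 * 1_000_000


def open_scid(path: str) -> np.memmap:
    """Memory-map the records; pages are only read from disk when accessed."""
    header = np.fromfile(path, dtype=np.uint8, count=SCID_HEADER_BYTES)
    if header[:4].tobytes() != b"SCID":
        raise ValueError("not a SCID file")
    # HeaderSize is the little-endian uint32 right after the 4-byte magic
    header_size = int(header[4:8].view("<u4")[0])
    return np.memmap(path, dtype=SCID_RECORD, mode="r", offset=header_size)


def to_unix_ms(dt_us: np.ndarray) -> np.ndarray:
    """V10 timestamps (microseconds since 1899-12-30) to Unix epoch milliseconds."""
    return (dt_us.astype(np.int64) - EPOCH_OFFSET_US) // 1000


def legacy_days_to_unix_ms(days: np.ndarray) -> np.ndarray:
    """Legacy float64 day counts since 1899-12-30 to Unix epoch milliseconds."""
    return np.rint((days - EPOCH_OFFSET_DAYS) * 86_400_000).astype(np.int64)
```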

DLY Files

  • CSV format with automatic delimiter detection
  • Flexible column naming (date, datetime, timestamp)
  • Support for OHLC + volume/open interest data
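Delimiter detection and flexible date-column naming can be sketched with the standard library's `csv.Sniffer` plus pandas. `read_daily_csv` is hypothetical and assumes the file has a header row; it illustrates the idea rather than sierrapy's parser:

```python
import csv
import io

import pandas as pd

# Column names accepted as the date/time column, in order of preference
DATE_COLUMNS = ("date", "datetime", "timestamp")


def read_daily_csv(text: str) -> pd.DataFrame:
    """Parse daily OHLC text with an unknown delimiter (hypothetical helper)."""
    # Sniff the delimiter from the first few lines only
    sample = "\n".join(text.splitlines()[:5])
    delimiter = csv.Sniffer().sniff(sample, delimiters=",;\t").delimiter
    df = pd.read_csv(io.StringIO(text), sep=delimiter, skipinitialspace=True)
    # Normalise headers so "Date", " date" and "DATE" all match
    df.columns = [str(c).strip().lower() for c in df.columns]
    date_col = next((c for c in DATE_COLUMNS if c in df.columns), None)
    if date_col is None:
        raise ValueError(f"no date column among {list(df.columns)}")
    df[date_col] = pd.to_datetime(df[date_col])
    return df.set_index(date_col).sort_index()
```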

CLI Tools

SCID File Inspector

# Get file info
sierrapy-scid /path/to/file.scid --info

# Export to CSV with time filter
sierrapy-scid /path/to/file.scid --export output.csv --start 2024-09-10T00:00:00Z --end 2024-09-11T00:00:00Z

Performance

  • Memory-mapped I/O: O(1) open time for multi-GB files
  • Zero-copy reads: Minimal memory allocation for large datasets
  • Vectorized operations: NumPy-based processing for speed
  • Chunked processing: Handle files larger than available RAM
  • Binary search: Fast time-based filtering
  • In-flight aggregation: Bucket to volume bars or resample to new timeframes while reading
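Two of these techniques in miniature: binary search over a sorted timestamp column with `np.searchsorted`, and bucketing trades into volume bars. Both functions are hypothetical sketches. The search treats the end bound as inclusive, and assigning each trade wholly to the bar in which it starts (so a bar can overshoot `volume_per_bar`) is an assumption, not necessarily how sierrapy splits trades:

```python
import numpy as np
import pandas as pd


def slice_by_time(ts_ms: np.ndarray, start_ms: int, end_ms: int):
    """Return the [lo, hi) index range for start_ms <= ts <= end_ms via binary search."""
    # ts_ms must be sorted ascending; each lookup is O(log n)
    lo = int(np.searchsorted(ts_ms, start_ms, side="left"))
    hi = int(np.searchsorted(ts_ms, end_ms, side="right"))
    return lo, hi


def volume_bars(ts_ms, price, volume, volume_per_bar: int) -> pd.DataFrame:
    """Group consecutive trades into bars of roughly volume_per_bar contracts."""
    volume = np.asarray(volume, dtype=np.int64)
    # Volume traded before each trade decides which bucket it falls into
    before = np.cumsum(volume) - volume
    bar_id = before // volume_per_bar
    trades = pd.DataFrame({"ts": ts_ms, "price": price, "volume": volume, "bar": bar_id})
    return trades.groupby("bar").agg(
        start_ts=("ts", "first"),
        open=("price", "first"),
        high=("price", "max"),
        low=("price", "min"),
        close=("price", "last"),
        volume=("volume", "sum"),
    )
```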

Examples

See the /examples directory for more detailed usage examples:

  • basic_scid_reading.py - Simple SCID file reading
  • ticker_management.py - Working with multiple contracts
  • data_export.py - Export workflows
  • performance_comparison.py - Performance benchmarks

Development

Setup Development Environment

git clone https://github.com/yourusername/sierrapy.git
cd sierrapy
pip install -e ".[dev]"

Running Tests

pytest tests/

Code Formatting

black src/ tests/
flake8 src/ tests/
mypy src/

License

MIT License. See LICENSE for details.

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Changelog

v0.1.0 (Initial Release)

  • FastScidReader with memory-mapped I/O
  • TickerFileManager for DLY files
  • ScidTickerFileManager for SCID files
  • Front month identification logic
  • CSV and Parquet export capabilities
  • Basic DTC client support

About

A Python framework used to parse Sierra Chart .scid and .dly data files
