
Add databricks-iceberg skill #188

Open
irfanelahi-ds wants to merge 4 commits into databricks-solutions:main from irfanelahi-ds:feature/databricks-iceberg-skill

@irfanelahi-ds

Why this matters

Apache Iceberg is one of the most active areas of customer demand, driven by the need to interoperate with the wider Iceberg ecosystem. It has many nuances and user journeys, for example: creating and sharing managed Iceberg tables; enabling UniForm on existing Delta tables to share them with Iceberg clients such as Snowflake or Trino without ETL; reading foreign Iceberg tables managed by another Iceberg REST Catalog (IRC), e.g. Snowflake, in Databricks; and debugging why PyIceberg or OSS Spark can't connect to Unity Catalog (UC). Without a dedicated skill, Claude has no grounded reference for Databricks-specific Iceberg behaviour and falls back on generic Iceberg docs that don't reflect how UC actually implements things (e.g. PARTITIONED BY mapping to Liquid Clustering, the IRC endpoint path, EXTERNAL USE SCHEMA requirements, and the vended-credential flow).

This PR closes that gap.

What's included

| File | Coverage |
| --- | --- |
| 1-managed-iceberg-tables.md | Native Iceberg DDL/DML, Liquid Clustering (PARTITIONED BY vs CLUSTER BY), Predictive Optimization, Iceberg v3, limitations |
| 2-uniform-and-compatibility.md | External Iceberg Reads (UniForm) for regular Delta tables; Compatibility Mode for Streaming Tables and Materialized Views in SDP |
| 3-iceberg-rest-catalog.md | IRC endpoint, auth (PAT/OAuth), credential vending, IP access list requirements |
| 4-snowflake-interop.md | Bidirectional Snowflake↔Databricks: catalog integration (vended credentials; AWS/Azure/GCS), foreign catalogs, networking gotchas |
| 5-external-engine-interop.md | PyIceberg and OSS Spark connection configs; troubleshooting guide |

Examples of specific problems this solves for customers

  • Snowflake ↔ Databricks interop: Step-by-step setup for both directions, tried and tested.
  • PARTITIONED BY vs CLUSTER BY confusion: Clarifies that both produce Liquid Clustering and identical Iceberg metadata for external engines, and exactly when each requires TBLPROPERTIES changes.
  • External engine setup: Correct PyIceberg and OSS Spark configs against the UC IRC, including the version constraints and cloud bundle requirements that cause silent failures.
  • Networking blind spot: IP access list requirements for external engines hitting the IRC endpoint are underdocumented; this skill covers them explicitly.
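
To make the external-engine bullet concrete, here is a minimal sketch of the properties a PyIceberg client might pass when connecting to the UC IRC. It is not taken from the skill files. The endpoint path in `ASSUMED_IRC_PATH`, the helper name `build_irc_properties`, and the placeholder workspace URL and catalog name are all assumptions for illustration; check the path against 3-iceberg-rest-catalog.md before relying on it. The keys `type`, `uri`, `warehouse`, and `token` are standard PyIceberg REST catalog properties.

```python
import os

# ASSUMPTION: IRC endpoint path under the workspace URL. Verify against
# 3-iceberg-rest-catalog.md or the Databricks documentation.
ASSUMED_IRC_PATH = "/api/2.1/unity-catalog/iceberg-rest"


def build_irc_properties(workspace_url: str, uc_catalog: str, token: str) -> dict[str, str]:
    """Build PyIceberg REST catalog properties (hypothetical helper).

    workspace_url: Databricks workspace URL, e.g. https://<workspace-host>
    uc_catalog:    Unity Catalog catalog name, passed as the Iceberg warehouse
    token:         PAT or OAuth access token
    """
    base = workspace_url.rstrip("/")
    if not base.startswith("https://"):
        raise ValueError("workspace_url must start with https://")
    return {
        "type": "rest",
        "uri": base + ASSUMED_IRC_PATH,
        "warehouse": uc_catalog,
        "token": token,
    }


if __name__ == "__main__":
    props = build_irc_properties(
        os.environ.get("DATABRICKS_HOST", "https://example.cloud.databricks.com"),
        os.environ.get("UC_CATALOG", "main"),
        os.environ.get("DATABRICKS_TOKEN", "<access-token>"),
    )
    # With PyIceberg installed, the properties would be used like this:
    #   from pyiceberg.catalog import load_catalog
    #   catalog = load_catalog("uc", **props)
    # Print the properties with the token masked.
    print({k: ("***" if k == "token" else v) for k, v in props.items()})
```

Even with correct properties, the connection can still fail for the reasons the description lists: the principal needs EXTERNAL USE SCHEMA on the target schema, and the client's IP must be permitted by the workspace IP access list.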

Irfan Elahi and others added 4 commits February 26, 2026 16:18
Adds a new skill covering Apache Iceberg on Databricks:
- Managed Iceberg tables (DDL, DML, Liquid Clustering, Iceberg v3)
- External Iceberg Reads / UniForm and Compatibility Mode for Delta tables
- Iceberg REST Catalog (IRC) — auth, credential vending, IP access list guidance
- Snowflake interoperability (bidirectional: catalog integration + foreign catalogs)
- External engine interop — PyIceberg, OSS Spark, EMR, Flink, Kafka Connect

Registers the skill in README.md and install_skills.sh.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Correct factual error: PO is not auto-enabled, it must be explicitly
  enabled via ALTER ... SET DBPROPERTIES
- Add enable examples at catalog, schema, and table levels
- Add automatic statistics collection as a PO capability
- Add ANALYZE TABLE as the manual statistics collection alternative

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove blockquote repeating PARTITIONED BY/CLUSTER BY behaviour
  already covered in Critical Rules
- Remove 4 Common Issues rows that duplicate Critical Rules:
  write.metadata.path, Iceberg library in DBR, PARTITIONED BY Liquid
  Clustering, and CLUSTER BY v2 DV/row-tracking requirement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Clarify IRC endpoint description in 1-managed-iceberg-tables.md
- Improve wording of Future Modes note in 2-uniform-and-compatibility.md
- Remove Limitations section from 3-iceberg-rest-catalog.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>