diff --git a/databricks-skills/README.md b/databricks-skills/README.md index ddc5b081..78235ccf 100644 --- a/databricks-skills/README.md +++ b/databricks-skills/README.md @@ -34,7 +34,7 @@ cp -r ai-dev-kit/databricks-skills/databricks-agent-bricks .claude/skills/ ### 🤖 AI & Agents - **databricks-agent-bricks** - Knowledge Assistants, Genie Spaces, Supervisor Agents -- **databricks-genie** - Genie Spaces: create, curate, and query via Conversation API +- **databricks-genie** - Genie Spaces: create, query, analyze, benchmark, and optimize - **databricks-model-serving** - Deploy MLflow models and AI agents to endpoints - **databricks-unstructured-pdf-generation** - Generate synthetic PDFs for RAG - **databricks-vector-search** - Vector similarity search for RAG and semantic search diff --git a/databricks-skills/databricks-genie/SKILL.md b/databricks-skills/databricks-genie/SKILL.md index 576771da..e40dc9e4 100644 --- a/databricks-skills/databricks-genie/SKILL.md +++ b/databricks-skills/databricks-genie/SKILL.md @@ -1,6 +1,6 @@ --- name: databricks-genie -description: "Create and query Databricks Genie Spaces for natural language SQL exploration. Use when building Genie Spaces or asking questions via the Genie Conversation API." +description: "Create, query, analyze, benchmark, and optimize Databricks Genie Spaces for natural language SQL exploration. Use when building Genie Spaces, asking questions via the Genie Conversation API, or auditing and improving existing spaces." --- # Databricks Genie @@ -11,6 +11,8 @@ Create and query Databricks Genie Spaces - natural language interfaces for SQL-b Genie Spaces allow users to ask natural language questions about structured data in Unity Catalog. The system translates questions into SQL queries, executes them on a SQL warehouse, and presents results conversationally. +For existing spaces, this skill also supports analysis, benchmark testing, and optimization workflows. 
+ ## When to Use This Skill Use this skill when: @@ -18,6 +20,8 @@ Use this skill when: - Adding sample questions to guide users - Connecting Unity Catalog tables to a conversational interface - Asking questions to a Genie Space programmatically (Conversation API) +- Auditing an existing Genie Space against best practices +- Benchmarking and optimizing an existing Genie Space ## MCP Tools @@ -90,12 +94,18 @@ ask_genie( 2. Create space → create_or_update_genie 3. Query space → ask_genie (or test in Databricks UI) 4. Curate (optional) → Use Databricks UI to add instructions +5. Optimize (optional) → references/workflow-analyze.md, references/workflow-benchmark.md, references/workflow-optimize.md ``` ## Reference Files - [spaces.md](spaces.md) - Creating and managing Genie Spaces - [conversation.md](conversation.md) - Asking questions via the Conversation API +- [references/workflow-analyze.md](references/workflow-analyze.md) - Configuration analysis workflow +- [references/workflow-benchmark.md](references/workflow-benchmark.md) - Benchmark analysis workflow +- [references/workflow-optimize.md](references/workflow-optimize.md) - Optimization workflow +- [references/best-practices-checklist.md](references/best-practices-checklist.md) - Genie best-practice checks +- [references/space-schema.md](references/space-schema.md) - Serialized space schema reference ## Prerequisites @@ -104,6 +114,11 @@ Before creating a Genie Space: 1. **Tables in Unity Catalog** - Bronze/silver/gold tables with the data 2. 
**SQL Warehouse** - A warehouse to execute queries (auto-detected if not specified) +For optimization scripts (`scripts/*.py`): +- `databricks-sdk >= 0.85` +- Databricks auth configured (`databricks configure` or env vars) +- `CAN EDIT` permission on target Genie Space + ### Creating Tables Use these skills in sequence: @@ -117,6 +132,7 @@ Use these skills in sequence: | **No warehouse available** | Create a SQL warehouse or provide `warehouse_id` explicitly | | **Poor query generation** | Add instructions and sample questions that reference actual column names | | **Slow queries** | Ensure warehouse is running; use OPTIMIZE on tables | +| **Permission denied on optimization scripts** | Ensure user has `CAN EDIT` on target space | ## Related Skills diff --git a/databricks-skills/databricks-genie/conversation.md b/databricks-skills/databricks-genie/conversation.md index d3a4676f..82aab0f5 100644 --- a/databricks-skills/databricks-genie/conversation.md +++ b/databricks-skills/databricks-genie/conversation.md @@ -212,6 +212,15 @@ ask_genie(space_id, "Calculate customer lifetime value for all customers", timeout_seconds=180) ``` +## Advanced Optimization Links + +If conversation output quality is inconsistent across similar questions: + +- Audit serialized space configuration with `references/workflow-analyze.md` +- Run benchmark checks with `references/workflow-benchmark.md` +- Use `scripts/run_benchmark.py` for repeatable per-question scoring inputs +- Apply guided optimization in `references/workflow-optimize.md` + ## Troubleshooting ### "Genie Space not found" diff --git a/databricks-skills/databricks-genie/references/best-practices-checklist.md b/databricks-skills/databricks-genie/references/best-practices-checklist.md new file mode 100644 index 00000000..71a1d263 --- /dev/null +++ b/databricks-skills/databricks-genie/references/best-practices-checklist.md @@ -0,0 +1,201 @@ +# Genie Space Best Practices Checklist + +Evaluate each item against the serialized space 
JSON. For each item, determine: +- **pass**: Meets the criterion +- **fail**: Does not meet the criterion +- **warning**: Partially meets — improvement recommended +- **na**: Not applicable to this space's configuration + +Provide a brief explanation for each assessment and, for any fail/warning, give a specific actionable fix referencing actual table names, column names, or instruction text from the space. + +## Table of Contents + +- [Data Sources](#data-sources) +- [Tables](#tables) +- [Columns](#columns) +- [Metric Views](#metric-views) +- [Instructions](#instructions) +- [Text Instructions](#text-instructions) +- [Example Question SQLs](#example-question-sqls) +- [Join Specs](#join-specs) +- [SQL Snippets](#sql-snippets) +- [Benchmarks](#benchmarks) +- [Config](#config) + +--- + +## Data Sources + +### Tables + +**Table Count (1–25, ideally ≤5 initially)** +- Check: `serialized_space.data_sources.tables` array length +- Why: Including too many tables increases ambiguity and response latency. Start small and expand as needed. +- Fail if: 0 tables or >25 tables +- Warning if: >10 tables + +**Table Descriptions** +- Check: Each table in `data_sources.tables[].description` +- Why: Genie uses table descriptions to decide which tables are relevant to a question. Missing descriptions cause incorrect table selection. +- Fail if: Any table has no description or a generic/empty description +- Good: `"description": "Daily sales transactions with line-item details, one row per product per order"` +- Bad: `"description": ""` or `"description": "sales table"` + +**Focused Table Selection** +- Check: Whether tables appear relevant to the space's stated purpose (`title`, `description`) +- Why: Including unnecessary tables adds noise and confuses Genie's table selection.
+- Warning if: Tables seem unrelated to the space's purpose + +### Columns + +**Column Descriptions** +- Check: `data_sources.tables[].column_configs[].description` +- Why: Column descriptions help Genie map user questions to the right columns. Descriptions should provide context beyond what the column name conveys. +- Fail if: Columns with non-obvious names have no description +- Good: `"description": "Total revenue in USD after discounts and before tax"` +- Bad: `"description": "amount"` (just restates the column name) + +**Column Synonyms** +- Check: `data_sources.tables[].column_configs[].synonyms` array +- Why: Users use varied terminology. Synonyms map business language to column names. +- Warning if: Key business columns lack synonyms +- Good: Column `total_sales` with `"synonyms": ["revenue", "sales amount", "total revenue"]` + +**Example Values Enabled** +- Check: `data_sources.tables[].column_configs[].get_example_values` (v1) or `enable_format_assistance` (v2) depending on space version +- Why: Example values help Genie understand the data distribution and generate correct filter values. +- Warning if: Filterable columns (strings, categoricals) don't have `get_example_values: true` (v1) or `enable_format_assistance: true` (v2) +- Note: v2 spaces reject `get_example_values` — use `enable_format_assistance` instead + +**Value Dictionary Enabled** +- Check: `data_sources.tables[].column_configs[].build_value_dictionary` (v1) or `enable_entity_matching` (v2) depending on space version +- Why: For columns with a small set of discrete values (e.g., status, region, category), a value dictionary lets Genie match user terms to exact values. 
+- Warning if: Low-cardinality categorical columns don't have `build_value_dictionary: true` (v1) or `enable_entity_matching: true` (v2) +- Note: v2 spaces reject `build_value_dictionary` — use `enable_entity_matching` instead + +**Irrelevant Columns Hidden** +- Check: Whether columns unrelated to the space's purpose are included +- Why: Extra columns increase ambiguity. Hide columns users won't query. +- Warning if: Columns like internal IDs, audit timestamps, or ETL metadata are exposed + +### Metric Views + +**Metric View Descriptions** +- Check: `data_sources.metric_views[].description` (if metric_views exist) +- Why: Metric views surface pre-computed metrics. Without descriptions, Genie can't match questions to the right metric. +- Fail if: Metric views exist but lack descriptions +- NA if: No metric views are defined + +--- + +## Instructions + +### Text Instructions + +**At Least 1 Text Instruction** +- Check: `serialized_space.instructions.text_instructions` array length +- Why: Text instructions provide global context that shapes how Genie interprets questions and writes SQL. +- Fail if: No text instructions exist + +**Instructions Are Focused and Minimal** +- Check: Length and content of text instructions +- Why: Overly long or verbose instructions dilute their impact. Instructions should be concise directives, not documentation. SQL examples, metrics, and join logic belong in their respective sections. +- Warning if: Instructions are excessively long (>500 words total) or contain embedded SQL + +**Business Jargon Mapped** +- Check: Whether domain-specific terms are defined in instructions +- Why: If users say "churn rate" but the data uses "customer_attrition_pct", instructions should map this. 
+- Warning if: The space uses specialized terminology without definitions + +### Example Question SQLs + +**At Least 1 Example SQL** +- Check: `serialized_space.instructions.example_question_sqls` array length +- Why: Example SQLs teach Genie complex query patterns it can't infer from schema alone. +- Fail if: No example SQLs exist + +**Examples Cover Complex Patterns** +- Check: Whether example SQLs include multi-table joins, window functions, CTEs, or business logic +- Why: Simple queries (single table SELECT) don't need examples — Genie handles those. Examples should demonstrate patterns Genie would struggle with. +- Warning if: All examples are simple single-table queries + +**Examples Are Diverse** +- Check: Whether example SQLs cover different question types and tables +- Why: Redundant examples waste context. Each should teach a distinct pattern. +- Warning if: Multiple examples use nearly identical patterns + +**Queries Are Concise** +- Check: Length and complexity of example SQL queries +- Why: Example queries should be as short as possible while remaining complete. Excessive comments or formatting waste tokens. +- Warning if: Queries contain unnecessary verbosity + +**Parameters Have Descriptions** +- Check: `instructions.example_question_sqls[].parameters[].description` (if parameters exist) +- Why: Parameter descriptions help Genie understand what values to substitute. +- Fail if: Parameters exist without descriptions +- NA if: No parameters are used + +**Complex Examples Have Usage Guidance** +- Check: `instructions.example_question_sqls[].usage_guidance` on complex examples +- Why: Usage guidance tells Genie when to apply a pattern — what keywords or question types should trigger it. 
- Warning if: Complex multi-step examples lack usage guidance + +### Join Specs + +**Join Specs for Multi-Table Relationships** +- Check: `serialized_space.instructions.join_specs` array +- Why: Without explicit join specs, Genie may guess wrong join conditions, especially for self-joins or non-obvious foreign keys. +- Warning if: Multiple tables exist but no join specs are defined +- NA if: Only 1 table is configured +- Note: Using multiple join specs between the same table pair is the correct pattern for multi-column joins (not a problem to flag). Each `sql` element must be a single equality expression — compound AND/OR is not supported. Recommend adding `comment` and `instruction` fields to related join specs to ensure they are used together. + +**Join Specs Have Comments** +- Check: `instructions.join_specs[].comment` +- Why: Comments explain the business meaning of the relationship, helping Genie choose the right join for a given question. +- Warning if: Join specs exist without comments + +### SQL Snippets + +**Filter Snippets Defined** +- Check: `serialized_space.instructions.sql_snippets.filters` array +- Why: Common filters (time periods, active records, business-specific conditions) reduce errors when Genie needs to filter data. +- Warning if: No filter snippets exist and the space has date/time or status columns + +**Expression Snippets Defined** +- Check: `serialized_space.instructions.sql_snippets.expressions` array +- Why: Reusable expressions for categorizations, calculations, and business logic ensure consistency across queries. +- Warning if: The space has complex business logic but no expression snippets + +**Measure Snippets Defined** +- Check: `serialized_space.instructions.sql_snippets.measures` array +- Why: Measures define standard aggregations (revenue, count, average) that should be computed consistently.
+- Warning if: Only 0-1 measures exist and the space has numeric columns that could be aggregated + +--- + +## Benchmarks + +**At Least 10 Diverse Q&A Pairs** +- Check: `serialized_space.benchmarks.questions` array length and diversity +- Why: Benchmarks validate that Genie produces correct SQL. Diverse coverage catches regressions across different question types. +- Fail if: Fewer than 10 benchmark questions +- Warning if: Questions cluster around a single topic or table + +**Benchmark Coverage** +- Check: Whether benchmarks cover different tables, join patterns, aggregations, and filter types +- Why: Narrow benchmarks miss entire categories of user questions. +- Warning if: Benchmarks only test one type of query pattern + +--- + +## Config + +**Sample Questions Present** +- Check: `serialized_space.config.sample_questions` array +- Why: Sample questions appear in the Genie UI as starting points for users. They demonstrate what the space can answer. +- Warning if: No sample questions are defined + +**Sample Questions Are Representative** +- Check: Whether sample questions cover the space's key capabilities +- Why: Sample questions should showcase the most valuable query patterns and guide users toward what the space does well. +- Warning if: Sample questions are generic or don't reflect the space's data diff --git a/databricks-skills/databricks-genie/references/space-schema.md b/databricks-skills/databricks-genie/references/space-schema.md new file mode 100644 index 00000000..a0d9c542 --- /dev/null +++ b/databricks-skills/databricks-genie/references/space-schema.md @@ -0,0 +1,359 @@ +# Serialized Space JSON Schema Reference + +The `serialized_space` field is a JSON string (parsed into a dict) returned by `client.genie.get_space(space_id, include_serialized_space=True)`. It contains the full Genie Space configuration. 
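Because most free-text fields in `serialized_space` are arrays of string segments, the first step of any analysis is joining those segments back together. A minimal parsing sketch — the `get_space` call is the one named above, and the payload below is a hypothetical stand-in for a real response:

```python
import json

def join_segments(segments):
    """Free-text and SQL fields in serialized_space are arrays of
    string segments; concatenate them to recover the full text."""
    return "".join(segments)

# Hypothetical payload — in a real workflow this string comes from
# client.genie.get_space(space_id, include_serialized_space=True)
raw = json.dumps({
    "version": 2,
    "config": {"sample_questions": []},
    "data_sources": {"tables": [{
        "identifier": "catalog.schema.orders",
        "description": ["Daily sales transactions, ", "one row per line item"],
    }]},
    "instructions": {"text_instructions": []},
    "benchmarks": {"questions": []},
})

space = json.loads(raw)
for table in space["data_sources"]["tables"]:
    print(table["identifier"], "->", join_segments(table["description"]))
# catalog.schema.orders -> Daily sales transactions, one row per line item
```

The same `join_segments` helper applies to `question`, `sql`, and `content` arrays throughout the sections below.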
+ +## Table of Contents + +- [Top-Level Structure](#top-level-structure) +- [`config`](#config) +- [`data_sources`](#data_sources) +- [Tables](#tables) +- [`instructions`](#instructions) +- [Text Instructions](#text-instructions) +- [Example Question SQLs](#example-question-sqls) +- [Join Specs](#join-specs) +- [SQL Functions](#sql-functions) +- [SQL Snippets](#sql-snippets) +- [`benchmarks`](#benchmarks) +- [Validation Rules](#validation-rules) + +## Top-Level Structure + +```json +{ + "version": 2, + "config": { ... }, + "data_sources": { ... }, + "instructions": { ... }, + "benchmarks": { ... } +} +``` + +--- + +## `config` + +Space-level configuration. + +```json +{ + "config": { + "sample_questions": [ + { + "id": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6", + "question": ["What are the top 10 products by revenue?"] + } + ] + } +} +``` + +| Field | Type | Description | +|-------|------|-------------| +| `sample_questions` | array | Questions shown in the Genie UI as starting points | +| `sample_questions[].id` | string | 32-char lowercase hex identifier | +| `sample_questions[].question` | array of strings | The question text displayed to users | + +--- + +## `data_sources` + +Tables, columns, and metric views available to Genie. 
+ +### Tables + +```json +{ + "data_sources": { + "tables": [ + { + "id": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6", + "identifier": "catalog.schema.table_name", + "description": ["Human-readable table description"], + "column_configs": [ + { + "id": "b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6a7", + "column_name": "column_name", + "description": ["What this column contains"], + "synonyms": ["alias1", "alias2"], + // v1 only (rejected by v2 spaces — use v2 equivalents below) + "get_example_values": true, + "build_value_dictionary": true, + "exclude": false, + // v2 only (replaces v1 fields above) + "enable_entity_matching": false, + "enable_format_assistance": false + } + ] + } + ], + "metric_views": [ + { + "id": "c3d4e5f6a7b8c9d0e1f2a3b4c5d6a7b8", + "identifier": "catalog.schema.metric_view_name", + "description": ["What this metric view computes"] + } + ] + } +} +``` + +| Field | Type | Description | +|-------|------|-------------| +| `tables` | array | Unity Catalog tables exposed to Genie | +| `tables[].id` | string | 32-char lowercase hex identifier | +| `tables[].identifier` | string | Fully qualified table name (`catalog.schema.table`) | +| `tables[].description` | array of strings | Human-readable description of the table | +| `tables[].column_configs` | array | Column definitions with metadata | +| `tables[].column_configs[].id` | string | 32-char lowercase hex identifier | +| `tables[].column_configs[].column_name` | string | Column name | +| `tables[].column_configs[].description` | array of strings | Contextual description beyond the column name | +| `tables[].column_configs[].synonyms` | array of strings | Alternative names users might use | +| `tables[].column_configs[].get_example_values` | boolean | (v1 only — use `enable_format_assistance` in v2) Whether Genie fetches sample values for this column | +| `tables[].column_configs[].build_value_dictionary` | boolean | (v1 only — use `enable_entity_matching` in v2) Whether Genie builds a dictionary of discrete values | +| 
`tables[].column_configs[].exclude` | boolean | Whether to hide this column from Genie | +| `tables[].column_configs[].enable_entity_matching` | boolean | (v2 only — replaces `build_value_dictionary`) Whether Genie matches user terms to column values | +| `tables[].column_configs[].enable_format_assistance` | boolean | (v2 only — replaces `get_example_values`) Whether Genie applies format hints for this column | +| `metric_views` | array | Pre-computed metric views | +| `metric_views[].id` | string | 32-char lowercase hex identifier | +| `metric_views[].identifier` | string | Fully qualified metric view name | +| `metric_views[].description` | array of strings | What the metric view computes | + +> **Version Note:** Spaces with `"version": 2` reject v1 fields (`get_example_values`, `build_value_dictionary`). Use their v2 equivalents (`enable_format_assistance`, `enable_entity_matching`) instead. Including v1 fields in a v2 space config will cause API errors. + +--- + +## `instructions` + +Guidance that shapes how Genie interprets questions and generates SQL. 
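The v1-to-v2 field renames in the version note above can be sketched as a small migration helper. This is a hypothetical illustration — only the two renames documented in the table are assumed:

```python
# v1-only flags are rejected by version-2 spaces; rename them to
# their documented v2 equivalents and leave everything else intact.
V1_TO_V2 = {
    "get_example_values": "enable_format_assistance",
    "build_value_dictionary": "enable_entity_matching",
}

def migrate_column_config(cfg: dict) -> dict:
    """Return a copy of a column_config safe to send to a v2 space."""
    return {V1_TO_V2.get(key, key): value for key, value in cfg.items()}

v1_cfg = {"column_name": "region", "get_example_values": True, "exclude": False}
print(migrate_column_config(v1_cfg))
# {'column_name': 'region', 'enable_format_assistance': True, 'exclude': False}
```

Applying this before writing a v1-style config to a `"version": 2` space avoids the API errors described above.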
+ +### Text Instructions + +```json +{ + "instructions": { + "text_instructions": [ + { + "id": "d4e5f6a7b8c9d0e1f2a3b4c5d6a7b8c9", + "content": ["Revenue is calculated as quantity * unit_price.", "Always filter out cancelled orders unless explicitly requested."] + } + ] + } +} +``` + +| Field | Type | Description | +|-------|------|-------------| +| `text_instructions` | array | Free-text instructions applied globally | +| `text_instructions[].id` | string | 32-char lowercase hex identifier | +| `text_instructions[].content` | array of strings | The instruction text segments | + +### Example Question SQLs + +```json +{ + "instructions": { + "example_question_sqls": [ + { + "id": "e5f6a7b8c9d0e1f2a3b4c5d6a7b8c9d0", + "question": ["What is the monthly revenue trend for the last year?"], + "sql": ["SELECT DATE_TRUNC('month', order_date) AS month, ", "SUM(quantity * unit_price) AS revenue ", "FROM catalog.schema.orders ", "WHERE order_date >= DATE_ADD(CURRENT_DATE(), -365) ", "GROUP BY 1 ORDER BY 1"], + "usage_guidance": ["Use this pattern for any time-series trend question involving revenue or sales amounts"], + "parameters": [ + { + "name": "time_period", + "description": ["The time granularity (day, week, month, quarter, year)"], + "type_hint": "STRING", + "default_value": { "values": ["month"] } + } + ] + } + ] + } +} +``` + +| Field | Type | Description | +|-------|------|-------------| +| `example_question_sqls` | array | Question-SQL pairs that teach Genie query patterns | +| `example_question_sqls[].id` | string | 32-char lowercase hex identifier | +| `example_question_sqls[].question` | array of strings | Natural language question segments | +| `example_question_sqls[].sql` | array of strings | The SQL query segments (join to form full query) | +| `example_question_sqls[].usage_guidance` | array of strings | When Genie should apply this pattern | +| `example_question_sqls[].parameters` | array | Parameterized values in the query | +| 
`example_question_sqls[].parameters[].name` | string | Parameter name | +| `example_question_sqls[].parameters[].description` | array of strings | What the parameter represents | +| `example_question_sqls[].parameters[].type_hint` | string | Data type hint (e.g., `"STRING"`, `"INT"`) | +| `example_question_sqls[].parameters[].default_value` | object | Default value with `values` array | + +### Join Specs + +```json +{ + "instructions": { + "join_specs": [ + { + "id": "f6a7b8c9d0e1f2a3b4c5d6a7b8c9d0e1", + "left": { + "identifier": "catalog.schema.orders", + "alias": "orders" + }, + "right": { + "identifier": "catalog.schema.customers", + "alias": "customers" + }, + "join_type": "LEFT JOIN", + "sql": ["orders.customer_id = customers.id"], + "comment": ["Link orders to customer details; use LEFT JOIN to include orders with unknown customers"], + "instruction": ["Always use this join when relating orders to customer demographics"] + } + ] + } +} +``` + +> **Multi-column joins:** Use separate join specs for each condition — compound `AND` expressions are not supported in the `sql` field. Add `comment` and `instruction` fields to both specs indicating they should always be used together. + +| Field | Type | Description | +|-------|------|-------------| +| `join_specs` | array | Explicit join definitions for multi-table queries | +| `join_specs[].id` | string | 32-char lowercase hex identifier | +| `join_specs[].left.identifier` | string | Left table fully qualified name | +| `join_specs[].left.alias` | string | Alias for the left table in the join | +| `join_specs[].right.identifier` | string | Right table fully qualified name | +| `join_specs[].right.alias` | string | Alias for the right table in the join | +| `join_specs[].join_type` | string | Join type (INNER JOIN, LEFT JOIN, etc.) | +| `join_specs[].sql` | array of strings | Join condition expression segments. Each element must be a single equality expression (e.g., `"t1.col = t2.col"`). 
Compound conditions with AND/OR are not supported — use separate join specs for multi-column joins. | +| `join_specs[].comment` | array of strings | Business context for the relationship | +| `join_specs[].instruction` | array of strings | Guidance on when/how to use this join | + +### SQL Functions + +```json +{ + "instructions": { + "sql_functions": [ + { + "id": "a7b8c9d0e1f2a3b4c5d6a7b8c9d0e1f2", + "identifier": "catalog.schema.calculate_margin", + "description": "Calculates profit margin percentage from revenue and cost" + } + ] + } +} +``` + +| Field | Type | Description | +|-------|------|-------------| +| `sql_functions` | array | Unity Catalog functions available to Genie | +| `sql_functions[].id` | string | 32-char lowercase hex identifier | +| `sql_functions[].identifier` | string | Fully qualified function name | +| `sql_functions[].description` | string | What the function does | + +### SQL Snippets + +```json +{ + "instructions": { + "sql_snippets": { + "filters": [ + { + "id": "b8c9d0e1f2a3b4c5d6a7b8c9d0e1f2a3", + "display_name": "Last 30 Days", + "sql": "WHERE order_date >= DATE_ADD(CURRENT_DATE(), -30)", + "synonyms": ["recent", "last month", "past 30 days"], + "instruction": ["Apply this filter when the user asks about recent data"], + "comment": ["Standard recency filter for order-based queries"] + } + ], + "expressions": [ + { + "id": "c9d0e1f2a3b4c5d6a7b8c9d0e1f2a3b4", + "alias": "customer_segment", + "display_name": "Customer Segment", + "sql": "CASE WHEN lifetime_value > 10000 THEN 'Enterprise' WHEN lifetime_value > 1000 THEN 'Mid-Market' ELSE 'SMB' END", + "synonyms": ["segment", "customer tier", "customer type"], + "instruction": ["Use this expression when classifying customers by value tier"], + "comment": ["Segments align with the sales team's tiering model"] + } + ], + "measures": [ + { + "id": "d0e1f2a3b4c5d6a7b8c9d0e1f2a3b4c5", + "alias": "total_revenue", + "display_name": "Total Revenue", + "sql": "SUM(quantity * unit_price)", + 
"synonyms": ["revenue", "sales", "total sales"], + "instruction": ["Use this measure for any revenue aggregation"], + "comment": ["Revenue includes all non-cancelled order line items"] + } + ] + } + } +} +``` + +| Field | Type | Description | +|-------|------|-------------| +| `sql_snippets` | object | Reusable SQL fragments organized by type | +| `sql_snippets.filters` | array | Common WHERE clause patterns | +| `sql_snippets.expressions` | array | Reusable CASE/calculation expressions | +| `sql_snippets.measures` | array | Standard aggregation definitions | +| `sql_snippets.*[].id` | string | 32-char lowercase hex identifier | +| `sql_snippets.filters[].display_name` | string | Human-readable name for the filter | +| `sql_snippets.filters[].sql` | string | The SQL fragment | +| `sql_snippets.expressions[].alias` | string | Column alias used in generated SQL | +| `sql_snippets.expressions[].display_name` | string | Human-readable name for the expression | +| `sql_snippets.expressions[].sql` | string | The SQL fragment | +| `sql_snippets.measures[].alias` | string | Column alias used in generated SQL | +| `sql_snippets.measures[].display_name` | string | Human-readable name for the measure | +| `sql_snippets.measures[].sql` | string | The SQL fragment | +| `sql_snippets.*[].synonyms` | array of strings | Terms that trigger this snippet | +| `sql_snippets.*[].instruction` | array of strings | Guidance on when to use this snippet | +| `sql_snippets.*[].comment` | array of strings | Additional context or notes | + +--- + +## `benchmarks` + +Q&A pairs for validating Genie's SQL generation accuracy. 
+ +```json +{ + "benchmarks": { + "questions": [ + { + "id": "e1f2a3b4c5d6a7b8c9d0e1f2a3b4c5d6", + "question": ["What are the top 5 customers by total spend?"], + "answer": [ + { + "format": "SQL", + "content": ["SELECT c.name, SUM(o.amount) AS total_spend ", "FROM catalog.schema.customers c ", "JOIN catalog.schema.orders o ON c.id = o.customer_id ", "GROUP BY 1 ORDER BY 2 DESC LIMIT 5"] + } + ] + } + ] + } +} +``` + +| Field | Type | Description | +|-------|------|-------------| +| `benchmarks.questions` | array | Benchmark question-answer pairs | +| `benchmarks.questions[].id` | string | 32-char lowercase hex identifier (unique across sample_questions and benchmarks) | +| `benchmarks.questions[].question` | array of strings | The test question segments (join to form full question) | +| `benchmarks.questions[].answer` | array of objects | Expected answers (find the one with `format: "SQL"` for SQL benchmarks) | +| `benchmarks.questions[].answer[].format` | string | Answer format (typically `"SQL"`) | +| `benchmarks.questions[].answer[].content` | array of strings | The expected SQL query segments (join to form full query) | + +--- + +## Validation Rules + +| Rule | Details | +|------|---------| +| **IDs** | 32-char lowercase hexadecimal, no hyphens (e.g., `a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6`) | +| **Sorting** | Collections with IDs or identifiers must be sorted alphabetically | +| **String length** | Maximum 25,000 characters per string | +| **Array length** | Maximum 10,000 items per array | +| **ID uniqueness** | Question IDs must be unique across `sample_questions` and `benchmarks.questions` | diff --git a/databricks-skills/databricks-genie/references/workflow-analyze.md b/databricks-skills/databricks-genie/references/workflow-analyze.md new file mode 100644 index 00000000..88cf8f90 --- /dev/null +++ b/databricks-skills/databricks-genie/references/workflow-analyze.md @@ -0,0 +1,80 @@ +# Workflow: Analyze with Best Practices + +Evaluate the space configuration 
against the best practices checklist and produce a detailed analysis report. + +## Step 3a: Load Checklist + +Read `references/best-practices-checklist.md` for the full evaluation criteria. + +## Step 3b: Load Schema Reference (if needed) + +If you need to understand specific fields in the serialized space JSON, read `references/space-schema.md`. + +## Step 3c: Evaluate Each Checklist Item + +For each item in the checklist, examine the fetched space configuration and determine: + +- **Status**: `pass`, `fail`, `warning`, or `na` +- **Explanation**: Why this assessment was made, referencing specific data from the space +- **Fix** (for fail/warning only): A specific, actionable recommendation + +Be concrete — reference actual table names, column names, instruction text, and field values from the space. Don't give generic advice. + +Examples of specific fixes: +- "Add a description to column `unit_price` in table `catalog.schema.orders` — e.g., `'Unit price in USD for a single item'`" +- "Add synonyms `['revenue', 'sales amount']` to column `total_sales` in table `catalog.schema.transactions`" +- "Enable `enable_format_assistance: true` (v2) or `get_example_values: true` (v1) on column `region` in table `catalog.schema.stores` — this column appears filterable" +- "Add a join spec between `catalog.schema.orders` and `catalog.schema.customers` on `orders.customer_id = customers.id`" + +## Step 3d: Generate Output + +Present the analysis in this format: + +#### Summary +- Total items evaluated: N +- Pass: X | Fail: Y | Warning: Z | N/A: W + +#### Data Sources +| Item | Status | Explanation | +|------|--------|-------------| +| ... | ... | ... | + +Fixes: +1. ... + +#### Instructions +| Item | Status | Explanation | +|------|--------|-------------| +| ... | ... | ... | + +Fixes: +1. ... + +#### Benchmarks +| Item | Status | Explanation | +|------|--------|-------------| +| ... | ... | ... | + +Fixes: +1. ... 
+ +#### Config +| Item | Status | Explanation | +|------|--------|-------------| +| ... | ... | ... | + +Fixes: +1. ... + +#### Priority Recommendations +List the top 3-5 most impactful fixes, ordered by expected improvement to Genie accuracy. + +## Step 3e: Save Report + +**Claude Code (local):** +1. Create a `reports//` directory in the user's project root if it doesn't already exist. +2. Save the full analysis markdown (everything from Step 3d) to `reports//config-analysis.md` in the project root. +3. Inform the user of the saved file path. + +**Databricks notebook:** +Create a new notebook code cell that renders the analysis as cell output using `displayHTML()` or by printing the markdown string. Do not display the report only in the chat panel. diff --git a/databricks-skills/databricks-genie/references/workflow-benchmark.md b/databricks-skills/databricks-genie/references/workflow-benchmark.md new file mode 100644 index 00000000..e7ce97d4 --- /dev/null +++ b/databricks-skills/databricks-genie/references/workflow-benchmark.md @@ -0,0 +1,148 @@ +# Workflow: Analyze with Benchmarks + +Run the space's benchmark questions against Genie via the SDK, compare generated SQL to expected SQL, and produce a detailed accuracy report. + +## Table of Contents + +- [Step 3a: Extract Benchmark Questions](#step-3a-extract-benchmark-questions) +- [Step 3b: Present Benchmarks for Selection](#step-3b-present-benchmarks-for-selection) +- [Step 3c: Run Selected Benchmarks](#step-3c-run-selected-benchmarks) +- [Step 3d: Analyze Each Result](#step-3d-analyze-each-result) +- [Step 3e: Generate Benchmark Report](#step-3e-generate-benchmark-report) +- [Step 3f: Save Report](#step-3f-save-report) + +## Step 3a: Extract Benchmark Questions + +From the fetched space configuration, read `serialized_space.benchmarks.questions`. 
+
+Parse the benchmark data format:
+- `question` is an **array of strings** — join them to get the full question text
+- `answer` is a **list of objects** — find the one with `format: "SQL"`
+- `answer[].content` is an **array of strings** — join them to get the full expected SQL
+
+If the space has no benchmarks (empty or missing `benchmarks.questions`), inform the user:
+> "This Genie Space has no benchmark questions configured. Benchmarks are question-answer pairs that let you test Genie's SQL generation accuracy. Would you like to run the best practices analysis instead?"
+
+## Step 3b: Present Benchmarks for Selection
+
+Display the benchmark questions as a numbered list:
+
+```
+Found N benchmark questions:
+  1. What are the top 5 customers by total spend?
+  2. What is the monthly revenue trend?
+  3. ...
+
+Which benchmarks would you like to run? Enter numbers (e.g., "1,3,5"), a range (e.g., "1-5"), or "all".
+```
+
+Wait for the user's selection before proceeding.
+
+## Step 3c: Run Selected Benchmarks
+
+Read `scripts/run_benchmark.py` for the implementation, then execute each selected benchmark question **sequentially**:
+
+- **Claude Code**: Run via bash:
+  ```bash
+  python scripts/run_benchmark.py <space_id> "<question>"
+  ```
+- **Databricks notebook**: Read the script to understand the implementation. For each selected question, create a new notebook code cell containing the function definition and a call to it. Replace any `sys.exit()` calls with `raise` statements. Run the cell and read its output before proceeding to the next question. Report progress in the chat after each cell completes.
+
+After each question completes, report progress:
+```
+[1/5] "What are the top 5 customers by total spend?" — SQL generated
+[2/5] "What is the monthly revenue trend?" — failed: <error message>
+[3/5] "Show cancelled orders from last quarter" — timed out
+```
+
+**Error handling:**
+- **Exit code 1 / `RuntimeError`** (script-level error: auth failure, space not found) → halt all remaining benchmarks and report the error to the user
+- **`status: "FAILED"`, `"TIMEOUT"`, or `"ERROR"`** in the result (including cancelled/expired terminal statuses surfaced as `ERROR`) → record the result and continue to the next question
+
+## Step 3d: Analyze Each Result
+
+For each benchmark that produced SQL (`status: "COMPLETED"` with `generated_sql` present), compare the generated SQL against the expected SQL across these dimensions:
+
+| Dimension | What to compare |
+|-----------|-----------------|
+| Tables referenced | Same tables used (ignoring alias differences)? |
+| Join conditions | Same joins with equivalent conditions? |
+| WHERE clauses | Same filters applied (accounting for equivalent expressions)? |
+| Aggregations | Same aggregate functions on same columns? |
+| GROUP BY | Same grouping columns? |
+| ORDER BY | Same ordering columns and direction? |
+| LIMIT | Same row limit? |
+| Column selection | Same output columns (ignoring aliases)? |
+| Expressions | Same calculations and transformations? |
+
+Assign a verdict to each question:
+- **correct**: Generated SQL is semantically equivalent to expected SQL (may differ in formatting, aliases, or expression order)
+- **partial**: Right general approach but with meaningful differences (e.g., missing a filter, different aggregation)
+- **incorrect**: Wrong logic (wrong tables, wrong joins, wrong calculations)
+- **error**: Genie could not generate SQL (failed, timed out, cancelled/expired, or returned a text response instead)
+
+## Step 3e: Generate Benchmark Report
+
+Produce the report in this markdown format:
+
+```markdown
+# Benchmark Analysis: <space title>
+
+**Space ID:** `<space_id>`
+**Date:** <date>
+**Questions tested:** X of Y total benchmarks
+
+## Summary
+
+| Verdict | Count |
+|---------|-------|
+| Correct | X |
+| Partial | X |
+| Incorrect | X |
+| Error | X |
+| **Score** | **X/Y (Z%)** |
+
+Score counts correct as 1, partial as 0.5, incorrect and error as 0.
+
+## Detailed Results
+
+### 1. <question text>
+
+**Verdict:** correct | partial | incorrect | error
+
+**Expected SQL:**
+```sql
+<expected SQL from the benchmark>
+```
+
+**Generated SQL:**
+```sql
+<generated SQL, or "(no SQL generated)">
+```
+
+**Analysis:**
+<comparison across the dimensions above and why the verdict was assigned>
+
+---
+
+### 2. <question text>
+...
+
+## Patterns & Recommendations
+
+<patterns observed across questions>
+- If multiple questions missed a specific join, recommend adding a join spec
+- If aggregations are consistently wrong, recommend adding example SQLs
+- If certain tables are never used correctly, recommend improving table/column descriptions
+- Link recommendations to specific Genie Space configuration changes
+```
+
+## Step 3f: Save Report
+
+**Claude Code (local):**
+1. Create a `reports/<space_id>/` directory in the user's project root if it doesn't already exist.
+2. Save the full report markdown to `reports/<space_id>/benchmark-analysis.md` in the project root.
+3. Inform the user of the saved file path.
+
+**Databricks notebook:**
+Create a new notebook code cell that renders the benchmark report as cell output using `displayHTML()` or by printing the markdown string. Do not display the report only in the chat panel.
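The array-of-strings format described in Step 3a can be sanity-checked with a small parser. This is a minimal sketch: `extract_benchmarks` is a hypothetical helper (not one of the shipped scripts), and plain string concatenation is assumed as the joining strategy.

```python
def extract_benchmarks(serialized_space: dict) -> list[dict]:
    """Join the array-of-strings question/answer fields into plain text pairs."""
    benchmarks = []
    for q in serialized_space.get("benchmarks", {}).get("questions", []):
        # `question` is an array of strings -- join for the full text
        question_text = "".join(q.get("question", []))
        # `answer` is a list of objects; keep the one with format == "SQL"
        expected_sql = None
        for answer in q.get("answer", []):
            if answer.get("format") == "SQL":
                # `answer[].content` is also an array of strings
                expected_sql = "".join(answer.get("content", []))
                break
        benchmarks.append({"question": question_text, "expected_sql": expected_sql})
    return benchmarks

space = {
    "benchmarks": {
        "questions": [
            {
                "question": ["What are the top 5 customers ", "by total spend?"],
                "answer": [{"format": "SQL", "content": ["SELECT ...", " LIMIT 5"]}],
            }
        ]
    }
}
print(extract_benchmarks(space))
```

A space with no `benchmarks` key simply yields an empty list, which is the "no benchmarks configured" case handled in Step 3a.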
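The Step 3e scoring rule (correct = 1, partial = 0.5, incorrect and error = 0) is simple enough to sketch directly; `score_benchmarks` is a hypothetical helper for illustration, not part of the shipped scripts.

```python
from collections import Counter

# Weights from the report's scoring rule.
WEIGHTS = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0, "error": 0.0}

def score_benchmarks(verdicts: list[str]) -> str:
    """Tally verdicts and format the summary score line for the report."""
    counts = Counter(verdicts)  # per-verdict counts for the Summary table
    earned = sum(WEIGHTS[v] for v in verdicts)
    total = len(verdicts)
    pct = round(100 * earned / total) if total else 0
    return f"{earned:g}/{total} ({pct}%)"

print(score_benchmarks(["correct", "partial", "incorrect", "correct", "error"]))
```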
diff --git a/databricks-skills/databricks-genie/references/workflow-optimize.md b/databricks-skills/databricks-genie/references/workflow-optimize.md
new file mode 100644
index 00000000..c5ae9b95
--- /dev/null
+++ b/databricks-skills/databricks-genie/references/workflow-optimize.md
@@ -0,0 +1,164 @@
+# Workflow: Optimize Genie Space
+
+Create a new optimized Genie Space by applying findings from the best practices analysis (Option A) and benchmark analysis (Option B). The original space is preserved — a new copy is created with improvements applied.
+
+## Table of Contents
+
+- [Step 3a: Verify Prerequisites](#step-3a-verify-prerequisites)
+- [Step 3b: Load Analysis Results](#step-3b-load-analysis-results)
+- [Step 3c: Generate Optimization Plan](#step-3c-generate-optimization-plan)
+- [Step 3d: Present Changes for User Review](#step-3d-present-changes-for-user-review)
+- [Step 3e: Apply Changes to Config](#step-3e-apply-changes-to-config)
+- [Step 3f: Create New Genie Space](#step-3f-create-new-genie-space)
+- [Step 3g: Save Report & Present Results](#step-3g-save-report-present-results)
+
+## Step 3a: Verify Prerequisites
+
+Check that both analysis reports exist:
+
+- **Claude Code**: Verify these files exist in the project root:
+  - `reports/<space_id>/config-analysis.md`
+  - `reports/<space_id>/benchmark-analysis.md`
+- **Databricks notebook**: Verify that Option A and Option B analyses are available in previous notebook cell outputs or kernel memory variables.
+
+If either is missing, inform the user which workflow(s) need to run first and offer to start them.
+
+## Step 3b: Load Analysis Results
+
+- **Claude Code**: Read all three files:
+  - `reports/<space_id>/config-analysis.md` (best practices findings)
+  - `reports/<space_id>/benchmark-analysis.md` (benchmark findings)
+  - `reports/<space_id>/space-config.json` (original space configuration)
+- **Databricks notebook**: Read previous cell outputs containing the analyses. Use the `space_config` variable from kernel memory for the original configuration.
+ +Also load `references/space-schema.md` as reference for valid field structures and validation rules. + +## Step 3c: Generate Optimization Plan + +Parse the analysis reports and map findings to specific config changes. Group changes by category: + +| Finding Source | Change Category | Config Path | +|---|---|---| +| Best practices: fail/warning on table descriptions | Data source | `data_sources.tables[].description` | +| Best practices: fail/warning on column descriptions | Data source | `data_sources.tables[].column_configs[].description` | +| Best practices: warning on missing synonyms | Data source | `data_sources.tables[].column_configs[].synonyms` | +| Best practices: warning on example values | Data source | `data_sources.tables[].column_configs[].get_example_values` (v1) or `enable_format_assistance` (v2) | +| Best practices: fail on text instructions | Instruction | `instructions.text_instructions` | +| Best practices: fail on example SQLs | Instruction | `instructions.example_question_sqls` | +| Best practices: warning on join specs | Instruction | `instructions.join_specs` | +| Best practices: warning on SQL snippets | Instruction | `instructions.sql_snippets.*` | +| Benchmark: incorrect/partial verdicts | Multiple | Add example SQLs for failing patterns, improve instructions | + +**Join spec constraints:** Each `sql` array element must contain a single equality expression. +For multi-column joins, create separate join specs with `comment` and `instruction` fields +that reference each other (e.g., "Always use with the companion YEAR join spec"). +Do not combine conditions with AND/OR. + +For each change: +1. Reference the specific finding that motivates it (e.g., "config-analysis: fail on column descriptions for `catalog.schema.orders`") +2. Generate the actual new values — not just "add a description" but the actual description text, SQL, synonyms, etc. +3. 
Use the space's existing data, table names, column names, and business context to produce accurate values
+
+## Step 3d: Present Changes for User Review
+
+Present a structured summary of all proposed changes:
+
+```
+## Proposed Optimization: <space title>
+
+**Changes summary:** X data source changes, Y instruction changes, Z benchmark-driven changes
+
+### Data Source Changes
+1. [table: catalog.schema.orders] Add description: "..." (was: missing)
+2. [column: orders.unit_price] Add description: "..." (was: missing)
+3. [column: orders.region] Add synonyms: ["area", "territory"] (was: none)
+4. [column: orders.region] Enable get_example_values (v1) / enable_format_assistance (v2) (was: false)
+...
+
+### Instruction Changes
+1. Add text instruction: "..."
+2. Add example SQL: "What is the monthly revenue trend?" → SELECT ...
+3. Add join spec: orders ↔ customers on customer_id (LEFT JOIN)
+4. Add filter snippet: "Last 30 Days" → WHERE order_date >= DATE_ADD(CURRENT_DATE(), -30)
+...
+
+### Benchmark-Driven Changes
+1. Add example SQL for pattern that caused incorrect verdict on Q3: ...
+2. Improve column description for `total_amount` (caused partial verdict on Q7): ...
+...
+```
+
+Wait for user approval. The user may request modifications before proceeding. Apply any requested changes to the plan before continuing.
+
+## Step 3e: Apply Changes to Config
+
+1. Deep copy the original `serialized_space` dict from the space config
+2. Apply each approved change to produce the updated config:
+   - For new entries (text instructions, example SQLs, join specs, snippets, benchmarks), generate a new 32-char lowercase hex ID for each
+   - For modifications (descriptions, synonyms, example values), update the existing entries in place
+3. Validate the updated config against the rules in `references/space-schema.md`:
+   - All IDs are 32-char lowercase hex
+   - Collections with IDs or identifiers are sorted alphabetically
+   - Question IDs are unique across `sample_questions` and `benchmarks.questions`
+   - Strings do not exceed 25,000 characters
+   - Arrays do not exceed 10,000 items
+
+## Step 3f: Create New Genie Space
+
+Read `scripts/create_optimized_space.py` for the implementation, then execute it:
+
+- **Claude Code**:
+  1. Save the updated config dict as JSON to `reports/<space_id>/optimized-space-config.json`
+  2. Run the creation script via bash:
+     ```bash
+     python scripts/create_optimized_space.py <original_space_id> reports/<space_id>/optimized-space-config.json
+     ```
+- **Databricks notebook**: Read the script to understand the implementation. Create a new notebook code cell containing the function definition and a call to it. Replace any `sys.exit()` calls with `raise` statements. The cell should:
+  1. Take the updated config dict (from the previous cell)
+  2. Call `create_optimized_space(original_space_id, updated_config)` to create the new space
+  3. Print the result JSON
+
+  Run the cell and read its output.
+
+If the script fails:
+- **`ImportError`**: Prompt user to `pip install "databricks-sdk>=0.85"` (Claude Code only — SDK is pre-installed in Databricks)
+- **Auth failure**: Prompt user to run `databricks configure` or check environment variables (Claude Code only — Databricks notebooks auto-authenticate)
+- **Permission denied (`403` / `PERMISSION_DENIED`)**: User may not have permission to create Genie Spaces
+- **Not found (`404` / `NOT_FOUND`)**: Verify the original space ID
+
+## Step 3g: Save Report & Present Results
+
+**Claude Code (local):**
+1. Save a summary report to `reports/<space_id>/optimization-report.md` in the project root.
+2. Inform the user of the saved file path.
+
+**Databricks notebook:**
+Create a new notebook code cell that renders the optimization report as cell output using `displayHTML()` or by printing the markdown string. Do not display the report only in the chat panel.
+
+The report should include:
+
+```markdown
+# Optimization Report: <space title>
+
+**Original Space ID:** `<original_space_id>`
+**New Space ID:** `<new_space_id>`
+**New Space URL:** `https://<workspace-host>/spaces/<new_space_id>`
+**Date:** <date>
+
+## Changes Applied
+
+### Data Source Changes (X total)
+1. ...
+
+### Instruction Changes (Y total)
+1. ...
+
+### Benchmark-Driven Changes (Z total)
+1. ...
+
+## Next Steps
+- Compare the original and optimized spaces side by side
+- Run benchmark analysis (Option B) on the new space to measure improvement
+- Review and adjust the new space's configuration in the Genie UI
+- Share the new space with your team for feedback
+```
diff --git a/databricks-skills/databricks-genie/scripts/create_optimized_space.py b/databricks-skills/databricks-genie/scripts/create_optimized_space.py
new file mode 100644
index 00000000..c53e3cd4
--- /dev/null
+++ b/databricks-skills/databricks-genie/scripts/create_optimized_space.py
@@ -0,0 +1,117 @@
+#!/usr/bin/env python3
+"""
+Create a new optimized Databricks Genie Space from an updated configuration.
+
+Usage: python create_optimized_space.py <original_space_id> <config_path>
+Output: JSON to stdout with new_space_id, new_space_title, original_space_id
+Exit codes: 0 success, 1 error (message to stderr)
+
+Requires:
+  - databricks-sdk >= 0.85 (pip install "databricks-sdk>=0.85")
+  - Databricks CLI profile configured (databricks configure)
+  - CAN EDIT permission on the original Genie Space
+"""
+
+import json
+import sys
+
+
+def create_optimized_space(original_space_id: str, updated_config: dict) -> dict:
+    """Create a new Genie Space from an updated serialized config."""
+    try:
+        from databricks.sdk import WorkspaceClient
+    except ImportError:
+        print(
+            'Error: databricks-sdk is not installed. 
Run: pip install "databricks-sdk>=0.85"', + file=sys.stderr, + ) + sys.exit(1) + + try: + client = WorkspaceClient() + except Exception as e: + print( + f"Error: Failed to initialize Databricks client. " + f"Ensure your CLI profile is configured (databricks configure).\n{e}", + file=sys.stderr, + ) + sys.exit(1) + + # Fetch original space to get warehouse_id and title + try: + original_space = client.genie.get_space(space_id=original_space_id) + except Exception as e: + error_msg = str(e) + if "PERMISSION_DENIED" in error_msg or "403" in error_msg: + print( + f"Error: Permission denied. You need CAN EDIT permission on space '{original_space_id}'.", + file=sys.stderr, + ) + elif "NOT_FOUND" in error_msg or "404" in error_msg: + print( + f"Error: Genie Space '{original_space_id}' not found. Check the space ID.", + file=sys.stderr, + ) + else: + print(f"Error: Failed to fetch space '{original_space_id}': {e}", file=sys.stderr) + sys.exit(1) + + if not original_space.warehouse_id: + print( + f"Error: Original space '{original_space_id}' has no warehouse_id. " + f"Cannot create a new space without a warehouse.", + file=sys.stderr, + ) + sys.exit(1) + + new_title = f"[Optimized] {original_space.title}" + + # Create the new space + try: + new_space = client.genie.create_space( + warehouse_id=original_space.warehouse_id, + serialized_space=json.dumps(updated_config, ensure_ascii=False), + title=new_title, + description=original_space.description, + ) + except Exception as e: + error_msg = str(e) + if "PERMISSION_DENIED" in error_msg or "403" in error_msg: + print( + "Error: Permission denied. 
You may not have permission to create Genie Spaces.", + file=sys.stderr, + ) + else: + print(f"Error: Failed to create new space: {e}", file=sys.stderr) + sys.exit(1) + + return { + "new_space_id": new_space.space_id, + "new_space_title": new_title, + "original_space_id": original_space_id, + } + + +if __name__ == "__main__": + if len(sys.argv) != 3: + print( + "Usage: python create_optimized_space.py ", + file=sys.stderr, + ) + sys.exit(1) + + original_space_id = sys.argv[1] + config_path = sys.argv[2] + + try: + with open(config_path, "r", encoding="utf-8") as f: + updated_config = json.load(f) + except FileNotFoundError: + print(f"Error: Config file not found: {config_path}", file=sys.stderr) + sys.exit(1) + except json.JSONDecodeError as e: + print(f"Error: Invalid JSON in config file: {e}", file=sys.stderr) + sys.exit(1) + + result = create_optimized_space(original_space_id, updated_config) + print(json.dumps(result, indent=2, ensure_ascii=False)) diff --git a/databricks-skills/databricks-genie/scripts/fetch_space.py b/databricks-skills/databricks-genie/scripts/fetch_space.py new file mode 100644 index 00000000..a77cafc3 --- /dev/null +++ b/databricks-skills/databricks-genie/scripts/fetch_space.py @@ -0,0 +1,84 @@ +#!/usr/bin/env python3 +""" +Fetch a Databricks Genie Space's serialized configuration. + +Usage: python fetch_space.py +Output: JSON to stdout +Exit codes: 0 success, 1 error (message to stderr) + +Requires: + - databricks-sdk >= 0.85 (pip install "databricks-sdk>=0.85") + - Databricks CLI profile configured (databricks configure) + - CAN EDIT permission on the target Genie Space +""" + +import json +import sys + + +def fetch_space(space_id: str) -> dict: + """Fetch a Genie Space with its serialized configuration.""" + try: + from databricks.sdk import WorkspaceClient + except ImportError: + print( + 'Error: databricks-sdk is not installed. 
Run: pip install "databricks-sdk>=0.85"', + file=sys.stderr, + ) + sys.exit(1) + + try: + client = WorkspaceClient() + except Exception as e: + print( + f"Error: Failed to initialize Databricks client. " + f"Ensure your CLI profile is configured (databricks configure).\n{e}", + file=sys.stderr, + ) + sys.exit(1) + + try: + space = client.genie.get_space( + space_id=space_id, + include_serialized_space=True, + ) + except Exception as e: + error_msg = str(e) + if "PERMISSION_DENIED" in error_msg or "403" in error_msg: + print( + f"Error: Permission denied. You need CAN EDIT permission on space '{space_id}'.", + file=sys.stderr, + ) + elif "NOT_FOUND" in error_msg or "404" in error_msg: + print( + f"Error: Genie Space '{space_id}' not found. Check the space ID.", + file=sys.stderr, + ) + else: + print(f"Error: Failed to fetch space '{space_id}': {e}", file=sys.stderr) + sys.exit(1) + + if not space.serialized_space: + print( + "Error: Could not retrieve serialized_space. " + "Ensure you have CAN EDIT permission on the Genie Space.", + file=sys.stderr, + ) + sys.exit(1) + + return { + "title": space.title, + "description": space.description, + "space_id": space_id, + "warehouse_id": space.warehouse_id, + "serialized_space": json.loads(space.serialized_space), + } + + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: python fetch_space.py ", file=sys.stderr) + sys.exit(1) + + result = fetch_space(sys.argv[1]) + print(json.dumps(result, indent=2, ensure_ascii=False)) diff --git a/databricks-skills/databricks-genie/scripts/run_benchmark.py b/databricks-skills/databricks-genie/scripts/run_benchmark.py new file mode 100644 index 00000000..a6a75995 --- /dev/null +++ b/databricks-skills/databricks-genie/scripts/run_benchmark.py @@ -0,0 +1,151 @@ +#!/usr/bin/env python3 +""" +Run a single benchmark question against a Databricks Genie Space. 
+ +Usage: python run_benchmark.py +Output: JSON to stdout +Exit codes: 0 = result available (success or Genie-level failure), 1 = script-level error (stderr) + +Requires: + - databricks-sdk >= 0.85 (pip install "databricks-sdk>=0.85") + - Databricks CLI profile configured (databricks configure) + - CAN EDIT permission on the target Genie Space +""" + +import json +import sys +import time +from datetime import timedelta + + +def run_benchmark(space_id: str, question: str) -> dict: + """Send a question to Genie and return the structured result.""" + try: + from databricks.sdk import WorkspaceClient + from databricks.sdk.service.dashboards import MessageStatus + except ImportError: + print( + 'Error: databricks-sdk is not installed. Run: pip install "databricks-sdk>=0.85"', + file=sys.stderr, + ) + sys.exit(1) + + try: + client = WorkspaceClient() + except Exception as e: + print( + f"Error: Failed to initialize Databricks client. " + f"Ensure your CLI profile is configured (databricks configure).\n{e}", + file=sys.stderr, + ) + sys.exit(1) + + result = { + "space_id": space_id, + "question": question, + "status": None, + "generated_sql": None, + "query_description": None, + "text_response": None, + "error": None, + } + + try: + start = client.genie.start_conversation( + space_id=space_id, + content=question, + ) + except Exception as e: + error_msg = str(e) + if "PERMISSION_DENIED" in error_msg or "403" in error_msg: + print( + f"Error: Permission denied on space '{space_id}'.", + file=sys.stderr, + ) + sys.exit(1) + if "NOT_FOUND" in error_msg or "404" in error_msg: + print( + f"Error: Genie Space '{space_id}' not found.", + file=sys.stderr, + ) + sys.exit(1) + result["status"] = "ERROR" + result["error"] = f"SDK error: {error_msg}" + return result + + conversation_id = start.response.conversation_id + message_id = start.response.message_id + deadline = time.time() + timedelta(minutes=5).total_seconds() + sleep_seconds = 1 + message = None + active_statuses = { + 
MessageStatus.SUBMITTED, + MessageStatus.ASKING_AI, + MessageStatus.FETCHING_METADATA, + MessageStatus.FILTERING_CONTEXT, + MessageStatus.PENDING_WAREHOUSE, + MessageStatus.EXECUTING_QUERY, + } + + while time.time() < deadline: + try: + message = client.genie.get_message( + space_id=space_id, + conversation_id=conversation_id, + message_id=message_id, + ) + except Exception as e: + result["status"] = "ERROR" + result["error"] = f"SDK error while polling message: {e}" + return result + + if message.status == MessageStatus.COMPLETED: + result["status"] = "COMPLETED" + break + + if message.status == MessageStatus.FAILED: + result["status"] = "FAILED" + result["error"] = ( + message.error.message if message.error else "Unknown Genie error" + ) + return result + + if message.status in active_statuses: + time.sleep(min(sleep_seconds, 10)) + sleep_seconds += 1 + continue + + result["status"] = "ERROR" + result["error"] = ( + f"Genie returned terminal status '{message.status}' before completion" + ) + return result + + if result["status"] is None: + result["status"] = "TIMEOUT" + result["error"] = "Genie did not respond within 5 minutes" + return result + + if message.attachments: + for attachment in message.attachments: + if attachment.query and attachment.query.query: + result["generated_sql"] = attachment.query.query + if attachment.query.description: + result["query_description"] = attachment.query.description + break + if attachment.text and attachment.text.content: + result["text_response"] = attachment.text.content + + return result + + +if __name__ == "__main__": + if len(sys.argv) != 3: + print( + "Usage: python run_benchmark.py ", + file=sys.stderr, + ) + sys.exit(1) + + output = run_benchmark(sys.argv[1], sys.argv[2]) + print(json.dumps(output, indent=2, ensure_ascii=False)) diff --git a/databricks-skills/databricks-genie/spaces.md b/databricks-skills/databricks-genie/spaces.md index 8549d6bd..cd62e536 100644 --- a/databricks-skills/databricks-genie/spaces.md 
+++ b/databricks-skills/databricks-genie/spaces.md @@ -182,6 +182,15 @@ The tool finds the existing space by name and updates it. 6. **Test** in the Databricks UI +## Advanced Optimization + +For deeper quality tuning after initial setup: + +- Use `scripts/fetch_space.py` to retrieve serialized config for a target space +- Run the best-practices workflow in `references/workflow-analyze.md` +- Run benchmark-driven SQL analysis in `references/workflow-benchmark.md` +- Use `references/workflow-optimize.md` with `scripts/create_optimized_space.py` to create an optimized copy + ## Troubleshooting ### No warehouse available diff --git a/databricks-skills/install_skills.sh b/databricks-skills/install_skills.sh index 30339ade..b060f754 100755 --- a/databricks-skills/install_skills.sh +++ b/databricks-skills/install_skills.sh @@ -62,7 +62,7 @@ get_skill_description() { "databricks-config") echo "Profile authentication setup for Databricks" ;; "databricks-dbsql") echo "Databricks SQL - SQL scripting, MVs, geospatial, AI functions, federation" ;; "databricks-docs") echo "Documentation reference via llms.txt" ;; - "databricks-genie") echo "Genie Spaces - create, curate, and query via Conversation API" ;; + "databricks-genie") echo "Genie Spaces - create, query, analyze, benchmark, and optimize via Genie API" ;; "databricks-iceberg") echo "Apache Iceberg - managed tables, UniForm, IRC, Snowflake interop, migration" ;; "databricks-jobs") echo "Databricks Lakeflow Jobs - workflow orchestration" ;; "databricks-python-sdk") echo "Databricks Python SDK, Connect, and REST API" ;; @@ -97,7 +97,7 @@ get_skill_extra_files() { case "$1" in "databricks-agent-bricks") echo "1-knowledge-assistants.md 2-supervisor-agents.md" ;; "databricks-aibi-dashboards") echo "widget-reference.md sql-patterns.md" ;; - "databricks-genie") echo "spaces.md conversation.md" ;; + "databricks-genie") echo "spaces.md conversation.md references/best-practices-checklist.md references/space-schema.md 
references/workflow-analyze.md references/workflow-benchmark.md references/workflow-optimize.md scripts/create_optimized_space.py scripts/fetch_space.py scripts/run_benchmark.py" ;; "databricks-asset-bundles") echo "alerts_guidance.md SDP_guidance.md" ;; "databricks-iceberg") echo "1-managed-iceberg-tables.md 2-uniform-and-compatibility.md 3-iceberg-rest-catalog.md 4-snowflake-interop.md 5-external-engine-interop.md" ;; "databricks-app-apx") echo "backend-patterns.md best-practices.md frontend-patterns.md" ;;