examples/ai-test-agents/goose/README.md
# Goose AI Developer Agent for PyAirbyte

This example demonstrates how to use [Goose](https://github.com/block/goose), an open-source AI developer agent from Block (formerly Square), to test and validate PyAirbyte functionality.

## About Goose

Goose is an AI developer agent that runs on your machine and helps with coding tasks. It can execute code, run tests, debug issues, and automate development workflows. It works with many LLM providers and can be extended with MCP (Model Context Protocol) servers.

## Prerequisites

1. Goose installed on your system
2. Python 3.10 or higher
3. PyAirbyte installed
4. LLM API key (OpenAI, Anthropic, or other supported provider)
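
A short preflight script can confirm these prerequisites before you start a session. The sketch below uses only the standard library. It assumes the executables are named `goose` and `airbyte-mcp` and that your provider key is stored in `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`; all of these are assumptions to adjust for your setup.

```python
"""Preflight check for the Goose + PyAirbyte example (a sketch)."""

import importlib.util
import os
import shutil
import sys

# Assumed environment variable names; adjust for the provider Goose uses.
API_KEY_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def preflight() -> dict[str, bool]:
    """Return a mapping of prerequisite name -> whether it is satisfied."""
    return {
        # PyAirbyte needs Python 3.10 or higher.
        "python>=3.10": sys.version_info >= (3, 10),
        # Goose CLI on PATH (assumes the executable is named `goose`).
        "goose": shutil.which("goose") is not None,
        # PyAirbyte is installed as the `airbyte` package.
        "airbyte": importlib.util.find_spec("airbyte") is not None,
        # Only needed for the MCP integration described later in this README.
        "airbyte-mcp": shutil.which("airbyte-mcp") is not None,
        # At least one LLM provider key is set in the environment.
        "llm_api_key": any(os.environ.get(var) for var in API_KEY_VARS),
    }


for name, ok in preflight().items():
    print(f"{'✓' if ok else '✗'} {name}")
```

Each ✗ line names a prerequisite to fix before continuing.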

## Installation

### Install Goose

**Option 1: Using Homebrew (macOS/Linux)**
```bash
brew install block/goose/goose
```

**Option 2: Using pipx (Cross-platform)**
```bash
pipx install goose-ai
```

**Option 3: Using cargo (Rust)**
```bash
cargo install goose-cli
```

### Install PyAirbyte

```bash
pip install airbyte
```

## Usage

### Interactive Mode

Start Goose in interactive mode and ask it to test PyAirbyte:

```bash
goose session start
```

Then provide prompts like:

```
Test PyAirbyte by:
1. Creating a source-faker connector with count=10
2. Reading data into a local cache
3. Validating that data was successfully read
4. Printing the first few records
```

### Session File Mode

You can also start Goose with a plan file that lists predefined tasks:

```bash
goose session start --plan test_pyairbyte_session.md
```

See `test_pyairbyte_session.md` for an example session plan.

### Using Goose with MCP

Goose can integrate with PyAirbyte's MCP server to access connector functionality:

1. Start PyAirbyte's MCP server:
```bash
airbyte-mcp
```

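2. Configure Goose to use the MCP server, for example by adding the following to `~/.config/goose/profiles.yaml` (configuration keys differ between Goose releases, so check the Goose documentation for your version):
```yaml
default:
  provider: openai
  processor: gpt-4o
  accelerator: gpt-4o-mini
  moderator: passive
  mcp_servers:
    airbyte:
      command: airbyte-mcp
```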

3. Start Goose; it will then have access to PyAirbyte's 44+ MCP tools.

## Example Tasks

### Task 1: Basic Connector Test

Ask Goose to:
```
Write and run a Python script that:
1. Imports airbyte
2. Creates a source-faker connector
3. Checks the connection
4. Reads 10 records into a local cache
5. Validates the data and prints results
```

### Task 2: Connector Discovery

Ask Goose to:
```
Write a script to discover all available PyAirbyte source connectors
and verify that source-faker is in the list
```

### Task 3: Data Validation

Ask Goose to:
```
Create a test that:
1. Reads data from source-faker
2. Validates the schema of the returned data
3. Checks that all expected columns are present
4. Verifies data types are correct
```
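
The checks in Task 3 can be expressed as a small, dependency-free helper. This is a sketch that assumes the records have already been converted to plain Python dicts; the expected column-to-type mapping is a hypothetical example, not the actual source-faker schema.

```python
from typing import Any


def validate_records(
    records: list[dict[str, Any]],
    expected_types: dict[str, tuple[type, ...]],
) -> list[str]:
    """Return a list of problems; an empty list means validation passed."""
    if not records:
        return ["no records were read"]

    problems: list[str] = []
    for i, record in enumerate(records):
        # Column presence: every expected column must appear in the record.
        missing = sorted(set(expected_types) - set(record))
        if missing:
            problems.append(f"record {i}: missing columns {missing}")

        # Type checks: None is treated as nullable; otherwise require a listed type.
        for column, allowed in expected_types.items():
            value = record.get(column)
            if value is not None and not isinstance(value, allowed):
                problems.append(
                    f"record {i}: {column!r} is {type(value).__name__}, "
                    f"expected one of {[t.__name__ for t in allowed]}"
                )
    return problems


# Hypothetical expectations for a "users"-like stream.
EXPECTED = {"id": (int, str), "name": (str,), "email": (str,)}

sample = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": "2", "name": "Grace", "email": None},
]
print(validate_records(sample, EXPECTED))  # -> []
```

Returning a list of problems rather than failing on the first one lets Goose report every schema issue in a single run.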

## Example Session Output

When you run Goose with the test tasks, you should see output similar to the following (the exact wording depends on the model):

```
🪿 Goose: I'll help you test PyAirbyte. Let me create a test script...

[Goose creates and runs a Python script]

✓ Successfully created source-faker connector
✓ Connection check passed
✓ Read 10 records into cache
✓ Data validation passed

Results:
- Stream: users
- Records: 10
- Columns: id, name, email, created_at
- All validations passed ✓
```

## Advantages of Using Goose

1. **Interactive Testing**: Goose can interactively test PyAirbyte and adapt based on results
2. **Code Generation**: Automatically generates test scripts and validation code
3. **Error Handling**: Can debug and fix issues it encounters during testing
4. **MCP Integration**: Can leverage PyAirbyte's MCP server for advanced operations
5. **Multi-step Workflows**: Can execute complex test scenarios with multiple steps

## Comparison with Hercules

While Hercules uses predefined Gherkin scenarios for testing, Goose provides:
- More flexible, conversational testing approach
- Ability to adapt tests based on intermediate results
- Code generation and debugging capabilities
- Integration with development tools and MCP servers

## Limitations

- Goose requires an LLM API key and makes API calls for each interaction
- Results may vary based on the LLM model used
- Less deterministic than traditional test frameworks
- Best suited for exploratory testing and development tasks

## Additional Resources

- [Goose Documentation](https://block.github.io/goose/)
- [Goose GitHub Repository](https://github.com/block/goose)
- [PyAirbyte Documentation](https://docs.airbyte.com/using-airbyte/pyairbyte/getting-started)
- [PyAirbyte MCP Server](https://docs.airbyte.com/using-airbyte/pyairbyte/mcp-server)
examples/ai-test-agents/goose/goose_demo.py
"""
Example test script for PyAirbyte that can be executed by Goose.

This script demonstrates basic PyAirbyte functionality that Goose can run
to validate the library is working correctly.
"""

import airbyte as ab


def example_basic_source_connector():
    """Example: Test creating and reading from a source connector."""
    print("Testing basic source connector functionality...")

    source = ab.get_source(
        "source-faker", config={"count": 10}, install_if_missing=True
    )

    print("Checking connection...")
    source.check()
    print("✓ Connection check passed")

    # PyAirbyte reads only selected streams; this test needs just "users".
    source.select_streams(["users"])

    print("Reading data into cache...")
    cache = ab.new_local_cache()
    source.read(cache=cache)

    df = cache["users"].to_pandas()

    assert len(df) > 0, "No data was read from source"
    assert "id" in df.columns, "Expected 'id' column not found"

    print(f"✓ Successfully read {len(df)} records from source-faker")
    print(f"✓ Columns: {', '.join(df.columns)}")

    return True


def example_connector_discovery():
    """Example: Test discovering available connectors."""
    print("\nTesting connector discovery...")

    # get_available_connectors() returns connector names as strings;
    # keep only sources in case other connector types are included.
    sources = [
        name for name in ab.get_available_connectors() if name.startswith("source-")
    ]

    assert len(sources) > 0, "No source connectors found"

    faker_found = "source-faker" in sources
    assert faker_found, "source-faker not found in available connectors"

    print(f"✓ Found {len(sources)} source connectors")
    print("✓ source-faker is available")

    return True


def main():
    """Run all tests."""
    print("=" * 60)
    print("PyAirbyte Test Suite")
    print("=" * 60)

    try:
        example_basic_source_connector()
        example_connector_discovery()

        print("\n" + "=" * 60)
        print("All tests passed! ✓")
        print("=" * 60)

    except Exception as e:
        print(f"\n✗ Test failed: {e}")
        raise


if __name__ == "__main__":
    main()
examples/ai-test-agents/goose/test_pyairbyte_session.md
# Goose Session: Test PyAirbyte Functionality

This session file contains a series of tasks for Goose to execute when testing PyAirbyte.

## Session Goal

Test basic PyAirbyte functionality including connector creation, data reading, and validation.

## Tasks

### Task 1: Test Basic Source Connector

Create and run a Python script that:

1. Imports the airbyte library
2. Creates a source-faker connector with the following configuration:
   - count: 10
3. Checks the connection using the `check()` method
4. Creates a local DuckDB cache
5. Selects the "users" stream and reads data from the source into the cache
6. Retrieves the data from the "users" stream as a pandas DataFrame
7. Validates that:
   - At least 1 record was read
   - The DataFrame contains an "id" column
8. Prints a success message with the number of records read

Expected output: Script should execute successfully and print "Successfully read N records from source-faker"

### Task 2: Test Connector Discovery

Create and run a Python script that:

1. Imports the airbyte library
2. Uses `get_available_connectors()` to retrieve all available source connectors
3. Validates that:
   - At least one source connector is available
   - The "source-faker" connector is in the list
4. Prints the total number of source connectors found
5. Confirms that source-faker is available

Expected output: Script should print the number of connectors and confirm source-faker availability

### Task 3: Test Stream Selection

Create and run a Python script that:

1. Creates a source-faker connector
2. Discovers the available streams using `get_available_streams()`
3. Prints the list of available stream names
4. Selects only the "users" stream
5. Reads data from only the selected stream
6. Validates that only the "users" stream data is present in the cache

Expected output: Script should show available streams and confirm successful selective sync
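
The selective-sync check at the end of Task 3 amounts to set comparisons between the available, selected, and cached stream names. A minimal sketch follows. It takes plain iterables of names because how the script lists the streams present in the cache is left open here (an assumption, not a specific PyAirbyte call); the stream names in the usage lines are illustrative.

```python
from collections.abc import Iterable


def selective_sync_problems(
    available: Iterable[str],
    selected: Iterable[str],
    cached: Iterable[str],
) -> list[str]:
    """Compare stream names; an empty result means the selective sync looks right."""
    available_set, selected_set, cached_set = set(available), set(selected), set(cached)
    problems: list[str] = []

    # Only streams the source actually offers can be selected.
    if unknown := sorted(selected_set - available_set):
        problems.append(f"selected but not offered by the source: {unknown}")
    # Every selected stream should have landed in the cache...
    if missing := sorted(selected_set - cached_set):
        problems.append(f"selected but missing from the cache: {missing}")
    # ...and no unselected stream should have been synced.
    if extra := sorted(cached_set - selected_set):
        problems.append(f"in the cache but not selected: {extra}")
    return problems


problems = selective_sync_problems(
    available=["users", "products", "purchases"],
    selected=["users"],
    cached=["users"],
)
assert not problems, problems
print("✓ Only the selected stream was synced")
```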

### Task 4: Test Data Schema Validation

Create and run a Python script that:

1. Creates a source-faker connector
2. Reads data into a cache
3. Retrieves the "users" stream data
4. Validates the schema by checking for expected columns:
   - id
   - name (or similar name field)
   - email (or similar email field)
5. Prints the actual columns found
6. Validates that the data types are appropriate (e.g., id is numeric or string)

Expected output: Script should print the schema and confirm all expected columns are present
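
Task 4 accepts "name (or similar name field)" and "email (or similar email field)", so an exact column lookup may be too strict. A minimal sketch of that tolerance, assuming a case-insensitive substring match is good enough, is below. It tries an exact match first so that an `id` lookup is not satisfied by internal columns such as `_airbyte_raw_id`, which PyAirbyte caches typically add. The column list in the usage lines is illustrative, not the real source-faker schema.

```python
from __future__ import annotations

from collections.abc import Iterable


def find_similar_column(columns: Iterable[str], keyword: str) -> str | None:
    """Return the first column equal to, or containing, `keyword` (case-insensitive)."""
    cols = list(columns)
    key = keyword.lower()
    # Prefer an exact match so "id" is not matched by "_airbyte_raw_id".
    for col in cols:
        if col.lower() == key:
            return col
    # Fall back to a substring match, e.g. "name" -> "full_name".
    for col in cols:
        if key in col.lower():
            return col
    return None


columns = ["_airbyte_raw_id", "id", "full_name", "email_address", "created_at"]
print({k: find_similar_column(columns, k) for k in ("id", "name", "email")})
# -> {'id': 'id', 'name': 'full_name', 'email': 'email_address'}
```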

## Success Criteria

All four tasks should complete successfully with:
- No Python exceptions or errors
- All validation assertions passing
- Clear output messages confirming success
- Proper cleanup of resources

## Notes for Goose

- Use proper error handling in all scripts
- Print clear status messages for each step
- If any task fails, provide detailed error information
- Clean up resources (close connections, etc.) after each task
- Use the latest PyAirbyte API patterns