diff --git a/examples/ai-test-agents/goose/README.md b/examples/ai-test-agents/goose/README.md
new file mode 100644
index 000000000..48c4cc879
--- /dev/null
+++ b/examples/ai-test-agents/goose/README.md
@@ -0,0 +1,176 @@
+# Goose AI Developer Agent for PyAirbyte
+
+This example demonstrates how to use [Goose](https://github.com/block/goose), an open-source AI developer agent from Block (formerly Square), to test and validate PyAirbyte functionality.
+
+## About Goose
+
+Goose is an AI developer agent that runs on your machine and helps with coding tasks. It can execute code, run tests, debug issues, and automate development workflows. Goose supports any LLM provider and integrates with MCP (Model Context Protocol) servers.
+
+## Prerequisites
+
+1. Goose installed on your system
+2. Python 3.10 or higher
+3. PyAirbyte installed
+4. LLM API key (OpenAI, Anthropic, or other supported provider)
+
+## Installation
+
+### Install Goose
+
+**Option 1: Using Homebrew (macOS/Linux)**
+```bash
+brew install block/goose/goose
+```
+
+**Option 2: Using pipx (Cross-platform)**
+```bash
+pipx install goose-ai
+```
+
+**Option 3: Using cargo (Rust)**
+```bash
+cargo install goose-cli
+```
+
+### Install PyAirbyte
+
+```bash
+pip install airbyte
+```
+
+## Usage
+
+### Interactive Mode
+
+Start Goose in interactive mode and ask it to test PyAirbyte:
+
+```bash
+goose session start
+```
+
+Then provide prompts like:
+
+```
+Test PyAirbyte by:
+1. Creating a source-faker connector with count=10
+2. Reading data into a local cache
+3. Validating that data was successfully read
+4. Printing the first few records
+```
+
+### Session File Mode
+
+You can also create a session file with predefined tasks:
+
+```bash
+goose session start --plan test_pyairbyte_session.md
+```
+
+See `test_pyairbyte_session.md` for an example session plan.
+
+### Using Goose with MCP
+
+Goose can integrate with PyAirbyte's MCP server to access connector functionality:
+
+1. Start PyAirbyte's MCP server:
+```bash
+airbyte-mcp
+```
+
+2. Configure Goose to use the MCP server (add to `~/.config/goose/profiles.yaml`):
+```yaml
+default:
+  provider: openai
+  processor: gpt-4o
+  accelerator: gpt-4o-mini
+  moderator: passive
+  mcp_servers:
+    airbyte:
+      command: airbyte-mcp
+```
+
+3. Start Goose and it will have access to PyAirbyte's 44+ MCP tools
+
+## Example Tasks
+
+### Task 1: Basic Connector Test
+
+Ask Goose to:
+```
+Write and run a Python script that:
+1. Imports airbyte
+2. Creates a source-faker connector
+3. Checks the connection
+4. Reads 10 records into a local cache
+5. Validates the data and prints results
+```
+
+### Task 2: Connector Discovery
+
+Ask Goose to:
+```
+Write a script to discover all available PyAirbyte source connectors
+and verify that source-faker is in the list
+```
+
+### Task 3: Data Validation
+
+Ask Goose to:
+```
+Create a test that:
+1. Reads data from source-faker
+2. Validates the schema of the returned data
+3. Checks that all expected columns are present
+4. Verifies data types are correct
+```
+
+## Example Session Output
+
+When you run Goose with the test tasks, you should see output like:
+
+```
+🪿 Goose: I'll help you test PyAirbyte. Let me create a test script...
+
+[Goose creates and runs a Python script]
+
+✓ Successfully created source-faker connector
+✓ Connection check passed
+✓ Read 10 records into cache
+✓ Data validation passed
+
+Results:
+- Stream: users
+- Records: 10
+- Columns: id, name, email, created_at
+- All validations passed ✓
+```
+
+## Advantages of Using Goose
+
+1. **Interactive Testing**: Goose can interactively test PyAirbyte and adapt based on results
+2. **Code Generation**: Automatically generates test scripts and validation code
+3. **Error Handling**: Can debug and fix issues it encounters during testing
+4. **MCP Integration**: Can leverage PyAirbyte's MCP server for advanced operations
+5. **Multi-step Workflows**: Can execute complex test scenarios with multiple steps
+
+## Comparison with Hercules
+
+While Hercules uses predefined Gherkin scenarios for testing, Goose provides:
+- More flexible, conversational testing approach
+- Ability to adapt tests based on intermediate results
+- Code generation and debugging capabilities
+- Integration with development tools and MCP servers
+
+## Limitations
+
+- Goose requires an LLM API key and makes API calls for each interaction
+- Results may vary based on the LLM model used
+- Less deterministic than traditional test frameworks
+- Best suited for exploratory testing and development tasks
+
+## Additional Resources
+
+- [Goose Documentation](https://block.github.io/goose/)
+- [Goose GitHub Repository](https://github.com/block/goose)
+- [PyAirbyte Documentation](https://docs.airbyte.com/using-airbyte/pyairbyte/getting-started)
+- [PyAirbyte MCP Server](https://docs.airbyte.com/using-airbyte/pyairbyte/mcp-server)
diff --git a/examples/ai-test-agents/goose/goose_demo.py b/examples/ai-test-agents/goose/goose_demo.py
new file mode 100644
index 000000000..e99208810
--- /dev/null
+++ b/examples/ai-test-agents/goose/goose_demo.py
@@ -0,0 +1,77 @@
+"""
+Example test script for PyAirbyte that can be executed by Goose.
+
+This script demonstrates basic PyAirbyte functionality that Goose can run
+to validate the library is working correctly.
+"""
+
+import airbyte as ab
+
+
+def example_basic_source_connector():
+    """Example: Test creating and reading from a source connector."""
+    print("Testing basic source connector functionality...")
+
+    source = ab.get_source(
+        "source-faker", config={"count": 10}, install_if_missing=True
+    )
+
+    print("Checking connection...")
+    source.check()
+    print("✓ Connection check passed")
+
+    print("Reading data into cache...")
+    cache = ab.new_local_cache()
+    result = source.read(cache)
+
+    df = cache["users"].to_pandas()
+
+    assert len(df) > 0, "No data was read from source"
+    assert "id" in df.columns, "Expected 'id' column not found"
+
+    print(f"✓ Successfully read {len(df)} records from source-faker")
+    print(f"✓ Columns: {', '.join(df.columns)}")
+
+    return True
+
+
+def example_connector_discovery():
+    """Example: Test discovering available connectors."""
+    print("\nTesting connector discovery...")
+
+    from airbyte.registry import get_available_connectors
+
+    sources = get_available_connectors(connector_type="source")
+
+    assert len(sources) > 0, "No source connectors found"
+
+    faker_found = any(c.name == "source-faker" for c in sources)
+    assert faker_found, "source-faker not found in available connectors"
+
+    print(f"✓ Found {len(sources)} source connectors")
+    print("✓ source-faker is available")
+
+    return True
+
+
+def main():
+    """Run all tests."""
+    print("=" * 60)
+    print("PyAirbyte Test Suite")
+    print("=" * 60)
+
+    try:
+        example_basic_source_connector()
+        example_connector_discovery()
+
+        print("\n" + "=" * 60)
+        print("All tests passed! ✓")
+        print("=" * 60)
+
+    except Exception as e:
+        print(f"\n✗ Test failed: {e}")
+        raise
+
+
+if __name__ == "__main__":
+    main()
diff --git a/examples/ai-test-agents/goose/test_pyairbyte_session.md b/examples/ai-test-agents/goose/test_pyairbyte_session.md
new file mode 100644
index 000000000..54a710722
--- /dev/null
+++ b/examples/ai-test-agents/goose/test_pyairbyte_session.md
@@ -0,0 +1,86 @@
+# Goose Session: Test PyAirbyte Functionality
+
+This session file contains a series of tasks for Goose to execute when testing PyAirbyte.
+
+## Session Goal
+
+Test basic PyAirbyte functionality including connector creation, data reading, and validation.
+
+## Tasks
+
+### Task 1: Test Basic Source Connector
+
+Create and run a Python script that:
+
+1. Imports the airbyte library
+2. Creates a source-faker connector with the following configuration:
+   - count: 10
+3. Checks the connection using the `check()` method
+4. Creates a local DuckDB cache
+5. Reads data from the source into the cache
+6. Retrieves the data from the "users" stream as a pandas DataFrame
+7. Validates that:
+   - At least 1 record was read
+   - The DataFrame contains an "id" column
+8. Prints a success message with the number of records read
+
+Expected output: Script should execute successfully and print "Successfully read N records from source-faker"
+
+### Task 2: Test Connector Discovery
+
+Create and run a Python script that:
+
+1. Imports the airbyte library
+2. Uses `get_available_connectors()` to retrieve all available source connectors
+3. Validates that:
+   - At least one source connector is available
+   - The "source-faker" connector is in the list
+4. Prints the total number of source connectors found
+5. Confirms that source-faker is available
+
+Expected output: Script should print the number of connectors and confirm source-faker availability
+
+### Task 3: Test Stream Selection
+
+Create and run a Python script that:
+
+1. Creates a source-faker connector
+2. Discovers the available streams using `get_available_streams()`
+3. Prints the list of available stream names
+4. Selects only the "users" stream
+5. Reads data from only the selected stream
+6. Validates that only the "users" stream data is present in the cache
+
+Expected output: Script should show available streams and confirm successful selective sync
+
+### Task 4: Test Data Schema Validation
+
+Create and run a Python script that:
+
+1. Creates a source-faker connector
+2. Reads data into a cache
+3. Retrieves the "users" stream data
+4. Validates the schema by checking for expected columns:
+   - id
+   - name (or similar name field)
+   - email (or similar email field)
+5. Prints the actual columns found
+6. Validates that the data types are appropriate (e.g., id is numeric or string)
+
+Expected output: Script should print the schema and confirm all expected columns are present
+
+## Success Criteria
+
+All four tasks should complete successfully with:
+- No Python exceptions or errors
+- All validation assertions passing
+- Clear output messages confirming success
+- Proper cleanup of resources
+
+## Notes for Goose
+
+- Use proper error handling in all scripts
+- Print clear status messages for each step
+- If any task fails, provide detailed error information
+- Clean up resources (close connections, etc.) after each task
+- Use the latest PyAirbyte API patterns
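
Note on Task 4 of the session plan: it describes schema validation, but `goose_demo.py` has no matching example. The following is a minimal sketch of that check and is not part of the diff. It works on plain dictionaries so it can run without a connector. The helper name `validate_records` and the `EXPECTED_COLUMNS` set are hypothetical, and the column names are assumptions taken from the plan and the example session output. One way to feed it real data would be `cache["users"].to_pandas().to_dict("records")`, which relies on standard pandas behaviour rather than anything shown in this PR.

```python
from collections.abc import Iterable
from typing import Any

# Hypothetical expected columns, assumed from the session plan (id, name, email).
EXPECTED_COLUMNS = {"id", "name", "email"}


def validate_records(records: Iterable[dict[str, Any]], expected: set[str]) -> list[str]:
    """Return a list of problems found; an empty list means the check passed."""
    rows = list(records)
    if not rows:
        return ["no records were read"]

    problems: list[str] = []
    for index, row in enumerate(rows):
        # Report any expected column that this record does not carry.
        missing = expected.difference(row)
        if missing:
            problems.append(f"record {index}: missing columns {sorted(missing)}")
        # Task 4 asks that id be numeric or string.
        if "id" in row and not isinstance(row["id"], (int, float, str)):
            problems.append(
                f"record {index}: id has unexpected type {type(row['id']).__name__}"
            )
    return problems


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": "Ada", "email": "ada@example.com"},
        {"id": 2, "name": "Grace"},  # deliberately missing "email"
    ]
    for problem in validate_records(sample, EXPECTED_COLUMNS):
        print(problem)
```

Returning a list of problems instead of raising on the first failure lets an agent such as Goose report every schema issue in one pass, which fits the plan's request for detailed error information.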