Stanford CS 349D - Optimizing UDFs with Early Filters

Overview

This project reduces the computational cost of calling user-defined functions (UDFs) in data queries by using probabilistic predicates (PPs) as early filters. We construct a PP for each simple clause and incorporate the PPs into a custom query optimizer, which applies them to discard inputs before the expensive UDF runs.
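
The early-filter idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: `filter_then_apply`, `cheap_score`, `expensive_udf`, and the threshold value are hypothetical stand-ins for a trained PP and an expensive UDF such as an object detector.

```python
from typing import Callable, Iterable, List, Tuple


def filter_then_apply(
    rows: Iterable[float],
    pp_score: Callable[[float], float],
    udf: Callable[[float], bool],
    threshold: float,
) -> Tuple[List[float], int]:
    """Run `udf` only on rows whose PP score reaches `threshold`.

    Returns the rows that satisfy the UDF and the number of UDF calls made.
    """
    matches: List[float] = []
    udf_calls = 0
    for row in rows:
        if pp_score(row) < threshold:
            continue  # the PP predicts the clause is unlikely to hold; skip the UDF
        udf_calls += 1
        if udf(row):
            matches.append(row)
    return matches, udf_calls


# Hypothetical stand-ins: a cheap score and an "expensive" exact predicate.
def cheap_score(x: float) -> float:
    return x / 10.0


def expensive_udf(x: float) -> bool:
    return x >= 7


rows = [1, 3, 5, 7, 8, 9]
matches, udf_calls = filter_then_apply(rows, cheap_score, expensive_udf, threshold=0.5)
print(matches, udf_calls)  # [7, 8, 9] 4
```

In this toy run the filter drops two of the six rows, so the UDF is called four times instead of six while every true match is still returned. A real PP can also drop true matches, which is the accuracy cost that its threshold controls.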

Getting Started

Initialize your environment by following these steps:

conda create --name cs349d python  # install Python (and pip) inside the new environment
conda activate cs349d
pip install -r requirements.txt
git submodule init
git submodule update

Directory Structure

data/: Datasets used to train and test the PP models.
dataloader/: Scripts that load and preprocess the data into the format required for model training.
ml_udf/: The user-defined functions (UDFs) for image datasets that the probabilistic predicates optimize, including YOLOv5 and FastRCNN.
pp_models/: The machine learning models that serve as probabilistic predicates. Each model predicts whether a UDF needs to run on a given input, so it can act as an early filter.
pp_params/: Parameter files for loading and tuning the PP models. These parameters affect both the models' accuracy and their efficiency.
query_optimizer/: The query optimizer module, which integrates the probabilistic predicates into query processing.
qo_test/: Test scripts that run the query optimizer on each simple clause.
query_test/: Scripts for testing the ML UDFs on simple queries.
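
As an illustration of how a PP parameter trades accuracy against efficiency, the sketch below picks a score threshold from labeled validation data. It is an assumption about what such tuning could look like, not the format or logic of the files in pp_params/; `pick_threshold`, `reduction`, and the sample scores are hypothetical.

```python
import math
from typing import Sequence


def pick_threshold(
    scores: Sequence[float], labels: Sequence[bool], target_recall: float
) -> float:
    """Return the largest threshold that keeps at least `target_recall`
    of the positive validation examples (score >= threshold means keep)."""
    positive_scores = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positive_scores:
        raise ValueError("need at least one positive example")
    # Number of positives that must stay at or above the threshold.
    keep = max(1, math.ceil(target_recall * len(positive_scores)))
    return positive_scores[keep - 1]


def reduction(scores: Sequence[float], threshold: float) -> float:
    """Fraction of inputs the filter discards, i.e. UDF calls saved."""
    return sum(s < threshold for s in scores) / len(scores)


# Hypothetical validation scores from a PP, with ground-truth UDF results.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [True, True, False, True, False, False, True, False]

threshold = pick_threshold(scores, labels, target_recall=0.75)
print(threshold, reduction(scores, threshold))  # 0.4 0.5
```

Here a 75% recall target yields a threshold of 0.4, which skips the UDF on half of the inputs. Raising the recall target lowers the threshold and shrinks the savings.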
