This document describes how to contribute to the openml-python package. If you are interested in connecting a machine learning package with OpenML (i.e., writing an openml-python extension) or want to find other ways to contribute, see this page.
The scope of the OpenML Python package is to provide a Python interface to the OpenML platform which integrates well with Python's scientific stack, most notably numpy, scipy and pandas. To reduce opportunity costs and demonstrate the usage of the package, it also implements an interface to the most popular machine learning package written in Python, scikit-learn. Thereby it will automatically be compatible with many machine learning libraries written in Python.
We aim to keep the package as light-weight as possible, and we will try to keep the number of potential installation dependencies as low as possible. Therefore, the connection to other machine learning libraries such as pytorch, keras or tensorflow should not be done directly inside this package, but in a separate package using the OpenML Python connector. More information on OpenML Python connectors can be found here.
Great! You've decided you want to help out. Now what? All contributions should be linked to issues on the GitHub issue tracker. In particular for new contributors, the good first issue label should help you find issues which are suitable for beginners. Resolving these issues allows you to start contributing to the project without much prior knowledge. Your assistance in this area will be greatly appreciated by the more experienced developers as it helps free up their time to concentrate on other issues.
If you encounter a particular part of the documentation or code that you want to improve, but there is no related open issue yet, open one first. This is important since you can first get feedback or pointers from experienced contributors.
To let everyone know you are working on an issue, please leave a comment that states you will work on the issue (or, if you have the permission, assign yourself to the issue). This avoids double work!
To contribute to the openml-python package, follow these steps:
- Determine how you want to contribute (see above).
- Set up your local development environment.
    - Fork and clone the openml-python repository. Then, create a new branch from the main branch. If you are new to git, see our detailed documentation, or rely on your favorite IDE.
    - Install the local dependencies to run the tests for your contribution.
    - Test your installation to ensure everything is set up correctly.
- Implement your contribution. If contributing to the documentation, see here.
- Create a pull request.
We recommend following the instructions below to install all requirements locally.
However, it is also possible to use the openml-python docker image for testing and building documentation. Moreover, feel free to use any alternative package managers, such as pip.
- To ensure a smooth development experience, we recommend using the uv package manager. Thus, first install uv. If any Python version already exists on your system, follow the steps below, otherwise see here.
    pip install uv
- Create a virtual environment using uv and activate it. This will ensure that the dependencies for openml-python do not interfere with other Python projects on your system.
    uv venv --seed --python 3.8 ~/.venvs/openml-python
    source ~/.venvs/openml-python/bin/activate
    pip install uv # Install uv within the virtual environment
- Then install openml with its test dependencies and set up the pre-commit hooks by running the following from the repository folder. After this, you can run the unit tests as well as the pre-commit checks:
    uv pip install -e .[test]
    pre-commit install
To test your installation and run the tests for the first time, run the following from the repository folder:
pytest tests
For Windows systems, you may need to add pytest to PATH before executing the command.
To run only part of the test suite, specify the module, the test case, or the individual test. The commands below run a specific module, test case, and unit test, respectively:
pytest tests/test_datasets/test_dataset.py
pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest
pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest::test_get_data
To test your new contribution, add unit tests, and, if needed, examples for any new functionality being introduced. Some notes on unit tests and examples:
- If a unit test contains an upload to the test server, please ensure that it is followed by a file collection for deletion, to prevent the test server from bulking up. For example, TestBase._mark_entity_for_removal('data', dataset.dataset_id) or TestBase._mark_entity_for_removal('flow', (flow.flow_id, flow.name)).
- Please ensure that the example is run on the test server by beginning with the call to openml.config.start_using_configuration_for_example(), which is done by default for tests derived from TestBase.
- Add the @pytest.mark.sklearn marker to your unit tests if they have a dependency on scikit-learn.
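As a rough illustration of the first note, the sketch below shows where the removal call sits inside a test: directly after the upload, before any further assertions. It is only a sketch under stated assumptions: StandInTestBase and FakeDataset are hypothetical stand-ins invented here so that the snippet runs without the test server or openml installed, and the dataset ID is made up. In a real test you would derive from TestBase and publish a real object.

```python
import unittest


class StandInTestBase(unittest.TestCase):
    """Hypothetical stand-in for TestBase; it only records what was marked."""

    # Shared on purpose: the marks are collected across all tests of a run.
    marked_for_removal = []

    @classmethod
    def _mark_entity_for_removal(cls, entity_type, entity_id):
        cls.marked_for_removal.append((entity_type, entity_id))


class FakeDataset:
    """Hypothetical object standing in for a dataset uploaded to the test server."""

    def __init__(self):
        self.dataset_id = None

    def publish(self):
        # Made-up ID; a real upload would receive its ID from the test server.
        self.dataset_id = 12345
        return self


class UploadTest(StandInTestBase):
    def test_upload_is_marked_for_removal(self):
        dataset = FakeDataset().publish()
        # Mark immediately after the upload, so the entity is cleaned up
        # even if one of the later assertions fails.
        StandInTestBase._mark_entity_for_removal("data", dataset.dataset_id)
        self.assertIn(("data", 12345), StandInTestBase.marked_for_removal)
```

Placing the call right after the upload, rather than at the end of the test, keeps the test server clean when the test fails halfway through.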
Some tests require admin privileges on the test server and will be automatically skipped unless you provide an admin API key. For regular contributors, the tests will skip gracefully. For core contributors who need to run these tests locally, you can set up the key by exporting the variable as below before running the tests:
# For windows
$env:OPENML_TEST_SERVER_ADMIN_KEY = "admin-key"
# For linux/mac
export OPENML_TEST_SERVER_ADMIN_KEY="admin-key"
You can go to the openml-python GitHub repository to create the pull request by comparing the branch from your fork with the main branch of the openml-python repository. When creating a pull request, make sure to follow the comments and structure provided by the template on GitHub.
An incomplete contribution -- where you expect to do more work before receiving a full review -- should be submitted as a draft. These may be useful to: indicate you are working on something to avoid duplicated work, request broad review of functionality or API, or seek collaborators. Drafts often benefit from the inclusion of a task list in the PR description.
The preferred workflow for contributing to openml-python is to fork the main repository on GitHub, clone, check out the branch main, and develop on a new branch. Steps:
- Make sure you have git installed, and a GitHub account.
- Fork the project repository by clicking on the 'Fork' button near the top right of the page. This creates a copy of the code under your GitHub user account. For more details on how to fork a repository see this guide.
- Clone your fork of the openml-python repo from your GitHub account to your local disk:
    git clone git@github.com:YourLogin/openml-python.git
    cd openml-python
- Switch to the main branch:
    git checkout main
- Create a feature branch to hold your development changes:
    git checkout -b feature/my-feature
  Always use a feature branch. It's good practice to never work on the main branch! To make the nature of your pull request easily visible, please prefix the name of the branch with the type of changes you want to merge, such as feature if it contains a new feature, fix for a bugfix, doc for documentation and maint for other maintenance on the package.
- Develop the feature on your feature branch. Stage changed files with git add, then commit them with git commit:
    git add modified_files
    git commit
  to record your changes in Git, then push the changes to your GitHub account with:
    git push -u origin feature/my-feature
- Follow these instructions to create a pull request from your fork.
(If any of the above seems like magic to you, please look up the Git documentation on the web, or ask a friend or another contributor for help.)
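The branch-naming convention from the steps above can also be checked mechanically. The following is a minimal sketch, not something the project is stated to ship: the helper name branch_type is hypothetical, and the "/" separator is an assumption taken from the feature/my-feature example.

```python
from typing import Optional

# Change types named in the workflow description above.
ALLOWED_PREFIXES = ("feature", "fix", "doc", "maint")


def branch_type(branch_name: str) -> Optional[str]:
    """Return the change-type prefix of ``branch_name``, or None if it has none.

    Assumes the prefix is separated from the rest of the name by a slash.
    """
    prefix, separator, rest = branch_name.partition("/")
    if separator and rest and prefix in ALLOWED_PREFIXES:
        return prefix
    return None


if __name__ == "__main__":
    for name in ("feature/my-feature", "fix/typo-in-docs", "main", "my-feature"):
        print(name, "->", branch_type(name))
```

A name such as main or my-feature yields None, which is a hint to rename the branch before opening the pull request.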
Pre-commit is used for various style checking and code formatting. Before each commit, it will automatically run:
- ruff, a code formatter and linter. This will automatically format your code. Make sure to take a second look after any formatting takes place; if the resulting code is very bloated, consider a (small) refactor.
- mypy, a static type checker. In particular, make sure each function you work on has type hints.
If you want to run the pre-commit tests without doing a commit, run:
make check
or on a system without make, like Windows:
pre-commit run --all-files
Make sure to do this at least once before your first commit to check your setup works.
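If you prefer a single entry point that works with or without make, the choice between the two commands can be scripted. This is a small sketch under assumptions: the helper choose_check_command is hypothetical and not part of the repository, and it only uses the two commands given above.

```python
import shutil
from typing import Callable, List, Optional


def choose_check_command(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Pick the command that runs the pre-commit checks without committing.

    ``which`` is injectable so the choice can be tested without touching PATH.
    """
    if which("make") is not None:
        return ["make", "check"]
    # Fallback for systems without make, such as Windows.
    return ["pre-commit", "run", "--all-files"]


if __name__ == "__main__":
    # Print the command that would be used on this machine.
    print(" ".join(choose_check_command()))
```

To actually run the checks, pass the returned list to subprocess.run(command, check=True) from the repository folder.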
We welcome all forms of documentation contributions, whether they are Markdown docstrings, tutorials, guides, or general improvements.
Our documentation is written either in Markdown or as Jupyter notebooks and lives in the docs/ and examples/ directories of the source code repository.
To preview the documentation locally, you will need to install a few additional dependencies:
uv pip install -e .[examples,docs]
When the dependencies are installed, run:
mkdocs serve
This starts a local server with a live preview of the website; the address is printed in the terminal.