The Reinforcement Learning for the Computing Continuum (RL4CC) library provides a common interface to define environments and RL algorithms based on the Ray RLlib library[^1].
It includes the following components:
- a simple `Environment` implementation, which should be used as a base class when defining more complex problems. It is created by loading the parameters included in the `env_config` configuration.
- an `Algorithm` class, used to define RL algorithms for training/hyperparameter tuning experiments, supported by a factory of Ray `AlgorithmConfig` generators.
- a simple `Callbacks` implementation, which should be used as a base class when defining more complex problems.
- two simple custom neural network models, based on PyTorch and TensorFlow, that can be used as starting points to implement more complex networks if needed.
- a `TrainingExperiment` class, to be used as the entrypoint to define training experiments, as explained in the following section.
- a `Tuner` class, used to define hyperparameter tuning experiments, supported by the `Algorithm` class (which works as the trainable) and by a `TuneConfig` and `RunConfig` generator.
- a `TuningExperiment` class, to be used as the entrypoint to define automatic hyperparameter tuning, as explained in the following section.
- a simple `ProgressReporter` for Ray Tune, which periodically logs information related to the number of executed trials, the hardware resource usage and the optimization process to the `exp_progress.json` file (see the section on expected outputs), instead of writing it to the console. To use the provided reporter (or a similar user-defined one), configure it through the `tune_config` dictionary or JSON file as explained in the README.
- a `Logger`, which can be used to print `INFO`, `WARNING` and `ERROR` messages in a standard format.
Detailed information about these components is provided in the following sections.
To build the RL4CC library, the RL4CC module needs to be installed as a package, so that its classes and functions can be imported with `from RL4CC.x.y import z`.
To install RL4CC as a package, move to the repository's main directory (at the same level as `setup.py`), with the virtual environment that contains the RL4CC dependencies activated. Then run:

```shell
pip3 install .
```

If you now check the installed packages with `pip3 freeze`, you will find RL4CC among the dependencies.
To define and start a training experiment exploiting one of the available algorithms:

- define the `exp_config` configuration (and, if no previous checkpoint is provided, the `env_config` and `ray_config` configurations) as detailed in the README. These configurations can be defined in Python as dictionaries or using JSON files;
- initialize a `TrainingExperiment` object by passing the `exp_config` configuration or a path to the `exp_config.json` file;
- call the `TrainingExperiment.run()` method.
Example using the predefined `BaseEnvironment` and `BaseCallbacks` classes and with a JSON file for `exp_config`:

```python
from RL4CC.experiments.train import TrainingExperiment

exp = TrainingExperiment(exp_config_file="config_files/exp_config.json")
exp.run()
```
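For reference, an `exp_config.json` might look like the following sketch. The key names here are illustrative assumptions based on the options described in this document (log directory, checkpoint and evaluation frequencies, verbosity); the actual schema is detailed in the README:

```json
{
  "logdir": "results/my_experiment",
  "checkpoint_frequency": 10,
  "evaluation_frequency": 5,
  "verbosity": 1
}
```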
To use a custom `Environment` implementation, it needs to be registered in the `ray.tune.registry`. As an example, suppose that your code directory follows the structure:

```
.
├── RL4CC
├── src
│   ├── __init__.py
│   └── my_custom_environment.py
└── main.py
```

and that the `src/__init__.py` file includes, similarly to the one reported here for the base `Environment`:

```python
from .my_custom_environment import MyCustomEnvironment
from ray.tune.registry import register_env

register_env("MyCustomEnvironment", lambda config: MyCustomEnvironment(config))
```
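What goes into `my_custom_environment.py` depends on your problem; the following is a minimal, hypothetical sketch of an environment built from a `config` dictionary, as the registration lambda above assumes. It does not extend the actual RL4CC base `Environment`, and the method signatures follow the common Gym-style convention:

```python
class MyCustomEnvironment:
    """Hypothetical environment; in practice, extend RL4CC's base Environment."""

    def __init__(self, config):
        # parameters are loaded from the config dictionary, as in the base class
        self.horizon = config.get("horizon", 10)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # initial observation

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 0 else 0.0
        done = self.t >= self.horizon
        return self.t, reward, done, {}
```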
To guarantee that the environment is properly loaded when starting the experiment, your `main.py` file should include:

```python
import src
from RL4CC.experiments.train import TrainingExperiment

exp = TrainingExperiment(exp_config_file="config_files/exp_config.json")
exp.run()
```

i.e., you must ensure that `src/__init__.py` is actually executed.
To use a custom neural network, it needs to be registered in the `ray.rllib.models.ModelCatalog`. As an example, suppose that your code directory follows the structure:

```
.
├── RL4CC
├── src
│   ├── __init__.py
│   └── my_custom_model.py
└── main.py
```

and that the `src/__init__.py` file includes, similarly to the one reported in the models directory here:

```python
from .my_custom_model import MyCustomModel
from ray.rllib.models import ModelCatalog

ModelCatalog.register_custom_model("my_custom_model", MyCustomModel)
```
To guarantee that the model is properly loaded when starting the experiment, your `main.py` file should include:

```python
import src
from RL4CC.experiments.train import TrainingExperiment

exp = TrainingExperiment(exp_config_file="config_files/exp_config.json")
exp.run()
```

i.e., you must ensure that `src/__init__.py` is actually executed.
Moreover, the `custom_model` section of the `ray_config` configuration must be properly defined, as detailed in the corresponding README.
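In RLlib-based configurations, the custom model is typically referenced inside the `model` section; a hedged sketch is shown below, where the registered name must match the one passed to `ModelCatalog.register_custom_model` (the exact placement within `ray_config` is described in the README):

```json
"model": {
  "custom_model": "my_custom_model",
  "custom_model_config": {}
}
```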
If you want to automatically generate plots during the training, you can use the `TrainingExperimentWithPlots` class, a subclass of `TrainingExperiment`. This class automatically generates plots for the last iteration, as well as a moving average over all the iterations. The plots are saved in the `plots` directory inside the previously specified `logdir`.
To specify the plots to be generated, use the `RELEVANT_KEYS` of the callbacks: in particular, define a custom callback class (extending the `BaseCallbacksForPlots` class).
Since `custom_metrics` is used to save all the metrics to plot, you should set

```json
"reporting": {
  "keep_per_episode_custom_metrics": true
}
```

in the `ray_config` configuration.
The outputs produced during a training experiment are saved in a suitable sub-directory of the `logdir` specified in the `exp_config` config (or in `~/ray_results` if nothing is provided). These include:

- `complete_config`: a directory containing the configurations (`exp_config`, `env_config` and `ray_config`) used to define the experiment, saved as JSON files.

  Note

  Two important notes:

  - JSON files are saved here regardless of whether the user passed the configuration(s) as file(s) or as `dict` object(s).
  - While `env_config.json` and `exp_config.json` are simply copied from the user-defined configurations, the `ray_config.json` file reported here also includes the default values assigned to keys that were not included in the user-defined configuration.

- `exp_progress.json`: a file that, during the training, is progressively updated with information related to the last executed iteration, the last saved checkpoint, etc. It also reports the start and end timestamps of the experiment, its duration (in seconds) and the average length (in seconds) of each training iteration.
- `checkpoints`: a directory with checkpoints saved according to the frequency specified in the `exp_config` config. Regardless of the specified interval, a checkpoint is always saved at the end of the training process.
- `evaluations.json`: a JSON file containing the key `"evaluations"`, an array of dictionaries with values collected during the evaluation phases, which run according to the frequency specified in the `exp_config` config. The dictionary structure follows the one described for the `progress.csv` file, with an additional field specifying after how many training iterations the evaluation has been run.
- `figures`: a directory with plots generated during the training, according to the frequency specified in the `exp_config` config.
- `progress.csv` and/or `result.json`, according to the logging configuration specified in the `ray_config` config. Each row of these files includes values collected during one training iteration. By default, this stores:
  - values related to the environment and agent behaviour, e.g., the minimum, maximum and average observed reward, the episode length, the number of observed episodes, etc.;
  - values related to the Ray cluster status and resource usage, e.g., the number of healthy workers, the percentage of CPU utilization, the execution time, etc.;
  - custom values specified by properly implementing the training callbacks (see, e.g., the provided `BaseCallbacks` class).
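The `evaluations.json` file can be post-processed with the standard library. The sketch below parses a hypothetical file content mirroring the structure described above; the per-entry field names (`training_iteration`, `episode_reward_mean`) are assumptions based on common RLlib metric names, not the exact RL4CC schema:

```python
import json

# hypothetical content mirroring the described structure;
# field names inside each entry are illustrative assumptions
raw = """
{"evaluations": [
  {"training_iteration": 10, "episode_reward_mean": 1.2},
  {"training_iteration": 20, "episode_reward_mean": 1.8}
]}
"""

data = json.loads(raw)
rewards = [ev["episode_reward_mean"] for ev in data["evaluations"]]
```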
Hyperparameter tuning is an integration of the Ray Tune, AIR and RLlib libraries.
To define and start a tuning experiment exploiting one of the available algorithms:

- define the `tune_config` configuration within the `exp_config` configuration as indicated in the README; note that, since the tuning experiment will run multiple training experiments, the `env_config` and `ray_config` configurations also need to be defined, as described in the previous section. You can define `tune_config` as a dictionary or create a JSON file like `tune_config.json`;
- initialize a `TuningExperiment` object by providing the `exp_config` configuration, as a dictionary or as a path to an `exp_config.json` file;
- call the `TuningExperiment.run()` method.
Note

Since AIR's `RunConfig` is used on top of the algorithm object, the user can provide a list of callback classes as a parameter to the `run()` method, overwriting any previously indicated callbacks.
Example when using the predefined `BaseEnvironment` and an `exp_config` given as a file:

```python
from RL4CC.experiments.tune import TuningExperiment

# Basic usage:
exp = TuningExperiment(exp_config_file="config_files/exp_config.json")
exp.run()
```
The RL4CC `Logger` can be configured to print messages with different verbosity levels, using either the `sys.stdout`/`sys.stderr` streams or suitably-defined file streams, according to the information specified in the `exp_config` config.
The format of logged messages is:

```
{TIME} [{LOGGER_NAME}] (level {LEVEL}) {MESSAGE_TYPE}: {MESSAGE}
```

where:

- `TIME` is given by `datetime.datetime.now()`.
- `LOGGER_NAME` is provided as a parameter in the `Logger` constructor (default: `RL4CCLogger`).
- The message `LEVEL` is 0 for warnings and errors (which are always printed regardless of the verbosity level specified by the user), while it is specified as a parameter when calling the `Logger.log()` method for generic messages. As mentioned in the `exp_config` config, generic messages are printed only if the corresponding `LEVEL` is lower than the verbosity imposed by the user.
- `MESSAGE_TYPE` is `INFO` when calling `Logger.log()`, `WARNING` when calling `Logger.warn()` and `ERROR` when calling `Logger.error()`.
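The format above can be reproduced with a plain f-string; the following is an illustrative stand-in, not the actual `Logger` implementation:

```python
import datetime

def format_message(logger_name, level, message_type, message):
    # mirrors: {TIME} [{LOGGER_NAME}] (level {LEVEL}) {MESSAGE_TYPE}: {MESSAGE}
    time = datetime.datetime.now()
    return f"{time} [{logger_name}] (level {level}) {message_type}: {message}"

line = format_message("RL4CCLogger", 0, "WARNING", "resource budget exceeded")
```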
Warning
file streams TBA
To expand the module with generators for new algorithms:

- implement a suitable subclass of the base `AlgoConfigGenerator` (see, as an example, what is provided for the PPO algorithm);
- add the new generator to the generators factory.
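The two steps above follow a standard generator-plus-factory pattern; the sketch below uses stand-in class and function names (the actual RL4CC base class and factory interfaces may differ):

```python
class AlgoConfigGenerator:
    """Stand-in for the RL4CC base generator class."""
    def generate(self, params):
        raise NotImplementedError

class MyAlgoConfigGenerator(AlgoConfigGenerator):
    """Generator for a hypothetical new algorithm."""
    def generate(self, params):
        # in the real library this would build and return a Ray AlgorithmConfig
        return {"algorithm": "MYALGO", **params}

# minimal generators factory, keyed by algorithm name
GENERATORS = {"MYALGO": MyAlgoConfigGenerator}

def get_generator(name):
    return GENERATORS[name]()
```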
The RL4CC repository is organized as follows:

- the branch `main` hosts production-ready releases, i.e., tested code that has passed reviews on lower stages;
- the branch `develop` hosts changes that may not be completely stable. This is basically a quality/staging branch. When the changes have been tested and are stable, we can open a PR to `main`;
- the branch `test` is the collector of the initial merges among all developers. From here we move on to `develop`.

Each developer can create their own branch named `test-[your-initials]`, from which they can merge to `test`. No direct PR will be accepted on any branch that is not `test`.
Regression tests for algorithm generators and training experiments with Ray versions 2.8.1, 2.10.0 and 2.20.0 are available among the utilities.
[^1]: The RL4CC library has been developed and tested considering Ray RLlib versions up to 2.20.0. Carefully select the appropriate version of the official Ray documentation when looking for additional information.
