When running dataset_creation.py, if the ordered_genes_file does not exist, the script calls order_genes from utils.py. This results in a KeyError because utils.py expects the dataset name to be stored in config['data']['dataset'], but dataset_creation.py stores the CLI argument in config['dataset'] (at the root level).
Assignment (dataset_creation.py): The CLI argument is assigned to the root:
config['dataset'] = args.dataset
Access (utils.py): The function tries to read it from inside the data block:
dataset = config['data']['dataset']
Furthermore, even if dataset existed in the YAML under data, utils.py would read the static YAML value and ignore the --dataset CLI argument provided by the user, potentially processing the wrong dataset.
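Both failure modes can be reproduced in isolation. The sketch below is a minimal stand-in, not the project's code: the body of order_genes, the build_config helper, and the dataset names dataset_a and dataset_b are assumptions made for illustration. Only the two key paths come from the report.

```python
import argparse


def order_genes(config):
    # Hypothetical stand-in for utils.order_genes: only the lookup
    # described in the report is reproduced here.
    return config['data']['dataset']


def build_config(yaml_config, argv):
    # Hypothetical stand-in for dataset_creation.py: the CLI value is
    # stored at the root of the config, as in the report.
    parser = argparse.ArgumentParser()
    parser.add_argument('--dataset')
    args = parser.parse_args(argv)
    config = dict(yaml_config)
    config['dataset'] = args.dataset
    return config


# Case 1: the YAML has a data block but no dataset key -> KeyError.
config = build_config({'data': {}}, ['--dataset', 'dataset_b'])
try:
    order_genes(config)
    raised = False
except KeyError:
    raised = True

# Case 2: the YAML already defines data.dataset -> the CLI value is ignored.
config = build_config({'data': {'dataset': 'dataset_a'}}, ['--dataset', 'dataset_b'])
used = order_genes(config)
```

In the second case no exception is raised, which is why the wrong dataset could be processed without any visible error.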
Suggested Fix: Update dataset_creation.py to assign the CLI argument to the location where utils.py expects it.
Change line in dataset_creation.py:
# Old
# config['dataset'] = args.dataset
# New
config['data']['dataset'] = args.dataset
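One caveat with the one-line change: it assumes the loaded YAML already contains a data mapping. If that block may be missing or empty, a slightly more defensive variant could look like the sketch below. The helper name apply_cli_dataset is hypothetical and not part of the repository.

```python
def apply_cli_dataset(config, dataset):
    # Hypothetical helper: make sure config['data'] is a dict before writing.
    # An empty "data:" key in YAML loads as None, so that case is handled too.
    if not isinstance(config.get('data'), dict):
        config['data'] = {}
    # The CLI value overrides whatever the YAML provided.
    config['data']['dataset'] = dataset
    return config
```

In dataset_creation.py this would be called as apply_cli_dataset(config, args.dataset) in place of the direct assignment.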