KeyError: 'dataset' in utils.py due to config structure mismatch #4

@GravityBoi

When running dataset_creation.py, if the ordered_genes_file does not exist, the script calls order_genes from utils.py. This results in a KeyError because utils.py expects the dataset name to be stored in config['data']['dataset'], but dataset_creation.py stores the CLI argument in config['dataset'] (at the root level).

Assignment (dataset_creation.py): The CLI argument is assigned to the root:

config['dataset'] = args.dataset

Access (utils.py): The function tries to read it from inside the data block:

    dataset = config['data']['dataset'] 

Furthermore, even if dataset existed in the YAML under data, utils.py would read the static YAML value and ignore the --dataset CLI argument provided by the user, potentially processing the wrong dataset.
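Both failure modes can be reproduced with plain dictionaries. This is only a minimal sketch: `read_dataset_name` is a hypothetical stand-in for the lookup inside `order_genes`, and the config contents (`path`, `yaml_dataset`, `cli_dataset`) are made-up values, not taken from the repository.

```python
# Hypothetical stand-in for the lookup described in utils.py;
# the real order_genes function does more than this.
def read_dataset_name(config):
    return config['data']['dataset']


# Case 1: the YAML has a data block, but no dataset key inside it.
config = {'data': {'path': 'some/dir'}}   # assumed YAML contents
config['dataset'] = 'cli_dataset'         # root-level assignment, as in dataset_creation.py
try:
    read_dataset_name(config)
    outcome = 'no error'
except KeyError as exc:
    outcome = f'KeyError: {exc}'
print(outcome)  # KeyError: 'dataset'

# Case 2: the YAML does define data.dataset, so there is no crash,
# but the CLI value stored at the root is silently ignored.
config = {'data': {'dataset': 'yaml_dataset'}}
config['dataset'] = 'cli_dataset'
stale = read_dataset_name(config)
print(stale)  # yaml_dataset
```

The second case is arguably the more dangerous one, since nothing fails and the wrong dataset is processed.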

Suggested Fix: Update dataset_creation.py to assign the CLI argument to the location where utils.py expects it.

Change line in dataset_creation.py:

# Old
# config['dataset'] = args.dataset

# New
config['data']['dataset'] = args.dataset
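One caveat: the one-line change assumes the loaded YAML always contains a `data` mapping. If that is not guaranteed, a slightly more defensive variant might look like the sketch below. `apply_cli_dataset` is a hypothetical helper name, not something that exists in the repository.

```python
def apply_cli_dataset(config, dataset):
    """Store the CLI dataset name where utils.py is described as reading it.

    Hypothetical helper: creates the 'data' block when the YAML omits it
    or leaves it empty (an empty YAML block loads as None).
    """
    if not isinstance(config.get('data'), dict):
        config['data'] = {}
    config['data']['dataset'] = dataset
    return config


# The CLI value overrides whatever the YAML provided.
cfg = apply_cli_dataset({'data': {'dataset': 'yaml_dataset'}}, 'cli_dataset')
print(cfg['data']['dataset'])  # cli_dataset
```

In dataset_creation.py this would be called as `apply_cli_dataset(config, args.dataset)` in place of the direct assignment.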
