This script automates the process of backing up a MongoDB database running in a Docker container. It also uploads the backups to an AWS S3 bucket for safe storage and retention.
- Creates compressed backups of your MongoDB database using
mongodump, optionally limited to a subset of collections. - Stores backups locally in a specified directory.
- Uploads backups to an AWS S3 bucket.
- Automatically cleans up old backups from the local directory (optional).
- Restores a backup to the MongoDB database.
- Verifies count and full index-spec parity between two namespaces after a restore/migration.
- Ensure Docker is installed on the host machine and The MongoDB instance must be running in a Docker container.
- Install the AWS CLI by following the instructions here.
- Configure the AWS CLI with your AWS credentials:
Provide:
aws configure
- AWS Access Key ID
- AWS Secret Access Key
- Default region (e.g.,
us-east-1) - Default output format (e.g.,
json)
Ensure the IAM user associated with the AWS credentials has the correct permission, the following JSON policy can be attached to the IAM user:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::ACCOUNT:user/USER"
},
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket"
],
"Resource": "arn:aws:s3:::BUCKET"
},
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::ACCOUNT:user/USER"
},
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::BUCKET"
}
]
}sudo apt install dotenv
Create a .env file in the same directory as the script and add the following environment variables:
# Set variables
CONTAINER_NAME="name-of-your-mongo-container"
S3_BUCKET_NAME="name-of-your-s3-bucket"
MONGO_PORT="27017"
MONGO_USER="user"
MONGO_PASSWORD="password"
MONGO_DB_NAME="test"
USE_CREDENTIALS="true" # set to false if the DB has no auth
USE_REMOTE="true" # set to false to skip all S3 interactions| Variable | Description |
|---|---|
CONTAINER_NAME |
Name of the MongoDB container. |
S3_BUCKET_NAME |
Name of the AWS S3 bucket where backups will be stored (required only when USE_REMOTE=true). |
MONGO_PORT |
Port number of the MongoDB instance. |
MONGO_USER |
MongoDB username (required only when USE_CREDENTIALS=true). |
MONGO_PASSWORD |
MongoDB password (required only when USE_CREDENTIALS=true). |
MONGO_DB_NAME |
Name of the MongoDB database to backup. |
USE_CREDENTIALS |
When false, skips MongoDB username/password in dump/restore commands. Defaults to true. |
USE_REMOTE |
When false, skips all S3 upload/download/list actions. Defaults to true. |
If your
.envis saved with Windows (CRLF) line endings, the script strips the trailing carriage return from these values automatically, so a stray\rwon't break the container name, port, or S3 path. Only the CR is removed — other whitespace (e.g. inside a password) is preserved as-is.
The script can be run manually or scheduled to run at regular intervals using cron jobs.
The following commands can be used to run the script manually:
First Make the script executable:
chmod +x mongo_backup_manager.shThe help command can be used to display the script's usage information:
./mongo_backup_manager.sh helpThe backup command can be used to create a backup of the MongoDB database:
./mongo_backup_manager.sh backupTo back up only a subset of collections, pass a comma-separated list (no spaces around the commas):
./mongo_backup_manager.sh backup users,tasksA single collection is dumped directly with mongodump --collection and needs no mongosh; if the name is misspelled the script detects mongodump's "does not exist" report and errors instead of writing an empty archive. Backing up more than one collection requires mongosh inside the container — mongodump can't include multiple specific collections in one pass, so the script enumerates the database's collections and dumps the whole database minus everything you didn't request (names not present are skipped with a warning, and it errors if none of the requested collections exist).
The list_backups_local command can be used to list all backups stored locally:
./mongo_backup_manager.sh list_backups_localThe list_backups_s3 command can be used to list all backups stored in the S3 bucket:
./mongo_backup_manager.sh list_backups_s3When running locally-only without S3, set USE_REMOTE=false in .env to bypass S3 upload, download, and listing.
If you have a backup in the S3 bucket and you want to download it to your local machine, you can use the download_backup command by specifying the backup file name (the backup file name can be obtained from the list_backups_s3 command).
For example, first list the backups in the S3 bucket:
./mongo_backup_manager.sh list_backups_s3
Performing health check...
Health check passed! All required variables are set.
Listing MongoDB backups in S3...
2024-11-20 13:29:01 1383 mongodb-backups/mongo_backup_2024-11-20_13-28-59.gzThen download the backup file:
./mongo_backup_manager.sh download_backup mongo_backup_2024-11-20_13-28-59.gz
Performing health check...
Health check passed! All required variables are set.
Downloading MongoDB backup from S3...
download: s3://hana-rewards-db-backups/mongodb-backups/mongo_backup_2024-11-20_13-28-59.gz to backups_temp/mongo_backup_2024-11-20_13-28-59.gz
Backup downloaded from S3 successfully.
Download completed successfully.The backup file will be downloaded to the backups_temp directory.
To restore a backup, you can use the restore_backup command by specifying the backup file name, this backup can either be in the folder with the local backups or if you want to restore a backup from S3, you can download the backup file first and then restore it.
The following command showcases how to restore a backup:
./mongo_backup_manager.sh restore backups_temp/mongo_backup_2024-11-20_13-28-59.gzTo restore only a subset of collections (into MONGO_DB_NAME), pass a comma-separated list (no spaces around the commas) as the single extra argument:
./mongo_backup_manager.sh restore backups_temp/mongo_backup_2024-11-20_13-28-59.gz users,tasksOnly the listed collections are restored from the archive (via mongorestore --nsInclude); --drop only affects the collections being restored, so the rest of the database is left untouched. A single name (e.g. users) restores just that one collection in place.
You can optionally remap a namespace by specifying source database and collection, then destination database and collection (all four arguments are required together):
./mongo_backup_manager.sh restore backups_temp/mongo_backup_2024-11-20_13-28-59.gz new-world users my-app stateful_usersThis restores the collection new-world.users from the archive into my-app.stateful_users on the destination. The script will show: Remapping namespace: new-world.users -> my-app.stateful_users. The MongoDB user in .env must have readWrite (or at least listCollections and insert) on the destination database (e.g. my-app), or you will get "Command listCollections requires authentication".
Before copying or restoring anything, restore runs a preflight auth check: it uses mongosh to confirm the configured credentials can reach the destination database (the remap target, or MONGO_DB_NAME for a plain restore) and aborts with a clear message if they can't — so a permissions problem fails fast instead of partway through mongorestore. The preflight is skipped automatically if mongosh isn't available inside the container.
Example output:
Performing health check...
Health check passed! All required variables are set.
Copying MongoDB backup file to the container...
Successfully copied 3.07kB to mongodb-prod:/tmp/mongo_backup_2024-11-20_13-28-59.gz
Restoring MongoDB backup from backups_temp/mongo_backup_2024-11-20_13-28-59.gz...
2024-11-20T13:59:11.468+0000 The --db and --collection flags are deprecated for this use-case; please use --nsInclude instead, i.e. with --nsInclude=${DATABASE}.${COLLECTION}
2024-11-20T13:59:11.485+0000 preparing collections to restore from
2024-11-20T13:59:11.497+0000 reading metadata for test.user_tasks from archive '/tmp/mongo_backup_2024-11-20_13-28-59.gz'
....
....
2024-11-20T14:00:19.170+0000 8 document(s) restored successfully. 0 document(s) failed to restore.
MongoDB restore completed successfully.
Cleaning up temporary backup file in the container...
Restore completed successfully.
After a restore — especially a namespace remap during a migration — use the verify command to prove that the target namespace matches the source rather than trusting that mongorestore recreated everything correctly:
./mongo_backup_manager.sh verify <source_db> <source_collection> <dest_db> <dest_collection>The argument order mirrors the restore remap, so you can run the same source/dest pair you just restored:
./mongo_backup_manager.sh verify sodax-registration users new-world stateful_usersverify connects to the running MongoDB via mongosh inside the container and compares:
- Document count (
countDocuments) of both collections. - The full index spec of every index — name, key,
unique,sparse,partialFilterExpression,collation, and TTL (expireAfterSeconds). The cosmeticv(index version) andns(namespace, which legitimately differs across DBs) fields are stripped before comparison, and field ordering is normalized so only meaningful differences are reported.
This catches mismatches a names-only check would miss — for example a unique index on stateful_partner_naming.name that lost its sparse: true, which would otherwise pass review and later throw duplicate-key errors on rows whose name is null.
The command exits non-zero on any mismatch, so it can gate a migration script or run in CI. Example output on success:
Performing health check...
Health check passed! All required variables are set.
Verifying sodax-registration.users -> new-world.stateful_users...
source sodax-registration.users: count=8 indexes=3
target new-world.stateful_users: count=8 indexes=3
VERIFY OK: document counts and full index specs match.
Verification passed.
And on a mismatch (note identical counts and index names — only the spec differs):
target new-world.stateful_users: count=8 indexes=3
VERIFY FAILED:
- index on source missing/differs on target: {"key":{"name":1},"name":"name_1","sparse":true,"unique":true}
- index on target absent/differs on source: {"key":{"name":1},"name":"name_1","unique":true}
Verification failed.
The MongoDB user in
.envneeds read access (e.g.read) on both the source and destination databases.verifyrequiresmongoshto be available inside the container (included in the officialmongoimages from 5.0+).
To schedule the script to run at regular intervals, you can use cron jobs. Here's how you can set up a cron job to run the script every hour:
- Open the crontab file for editing:
crontab -e
- Add the following line to the crontab file:
0 * * * * /path/to/mongo_backup_manager.sh >> /var/log/mongo_backup_manager.log 2>&1