MongoDB Backup Data Directory

May 3, 2018

 

Dedicated to mongodump haters

 

MongoDB 3.6 brought a number of improvements worth considering when evaluating it as a convenient, reliable, and fast NoSQL platform. A quick look at the website gives us the most important additions:

  • Change streams enable you to create powerful data pipelines, moving data to wherever it’s needed using a time-ordered sequence of changes as they occur in the database.

  • New causal consistency enforces strict, sequential ordering of operations within a session, regardless of which node in the cluster is serving the request. Shard-aware secondary reads ensure data consistency from any secondary, even as data is balanced across the cluster.

  • Fully expressive array updates allow you to perform complex array manipulations against all matching elements in any array in a single atomic update.

  • Retryable writes reduce the error handling you have to implement in your code. The MongoDB drivers will now automatically retry write operations in the event of transient network errors or primary replica elections, while the server enforces exactly-once semantics.

  • MongoDB 3.6 now extends binding to localhost by default to all packages and platforms, denying all external connections to the database until permitted by you. Combined with new IP whitelisting support, you can now configure MongoDB to only accept external connections from approved IP addresses.

  • MongoDB Atlas clusters can now span multiple cloud provider regions, enabling you to build apps that maintain continuous availability in the event of geographic outages, and improve customer experience by collocating data closer to users.

 

Backup methods have not changed with the new version, and MongoDB still suggests a few options:

  • Back up with Atlas

  • Back up with MongoDB Cloud Manager or Ops Manager

  • Back up by copying underlying data files

  • Back up with filesystem snapshots

  • Back up with cp or rsync

  • Back up with mongodump

     

A short description of each method can be found in the MongoDB documentation:

https://docs.mongodb.com/manual/core/backups/

 

Unfortunately, if we want to use MongoDB Community in a production environment, the Atlas and Ops Manager options are off the table. But what if you have more than 200 GB of data in a replica set (I assume you don't take any risks and really run it in production), don't have enough free disk space, and presumably hate any kind of dump?

 

When we have a replica set, the best approach is to run backups on a secondary server, and that's the case I want to describe. I've chosen a simple way to back up the data directory: copying the data files (without rsync or cp) and compressing them into a *.gz file. We can achieve this with a bash or Python script.

 

In order to do this we need to stop all writes on the MongoDB instance, which guarantees data consistency in our backup. When the backup finishes, the script runs the "fsyncUnlock" command and the secondary server continues to work.
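Under the hood, this lock/unlock cycle is just two admin commands sent through pymongo's Database.command(). As a minimal sketch (the helper names here are mine, not part of the script), the command documents look like this:

```python
# Command documents for MongoDB's write-lock cycle. In the backup script
# below they are sent with pymongo's Database.command() against the
# admin database.

def fsync_lock_command():
    # Flush all pending writes to disk and block new writes
    # until fsyncUnlock is issued.
    return {"fsync": 1, "lock": True}

def fsync_unlock_command():
    # Release the lock taken by the command above.
    return {"fsyncUnlock": 1}

print(fsync_lock_command())
print(fsync_unlock_command())
```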

 

In the script below I've used a bash command to compress the tar file, but you could write a function with the zipfile or tarfile libraries instead. The script also requires Python 3, so please do a few steps before running it:

 

yum -y update

yum -y install yum-utils

yum -y groupinstall development

yum -y install https://centos7.iuscommunity.org/ius-release.rpm

yum -y install python36u

yum -y install python36u-pip

yum -y install python36u-devel

pip3.6 install pymongo
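As an aside, the compression step can also be done without shelling out to tar at all: the standard tarfile module mentioned above is enough. A minimal sketch, with illustrative path names:

```python
import tarfile

def compress_directory(data_dir, archive_path):
    # Pack the whole data directory into a gzip-compressed tar archive;
    # arcname keeps a stable top-level name inside the archive.
    with tarfile.open(archive_path, 'w:gz') as archive:
        archive.add(data_dir, arcname='data')

# Example (paths are illustrative):
# compress_directory('/opt/mongodb/data', '/root/mongodb_backup/data.tar.gz')
```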

 

Also let's create backup directory and backup script itself:

cd /root

mkdir mongodb_backup

mkdir mongodb_jobs

cd mongodb_jobs

vim mongodb_directory_backup.py

 

And here is the simple code:

import os
import smtplib
import platform
import subprocess
from datetime import datetime
from pymongo import MongoClient
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Declaring global variables
timestamp = datetime.now().strftime('%d%m%Y')
host = platform.uname()[1]
output_filename = host + '_mongodb_' + timestamp

backup_dir = '/root/mongodb_backup/' + timestamp + '/'
work_dir = '/root/mongodb_backup/'
data_dir = '/opt/mongodb/data'

mongo_shell = '/usr/bin/mongo'
mongo_host = '127.0.0.1'
mongo_port = '27017'
conn_string = 'mongodb://' + mongo_host + ':' + mongo_port + '/'

zip_command = 'tar -czvf ' + backup_dir + output_filename + '.tar.gz ' + data_dir
remove_command = 'find ' + work_dir + r'* -type d -ctime +2 -exec rm -rf {} \;'

from_address = '[email address]'
to_address = '[email address]'

# Connecting to the local MongoDB instance
client = MongoClient(conn_string)

# Connecting to an SMTP server (SMTP() opens the connection itself)
mail_server = smtplib.SMTP('[SMTPServerAddress]', [Port])
mail_server.starttls()
mail_server.login('[Username]', '[Password]')

# Get the replica set member state
MongoDB = client.admin
serverStatus = MongoDB.command('isMaster')
isMasterStatus = serverStatus['ismaster']  # a boolean, not a string

# Refuse to run on a Primary: the backup must be taken from a secondary
if isMasterStatus:
    message = MIMEMultipart()
    message['From'] = from_address
    message['To'] = to_address
    message['Subject'] = 'MongoDB backup failure on ' + host

    html = """\
    <html>
      <head></head>
      <body>
           The host """ + host + """ has a Primary role!<br>
           Cannot backup the data directory.<br>
           Please check replica set state.
      </body>
    </html>
    """
    body = MIMEText(html, 'html')
    message.attach(body)

    mail_server.sendmail(from_address, to_address, message.as_string())
    quit()

# Creating a backup directory locally
try:
    os.makedirs(backup_dir)
except OSError:
    if not os.path.isdir(backup_dir):
        raise

# Block all writes so the data files stay consistent while we copy them
MongoDB.command('fsync', lock=True)

# Copying the data directory with tar
try:
    subprocess.call(zip_command, cwd=backup_dir, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT, shell=True)
except OSError:
    # Unlock writes before reporting the failure, then quit
    MongoDB.command('fsyncUnlock')

    message = MIMEMultipart()
    message['From'] = from_address
    message['To'] = to_address
    message['Subject'] = 'MongoDB backup failure on ' + host

    html = """\
    <html>
      <head></head>
      <body>
           The script cannot create a compressed tar file.<br>
           Cannot backup the data directory.<br>
           Please check the reason!
      </body>
    </html>
    """
    body = MIMEText(html, 'html')
    message.attach(body)

    mail_server.sendmail(from_address, to_address, message.as_string())
    quit()

# Unlock objects to resume writes
MongoDB.command('fsyncUnlock')

# Deleting backup directories older than 2 days
subprocess.call(remove_command, cwd=work_dir, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT, shell=True)

# Quit the job
mail_server.quit()
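The retention step can likewise be kept in pure Python instead of calling find. A minimal sketch of an equivalent cleanup (note it checks mtime rather than ctime, an assumption that's usually close enough for directories written once):

```python
import os
import shutil
import time

def remove_old_backups(work_dir, max_age_days=2):
    # Delete backup subdirectories older than max_age_days,
    # returning the paths that were removed.
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for name in os.listdir(work_dir):
        path = os.path.join(work_dir, name)
        if os.path.isdir(path) and os.stat(path).st_mtime < cutoff:
            shutil.rmtree(path)
            removed.append(path)
    return removed
```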

 

The last step is to create a cron job:

crontab -e

 

# MongoDB data directory backup

0 6 * * 0 /usr/bin/python3.6 /root/mongodb_jobs/mongodb_directory_backup.py
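For completeness: restoring such a backup amounts to stopping mongod, extracting the archive back into the configured dbPath, fixing ownership, and starting mongod again. The extraction itself can be sketched like this (path names are illustrative):

```python
import tarfile

def extract_backup(archive_path, target_dir):
    # Unpack the compressed data directory; mongod must be stopped and
    # target_dir must correspond to the configured dbPath.
    with tarfile.open(archive_path, 'r:gz') as archive:
        archive.extractall(path=target_dir)

# Example (illustrative):
# extract_backup('/root/mongodb_backup/host_mongodb_03052018.tar.gz', '/opt/mongodb')
```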

 

 

Good luck!

 

 
