Simplifying your back end using MongoDB as queue repository and persistent database

July 31, 2017

Marcel Balassiano, SQL Server and MongoDB consulting

 

In this post, I intend to show a back-end architecture implemented for one of our clients. This architecture uses MongoDB as both the permanent database and the queue repository. It is important to know that MongoDB is not always the best database to use as a queue repository, because it is limited in the number of inserts per second it can absorb. My baseline is very simple: if the number of inserts per second is greater than 10,000, then MongoDB should not be used as a queue repository.

 

I use the following query to get an idea of how many insert operations per second the database receives. It returns the number of documents inserted per second, based on the insertedDate field:

db.collection.aggregate([
    {
        $group: {
            _id: { DateSt: { $dateToString: { format: "%Y-%m-%d-%H-%M-%S", date: "$insertedDate" } } },
            count: { $sum: 1 }
        }
    }
])

 

Before running this query, I first count inserts per minute by changing the format to "%Y-%m-%d-%H-%M".

After finding the peak minutes, I group by second (a heavier query).
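
For that first, lighter pass, only the format string changes. A sketch based on the query above, with a sort added so the busiest minutes appear first:

db.collection.aggregate([
    {
        // Count inserted documents per minute to locate the peak minutes.
        $group: {
            _id: { DateSt: { $dateToString: { format: "%Y-%m-%d-%H-%M", date: "$insertedDate" } } },
            count: { $sum: 1 }
        }
    },
    // Busiest minutes first.
    { $sort: { count: -1 } }
])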

 

The old architecture caused poor insert performance because:

  1. Only one collection receives all events (1), which increases the number of locks on that single collection.

  2. A single validation process runs (3); it is not possible to create multiple validation processes, because there is a risk that the same event would be sent twice.

  3. A control mechanism (step 3.1) determines whether each event was sent or not; it updates the "status" field of each document in the collection and consequently decreases insert performance.
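
To make point 3 concrete, the old control mechanism performed roughly this kind of per-document update (a sketch; the collection name events and the eventId variable are placeholders, not the client's actual names):

// Old architecture, step 3.1 (sketch): after an event is sent, its "status" field
// is updated in the same single collection that receives all new inserts,
// so every update competes with the inserts for locks on that collection.
db.events.updateOne(
    { _id: eventId },
    { $set: { status: "sent" } }
);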

 

 

  • The first change is to create 4 collections to receive events. The events are divided between the collections using a specific field in each document (ex: the field zone). Process (1) reads this field value and inserts the event into the correct collection (see the routing sketch after the list of new collections below).


 

New Collections:

   Queue_zone_center -> receives events where zone = center

   Queue_zone_south  -> receives events where zone = south

   Queue_zone_north  -> receives events where zone = north

   Queue_zone_others -> receives events from all other zones
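
A minimal sketch of the routing step of process (1) in the mongo shell, assuming each event document carries a zone field as described above; the routeEvent helper and the event shape are illustrative, not the client's actual code:

// Process (1): route each incoming event to the queue collection of its zone.
function routeEvent(event) {
    switch (event.zone) {
        case "center": db.Queue_zone_center.insertOne(event); break;
        case "south":  db.Queue_zone_south.insertOne(event);  break;
        case "north":  db.Queue_zone_north.insertOne(event);  break;
        default:       db.Queue_zone_others.insertOne(event); // all other zones
    }
}

// Example:
routeEvent({ event_id: 112, zone: "north", insertedDate: new Date() });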

 

 

  • For each collection there is now one validation process. Compared to the old architecture, which had only one process, there are now 4 processes validating events, so validation runs up to 4x faster.

  • Create a new collection to control the last processed event id for each validation process (example documents below; a sketch of a full validation cycle follows them).

Ex: collection_control_process:

{ name: "Process_zone_north",  last_id: 111,  last_run: "2017-01-05 23:13:13" }

{ name: "Process_zone_south",  last_id: 443,  last_run: "2017-01-05 23:13:16" }

{ name: "Process_zone_center", last_id: 5623, last_run: "2017-01-05 23:13:12" }

{ name: "Process_zone_others", last_id: 34,   last_run: "2017-01-05 23:13:12" }

 

After one cycle of the zone-north process:

Before:

{ name: "Process_zone_north", last_id: 111, last_run: "2017-01-05 23:13:13" }

Updated to:

{ name: "Process_zone_north", last_id: 987, last_run: "2017-01-05 23:17:20" }
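
Putting the pieces together, one cycle of the zone-north validation process might look like the sketch below; the event_id field and the processing step are assumptions based on the control documents above:

// One cycle of the zone-north validation process (sketch).
var ctrl = db.collection_control_process.findOne({ name: "Process_zone_north" });

// Read only the events that arrived after the last processed id.
var events = db.Queue_zone_north.find({ event_id: { $gt: ctrl.last_id } })
                                .sort({ event_id: 1 })
                                .toArray();

events.forEach(function (ev) {
    // ... validate and send the event ...
});

// Record the last processed id and run time, so the same event is never sent twice.
if (events.length > 0) {
    db.collection_control_process.updateOne(
        { name: "Process_zone_north" },
        { $set: { last_id: events[events.length - 1].event_id, last_run: new Date() } }
    );
}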

 

My purpose in this post is to show a back end that uses only MongoDB as both the queue repository and the persistent database, but remember that other technologies can be more efficient, depending on your database workload.

 

 
