For optimal performance in terms of the storage layer, use disks backed by RAID-10. RAID-5 and RAID-6 do not typically provide sufficient performance to support a MongoDB deployment.

Avoid RAID-0 with MongoDB deployments. While RAID-0 provides good write performance, it also provides limited availability and can lead to reduced performance on read operations, particularly when using Amazon's EBS volumes.

#mongodb #mongo #disk #raid #SSD
How to detect unused indexes on MongoDB collections?

db.your_collection.aggregate([{$indexStats:{}}]).pretty()

For each index, accesses.ops shows how many operations have used that index since the server started tracking it. If ops is very low compared to other indexes, you can drop the index.
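If you prefer checking this from a script, here is a minimal pymongo sketch (the connection string, database, and collection names are placeholders) that prints how often each index has been used:

import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017")  # placeholder connection string
coll = client["your_db"]["your_collection"]

# $indexStats returns one document per index with its usage counters.
for stat in coll.aggregate([{"$indexStats": {}}]):
    ops = stat["accesses"]["ops"]      # number of operations that used this index
    since = stat["accesses"]["since"]  # when the counter started
    print(f"{stat['name']}: {ops} ops since {since}")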
#mongodb #mongo #index #unused_indexes
If you have a dedicated server only for the MongoDB database, I highly recommend setting the readahead values MongoDB suggests for its storage engines.

For the WiredTiger storage engine:
Set the readahead setting to 0 regardless of storage media type (spinning disk, SSD, etc.). Setting a higher readahead benefits sequential I/O operations. However, since MongoDB disk access patterns are generally random, a higher readahead provides limited benefit or even degrades performance. As such, for most workloads, a readahead of 0 provides optimal MongoDB performance. In general, set the readahead to 0 unless testing shows a measurable, repeatable, and reliable benefit from a higher value. MongoDB Professional Support can provide advice and guidance on non-zero readahead configurations.

For the MMAPv1 storage engine:
Ensure that readahead settings for the block devices that store the database files are appropriate. For random access use patterns, set low readahead values. A readahead of 32 (16 kB) often works well.

For a standard block device, you can run sudo blockdev --report to get the readahead settings and sudo blockdev --setra <value> <device> to change them. Refer to your specific operating system manual for more information.

#mongodb #mongo #ra #readahead #mmapv1 #wiredtiger
In MongoDB it is suggested to turn atime off. atime is set by Linux on each file accessed by applications. It is reported repeatedly that turning it off will improve disk performance on that partition.

To turn off atime you need to set noatime on the partition where you place the MongoDB database files. Open /etc/fstab and look for your desired partition (mine is `/var`):

/dev/mapper/mongo--vg-var /var xfs defaults 0 2

Add noatime after defaults:

/dev/mapper/mongo--vg-var /var xfs defaults,noatime 0 2

Yours may be a little bit different. Now reboot your server using reboot --reboot. Then you can check with mount -l whether noatime is set or not:

/dev/mapper/mongo--vg-var on /var type xfs (rw,noatime,attr2,inode64,logbsize=256k,sunit=512,swidth=1024,noquota)

In the next post we will test this using the touch command in Linux.
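If you want to verify the mount option from a script rather than by eye, here is a minimal Python sketch; it assumes the MongoDB data files live under /var, so adjust MOUNT_POINT to your partition:

# Check /proc/mounts for the noatime option on the MongoDB data partition.
MOUNT_POINT = "/var"  # assumption: where the database files live

with open("/proc/mounts") as mounts:
    for line in mounts:
        device, mount_point, fstype, options = line.split()[:4]
        if mount_point == MOUNT_POINT:
            flag = "yes" if "noatime" in options.split(",") else "no"
            print(f"{device} on {mount_point} ({fstype}): noatime={flag}")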
#mongodb #mongo #noatime #atime #xfs #linux #fstab #mount
Have you seen that when you query from MongoDB in the shell, it prints only the first 20 documents and prompts you to type "it" in order to see more? Well, in the MongoDB shell you can issue the command below to say how many records to return:

DBQuery.shellBatchSize = 3000

If you enter the above command in the MongoDB shell and use find to query a collection that has more than 3000 documents, 3000 documents will be returned at once.
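Note that shellBatchSize only affects the interactive shell. In a driver such as pymongo, the closest knob is Cursor.batch_size(), which controls how many documents the server returns per batch rather than how many get displayed; a small sketch with placeholder names:

import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017")  # placeholder connection
coll = client["your_db"]["your_collection"]

# Ask the server for up to 3000 documents per batch while iterating the cursor.
for doc in coll.find().batch_size(3000):
    pass  # process each document here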
#mongodb #mongo #shellBatchSize #limit
Now to make your MongoDB client connection secure, just pass ssl=True:

# test_mongodb_ssl.py
import pymongo

client = pymongo.MongoClient('example.com', ssl=True)

When you run this script, check your MongoDB logs (usually in /var/log/mongodb/mongod.log). The thing you should take into account is that when you pass the ssl=True parameter to MongoClient, you should see only the log lines below (IP addresses will vary):

I NETWORK [listener] connection accepted from 172.15.141.162:50761 #49 (39 connections now open)
I NETWORK [conn49] end connection 172.15.141.162:50761 (38 connections now open)

Now remove ssl=True from MongoClient or pass ssl=False. If you now run your test script, you will see something like below in mongod.log:

I NETWORK [listener] connection accepted from 172.15.141.162:50762 #50 (39 connections now open)
I NETWORK [conn50] SSL mode is set to 'preferred' and connection 50 to 172.15.141.162:50762 is not using SSL.

It says that the SSL mode in the mongo config is set to preferSSL and your new connection to mongo is not using it.

YOU NEED TO BE CAUTIOUS that we have created our SSL certificate ourselves and it is vulnerable to a man-in-the-middle attack. For production usage, purchase your SSL/TLS certificate.
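If you want the client to actually verify the server certificate rather than just encrypting the connection, you can point it at your CA file. This is only a sketch, assuming pymongo 3.x option names (pymongo 4 renamed these options to tls and tlsCAFile) and a placeholder CA path:

import pymongo

client = pymongo.MongoClient(
    'example.com',
    ssl=True,
    ssl_ca_certs='/etc/ssl/ca.pem',  # verify the server certificate against this CA
)
print(client.admin.command('ping'))  # fails if the certificate cannot be verified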
#mongodb #mongo #ssl #pymongo
If you have followed our MongoDB SSL configuration, you should by now know that we can generate an SSL certificate using Let's Encrypt. I have used dehydrated, which works nicely with Cloudflare.

To make the procedure automatic, I have created a sample shell script that, after the automatic renewal, will also renew the PEM files for MongoDB:

#!/bin/bash
echo 'Binding new mongo private key PEM file and Cert PEM file...'
cat /etc/dehydrated/certs/mongo.example.com/privkey.pem /etc/dehydrated/certs/mongo.example.com/cert.pem > /etc/ssl/mongo.pem
echo 'Saved the new file in /etc/ssl/mongo.pem'
sudo touch /etc/ssl/ca.pem
sudo chmod 777 /etc/ssl/ca.pem
echo 'truncate ca.pem file and generate a new one in /etc/ssl/ca.pem...'
sudo truncate -s 0 /etc/ssl/ca.pem
echo 'generate a ca.pem file using openssl with input -> /etc/ssl/ca.crt'
sudo openssl x509 -in /etc/ssl/ca.crt -out /etc/ssl/ca.pem -outform PEM
echo 'ca.pem is generated successfully in /etc/ssl'
echo 'append the chain.pem content to newly created ca.pem in /etc/ssl/ca.pem'
sudo cat /etc/dehydrated/certs/mongo.example.com/chain.pem >> /etc/ssl/ca.pem
echo 'done!'
#mongodb #mongo #ssl #pem #openssl #lets_encrypt
Today I fixed a really C**Py bug which has been bugging me for days and years, nights and midnights!

I use a scheduler to get data from MongoDB, and one of the servers is outside of Iran and another is in Iran. When I want to get data, sometimes querying the db takes forever and it freezes the data-gathering procedure. I had to restart (like Windows) to reset the connection. I know it was stupid! :|

I found the below parameter that you can set on your pymongo.MongoClient:

socketTimeoutMS=10000

socketTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait for a response after sending an ordinary (non-monitoring) database operation before concluding that a network error has occurred. Defaults to `None` (no timeout).

When you don't set it, it means no timeout! So I set it to 20000 ms (20 sec) in order to solve this nasty problem.
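Here is a minimal sketch of how that looks (the host is a placeholder):

import pymongo

# Fail an operation with a network error if the server does not respond within
# 20 seconds, instead of blocking forever.
client = pymongo.MongoClient(
    "mongo.example.com",
    27017,
    socketTimeoutMS=20000,
)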
#mongodb #mongo #socketTimeoutMS #timeout #socket_timeout
In Grafana, if you are connected to MySQL, you need to provide 3 values in your select query. One is the time, which must be called time_sec; another is the countable value, which must be called value; and the last is the label displayed on your graph, which must be called metric:

SELECT
UNIX_TIMESTAMP(your_date_field) as time_sec,
count(*) as value,
'your_label' as metric
FROM table
WHERE status='success'
GROUP BY your_date_field
ORDER BY your_date_field ASC
To read more about Grafana head over here:
- https://docs.grafana.org/features/datasources/mysql/#using-mysql-in-grafana
#mongodb #mongo #mysql #grafana #dashboard #chart
Months ago we talked about how to get MongoDB data changes. The problem with that article was that if for any reason your script was stopped, you would lose the data in the downtime period.

Now we have a new solution in which you read from the point in time that you read last time. MongoDB uses a bson Timestamp internally, for example in the replication oplog. We can use the same Timestamp and store it somewhere in order to read from the exact point that we read last time.

In Python you can import it like below:

from bson.timestamp import Timestamp

Now, to read data from that point, read the timestamp from wherever you have saved it and query the oplog from that point onwards:

ts = YOUR_TIMESTAMP_HERE
cursor = oplog.find({'ts': {'$gt': ts}},
                    cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                    oplog_replay=True)

After traversing the cursor and catching MongoDB changes, you can store the new timestamp that resides in the ts field of the document you fetched from the MongoDB oplog. Now use a while True loop and read data while the cursor is alive. The point of this post is that you can store ts somewhere and read again from the point where you stored it.

If you remember, before this we got the latest changes with the query below:

last = oplog.find().sort('$natural', pymongo.DESCENDING).limit(1).next()
ts = last['ts']

We read the last ts and started from the latest record; that's why we were missing data.
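Putting it all together, here is a minimal sketch of a tailing loop that resumes from a stored timestamp. The connection string is a placeholder, it starts from "now" instead of a persisted ts, and it only prints each change; persisting ts after every document is left as a comment:

import time
import pymongo
from bson.timestamp import Timestamp

client = pymongo.MongoClient("mongodb://localhost:27017")  # must point at a replica set member
oplog = client.local.oplog.rs

ts = Timestamp(int(time.time()), 0)  # in practice, load the last ts you persisted

while True:
    cursor = oplog.find({'ts': {'$gt': ts}},
                        cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                        oplog_replay=True)
    while cursor.alive:
        for doc in cursor:
            print(doc['op'], doc['ns'])  # process the change however you need
            ts = doc['ts']               # remember the position...
            # ...and persist ts somewhere durable so a restart can resume from it
        time.sleep(1)                    # no new data yet; the tailable cursor stays open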
#mongodb #mongo #replication #oplog #timestamp #cursor
In order to get a random document from a MongoDB collection you can use the aggregation framework:

db.users.aggregate( [ { $sample: { size: 1 } } ] )

NOTE: MongoDB 3.2 introduced $sample to the aggregation pipeline.

Read more here: https://www.mongodb.com/blog/post/how-to-perform-random-queries-on-mongodb

This method is the fastest and most efficient way of getting random data from a huge database, like 100 M records.
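The same pipeline works from pymongo as well; a small sketch with placeholder connection and collection names:

import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017")  # placeholder connection
users = client["your_db"]["users"]

# $sample picks a pseudo-random document on the server side.
random_doc = next(users.aggregate([{"$sample": {"size": 1}}]), None)
print(random_doc)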
#mongodb #mongo #aggregate #sample #random
In pymongo you can give a name to your connections. This definitely helps to debug issues or trace logs when reading MongoDB logs. The most important part of this scenario is when you are using a microservice architecture and you have tens of modules which work independently from each other and send their requests to MongoDB:

mc = pymongo.MongoClient(host, port, appname='YOUR_APP_NAME')

Now if you look at the MongoDB log you would see:

I COMMAND [conn173140] command MY_DB.users appName: "YOUR_APP_NAME" command: find { find: "deleted_users", filter: {}, sort: { acquired_date: 1 }, skip: 19973, limit: 1000, $readPreference: { mode: "secondaryPreferred" }, $db: "blahblah" } planSummary: COLLSCAN keysExamined:0 docsExamined:19973 hasSortStage:1 cursorExhausted:1 numYields:312 nreturned:0 reslen:235 locks:{ Global: { acquireCount: { r: 626 } }, Database: { acquireCount: { r: 313 } }, Collection: { acquireCount: { r: 313 } } } protocol:op_query 153ms

In the above log you can see YOUR_APP_NAME.

#mongodb #mongo #pymongo #appname
How to ignore extra fields for schema validation in MongoEngine?

Some records currently have extra fields that are not included in my model schema (by error, but I want to handle these cases). When I try to query the DB and transform the records into the schema, I get the following error:

FieldDoesNotExist
The field 'X' does not exist on the document 'Y'

To ignore this error when extra fields are present while getting data, set strict to False in your meta dictionary:

from mongoengine import Document, StringField

class User(Document):
    email = StringField(required=True, unique=True)
    password = StringField()
    meta = {'strict': False}
#mongodb #mongo #python #mongoengine #strict #FieldDoesNotExist
In MongoDB you can remove duplicate documents based on a specific field:

db.yourCollection.aggregate([
{ "$group": {
"_id": { "yourDuplicateKey": "$yourDuplicateKey" },
"dups": { "$push": "$_id" },
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gt": 1 } }}
]).forEach(function(doc) {
doc.dups.shift();
db.yourCollection.remove({ "_id": {"$in": doc.dups }});
});
It uses aggregation to group by the given key, pushing each _id into the dups field and the number of occurrences into the count field. It then keeps only the groups with a count of more than 1 using $match. At the end it loops over each group and removes all duplicate documents except the first one (`shift` causes this behaviour by dropping the first _id from the list of ids to remove).
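If you would rather do this from Python, a rough pymongo equivalent might look like the sketch below; the connection string and collection names are placeholders, and allowDiskUse is optional but helps on large collections:

import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017")  # placeholder connection
coll = client["your_db"]["yourCollection"]

pipeline = [
    {"$group": {"_id": {"yourDuplicateKey": "$yourDuplicateKey"},
                "dups": {"$push": "$_id"},
                "count": {"$sum": 1}}},
    {"$match": {"count": {"$gt": 1}}},
]
for group in coll.aggregate(pipeline, allowDiskUse=True):
    # Keep the first document in each group and delete the rest.
    coll.delete_many({"_id": {"$in": group["dups"][1:]}})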
#mongodb #mongo #duplicates #duplication
How to configure a Delayed Replica Set Member?
Let's assume that our member is third in the array of replica members:
cfg = rs.conf()
cfg.members[2].priority = 0
cfg.members[2].hidden = true
cfg.members[2].slaveDelay = 3600
rs.reconfig(cfg)
priority is set to 0 (preventing the node from being elected as primary).
hidden is set to true in order to hide the node from clients querying the database.
And finally, slaveDelay is set to the number of seconds that we want this member to lag behind the Primary node.

The use case for this is to have a replica that is used for analytical purposes, or for backups, and so on.
#mongodb #mongo #replica #replication #primary #delayed_replica_set #slaveDelay
How to add self-signed certificates to replica set nodes?
https://medium.com/@rossbulat/deploy-a-3-node-mongodb-3-6-replica-set-with-x-509-authentication-self-signed-certificates-d539fda94db4
#mongo #mongodb #ssl #self_signed #openssl