Tech C**P
Python and Linux instructor and programmer @alirezastack
For optimal performance in terms of the storage layer, use disks backed by RAID-10. RAID-5 and RAID-6 do not typically provide sufficient performance to support a MongoDB deployment.

Avoid RAID-0 with MongoDB deployments. While RAID-0 provides good write performance, it also provides limited availability and can lead to reduced performance on read operations, particularly when using Amazon’s EBS volumes.


#mongodb #mongo #disk #raid #SSD
How to detect unused indexes on MongoDB collections?

db.your_collection.aggregate([{$indexStats:{}}]).pretty()

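ops (reported under accesses.ops) is the number of operations that have used the index since the time shown in accesses.since; the counter resets when mongod restarts. If ops is zero or very low compared to the other indexes over a representative period, the index is a candidate for dropping.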
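
The comparison can be scripted. A minimal sketch in Python, assuming the $indexStats output has already been fetched into a list of dicts (for example with pymongo's collection.aggregate([{"$indexStats": {}}])). The helper name find_rarely_used_indexes, the max_ops threshold and the sample data are illustrative assumptions, not part of MongoDB:

```python
def find_rarely_used_indexes(index_stats, max_ops=0):
    """Return names of indexes whose accesses.ops is <= max_ops.

    index_stats: list of documents shaped like $indexStats output,
    e.g. {"name": "email_1", "accesses": {"ops": 0}}.
    The _id_ index is always skipped because it cannot be dropped.
    """
    rarely_used = []
    for stat in index_stats:
        if stat["name"] == "_id_":
            continue
        if stat["accesses"]["ops"] <= max_ops:
            rarely_used.append(stat["name"])
    return rarely_used


# Hypothetical sample data standing in for real $indexStats output.
sample_stats = [
    {"name": "_id_", "accesses": {"ops": 0}},
    {"name": "email_1", "accesses": {"ops": 15230}},
    {"name": "legacy_flag_1", "accesses": {"ops": 0}},
]
print(find_rarely_used_indexes(sample_stats))  # ['legacy_flag_1']
```

Because the counter restarts with mongod, it is safer to judge an index only after the server has been up long enough to see your normal workload.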

#mongodb #mongo #index #unused_indexes
If you have a server dedicated to MongoDB, I highly recommend setting the readahead of the block device as MongoDB suggests for each of its storage engines.


For the WiredTiger storage engine:

Set the readahead setting to 0 regardless of storage media type (spinning, SSD, etc.).

Setting a higher readahead benefits sequential I/O operations. However, since MongoDB disk access patterns are generally random, setting a higher readahead provides limited benefit or performance degradation. As such, for most workloads, a readahead of 0 provides optimal MongoDB performance.

In general, set the readahead setting to 0 unless testing shows a measurable, repeatable, and reliable benefit in a higher readahead value. MongoDB Professional Support can provide advice and guidance on non-zero readahead configurations.


For the MMAPv1 storage engine:

Ensure that readahead settings for the block devices that store the database files are appropriate. For random access use patterns, set low readahead values. A readahead of 32 (16 kB) often works well.

For a standard block device, you can run sudo blockdev --report to get the readahead settings and sudo blockdev --setra <value> <device> to change the readahead settings. Refer to your specific operating system manual for more information.
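
blockdev counts readahead in 512-byte sectors, which is why a value of 32 corresponds to 16 kB. A small Python sketch of the conversion in both directions (the function names are illustrative):

```python
SECTOR_BYTES = 512  # blockdev --getra / --setra count 512-byte sectors


def readahead_kib(sectors):
    """Convert a blockdev readahead value (in sectors) to KiB."""
    return sectors * SECTOR_BYTES / 1024


def readahead_sectors(kib):
    """Convert a desired readahead in KiB to the value for blockdev --setra."""
    return int(kib * 1024 // SECTOR_BYTES)


print(readahead_kib(32))      # 16.0
print(readahead_sectors(16))  # 32
```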


#mongodb #mongo #ra #readahead #mmapv1 #wiredtiger
MongoDB recommends turning off atime on the partition that holds the database files. Linux updates atime (the last access time) every time an application accesses a file, and turning it off is widely reported to improve disk performance on that partition.

To turn off atime, set noatime on the partition where you keep the MongoDB database files. Open /etc/fstab and look for the partition (mine is `/var`):

/dev/mapper/mongo--vg-var /var            xfs     defaults        0       2


Add noatime after defaults:

/dev/mapper/mongo--vg-var /var            xfs     defaults,noatime        0       2

Yours may be a little bit different. Now reboot your server using reboot --reboot.


Now check with mount -l whether noatime is set:

/dev/mapper/mongo--vg-var on /var type xfs (rw,noatime,attr2,inode64,logbsize=256k,sunit=512,swidth=1024,noquota)
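
The same check can be automated. A minimal Python sketch that parses mount output like the line above; the helper name has_mount_option is an assumption, and the regular expression only covers the "device on mountpoint type fs (options)" layout shown here:

```python
import re

# Matches lines such as: /dev/sda1 on /var type xfs (rw,noatime)
MOUNT_LINE = re.compile(
    r"^(?P<device>\S+) on (?P<mountpoint>\S+) type (?P<fstype>\S+) "
    r"\((?P<options>[^)]*)\)"
)


def has_mount_option(mount_output, mountpoint, option):
    """Return True if `option` is set for `mountpoint` in `mount` output."""
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match and match.group("mountpoint") == mountpoint:
            return option in match.group("options").split(",")
    return False


sample_mount_output = (
    "/dev/mapper/mongo--vg-var on /var type xfs "
    "(rw,noatime,attr2,inode64,logbsize=256k,sunit=512,swidth=1024,noquota)"
)
print(has_mount_option(sample_mount_output, "/var", "noatime"))  # True
```

On a real server you could pass in subprocess.run(["mount", "-l"], capture_output=True, text=True).stdout instead of the sample string.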

In the next post we will test this using the touch command in Linux.

#mongodb #mongo #noatime #atime #xfs #linux #fstab #mount
We deployed a new server for MongoDB because of the slowness we talked about before. The new MongoDB deployment, with the server customizations applied, is 250 times faster than before, and the database is 6 times smaller than the previous one!

#mongodb #mongo #dba
Have you noticed that when you run a query in the MongoDB shell it prints only the first batch of documents (20 by default) and prompts you to type it to see more? In the MongoDB shell you can issue the command below to set how many documents are printed per batch:


DBQuery.shellBatchSize = 3000


If you enter the above command in MongoDB shell and use find to query a collection that has more than 3000 documents, 3000 documents will be returned at once.

#mongodb #mongo #shellBatchSize #limit
Now, to make your MongoDB client connection secure, just pass ssl=True:

# test_mongodb_ssl.py
import pymongo

client = pymongo.MongoClient('example.com', ssl=True)


When you run this script, check your MongoDB logs (usually in `/var/log/mongodb/mongod.log`). When you pass the `ssl=True` parameter to MongoClient, you should see only the log lines below (IP addresses will vary):

I NETWORK  [listener] connection accepted from 172.15.141.162:50761 #49 (39 connections now open)
I NETWORK [conn49] end connection 172.15.141.162:50761 (38 connections now open)


Now remove ssl=True from MongoClient or pass ssl=False. If you now run your test script, you would see something like below in mongod.log:

I NETWORK  [listener] connection accepted from 172.15.141.162:50762 #50 (39 connections now open)
I NETWORK [conn50] SSL mode is set to 'preferred' and connection 50 to 172.15.141.162:50762 is not using SSL.

It says that SSL mode in mongo config is set to preferSSL and your new connection to mongo is not using it.
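
To find such clients in a larger log file, a short Python sketch can scan for that message. It assumes the log lines look exactly like the ones shown above (other MongoDB versions may log in a different format), and the function name non_ssl_connections is illustrative:

```python
import re

# Pattern taken from the "is not using SSL" log line shown in this post.
NOT_SSL = re.compile(r"connection (\d+) to (\S+) is not using SSL")


def non_ssl_connections(log_text):
    """Return (connection_id, client_address) pairs logged as not using SSL."""
    return [(int(m.group(1)), m.group(2)) for m in NOT_SSL.finditer(log_text)]


sample_log = """\
I NETWORK  [listener] connection accepted from 172.15.141.162:50761 #49 (39 connections now open)
I NETWORK [conn49] end connection 172.15.141.162:50761 (38 connections now open)
I NETWORK  [listener] connection accepted from 172.15.141.162:50762 #50 (39 connections now open)
I NETWORK [conn50] SSL mode is set to 'preferred' and connection 50 to 172.15.141.162:50762 is not using SSL.
"""
print(non_ssl_connections(sample_log))  # [(50, '172.15.141.162:50762')]
```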

YOU NEED TO BE CAUTIOUS: we created our SSL certificate ourselves (it is self-signed), so the connection is vulnerable to man-in-the-middle attacks. For production use, get your SSL/TLS certificate from a trusted certificate authority.

#mongodb #mongo #ssl #pymongo
If you have followed our MongoDB SSL configuration, you should know by now that we can generate an SSL certificate using Let's Encrypt. I have used dehydrated, which works well with Cloudflare.

To automate the procedure, I have created a sample shell script that rebuilds the PEM files for MongoDB after each automatic renewal:

#!/bin/bash

# MongoDB expects the private key and the certificate in a single PEM file.
echo 'Binding new mongo private key PEM file and Cert PEM file...'
cat /etc/dehydrated/certs/mongo.example.com/privkey.pem /etc/dehydrated/certs/mongo.example.com/cert.pem > /etc/ssl/mongo.pem
echo 'Saved the new file in /etc/ssl/mongo.pem'

# Start from an empty ca.pem that only root can modify.
sudo touch /etc/ssl/ca.pem
sudo chmod 644 /etc/ssl/ca.pem
echo 'truncate ca.pem file and generate a new one in /etc/ssl/ca.pem...'
sudo truncate -s 0 /etc/ssl/ca.pem
echo 'generate a ca.pem file using openssl from input -> /etc/ssl/ca.crt'
sudo openssl x509 -in /etc/ssl/ca.crt -out /etc/ssl/ca.pem -outform PEM
echo 'ca.pem is generated successfully in /etc/ssl'

# Append the chain; tee runs under sudo, so ca.pem does not need to be world-writable.
echo 'append the chain.pem content to newly created ca.pem in /etc/ssl/ca.pem'
sudo cat /etc/dehydrated/certs/mongo.example.com/chain.pem | sudo tee -a /etc/ssl/ca.pem > /dev/null
echo 'done!'
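
After a renewal it can be worth checking that the combined file really contains both parts. A rough Python sketch, assuming the usual "-----BEGIN ...-----" PEM markers; the function names are illustrative and the sample text uses placeholder bodies, not real key material:

```python
import re

PEM_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_block_types(pem_text):
    """Return the labels of all PEM blocks, in order of appearance."""
    return PEM_BEGIN.findall(pem_text)


def looks_like_mongo_pem(pem_text):
    """True if the text holds at least one private key and one certificate."""
    labels = pem_block_types(pem_text)
    has_key = any(label.endswith("PRIVATE KEY") for label in labels)
    has_cert = "CERTIFICATE" in labels
    return has_key and has_cert


# Placeholder content; a real file would hold base64 data between the markers.
sample_pem = """\
-----BEGIN PRIVATE KEY-----
PLACEHOLDER
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
PLACEHOLDER
-----END CERTIFICATE-----
"""
print(pem_block_types(sample_pem))       # ['PRIVATE KEY', 'CERTIFICATE']
print(looks_like_mongo_pem(sample_pem))  # True
```

This only inspects the markers; it does not validate the key or the certificate themselves.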

#mongodb #mongo #ssl #pem #openssl #lets_encrypt
Today I fixed a really C**Py bug which had been bugging me for ages, days and nights!

I use a scheduler to get data from MongoDB; one of the servers is outside Iran and the other is in Iran. Sometimes querying the db takes forever and freezes the data gathering procedure. I had to restart the process (like Windows) to reset the connection. I know it was stupid! :|

I found the below parameter that you can set on your pymongo.MongoClient:

socketTimeoutMS=10000

socketTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait for a response after sending an ordinary (non-monitoring) database operation before concluding that a network error has occurred. Defaults to `None` (no timeout).
If you don't set it, there is no timeout at all! In my case I set it to 20000 ms (20 seconds) to solve this nasty problem.
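
socketTimeoutMS can also be passed as a connection string option instead of a keyword argument. A minimal sketch that builds such a URI with the standard library only; the helper name build_mongo_uri and its defaults are assumptions for illustration:

```python
from urllib.parse import urlencode


def build_mongo_uri(host, port=27017, socket_timeout_ms=20000, **options):
    """Build a mongodb:// URI that always carries a socket timeout."""
    params = {"socketTimeoutMS": socket_timeout_ms}
    params.update(options)  # any extra connection string options
    return f"mongodb://{host}:{port}/?{urlencode(params)}"


print(build_mongo_uri("example.com"))
# mongodb://example.com:27017/?socketTimeoutMS=20000
```

The resulting string can be handed to pymongo.MongoClient(uri), so every client created this way gets the timeout.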

#mongodb #mongo #socketTimeoutMS #timeout #socket_timeout
In Grafana, if you are connected to MySQL, your select query needs to return three columns: the time, which must be called time_sec; the numeric value to plot, which must be called value; and the label displayed on your graph, which must be called metric:

SELECT
UNIX_TIMESTAMP(your_date_field) as time_sec,
count(*) as value,
'your_label' as metric
FROM table
WHERE status='success'
GROUP BY your_date_field
ORDER BY your_date_field ASC


To read more about Grafana head over here:

- https://docs.grafana.org/features/datasources/mysql/#using-mysql-in-grafana
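
To see the three-column shape without a MySQL server, here is a small sketch using Python's built-in sqlite3 as a stand-in. It is not the Grafana or MySQL API: SQLite's strftime('%s', ...) replaces MySQL's UNIX_TIMESTAMP, and the payments table and its rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (created_at TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [
        ("2018-01-01 00:00:00", "success"),
        ("2018-01-01 00:00:00", "success"),
        ("2018-01-01 00:00:00", "failed"),
        ("2018-01-02 00:00:00", "success"),
    ],
)

# Same shape as the MySQL query: time_sec, value, metric.
rows = conn.execute(
    """
    SELECT CAST(strftime('%s', created_at) AS INTEGER) AS time_sec,
           COUNT(*) AS value,
           'payments' AS metric
    FROM payments
    WHERE status = 'success'
    GROUP BY created_at
    ORDER BY created_at ASC
    """
).fetchall()

for time_sec, value, metric in rows:
    print(time_sec, value, metric)
```

Each row is one point on the graph: a Unix timestamp, the count for that timestamp, and the series label.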


#mongodb #mongo #mysql #grafana #dashboard #chart
How to check if a field exists in MongoDB and its value is not empty?

db.users.find({ profile_image: {$exists: 1, $ne: ""}  }, { profile_image:1 })

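NOTE: $ne makes sure that the field is not an empty string, and $exists checks whether the field exists at all. You need both, because $ne alone also matches documents that do not have the field.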
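
The same filter in pymongo form, together with a plain-Python predicate that mirrors what it matches. This is a sketch for a top-level, non-array field only; the helper name has_non_empty_field is illustrative:

```python
# Filter and projection as you would pass them to collection.find().
query = {"profile_image": {"$exists": True, "$ne": ""}}
projection = {"profile_image": 1}


def has_non_empty_field(doc, field):
    """Mirror {field: {"$exists": True, "$ne": ""}} for a top-level scalar field."""
    return field in doc and doc[field] != ""


print(has_non_empty_field({"profile_image": "a.png"}, "profile_image"))  # True
print(has_non_empty_field({"profile_image": ""}, "profile_image"))       # False
print(has_non_empty_field({"name": "ali"}, "profile_image"))             # False
# A field explicitly set to None (null) exists and is not "", so it matches.
print(has_non_empty_field({"profile_image": None}, "profile_image"))     # True
```

If null values should be excluded as well, the filter needs an extra condition for them.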


#mongodb #mongo #find #exists #ne
Months ago we talked about how to get MongoDB data changes. The problem with that article was that if your script stopped for any reason, you would lose the changes made during the downtime.

Now we have a new solution that resumes reading from the point in time you last read. MongoDB uses the BSON Timestamp type internally, for example in the replication oplog. We can use the same Timestamp and store it somewhere, so we can read from the exact point where we stopped last time.

In python you can import it like below:

from bson.timestamp import Timestamp


Now, to resume from that point, load the timestamp from wherever you saved it and query the oplog for entries newer than it:

import pymongo

# oplog is the oplog collection of your replica set (local.oplog.rs)
ts = YOUR_TIMESTAMP_HERE
cursor = oplog.find({'ts': {'$gt': ts}},
                    cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                    oplog_replay=True)

As you iterate over the cursor and process each change, store the new timestamp found in the ts field of the last oplog document you fetched.

Wrap the read in a while True loop and keep reading while the cursor is alive. The point of this post is that you can store ts somewhere and resume reading from exactly that point.


If you remember from before we got last changes by the query below:

last = oplog.find().sort('$natural', pymongo.DESCENDING).limit(1).next()
ts = last['ts']


There we took the ts of the latest oplog record and started reading from it, which is why we were missing the changes made while the script was down.
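
Where to keep ts is up to you. One possible sketch stores it in a small JSON file using only the standard library. It assumes you save the two integers a bson Timestamp exposes (ts.time and ts.inc) and rebuild it later with Timestamp(time, inc); the function names and the file name are illustrative:

```python
import json
import os
import tempfile


def save_checkpoint(path, time, inc):
    """Persist the (time, inc) pair of a bson Timestamp."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"time": time, "inc": inc}, handle)
    # Rename over the old file so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (time, inc), or None if no checkpoint has been saved yet."""
    try:
        with open(path) as handle:
            data = json.load(handle)
    except FileNotFoundError:
        return None
    return data["time"], data["inc"]


with tempfile.TemporaryDirectory() as workdir:
    checkpoint_file = os.path.join(workdir, "oplog_checkpoint.json")
    print(load_checkpoint(checkpoint_file))  # None on the first run
    save_checkpoint(checkpoint_file, 1520000000, 7)
    print(load_checkpoint(checkpoint_file))  # (1520000000, 7)
```

In the oplog loop you would call save_checkpoint(path, doc['ts'].time, doc['ts'].inc) after handling each document, and on startup build the query timestamp with Timestamp(*load_checkpoint(path)) when a checkpoint exists.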

#mongodb #mongo #replication #oplog #timestamp #cursor
To get a random document from a MongoDB collection, you can use the aggregation framework:

db.users.aggregate(    [ { $sample: { size: 1 } } ] )

NOTE: MongoDB 3.2 introduced $sample to the aggregation pipeline.


Read more here: https://www.mongodb.com/blog/post/how-to-perform-random-queries-on-mongodb


In my experience this is a fast and efficient way of getting random data from a huge collection, for example one with 100 million documents.

#mongodb #mongo #aggregate #sample #random
In pymongo you can give a name to your connections. This definitely helps when you debug issues or trace requests in the MongoDB logs. It matters most when you use a microservice architecture with tens of modules that work independently of each other and all send their requests to MongoDB:

mc = pymongo.MongoClient(host, port, appname='YOUR_APP_NAME')


Now if you look at the MongoDB log you would see:

I COMMAND  [conn173140] command MY_DB.users appName: "YOUR_APP_NAME" command: find { find: "deleted_users", filter: {}, sort: {        acquired_date: 1 }, skip: 19973, limit: 1000, $readPreference: { mode: "secondaryPreferred" }, $db: "blahblah" } planSummary:          COLLSCAN keysExamined:0 docsExamined:19973 hasSortStage:1 cursorExhausted:1 numYields:312 nreturned:0 reslen:235 locks:{ Global: {     acquireCount: { r: 626 } }, Database: { acquireCount: { r: 313 } }, Collection: { acquireCount: { r: 313 } } } protocol:op_query 153ms

The log entry above contains appName: "YOUR_APP_NAME", so you can tell which application sent the command.
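
Once every service sets its own name, the log can be grouped by application. A small Python sketch that pulls the application name and the duration out of a line in the format shown above; the function name is illustrative, the sample line is an abbreviated version of that log entry, and other MongoDB versions may format their logs differently:

```python
import re

APP_NAME = re.compile(r'appName: "([^"]*)"')
DURATION = re.compile(r"(\d+)ms$")


def summarize_command_log(line):
    """Extract the appName and the trailing duration from a COMMAND log line."""
    app = APP_NAME.search(line)
    duration = DURATION.search(line.strip())
    return {
        "app_name": app.group(1) if app else None,
        "duration_ms": int(duration.group(1)) if duration else None,
    }


# Abbreviated copy of the log line from this post.
sample_line = (
    'I COMMAND  [conn173140] command MY_DB.users appName: "YOUR_APP_NAME" '
    'command: find { find: "deleted_users", filter: {} } '
    "planSummary: COLLSCAN protocol:op_query 153ms"
)
print(summarize_command_log(sample_line))
# {'app_name': 'YOUR_APP_NAME', 'duration_ms': 153}
```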


#mongodb #mongo #pymongo #appname
How to ignore extra fields for schema validation in Mongoengine?

Some records currently have extra fields that are not included in my model schema (by error, but I want to handle these cases). When I try to query the DB and transform the records into the schema, I get the following error:

FieldDoesNotExist
The field 'X' does not exist on the document 'Y'



To ignore this error when the fetched records contain extra fields, set strict to False in your meta dictionary.


from mongoengine import Document, StringField


class User(Document):
    email = StringField(required=True, unique=True)
    password = StringField()
    meta = {'strict': False}



#mongodb #mongo #python #mongoengine #strict #FieldDoesNotExist
In MongoDB you can remove duplicate documents based on a specific field:

db.yourCollection.aggregate([
    { "$group": {
        "_id": { "yourDuplicateKey": "$yourDuplicateKey" },
        "dups": { "$push": "$_id" },
        "count": { "$sum": 1 }
    }},
    { "$match": { "count": { "$gt": 1 } }}
]).forEach(function(doc) {
    doc.dups.shift();
    db.yourCollection.remove({ "_id": {"$in": doc.dups }});
});

It uses aggregation to group the documents by the given key, pushing each document's _id into the dups field and counting the documents per group in the count field. The $match stage keeps only the groups with a count of more than 1. At the end it loops over each group and removes all duplicate documents except the first one (`shift` drops the first _id from the list, so that document is kept).
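
The keep-the-first logic is easy to try out on plain data before running it against a real collection. A minimal Python sketch of the same idea, assuming every document has the key; the function name and the sample documents are illustrative:

```python
from collections import defaultdict


def duplicate_ids_to_remove(docs, key):
    """Group docs by `key` and return the _ids of all but the first per group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key]].append(doc["_id"])

    to_remove = []
    for ids in groups.values():
        if len(ids) > 1:               # same role as the $match on count > 1
            to_remove.extend(ids[1:])  # skip the first _id, like dups.shift()
    return to_remove


sample_docs = [
    {"_id": 1, "email": "a@example.com"},
    {"_id": 2, "email": "a@example.com"},
    {"_id": 3, "email": "b@example.com"},
    {"_id": 4, "email": "a@example.com"},
]
print(duplicate_ids_to_remove(sample_docs, "email"))  # [2, 4]
```

Running the selection as a dry run first and looking at the _ids it returns is a cheap safeguard before deleting anything.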

#mongodb #mongo #duplicates #duplication