mongoDB VS PostgreSQL dml performance use python (pymongo & py-postgresql)

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
云数据库 MongoDB,通用型 2核4GB
简介:
前面测试了mongodb和postgresql的插入性能对比, 参考如下 : 
本文将测试对比一下select, update, 以及select, insert, update混合场景的性能.
同样使用并行8个线程测试.
(因为使用的驱动问题, python测试结果性能较差, mongo改用motor驱动的异步调用后, 性能提升明显, 测试结果仅供参考, 如果是PG的话, 建议使用pgbench测试)
mongoDB : 
更新测试 :
# vi test.py
import threading
import time
import pymongo
import random

c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
db = c.test_database
db.drop_collection('test_collection')
collection = db.test_collection
print(collection.count())
for i in range(0,1000000):
  collection.insert({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'})
collection.create_index("id")

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
    db = c.test_database
    collection = db.test_collection
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      collection.update({'id':random.randrange(0,1000000)}, {'$set': {'age': random.randrange(0,1000000)}})
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果379秒 : 
[root@localhost ~]# python test.py
0
TID:0 1423070907.7699816
TID:1 1423070907.770696
TID:2 1423070907.7744105
TID:3 1423070907.7760801
TID:4 1423070907.779555
TID:7 1423070907.78037
TID:6 1423070907.7860947
TID:5 1423070907.78793
TID:6 1423071285.5971715
377.81107687950134
TID:7 1423071286.6500263
378.8696563243866
TID:0 1423071286.9464445
379.17646288871765
TID:1 1423071287.1227949
379.352098941803
TID:3 1423071287.1230247
379.3469445705414
TID:5 1423071287.2262568
379.4383268356323
TID:4 1423071287.2609653
379.4814102649689
TID:2 1423071287.4058232
379.6314127445221


查询测试 : 
import threading
import time
import pymongo
import random

c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
db = c.test_database
# db.drop_collection('test_collection')
collection = db.test_collection
print(collection.count())
# for i in range(0,1000000):
#   collection.insert({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'})
# collection.create_index("id")

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
    db = c.test_database
    collection = db.test_collection
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      collection.find_one({'id': random.randrange(0,1000000)})
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果361秒 : 
[root@localhost ~]# python test.py
2000000
TID:2 1423079875.4572093
TID:3 1423079875.4576375
TID:1 1423079875.4596934
TID:0 1423079875.4600854
TID:4 1423079875.4589622
TID:5 1423079875.4653761
TID:6 1423079875.463017
TID:7 1423079875.4694664
TID:7 1423080235.7239776
360.2545111179352
TID:3 1423080236.109339
360.6517014503479
TID:4 1423080236.1194305
360.66046833992004
TID:1 1423080236.260948
360.8012545108795
TID:2 1423080236.5218844
361.06467509269714
TID:5 1423080236.6404896
361.17511343955994
TID:0 1423080236.6446981
361.1846127510071
TID:6 1423080236.6607506
361.1977336406708


更新, 插入, 查询综合测试 : 
import threading
import time
import pymongo
import random

c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
db = c.test_database
# db.drop_collection('test_collection')
collection = db.test_collection
print(collection.count())
# for i in range(0,1000000):
#   collection.insert({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'})
# collection.create_index("id")

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
    db = c.test_database
    collection = db.test_collection
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      collection.insert({'id':random.randrange(1000001,2000000), 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'})
      collection.update({'id':random.randrange(0,1000000)}, {'$set': {'age': random.randrange(0,1000000)}})
      collection.find_one({'id': random.randrange(0,1000000)})
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果1150秒 : 
[root@localhost ~]# python test.py
2000000
TID:0 1423080359.006871
TID:1 1423080359.0083587
TID:2 1423080359.009925
TID:4 1423080359.0124109
TID:3 1423080359.015088
TID:5 1423080359.0179524
TID:6 1423080359.0209677
TID:7 1423080359.0235417
TID:4 1423081508.11507
1149.1026592254639
TID:7 1423081508.1888452
1149.1653034687042
TID:2 1423081508.2641344
1149.2542095184326
TID:1 1423081508.3973265
1149.3889677524567
TID:0 1423081508.5400703
1149.5331993103027
TID:6 1423081508.594207
1149.573239326477
TID:5 1423081509.0279126
1150.0099601745605
TID:3 1423081509.0943637
1150.0792756080627


PostgreSQL : 
更新测试 :
import threading
import time
import postgresql
import random

conn = { "user": "postgres",
         "database": "postgres",
         "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
       }

db = postgresql.open(**conn)
db.execute("drop table if exists tt")
db.execute("create table tt(id int, username name, age int2, email text, qq text)")
ins = db.prepare("insert into tt values($1,$2,$3,$4,$5)")
for i in range(0,1000000):
  ins(i,'digoal.zhou',32,'digoal@126.com','276732431')
db.execute("create index idx_tt_id on tt(id)")
print(db.query("select count(1) as a from tt"))

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    conn = { "user": "postgres", 
             "database": "postgres",
             "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
           }

    db = postgresql.open(**conn)
    upd = db.prepare("update tt set age=$1 where id=$2")
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      upd(random.randrange(0,100), random.randrange(0,1000000))
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果244秒 : 
TID:0 1423072792.0481002
TID:1 1423072792.050467
TID:3 1423072792.0514963
TID:2 1423072792.051693
TID:5 1423072792.059382
TID:4 1423072792.0605848
TID:7 1423072792.0643597
TID:6 1423072792.0657377
TID:2 1423073034.8827112
242.8310182094574
TID:5 1423073035.024978
242.96559596061707
TID:4 1423073035.2550452
243.1944603919983
TID:7 1423073035.5245414
243.46018171310425
TID:1 1423073036.0639975
244.0135304927826
TID:3 1423073036.3519847
244.30048847198486
TID:0 1423073036.5292883
244.48118805885315
TID:6 1423073036.5383787
244.47264099121094


查询测试 : 
import threading
import time
import postgresql
import random

conn = { "user": "postgres",
         "database": "postgres",
         "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
       }

db = postgresql.open(**conn)
# db.execute("drop table if exists tt")
# db.execute("create table tt(id int, username name, age int2, email text, qq text)")
# ins = db.prepare("insert into tt values($1,$2,$3,$4,$5)")
# for i in range(0,1000000):
#   ins(i,'digoal.zhou',32,'digoal@126.com','276732431')
# db.execute("create index idx_tt_id on tt(id)")
print(db.query("select count(1) as a from tt"))

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    conn = { "user": "postgres", 
             "database": "postgres",
             "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
           }

    db = postgresql.open(**conn)
    sel = db.prepare("select * from tt where id=$1 limit 1")
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      sel(random.randrange(0,1000000))
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果438秒 : 
postgres@localhost-> python test.py
[(1000000,)]
TID:2 1423081634.6041436
TID:1 1423081634.6072564
TID:5 1423081634.6098883
TID:0 1423081634.6110475
TID:6 1423081634.611464
TID:7 1423081634.6147678
TID:3 1423081634.617597
TID:4 1423081634.6184704
TID:7 1423082070.8112974
436.1965296268463
TID:4 1423082071.5796437
436.96117329597473
TID:5 1423082071.6695313
437.0596430301666
TID:0 1423082071.8521369
437.24108934402466
TID:1 1423082072.5634701
437.95621371269226
TID:2 1423082072.678791
438.0746474266052
TID:3 1423082072.9825838
438.3649866580963
TID:6 1423082072.9963892
438.3849251270294


更新, 插入, 查询综合测试 : 
import threading
import time
import postgresql
import random

conn = { "user": "postgres",
         "database": "postgres",
         "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
       }

db = postgresql.open(**conn)
# db.execute("drop table if exists tt")
# db.execute("create table tt(id int, username name, age int2, email text, qq text)")
# ins = db.prepare("insert into tt values($1,$2,$3,$4,$5)")
# for i in range(0,1000000):
#   ins(i,'digoal.zhou',32,'digoal@126.com','276732431')
# db.execute("create index idx_tt_id on tt(id)")
print(db.query("select count(1) as a from tt"))

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    conn = { "user": "postgres", 
             "database": "postgres",
             "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
           }

    db = postgresql.open(**conn)
    ins = db.prepare("insert into tt values($1,$2,$3,$4,$5)")
    upd = db.prepare("update tt set age=$1 where id=$2")
    sel = db.prepare("select * from tt where id=$1")

    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      ins(random.randrange(1000001,2000000),'digoal.zhou',32,'digoal@126.com','276732431')
      upd(random.randrange(0,100), random.randrange(0,1000000))
      sel(random.randrange(0,1000000))
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果938秒 : 
postgres@localhost-> python test.py
[(1000000,)]
TID:0 1423083626.888068
TID:2 1423083626.8912995
TID:1 1423083626.8920445
TID:3 1423083626.893638
TID:4 1423083626.8974612
TID:6 1423083626.9039218
TID:5 1423083626.9061637
TID:7 1423083626.908666

TID:5 1423084561.3804135
934.4742498397827
TID:6 1423084563.4344044
936.5304825305939
TID:2 1423084564.1677904
937.2764909267426
TID:0 1423084564.5768228
937.6887547969818
TID:4 1423084564.839536
937.9420747756958
TID:1 1423084564.9242597
938.0322151184082
TID:7 1423084565.0638845
938.1552186012268
TID:3 1423084565.345857
938.4522190093994


PostgreSQL使用pgbench的测试结果 : 
# 初始化数据
psql
truncate tt;
insert into tt select generate_series(1,1000000), 'digoal.zhou',32,'digoal@126.com','276732431';

更新
postgres@localhost-> vi test.sql
\setrandom v_id 0 1000000
\setrandom v_age 0 100
update tt set age=:v_age where id=:v_id;

测试结果32秒 : 
postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 8 -j 4 -t 125000
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 4
number of transactions per client: 125000
number of transactions actually processed: 1000000/1000000
tps = 31631.177435 (including connections establishing)
tps = 31637.366682 (excluding connections establishing)
statement latencies in milliseconds:
        0.002963        \setrandom v_id 0 1000000
        0.000671        \setrandom v_age 0 100
        0.232106        update tt set age=:v_age where id=:v_id;

查询
postgres@localhost-> vi test.sql
\setrandom v_id 0 1000000
select * from tt where id=:v_id;

测试结果15秒 : 
postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 8 -j 4 -t 125000
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 4
number of transactions per client: 125000
number of transactions actually processed: 1000000/1000000
tps = 66487.929382 (including connections establishing)
tps = 66514.422913 (excluding connections establishing)
statement latencies in milliseconds:
        0.002007        \setrandom v_id 0 1000000
        0.104391        select * from tt where id=:v_id;

插入, 更新, 查询
postgres@localhost-> vi test.sql
\setrandom v_newid 1000001 2000000
\setrandom v_id 0 1000000
\setrandom v_age 0 100
insert into tt values(:v_newid, 'digoal.zhou',32,'digoal@126.com','276732431');
update tt set age=:v_age where id=:v_id;
select * from tt where id=:v_id;

测试结果耗时69秒 : 
postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 8 -j 4 -t 125000
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 4
number of transactions per client: 125000
number of transactions actually processed: 1000000/1000000
tps = 14429.720005 (including connections establishing)
tps = 14431.006796 (excluding connections establishing)
statement latencies in milliseconds:
        0.003031        \setrandom v_newid 1000001 2000000
        0.000816        \setrandom v_id 0 1000000
        0.000733        \setrandom v_age 0 100
        0.150249        insert into tt values(:v_newid, 'digoal.zhou',32,'digoal@126.com','276732431');
        0.180603        update tt set age=:v_age where id=:v_id;
        0.190850        select * from tt where id=:v_id;


[小结]
python 测试结果 : 
mongoDB 
插入100W记录耗时 - 364秒
更新100W记录耗时 - 379
100W记录索引检索100W次耗时 - 361
综合测试, 插入 100W记录 , 更新 100W记录 , 查询 100W次耗时 - 1150

PostgreSQL
插入100W记录耗时 - 226秒.
更新100W记录耗时 - 244 秒.
100W记录索引检索100W次耗时 - 438 秒.
综合测试, 插入 100W记录 , 更新 100W记录 , 查询 100W次耗时 - 938 秒.

PostgreSQL 使用pgbench测试结果 : 
插入100W记录耗时 - 16秒
更新100W记录耗时 - 32 秒.
100W记录索引检索100W次耗时 - 15 秒.
综合测试, 插入 100W记录 , 更新 100W记录 , 查询 100W次耗时 - 69 秒.

[参考]
1.  http://blog.163.com/digoal@126/blog/static/16387704020151435825593/
相关实践学习
MongoDB数据库入门
MongoDB数据库入门实验。
快速掌握 MongoDB 数据库
本课程主要讲解MongoDB数据库的基本知识,包括MongoDB数据库的安装、配置、服务的启动、数据的CRUD操作函数使用、MongoDB索引的使用(唯一索引、地理索引、过期索引、全文索引等)、MapReduce操作实现、用户管理、Java对MongoDB的操作支持(基于2.x驱动与3.x驱动的完全讲解)。 通过学习此课程,读者将具备MongoDB数据库的开发能力,并且能够使用MongoDB进行项目开发。   相关的阿里云产品:云数据库 MongoDB版 云数据库MongoDB版支持ReplicaSet和Sharding两种部署架构,具备安全审计,时间点备份等多项企业能力。在互联网、物联网、游戏、金融等领域被广泛采用。 云数据库MongoDB版(ApsaraDB for MongoDB)完全兼容MongoDB协议,基于飞天分布式系统和高可靠存储引擎,提供多节点高可用架构、弹性扩容、容灾、备份回滚、性能优化等解决方案。 产品详情: https://www.aliyun.com/product/mongodb
相关文章
|
6天前
|
NoSQL MongoDB Python
深入了解 Python MongoDB 操作:排序、删除、更新、结果限制全面解析
使用 sort() 方法对结果进行升序或降序排序。 sort() 方法接受一个参数用于“字段名”,一个参数用于“方向”(升序是默认方向)。
73 0
|
6天前
|
安全 数据库 C++
Python Web框架比较:Django vs Flask vs Pyramid
【4月更文挑战第9天】本文对比了Python三大Web框架Django、Flask和Pyramid。Django功能全面,适合快速开发,但学习曲线较陡;Flask轻量灵活,易于入门,但默认配置简单,需自行添加功能;Pyramid兼顾灵活性和可扩展性,适合不同规模项目,但社区及资源相对较少。选择框架应考虑项目需求和开发者偏好。
|
6天前
|
存储 NoSQL MongoDB
MongoDB数据库转换为表格文件的Python实现
MongoDB数据库转换为表格文件的Python实现
39 0
|
3天前
|
C++ Python
vs配置python环境 - 蓝易云
以上就是在Visual Studio中配置Python环境的步骤,希望对你有所帮助。
11 1
|
6天前
|
数据采集 Web App开发 数据处理
Lua vs. Python:哪个更适合构建稳定可靠的长期运行爬虫?
Lua vs. Python:哪个更适合构建稳定可靠的长期运行爬虫?
|
6天前
|
存储 NoSQL 关系型数据库
【MongoDB 专栏】MongoDB 与传统关系型数据库的比较
【5月更文挑战第10天】本文对比了MongoDB与传统关系型数据库在数据模型、存储结构、扩展性、性能、事务支持、数据一致性和适用场景等方面的差异。MongoDB以其灵活的文档模型、优秀的扩展性和高性能在处理非结构化数据和高并发场景中脱颖而出,而关系型数据库则在事务处理和强一致性上更具优势。两者各有适用场景,选择应根据实际需求来定。随着技术发展,两者正相互融合,共同构建更丰富的数据库生态。
【MongoDB 专栏】MongoDB 与传统关系型数据库的比较
|
6天前
|
存储 NoSQL 关系型数据库
MongoDB非关系型数据库实战
【5月更文挑战第6天】MongoDB,流行的NoSQL数据库,以其灵活的数据模型和高性能备受青睐。本文介绍了MongoDB的基础,包括文档型数据库特性、安装配置、数据操作。通过电商订单管理的实战案例,展示了MongoDB在处理复杂数据结构和大规模数据时的优势,适用于电商、游戏、视频直播等场景。MongoDB的索引、全文搜索和地理空间功能进一步增强了其实用性。注意性能优化和扩展性以确保系统稳定性和可靠性。
|
6天前
|
NoSQL MongoDB Redis
Python与NoSQL数据库(MongoDB、Redis等)面试问答
【4月更文挑战第16天】本文探讨了Python与NoSQL数据库(如MongoDB、Redis)在面试中的常见问题,包括连接与操作数据库、错误处理、高级特性和缓存策略。重点介绍了使用`pymongo`和`redis`库进行CRUD操作、异常捕获以及数据一致性管理。通过理解这些问题、易错点及避免策略,并结合代码示例,开发者能在面试中展现其技术实力和实践经验。
332 8
Python与NoSQL数据库(MongoDB、Redis等)面试问答
|
6天前
|
存储 NoSQL MongoDB
「Python系列」Python MongoDB
MongoDB 是一个开源的 NoSQL 数据库,它使用 BSON(Binary JSON)格式存储数据,并提供了高性能、可扩展和灵活的数据存储解决方案。Python 中有一个名为 PyMongo 的驱动程序,允许 Python 应用程序与 MongoDB 数据库进行交互。
22 0
|
6天前
|
存储 SQL NoSQL
【MongoDB】如何区分MongoDB和关系型数据库?
【4月更文挑战第1天】【MongoDB】如何区分MongoDB和关系型数据库?