PostgreSQL 单列多条件查询优化 - 多个多边形查询4倍提升的技巧-阿里云开发者社区

PostgreSQL 单列多条件查询优化 - 多个多边形查询4倍提升的技巧

2017-08-20 4075

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云原生数据库 PolarDB MySQL 版，Serverless 5000PCU 100GB

云原生数据库 PolarDB 分布式版，标准版 2核8GB

云数据库 RDS MySQL Serverless，0.5-2RCU 50GB

简介：

背景

在某些空间数据查询需求中，有一些这样的请求，例如查找与某些多边形中的任意一个相交的空间对象。

比如在菜鸟、新零售的业务中，查询某几个商场多边形，或者某几个小区多边形内覆盖的点。

SQL写法可能是这样的

select geo_point,* from table where ST_Within(geo_point, polygon_1) or ST_Within(geo_point, polygon_2) or ... ST_Within(geo_point, polygon_n);

PostgreSQL支持空间索引，同时支持bitmapAnd, bitmapOr index scan。也就是说只要geo_point字段有索引，不管多少个查询条件，都可以走index scan。

这个查询有什么优化空间么？

在讲这个空间优化前，我们来看另一个例子。

单列组合条件查询

单列组合条件查询与前面提到的需求类似，即一个字段，多个查询条件。是不是类似于一个字段，多个多边形匹配呢？

表结构如下

postgres=# \d+ a  
                                                Table "public.a"  
  Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description   
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------  
 id       | integer                     |           | not null |         | plain    |              |   
 info     | text                        |           |          |         | extended |              |   
 crt_time | timestamp without time zone |           |          |         | plain    |              |   
Indexes:  
    "a_pkey" PRIMARY KEY, btree (id)

查询需求如下，ID字段上有4个条件，任意条件满足即返回结果。

PostgreSQL根据统计信息，基于CBO成本优化，采用了bitmapOr，重复使用了多次单列索引，合并所有条件对应的数据块。最后进行一次recheck得到所要的结果。

bitmapAnd,bitmapOr是PostgreSQL数据库独有的特性，可以在多个查询条件的组合查询中使用多个索引进行数据块的合并和消除扫描，非常赞。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a where id <100 or id<200 or id<300 or id between 10 and 100 or id between 100 and 30000;  
                                                                QUERY PLAN                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.a  (cost=39.85..3118.98 rows=2610 width=44) (actual time=5.099..15.355 rows=3154 loops=1)  
   Output: id, info, crt_time  
   Recheck Cond: ((a.id < 100) OR (a.id < 200) OR (a.id < 300) OR ((a.id >= 10) AND (a.id <= 100)) OR ((a.id >= 100) AND (a.id <= 30000)))  
   Heap Blocks: exact=38  
   Buffers: shared hit=16 read=40  
   ->  BitmapOr  (cost=39.85..39.85 rows=2610 width=0) (actual time=5.089..5.089 rows=0 loops=1)  
         Buffers: shared hit=15 read=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..1.70 rows=8 width=0) (actual time=0.003..0.003 rows=1 loops=1)  
               Index Cond: (a.id < 100)  
               Buffers: shared hit=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..1.77 rows=17 width=0) (actual time=0.001..0.001 rows=1 loops=1)  
               Index Cond: (a.id < 200)  
               Buffers: shared hit=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..1.84 rows=26 width=0) (actual time=0.001..0.001 rows=1 loops=1)  
               Index Cond: (a.id < 300)  
               Buffers: shared hit=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..1.72 rows=8 width=0) (actual time=0.004..0.004 rows=0 loops=1)  
               Index Cond: ((a.id >= 10) AND (a.id <= 100))  
               Buffers: shared hit=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..29.55 rows=2551 width=0) (actual time=5.079..5.079 rows=3153 loops=1)  
               Index Cond: ((a.id >= 100) AND (a.id <= 30000))  
               Buffers: shared hit=3 read=3  
 Planning time: 0.251 ms  
 Execution time: 15.699 ms  
(24 rows)

而实际上这4个条件，我们可以在逻辑上合成一个条件。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a where id <= 30000;  
                                                      QUERY PLAN                                                        
----------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.a  (cost=23.88..3006.82 rows=2560 width=44) (actual time=0.110..0.568 rows=3154 loops=1)  
   Output: id, info, crt_time  
   Recheck Cond: (a.id <= 30000)  
   Heap Blocks: exact=38  
   Buffers: shared hit=44  
   ->  Bitmap Index Scan on a_pkey  (cost=0.00..23.24 rows=2560 width=0) (actual time=0.102..0.102 rows=3154 loops=1)  
         Index Cond: (a.id <= 30000)  
         Buffers: shared hit=6  
 Planning time: 0.127 ms  
 Execution time: 0.835 ms  
(10 rows)

合成后，走精确索引扫描，性能提升几十倍。

实际上，空间类型的数据，也是一样的，而且合成起来更方便，通过st_union即可。

空间合成优化

select geo_point,* from table where ST_Within(geo_point, polygon_1) or ST_Within(geo_point, polygon_2) or ... ST_Within(geo_point, polygon_n);

优化为

select geo_point,* from table where ST_Within(geo_point, st_union(polygon_1,polygon_2,...polygon_n) );

性能提升也是非常明显的。

参考

http://postgis.net/docs/manual-2.3/ST_Union.html

https://www.postgresql.org/docs/9.6/static/indexes-bitmap-scans.html

PostgreSQL 单列多条件查询优化 - 多个多边形查询4倍提升的技巧

标签

背景

单列组合条件查询

空间合成优化

参考

关系型数据库

热门文章

最新文章

相关产品

相关课程

相关电子书

相关实验场景

推荐镜像