Applying PostgreSQL GIN Single-Column Clustered Indexes - Alibaba Cloud Developer Community

2017-02-21

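Background

Clustered storage is easy to understand: rows are stored in one data block, or in adjacent blocks, according to a clustering key, so retrieving the trajectory or behavior data for a given key requires far less I/O.

What about a clustered index? A BTREE index normally stores the key together with the heap row pointer (TID) of the corresponding row, one index entry per row.

A GIN index is also a tree structure, but it stores only one entry per distinct key; all row pointers for that key are kept in a posting list or a posting tree.

For this kind of workload, GIN therefore has two advantages:

1. For trajectory or behavior data with many duplicate keys, a GIN index saves space.

2. When a GIN index scan on a single key has to return all of that key's trajectory data, query efficiency is also very good.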
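The structural difference described above can be illustrated with a toy model. This is only a minimal sketch, not PostgreSQL's actual on-disk format: the function names and the toy data are hypothetical, and a plain list position stands in for the heap row pointer.

```python
from collections import defaultdict


def btree_entry_count(rows):
    """Simplified B-tree model: one (key, row pointer) entry per heap row."""
    return len([(key, tid) for tid, key in enumerate(rows)])


def gin_posting_lists(rows):
    """Simplified GIN model: one entry per distinct key, holding all its row pointers."""
    postings = defaultdict(list)
    for tid, key in enumerate(rows):
        postings[key].append(tid)
    return dict(postings)


# Hypothetical toy data: 5 distinct keys, each repeated 4 times, interleaved.
rows = [key for _ in range(4) for key in range(1, 6)]
postings = gin_posting_lists(rows)

print("B-tree entries:", btree_entry_count(rows))
print("GIN entries:", len(postings))
print("Row pointers for key 1:", postings[1])
```

The more often each key repeats, the larger the gap between the two entry counts, which is why the space saving shows up on data with heavily duplicated keys.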

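To verify this, we reuse the example from the following article:

"PostgreSQL Clustered Storage and BRIN Indexes: Explaining High-Concurrency, High-Throughput Query Scenarios for Behavior and Trajectory Data"

Walkthrough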

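1. Build test data with scattered storage and create a GIN index

create unlogged table test(id int, info text, crt_time timestamp);    

-- 100 million rows: the id column cycles through 1..10000 ten thousand times,
-- so the 10000 rows belonging to any one id are scattered across the whole table.
insert into test select generate_series(1,10000), md5(id::text), clock_timestamp() from generate_series(1,10000) t(id);    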

postgres=# \dt+    
                           List of relations    
 Schema |        Name        | Type  |  Owner   |  Size   | Description     
--------+--------------------+-------+----------+---------+-------------    
 public | test               | table | postgres | 7303 MB |     

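-- GIN has no built-in operator class for int; the btree_gin extension provides one.
create extension if not exists btree_gin;

set maintenance_work_mem='32GB';  

create index idx_test_id on test using gin (id);  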

\di+ idx_test_id  

 public | idx_test_id              | index | postgres | test               | 391 MB     |   

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where id=1;  
                                                         QUERY PLAN                                                           
----------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.test  (cost=84.79..12541.74 rows=9767 width=45) (actual time=3.808..17.915 rows=10000 loops=1)  
   Output: id, info, crt_time  
   Recheck Cond: (test.id = 1)  
   Heap Blocks: exact=10000  
   Buffers: shared hit=10008  
   ->  Bitmap Index Scan on idx_test_id  (cost=0.00..82.35 rows=9767 width=0) (actual time=1.962..1.962 rows=10000 loops=1)  
         Index Cond: (test.id = 1)  
         Buffers: shared hit=8  
 Planning time: 0.092 ms  
 Execution time: 18.480 ms  
(10 rows)

Benchmark

$ vi test.sql    

\set id random(1,10000)    
select * from test where id=:id;    

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 100000

2. Build test data with clustered storage and create a GIN index

create unlogged table cluster_test_gin (like test);    

insert into cluster_test_gin select * from test order by id;    

set maintenance_work_mem ='32GB';    

create index idx_cluster_test_gin_id on cluster_test_gin using gin (id);    

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cluster_test_gin where id=1;  
                                                               QUERY PLAN                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.cluster_test_gin  (cost=90.83..13732.45 rows=10714 width=45) (actual time=1.037..2.236 rows=10000 loops=1)  
   Output: id, info, crt_time  
   Recheck Cond: (cluster_test_gin.id = 1)  
   Heap Blocks: exact=94  
   Buffers: shared hit=100  
   ->  Bitmap Index Scan on idx_cluster_test_gin_id  (cost=0.00..88.16 rows=10714 width=0) (actual time=1.010..1.010 rows=10000 loops=1)  
         Index Cond: (cluster_test_gin.id = 1)  
         Buffers: shared hit=6  
 Planning time: 0.092 ms  
 Execution time: 2.791 ms  
(10 rows)
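The two plans differ mainly in "Heap Blocks": 10000 for the scattered table versus 94 for the clustered one. The sketch below reproduces those numbers with simple arithmetic. It is only a model under one assumption: about 107 rows per 8 KB heap block, estimated from the 7303 MB table size and 100 million rows. The function names are hypothetical.

```python
ROWS_PER_KEY = 10_000
NUM_KEYS = 10_000
# Assumption: 7303 MB / 8 KB is about 934,784 blocks, so 100M rows is about 107 rows per block.
ROWS_PER_BLOCK = 107


def heap_blocks_touched(row_positions, rows_per_block=ROWS_PER_BLOCK):
    """Count the distinct heap blocks that hold the given row positions."""
    return len({pos // rows_per_block for pos in row_positions})


def scattered_positions(key):
    """Insert order cycles through ids 1..NUM_KEYS, so one key's rows sit NUM_KEYS apart."""
    return [(key - 1) + i * NUM_KEYS for i in range(ROWS_PER_KEY)]


def clustered_positions(key):
    """After 'order by id', all rows of one key are physically contiguous."""
    start = (key - 1) * ROWS_PER_KEY
    return range(start, start + ROWS_PER_KEY)


scattered = heap_blocks_touched(scattered_positions(1))
clustered = heap_blocks_touched(clustered_positions(1))
print("scattered:", scattered, "blocks; clustered:", clustered, "blocks")
```

In this model the index scan itself costs about the same in both cases (8 versus 6 buffers in the plans above); the saving comes almost entirely from visiting far fewer heap blocks.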

Benchmark

$ vi test.sql    

\set id random(1,10000)    
select * from cluster_test_gin where id=:id;    

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 100000

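Results

Storage layout	Per-key trajectory query TPS	Output throughput	CPU utilization	Index size	Table size
Scattered storage, BTREE index	2184	21.84 million rows/s	99.8%	2.1 GB	7.3 GB
Scattered storage, GIN index	1620	16.20 million rows/s	99.8%	391 MB	7.3 GB
Clustered storage, BTREE index	4000	40.00 million rows/s	99.8%	2.1 GB	7.3 GB
Clustered storage, GIN index	3770	37.70 million rows/s	99.8%	391 MB	7.3 GB
Clustered storage, BRIN index	2255	22.55 million rows/s	99.8%	232 KB	7.3 GB
Row-to-column transformation, array	850	850 rows/s	99.8%	248 KB	4.5 GB
Row-to-column transformation, jsonb	1650	1650 rows/s	99.8%	248 KB	4.5 GB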
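For the BTREE, GIN, and BRIN rows, the output throughput is the TPS multiplied by the 10,000 rows each query returns. The short sketch below redoes that arithmetic and compares the two index sizes from the table; the helper name is hypothetical and the figures are copied from the table, not re-measured.

```python
ROWS_PER_QUERY = 10_000  # each id matches 10,000 rows in the test tables


def rows_per_second(tps, rows_per_query=ROWS_PER_QUERY):
    """Output throughput implied by a given TPS."""
    return tps * rows_per_query


clustered_gin_rows = rows_per_second(3770)
clustered_btree_rows = rows_per_second(4000)

# Index sizes as reported in the table (GIN in MB, BTREE converted from GB to MB).
gin_index_mb = 391
btree_index_mb = 2.1 * 1024
size_ratio = gin_index_mb / btree_index_mb

print(f"clustered GIN:   {clustered_gin_rows:,} rows/s")
print(f"clustered BTREE: {clustered_btree_rows:,} rows/s")
print(f"GIN index is about {size_ratio:.0%} of the BTREE index size")
```

On clustered data the GIN index comes within a few percent of BTREE throughput while taking less than a fifth of the space.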

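References

"A Fine Sword for a Hero - Equivalent Queries on Arbitrary Column Combinations, a Look at PostgreSQL's Multi-Column Expanded B-Tree"

"How PostgreSQL GIN Indexes Are Implemented"

"PostgreSQL GIN Multi-Key Search Optimization"

"PostgreSQL Clustered Storage and BRIN Indexes: Explaining High-Concurrency, High-Throughput Query Scenarios for Behavior and Trajectory Data"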
