阿里云HybridDB for PG实践 - 列存储加字段和默认值-阿里云开发者社区

阿里云HybridDB for PG实践 - 列存储加字段和默认值

2017-08-20 2581

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云原生数据库 PolarDB MySQL 版，Serverless 5000PCU 100GB

云原生数据库 PolarDB 分布式版，标准版 2核8GB

云数据库 RDS MySQL Serverless，0.5-2RCU 50GB

简介：

背景

Greenplum的Append only table支持更新、删除。通过什么支持呢？bitmap文件，标记被删除的行。

因此在更新，删除后，数据可能膨胀。

另一方面，列存储每列一个文件，同一行通过偏移对应起来。例如INT8的两个字段，通过偏移很快能找到某一行的A列对应的B列。

接下来谈谈加字段，在加字段时，AO表示不会REWRITE TABLE的。如果AO表以及有一些垃圾（被删除的数据）记录后，加的字段会使用相对偏移吗（换句话说，需不需要填充已删除的记录），答案是要填充。

例子

1、创建3张AO列存表。

postgres=# create table tbl1 (id int, info text) with (appendonly=true, blocksize=8192, compresstype=none, orientation=column);  
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.  
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.  
CREATE TABLE  
  
postgres=# create table tbl2 (id int, info text) with (appendonly=true, blocksize=8192, compresstype=none, orientation=column);  
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.  
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.  
CREATE TABLE  
  
postgres=# create table tbl3 (id int, info text) with (appendonly=true, blocksize=8192, compresstype=none, orientation=column);  
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.  
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.  
CREATE TABLE

2、前两张分别插入1000万记录，最后一张插入2000万记录。

postgres=# insertinto tbl1 select generate_series(1,10000000),'test';  
INSERT 0 10000000  
postgres=# insert into tbl2 select generate_series(1,10000000),'test';  
INSERT 0 10000000  
postgres=# insert into tbl3 select generate_series(1,20000000),'test';  
INSERT 0 20000000

3、分析表，并记录它们的大小

postgres=# analyze tbl1;  
ANALYZE  
postgres=# analyze tbl2;  
ANALYZE  
postgres=# analyze tbl3;  
ANALYZE  
  
postgres=# select pg_size_pretty(pg_relation_size('tbl1'));  
 pg_size_pretty   
----------------  
 88 MB  
(1 row)  
postgres=# select pg_size_pretty(pg_relation_size('tbl2'));  
 pg_size_pretty   
----------------  
 88 MB  
(1 row)  
postgres=# select pg_size_pretty(pg_relation_size('tbl3'));  
 pg_size_pretty   
----------------  
 173 MB  
(1 row)

4、更新第一张表，全表更新。并记录更新后的大小，翻了一倍。

postgres=# update tbl1 set info='test';  
UPDATE 10000000  
postgres=# analyze tbl1;  
ANALYZE  
postgres=# select pg_size_pretty(pg_relation_size('tbl1'));  
 pg_size_pretty   
----------------  
 173 MB  
(1 row)

5、对三个表添加字段，设置默认值。

postgres=# alter table tbl1 add column c1 int8 default 1;  
ALTER TABLE  
postgres=# alter table tbl2 add column c1 int8 default 1;  
ALTER TABLE  
postgres=# alter table tbl3 add column c1 int8 default 1;  
ALTER TABLE

6、分析表，查看表的大小。

postgres=# analyze tbl1;  
ANALYZE  
postgres=# analyze tbl2;  
ANALYZE  
postgres=# analyze tbl3;  
ANALYZE  
  
postgres=# select pg_size_pretty(pg_relation_size('tbl1'));  
 pg_size_pretty   
----------------  
 325 MB  
(1 row)  
  
postgres=# select pg_size_pretty(pg_relation_size('tbl2'));  
 pg_size_pretty   
----------------  
 163 MB  
(1 row)  
  
postgres=# select pg_size_pretty(pg_relation_size('tbl3'));  
 pg_size_pretty   
----------------  
 325 MB  
(1 row)

很显然，AO表在添加字段时，以已有文件的记录数为准（即使全部都删除了，也需要在新增字段上初始化这个值），。

如果有相对偏移量的概念，至少tbl2可以忽略前1000万行的偏移。

堆表不存在这个问题，因为堆表加字段会REWRITE全表。

表膨胀的处理

《如何检测、清理Greenplum垃圾 - 阿里云HybridDB for PG最佳实践》

阿里云HybridDB for PG实践 - 列存储加字段和默认值

标签

背景

例子

表膨胀的处理

关系型数据库

热门文章

最新文章

相关课程

相关电子书

相关实验场景