PostgreSQL vs Oracle checksum 配置与性能

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
云原生数据库 PolarDB 分布式版,标准版 2核8GB
云数据库 RDS MySQL Serverless,0.5-2RCU 50GB
简介:

标签

PostgreSQL , Oracle , checksum , IO错误


背景

开启数据库block checksum可以发现磁盘、存储、IO系统的问题引入一些的物理错误。

PostgreSQL checksum

在PostgreSQL中,默认强制对XLOG(WAL)开启了checksum,因此可以保证从redo buffer写入redo file里面的数据是一致的,读取的时候也会根据每个REDO PAGE的checksum检查REDO PAGE的内容是否正确。(换言之没有参数来关闭WAL的checksum)

而对于DATAFILE,是否开启CHECKSUM则取决于initdb时配置的checksum参数。

initdb  
  
  -k, --data-checksums      use data page checksums  

通过pg_controldata检查当前数据库集群是否开启了checksum。

pg_controldata  
  
Data page checksum version:           0  

PostgreSQL 11 允许用户动态的修改checksum的开关,而不是初始化实例时固定:

《PostgreSQL 11 preview - Allow on-line enabling and disabling of data checksums (含pg_verify_checksums工具,离线检查数据文件有误块错误)》

Oracle checksum

Checksums allow Oracle to detect corruption caused by underlying disks, storage systems, or I/O systems.

If this parameter is set to OFF, DBWn calculates checksums only for the SYSTEM tablespace, but not for user tablespaces. In addition, no log checksum is performed when this parameter is set to OFF.

https://docs.oracle.com/cd/B28359_01/server.111/b28320/initparams046.htm#REFRN10030

DB_BLOCK_CHECKSUM determines whether DBWn and the direct loader will calculate a checksum (a number calculated from all the bytes stored in the block) and store it in the cache header of every data block when writing it to disk. Checksums are verified when a block is read - only if this parameter is TYPICAL or FULL and the last write of the block stored a checksum. In FULL mode, Oracle also verifies the checksum before a change application from update/delete statements and recomputes it after the change is applied. In addition, Oracle gives every log block a checksum before writing it to the current log.

Starting with Oracle Database 11g, most of the log block checksum is done by the generating foreground processes, while the LGWR performs the rest of the work, for better CPU and cache efficiency. Prior to Oracle Database 11g, the LGWR solely performed the log block checksum. When this parameter is set to FULL, the LGWR verifies the checksum of each log block generated by the foreground processes before writing it to disk.

If this parameter is set to OFF, DBWn calculates checksums only for the SYSTEM tablespace, but not for user tablespaces. In addition, no log checksum is performed when this parameter is set to OFF.

Checksums allow Oracle to detect corruption caused by underlying disks, storage systems, or I/O systems. If set to FULL, DB_BLOCK_CHECKSUM also catches in-memory corruptions and stops them from making it to the disk. Turning on this feature in TYPICAL mode causes only an additional 1% to 2% overhead. In the FULL mode it causes 4% to 5% overhead. Oracle recommends that you set DB_BLOCK_CHECKSUM to TYPICAL.

For backward compatibility the use of TRUE (implying TYPICAL) and FALSE (implying OFF) values is preserved.

PostgreSQL checksum 行为

1、开启checksum后,PostgreSQL 从shared buffer把数据write出去,需要计算checksum。

2、开启checksum后,从shared buffer外面(disk, os page cache)读取BLOCK到shared buffer里面,需要计算block的checksum,对比存储在page head里头的checksum是否一致。

3、已经在shared buffer里面的block,变更、读取时并不需要计算checksum。

checksum计算逻辑

1、pg_checksum_page

src/include/storage/checksum_impl.h

/*  
 * Compute the checksum for a Postgres page.  The page must be aligned on a  
 * 4-byte boundary.  
 *  
 * The checksum includes the block number (to detect the case where a page is  
 * somehow moved to a different location), the page header (excluding the  
 * checksum itself), and the page data.  
 */  
uint16  
pg_checksum_page(char *page, BlockNumber blkno)  
{  
        PageHeader      phdr = (PageHeader) page;  
        uint16          save_checksum;  
        uint32          checksum;  
  
        /* We only calculate the checksum for properly-initialized pages */  
        Assert(!PageIsNew(page));  
  
        /*  
         * Save pd_checksum and temporarily set it to zero, so that the checksum  
         * calculation isn't affected by the old checksum stored on the page.  
         * Restore it after, because actually updating the checksum is NOT part of  
         * the API of this function.  
         */  
        save_checksum = phdr->pd_checksum;  
        phdr->pd_checksum = 0;  
        checksum = pg_checksum_block(page, BLCKSZ);  
        phdr->pd_checksum = save_checksum;  
  
        /* Mix in the block number to detect transposed pages */  
        checksum ^= blkno;  
  
        /*  
         * Reduce to a uint16 (to fit in the pd_checksum field) with an offset of  
         * one. That avoids checksums of zero, which seems like a good idea.  
         */  
        return (checksum % 65535) + 1;  
}  

2、异步写(bg writer, backend process evict dirty page)时,计算checksum

src/backend/storage/buffer/bufmgr.c

/*  
 * FlushBuffer  
 *              Physically write out a shared buffer.  
 *  
 * NOTE: this actually just passes the buffer contents to the kernel; the  
 * real write to disk won't happen until the kernel feels like it.  This  
 * is okay from our point of view since we can redo the changes from WAL.  
 * However, we will need to force the changes to disk via fsync before  
 * we can checkpoint WAL.  
 *  
 * The caller must hold a pin on the buffer and have share-locked the  
 * buffer contents.  (Note: a share-lock does not prevent updates of  
 * hint bits in the buffer, so the page could change while the write  
 * is in progress, but we assume that that will not invalidate the data  
 * written.)  
 *  
 * If the caller has an smgr reference for the buffer's relation, pass it  
 * as the second parameter.  If not, pass NULL.  
 */  
static void  
FlushBuffer(BufferDesc *buf, SMgrRelation reln)  
  
  
        /*  
         * Update page checksum if desired.  Since we have only shared lock on the  
         * buffer, other processes might be updating hint bits in it, so we must  
         * copy the page to private storage if we do checksumming.  
         */  
        bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);  

3、将数据从shared buffer外围读入shared buffer时,校验checksum

src/backend/storage/page/bufpage.c

/*  
 * PageIsVerified  
 *              Check that the page header and checksum (if any) appear valid.  
 *  
 * This is called when a page has just been read in from disk.  The idea is  
 * to cheaply detect trashed pages before we go nuts following bogus item  
 * pointers, testing invalid transaction identifiers, etc.  
 *  
 * It turns out to be necessary to allow zeroed pages here too.  Even though  
 * this routine is *not* called when deliberately adding a page to a relation,  
 * there are scenarios in which a zeroed page might be found in a table.  
 * (Example: a backend extends a relation, then crashes before it can write  
 * any WAL entry about the new page.  The kernel will already have the  
 * zeroed page in the file, and it will stay that way after restart.)  So we  
 * allow zeroed pages here, and are careful that the page access macros  
 * treat such a page as empty and without free space.  Eventually, VACUUM  
 * will clean up such a page and make it usable.  
 */  
bool  
PageIsVerified(Page page, BlockNumber blkno)  
{  
        PageHeader      p = (PageHeader) page;  
        size_t     *pagebytes;  
        int                     i;  
        bool            checksum_failure = false;  
        bool            header_sane = false;  
        bool            all_zeroes = false;  
        uint16          checksum = 0;  
  
        /*  
         * Don't verify page data unless the page passes basic non-zero test  
         */  
        if (!PageIsNew(page))  
        {  
                if (DataChecksumsEnabled())  
                {  
                        checksum = pg_checksum_page((char *) page, blkno);  
  
                        if (checksum != p->pd_checksum)  
                                checksum_failure = true;  
                }  
  
...................  
  
  
/*  
 * Set checksum for a page in shared buffers.  
 *  
 * If checksums are disabled, or if the page is not initialized, just return  
 * the input.  Otherwise, we must make a copy of the page before calculating  
 * the checksum, to prevent concurrent modifications (e.g. setting hint bits)  
 * from making the final checksum invalid.  It doesn't matter if we include or  
 * exclude hints during the copy, as long as we write a valid page and  
 * associated checksum.  
 *  
 * Returns a pointer to the block-sized data that needs to be written. Uses  
 * statically-allocated memory, so the caller must immediately write the  
 * returned page and not refer to it again.  
 */  
char *  
PageSetChecksumCopy(Page page, BlockNumber blkno)  
{  
        static char *pageCopy = NULL;  
  
        /* If we don't need a checksum, just return the passed-in data */  
        if (PageIsNew(page) || !DataChecksumsEnabled())  
                return (char *) page;  
  
        /*  
         * We allocate the copy space once and use it over on each subsequent  
         * call.  The point of palloc'ing here, rather than having a static char  
         * array, is first to ensure adequate alignment for the checksumming code  
         * and second to avoid wasting space in processes that never call this.  
         */  
        if (pageCopy == NULL)  
                pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);  
  
        memcpy(pageCopy, (char *) page, BLCKSZ);  
        ((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);  
        return pageCopy;  
}  
  
/*  
 * Set checksum for a page in private memory.  
 *  
 * This must only be used when we know that no other process can be modifying  
 * the page buffer.  
 */  
void  
PageSetChecksumInplace(Page page, BlockNumber blkno)  
{  
        /* If we don't need a checksum, just return */  
        if (PageIsNew(page) || !DataChecksumsEnabled())  
                return;  
  
        ((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);  
}  

性能测试

对比开启checksum, 关闭checksum的性能。

从代码我们已经了解什么时候会需要计算checksum,所以设计一个这个的CASE,数据大量写出,使得bgwrite writeout dirty page频繁。

用到如下CASE

《HTAP数据库 PostgreSQL 场景与性能测试之 42 - (OLTP+OLAP) unlogged table 不含索引多表批量写入》

测试结果

checksum=0

transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 56  
number of threads: 56  
duration: 120 s  
number of transactions actually processed: 885490  
latency average = 7.588 ms  
latency stddev = 10.896 ms  
tps = 7376.390493 (including connections establishing)  
tps = 7377.158206 (excluding connections establishing)  
script statistics:  
 - statement latencies in milliseconds:  
         0.002  \set sid random(1,1024)        
         7.586  select ins_sensor(:sid, 1000);  

checksum=1

transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 56  
number of threads: 56  
duration: 120 s  
number of transactions actually processed: 867269  
latency average = 7.748 ms  
latency stddev = 20.287 ms  
tps = 7225.742548 (including connections establishing)  
tps = 7226.431737 (excluding connections establishing)  
script statistics:  
 - statement latencies in milliseconds:  
         0.002  \set sid random(1,1024)        
         7.746  select ins_sensor(:sid, 1000);  

开启checksum时,观察perf,可以看到pg_checksum_page占用了少量overhead,体现在性能差异上也非常的小。

   0.64%  postgres               [.] pg_checksum_page    

小结

在bgwriter, backend process write dirty page时,需要计算checksum。(耗费CPU)

在从shared buffer外面读入page时,需要校验checksum。(耗费CPU)

当shared buffer较少,同时产生脏页较快,bg writer 或者backend process 刷脏页较频繁时,可能会使得计算checksum引入一定的CPU消耗。实测极端写出的情况下,pg_checksum_page引入了0.64%左右的开销。

因此开启checksum,实际引入的开销并不大。

参考

1、https://www.postgresql.org/docs/11/static/runtime-config-preset.html

data_checksums (boolean)  

Reports whether data checksums are enabled for this cluster. See data checksums for more information.

2、https://www.postgresql.org/docs/11/static/runtime-config-developer.html

ignore_checksum_failure (boolean)  

Only has effect if data checksums are enabled.

Detection of a checksum failure during a read normally causes PostgreSQL to report an error, aborting the current transaction. Setting ignore_checksum_failure to on causes the system to ignore the failure (but still report a warning), and continue processing. This behavior may cause crashes, propagate or hide corruption, or other serious problems. However, it may allow you to get past the error and retrieve undamaged tuples that might still be present in the table if the block header is still sane. If the header is corrupt an error will be reported even if this option is enabled. The default setting is off, and it can only be changed by a superuser.

3、https://www.postgresql.org/docs/11/static/pageinspect.html#id-1.11.7.31.4

page_header(page bytea) returns record  

page_header shows fields that are common to all PostgreSQL heap and index pages.

A page image obtained with get_raw_page should be passed as argument. For example:

test=# SELECT * FROM page_header(get_raw_page('pg_class', 0));  
    lsn    | checksum | flags  | lower | upper | special | pagesize | version | prune_xid  
-----------+----------+--------+-------+-------+---------+----------+---------+-----------  
 0/24A1B50 |        0 |      1 |   232 |   368 |    8192 |     8192 |       4 |         0  

The returned columns correspond to the fields in the PageHeaderData struct. See src/include/storage/bufpage.h for details.

The checksum field is the checksum stored in the page, which might be incorrect if the page is somehow corrupted. If data checksums are not enabled for this instance, then the value stored is meaningless.

page_checksum(page bytea, blkno int4) returns smallint  

page_checksum computes the checksum for the page, as if it was located at the given block.

A page image obtained with get_raw_page should be passed as argument. For example:

test=# SELECT page_checksum(get_raw_page('pg_class', 0), 0);  
 page_checksum  
---------------  
         13443  

Note that the checksum depends on the block number, so matching block numbers should be passed (except when doing esoteric debugging).

The checksum computed with this function can be compared with the checksum result field of the function page_header. If data checksums are enabled for this instance, then the two values should be equal.

4、https://www.postgresql.org/docs/11/static/pgverifychecksums.html

pg_verify_checksums  
  
pg_verify_checksums — verify data checksums in an offline PostgreSQL database cluster  
  
Synopsis  
pg_verify_checksums [option] [[-D] datadir]  
  
Description  
pg_verify_checksums verifies data checksums in a PostgreSQL cluster.  

5、https://docs.oracle.com/cd/B28359_01/server.111/b28320/initparams046.htm#REFRN10030

DB_BLOCK_CHECKSUM determines whether DBWn and the direct loader will calculate a checksum (a number calculated from all the bytes stored in the block) and store it in the cache header of every data block when writing it to disk. Checksums are verified when a block is read - only if this parameter is TYPICAL or FULL and the last write of the block stored a checksum. In FULL mode, Oracle also verifies the checksum before a change application from update/delete statements and recomputes it after the change is applied. In addition, Oracle gives every log block a checksum before writing it to the current log.

Starting with Oracle Database 11g, most of the log block checksum is done by the generating foreground processes, while the LGWR performs the rest of the work, for better CPU and cache efficiency. Prior to Oracle Database 11g, the LGWR solely performed the log block checksum. When this parameter is set to FULL, the LGWR verifies the checksum of each log block generated by the foreground processes before writing it to disk.

If this parameter is set to OFF, DBWn calculates checksums only for the SYSTEM tablespace, but not for user tablespaces. In addition, no log checksum is performed when this parameter is set to OFF.

Checksums allow Oracle to detect corruption caused by underlying disks, storage systems, or I/O systems. If set to FULL, DB_BLOCK_CHECKSUM also catches in-memory corruptions and stops them from making it to the disk. Turning on this feature in TYPICAL mode causes only an additional 1% to 2% overhead. In the FULL mode it causes 4% to 5% overhead. Oracle recommends that you set DB_BLOCK_CHECKSUM to TYPICAL.

For backward compatibility the use of TRUE (implying TYPICAL) and FALSE (implying OFF) values is preserved.

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
1月前
|
DataWorks Java 关系型数据库
DataWorks常见问题之将预警信息发送至邮箱
DataWorks是阿里云提供的一站式大数据开发与管理平台,支持数据集成、数据开发、数据治理等功能;在本汇总中,我们梳理了DataWorks产品在使用过程中经常遇到的问题及解答,以助用户在数据处理和分析工作中提高效率,降低难度。
34 1
|
29天前
struts+hibernate+oracle+easyui实现lazyout组件的简单案例——Dept实体类和对应的配置信息
struts+hibernate+oracle+easyui实现lazyout组件的简单案例——Dept实体类和对应的配置信息
13 1
|
4月前
|
存储 SQL 关系型数据库
PolarDB这个sql行存和列存性能差别好大 ,为什么?
PolarDB这个sql行存和列存性能差别好大 ,为什么?
33 0
|
4月前
|
存储 关系型数据库 数据库
postgresql|数据库|提升查询性能的物化视图解析
postgresql|数据库|提升查询性能的物化视图解析
153 0
|
1月前
|
关系型数据库 分布式数据库 数据库
PolarDB PostgreSQL版:Oracle兼容的高性能数据库
PolarDB PostgreSQL版是一款高性能的数据库,具有与Oracle兼容的特性。它采用了分布式架构,可以轻松处理大量的数据,同时还支持多种数据类型和函数,具有高可用性和可扩展性。它还提供了丰富的管理工具和性能优化功能,为企业提供了可靠的数据存储和处理解决方案。PolarDB PostgreSQL版在数据库领域具有很高的竞争力,可以满足各种企业的需求。
|
3月前
|
关系型数据库 MySQL Serverless
阿里云云原生数据库 PolarDB MySQL Serverless:卓越的性能与无与伦比的弹性
阿里云原生数据库 PolarDB MySQL Serverless 拥有卓越性能和无与伦比的弹性。通过实验体验,深入了解其基本管理和配置、智能弹性伸缩特性和全局一致性特性。实验包括主节点和只读节点的弹性压测以及全局一致性测试,旨在亲身体验 PolarDB 的强大性能。通过实验,可以更好地在实际业务场景中应用 PolarDB,并根据需求进行性能优化和调整。
679 2
|
29天前
|
JavaScript Java 数据库连接
struts+hibernate+oracle+easyui实现lazyout组件的简单案例——Emp实体类和对应的配置信息
struts+hibernate+oracle+easyui实现lazyout组件的简单案例——Emp实体类和对应的配置信息
23 0
|
3月前
|
关系型数据库 网络安全 数据安全/隐私保护
你会开启Postgresql 的SSL单向认证 配置?
你会开启Postgresql 的SSL单向认证 配置?
你会开启Postgresql 的SSL单向认证 配置?
|
3月前
|
存储 关系型数据库 分布式数据库
阿里云PolarDB解决乐麦多源数据存储性能问题
乐麦通过使用PolarDB数据库,使整个系统之间的数据查询分析更加高效
390 3
|
3月前
|
关系型数据库 数据挖掘 分布式数据库
报名预约|体验PolarDB澎湃性能与高性价比在线直播
「飞天技术沙龙数据库技术周」直播聚焦PolarDB产品体验

相关产品

  • 云原生数据库 PolarDB