PostgreSQL 10.0 preview 性能增强 - libpq支持pipeline batch模式减少网络交互提升性能

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
云原生数据库 PolarDB 分布式版,标准版 2核8GB
云数据库 RDS MySQL Serverless,0.5-2RCU 50GB
简介:

标签

PostgreSQL , 10.0 , libpq , pipeline , batch


背景

PostgreSQL 10.0 libpq支持pipeline batch两种模式,batch模式意味着客户端可以将多个QUERY塞入pipeline,作为一个batch提交给server段,从而减少客户端和服务端的网络交互次数。

在网络环境不太好的环境中,特别是云环境,大幅提升性能。

+   <application>libpq</application> supports queueing up mulitiple queries into  
+   a pipeline to be executed as a batch on the server. Batching queries allows  
+   applications to avoid a client/server round-trip after each query to get  
+   the results before issuing the next query.  

详见

Hi all  

Following on from the foreign table batch inserts thread[1], here's a patch  
to add support for pipelining queries into asynchronous batches in libpq.  

Attached, and also available at  
https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch (subject  
to rebasing and force pushes).  

It's cleaned up over the draft I posted on that thread and has error  
recovery implemented. I've written and included the SGML docs for it. The  
test program is now pretty comprehensive, more so than for anything else in  
libpq anyway. I'll submit it to the next CF as a 9.7/10.0 candidate.  

I'm measuring 300x (not %) performance improvements doing batches on  
servers over the Internet, so this seems pretty worthwhile. It turned out  
to be way less invasive than I expected too.  

(I intentionally didn't add any way for clients to annotate each work-item  
in a batch with their own private data. I think that'd be really useful and  
would make implementing clients easier, but should be a separate patch).  

This should be very useful for optimising FDWs, Postgres-XC, etc.  


[1]  
http://www.postgresql.org/message-id/CAMsr+YFgDUiJ37DEfPRk8WDBuZ58psdAYJd8iNFSaGxtw=wU3g@mail.gmail.com  

--   
 Craig Ringer                   http://www.2ndQuadrant.com/  
 PostgreSQL Development, 24x7 Support, Training & Services  

一些例子

+  <para>  
+   An example of batch use may be found in the source distribution in  
+   <filename>src/test/examples/libpqbatch.c</filename>.  
+  </para>  
+  
+  <sect2>  
+   <title>When to use batching</title>  
+  
+   <para>  
+    Much like asynchronous query mode, there is no performance disadvantage to  
+    using batching and pipelining. It somewhat increased client application  
+    complexity and extra caution is required to prevent client/server network  
+    deadlocks, but can offer considerable performance improvements.  
+   </para>  
+  
+   <para>  
+    Batching is most useful when the server is distant, i.e. network latency  
+    ("ping time") is high, and when many small operations are being performed in  
+    rapid sequence. There is usually less benefit in using batches when each  
+    query takes many multiples of the client/server round-trip time to execute.  
+    A 100-statement operation run on a server 300ms round-trip-time away would take  
+    30 seconds in network latency alone without batching; with batching it may spend  
+    as little as 0.3s waiting for results from the server.  
+   </para>  
+  
+   <para>  
+    Use batches when your application does lots of small  
+    <literal>INSERT</literal>, <literal>UPDATE</literal> and  
+    <literal>DELETE</literal> operations that can't easily be transformed into  
+    operations on sets or into a  
+    <link linkend="libpq-copy"><literal>COPY</literal></link> operation.  
+   </para>  
+  
+   <para>  
+    Batching less useful when information from one operation is required by the  
+    client before it knows enough to send the next operation. The client must  
+    introduce a synchronisation point and wait for a full client/server  
+    round-trip to get the results it needs. However, it's often possible to  
+    adjust the client design to exchange the required information server-side.  
+    Read-modify-write cycles are especially good candidates; for example:  
+    <programlisting>  
+     BEGIN;  
+     SELECT x FROM mytable WHERE id = 42 FOR UPDATE;  
+     -- result: x=2  
+     -- client adds 1 to x:  
+     UPDATE mytable SET x = 3 WHERE id = 42;  
+     COMMIT;  
+    </programlisting>  
+    could be much more efficiently done with:  
+    <programlisting>  
+     UPDATE mytable SET x = x + 1;  
+    </programlisting>  
+   </para>  
+  
+   <note>  
+    <para>  
+     The batch API was introduced in PostgreSQL 9.6, but clients using it can  
+     use batches on server versions 8.4 and newer. Batching works on any server  
+     that supports the v3 extended query protocol.  
+    </para>  
+   </note>  
+  
+  </sect2>  
+  
+  <sect2 id="libpq-batch-using">  
+   <title>Using batch mode</title>  
+  
+   <para>  
+    To issue batches the application must switch  
+    <application>libpq</application> into batch mode. Enter batch mode with <link  
+    linkend="libpq-pqbeginbatchmode"><function>PQbeginBatchMode(conn)</function></link> or test  
+    whether batch mode is active with <link  
+    linkend="libpq-pqisinbatchmode"><function>PQisInBatchMode(conn)</function></link>. In batch mode only <link  
+    linkend="libpq-async">asynchronous operations</link> are permitted, and  
+    <literal>COPY</literal> is not allowed. (The restriction on <literal>COPY</literal> is an implementation  
+    limit; the PostgreSQL protocol and server can support batched <literal>COPY</literal>).  
+   </para>  
+  
+   <para>  
+    The client uses libpq's asynchronous query functions to dispatch work,  
+    marking the end of each batch with <function>PQsendEndBatch</function>.  
+    Concurrently, it uses <function>PQgetResult</function> and  
+    <function>PQgetNextQuery</function> to get results. It may eventually exit  
+    batch mode with <function>PQendBatchMode</function> once all results are  
+    processed.  
+   </para>  
+  
+   <note>  
+    <para>  
+     It is best to use batch mode with <application>libpq</application> in  
+     <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If used in  
+     blocking mode it is possible for a client/server deadlock to occur. The  
+     client will block trying to send queries to the server, but the server will  
+     block trying to send results from queries it's already processed to the  
+     client. This only occurs when the client sends enough queries to fill its  
+     output buffer and the server's receive buffer before switching to  
+     processing input from the server, but it's hard to predict exactly when  
+     that'll happen so it's best to always use non-blocking mode.  
+    </para>  
+   </note>  
+  
+   <sect3 id="libpq-batch-sending">  
+    <title>Issuing queries</title>  
+  
+    <para>  
+     After entering batch mode the application dispatches requests  
+     using normal asynchronous <application>libpq</application> functions like  
+     <function>PQsendQueryParams</function>, <function>PQsendPrepare</function>,  
+     etc. The asynchronous requests are followed by a <link  
+     linkend="libpq-pqsendendbatch"><function>PQsendEndBatch(conn)</function></link> call to mark  
+     the end of the batch. The client <emphasis>does not</emphasis> need to call  
+     <function>PQgetResult</function> immediately after dispatching each  
+     operation. <link linkend="libpq-batch-results">Result processing</link>  
+     is handled separately.  
+    </para>  
+      
+    <para>  
+     Batched operations will be executed by the server in the order the client  
+     sends them. The server will send the results in the order the statements  
+     executed. The server usually begins executing the batch before all commands  
+     in the batch are queued and the end of batch command is sent. If any  
+     statement encounters an error the server aborts the current transaction and  
+     skips processing the rest of the batch. Query processing resumes after the  
+     end of the failed batch.  
+    </para>  
+  
+    <para>  
+     It's fine for one operation to depend on the results of a  
+     prior one. One query may define a table that the next query in the same  
+     batch uses; similarly, an application may create a named prepared statement  
+     then execute it with later statements in the same batch.  
+    </para>  
+  
+   </sect3>  
+  
+   <sect3 id="libpq-batch-results">  
+    <title>Processing results</title>  
+  
+    <para>  
+     The client <link linkend="libpq-batch-interleave">interleaves result  
+     processing with sending batch queries</link>, or for small batches may  
+     process all results after sending the whole batch.  
+    </para>  
+  
+    <para>  
+     To get the result of the first batch entry the client must call <link  
+     linkend="libpq-pqgetnextquery"><function>PQgetNextQuery</function></link>. It must then call  
+     <function>PQgetResult</function> and handle the results until  
+     <function>PQgetResult</function> returns null (or would return null if  
+     called). The result from the next batch entry may then be retrieved using  
+     <function>PQgetNextQuery</function> and the cycle repeated.  The  
+     application handles individual statement results as normal.  
+    </para>  
+  
+    <para>  
+     <function>PQgetResult</function> behaves the same as for normal asynchronous  
+     processing except that it may contain the new <type>PGresult</type> types  
+     <literal>PGRES_BATCH_END</literal> and <literal>PGRES_BATCH_ABORTED</literal>.  
+     <literal>PGRES_BATCH_END</literal> is reported exactly once for each  
+     <function>PQsendEndBatch</function> call at the corresponding point in  
+     the result stream and at no other time. <literal>PGRES_BATCH_ABORTED</literal>  
+     is emitted during error handling; see <link linkend="libpq-batch-errors">  
+     error handling</link>.  
+    </para>  
+  
+    <para>  
+     <function>PQisBusy</function>, <function>PQconsumeInput</function>, etc  
+     operate as normal when processing batch results.  
+    </para>  
+  
+    <para>  
+     <application>libpq</application> does not provide any information to the  
+     application about the query currently being processed. The application  
+     must keep track of the order in which it sent queries and the expected  
+     results. Applications will typically use a state machine or a FIFO queue  
+     for this.  
+    </para>  

这个patch的讨论,详见邮件组,本文末尾URL。

PostgreSQL社区的作风非常严谨,一个patch可能在邮件组中讨论几个月甚至几年,根据大家的意见反复的修正,patch合并到master已经非常成熟,所以PostgreSQL的稳定性也是远近闻名的。

参考

https://commitfest.postgresql.org/13/1024/

https://www.postgresql.org/message-id/attachment/44303/0001-Pipelining-batch-support-for-libpq.patch

https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch

https://www.postgresql.org/message-id/flat/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com#CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
4月前
|
存储 SQL 关系型数据库
PolarDB这个sql行存和列存性能差别好大 ,为什么?
PolarDB这个sql行存和列存性能差别好大 ,为什么?
33 0
|
4月前
|
存储 关系型数据库 数据库
postgresql|数据库|提升查询性能的物化视图解析
postgresql|数据库|提升查询性能的物化视图解析
153 0
|
4月前
|
网络协议 Linux 网络架构
Linux三种网络模式 | 仅主机、桥接、NAT
Linux三种网络模式 | 仅主机、桥接、NAT
165 0
|
3月前
|
关系型数据库 MySQL Serverless
阿里云云原生数据库 PolarDB MySQL Serverless:卓越的性能与无与伦比的弹性
阿里云原生数据库 PolarDB MySQL Serverless 拥有卓越性能和无与伦比的弹性。通过实验体验,深入了解其基本管理和配置、智能弹性伸缩特性和全局一致性特性。实验包括主节点和只读节点的弹性压测以及全局一致性测试,旨在亲身体验 PolarDB 的强大性能。通过实验,可以更好地在实际业务场景中应用 PolarDB,并根据需求进行性能优化和调整。
679 2
|
4月前
|
网络协议
虚拟机的三种网络模式
虚拟机的三种网络模式
|
2月前
|
NoSQL 网络协议 Redis
Nomad 系列 -Nomad 网络模式
Nomad 系列 -Nomad 网络模式
|
4月前
|
存储 监控 数据安全/隐私保护
Docker网络模式:深度理解与容器网络配置
Docker 的网络模式是容器化应用中一个关键而复杂的方面。本文将深入讨论 Docker 的网络模式,包括基本概念、常用网络模式以及高级网络配置,并通过更为丰富和实际的示例代码,帮助读者全面掌握如何理解和配置容器网络。
|
8天前
|
存储 安全 测试技术
网络奇谭:虚拟机中的共享、桥接与Host-Only模式解析
网络奇谭:虚拟机中的共享、桥接与Host-Only模式解析
14 0
|
2月前
|
JSON Kubernetes Linux
Docker之网络模式
docker基础 网络模式
51 2
|
3月前
|
存储 关系型数据库 分布式数据库
阿里云PolarDB解决乐麦多源数据存储性能问题
乐麦通过使用PolarDB数据库,使整个系统之间的数据查询分析更加高效
390 3

相关产品

  • 云原生数据库 PolarDB