PostgreSQL汉字转拼音-阿里云开发者社区

PostgreSQL汉字转拼音

2017-10-28 4458

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云原生数据库 PolarDB MySQL 版，Serverless 5000PCU 100GB

云原生数据库 PolarDB 分布式版，标准版 2核8GB

云数据库 RDS MySQL Serverless，0.5-2RCU 50GB

简介：

背景

在有些应用中，可能会有对拼音搜索、拼音首字母搜索、中文搜索共存的需求。在PostgreSQL中如何实现这个需求呢？

关键是函数转拼音和首字母，方法很简单，将映射关系存入数据库。创建一个函数来转换。

《PostgreSQL汉字转拼音或拼音首字母的应用》

映射文件请到以上文章的末尾下载。

方法

1、创建映射表

create table pinyin (hz varchar(1), py varchar(6), zm varchar(1));  
  
create index idx_pinyin_hz on pinyin(hz);  
  
create unique index idx_pinyin_hz_py on pinyin(hz, py);

2、写入一些测试映射，仅供演示用。映射文件请到以上文章的末尾下载。

test=# insert into pinyin values ('你','ni','n');  
INSERT 0 1  
test=# insert into pinyin values ('好','hao','h');  
INSERT 0 1

3、将字符串转换为拼音以及单字的函数

create or replace function get_hzpy(vhz text) returns text[] as $$  
declare  
  res text[];  
  tmp_py text;  
  tmp_zm text;  
begin  
for i in 1..length(vhz)   
loop  
  select py,zm into tmp_py,tmp_zm from pinyin where hz=substring(vhz, i, 1);  
  if not found then  
    res := array_cat(res, array[substring(vhz, i, 1)]);  
  else  
    res := array_cat(res, array[tmp_py, tmp_zm, substring(vhz, i, 1)]);  
  end if;  
end loop;  
return res;  
end;  
$$ language plpgsql strict immutable;

4、测试该函数，输入一个字符串，返回了它的单字和所有单字的拼音、首字母，没有的话只输出单字。

test=# select get_hzpy('你好abx呵呵, ');  
                     get_hzpy                       
--------------------------------------------------  
 {ni,n,你,hao,h,好,a,b,x,","," "}  
(1 row)

测试索引加速单字、拼音、首字母搜索

1、创建一张测试表，包含一个字符串

create table test(id int, info text);

2、创建函数倒排索引

test=# create index idx on test using gin (get_hzpy(info));  
CREATE INDEX

3、写入测试数据

test=# insert into test values (1, '你好abx呵呵, ');  
INSERT 0 1

4、按 "单字、拼音、首字母" 查询测试，使用倒排索引

test=# explain select * from test where get_hzpy(info) @> array['ni'];   -- 包含ni的记录  
                            QUERY PLAN                               
-------------------------------------------------------------------  
 Bitmap Heap Scan on test  (cost=4.10..20.92 rows=26 width=36)  
   Recheck Cond: (get_hzpy(info) @> '{ni}'::text[])  
   ->  Bitmap Index Scan on idx  (cost=0.00..4.09 rows=26 width=0)  
         Index Cond: (get_hzpy(info) @> '{ni}'::text[])  
(4 rows)

5、字典有的，可以查到，字典没有的，只能通过中文查

test=# select * from test where get_hzpy(info) @> array['ni'];  
 id |     info        
----+---------------  
  1 | 你好abx呵呵,   
(1 row)  
  
test=# select * from test where get_hzpy(info) @> array['he'];  
 id | info   
----+------  
(0 rows)

6、补齐字典，注意补齐字典并不影响已有的数据，因为只在数据发生变化时才会重算 get_hzpy(info) 的值，并更新索引。

下面这个例子可以清晰的表明这个意思。

test=# insert into pinyin values ('呵','he','h');  
INSERT 0 1  
  
test=# select * from test where get_hzpy(info) @> array['he'];  
 id | info   
----+------  
(0 rows)  
  
test=# update test set id=2 where id=1;  
UPDATE 1  
  
test=# select * from test where get_hzpy(info) @> array['he'];  
 id | info   
----+------  
(0 rows)

只有更新了info字典，才会更新索引内容。

test=# update test set info='呵呵,xi' where id=2;  
UPDATE 1  
test=# select * from test where get_hzpy(info) @> array['he'];  
 id |  info     
----+---------  
  2 | 呵呵,xi  
(1 row)

7、因此，建议字典越全面越好。将来如果字典更新了，需要重建一下索引，方法如下：

create index CONCURRENTLY idx_test_2 on test using gin(get_hzpy(info));  
drop index idx;

PostgreSQL汉字转拼音

标签

背景

方法

测试索引加速单字、拼音、首字母搜索

关系型数据库

热门文章

最新文章

相关产品

相关课程

相关电子书

相关实验场景

推荐镜像

PostgreSQL汉字转拼音

标签

背景

方法

测试索引加速 单字、拼音、首字母 搜索

关系型数据库

热门文章

最新文章

相关产品

相关课程

相关电子书

相关实验场景

推荐镜像

测试索引加速单字、拼音、首字母搜索