Internet of Vehicles – Window Querying with Alibaba Cloud RDS for PostgreSQL

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
云原生数据库 PolarDB 分布式版,标准版 2核8GB
简介: Internet of Vehicles (IoV) is a popular topic of research in the field of IoT. One of the biggest issues facing IoV is collecting vehicle's travel tracks in real time.

Background

Internet of Vehicles (IoV) is one of the hottest topic of research in the field of Internet of Things (IoT). A typical scenario for Internet of Vehicles applications is collecting vehicle's travel tracks, but tracks of vehicles are usually not reported in real time. Several track records may be accumulated or reported at intervals.

A typical data structure is as follows:

(car_id, pos geometry, crt_time timestamp)  

Heavy traffic and traffic signals often occur during the vehicle traveling process. The reported track records may be as follows:

1, position 1, '2017-01-01 12:00:00'  
1, position 1, '2017-01-01 12:00:05 PM'  
1, position 1, '2017-01-01 12:00:10 PM'  
1, position 1, '2017-01-01 12:00:15 PM'  
1, position 1, '2017-01-01 12:00:20 PM'  
1, position 2, '2017-01-01 12:00:30 PM'  

That is, multiple records in the same position may be uploaded due to heavy traffic or traffic signals.

Therefore, there is a requirement to clean unnecessary wait records from the database. We keep at most two records for a given point, indicating arrival and departure.

This operation can be executed via window function.

Surely, in terms of providing the best efficiency, it is more reasonable to clean tracks from a terminal. Only two records will be kept for the start point of a position.

Implementation Example

1.Design a table structure

create table car_trace (cid int, pos point, crt_time timestamp);  

2.Generate 10 million pieces of test data and assume there are 1,000 vehicles (in order to make it easy to repeat the data and test the effect, 25 points are used at the position).

insert into car_trace select random()*999, point((random()*5)::int, (random()*5)::int), clock_timestamp() from generate_series(1,10000000);  

3.Create an index

create index idx_car on car_trace (cid, crt_time);  

4.Query a data layout

select * from car_trace where cid=1 order by crt_time limit 1000;  

  
   1 | (3,1) | 2017-07-22 21:30:09.84984  
   1 | (1,4) | 2017-07-22 21:30:09.850297  
   1 | (1,4) | 2017-07-22 21:30:09.852586  
   1 | (1,4) | 2017-07-22 21:30:09.854155  
   1 | (1,4) | 2017-07-22 21:30:09.854425  
   1 | (3,1) | 2017-07-22 21:30:09.854493  

As shown on the list, several pieces of data are repetitive.

5.Filter the records at a single position via the window. At most two records for arrival and departure from the position will be kept.

Two window functions are used here: lead and lag. Lag indicates the previous record of the current record and Lead indicates the next record of the current record.

The method of determining arrival and departure points is as follows:

•The current position is not equal to the previous position, indicating that the record is the current position's arrival point.

•The current position is not equal to the next position, indicating that the record is the current position's leaving point.

•The previous position is empty, indicating that the current record is the first record.

•The next position is empty, indicating that the current record is the last record.

select * from   
(  
select   
  *,   
  lag(pos) over (partition by cid order by crt_time) as lag,   
  lead(pos) over (partition by cid order by crt_time) as lead   
from car_trace   
  where cid=1   
  and crt_time between '2017-07-22 21:30:09.83994' and '2017-07-22 21:30:09.859735'  
) t  
  where pos <> lag  
  or pos <> lead  
  or lag is null  
  or lead is null;  
  
 cid |  pos  |          crt_time          |  lag  | lead    
-----+-------+----------------------------+-------+-------  
   1 | (2,1) | 2017-07-22 21:30:09.83994  |       | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.839953 | (2,1) | (5,2)  
   1 | (5,2) | 2017-07-22 21:30:09.840704 | (3,1) | (4,4)  
   1 | (4,4) | 2017-07-22 21:30:09.84179  | (5,2) | (5,2)  
   1 | (5,2) | 2017-07-22 21:30:09.843787 | (4,4) | (1,5)  
   1 | (1,5) | 2017-07-22 21:30:09.844165 | (5,2) | (0,5)  
   1 | (0,5) | 2017-07-22 21:30:09.84536  | (1,5) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.845896 | (0,5) | (3,3)  
   1 | (3,3) | 2017-07-22 21:30:09.846958 | (4,1) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.84984  | (3,3) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.850297 | (3,1) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.854425 | (1,4) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.854493 | (1,4) | (3,2)  
   1 | (3,2) | 2017-07-22 21:30:09.854541 | (3,1) | (2,0)  
   1 | (2,0) | 2017-07-22 21:30:09.855297 | (3,2) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.857592 | (2,0) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.857595 | (4,1) | (0,4)  
   1 | (0,4) | 2017-07-22 21:30:09.857597 | (4,1) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.858996 | (0,4) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.859735 | (3,1) |   
(20 rows)  

When track cleaning is not performed, the obtained results are as follows:

select   
  *,   
  lag(pos) over (partition by cid order by crt_time) as lag,   
  lead(pos) over (partition by cid order by crt_time) as lead   
from car_trace   
  where cid=1   
  and crt_time between '2017-07-22 21:30:09.83994' and '2017-07-22 21:30:09.859735';  
  
 cid |  pos  |          crt_time          |  lag  | lead    
-----+-------+----------------------------+-------+-------  
   1 | (2,1) | 2017-07-22 21:30:09.83994  |       | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.839953 | (2,1) | (5,2)  
   1 | (5,2) | 2017-07-22 21:30:09.840704 | (3,1) | (4,4)  
   1 | (4,4) | 2017-07-22 21:30:09.84179  | (5,2) | (5,2)  
   1 | (5,2) | 2017-07-22 21:30:09.843787 | (4,4) | (1,5)  
   1 | (1,5) | 2017-07-22 21:30:09.844165 | (5,2) | (0,5)  
   1 | (0,5) | 2017-07-22 21:30:09.84536  | (1,5) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.845896 | (0,5) | (3,3)  
   1 | (3,3) | 2017-07-22 21:30:09.846958 | (4,1) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.84984  | (3,3) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.850297 | (3,1) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.852586 | (1,4) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.854155 | (1,4) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.854425 | (1,4) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.854493 | (1,4) | (3,2)  
   1 | (3,2) | 2017-07-22 21:30:09.854541 | (3,1) | (2,0)  
   1 | (2,0) | 2017-07-22 21:30:09.855297 | (3,2) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.857592 | (2,0) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.857595 | (4,1) | (0,4)  
   1 | (0,4) | 2017-07-22 21:30:09.857597 | (4,1) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.858996 | (0,4) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.859735 | (3,1) |   
(22 rows)  

Lag and lead are used to clear records in the stay process.

Database Optimization for IoV

In a typical scenario, a lot of vehicle IDs are involved in the business. Data gathered for different vehicles is usually written into a database. If no optimization is made, after entering the database, the data of different vehicles may be staggered. That is, the data of different vehicles may be stored in one data block.

A lot of data blocks will be scanned (scanning IO amplification) when the track of a single vehicle is queried. There are two optimization methods to speed up the querying process.

1.Write into the database after the business end gathers grouping and sorting.

For example, after receiving the data submitted by the vehicle terminal, the program groups vehicle IDs, sorts them by time, and writes them into the database (using insert into tbl values (),(),...();). In this way, the data of the same vehicle will fall into the same data block as much as is possible.

2.Use a partition to reorganize the data in the database.

For example, we can store data based on the vehicle ID, every vehicle, or vehicle HASH partition.

The two methods both relate to reorganizing the data based on query requirements to achieve the purpose of decreasing scanning IO.

相关实践学习
Polardb-x 弹性伸缩实验
本实验主要介绍如何对PolarDB-X进行手动收缩扩容,了解PolarDB-X 中各个节点的含义,以及如何对不同配置的PolarDB-x 进行压测。
目录
相关文章
|
6月前
|
SQL 关系型数据库 MySQL
将MySQL 数据迁移到 PostgreSQL
将MySQL 数据迁移到 PostgreSQL 可以采用以下步骤: 安装 PostgreSQL 数据库:首先,需要安装 PostgreSQL 数据库。可以从官方网站(https://www.postgresql.org/)下载最新版本的 PostgreSQL,并根据官方指南进行安装。 创建 PostgreSQL 数据库:在 PostgreSQL 中创建与 MySQL 数据库相对应的数据库。可以使用 pgAdmin 或命令行工具(如 psql)来创建数据库。例如,如果在 MySQL 中有一个名为 "mydb" 的数据库,那么可以在 PostgreSQL 中创建一个具有相同名称的数据库。 导
815 0
|
4月前
|
SQL 关系型数据库 MySQL
postgresql|数据库|MySQL数据库向postgresql数据库迁移的工具pgloader的部署和初步使用
postgresql|数据库|MySQL数据库向postgresql数据库迁移的工具pgloader的部署和初步使用
118 0
|
1月前
|
存储 关系型数据库 MySQL
TiDB与MySQL、PostgreSQL等数据库的比较分析
【2月更文挑战第25天】本文将对TiDB、MySQL和PostgreSQL等数据库进行详细的比较分析,探讨它们各自的优势和劣势。TiDB作为一款分布式关系型数据库,在扩展性、并发性能等方面表现突出;MySQL以其易用性和成熟性受到广泛应用;PostgreSQL则在数据完整性、扩展性等方面具有优势。通过对比这些数据库的特点和适用场景,帮助企业更好地选择适合自己业务需求的数据库系统。
|
6月前
|
存储 NoSQL 关系型数据库
深入探索地理空间查询:如何优雅地在MySQL、PostgreSQL及Redis中实现精准的地理数据存储与检索技巧
深入探索地理空间查询:如何优雅地在MySQL、PostgreSQL及Redis中实现精准的地理数据存储与检索技巧
669 0
|
3月前
|
canal 缓存 SpringCloudAlibaba
Springcloud Alibaba 使用Canal将MySql数据实时同步到Elasticsearch
本篇文章在Springcloud Alibaba使用Canal将Mysql数据实时同步到Redis保证缓存的一致性-CSDN博客 基础上使用canal将mysql数据实时同步到Elasticsearch。
|
3月前
|
canal 缓存 关系型数据库
Springcloud Alibaba使用Canal将Mysql数据实时同步到Redis保证缓存的一致性
canal [kə'næl] ,译意为水道/管道/沟渠,主要用途是基于 MySQL 数据库增量日志解析,提供增量数据订阅和消费。其诞生的背景是早期阿里巴巴因为杭州和美国双机房部署,存在跨机房同步的业务需求,实现方式主要是基于业务 trigger 获取增量变更。从 2010 年开始,业务逐步尝试数据库日志解析获取增量变更进行同步,由此衍生出了大量的数据库增量订阅和消费业务。
|
3月前
|
关系型数据库 MySQL 数据处理
MySQL vs. PostgreSQL:选择适合你的开源数据库
在当今信息时代,开源数据库成为许多企业和开发者的首选。本文将比较两个主流的开源数据库——MySQL和PostgreSQL,分析它们的特点、优势和适用场景,以帮助读者做出明智的选择。
|
4月前
|
SQL 关系型数据库 MySQL
MySQL【实践 02】MySQL迁移到PostgreSQL数据库的语法调整说明及脚本分享(通过bat命令修改mapper文件内的SQL语法)
MySQL【实践 02】MySQL迁移到PostgreSQL数据库的语法调整说明及脚本分享(通过bat命令修改mapper文件内的SQL语法)
113 0
|
4月前
|
安全 关系型数据库 数据库
上新|阿里云RDS PostgreSQL支持PG 16版本,AliPG提供丰富自研能力
AliPG在社区版16.0的基础上,在安全、成本、可运维性等多个方面做了提升,丰富的内核/插件特性支持,满足业务场景的需求
|
5月前
|
SQL 关系型数据库 分布式数据库
阿里云PolarDB是一款兼容MySQL、PostgreSQL和SQL Server等多种数据库协议的产品
阿里云PolarDB是一款兼容MySQL、PostgreSQL和SQL Server等多种数据库协议的产品
647 6