Cassandra collection types and their underlying storage format

A tutorial on Cassandra collection types

Create a user table with complex (collection) cells:

CREATE TABLE ks.user (
    id int PRIMARY KEY,
    addr map<text, frozen<set<text>>>,
    complex map<text, frozen<map<text, text>>>,
    listcolumn list<text>,
    setcolumn set<text>
);

After inserting some data, querying the table returns:

cassandra@cqlsh:ks> select * from user;

 id | addr                                           | complex                          | listcolumn      | setcolumn
----+------------------------------------------------+----------------------------------+-----------------+-----------------
  1 | {'bj': {'ba', 'bb'}, 'shanghai': {'sa', 'sb'}} | {'bj': {'ka': 'va', 'kb': 'vb'}} | ['a', 'b', 'c'] | {'a', 'b', 'c'}

Run bin/nodetool flush to write the data out to an SSTable.

Dump the SSTable as text:

tools/bin/sstabledump /data/data2/ks/user-a92ce790a8ff11e99a3d8963a5d3f9b4/md-17-big-Data.db
[
{
  "partition" : {
    "key" : [ "1" ],
    "position" : 0
  },
  "rows" : [
    {
      "type" : "row",
      "position" : 233,
      "liveness_info" : { "tstamp" : "2019-07-18T02:43:36.011497Z" },
      "cells" : [
        { "name" : "addr", "deletion_info" : { "marked_deleted" : "2019-07-18T02:43:36.011496Z", "local_delete_time" : "2019-07-18T02:43:36Z" } },
        { "name" : "addr", "path" : [ "bj" ], "value" : ["ba", "bb"] },
        { "name" : "addr", "path" : [ "shanghai" ], "value" : ["sa", "sb"] },
        { "name" : "complex", "deletion_info" : { "marked_deleted" : "2019-07-18T02:47:55.888562Z", "local_delete_time" : "2019-07-18T02:47:55Z" } },
        { "name" : "complex", "path" : [ "bj" ], "value" : {"ka": "va", "kb": "vb"}, "tstamp" : "2019-07-18T02:47:55.888563Z" },
        { "name" : "listcolumn", "deletion_info" : { "marked_deleted" : "2019-07-18T02:56:09.386468Z", "local_delete_time" : "2019-07-18T02:56:09Z" } },
        { "name" : "listcolumn", "path" : [ "982493f0-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "a", "tstamp" : "2019-07-18T02:56:09.386469Z" },
        { "name" : "listcolumn", "path" : [ "982493f1-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "b", "tstamp" : "2019-07-18T02:56:09.386469Z" },
        { "name" : "listcolumn", "path" : [ "982493f2-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "c", "tstamp" : "2019-07-18T02:56:09.386469Z" },
        { "name" : "setcolumn", "deletion_info" : { "marked_deleted" : "2019-07-18T02:55:03.280578Z", "local_delete_time" : "2019-07-18T02:55:03Z" } },
        { "name" : "setcolumn", "path" : [ "a" ], "value" : "", "tstamp" : "2019-07-18T02:55:03.280579Z" },
        { "name" : "setcolumn", "path" : [ "b" ], "value" : "", "tstamp" : "2019-07-18T02:55:03.280579Z" },
        { "name" : "setcolumn", "path" : [ "c" ], "value" : "", "tstamp" : "2019-07-18T02:55:03.280579Z" }
      ]
    }
  ]
}
]

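In the storage layer, each collection element is uniquely identified by the cell (column) name plus a path.
Look closely at the addr and complex columns, which are nested maps:
{ "name" : "complex", "path" : [ "bj" ], "value" : {"ka": "va", "kb": "vb"}, "tstamp" : "2019-07-18T02:47:55.888563Z" },
The inner map frozen<map<text, text>>, however, is essentially stored as a blob: you cannot operate on its individual entries, which is exactly what frozen means. The other collections follow the same pattern: in setcolumn each element is the path and the value is empty, while in listcolumn each path is a timeuuid and the value is the element.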
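A minimal Python sketch of this flattening, assuming a simplified model rather than Cassandra's actual code: the helpers `flatten_column` and `freeze` are hypothetical, and `freeze` uses JSON only as a stand-in for Cassandra's binary serialization of frozen values. It shows why a frozen inner map ends up as one value under a single path, while set and list elements each get their own path.

```python
import json
import uuid
from typing import Any, List, Tuple


def freeze(value: Any) -> bytes:
    """Hypothetical stand-in for serializing a frozen collection into one opaque blob.
    Cassandra uses its own binary format; JSON is used here only for illustration."""
    if isinstance(value, (set, frozenset)):
        value = sorted(value)  # set elements are kept in sorted order
    return json.dumps(value, sort_keys=True).encode()


def flatten_column(kind: str, value: Any) -> List[Tuple[Any, Any]]:
    """Split a non-frozen collection into the (path, value) pairs stored as separate cells."""
    if kind == "set":
        # The element itself is the path; the cell value is empty.
        return [(elem, b"") for elem in sorted(value)]
    if kind == "map":
        # The key is the path; a frozen sub-collection stays a single blob.
        return [(k, freeze(v) if isinstance(v, (set, frozenset, dict, list)) else v)
                for k, v in sorted(value.items())]
    if kind == "list":
        # Each element gets a fresh time-based UUID (version 1) as its path.
        return [(uuid.uuid1(), elem) for elem in value]
    raise ValueError(f"unknown collection kind: {kind}")
```

For example, `flatten_column("map", {"bj": {"ka": "va", "kb": "vb"}})` returns a single pair whose value is one blob, which matches the single complex/bj cell in the dump.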

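C* (Cassandra) is masterless, so multiple nodes can write to the same collection concurrently. How are the conflicts resolved? The smallest storage unit is not the cell but the element identified by cell + path. Merging is done per (cell, path) element, and the version with the newest cell timestamp becomes the final value.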
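
The per-element merge can be sketched in Python as follows. This is an illustrative model, not Cassandra's implementation: `Cell`, `reconcile` and `merge` are hypothetical names, a tombstone is represented as `value=None`, and the tie-breaking rules (the tombstone wins on equal timestamps, otherwise the larger value wins) are modeled on Cassandra's reconciliation order and should be treated as assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (column name, cell path)


@dataclass(frozen=True)
class Cell:
    ts: int               # write timestamp in microseconds
    value: Optional[str]  # None marks a tombstone (deleted element)


def reconcile(a: Cell, b: Cell) -> Cell:
    """Pick the winner between two versions of the same (column, path) element."""
    if a.ts != b.ts:
        return a if a.ts > b.ts else b      # newer timestamp wins
    if (a.value is None) != (b.value is None):
        return a if a.value is None else b  # on a timestamp tie, the tombstone wins
    # Both live (or both tombstones): the larger value wins, so every replica agrees.
    return a if (a.value or "") >= (b.value or "") else b


def merge(replica1: Dict[Key, Cell], replica2: Dict[Key, Cell]) -> Dict[Key, Cell]:
    """Merge two replicas element by element; different paths never conflict."""
    merged = dict(replica1)
    for key, cell in replica2.items():
        merged[key] = reconcile(merged[key], cell) if key in merged else cell
    return merged
```

Because two replicas that add different elements to setcolumn write different paths, both additions survive the merge; only writes to the same path (for example the same map key) compete on timestamp.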

Remove one element from setcolumn:

cassandra@cqlsh:ks> update user set setcolumn = setcolumn - {'a'} where id =1;

After flushing, dump the newly generated SSTable: setcolumn path "a" now carries a deletion_info, i.e. an element-level tombstone.

[root@Cassandra8c32GTest005 cassandra]# tools/bin/sstabledump /data/data2/ks/user-a92ce790a8ff11e99a3d8963a5d3f9b4/md-19-big-Data.db
[
{
  "partition" : {
    "key" : [ "1" ],
    "position" : 0
  },
  "rows" : [
    {
      "type" : "row",
      "position" : 27,
      "cells" : [
        { "name" : "setcolumn", "path" : [ "a" ], "deletion_info" : { "local_delete_time" : "2019-07-18T10:03:17Z" },
          "tstamp" : "2019-07-18T10:03:17.038519Z"
        }
      ]
    }
  ]
}
]
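The write paths behind the two dumps so far can be sketched as follows. In the first dump, every collection column starts with a deletion_info whose marked_deleted is exactly 1 µs before the element tstamp: assigning a whole collection writes a collection tombstone at timestamp - 1 to clear any old contents, then writes the new elements at the timestamp itself. Removing elements writes only element-level tombstones, as in the dump just above. The functions `overwrite_set` and `remove_from_set` are hypothetical and model the resulting cells as plain tuples.

```python
from typing import List, Optional, Set, Tuple

# (path, timestamp, value): path None = collection-level deletion, value None = tombstone
Entry = Tuple[Optional[str], int, Optional[str]]


def overwrite_set(elements: Set[str], ts: int) -> List[Entry]:
    """Model of `SET col = {...}`: clear the old contents, then write each element.
    The collection tombstone sits at ts - 1 so it removes older elements
    without shadowing the new ones written at ts."""
    return [(None, ts - 1, None)] + [(e, ts, "") for e in sorted(elements)]


def remove_from_set(elements: Set[str], ts: int) -> List[Entry]:
    """Model of `SET col = col - {...}`: one element-level tombstone per removed element."""
    return [(e, ts, None) for e in sorted(elements)]
```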

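Run a manual merge (major compaction). In the merged SSTable the value of setcolumn path "a" is gone and only its deletion_info is left: the newer tombstone shadows the older value, and the tombstone itself is kept until gc_grace_seconds has passed.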

bin/nodetool compact
tools/bin/sstabledump /data/data2/ks/user-a92ce790a8ff11e99a3d8963a5d3f9b4/md-22-big-Data.db
[
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 236,
        "liveness_info" : { "tstamp" : "2019-07-18T02:43:36.011497Z" },
        "cells" : [
          { "name" : "addr", "deletion_info" : { "marked_deleted" : "2019-07-18T02:43:36.011496Z", "local_delete_time" : "2019-07-18T02:43:36Z" } },
          { "name" : "addr", "path" : [ "bj" ], "value" : ["ba", "bb"] },
          { "name" : "addr", "path" : [ "shanghai" ], "value" : ["sa", "sb"] },
          { "name" : "complex", "deletion_info" : { "marked_deleted" : "2019-07-18T02:47:55.888562Z", "local_delete_time" : "2019-07-18T02:47:55Z" } },
          { "name" : "complex", "path" : [ "bj" ], "value" : {"ka": "va", "kb": "vb"}, "tstamp" : "2019-07-18T02:47:55.888563Z" },
          { "name" : "listcolumn", "deletion_info" : { "marked_deleted" : "2019-07-18T02:56:09.386468Z", "local_delete_time" : "2019-07-18T02:56:09Z" } },
          { "name" : "listcolumn", "path" : [ "982493f0-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "a", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "listcolumn", "path" : [ "982493f1-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "b", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "listcolumn", "path" : [ "982493f2-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "c", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "setcolumn", "deletion_info" : { "marked_deleted" : "2019-07-18T02:55:03.280578Z", "local_delete_time" : "2019-07-18T02:55:03Z" } },
          { "name" : "setcolumn", "path" : [ "a" ], "deletion_info" : { "local_delete_time" : "2019-07-18T10:03:17Z" },
            "tstamp" : "2019-07-18T10:03:17.038519Z"
          },
          { "name" : "setcolumn", "path" : [ "b" ], "value" : "", "tstamp" : "2019-07-18T02:55:03.280579Z" },
          { "name" : "setcolumn", "path" : [ "c" ], "value" : "", "tstamp" : "2019-07-18T02:55:03.280579Z" }
        ]
      }
    ]
  }
]
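A simplified sketch of why the tombstone for "a" is still present after compaction. It assumes a single collection column and ignores details that real compaction also checks before purging (such as other SSTables that may still hold older data for the partition): for each path the newest version wins, and a winning tombstone is purged only once local_delete_time + gc_grace_seconds is in the past. 864000 seconds (10 days) is Cassandra's default gc_grace_seconds; `Element` and `compact_elements` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Element:
    path: str
    ts: int                     # write timestamp (microseconds)
    value: Optional[str]        # None means this element is a tombstone
    local_delete_time: int = 0  # seconds since epoch when the tombstone was written


def compact_elements(elements: List[Element], now: int,
                     gc_grace_seconds: int = 864000) -> List[Element]:
    """Simplified compaction of one collection column (not Cassandra's code)."""
    newest: Dict[str, Element] = {}
    for e in elements:
        cur = newest.get(e.path)
        # Newest timestamp wins; on a tie the tombstone wins.
        if cur is None or e.ts > cur.ts or (e.ts == cur.ts and e.value is None):
            newest[e.path] = e
    kept = []
    for e in sorted(newest.values(), key=lambda x: x.path):
        if e.value is None and e.local_delete_time + gc_grace_seconds < now:
            continue  # tombstone is older than gc_grace_seconds: purge it
        kept.append(e)
    return kept
```

Run right after the delete, the tombstone for "a" survives next to "b" and "c", as in md-22; only after the grace period would it disappear.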

Now try deleting the whole setcolumn column:

update user set setcolumn = null where id =1;

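Flush, run nodetool compact, and dump the result: all of setcolumn's previous elements have disappeared (including the tombstone for "a"), and only the new column-level deletion_info remains.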

tools/bin/sstabledump  /data/data2/ks/user-a92ce790a8ff11e99a3d8963a5d3f9b4/md-27-big-Data.db
[
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 210,
        "liveness_info" : { "tstamp" : "2019-07-18T02:43:36.011497Z" },
        "cells" : [
          { "name" : "addr", "deletion_info" : { "marked_deleted" : "2019-07-18T02:43:36.011496Z", "local_delete_time" : "2019-07-18T02:43:36Z" } },
          { "name" : "addr", "path" : [ "bj" ], "value" : ["ba", "bb"] },
          { "name" : "addr", "path" : [ "shanghai" ], "value" : ["sa", "sb"] },
          { "name" : "complex", "deletion_info" : { "marked_deleted" : "2019-07-18T02:47:55.888562Z", "local_delete_time" : "2019-07-18T02:47:55Z" } },
          { "name" : "complex", "path" : [ "bj" ], "value" : {"ka": "va", "kb": "vb"}, "tstamp" : "2019-07-18T02:47:55.888563Z" },
          { "name" : "listcolumn", "deletion_info" : { "marked_deleted" : "2019-07-18T02:56:09.386468Z", "local_delete_time" : "2019-07-18T02:56:09Z" } },
          { "name" : "listcolumn", "path" : [ "982493f0-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "a", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "listcolumn", "path" : [ "982493f1-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "b", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "listcolumn", "path" : [ "982493f2-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "c", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "setcolumn", "deletion_info" : { "marked_deleted" : "2019-07-18T10:11:06.966534Z", "local_delete_time" : "2019-07-18T10:11:06Z" } }
        ]
      }
    ]
  }
]
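The effect of the column-level deletion can be sketched as a filter: every element, live or tombstone, whose timestamp is at or before marked_deleted is shadowed, while later writes survive. This also explains the first dump, where each collection's marked_deleted is 1 µs older than its elements, so those elements stay visible. `apply_collection_deletion` is a hypothetical helper, and timestamps are plain integers in microseconds.

```python
from typing import Dict, Optional, Tuple

# path -> (timestamp in microseconds, value or None for an element tombstone)
Elements = Dict[str, Tuple[int, Optional[str]]]


def apply_collection_deletion(elements: Elements, marked_deleted: Optional[int]) -> Elements:
    """Drop every element written at or before the collection-level deletion time."""
    if marked_deleted is None:
        return dict(elements)  # no collection tombstone: nothing is shadowed
    return {p: (ts, v) for p, (ts, v) in elements.items() if ts > marked_deleted}
```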

Summary

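Cassandra's wide-row model flattens each column of a row into a separate cell; a collection cell is flattened one level further into paths, so a table is effectively a two-level structure. The collection cell has its own deletion time, and each path under it has its own deletion time, timestamp, and so on.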
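To make the two-level structure concrete, here is a small Python sketch that rebuilds the value cqlsh displays from the stored form: a collection-level deletion time on top and per-path cells below it. `ComplexColumn` and `to_cql_value` are hypothetical, tombstones are `None`, and ordering list elements by the time field of their timeuuid path is a simplification of Cassandra's timeuuid comparison.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class ComplexColumn:
    kind: str                             # "set", "map" or "list"
    marked_deleted: Optional[int] = None  # collection-level deletion time (upper level)
    # Lower level: path -> (timestamp, value or None for an element tombstone)
    cells: Dict[str, Tuple[int, Optional[str]]] = field(default_factory=dict)


def to_cql_value(col: ComplexColumn) -> Any:
    """Rebuild the user-visible collection from the two-level stored form."""
    live = {p: v for p, (ts, v) in col.cells.items()
            if v is not None and (col.marked_deleted is None or ts > col.marked_deleted)}
    if col.kind == "set":
        return sorted(live)                # the paths are the elements
    if col.kind == "map":
        return dict(sorted(live.items()))  # the paths are the keys
    if col.kind == "list":
        # List order follows the time component of each timeuuid path.
        return [live[p] for p in sorted(live, key=lambda p: uuid.UUID(p).time)]
    raise ValueError(f"unknown collection kind: {col.kind}")
```

Fed with the three listcolumn paths from the dump, it returns ['a', 'b', 'c'], matching the cqlsh output.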

DingTalk group

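To foster open technical discussion about Cassandra, we have set up WeChat and DingTalk groups that offer professional technical sharing and Q&A, regular offline tech salons in China, and live talks by experts. Everyone is welcome to join.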

DingTalk group link: https://c.tb.cn/F3.ZRTY0o
