Star Schema Benchmark
SelectDB Cloud 致力于提供极致性能、简单易用的云上实时数据分析服务,在宽表聚合、多表关联和高并发点查等场景下均具有优异的性能表现。本文将为您介绍 SelectDB Cloud 在 SSB 标准测试上的测试方法和测试结果。
概述
星型模式基准测试(SSB,Star Schema Benchmark) (opens in a new tab) 是一个轻量级的数据仓库性能测试集。SSB 基于 TPC-H 提供了一个简化版的星型模型数据集,主要用于测试在星型模型下,多表关联查询的性能表现。本文档主要介绍 SelectDB Cloud 在 SSB 100GB 测试集上的性能表现。
说明 包括 SSB 在内的标准测试集通常和实际业务场景差距较大,并且部分测试会针对测试集进行参数调优。所以标准测试集的测试结果仅能反映数据库在特定场景下的性能表现。建议用户使用实际业务数据进行进一步的测试。
测试环境
-
数据库环境
环境配置项 配置说明 地域和可用区 华东1(杭州)地域,可用区K 规格 64核 512GB 磁盘 800GB 高性能云硬盘 SelectDB Cloud 内核版本 3.0.6 -
客户端环境
环境配置项 配置说明 下载测试工具的设备 云服务器ECS实例 地域和可用区 华东1(杭州)地域 实例规格 ecs.g7.2xlarge 操作系统 Ubuntu 22.04.1 LTS 网络 用户专有网络(VPC)
测试数据集
整个测试数据集模拟生成 SSB 100GB 和 500GB 的数据并导入 SelectDB 进行测试,下面是测试 100GB 数据表的相关信息。
SSB表名 | 行数 | 备注 |
---|---|---|
lineorder | 600037902 | 商品订单明细表表 |
customer | 3000000 | 客户信息表 |
part | 1400000 | 零件信息表 |
supplier | 200000 | 供应商信息表 |
dates | 2556 | 日期表 |
lineorder_flat | 600037902 | 数据展平后的宽表 |
测试步骤
如下介绍进行测试所需要的前置工作和测试步骤。涉及的脚本获取请参见测试工具 (opens in a new tab)。
步骤一:下载安装 SSB 数据生成工具
从上述脚本库中获取脚本后,解压脚本文件并进入对应目录,执行以下指令,下载并编译ssb-dbgen工具,示例如下:
tar -zxvf yaochi_performance_tool.tar.gz
cd ./yaochi_performance_tool/ssb-tools/bin
bash build-ssb-dbgen.sh
安装成功后,将在./yaochi_performance_tool/ssb-tools/bin/ssb-dbgen/
目录下生成dbgen
二进制文件。
步骤二:生成 SSB 测试集
在安装测试工具目录执行如下脚本以生成SSB数据集,示例如下:
cd ./yaochi_performance_tool/ssb-tools/bin
bash gen-ssb-data.sh -s 100 -c 100
数据会以.tbl
为后缀生成在ssb-data/
目录下。
其中,-s 100
表示测试集大小系数为100,-c 100
表示并发100个线程生成lineorder表的数据。-c
参数也决定了最终lineorder表的文件数量。参数越大,文件数越多,每个文件越小。
在-s 100
参数下,生成的数据集文件总大小约 60GB。生成时间可能在数分钟到1小时不等。具体数据信息如下表:
Table | Rows | Size | File Number |
---|---|---|---|
lineorder | 600037902 | 60GB | 100 |
customer | 3000000 | 277MB | 1 |
part | 1400000 | 116MB | 1 |
supplier | 200000 | 17MB | 1 |
dates | 2556 | 228KB | 1 |
步骤三:建表
-
准备
doris-cluster.conf
文件在调用导入脚本前,需要将测试用信息写在
doris-cluster.conf
文件中。文件位置在ssb-tools/conf/
目录下,内容包括连接集群的地址、HTTP端口、用户名、密码和待导入数据的DB。export FE_HOST="xxx" export FE_HTTP_PORT="8080" export FE_QUERY_PORT="9030" export USER="root" export PASSWORD='xxx' export DB="ssb"
说明 您可以在 SelectDB Cloud 平台上选择仓库,点击 连接 > 查看当前仓库的 私有连接地址 (或 公网连接地址 )以及 HTTP协议端口,然后替换上述脚本中的变量。
-
创建 SSB 表
在
./yaochi_performance_tool/ssb-tools/bin/
目录下执行脚本来自动创建测试用表。bash create-ssb-tables.sh
可以考虑通过修改建表语句中的配置,以让性能测试更加贴合实际生产的环境。以
lineorder_flat
表建表语句为例,以下的建表语句中,设置了默认分桶为48个桶。您可以根据集群规格对分桶数进行调整,这样可以获取到更好的测试效果。CREATE TABLE `lineorder_flat` ( `LO_ORDERDATE` date NOT NULL COMMENT "", `LO_ORDERKEY` int(11) NOT NULL COMMENT "", `LO_LINENUMBER` tinyint(4) NOT NULL COMMENT "", `LO_CUSTKEY` int(11) NOT NULL COMMENT "", `LO_PARTKEY` int(11) NOT NULL COMMENT "", `LO_SUPPKEY` int(11) NOT NULL COMMENT "", `LO_ORDERPRIORITY` varchar(100) NOT NULL COMMENT "", `LO_SHIPPRIORITY` tinyint(4) NOT NULL COMMENT "", `LO_QUANTITY` tinyint(4) NOT NULL COMMENT "", `LO_EXTENDEDPRICE` int(11) NOT NULL COMMENT "", `LO_ORDTOTALPRICE` int(11) NOT NULL COMMENT "", `LO_DISCOUNT` tinyint(4) NOT NULL COMMENT "", `LO_REVENUE` int(11) NOT NULL COMMENT "", `LO_SUPPLYCOST` int(11) NOT NULL COMMENT "", `LO_TAX` tinyint(4) NOT NULL COMMENT "", `LO_COMMITDATE` date NOT NULL COMMENT "", `LO_SHIPMODE` varchar(100) NOT NULL COMMENT "", `C_NAME` varchar(100) NOT NULL COMMENT "", `C_ADDRESS` varchar(100) NOT NULL COMMENT "", `C_CITY` varchar(100) NOT NULL COMMENT "", `C_NATION` varchar(100) NOT NULL COMMENT "", `C_REGION` varchar(100) NOT NULL COMMENT "", `C_PHONE` varchar(100) NOT NULL COMMENT "", `C_MKTSEGMENT` varchar(100) NOT NULL COMMENT "", `S_NAME` varchar(100) NOT NULL COMMENT "", `S_ADDRESS` varchar(100) NOT NULL COMMENT "", `S_CITY` varchar(100) NOT NULL COMMENT "", `S_NATION` varchar(100) NOT NULL COMMENT "", `S_REGION` varchar(100) NOT NULL COMMENT "", `S_PHONE` varchar(100) NOT NULL COMMENT "", `P_NAME` varchar(100) NOT NULL COMMENT "", `P_MFGR` varchar(100) NOT NULL COMMENT "", `P_CATEGORY` varchar(100) NOT NULL COMMENT "", `P_BRAND` varchar(100) NOT NULL COMMENT "", `P_COLOR` varchar(100) NOT NULL COMMENT "", `P_TYPE` varchar(100) NOT NULL COMMENT "", `P_SIZE` tinyint(4) NOT NULL COMMENT "", `P_CONTAINER` varchar(100) NOT NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`LO_ORDERDATE`, `LO_ORDERKEY`) COMMENT "OLAP" PARTITION BY RANGE(`LO_ORDERDATE`) (PARTITION p1 VALUES [('0000-01-01'), ('1993-01-01')), PARTITION p2 VALUES [('1993-01-01'), ('1994-01-01')), PARTITION p3 VALUES [('1994-01-01'), ('1995-01-01')), PARTITION p4 VALUES [('1995-01-01'), ('1996-01-01')), PARTITION p5 VALUES [('1996-01-01'), ('1997-01-01')), PARTITION p6 VALUES [('1997-01-01'), ('1998-01-01')), PARTITION p7 VALUES [('1998-01-01'), ('1999-01-01'))) DISTRIBUTED BY HASH(`LO_ORDERKEY`) BUCKETS 48 PROPERTIES ( "replication_num" = "1", "colocate_with" = "groupxx1", "in_memory" = "false", "storage_format" = "DEFAULT" );
步骤四:导入数据
在./yaochi_performance_tool/ssb-tools/bin/
目录下执行脚本完成SSB测试集所有数据的导入。
bash load-ssb-data.sh -c 5
其中,-c 5
表示启动5个并发线程导入(默认为 5)。如果开启更多线程,可以加快导入速度,但会增加额外的内存开销。
说明 配置项
flush_thread_num_per_store
表示每个数据目录的写盘线程数,影响测试数据的导入速度,默认为 6。调大该配置可以提升写数据吞吐,但可能会增加IO Util。
步骤五:检查导入数据
SELECT COUNT(*) FROM part;
SELECT COUNT(*) FROM customer;
SELECT COUNT(*) FROM supplier;
SELECT COUNT(*) FROM dates;
SELECT COUNT(*) FROM lineorder;
SELECT COUNT(*) FROM lineorder_flat;
按照上述流程执行,数据量应和以下表格给出的数据行数一致。
Table | Rows | Origin Size | Compacted Size(1 Replica) |
---|---|---|---|
lineorder_flat | 600037902 | 60GB | 59.709GB |
lineorder | 600037902 | 60GB | 14.514GB |
customer | 3000000 | 277MB | 138.247MB |
part | 1400000 | 116MB | 12.759MB |
supplier | 200000 | 17MB | 9.143MB |
dates | 2556 | 228KB | 34.276KB |
步骤六:查询测试
标准SSB查询语句,示例如下。其他语句详情请参见ssb-queries (opens in a new tab)。
--Q1.1
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_year = 1993
AND lo_discount BETWEEN 1 AND 3
AND lo_quantity < 25;
--Q1.2
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_yearmonth = 'Jan1994'
AND lo_discount BETWEEN 4 AND 6
AND lo_quantity BETWEEN 26 AND 35;
--Q1.3
SELECT
SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_weeknuminyear = 6
AND d_year = 1994
AND lo_discount BETWEEN 5 AND 7
AND lo_quantity BETWEEN 26 AND 35;
--Q2.1
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_category = 'MFGR#12'
AND s_region = 'AMERICA'
GROUP BY d_year, p_brand
ORDER BY p_brand;
--Q2.2
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228'
AND s_region = 'ASIA'
GROUP BY d_year, p_brand
ORDER BY d_year, p_brand;
--Q2.3
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_brand = 'MFGR#2239'
AND s_region = 'EUROPE'
GROUP BY d_year, p_brand
ORDER BY d_year, p_brand;
--Q3.1
SELECT
c_nation,
s_nation,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND c_region = 'ASIA'
AND s_region = 'ASIA'
AND d_year >= 1992
AND d_year <= 1997
GROUP BY c_nation, s_nation, d_year
ORDER BY d_year ASC, REVENUE DESC;
--Q3.2
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND c_nation = 'UNITED STATES'
AND s_nation = 'UNITED STATES'
AND d_year >= 1992
AND d_year <= 1997
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;
--Q3.3
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND (
c_city = 'UNITED KI1'
OR c_city = 'UNITED KI5'
)
AND (
s_city = 'UNITED KI1'
OR s_city = 'UNITED KI5'
)
AND d_year >= 1992
AND d_year <= 1997
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;
--Q3.4
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND (
c_city = 'UNITED KI1'
OR c_city = 'UNITED KI5'
)
AND (
s_city = 'UNITED KI1'
OR s_city = 'UNITED KI5'
)
AND d_yearmonth = 'Dec1997'
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;
--Q4.1
SELECT /*+SET_VAR(parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true, batch_size=4096, enable_cost_based_join_reorder=true, enable_projection=true) */
d_year,
c_nation,
SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND c_region = 'AMERICA'
AND s_region = 'AMERICA'
AND (
p_mfgr = 'MFGR#1'
OR p_mfgr = 'MFGR#2'
)
GROUP BY d_year, c_nation
ORDER BY d_year, c_nation;
--Q4.2
SELECT /*+SET_VAR(parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true, batch_size=4096, enable_cost_based_join_reorder=true, enable_projection=true) */
d_year,
s_nation,
p_category,
SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND c_region = 'AMERICA'
AND s_region = 'AMERICA'
AND (
d_year = 1997
OR d_year = 1998
)
AND (
p_mfgr = 'MFGR#1'
OR p_mfgr = 'MFGR#2'
)
GROUP BY d_year, s_nation, p_category
ORDER BY d_year, s_nation, p_category;
--Q4.3
SELECT /*+SET_VAR(parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true, batch_size=4096, enable_cost_based_join_reorder=true, enable_projection=true) */
d_year,
s_city,
p_brand,
SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND s_nation = 'UNITED STATES'
AND (
d_year = 1997
OR d_year = 1998
)
AND p_category = 'MFGR#14'
GROUP BY d_year, s_city, p_brand
ORDER BY d_year, s_city, p_brand;
步骤七:测试结果
以下是 SSB 100GB 和 500GB 的测试结果。
Query | SSB 100GB(s) | SSB 500GB(s) |
---|---|---|
Q1.1 | 0.06 | 0.81 |
Q1.2 | 0.03 | 0.49 |
Q1.3 | 0.03 | 0.44 |
Q2.1 | 0.15 | 1.35 |
Q2.2 | 0.12 | 1 |
Q2.3 | 0.11 | 0.95 |
Q3.1 | 0.35 | 2.55 |
Q3.2 | 0.13 | 1.09 |
Q3.3 | 0.11 | 0.95 |
Q3.4 | 0.05 | 0.83 |
Q4.1 | 0.33 | 5.84 |
Q4.2 | 0.15 | 2.44 |
Q4.3 | 0.1 | 1.41 |
合计 | 1.72 | 20.15 |