SelectDB Cloud
开发指南
性能测试
Star Schema Benchmark

Star Schema Benchmark

SelectDB Cloud 致力于提供极致性能、简单易用的云上实时数据分析服务,在宽表聚合、多表关联和高并发点查等场景下均具有优异的性能表现。本文将为您介绍 SelectDB Cloud 在 SSB 标准测试上的测试方法和测试结果。

概述

星型模式基准测试(SSB,Star Schema Benchmark) (opens in a new tab) 是一个轻量级的数据仓库性能测试集。SSB 基于 TPC-H 提供了一个简化版的星型模型数据集,主要用于测试在星型模型下,多表关联查询的性能表现。本文档主要介绍 SelectDB Cloud 在 SSB 100GB 测试集上的性能表现。

说明 包括 SSB 在内的标准测试集通常和实际业务场景差距较大,并且部分测试会针对测试集进行参数调优。所以标准测试集的测试结果仅能反映数据库在特定场景下的性能表现。建议用户使用实际业务数据进行进一步的测试。

测试环境

  • 数据库环境

    环境配置项配置说明
    地域和可用区华东1(杭州)地域,可用区K
    规格64核 512GB
    磁盘800GB 高性能云硬盘
    SelectDB Cloud 内核版本3.0.6
  • 客户端环境

    环境配置项配置说明
    下载测试工具的设备云服务器ECS实例
    地域和可用区华东1(杭州)地域
    实例规格ecs.g7.2xlarge
    操作系统Ubuntu 22.04.1 LTS
    网络用户专有网络(VPC)

测试数据集

整个测试数据集模拟生成 SSB 100GB 和 500GB 的数据并导入 SelectDB 进行测试,下面是测试 100GB 数据表的相关信息。

SSB表名行数备注
lineorder600037902商品订单明细表表
customer3000000客户信息表
part1400000零件信息表
supplier200000供应商信息表
dates2556日期表
lineorder_flat600037902数据展平后的宽表

测试步骤

如下介绍进行测试所需要的前置工作和测试步骤。涉及的脚本获取请参见测试工具 (opens in a new tab)

步骤一:下载安装 SSB 数据生成工具

从上述脚本库中获取脚本后,解压脚本文件并进入对应目录,执行以下指令,下载并编译ssb-dbgen工具,示例如下:

tar -zxvf yaochi_performance_tool.tar.gz
cd ./yaochi_performance_tool/ssb-tools/bin
bash build-ssb-dbgen.sh

安装成功后,将在./yaochi_performance_tool/ssb-tools/bin/ssb-dbgen/目录下生成dbgen二进制文件。

步骤二:生成 SSB 测试集

在安装测试工具目录执行如下脚本以生成SSB数据集,示例如下:

cd ./yaochi_performance_tool/ssb-tools/bin
bash gen-ssb-data.sh -s 100 -c 100

数据会以.tbl为后缀生成在ssb-data/目录下。

其中,-s 100表示测试集大小系数为100,-c 100表示并发100个线程生成lineorder表的数据。-c参数也决定了最终lineorder表的文件数量。参数越大,文件数越多,每个文件越小。

-s 100参数下,生成的数据集文件总大小约 60GB。生成时间可能在数分钟到1小时不等。具体数据信息如下表:

TableRowsSizeFile Number
lineorder60003790260GB100
customer3000000277MB1
part1400000116MB1
supplier20000017MB1
dates2556228KB1

步骤三:建表

  1. 准备doris-cluster.conf文件

    在调用导入脚本前,需要将测试用信息写在doris-cluster.conf文件中。文件位置在ssb-tools/conf/目录下,内容包括连接集群的地址、HTTP端口、用户名、密码和待导入数据的DB。

    export FE_HOST="xxx"
    export FE_HTTP_PORT="8080"
    export FE_QUERY_PORT="9030"
    export USER="root"
    export PASSWORD='xxx'
    export DB="ssb"

    说明 您可以在 SelectDB Cloud 平台上选择仓库,点击 连接 > 查看当前仓库的 私有连接地址 (或 公网连接地址 )以及 HTTP协议端口,然后替换上述脚本中的变量。

  2. 创建 SSB 表

    ./yaochi_performance_tool/ssb-tools/bin/目录下执行脚本来自动创建测试用表。

    bash create-ssb-tables.sh

    可以考虑通过修改建表语句中的配置,以让性能测试更加贴合实际生产的环境。以lineorder_flat表建表语句为例,以下的建表语句中,设置了默认分桶为48个桶。您可以根据集群规格对分桶数进行调整,这样可以获取到更好的测试效果。

    CREATE TABLE `lineorder_flat` (
      `LO_ORDERDATE` date NOT NULL COMMENT "",
      `LO_ORDERKEY` int(11) NOT NULL COMMENT "",
      `LO_LINENUMBER` tinyint(4) NOT NULL COMMENT "",
      `LO_CUSTKEY` int(11) NOT NULL COMMENT "",
      `LO_PARTKEY` int(11) NOT NULL COMMENT "",
      `LO_SUPPKEY` int(11) NOT NULL COMMENT "",
      `LO_ORDERPRIORITY` varchar(100) NOT NULL COMMENT "",
      `LO_SHIPPRIORITY` tinyint(4) NOT NULL COMMENT "",
      `LO_QUANTITY` tinyint(4) NOT NULL COMMENT "",
      `LO_EXTENDEDPRICE` int(11) NOT NULL COMMENT "",
      `LO_ORDTOTALPRICE` int(11) NOT NULL COMMENT "",
      `LO_DISCOUNT` tinyint(4) NOT NULL COMMENT "",
      `LO_REVENUE` int(11) NOT NULL COMMENT "",
      `LO_SUPPLYCOST` int(11) NOT NULL COMMENT "",
      `LO_TAX` tinyint(4) NOT NULL COMMENT "",
      `LO_COMMITDATE` date NOT NULL COMMENT "",
      `LO_SHIPMODE` varchar(100) NOT NULL COMMENT "",
      `C_NAME` varchar(100) NOT NULL COMMENT "",
      `C_ADDRESS` varchar(100) NOT NULL COMMENT "",
      `C_CITY` varchar(100) NOT NULL COMMENT "",
      `C_NATION` varchar(100) NOT NULL COMMENT "",
      `C_REGION` varchar(100) NOT NULL COMMENT "",
      `C_PHONE` varchar(100) NOT NULL COMMENT "",
      `C_MKTSEGMENT` varchar(100) NOT NULL COMMENT "",
      `S_NAME` varchar(100) NOT NULL COMMENT "",
      `S_ADDRESS` varchar(100) NOT NULL COMMENT "",
      `S_CITY` varchar(100) NOT NULL COMMENT "",
      `S_NATION` varchar(100) NOT NULL COMMENT "",
      `S_REGION` varchar(100) NOT NULL COMMENT "",
      `S_PHONE` varchar(100) NOT NULL COMMENT "",
      `P_NAME` varchar(100) NOT NULL COMMENT "",
      `P_MFGR` varchar(100) NOT NULL COMMENT "",
      `P_CATEGORY` varchar(100) NOT NULL COMMENT "",
      `P_BRAND` varchar(100) NOT NULL COMMENT "",
      `P_COLOR` varchar(100) NOT NULL COMMENT "",
      `P_TYPE` varchar(100) NOT NULL COMMENT "",
      `P_SIZE` tinyint(4) NOT NULL COMMENT "",
      `P_CONTAINER` varchar(100) NOT NULL COMMENT ""
    ) ENGINE=OLAP
    DUPLICATE KEY(`LO_ORDERDATE`, `LO_ORDERKEY`)
    COMMENT "OLAP"
    PARTITION BY RANGE(`LO_ORDERDATE`)
    (PARTITION p1 VALUES [('0000-01-01'), ('1993-01-01')),
    PARTITION p2 VALUES [('1993-01-01'), ('1994-01-01')),
    PARTITION p3 VALUES [('1994-01-01'), ('1995-01-01')),
    PARTITION p4 VALUES [('1995-01-01'), ('1996-01-01')),
    PARTITION p5 VALUES [('1996-01-01'), ('1997-01-01')),
    PARTITION p6 VALUES [('1997-01-01'), ('1998-01-01')),
    PARTITION p7 VALUES [('1998-01-01'), ('1999-01-01')))
    DISTRIBUTED BY HASH(`LO_ORDERKEY`) BUCKETS 48
    PROPERTIES (
    "replication_num" = "1",
    "colocate_with" = "groupxx1",
    "in_memory" = "false",
    "storage_format" = "DEFAULT"
    );

步骤四:导入数据

./yaochi_performance_tool/ssb-tools/bin/目录下执行脚本完成SSB测试集所有数据的导入。

bash load-ssb-data.sh -c 5

其中,-c 5表示启动5个并发线程导入(默认为 5)。如果开启更多线程,可以加快导入速度,但会增加额外的内存开销。

说明 配置项flush_thread_num_per_store表示每个数据目录的写盘线程数,影响测试数据的导入速度,默认为 6。调大该配置可以提升写数据吞吐,但可能会增加IO Util。

步骤五:检查导入数据

SELECT COUNT(*) FROM part;
SELECT COUNT(*) FROM customer;
SELECT COUNT(*) FROM supplier;
SELECT COUNT(*) FROM dates;
SELECT COUNT(*) FROM lineorder;
SELECT COUNT(*) FROM lineorder_flat;

按照上述流程执行,数据量应和以下表格给出的数据行数一致。

TableRowsOrigin SizeCompacted Size(1 Replica)
lineorder_flat60003790260GB59.709GB
lineorder60003790260GB14.514GB
customer3000000277MB138.247MB
part1400000116MB12.759MB
supplier20000017MB9.143MB
dates2556228KB34.276KB

步骤六:查询测试

标准SSB查询语句,示例如下。其他语句详情请参见ssb-queries (opens in a new tab)

--Q1.1
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
    lo_orderdate = d_datekey
    AND d_year = 1993
    AND lo_discount BETWEEN 1 AND 3
    AND lo_quantity < 25;
--Q1.2
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
    lo_orderdate = d_datekey
    AND d_yearmonth = 'Jan1994'
    AND lo_discount BETWEEN 4 AND 6
    AND lo_quantity BETWEEN 26 AND 35;
    
--Q1.3
SELECT
    SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
    lo_orderdate = d_datekey
    AND d_weeknuminyear = 6
    AND d_year = 1994
    AND lo_discount BETWEEN 5 AND 7
    AND lo_quantity BETWEEN 26 AND 35;
    
--Q2.1
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
    lo_orderdate = d_datekey
    AND lo_partkey = p_partkey
    AND lo_suppkey = s_suppkey
    AND p_category = 'MFGR#12'
    AND s_region = 'AMERICA'
GROUP BY d_year, p_brand
ORDER BY p_brand;
 
--Q2.2
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
    lo_orderdate = d_datekey
    AND lo_partkey = p_partkey
    AND lo_suppkey = s_suppkey
    AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228'
    AND s_region = 'ASIA'
GROUP BY d_year, p_brand
ORDER BY d_year, p_brand;
 
--Q2.3
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
    lo_orderdate = d_datekey
    AND lo_partkey = p_partkey
    AND lo_suppkey = s_suppkey
    AND p_brand = 'MFGR#2239'
    AND s_region = 'EUROPE'
GROUP BY d_year, p_brand
ORDER BY d_year, p_brand;
 
--Q3.1
SELECT
    c_nation,
    s_nation,
    d_year,
    SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
    lo_custkey = c_custkey
    AND lo_suppkey = s_suppkey
    AND lo_orderdate = d_datekey
    AND c_region = 'ASIA'
    AND s_region = 'ASIA'
    AND d_year >= 1992
    AND d_year <= 1997
GROUP BY c_nation, s_nation, d_year
ORDER BY d_year ASC, REVENUE DESC;
 
--Q3.2
SELECT
    c_city,
    s_city,
    d_year,
    SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
    lo_custkey = c_custkey
    AND lo_suppkey = s_suppkey
    AND lo_orderdate = d_datekey
    AND c_nation = 'UNITED STATES'
    AND s_nation = 'UNITED STATES'
    AND d_year >= 1992
    AND d_year <= 1997
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;
 
--Q3.3
SELECT
    c_city,
    s_city,
    d_year,
    SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
    lo_custkey = c_custkey
    AND lo_suppkey = s_suppkey
    AND lo_orderdate = d_datekey
    AND (
        c_city = 'UNITED KI1'
        OR c_city = 'UNITED KI5'
    )
    AND (
        s_city = 'UNITED KI1'
        OR s_city = 'UNITED KI5'
    )
    AND d_year >= 1992
    AND d_year <= 1997
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;
 
--Q3.4
SELECT
    c_city,
    s_city,
    d_year,
    SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
    lo_custkey = c_custkey
    AND lo_suppkey = s_suppkey
    AND lo_orderdate = d_datekey
    AND (
        c_city = 'UNITED KI1'
        OR c_city = 'UNITED KI5'
    )
    AND (
        s_city = 'UNITED KI1'
        OR s_city = 'UNITED KI5'
    )
    AND d_yearmonth = 'Dec1997'
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;
 
--Q4.1
SELECT /*+SET_VAR(parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true, batch_size=4096, enable_cost_based_join_reorder=true, enable_projection=true) */
    d_year,
    c_nation,
    SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
    lo_custkey = c_custkey
    AND lo_suppkey = s_suppkey
    AND lo_partkey = p_partkey
    AND lo_orderdate = d_datekey
    AND c_region = 'AMERICA'
    AND s_region = 'AMERICA'
    AND (
        p_mfgr = 'MFGR#1'
        OR p_mfgr = 'MFGR#2'
    )
GROUP BY d_year, c_nation
ORDER BY d_year, c_nation;
 
--Q4.2
SELECT /*+SET_VAR(parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true, batch_size=4096, enable_cost_based_join_reorder=true, enable_projection=true) */  
    d_year,
    s_nation,
    p_category,
    SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
    lo_custkey = c_custkey
    AND lo_suppkey = s_suppkey
    AND lo_partkey = p_partkey
    AND lo_orderdate = d_datekey
    AND c_region = 'AMERICA'
    AND s_region = 'AMERICA'
    AND (
        d_year = 1997
        OR d_year = 1998
    )
    AND (
        p_mfgr = 'MFGR#1'
        OR p_mfgr = 'MFGR#2'
    )
GROUP BY d_year, s_nation, p_category
ORDER BY d_year, s_nation, p_category;
 
--Q4.3
SELECT /*+SET_VAR(parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true, batch_size=4096, enable_cost_based_join_reorder=true, enable_projection=true) */
    d_year,
    s_city,
    p_brand,
    SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
    lo_custkey = c_custkey
    AND lo_suppkey = s_suppkey
    AND lo_partkey = p_partkey
    AND lo_orderdate = d_datekey
    AND s_nation = 'UNITED STATES'
    AND (
        d_year = 1997
        OR d_year = 1998
    )
    AND p_category = 'MFGR#14'
GROUP BY d_year, s_city, p_brand
ORDER BY d_year, s_city, p_brand;

步骤七:测试结果

以下是 SSB 100GB 和 500GB 的测试结果。

QuerySSB 100GB(s)SSB 500GB(s)
Q1.10.060.81
Q1.20.030.49
Q1.30.030.44
Q2.10.151.35
Q2.20.121
Q2.30.110.95
Q3.10.352.55
Q3.20.131.09
Q3.30.110.95
Q3.40.050.83
Q4.10.335.84
Q4.20.152.44
Q4.30.11.41
合计1.7220.15
© 2023 北京飞轮数据科技有限公司 京ICP备2022004029号 | Apache、Apache Doris 以及相关开源项目名称均为 Apache 基金会商标