SelectDB Cloud
开发指南
性能测试
TPC-H Benchmark

TPC-H Benchmark

SelectDB Cloud 致力于提供极致性能、简单易用的云上实时数据分析服务,在宽表聚合、多表关联和高并发点查等场景下均具有优异的性能表现。本文将介绍 SelectDB 在 TPC-H 标准测试上的测试方法和测试结果。

概述

TPC-H 是一个决策支持基准(Decision Support Benchmark),它由一套面向业务的特别查询和并发数据修改组成,使用的数据具有广泛的行业相关性。该基准测试通过一系列的查询操作,来评估数据库系统在处理复杂查询和数据挖掘任务时的性能。

说明

  • 本文 TPC-H 的实现基于 TPC-H 的基准测试,并不符合 TPC-H 基准测试的所有要求。本测试结果不能等同于完全遵守 TPC-H 测试规范所获得的测试结果,因此不能与完全遵守该测试规范获得的测试结果进行对比。
  • TPC-H 在内的标准测试集通常和实际业务场景差距较大,并且部分测试会针对测试集进行参数调优。所以标准测试集的测试结果仅能反映数据库在特定场景下的性能表现。建议使用实际业务数据进行进一步的测试。

测试环境

  • 数据库环境

    环境配置项配置说明
    地域和可用区华东1(杭州)地域,可用区K
    规格64核 512GB
    磁盘800GB 高性能云硬盘
    SelectDB Cloud 内核版本3.0.6
  • 客户端环境

    环境配置项配置说明
    下载测试工具的设备云服务器ECS实例
    地域和可用区华东1(杭州)地域
    实例规格ecs.g7.2xlarge
    操作系统Ubuntu 22.04.1 LTS
    网络用户专有网络(VPC)

测试数据集

整个测试模拟生成 TPC-H 100GB 和 500GB 的数据并导入 SelectDB 进行测试,下面是测试 100GB 数据表的相关及数据量。

TPC-H表名行数导入后大小备注
REGION5400KB区域表
NATION257.714KB国家表
SUPPLIER100万85.528MB供应商表
PART2000万752.330MB零部件表
PARTSUPP8000万4.375GB零部件供应表
CUSTOMER1500万1.317GB客户表
ORDERS1.5亿6.301GB订单表
LINEITEM6亿20.882GB订单明细表

测试步骤

如下介绍进行测试所需要的前置工作和测试步骤。涉及的脚本获取请参见测试工具 (opens in a new tab)

步骤一:安装 unzip 工具

  1. 安装 unzip,示例如下:

    sudo apt install unzip

    这个命令将会安装 unzip 以及它的所有依赖项。

  2. 安装完成后,可以通过运行以下命令来验证 unzip 是否已成功安装。

    unzip --help

步骤二:下载安装 TPC-H 数据生成工具

从上述脚本库中获取脚本后,解压脚本文件并进入对应目录,执行以下指令,下载并编译 tpch-dbgen 工具,示例如下:

tar -zxvf yaochi_performance_tool.tar.gz
cd ./yaochi_performance_tool/tpch-tools/bin
bash build-tpch-dbgen.sh

安装成功后,将在TPC-H_Tools_v3.0.0/目录下生成dbgen二进制文件。

步骤三:生成 TPC-H 测试集

在安装测试工具目录执行以下脚本生成 TPC-H 数据集,示例如下:

cd ./yaochi_performance_tool/tpch-tools/bin
bash gen-tpch-data.sh

数据会以.tbl为后缀在tpch-data/目录下生成,默认情况下的文件总大小约 100GB。生成时间可能在数分钟到 1 小时不等。

步骤三:建表

  1. 准备doris-cluster.conf文件

    在调用导入脚本前,需要将测试数据库的连接信息写在doris-cluster.conf文件中。文件位置在tpch-tools/conf/目录下,内容包括连接集群的地址、HTTP 端口、用户名、密码和待导入数据的 DB。

    export FE_HOST="xxx"
    export FE_HTTP_PORT="8080"
    export FE_QUERY_PORT="9030"
    export USER="root"
    export PASSWORD='xxx'
    export DB="tpch1"

    说明 您可以在 SelectD Cloud 平台上选择仓库,点击 连接 > 查看当前仓库的 私有连接地址 (或 公网连接地址 )以及 HTTP协议端口,然后替换上述脚本中的变量。

  2. 创建TPC-H表。

    ./yaochi_performance_tool/tpch-tools/in目录下执行脚本来自动创建测试用表。

    bash create-tpch-tables.sh

步骤四:导入数据

./yaochi_performance_tool/tpch-tools/bin目录下执行脚本完成测试集数据导入。

bash ./load-tpch-data.sh

步骤五:检查导入数据

按照上述流程和参数执行的场合,数据量应和上文[测试数据集]给出的生成数据的行数一致。

SELECT  COUNT(*) FROM lineitem;
SELECT  COUNT(*) FROM orders;
SELECT  COUNT(*) FROM partsupp;
SELECT  COUNT(*) FROM part;
SELECT  COUNT(*) FROM customer;
SELECT  COUNT(*) FROM supplier;
SELECT  COUNT(*) FROM nation;
SELECT  COUNT(*) FROM region;
SELECT  COUNT(*) FROM revenue0;

步骤六:查询测试

  • 执行查询脚本

    执行下面的命令完成查询测试。

    ./run-tpch-queries.sh

    测试SQL详情请参见TPCH-Query-SQL (opens in a new tab)

    说明 目前 SelectDB 的查询优化器和统计信息功能仍有提升空间,所以我们在 TPC-H 中重写了一些查询以适应SelectDB的执行框架,但不影响结果的正确性。

  • 单个 SQL 执行

    下面是本次测试时使用的 SQL 语句,您也可以从代码库中获取最新的查询语句,最新测试查询语句请参见TPC-H 测试查询语句 (opens in a new tab)

    --ENV config
    set global experimental_enable_nereids_planner=true;
    set global experimental_enable_pipeline_engine=true;
    set global enable_runtime_filter_prune=false;
    set global runtime_filter_wait_time_ms=10000;
    set global enable_fallback_to_original_planner=false;
    set global query_timeout=1000;
     
    --Q1
    select 
        l_returnflag,
        l_linestatus,
        sum(l_quantity) as sum_qty,
        sum(l_extendedprice) as sum_base_price,
        sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
        avg(l_quantity) as avg_qty,
        avg(l_extendedprice) as avg_price,
        avg(l_discount) as avg_disc,
        count(*) as count_order
    from
        lineitem
    where
        l_shipdate <= date '1998-12-01' - interval '90' day
    group by
        l_returnflag,
        l_linestatus
    order by
        l_returnflag,
        l_linestatus;
     
    --Q2
    select
        s_acctbal,
        s_name,
        n_name,
        p_partkey,
        p_mfgr,
        s_address,
        s_phone,
        s_comment
    from
        partsupp join
        (
            select
                ps_partkey as a_partkey,
                min(ps_supplycost) as a_min
            from
                partsupp,
                part,
                supplier,
                nation,
                region
            where
                p_partkey = ps_partkey
                and s_suppkey = ps_suppkey
                and s_nationkey = n_nationkey
                and n_regionkey = r_regionkey
                and r_name = 'EUROPE'
                and p_size = 15
                and p_type like '%BRASS'
            group by a_partkey
        ) A on ps_partkey = a_partkey and ps_supplycost=a_min ,
        part,
        supplier,
        nation,
        region
    where
        p_partkey = ps_partkey
        and s_suppkey = ps_suppkey
        and p_size = 15
        and p_type like '%BRASS'
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'EUROPE'
     
    order by
        s_acctbal desc,
        n_name,
        s_name,
        p_partkey
    limit 100;
     
    --Q3
    select l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate,
        o_shippriority
    from
        (
            select l_orderkey, l_extendedprice, l_discount, o_orderdate, o_shippriority, o_custkey from
            lineitem join orders
            where l_orderkey = o_orderkey
            and o_orderdate < date '1995-03-15'
            and l_shipdate > date '1995-03-15'
        ) t1 join customer c 
        on c.c_custkey = t1.o_custkey
        where c_mktsegment = 'BUILDING'
    group by
        l_orderkey,
        o_orderdate,
        o_shippriority
    order by
        revenue desc,
        o_orderdate
    limit 10;
     
    --Q4
    select o_orderpriority,
        count(*) as order_count
    from
        (
            select
                *
            from
                lineitem
            where l_commitdate < l_receiptdate
        ) t1
        right semi join orders
        on t1.l_orderkey = o_orderkey
    where
        o_orderdate >= date '1993-07-01'
        and o_orderdate < date '1993-07-01' + interval '3' month
    group by
        o_orderpriority
    order by
        o_orderpriority;
     
    --Q5
    select n_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue
    from
        customer,
        orders,
        lineitem,
        supplier,
        nation,
        region
    where
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and l_suppkey = s_suppkey
        and c_nationkey = s_nationkey
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'ASIA'
        and o_orderdate >= date '1994-01-01'
        and o_orderdate < date '1994-01-01' + interval '1' year
    group by
        n_name
    order by
        revenue desc;
     
    --Q6
    select sum(l_extendedprice * l_discount) as revenue
    from
        lineitem
    where
        l_shipdate >= date '1994-01-01'
        and l_shipdate < date '1994-01-01' + interval '1' year
        and l_discount between .06 - 0.01 and .06 + 0.01
        and l_quantity < 24;
     
    --Q7
    select supp_nation,
        cust_nation,
        l_year,
        sum(volume) as revenue
    from
        (
            select
                n1.n_name as supp_nation,
                n2.n_name as cust_nation,
                extract(year from l_shipdate) as l_year,
                l_extendedprice * (1 - l_discount) as volume
            from
                supplier,
                lineitem,
                orders,
                customer,
                nation n1,
                nation n2
            where
                s_suppkey = l_suppkey
                and o_orderkey = l_orderkey
                and c_custkey = o_custkey
                and s_nationkey = n1.n_nationkey
                and c_nationkey = n2.n_nationkey
                and (
                    (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
                    or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
                )
                and l_shipdate between date '1995-01-01' and date '1996-12-31'
        ) as shipping
    group by
        supp_nation,
        cust_nation,
        l_year
    order by
        supp_nation,
        cust_nation,
        l_year;
     
    --Q8
     
    select o_year,
        sum(case
            when nation = 'BRAZIL' then volume
            else 0
        end) / sum(volume) as mkt_share
    from
        (
            select
                extract(year from o_orderdate) as o_year,
                l_extendedprice * (1 - l_discount) as volume,
                n2.n_name as nation
            from
                lineitem,
                orders,
                customer,
                supplier,
                part,
                nation n1,
                nation n2,
                region
            where
                p_partkey = l_partkey
                and s_suppkey = l_suppkey
                and l_orderkey = o_orderkey
                and o_custkey = c_custkey
                and c_nationkey = n1.n_nationkey
                and n1.n_regionkey = r_regionkey
                and r_name = 'AMERICA'
                and s_nationkey = n2.n_nationkey
                and o_orderdate between date '1995-01-01' and date '1996-12-31'
                and p_type = 'ECONOMY ANODIZED STEEL'
        ) as all_nations
    group by
        o_year
    order by
        o_year;
     
    --Q9
    select nation,
        o_year,
        sum(amount) as sum_profit
    from
        (
            select
                n_name as nation,
                extract(year from o_orderdate) as o_year,
                l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
            from
                lineitem join orders on o_orderkey = l_orderkey
                join[shuffle] part on p_partkey = l_partkey
                join[shuffle] partsupp on ps_partkey = l_partkey
                join[shuffle] supplier on s_suppkey = l_suppkey
                join[broadcast] nation on s_nationkey = n_nationkey
            where
                ps_suppkey = l_suppkey and 
                p_name like '%green%'
        ) as profit
    group by
        nation,
        o_year
    order by
        nation,
        o_year desc;
     
    --Q10
    select c_custkey,
        c_name,
        sum(t1.l_extendedprice * (1 - t1.l_discount)) as revenue,
        c_acctbal,
        n_name,
        c_address,
        c_phone,
        c_comment
    from
        customer,
        (
            select o_custkey,l_extendedprice,l_discount from lineitem, orders
            where l_orderkey = o_orderkey
            and o_orderdate >= date '1993-10-01'
            and o_orderdate < date '1993-10-01' + interval '3' month
            and l_returnflag = 'R'
        ) t1,
        nation
    where
        c_custkey = t1.o_custkey
        and c_nationkey = n_nationkey
    group by
        c_custkey,
        c_name,
        c_acctbal,
        c_phone,
        n_name,
        c_address,
        c_comment
    order by
        revenue desc
    limit 20;
     
    --Q11
    select ps_partkey,
        sum(ps_supplycost * ps_availqty) as value
    from
        partsupp,
        (
        select s_suppkey
        from supplier, nation
        where s_nationkey = n_nationkey and n_name = 'GERMANY'
        ) B
    where
        ps_suppkey = B.s_suppkey
    group by
        ps_partkey having
            sum(ps_supplycost * ps_availqty) > (
                select
                    sum(ps_supplycost * ps_availqty) * 0.000002
                from
                    partsupp,
                    (select s_suppkey
                     from supplier, nation
                     where s_nationkey = n_nationkey and n_name = 'GERMANY'
                    ) A
                where
                    ps_suppkey = A.s_suppkey
            )
    order by
        value desc;
     
    --Q12
    select l_shipmode,
        sum(case
            when o_orderpriority = '1-URGENT'
                or o_orderpriority = '2-HIGH'
                then 1
            else 0
        end) as high_line_count,
        sum(case
            when o_orderpriority <> '1-URGENT'
                and o_orderpriority <> '2-HIGH'
                then 1
            else 0
        end) as low_line_count
    from
        orders,
        lineitem
    where
        o_orderkey = l_orderkey
        and l_shipmode in ('MAIL', 'SHIP')
        and l_commitdate < l_receiptdate
        and l_shipdate < l_commitdate
        and l_receiptdate >= date '1994-01-01'
        and l_receiptdate < date '1994-01-01' + interval '1' year
    group by
        l_shipmode
    order by
        l_shipmode;
     
    --Q13
    select c_count,
        count(*) as custdist
    from
        (
            select
                c_custkey,
                count(o_orderkey) as c_count
            from
                orders right outer join customer on
                    c_custkey = o_custkey
                    and o_comment not like '%special%requests%'
            group by
                c_custkey
        ) as c_orders
    group by
        c_count
    order by
        custdist desc,
        c_count desc;
     
    --Q14
    select 100.00 * sum(case
            when p_type like 'PROMO%'
                then l_extendedprice * (1 - l_discount)
            else 0
        end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
    from
        part,
        lineitem
    where
        l_partkey = p_partkey
        and l_shipdate >= date '1995-09-01'
        and l_shipdate < date '1995-09-01' + interval '1' month;
     
    --Q15
    select s_suppkey,
        s_name,
        s_address,
        s_phone,
        total_revenue
    from
        supplier,
        revenue0
    where
        s_suppkey = supplier_no
        and total_revenue = (
            select
                max(total_revenue)
            from
                revenue0
        )
    order by
        s_suppkey;
     
    --Q16
    select p_brand,
        p_type,
        p_size,
        count(distinct ps_suppkey) as supplier_cnt
    from
        partsupp,
        part
    where
        p_partkey = ps_partkey
        and p_brand <> 'Brand#45'
        and p_type not like 'MEDIUM POLISHED%'
        and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
        and ps_suppkey not in (
            select
                s_suppkey
            from
                supplier
            where
                s_comment like '%Customer%Complaints%'
        )
    group by
        p_brand,
        p_type,
        p_size
    order by
        supplier_cnt desc,
        p_brand,
        p_type,
        p_size;
     
    --Q17
    select sum(l_extendedprice) / 7.0 as avg_yearly
    from
        lineitem join [broadcast]
        part p1 on p1.p_partkey = l_partkey
    where
        p1.p_brand = 'Brand#23'
        and p1.p_container = 'MED BOX'
        and l_quantity < (
            select
                0.2 * avg(l_quantity)
            from
                lineitem join [broadcast]
                part p2 on p2.p_partkey = l_partkey
            where
                l_partkey = p1.p_partkey
                and p2.p_brand = 'Brand#23'
                and p2.p_container = 'MED BOX'
        );
     
    --Q18
    select c_name,
        c_custkey,
        t3.o_orderkey,
        t3.o_orderdate,
        t3.o_totalprice,
        sum(t3.l_quantity)
    from
    customer join
    (
      select * from
      lineitem join
      (
        select * from
        orders left semi join
        (
          select
              l_orderkey
          from
              lineitem
          group by
              l_orderkey having sum(l_quantity) > 300
        ) t1
        on o_orderkey = t1.l_orderkey
      ) t2
      on t2.o_orderkey = l_orderkey
    ) t3
    on c_custkey = t3.o_custkey
    group by
        c_name,
        c_custkey,
        t3.o_orderkey,
        t3.o_orderdate,
        t3.o_totalprice
    order by
        t3.o_totalprice desc,
        t3.o_orderdate
    limit 100;
     
    --Q19
    select sum(l_extendedprice* (1 - l_discount)) as revenue
    from
        lineitem,
        part
    where
        (
            p_partkey = l_partkey
            and p_brand = 'Brand#12'
            and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
            and l_quantity >= 1 and l_quantity <= 1 + 10
            and p_size between 1 and 5
            and l_shipmode in ('AIR', 'AIR REG')
            and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
            p_partkey = l_partkey
            and p_brand = 'Brand#23'
            and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
            and l_quantity >= 10 and l_quantity <= 10 + 10
            and p_size between 1 and 10
            and l_shipmode in ('AIR', 'AIR REG')
            and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
            p_partkey = l_partkey
            and p_brand = 'Brand#34'
            and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
            and l_quantity >= 20 and l_quantity <= 20 + 10
            and p_size between 1 and 15
            and l_shipmode in ('AIR', 'AIR REG')
            and l_shipinstruct = 'DELIVER IN PERSON'
        );
     
    --Q20
    select s_name, s_address from
    supplier left semi join
    (
        select * from
        (
            select l_partkey,l_suppkey, 0.5 * sum(l_quantity) as l_q
            from lineitem
            where l_shipdate >= date '1994-01-01'
                and l_shipdate < date '1994-01-01' + interval '1' year
            group by l_partkey,l_suppkey
        ) t2 join
        (
            select ps_partkey, ps_suppkey, ps_availqty
            from partsupp left semi join part
            on ps_partkey = p_partkey and p_name like 'forest%'
        ) t1
        on t2.l_partkey = t1.ps_partkey and t2.l_suppkey = t1.ps_suppkey
        and t1.ps_availqty > t2.l_q
    ) t3
    on s_suppkey = t3.ps_suppkey
    join nation
    where s_nationkey = n_nationkey
        and n_name = 'CANADA'
    order by s_name;
     
    --Q21
    select s_name, count(*) as numwait
    from
      lineitem l2 right semi join
      (
        select * from
        lineitem l3 right anti join
        (
          select * from
          orders join lineitem l1 on l1.l_orderkey = o_orderkey and o_orderstatus = 'F'
          join
          (
            select * from
            supplier join nation
            where s_nationkey = n_nationkey
              and n_name = 'SAUDI ARABIA'
          ) t1
          where t1.s_suppkey = l1.l_suppkey and l1.l_receiptdate > l1.l_commitdate
        ) t2
        on l3.l_orderkey = t2.l_orderkey and l3.l_suppkey <> t2.l_suppkey  and l3.l_receiptdate > l3.l_commitdate
      ) t3
      on l2.l_orderkey = t3.l_orderkey and l2.l_suppkey <> t3.l_suppkey 
     
    group by
        t3.s_name
    order by
        numwait desc,
        t3.s_name
    limit 100;
     
    --Q22
    with tmp as (select
                        avg(c_acctbal) as av
                    from
                        customer
                    where
                        c_acctbal > 0.00
                        and substring(c_phone, 1, 2) in
                            ('13', '31', '23', '29', '30', '18', '17'))
     
    select cntrycode,
        count(*) as numcust,
        sum(c_acctbal) as totacctbal
    from
        (
        select
                substring(c_phone, 1, 2) as cntrycode,
                c_acctbal
            from
                 orders right anti join customer c on  o_custkey = c.c_custkey join tmp on c.c_acctbal > tmp.av
            where
                substring(c_phone, 1, 2) in
                    ('13', '31', '23', '29', '30', '18', '17')
        ) as custsale
    group by
        cntrycode
    order by
        cntrycode;

步骤七:测试结果

以下将给出 TPCH 100GB 和 500GB 的测试结果:

QueryTPCH 100GB(s)TPCH 500GB(s)
Q11.7410.04
Q20.070.19
Q30.343.43
Q40.191.1
Q50.817.52
Q60.030.15
Q70.545.74
Q80.262.56
Q92.6218.44
Q100.915.45
Q110.080.36
Q120.090.47
Q131.326.6
Q140.120.59
Q150.180.85
Q160.281.17
Q170.10.45
Q181.79.94
Q190.181.9
Q200.390.62
Q210.657.23
Q220.191.04
合计12.7985.84
© 2023 北京飞轮数据科技有限公司 京ICP备2022004029号 | Apache、Apache Doris 以及相关开源项目名称均为 Apache 基金会商标