TPC-H Benchmark
SelectDB Cloud 致力于提供极致性能、简单易用的云上实时数据分析服务,在宽表聚合、多表关联和高并发点查等场景下均具有优异的性能表现。本文将介绍 SelectDB 在 TPC-H 标准测试上的测试方法和测试结果。
概述
TPC-H 是一个决策支持基准(Decision Support Benchmark),它由一套面向业务的特别查询和并发数据修改组成,使用的数据具有广泛的行业相关性。该基准测试通过一系列的查询操作,来评估数据库系统在处理复杂查询和数据挖掘任务时的性能。
说明
- 本文 TPC-H 的实现基于 TPC-H 的基准测试,并不符合 TPC-H 基准测试的所有要求。本测试结果不能等同于完全遵守 TPC-H 测试规范所获得的测试结果,因此不能与完全遵守该测试规范获得的测试结果进行对比。
- TPC-H 在内的标准测试集通常和实际业务场景差距较大,并且部分测试会针对测试集进行参数调优。所以标准测试集的测试结果仅能反映数据库在特定场景下的性能表现。建议使用实际业务数据进行进一步的测试。
测试环境
-
数据库环境
环境配置项 配置说明 地域和可用区 华东1(杭州)地域,可用区K 规格 64核 512GB 磁盘 800GB 高性能云硬盘 SelectDB Cloud 内核版本 3.0.6 -
客户端环境
环境配置项 配置说明 下载测试工具的设备 云服务器ECS实例 地域和可用区 华东1(杭州)地域 实例规格 ecs.g7.2xlarge 操作系统 Ubuntu 22.04.1 LTS 网络 用户专有网络(VPC)
测试数据集
整个测试模拟生成 TPC-H 100GB 和 500GB 的数据并导入 SelectDB 进行测试,下面是测试 100GB 数据表的相关及数据量。
TPC-H表名 | 行数 | 导入后大小 | 备注 |
---|---|---|---|
REGION | 5 | 400KB | 区域表 |
NATION | 25 | 7.714KB | 国家表 |
SUPPLIER | 100万 | 85.528MB | 供应商表 |
PART | 2000万 | 752.330MB | 零部件表 |
PARTSUPP | 8000万 | 4.375GB | 零部件供应表 |
CUSTOMER | 1500万 | 1.317GB | 客户表 |
ORDERS | 1.5亿 | 6.301GB | 订单表 |
LINEITEM | 6亿 | 20.882GB | 订单明细表 |
测试步骤
如下介绍进行测试所需要的前置工作和测试步骤。涉及的脚本获取请参见测试工具 (opens in a new tab)
步骤一:安装 unzip 工具
-
安装 unzip,示例如下:
sudo apt install unzip
这个命令将会安装 unzip 以及它的所有依赖项。
-
安装完成后,可以通过运行以下命令来验证 unzip 是否已成功安装。
unzip --help
步骤二:下载安装 TPC-H 数据生成工具
从上述脚本库中获取脚本后,解压脚本文件并进入对应目录,执行以下指令,下载并编译 tpch-dbgen 工具,示例如下:
tar -zxvf yaochi_performance_tool.tar.gz
cd ./yaochi_performance_tool/tpch-tools/bin
bash build-tpch-dbgen.sh
安装成功后,将在TPC-H_Tools_v3.0.0/
目录下生成dbgen
二进制文件。
步骤三:生成 TPC-H 测试集
在安装测试工具目录执行以下脚本生成 TPC-H 数据集,示例如下:
cd ./yaochi_performance_tool/tpch-tools/bin
bash gen-tpch-data.sh
数据会以.tbl
为后缀在tpch-data/
目录下生成,默认情况下的文件总大小约 100GB。生成时间可能在数分钟到 1 小时不等。
步骤三:建表
-
准备
doris-cluster.conf
文件在调用导入脚本前,需要将测试数据库的连接信息写在
doris-cluster.conf
文件中。文件位置在tpch-tools/conf/
目录下,内容包括连接集群的地址、HTTP 端口、用户名、密码和待导入数据的 DB。export FE_HOST="xxx" export FE_HTTP_PORT="8080" export FE_QUERY_PORT="9030" export USER="root" export PASSWORD='xxx' export DB="tpch1"
说明 您可以在 SelectD Cloud 平台上选择仓库,点击 连接 > 查看当前仓库的 私有连接地址 (或 公网连接地址 )以及 HTTP协议端口,然后替换上述脚本中的变量。
-
创建TPC-H表。
在
./yaochi_performance_tool/tpch-tools/in
目录下执行脚本来自动创建测试用表。bash create-tpch-tables.sh
步骤四:导入数据
在./yaochi_performance_tool/tpch-tools/bin
目录下执行脚本完成测试集数据导入。
bash ./load-tpch-data.sh
步骤五:检查导入数据
按照上述流程和参数执行的场合,数据量应和上文[测试数据集]给出的生成数据的行数一致。
SELECT COUNT(*) FROM lineitem;
SELECT COUNT(*) FROM orders;
SELECT COUNT(*) FROM partsupp;
SELECT COUNT(*) FROM part;
SELECT COUNT(*) FROM customer;
SELECT COUNT(*) FROM supplier;
SELECT COUNT(*) FROM nation;
SELECT COUNT(*) FROM region;
SELECT COUNT(*) FROM revenue0;
步骤六:查询测试
-
执行查询脚本
执行下面的命令完成查询测试。
./run-tpch-queries.sh
测试SQL详情请参见TPCH-Query-SQL (opens in a new tab)。
说明 目前 SelectDB 的查询优化器和统计信息功能仍有提升空间,所以我们在 TPC-H 中重写了一些查询以适应SelectDB的执行框架,但不影响结果的正确性。
-
单个 SQL 执行
下面是本次测试时使用的 SQL 语句,您也可以从代码库中获取最新的查询语句,最新测试查询语句请参见TPC-H 测试查询语句 (opens in a new tab)。
--ENV config set global experimental_enable_nereids_planner=true; set global experimental_enable_pipeline_engine=true; set global enable_runtime_filter_prune=false; set global runtime_filter_wait_time_ms=10000; set global enable_fallback_to_original_planner=false; set global query_timeout=1000; --Q1 select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '90' day group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus; --Q2 select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from partsupp join ( select ps_partkey as a_partkey, min(ps_supplycost) as a_min from partsupp, part, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and p_size = 15 and p_type like '%BRASS' group by a_partkey ) A on ps_partkey = a_partkey and ps_supplycost=a_min , part, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15 and p_type like '%BRASS' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' order by s_acctbal desc, n_name, s_name, p_partkey limit 100; --Q3 select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from ( select l_orderkey, l_extendedprice, l_discount, o_orderdate, o_shippriority, o_custkey from lineitem join orders where l_orderkey = o_orderkey and o_orderdate < date '1995-03-15' and l_shipdate > date '1995-03-15' ) t1 join customer c on c.c_custkey = t1.o_custkey where c_mktsegment = 'BUILDING' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10; --Q4 select o_orderpriority, count(*) as order_count from ( select * from lineitem where l_commitdate < l_receiptdate ) t1 right semi join orders on t1.l_orderkey = o_orderkey where o_orderdate >= date '1993-07-01' and o_orderdate < date '1993-07-01' + interval '3' month group by o_orderpriority order by o_orderpriority; --Q5 select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and o_orderdate >= date '1994-01-01' and o_orderdate < date '1994-01-01' + interval '1' year group by n_name order by revenue desc; --Q6 select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year and l_discount between .06 - 0.01 and .06 + 0.01 and l_quantity < 24; --Q7 select supp_nation, cust_nation, l_year, sum(volume) as revenue from ( select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE') ) and l_shipdate between date '1995-01-01' and date '1996-12-31' ) as shipping group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year; --Q8 select o_year, sum(case when nation = 'BRAZIL' then volume else 0 end) / sum(volume) as mkt_share from ( select extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) as volume, n2.n_name as nation from lineitem, orders, customer, supplier, part, nation n1, nation n2, region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = 'AMERICA' and s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = 'ECONOMY ANODIZED STEEL' ) as all_nations group by o_year order by o_year; --Q9 select nation, o_year, sum(amount) as sum_profit from ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from lineitem join orders on o_orderkey = l_orderkey join[shuffle] part on p_partkey = l_partkey join[shuffle] partsupp on ps_partkey = l_partkey join[shuffle] supplier on s_suppkey = l_suppkey join[broadcast] nation on s_nationkey = n_nationkey where ps_suppkey = l_suppkey and p_name like '%green%' ) as profit group by nation, o_year order by nation, o_year desc; --Q10 select c_custkey, c_name, sum(t1.l_extendedprice * (1 - t1.l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment from customer, ( select o_custkey,l_extendedprice,l_discount from lineitem, orders where l_orderkey = o_orderkey and o_orderdate >= date '1993-10-01' and o_orderdate < date '1993-10-01' + interval '3' month and l_returnflag = 'R' ) t1, nation where c_custkey = t1.o_custkey and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment order by revenue desc limit 20; --Q11 select ps_partkey, sum(ps_supplycost * ps_availqty) as value from partsupp, ( select s_suppkey from supplier, nation where s_nationkey = n_nationkey and n_name = 'GERMANY' ) B where ps_suppkey = B.s_suppkey group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.000002 from partsupp, (select s_suppkey from supplier, nation where s_nationkey = n_nationkey and n_name = 'GERMANY' ) A where ps_suppkey = A.s_suppkey ) order by value desc; --Q12 select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count from orders, lineitem where o_orderkey = l_orderkey and l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '1994-01-01' and l_receiptdate < date '1994-01-01' + interval '1' year group by l_shipmode order by l_shipmode; --Q13 select c_count, count(*) as custdist from ( select c_custkey, count(o_orderkey) as c_count from orders right outer join customer on c_custkey = o_custkey and o_comment not like '%special%requests%' group by c_custkey ) as c_orders group by c_count order by custdist desc, c_count desc; --Q14 select 100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue from part, lineitem where l_partkey = p_partkey and l_shipdate >= date '1995-09-01' and l_shipdate < date '1995-09-01' + interval '1' month; --Q15 select s_suppkey, s_name, s_address, s_phone, total_revenue from supplier, revenue0 where s_suppkey = supplier_no and total_revenue = ( select max(total_revenue) from revenue0 ) order by s_suppkey; --Q16 select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#45' and p_type not like 'MEDIUM POLISHED%' and p_size in (49, 14, 23, 45, 19, 3, 36, 9) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by p_brand, p_type, p_size order by supplier_cnt desc, p_brand, p_type, p_size; --Q17 select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem join [broadcast] part p1 on p1.p_partkey = l_partkey where p1.p_brand = 'Brand#23' and p1.p_container = 'MED BOX' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem join [broadcast] part p2 on p2.p_partkey = l_partkey where l_partkey = p1.p_partkey and p2.p_brand = 'Brand#23' and p2.p_container = 'MED BOX' ); --Q18 select c_name, c_custkey, t3.o_orderkey, t3.o_orderdate, t3.o_totalprice, sum(t3.l_quantity) from customer join ( select * from lineitem join ( select * from orders left semi join ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 300 ) t1 on o_orderkey = t1.l_orderkey ) t2 on t2.o_orderkey = l_orderkey ) t3 on c_custkey = t3.o_custkey group by c_name, c_custkey, t3.o_orderkey, t3.o_orderdate, t3.o_totalprice order by t3.o_totalprice desc, t3.o_orderdate limit 100; --Q19 select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#12' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 1 and l_quantity <= 1 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#23' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#34' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 20 and l_quantity <= 20 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); --Q20 select s_name, s_address from supplier left semi join ( select * from ( select l_partkey,l_suppkey, 0.5 * sum(l_quantity) as l_q from lineitem where l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year group by l_partkey,l_suppkey ) t2 join ( select ps_partkey, ps_suppkey, ps_availqty from partsupp left semi join part on ps_partkey = p_partkey and p_name like 'forest%' ) t1 on t2.l_partkey = t1.ps_partkey and t2.l_suppkey = t1.ps_suppkey and t1.ps_availqty > t2.l_q ) t3 on s_suppkey = t3.ps_suppkey join nation where s_nationkey = n_nationkey and n_name = 'CANADA' order by s_name; --Q21 select s_name, count(*) as numwait from lineitem l2 right semi join ( select * from lineitem l3 right anti join ( select * from orders join lineitem l1 on l1.l_orderkey = o_orderkey and o_orderstatus = 'F' join ( select * from supplier join nation where s_nationkey = n_nationkey and n_name = 'SAUDI ARABIA' ) t1 where t1.s_suppkey = l1.l_suppkey and l1.l_receiptdate > l1.l_commitdate ) t2 on l3.l_orderkey = t2.l_orderkey and l3.l_suppkey <> t2.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) t3 on l2.l_orderkey = t3.l_orderkey and l2.l_suppkey <> t3.l_suppkey group by t3.s_name order by numwait desc, t3.s_name limit 100; --Q22 with tmp as (select avg(c_acctbal) as av from customer where c_acctbal > 0.00 and substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17')) select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal from ( select substring(c_phone, 1, 2) as cntrycode, c_acctbal from orders right anti join customer c on o_custkey = c.c_custkey join tmp on c.c_acctbal > tmp.av where substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17') ) as custsale group by cntrycode order by cntrycode;
步骤七:测试结果
以下将给出 TPCH 100GB 和 500GB 的测试结果:
Query | TPCH 100GB(s) | TPCH 500GB(s) |
---|---|---|
Q1 | 1.74 | 10.04 |
Q2 | 0.07 | 0.19 |
Q3 | 0.34 | 3.43 |
Q4 | 0.19 | 1.1 |
Q5 | 0.81 | 7.52 |
Q6 | 0.03 | 0.15 |
Q7 | 0.54 | 5.74 |
Q8 | 0.26 | 2.56 |
Q9 | 2.62 | 18.44 |
Q10 | 0.91 | 5.45 |
Q11 | 0.08 | 0.36 |
Q12 | 0.09 | 0.47 |
Q13 | 1.32 | 6.6 |
Q14 | 0.12 | 0.59 |
Q15 | 0.18 | 0.85 |
Q16 | 0.28 | 1.17 |
Q17 | 0.1 | 0.45 |
Q18 | 1.7 | 9.94 |
Q19 | 0.18 | 1.9 |
Q20 | 0.39 | 0.62 |
Q21 | 0.65 | 7.23 |
Q22 | 0.19 | 1.04 |
合计 | 12.79 | 85.84 |