(类 PG 信创)亿级 table 简单查询的一个优化问题 ...

部分信息借由 GPT 进行脱敏处理

现有一个简单表结构：
Table "schema.t_table_m"
   Column    |          Type          |             Modifiers                | Storage  | Stats target | Description
-----------------+-----------------------------+------------------------------------------+----------+--------------+-------------
partition_id | smallint                   | not null                               | plain |             | 分区 ID
transaction_time| timestamp without time zone | not null default pg_systimestamp()    | plain |             | 交易时间
transaction_count | integer                |                                        | plain |             | 交易数量
last_update_time | timestamp without time zone | not null default pg_systimestamp()    | plain |             | 更新时间
source_type    | character varying(10)    | not null default NULL::character varying | extended |             | 事件源
transaction_code | character varying(30)    | not null default NULL::character varying | extended |             | 交易码
key_value    | character varying(100)    | not null default NULL::character varying | extended |             | RV 变量键
value_data    | character varying(500)    | not null default NULL::character varying | extended |             | RV 变量值
Indexes:
"t_table_m_pkey" PRIMARY KEY, ubtree (transaction_time, value_data, key_value, transaction_code, source_type, partition_id) LOCAL WITH (storage_type=USTORE, active_pages=1314323) TABLESPACE pg_default
Partition By LIST(partition_id)
Number of partitions: 31 (View pg_partition to check each partition range.)
Has OIDs: no
Options: orientation=row, compression=no, storage_type=USTORE, segment=off
表由 Mysql 迁移而来，这个分区表中存储的记录条数为 126000000+。在开发侧，他们反馈下面的一个简单查询执行性能非常差，每条 SQL 要跑 15s 左右：
SELECT
  value_data AS dataValue,
  SUM(transaction_count) AS txnCount
FROM
  schema.t_table_m
WHERE
  1 = 1
  AND source_type = '1'
  AND key_value IN (
'key1',
'key2',
'key3',
'key4',
'key5',
'key6',
'key7',
'key8',
'key9',
'key10'
  )
  AND partition_id IN (1, 31)
  AND transaction_time >= '2024-07-31 15:58:11'
  AND transaction_time
执行计划如下：
postgres=> explain performance
SELECT
.......;
id |                                        operation                                           |  A-time | A-rows | E-rows | E-distinct | Peak Memory | A-width | E-width | E-costs
----+------------------------------------------------------------------------------------------------+-----------+--------+--------+------------+-------------+---------+---------+----------------
  1 | ->  HashAggregate                                                                            | 12715.021 |    0 |    1 |          | 22KB       |       |    8 | 28.485..28.495
  2 | ->  Partition Iterator                                                                   | 12715.011 |    0 |    1 |          | 69KB       |       |    8 | 0.010..28.480
  3 |    ->  Partitioned Index Scan using t_table_m_pkey on schema.t_table_m                   | 12714.903 |    0 |    1 |          | 29KB       |       |    8 | 0.010..28.480
(3 rows)
         Predicate Information (identified by plan id)
------------------------------------------------------------------------------------------
2 --Partition Iterator
      Iterations: 2
3 --Partitioned Index Scan using t_table_m_pkey on schema.t_table_m
      Index Cond: ((schema.t_table_m.transaction_time >= '2024-07-31 15:58:11'::timestamp without time zone) AND (schema.t_table_m.transaction_time  'code1'::text) AND ((schema.t_table_m.transaction_code)::text  'code2'::text) AND ((schema.t_table_m.value_data)::text  ALL ('{"null",space,#,*}'::text[])))
      Selected Partitions:  1,31
(6 rows)
                     Memory Information (identified by plan id)
------------------------------------------------------------------------------------------
1 --HashAggregate
      Peak Memory: 22KB, Estimate Memory: 32MB
2 --Partition Iterator
      Peak Memory: 69KB, Estimate Memory: 32MB
3 --Partitioned Index Scan using t_table_m_pkey on schema.t_table_m
      Peak Memory: 29KB, Estimate Memory: 32MB
(6 rows)
                     Targetlist Information (identified by plan id)
--------------------------------------------------------------------------------------------
1 --HashAggregate
      Output: value_data, sum(transaction_count)
      Group By Key: schema.t_table_m.value_data
2 --Partition Iterator
      Output: value_data, transaction_count
3 --Partitioned Index Scan using t_table_m_pkey on schema.t_table_m
      Output: value_data, transaction_count
(7 rows)
                     Datanode Information (identified by plan id)
--------------------------------------------------------------------------------------------
1 --HashAggregate
      (actual time=12715.021..12715.021 rows=0 loops=1)
      (Buffers: shared hit=760703)
      (CPU: ex c/r=0, ex row=0, ex cyc=437, inc cyc=847446039235951)
2 --Partition Iterator
      (actual time=12715.011..12715.011 rows=0 loops=1)
      (CPU: ex c/r=0, ex row=0, ex cyc=-847446039224001, inc cyc=847446039235514)
3 --Partitioned Index Scan using t_table_m_pkey on schema.t_table_m
      (actual time=12714.903..12714.903 rows=0 loops=2)
      (Buffers: shared hit=760703)
      (CPU: ex c/r=0, ex row=0, ex cyc=1694892078459515, inc cyc=1694892078459515)
(11 rows)
   ====== Query Summary =====
------------------------------------------
Datanode executor start time: 0.177 ms
Datanode executor run time: 12715.042 ms
Datanode executor end time: 0.021 ms
Planner runtime: 1.124 ms
Query Id: 10810046480632471817
Total runtime: 12715.260 ms
(6 rows)
其中所有涉及列都已经命中索引扫描，测试时尝试更改某些 where 条件使得查询进行全表扫描的话，执行时间会暴增到 2 分钟左右。
经由我们多轮调试后，发现仅仅是单纯地将 where 条件中partition_id IN (1, 31)更改为(partition_id=31 OR partition_id=1)，就能使得查询平均耗时减少到 6s 。其它优化更改的效果低于系统性能波动，无法观测。
原本猜测是因为 IN 语句改写后会触发分区剪枝类的优化，但是经对比发现改写前后的执行计划完全一致：
postgres=> explain performance
SELECT
  value_data AS dataValue,
  SUM(transaction_count) AS txnCount
FROM
  schema.t_table_m
WHERE
  1 = 1
  AND source_type = '1'
  AND key_value = any (array[
'key1',
'key2',
'key3',
'key4',
'key5',
'key6',
'key7',
'key8',
'key9',
'key10'
  ])
  AND (partition_id = 1 OR partition_id = 31)
  AND transaction_time >= '2024-07-31 15:58:11'
  AND transaction_time  'code1'
  AND transaction_code  'code2'
GROUP BY
  value_data;
id |                                        operation                                           |  A-time | A-rows | E-rows | E-distinct | Peak Memory | A-width | E-width | E-costs
----+------------------------------------------------------------------------------------------------+-----------+--------+--------+------------+-------------+---------+---------+----------------
  1 | ->  HashAggregate                                                                            | 6421.454 |    0 |    1 |          | 22KB       |       |    8 | 14.790..14.800
  2 | ->  Partition Iterator                                                                   | 6421.441 |    0 |    1 |          | 69KB       |       |    8 | 0.010..14.785
  3 |    ->  Partitioned Index Scan using t_table_m_pkey on schema.t_table_m                   | 6421.276 |    0 |    1 |          | 31KB       |       |    8 | 0.010..14.785
(3 rows)
         Predicate Information (identified by plan id)
-----------------------------------------------------------------------------------------
2 --Partition Iterator
      Iterations: 2
3 --Partitioned Index Scan using t_table_m_pkey on schema.t_table_m
      Index Cond: ((schema.t_table_m.transaction_time >= '2024-07-31 15:58:11'::timestamp without time zone) AND (schema.t_table_m.transaction_time  'code1'::text) AND ((schema.t_table_m.transaction_code)::text  'code2'::text) AND ((schema.t_table_m.partition_id = 31) OR (schema.t_table_m.partition_id = 1)) AND ((schema.t_table_m.value_data)::text  ALL ('{"null",space,#,*}'::text[])))
      Selected Partitions:  1,31
(6 rows)

         Memory Information (identified by plan id)
------------------------------------------------------------------------------------------
1 --HashAggregate
      Peak Memory: 22KB, Estimate Memory: 32MB
2 --Partition Iterator
      Peak Memory: 69KB, Estimate Memory: 32MB
3 --Partitioned Index Scan using t_table_m_pkey on schema.t_table_m
      Peak Memory: 31KB, Estimate Memory: 32MB
(6 rows)
                     Targetlist Information (identified by plan id)
--------------------------------------------------------------------------------------------
1 --HashAggregate
      Output: value_data, sum(transaction_count)
      Group By Key: schema.t_table_m.value_data
2 --Partition Iterator
      Output: value_data, transaction_count
3 --Partitioned Index Scan using t_table_m_pkey on schema.t_table_m
      Output: value_data, transaction_count
(7 rows)
                     Datanode Information (identified by plan id)
--------------------------------------------------------------------------------------------
1 --HashAggregate
      (actual time=6421.454..6421.454 rows=0 loops=1)
      (Buffers: shared hit=380353)
      (CPU: ex c/r=0, ex row=0, ex cyc=488, inc cyc=847433445353297)
2 --Partition Iterator
      (actual time=6421.441..6421.441 rows=0 loops=1)
      (CPU: ex c/r=0, ex row=0, ex cyc=-847433445336080, inc cyc=847433445352809)
3 --Partitioned Index Scan using t_table_m_pkey on schema.t_table_m
      (actual time=6421.276..6421.276 rows=0 loops=2)
      (Buffers: shared hit=380353)
      (CPU: ex c/r=0, ex row=0, ex cyc=1694866890688889, inc cyc=1694866890688889)
(11 rows)
   ====== Query Summary =====
-----------------------------------------
Datanode executor start time: 0.198 ms
Datanode executor run time: 6421.479 ms
Datanode executor end time: 0.022 ms
Planner runtime: 1.417 ms
Query Id: 10810046480632413864
Total runtime: 6421.729 ms
(6 rows)
两者唯一的区别是，在 Partitioned Index Scan 时，使用 OR 的语句 Buffers: shared hit 是使用 IN 的一半。
其中造成这样的差距的可能原因是什么？

(类 PG 信创)亿级 table 简单查询的一个优化问题

浏览过的版块

热门主题

ioio事件是什么鬼？

养老贷又来了，贷贷相传啊

今天要撸2次

好评有礼给的是红包还是优惠卷

现在干啥都太难了，珍惜吧

淘宝现在也好难搞啊

现在的ai能生产图文结合的内容吗

周固固突然发飙了，谁惹他了呢？吃光群众等

怎么吵架了啊

老坛们看过来，周固固同志狂撒金币。折射一

热门板块

公告

网站帮助 - Yoo趣儿

我们的愿景

在 Yoo趣儿投放广告

Yoo趣儿网站用户应遵守规则