The following join gets very different row estimates depending on whether it is run against a single partition or against the whole table.
CREATE TABLE m_data.ga_session (
  session_id BIGINT NOT NULL,
  visitor_id BIGINT NOT NULL,
  transaction_id TEXT,
  timestamp TIMESTAMP WITH TIME ZONE NOT NULL,
  day_id INTEGER NOT NULL,
  [...]
  device_category TEXT NOT NULL,
  [...]
  operating_system TEXT
);
For every partition:
CREATE TABLE IF NOT EXISTS m_data.ga_session_20170127 ( CHECK (day_id = 20170127) ) INHERITS (m_data.ga_session);
-- the identifiers are theoretically invalid, but they get truncated to 63 chars and work nevertheless
CREATE INDEX IF NOT EXISTS "ga_session__m_tmp.normalize_device_category(ga_session.device_category)" on m_data.ga_session_20170127 USING btree (m_tmp.normalize_device_category(device_category)) ;
CREATE INDEX IF NOT EXISTS "ga_session__m_tmp.normalize_operating_system(operating_system)" on m_data.ga_session_20170127 USING btree (m_tmp.normalize_operating_system(operating_system)) ;
ANALYZE m_data.ga_session_20170127;
EXPLAIN analyse
SELECT *
FROM m_data.ga_session_20170127 ga_session
JOIN m_dim_next.device ON
device.device_category_name = m_tmp.normalize_device_category(ga_session.device_category)
AND device.operating_system_name = m_tmp.normalize_operating_system(ga_session.operating_system);
The statistics for these indexes show up on the partition:
SELECT * FROM pg_stats WHERE tablename ilike 'ga_session_20170127%';
schemaname |tablename |attname |inherited |null_frac |avg_width |n_distinct
-----------|----------------------------------------------------------------|---------------------------|----------|------------|----------|-------------
m_data |ga_session_20170127__m_tmp.normalize_device_category(device_cat |normalize_device_category |false |0 |10 |3
m_data |ga_session_20170127__m_tmp.normalize_operating_system(operating |normalize_operating_system |false |0 |7 |14
(With the statistics on the partition's indexes in place) this gives the following query plan, with 80146 rows estimated against 77503 actual:
Hash Join (cost=1.95..6103.53 rows=80146 width=262) (actual time=0.121..117.204 rows=77503 loops=1)
  Hash Cond: ((COALESCE(initcap(ga_session.device_category), 'Unknown'::text) = device.device_category_name) AND (COALESCE(replace(ga_session.operating_system, '(not set)'::text, 'Unknown'::text), 'Unknown'::text) = device.operating_system_name))
  -> Seq Scan on ga_session_20170127 ga_session (cost=0.00..2975.03 rows=77503 width=224) (actual time=0.010..9.203 rows=77503 loops=1)
  -> Hash (cost=1.38..1.38 rows=38 width=38) (actual time=0.064..0.064 rows=38 loops=1)
        Buckets: 1024 Batches: 1 Memory Usage: 11kB
        -> Seq Scan on device (cost=0.00..1.38 rows=38 width=38) (actual time=0.006..0.019 rows=38 loops=1)
Planning time: 1.460 ms
Execution time: 120.098 ms
What does not work is the join against the whole table, which estimates a completely wrong number of rows (832 estimated vs. 876237 actual):
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1.95..60056.78 rows=832 width=262) (actual time=0.037..1065.778 rows=876237 loops=1)
  Hash Cond: ((COALESCE(initcap(ga_session.device_category), 'Unknown'::text) = device.device_category_name) AND (COALESCE(replace(ga_session.operating_system, '(not set)'::text, 'Unknown'::text), 'Unknown'::text) = device.operating_system_name))
  -> Append (cost=0.00..33759.37 rows=876238 width=225) (actual time=0.005..132.070 rows=876237 loops=1)
        -> Seq Scan on ga_session (cost=0.00..0.00 rows=1 width=319) (actual time=0.000..0.000 rows=0 loops=1)
        -> Seq Scan on ga_session_20170125 ga_session_1 (cost=0.00..3648.38 rows=94438 width=226) (actual time=0.005..10.606 rows=94438 loops=1)
        -> Seq Scan on ga_session_20170126 ga_session_2 (cost=0.00..3185.81 rows=82581 width=225) (actual time=0.014..8.982 rows=82581 loops=1)
        -> Seq Scan on ga_session_20170127 ga_session_3 (cost=0.00..2975.03 rows=77503 width=224) (actual time=0.002..8.797 rows=77503 loops=1)
        -> Seq Scan on ga_session_20170128 ga_session_4 (cost=0.00..2936.83 rows=76083 width=225) (actual time=0.003..7.873 rows=76083 loops=1)
        -> Seq Scan on ga_session_20170129 ga_session_5 (cost=0.00..3716.18 rows=96618 width=224) (actual time=0.002..9.318 rows=96618 loops=1)
        -> Seq Scan on ga_session_20170130 ga_session_6 (cost=0.00..3833.19 rows=99619 width=224) (actual time=0.002..9.453 rows=99619 loops=1)
        -> Seq Scan on ga_session_20170131 ga_session_7 (cost=0.00..3488.79 rows=90579 width=225) (actual time=0.002..8.298 rows=90579 loops=1)
        -> Seq Scan on ga_session_20170201 ga_session_8 (cost=0.00..3615.58 rows=93958 width=224) (actual time=0.002..9.199 rows=93958 loops=1)
        -> Seq Scan on ga_session_20170202 ga_session_9 (cost=0.00..3286.56 rows=85256 width=224) (actual time=0.006..8.021 rows=85256 loops=1)
        -> Seq Scan on ga_session_20170203 ga_session_10 (cost=0.00..3073.02 rows=79602 width=225) (actual time=0.002..7.727 rows=79602 loops=1)
  -> Hash (cost=1.38..1.38 rows=38 width=38) (actual time=0.016..0.016 rows=38 loops=1)
        Buckets: 1024 Batches: 1 Memory Usage: 11kB
        -> Seq Scan on device (cost=0.00..1.38 rows=38 width=38) (actual time=0.002..0.004 rows=38 loops=1)
Planning time: 1.017 ms
Execution time: 1090.213 ms
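The 832 estimate looks like what the planner would produce if it had no statistics for the two expressions and fell back to its built-in default of 200 distinct values per expression. That is an inference from the numbers, not something the plan reports. The arithmetic below, using the Append estimate of 876238 rows and the 38 rows in device, is only a sketch of that assumption:

```sql
-- Assumption: each of the two equality clauses is estimated at a
-- selectivity of 1/200 (the planner's default distinct-value guess
-- when no statistics are available for an expression).
SELECT round(876238::numeric * 38 / (200 * 200)) AS estimated_rows;
-- estimated_rows = 832
```

By contrast, the partition-only plan's estimate of 80146 sits close to the actual 77503, which is what one would expect if the per-partition expression statistics were being used there.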
This then leads to a bad join choice (nested loop) when further joins (not shown here) are added on top of this one.
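The row estimates on the partition were also wrong until I ran ANALYSE on the partition again, so it seems the query planner does not take the index-based statistics into account when the whole table is used.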
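One way to see which statistics exist at the parent level is to query pg_stats for the parent table itself. This is only a diagnostic sketch: it assumes the parent has been analyzed, and it looks at the plain columns because the expression indexes above were created on the partitions only.

```sql
-- Rows with inherited = true describe the parent plus all of its children;
-- rows with inherited = false describe the parent table alone.
SELECT tablename, attname, inherited, null_frac, n_distinct
FROM pg_stats
WHERE schemaname = 'm_data'
  AND tablename = 'ga_session'
  AND attname IN ('device_category', 'operating_system');
```

If this returns statistics only for the raw columns and nothing for the normalize_* expressions, that would be consistent with the default-looking estimate in the whole-table plan.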
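Is there a way to make the query planner either collect these statistics at the parent table level or take the individual partitions' statistics into account when building the query plan?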