Why does PostgreSQL choose the more expensive join order?

PostgreSQL using defaults, except for:

default_statistics_target=1000
random_page_cost=1.5

Version

PostgreSQL 10.4 on x86_64-pc-linux-musl, compiled by gcc (Alpine 6.4.0) 6.4.0, 64-bit

The tables have been vacuumed and analyzed. The query is very simple:

SELECT r.price
FROM account_payer ap
  JOIN account_contract ac ON ap.id = ac.account_payer_id
  JOIN account_schedule "as" ON ac.id = "as".account_contract_id
  JOIN schedule s ON "as".id = s.account_schedule_id
  JOIN rate r ON s.id = r.schedule_id
WHERE ap.account_id = 8

Every id column is a primary key, every join follows a foreign key relationship, and each foreign key has an index. I also added an index on account_payer.account_id.

It takes 3.93 seconds to return 76k rows.

Merge Join  (cost=8.06..83114.08 rows=3458267 width=6) (actual time=0.228..3920.472 rows=75548 loops=1)
  Merge Cond: (s.account_schedule_id = "as".id)
  ->  Nested Loop  (cost=0.57..280520.54 rows=6602146 width=14) (actual time=0.163..3756.082 rows=448173 loops=1)
        ->  Index Scan using schedule_account_schedule_id_idx on schedule s  (cost=0.14..10.67 rows=441 width=16) (actual time=0.035..0.211 rows=89 loops=1)
        ->  Index Scan using rate_schedule_id_code_modifier_facility_idx on rate r  (cost=0.43..486.03 rows=15005 width=10) (actual time=0.025..39.903 rows=5036 loops=89)
              Index Cond: (schedule_id = s.id)
  ->  Materialize  (cost=0.43..49.46 rows=55 width=8) (actual time=0.060..12.984 rows=74697 loops=1)
        ->  Nested Loop  (cost=0.43..49.32 rows=55 width=8) (actual time=0.048..1.110 rows=66 loops=1)
              ->  Nested Loop  (cost=0.29..27.46 rows=105 width=16) (actual time=0.030..0.616 rows=105 loops=1)
                    ->  Index Scan using account_schedule_pkey on account_schedule "as"  (cost=0.14..6.22 rows=105 width=16) (actual time=0.014..0.098 rows=105 loops=1)
                    ->  Index Scan using account_contract_pkey on account_contract ac  (cost=0.14..0.20 rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=105)
                          Index Cond: (id = "as".account_contract_id)
              ->  Index Scan using account_payer_pkey on account_payer ap  (cost=0.14..0.21 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=105)
                    Index Cond: (id = ac.account_payer_id)
                    Filter: (account_id = 8)
                    Rows Removed by Filter: 0
Planning time: 5.843 ms
Execution time: 3929.317 ms

If I set join_collapse_limit=1, it takes 0.16 seconds, about 25 times faster.

Nested Loop  (cost=6.32..147323.97 rows=3458267 width=6) (actual time=8.908..151.860 rows=75548 loops=1)
  ->  Nested Loop  (cost=5.89..390.23 rows=231 width=8) (actual time=8.730..11.655 rows=66 loops=1)
        Join Filter: ("as".id = s.account_schedule_id)
        Rows Removed by Join Filter: 29040
        ->  Index Scan using schedule_pkey on schedule s  (cost=0.27..17.65 rows=441 width=16) (actual time=0.014..0.314 rows=441 loops=1)
        ->  Materialize  (cost=5.62..8.88 rows=55 width=8) (actual time=0.001..0.011 rows=66 loops=441)
              ->  Hash Join  (cost=5.62..8.61 rows=55 width=8) (actual time=0.240..0.309 rows=66 loops=1)
                    Hash Cond: ("as".account_contract_id = ac.id)
                    ->  Seq Scan on account_schedule "as"  (cost=0.00..2.05 rows=105 width=16) (actual time=0.010..0.028 rows=105 loops=1)
                    ->  Hash  (cost=5.02..5.02 rows=48 width=8) (actual time=0.178..0.178 rows=61 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 11kB
                          ->  Hash Join  (cost=1.98..5.02 rows=48 width=8) (actual time=0.082..0.143 rows=61 loops=1)
                                Hash Cond: (ac.account_payer_id = ap.id)
                                ->  Seq Scan on account_contract ac  (cost=0.00..1.91 rows=91 width=16) (actual time=0.007..0.023 rows=91 loops=1)
                                ->  Hash  (cost=1.64..1.64 rows=27 width=8) (actual time=0.048..0.048 rows=27 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                      ->  Seq Scan on account_payer ap  (cost=0.00..1.64 rows=27 width=8) (actual time=0.009..0.023 rows=27 loops=1)
                                            Filter: (account_id = 8)
                                            Rows Removed by Filter: 24
  ->  Index Scan using rate_schedule_id_code_modifier_facility_idx on rate r  (cost=0.43..486.03 rows=15005 width=10) (actual time=0.018..1.685 rows=1145 loops=66)
        Index Cond: (schedule_id = s.id)
Planning time: 4.692 ms
Execution time: 160.585 ms

This output makes no sense to me. The first plan has a (very high) cost of 280,500 for the nested loop join over the schedule and rate indexes. Why does PostgreSQL choose to do that very expensive join first?

Additional information requested in the comments

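Is rate_schedule_id_code_modifier_facility_idx a compound index?

It is, with schedule_id as the first column. I also created a dedicated index on schedule_id alone. The query planner picks it, but it changes neither the performance nor the rest of the plan.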

— Paul Draper

Can you change default_statistics_target and random_page_cost back to their defaults? What happens if you raise default_statistics_target even further? Can you create a DB Fiddle (on dbfiddle.uk) that reproduces the problem?

— Colin 't Hart

Can you inspect the actual statistics to see whether anything in your data is skewed or otherwise unusual? postgresql.org/docs/10/static/planner-stats.html

— Colin 't Hart

What is the current value of the work_mem parameter? Does changing it give different timings?

— eppesuig

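It looks like your statistics are not accurate (run vacuum analyze to refresh them), or your model contains correlated columns (in which case you will need create statistics to tell the planner about it).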

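The join_collapse_limit parameter lets the planner rearrange joins so that the join that fetches less data is performed first. For performance reasons, though, the planner cannot be allowed to do this for queries with many joins, so the reordering is capped. By default the limit is 8 joins. Setting it to 1 disables the reordering, and explicit JOINs are then executed in the order they are written.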
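As a minimal sketch of how the setting can be tried without affecting other sessions, the limit can be changed inside a single transaction with SET LOCAL. The query is the one from the question; the choice to wrap it in a transaction and roll back is just one way to keep the change scoped.

```sql
-- Show the current planner limits (both default to 8).
SHOW join_collapse_limit;
SHOW from_collapse_limit;

BEGIN;
-- SET LOCAL only lasts until the end of this transaction.
SET LOCAL join_collapse_limit = 1;

-- With the limit at 1, the explicit JOINs are planned in the written order.
-- Note that EXPLAIN ANALYZE really executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT r.price
FROM account_payer ap
  JOIN account_contract ac ON ap.id = ac.account_payer_id
  JOIN account_schedule "as" ON ac.id = "as".account_contract_id
  JOIN schedule s ON "as".id = s.account_schedule_id
  JOIN rate r ON s.id = r.schedule_id
WHERE ap.account_id = 8;

ROLLBACK;  -- the setting reverts to its previous value here
```

Comparing this output with the plan produced under the default limit shows whether the join order alone accounts for the difference in run time.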

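So how does Postgres predict how many rows this query will fetch? It uses statistics to estimate the number of rows.

What you can see in your explain plans is that several row count estimates are inaccurate (the first value is the estimate, the second is the actual count).

For example: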

Materialize  (cost=0.43..49.46 rows=55 width=8) (actual time=0.060..12.984 rows=74697 loops=1)

Here the planner estimated 55 rows when it actually got 74697.
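To see what the planner is working from, the stored statistics can be read directly. The following is a sketch that assumes the five tables live in a single schema (otherwise add a filter on schemaname or relnamespace). It uses only the table and column names that appear in the question.

```sql
-- Table-level row and page counts that the estimates start from.
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('account_payer', 'account_contract',
                  'account_schedule', 'schedule', 'rate');

-- Per-column statistics for the join and filter columns.
-- A negative n_distinct means a fraction of the row count,
-- a positive one means an absolute number of distinct values.
SELECT tablename, attname, null_frac, n_distinct,
       most_common_vals, most_common_freqs
FROM pg_stats
WHERE (tablename, attname) IN (
        ('rate', 'schedule_id'),
        ('schedule', 'account_schedule_id'),
        ('account_schedule', 'account_contract_id'),
        ('account_contract', 'account_payer_id'),
        ('account_payer', 'account_id')
      );
```

The row for rate.schedule_id is a natural place to start, since both plans estimate rows=15005 per schedule_id lookup on rate while the actual counts per loop are 5036 and 1145.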

What I would do (if I were in your shoes):

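1. Run analyze on the five tables involved to refresh their statistics.
2. Run explain analyze again.
3. Look at the difference between the estimated and actual row counts.
4. If the estimated row counts are now right, the plan may have changed and become more efficient. If everything is fine, consider changing your autovacuum settings so that analyze (and vacuum) runs more often.
5. If the estimated row counts are still wrong, your tables probably contain correlated data (a third normal form violation). Consider declaring it with CREATE STATISTICS (see the PostgreSQL documentation for that command).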
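The steps above could look roughly like this. It is only a sketch: the autovacuum value is an arbitrary illustration, and the column rate.code is an assumption inferred from the index name rate_schedule_id_code_modifier_facility_idx, not something confirmed in the question. Also note that in PostgreSQL 10, CREATE STATISTICS only describes columns within one table, so it cannot express a correlation that spans the joined tables.

```sql
-- Refresh statistics on the five tables used by the query.
ANALYZE account_payer;
ANALYZE account_contract;
ANALYZE account_schedule;
ANALYZE schedule;
ANALYZE rate;

-- Re-run the query and compare estimated rows with actual rows per node.
EXPLAIN ANALYZE
SELECT r.price
FROM account_payer ap
  JOIN account_contract ac ON ap.id = ac.account_payer_id
  JOIN account_schedule "as" ON ac.id = "as".account_contract_id
  JOIN schedule s ON "as".id = s.account_schedule_id
  JOIN rate r ON s.id = r.schedule_id
WHERE ap.account_id = 8;

-- If fresh statistics fixed the plan: analyze this table more often.
-- 0.02 is an illustrative value; the server default is 0.1.
ALTER TABLE rate SET (autovacuum_analyze_scale_factor = 0.02);

-- If estimates are still off: declare correlated columns.
-- HYPOTHETICAL: assumes rate has a column named "code".
CREATE STATISTICS rate_schedule_code_stats (dependencies, ndistinct)
  ON schedule_id, code FROM rate;
ANALYZE rate;  -- extended statistics are only populated by ANALYZE
```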

If you need more information about row estimates and how they are calculated, you will find everything you need in Tomas Vondra's talk "Create statistics - What is it for?" (slides here).

— Arkhena