PostgreSQL EXPLAIN PLAN

  • Explain plan is mainly used to how Postgres will execute given a query and also Understanding this tells you how you can optimize your database with indexes to improve performance
Most commonly, explain is run on SELECT statements. However, you can also use it on:
  • INSERT
  • UPDATE
  • DELETE
  • EXECUTE
  • DECLARE 
There are different types of scan nodes for different table access methods: 
  • sequential scans, 
  • index scans, and 
  • bitmap index scans.
1.SIMPLE EXECUTION PLAN :
This query has no WHERE clause, it must scan all the rows of the table, so the planner has chosen to use a simple sequential scan plan. 
EXPLAIN SELECT * FROM tenk1;
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)  
FROM ABOVE PLAN:-
COST:Estimated start-up cost. 
TOTAL COST:Estimated total cost. 
ROWS:Estimated number of rows output by this plan node.
WIDTH:Estimated average width of rows output by this plan node (in bytes).
SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';  
you will find that tenk1 has 358 disk pages and 10000 rows. The estimated cost is computed as (disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost). By default, seq_page_cost is 1.0 and cpu_tuple_cost is 0.01, so the estimated cost is (358 * 1.0) + (10000 * 0.01) = 458.

USING - WHERE CONDITION:
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..483.00 rows=7001 width=244)
   Filter: (unique1 < 7000)  
  • "seq scan" This means that the plan node checks the condition for each row it scans, and outputs only the ones that pass the condition. The estimate of output rows has been reduced because of the WHERE clause
  • However, the scan will still have to visit all 10000 rows, so the cost hasn't decreased in fact it has gone up a bit (by 10000 * cpu_operator_cost, to be exact) to reflect the extra CPU time spent checking the WHERE condition.
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=5.07..229.20 rows=101 width=244)
   Recheck Cond: (unique1 < 100)
   ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0)
         Index Cond: (unique1 < 100)   

EXECUTION PLAN FOR SIMPLE INDEX SCAN:
EXPLAIN SELECT * FROM tenk1 WHERE unique1 = 42;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using tenk1_unique1 on tenk1  (cost=0.29..8.30 rows=1 width=244)
   Index Cond: (unique1 = 42)  
In this type of plan the table rows are fetched in index order, which makes them even more expensive to read, but there are so few that the extra cost of sorting the row locations is not worth it. You'll most often see this plan type for queries that fetch just a single row. It's also often used for queries that have an ORDER BY condition that matches the index order, because then no extra sorting step is needed to satisfy the ORDER BY.

EXECUTION PLAN FOR SEPERATE INDEXES ON SEVERAL OF THE COLUMN:
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=25.08..60.21 rows=10 width=244)
   Recheck Cond: ((unique1 < 100) AND (unique2 > 9000))
   ->  BitmapAnd  (cost=25.08..25.08 rows=10 width=0)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0)
               Index Cond: (unique1 < 100)
         ->  Bitmap Index Scan on tenk1_unique2  (cost=0.00..19.78 rows=999 width=0)
               Index Cond: (unique2 > 9000)                                                 
But this requires visiting both indexes, so it's not necessarily a win compared to using just one index and treating the other condition as a filter. If you vary the ranges involved you'll see the plan change accordingly.

EXPLAIN PLAN USING LIMIT CLAUSE:
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2;

                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Limit  (cost=0.29..14.48 rows=2 width=244)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.29..71.27 rows=10 width=244)
         Index Cond: (unique2 > 9000)
         Filter: (unique1 < 100)                                                           
This is the same query as above, but we added a LIMIT so that not all the rows need be retrieved the Limit node is expected to stop after retrieving only a fifth of those rows

EXECUTION PLAN FOR JOINING QUERY:
EXPLAIN SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;

                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop  (cost=4.65..118.62 rows=10 width=488)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=4.36..39.47 rows=10 width=244)
         Recheck Cond: (unique1 < 10)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..4.36 rows=10 width=0)
               Index Cond: (unique1 < 10)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.29..7.91 rows=1 width=244)
         Index Cond: (unique2 = t1.unique2)   
In this plan, we have a nested-loop join node with two table scans as inputs, or children.

EXAMPLE 2:
EXPLAIN SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t2.unique2 < 10 AND t1.hundred < t2.hundred;

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.65..49.46 rows=33 width=488)
   Join Filter: (t1.hundred < t2.hundred)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=4.36..39.47 rows=10 width=244)
         Recheck Cond: (unique1 < 10)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..4.36 rows=10 width=0)
               Index Cond: (unique1 < 10)
   ->  Materialize  (cost=0.29..8.51 rows=10 width=244)
         ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.29..8.46 rows=10 width=244)
               Index Cond: (unique2 < 10)
In this example the join's output row count is the same as the product of the two scans' row counts, but that's not true in all cases because there can be additional WHERE clauses that mention both tables and so can only be applied at the join point, not to either input scan
EXECUTION PLAN FOR MERGE JOIN:
EXPLAIN SELECT *
FROM tenk1 t1, onek t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Merge Join  (cost=198.11..268.19 rows=10 width=488)
   Merge Cond: (t1.unique2 = t2.unique2)
   ->  Index Scan using tenk1_unique2 on tenk1 t1  (cost=0.29..656.28 rows=101 width=244)
         Filter: (unique1 < 100)
   ->  Sort  (cost=197.83..200.33 rows=1000 width=244)
         Sort Key: t2.unique2
         ->  Seq Scan on onek t2  (cost=0.00..148.00 rows=1000 width=244)  
Merge join requires its input data to be sorted on the join keys. In this plan the tenk1 data is sorted by using an index scan to visit the rows in the correct order, but a sequential scan and sort is preferred for onek, because there are many more rows to be visited in that table

2.EXPLAIN ANALYZE 
It check the accuracy of the planner's estimates by using EXPLAIN's ANALYZE option,EXPLAIN actually executes the query, and then displays the true row counts and true run time accumulated within each plan node, along with the same estimates that a plain EXPLAIN shows.
EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;

                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1)
         Recheck Cond: (unique1 < 10)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1)
               Index Cond: (unique1 < 10)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)
         Index Cond: (unique2 = t1.unique2)
 Planning time: 0.181 ms
 Execution time: 0.501 ms  
Note:EXPLAIN ANALYZE shows additional execution statistics beyond the "plan node execution times" and row counts.

EXPLAIN ANALYZE - SORT AND HASH JOIN:
EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2 ORDER BY t1.fivethous;

                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=717.34..717.59 rows=101 width=488) (actual time=7.761..7.774 rows=100 loops=1)
   Sort Key: t1.fivethous
   Sort Method: quicksort  Memory: 77kB
   ->  Hash Join  (cost=230.47..713.98 rows=101 width=488) (actual time=0.711..7.427 rows=100 loops=1)
         Hash Cond: (t2.unique2 = t1.unique2)
         ->  Seq Scan on tenk2 t2  (cost=0.00..445.00 rows=10000 width=244) (actual time=0.007..2.583 rows=10000 loops=1)
         ->  Hash  (cost=229.20..229.20 rows=101 width=244) (actual time=0.659..0.659 rows=100 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 28kB
               ->  Bitmap Heap Scan on tenk1 t1  (cost=5.07..229.20 rows=101 width=244) (actual time=0.080..0.526 rows=100 loops=1)
                     Recheck Cond: (unique1 < 100)
                     ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0) (actual time=0.049..0.049 rows=100 loops=1)
                           Index Cond: (unique1 < 100)
 Planning time: 0.194 ms
 Execution time: 8.008 ms  
The Hash node shows the number of hash buckets and batches as well as the peak amount of memory used for the hash table. If the number of batches exceeds one, there will also be disk space usage involved, but that is not shown.

EXPLAIN ANALYZE - REMOVED ROWS & FORCE INDEX SCAN:
EXPLAIN ANALYZE SELECT * FROM polygon_tbl WHERE f1 @> polygon '(0.5,2.0)';
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on polygon_tbl  (cost=0.00..1.05 rows=1 width=32) (actual time=0.044..0.044 rows=0 loops=1)
   Filter: (f1 @> '((0.5,2))'::polygon)
   Rows Removed by Filter: 4
 Planning time: 0.040 ms
 Execution time: 0.083 ms  
The planner thinks (quite correctly) that this sample table is too small to bother with an index scan, so we have a plain sequential scan in which all the rows got rejected by the filter condition. But if we force an index scan to be used
SET enable_seqscan TO off;

EXPLAIN ANALYZE SELECT * FROM polygon_tbl WHERE f1 @> polygon '(0.5,2.0)';

                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Index Scan using gpolygonind on polygon_tbl  (cost=0.13..8.15 rows=1 width=32) (actual time=0.062..0.062 rows=0 loops=1)
   Index Cond: (f1 @> '((0.5,2))'::polygon)
   Rows Removed by Index Recheck: 1
 Planning time: 0.034 ms
 Execution time: 0.144 ms
  
Here we can see that the index returned one candidate row, which was then rejected by a recheck of the index condition. This happens because a GiST index is "lossy" for polygon containment tests: it actually returns the rows with polygons that overlap the target, and then we have to do the exact containment test on those rows.

3.EXPLAIN ANALYZE BUFFER:
EXPLAIN has a BUFFERS option that can be used with ANALYZE to get even more run time statistics
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000;

                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=25.08..60.21 rows=10 width=244) (actual time=0.323..0.342 rows=10 loops=1)
   Recheck Cond: ((unique1 < 100) AND (unique2 > 9000))
   Buffers: shared hit=15
   ->  BitmapAnd  (cost=25.08..25.08 rows=10 width=0) (actual time=0.309..0.309 rows=0 loops=1)
         Buffers: shared hit=7
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0) (actual time=0.043..0.043 rows=100 loops=1)
               Index Cond: (unique1 < 100)
               Buffers: shared hit=2
         ->  Bitmap Index Scan on tenk1_unique2  (cost=0.00..19.78 rows=999 width=0) (actual time=0.227..0.227 rows=999 loops=1)
               Index Cond: (unique2 > 9000)
               Buffers: shared hit=5
 Planning time: 0.088 ms
 Execution time: 0.423 ms                                                                 
Note: From above example We can calculate  where is data read from "Disk" or "buffers"  
see above example you see  "shared hit" this is means data read  from buffers other options is "shared Read" this means data read from Disk(Storage)

EXPLAIN PLAN FOR JSON TABLE:

  • index is fast fetching data without going full table scan.
Here  i will explain how to create index on json data type table and how to  work index on json data type.
--check the sales table for already having any index or not.
postgres=# \d sales
       Table "public.sales"
   Column   |  Type   | Modifiers 
------------+---------+-----------
 id         | integer | not null
 sales_info | json    | not null
Indexes:
    "sales_pkey" PRIMARY KEY, btree (id)


postgres=# \di sales
No matching relations found.
--already we have sales table now let us go to create indicies on any (even nested) JSON field:
 create unique index ind_name
on sales ((sales_info ->'PRODUCTS'->>'total_item'));
--Now check the table index is properly created or not using \di,\dt,pg_indexes.
postgres=# \di ind_name
              List of relations
 Schema |   Name   | Type  |  Owner   | Table 
--------+----------+-------+----------+-------
 public | ind_name | index | postgres | sales
(1 row)



Deleting json objectspostgres=# \d sales
       Table "public.sales"
   Column   |  Type   | Modifiers 
------------+---------+-----------
 id         | integer | not null
 sales_info | json    | not null
Indexes:
    "sales_pkey" PRIMARY KEY, btree (id)
    "ind_name" UNIQUE, btree (((sales_info -> 'PRODUCTS'::text) ->> 'total_item'::text))



postgres=# select tablename,indexname,indexdef from pg_indexes where tablename='sales';
 tablename | indexname  |                                                    indexdef                       
                              
-----------+------------+-----------------------------------------------------------------------------------
------------------------------
 sales     | sales_pkey | CREATE UNIQUE INDEX sales_pkey ON sales USING btree (id)
 sales     | ind_name   | CREATE UNIQUE INDEX ind_name ON sales USING btree ((((sales_info -> 'PRODUCTS'::te
xt) ->> 'total_item'::text)))
(2 rows)

 From below example index is  properly working becouse we already created unique index on on json column of "total_item" unique index will not accept duplicate value if you try to insert duplicate values it will throw errors,i hope you understood it from below example.
postgres=# select * from sales;
 id |                                    sales_info                                    
----+----------------------------------------------------------------------------------
  2 | { "customer": "ABU", "PRODUCTS": {"product_name": "badam","total_item": 5}}
  3 | { "customer": "UMAR", "PRODUCTS": {"product_name": "mobile","total_item": 1}}
  5 | { "customer": "NIJAM", "PRODUCTS": {"product_name": "choclate","total_item": 9}}
  1 | { "customer": "ABU", "PRODUCTS": {"product_name": "badam","total_item": 6}}
  4 | { "customer": "UMAR", "PRODUCTS": {"product_name": "mobile","total_item": 8}}
(5 rows)


postgres=# INSERT INTO SALES VALUES
 (9,'{ "customer": "UMAR", "PRODUCTS": {"product_name": "mobile","total_item": 8}}'
 );
ERROR:  duplicate key value violates unique constraint "ind_name"
DETAIL:  Key (((sales_info -> 'PRODUCTS'::text) ->> 'total_item'::text))=(8) already exists.
another example for json index is properly working or not
HERE LET US consider  sales table having above 20000 json records for making practical json index , This way we can begin to see some performance implications when dealing with JSON data in Postgres, as well as how to solve them.
SELECT count(*) FROM sales WHERE sales_info ->'PRODUCTS'->>'total_item'= '1';
count
-------
4937 
(1 row)
--taking explain plan for without index
postgres=# explain analyze SELECT sales_info ->>'customer' as customer_name from sales 
where sales_info ->'PRODUCTS'->>'total_item'= '1';
Aggregate (cost=335.12..335.13 rows=1 width=0) (actual time=4.421..4.421 rows=1 loops=1) -> Seq Scan on cards (cost=0.00..335.00 rows=50 width=0) (actual time=0.016..3.961 rows=4938 loops=1) 
    Filter: (((sales_info -> 'PRODUCTS'::text) ->> 'total_item'::text) = '1'::text) 
    Rows Removed by Filter: 5062 
Total runtime: 4.475 ms
Now, that wasn’t that slow of a query at 4.5ms, but let’s see if we can improve it.
CREATE INDEX ind_name1 ON sales ((sales_info ->'PRODUCTS'->>'total_item'));
If we run the same query which now has an index, we end up cutting the time in half.
SELECT count(*) FROM sales WHERE sales_info ->'PRODUCTS'->>'total_item'= '1';
count
-------
4937 
(1 row)

postgres=# explain analyze SELECT sales_info ->>'customer' as customer_name from sales 
where sales_info ->'PRODUCTS'->>'total_item'= '1';
Aggregate (cost=118.97..118.98 rows=1 width=0) (actual time=2.122..2.122 rows=1 loops=1) -> Bitmap Heap Scan on cards (cost=4.68..118.84 rows=50 width=0) (actual time=0.711..1.664 rows=4938 loops=1) 
    Recheck Cond: (((sales_info -> 'PRODUCTS'::text) ->> 'total_item'::text) = '1'::text)  
    Heap Blocks: exact=185 
    -> Bitmap Index Scan on idxfinished (cost=0.00..4.66 rows=50 width=0) (actual time=0.671..0.671 rows=4938 loops=1) 
    Index Cond:(((sales_info -> 'PRODUCTS'::text) ->> 'total_item'::text) = '1'::text) 
Total runtime: 2.175 ms

Our query is now taking advantage of the ind_name1 index we created, and the query time has been approximately cut in half.

Do you want to know more about json data type



No comments:

Post a Comment