PostgreSQL Query Flow

In this tutorial i explained about postgresql query flow with pictorial diagram.
PostgreSQL Query Flow met five stages they are.
  1. parser
  2. Traffic Cop
  3. Rewriter
  4. Optimizer plan
  5. Executor
1.parser:
  • The parser stage consists of two parts:
  • The parser defined in gram.y and scan.l is built using the Unix tools bison and flex.
  • The transformation process does modifications and augmentations to the data structures returned by the parser.
  • The parser has to check the query string (which arrives as plain text) for valid syntax. If the syntax is correct a parse tree is built up and handed back; otherwise an error is returned. The parser and lexer are implemented using the well-known Unix tools bison and flex.
  • The lexer is defined in the file scan.l and is responsible for recognizing identifiers, the SQL key words etc. For every key word or identifier that is found, a token is generated and handed to the parser.
  • The parser is defined in the file gram.y and consists of a set of grammar rules and actions that are executed whenever a rule is fired. The code of the actions (which is actually C code) is used to build up the parse tree.
  • The file scan.l is transformed to the C source file scan.c using the program flex and gram.y is transformed to gram.c using bison. After these transformations have taken place a normal C compiler can be used to create the parser. Never make any changes to the generated C files as they will be overwritten the next time flex or bison is called.
  • Note: The mentioned transformations and compilations are normally done automatically using the makefiles shipped with the PostgreSQL source distribution.
  • A detailed description of bison or the grammar rules given in gram.y would be beyond the scope of this paper. There are many books and documents dealing with flex and bison. You should be familiar with bison before you start to study the grammar given in gram.y otherwise you won't understand what happens there
2.Traffic Cop:
  • The traffic cop is the agent that is responsible for differentiating between simple and complex query
  • commands. Transaction control commands such as BEGIN and ROLLBACK are simple enough so as to not need additional processing, whereas other commands such as SELECT and JOIN are passed on to the rewriter. This discrimination reduces the processing time by performing minimal optimization on the simple commands, and devoting more time to the complex ones.
Parsing is Two Types: 
  • Soft Parse – when the parsed representation of a submitted SQL statement exists in the Postgres Server(Shared Buffer) Performs syntax and semantic checks but avoids the relatively costly operation of query optimization. Reuses the existing Postgres SQL area which already has the execution plan required to execute the SQL statement
  • Hard Parse – if a statement cannot be reused or if it the very first time the SQL statement is being loaded in the Postgres Server(Shared Buffer), it results in a hard parse. Also when a statement is aged out of the Postgres Server(Shared Buffer) (because the sPostgres Server(Shared Buffer) is limited in size), when it is reloaded again, it results in another hard parse. So size of the shared Buffer can also affect the amount of parse calls.
  • We can query pg_prepared_statements to see what is cached. Note that it is not available across sessions and visible only to the current session.
  • The pg_buffercache module provides a means for examining what's happening in the shared buffer cache in real time
It(below Query) can even tell how much data blocks came from disk and how much came from shared_buffers i.e memory.
explain (analyze,buffers) statement;
explain (analyze,buffers) select * from t8 order by userid limit 20;
3.Rewriter:
PostgreSQL rule system consisted of two implementations:
  • The first one worked using row level processing and was implemented deep in the executor. The rule system was called whenever an individual row had been accessed. This implementation was removed in 1995 when the last official release of the Berkeley Postgres project was transformed into Postgres95.
  • The second implementation of the rule system is a technique called query rewriting. The rewrite system is a module that exists between the parser stage and the planner/optimizer. This technique is still implemented.
4.Optimizer plan:
  • The task of the planner/optimizer is to create an optimal execution plan. A given SQL query (and hence, a query tree) can be actually executed in a wide variety of different ways, each of which will produce the same set of results. If it is computationally feasible, the query optimizer will examine each of these possible execution plans, ultimately selecting the execution plan that is expected to run the fastest.
  • The planner's search procedure actually works with data structures called paths, which are simply cut-down representations of plans containing only as much information as the planner needs to make its decisions. After the cheapest path is determined, a full-fledged plan tree is built to pass to the executor. This represents the desired execution plan in sufficient detail for the executor to run it. In the rest of this section we'll ignore the distinction between paths and plans.
5.Executor:
  • The executor takes the plan created by the planner/optimizer and recursively processes it to extract the required set of rows. This is essentially a demand-pull pipeline mechanism. Each time a plan node is called, it must deliver one more row, or report that it is done delivering rows.
  • The executor mechanism is used to evaluate all four basic SQL query types: SELECT, INSERT, UPDATE, and DELETE. 
  • For SELECT, the top-level executor code only needs to send each row returned by the query plan tree off to the client. 
  • For INSERT, each returned row is inserted into the target table specified for the INSERT. This is done in a special top-level plan node called ModifyTable. (A simple INSERT ... VALUES command creates a trivial plan tree consisting of a single Result node, which computes just one result row, and ModifyTable above it to perform the insertion. But INSERT ... SELECT can demand the full power of the executor mechanism.) 
  • For UPDATE, the planner arranges that each computed row includes all the updated column values, plus the TID (tuple ID, or row ID) of the original target row; this data is fed into a ModifyTable node, which uses the information to create a new updated row and mark the old row deleted. 
  • For DELETE, the only column that is actually returned by the plan is the TID, and the ModifyTable node simply uses the TID to visit each target row and mark it deleted.



No comments:

Post a Comment