postg参考digoal zhou postgresql i一期培训_第1页
postg参考digoal zhou postgresql i一期培训_第2页
postg参考digoal zhou postgresql i一期培训_第3页
postg参考digoal zhou postgresql i一期培训_第4页
postg参考digoal zhou postgresql i一期培训_第5页
已阅读5页,还剩85页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、PostgreSQL InsideDigoal.Zhou8/17/20111Agendan PostgreSQL Principlen Designn Administrationn Tuningn Q&A2Agendan PostgreSQL Principle3Enterprise FeaturesPITROnline BackupStream ReplicationWALSecurityCBOAuditMVCCPoolOpen Source & FreeACIDParallelRDBMS4Atomicity-Oracle5Atomicity-PostgreSQL6Cons

2、istency7Consistencydigoal=> begin;insert into tbl_account_infovalues (1,'dage',100);digoal=> select txid_current_snapshot(); txid_current_snapshot916237:916237:(3,'francs',100);digoal=> select cmin,cmax,ctid,xmin,xmax,*from tbl_account_info ;nnnnnncmin | cmax | ctid | xminna

3、me | account| xmax | id |n0 |1000 |2001 |1000 | (0,1) | 916237 |0 |1 | dage|digoal=> select cmin,cmax,ctid,xmin,xmax,* nfrom tbl_account_info ;0 | (0,2) | 916238 |0 |2 | digoal|ncmin | cmax | ctid | xmin | xmax | id |name | accountn1 | (0,3) | 916238 |0 |3 | francs|n0 |1000 | (0,1) | 916237 |0 |

4、1 | dage|ndigoal=> end; COMMITdigoal=> select txid_current_snapshot(); txid_current_snapshot916239:916239:nnnnndigoal=> end; COMMITdigoal=> begin;insert into tbl_account_info values (2,'digoal',200);digoal=> insert into tbl_account_info valuesnnnn8Isolationn ISOLATION LEVEL SE

5、RIALIZABLE | REPEATABLE READ | READ COMMITTED |READ UNCOMMITTED Session ASession BSession ASession B9Durabilityn fsync = onn synchronous_commit = onn wal_sync_method = (fsync method supported by the operating system)n full_page_writes = on10DurabilityServer failedServer recoveredTimeLineBEGIN;INSERT

6、 INTO tbl_test(id,username)VALUES (1,digoal);COMMIT;SELECT * FROM tbl_testwhere id=1;11ACIDn Atomicityn All or nothing commit.n Consistencyn If a transaction successfully executes, it will take the database from one state that is consistent with the rules to another state that is also consistent wit

7、h the rules.n Isolationn Transactions will not interfere with each other.n Durabilityn Ensures that any transaction committed to the database will not be lost.12Logical LayoutInstance(Top-Level)ClusterDatabaseDatabase(s)SchemaSchema(s)ObjectTable(s)Index(s)View(s)Function(s)Sequence(s)Other(s)Column

8、 (s)FieldRow(s)13Physical LayoutTableToastIndexSequenceArchivedDatafile(s)Datafile(s)Datafile(s)Datafile(s)ControlfileWALs14Physical LayoutControl File15importantProcess LayoutShared Memory Area IPCAPPHandshake &authenticationpostmasterautovacuum launcherforkautovacuum workerShared bufferbackend

9、 processbgwriterWAL bufferWAL writerXLOGsDatafilesArchiverARCH FILEs16MVCCn txid_current_snapshot17NameDescriptionxminEarliest transaction ID (txid) that is still active. All earlier transactions will either be committed and visible, or rolled back and dead.xmaxFirst as-yet-unassigned txid. All txid

10、s greater than or equal to this are not yet started as of the time of the snapshot, and thus invisible.xip_listActive txids at the time of the snapshot. The list includes only those active txids between xmin and xmax; there might be active txids higher than xmax. A txid that is xmin <= txid <

11、xmax and not in this list was already completed at the time of the snapshot, and thus either visible or dead according to its commit status. The list does not include txids of subtransactions.MVCCn Session A:n digoal=> create table tbl_user (id int,firstname varchar(64),lastnamevarchar(64),corp v

12、archar(64),age int);n digoal=> insert into tbl_user values (1,'zhou','digoal','sky-mobi',27);n digoal=> select ctid,xmin,xmax,cmin,cmax,* from tbl_user;n ctid | xmin | xmax | cmin | cmax | id | firstname | lastname |corp27| agen (0,1) | 3909 |n Session B:0 |0 |0 | 1 | z

13、hou| digoal| sky-mobi |n digoal=> select ctid,xmin,xmax,cmin,cmax,* from tbl_user;n ctid | xmin | xmax | cmin | cmax | id | firstname | lastname |corp27| agen (0,1) | 3909 |0 |0 |0 | 1 | zhou| digoal| sky-mobi |18MVCCn Session A :n digoal=> begin;n digoal=> update tbl_user set id=2 where id

14、=1;n digoal=> select ctid,xmin,xmax,cmin,cmax,* from tbl_user;n ctid | xmin | xmax | cmin | cmax | id | firstname | lastname |corp27| agen (0,2) | 3910 |0 |0 |0 | 2 | zhou| digoal| sky-mobi |n digoal=> select txid_current_snapshot();n 3910:3914:n Session B :n select ctid,xmin,xmax,cmin,cmax,*

15、from tbl_user;n ctid | xmin | xmax | cmin | cmax | id | firstname | lastname |corp| agen (0,1) | 3909 | 3910 |0 |0 | 1 | zhou| digoal| sky-mobi | 27n digoal=> select txid_current_snapshot();n 3910:3914:391019MVCCn Oraclen UNDO (COPY OLD BLOCK to UNDO)n WHEN DATABASE ABORT During LONG Transaction.

16、n PostgreSQL or Oracle Recover faster.20MVCCLock Granularity in MySQL21MVCCLock Granularity in PostgreSQL and OracleQuestion :22CBOsrc/backend/optimizer23CBOn CBO Principlen autoanalyze是否继承; 空值比例; 平均长度;唯一值个数(-1唯一);最常见的值;最常见的值得占比; 分bucket边界值;物理与该列的匹配顺性;24CBOn CBO Principlen autoanalyze25CBO26CBOExamp

17、le : Change choice of the planer.ndigoal=> create table tbl_cbo_test (id int primary key,firstname text,lastname text);NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tbl_cbo_test_pkey" for table "tbl_cbo_test"CREATE TABLEdigoal=> insert into tbl_cbo_test sel

18、ect generate_series(1,1000000),'zhou','digoal'INSERT 0 1000000digoal=> explain analyze select * from tbl_cbo_test where id=100000;QUERY PLANnnnnnnnnIndex Scan using tbl_cbo_test_pkey on tbl_cbo_test (actual time=0.014.0.015 rows=1 loops=1)Index Cond: (id = 100000)Total runtime: 0.

19、035 ms (3 rows)(cost=0.00.4.32 rows=1 width=16)nnn27CBOdigoal=> set enable_indexscan=off;SETdigoal=> explain analyze select * from tbl_cbo_test where id=100000;QUERY PLANnnnnnBitmap Heap Scan on tbl_cbo_test (cost=3 rows=1 width=16) (actual time=0.030.0.031 rows=1 loops=1)Recheck Cond:

20、 (id = 100000)nn-> Bitmap Index Scan on tbl_cbo_test_pkey time=0.022.0.022 rows=1 loops=1)Index Cond: (id = 100000)Total runtime: 0.065 ms (5 rows)(cost=0.00.2.31 rows=1 width=0) (actualnnn28CBOn digoal=> set enable_bitmapscan=off;n SETn digoal=> explain analyze select * from tbl_cbo_test w

21、here id=100000;QUERY PLANnnn Seq Scan on tbl_cbo_test (cost=0.00.17906.00 rows=1 width=16) (actualtime=17.524.149.940 rows=1 loops=1)Filter: (id = 100000)nn Total runtime: 149.962 msn (3 rows)29CBOn JOIN Tuning30CBOn JOIN Tuning31CBOn JOIN TuningWhich SQL is better !32CBOn JOIN TuningCBOn JOIN Tunin

22、g34CBOn JOIN Tuningn Explicit JOINn What happen when SET join_collapse_limit = 1 and use the join SQL;35CBO36Reliability(WAL,PITR,OLBackup)Time Line1. Compare pd_lsn2. Dumpd to DiskData ChangedWhich Page the first Modified after Checkpoint Write full page to WAL.WALArchiveOnline Backup FilePITR37Che

23、ckpointMistakeStream replication - Async38Stream replication - AsyncWhen Xlog data send from master to standby node mit end . 10 ms interval.39Stream replication - Async2Standbyarchlogxlog1WalRcvData->receivedUptoPrimarystartupprocesspostmasterwal receiverPM_SHUTDOWN_2PMSignalwal se

24、ndersendWalRcvData->conninfo WalRcvData->receiveStart3recoverreadforkpostmasterXLOGdatafilesWAL bufferwriteforkreadwritewal writerAPPOther backend processes40Stream replication - AsyncParameter Tuning :Primarymax_wal_senderswal_sender_delay ( The sleep is interrupted by transaction commit )wal

25、_keep_segmentsvacuum_defer_cleanup_age ( the number of transactions by which VACUUM and HOT updates will defer cleanup of dead row versions. )Standbyhot_standby# wal apply & SQL on standbymax_standby_archive_delayreference parameter( theum total time allowed to apply any one WAL segment's da

26、ta. )max_standby_streaming_delay( theum total time allowed to apply WAL data onceit has been received from the primary server )wal_receiver_status_interval( minimum frequency, The standby will report the last transaction log positionit has written, the last position it has flushed to disk, and the l

27、ast position it has applied.) hot_standby_feedback(send feedback to the primary about queries currently executing on the standby. )41Stream replication - sync1. Many standbys2. Only one sync standby at the same time2Standby(s)3. Sync wide: Cluster Database User Session Transactionarchlogxlog1WalRcvD

28、ata->receivedUptoPrimarystartupprocesspostmasterwal receiverPM_SHUTDOWN_2PMSignalwal senderWalRcvData->conninfo WalRcvData->receiveStart3recoverreadforkpostmasterXLOGdatafilesWAL bufferwriteforkreadwritewal writerAPPOther backend processeswaiting/releasing42Stream replication - syncParamete

29、r Tuning :Primary max_wal_senders wal_sender_delay wal_keep_segmentsvacuum_defer_cleanup_agesynchronous_replication (transaction upto database configurable) synchronous_standby_names( primary_conninfo in standbys primary_conninfo )Standbyhot_standby max_standby_archive_delay max_standby_streaming_de

30、lay wal_receiver_status_interval hot_standby_feedback43Do with Stream replication1.快速数据库迁移2.异地容灾3.多机房业务部署4.读性能扩展5.HA442PC跨库事务一致性 :DB-1DB-2two phase commit03578/452PCcreate or replace function fun_test () returns void as $BODY$ declarev_trace text;beginv_trace = 'RMT 2PC begin'perform dblink_

31、exec('hostaddr=3 port=1921 dbname=digoal user=digoal password=digoal','begin;insert into tbl_user(id) values(6);prepare transaction ''p1''') ; v_trace = 'LOCAL start'insert into tbl_user(id) values(7);v_trace = 'RMT 2PC commit start'perform d

32、blink_exec('hostaddr=3 port=1921 dbname=digoal user=digoal password=digoal','commit prepared ''p1''') ;v_trace = 'RMT 2PC commit success'raise notice 'execute success at : %.',v_trace; return;exceptionwhen others thenraise notice 'execute

33、 error at : %.',v_trace;perform * from dblink('hostaddr=3 port=1921 dbname=digoal user=digoal password=digoal','select 1 from pg_prepared_xacts where gid=''p1''') as t(id int);if found thenperform dblink_exec('hostaddr=3 port=1921 dbname=di

34、goal user=digoal password=digoal','rollback prepared ''p1''');end if; return; end;$BODY$ language plpgsql;46Securityn Authenticaten Data EncryptoPostgreSQLConnection LimitAuth Method (Trust, Password, Ident, LDAP)Listene Which AddressPG_HBARolesTYPEDATABASE USER CIDR-ADDR

35、ESS METHOD47Auditlog_min_messages log_min_error_statement client_min_messages debug_print_parse debug_print_rewritten debug_print_plan debug_pretty_print log_min_duration_statement log_checkpoints log_connections log_disconnections log_duration log_error_verbositylog_hostnamelog_line_prefix log_lock

36、_waits log_statement log_temp_files log_timezone log_parser_stats log_planner_stats log_executor_statslog_statement_statsnnnnnnnnnnnnnnnnnnnnnnn48INDEX Pointer Introductionn ItemPointers (index) ->n ItemId (Array of (lp_off:15bit, lp_flags:2bit,lp_len:15bit) pairs pointing to the actual items. 4

37、bytes per ItemId.)n -> Item (tuple)49HOT Introductionn Heap-Only Tuple Benefit :n eliminates redundant index entriesn allows the re-use of space taken by DELETEd or obsoleted UPDATEd tuples without performing a table-wide vacuum.n ExampleUpdate 1: Index points to 1line points 1 2Items 111111111-&

38、gt;2222222222Update 2: Index points to 1line point 1->2Items 2222222222Update 3: Index points to 1line points 1->2 3Items 2222222222->3333333333Update 4: Index points to 1line points 1>3Items 3333333333nnnnnnnnnnnn50HOT UpdateHeap PageIndex PagePageHeadPageHeadLinePoint1LinePoint2Free Sp

39、aceItemPointTuple2HEAP_HOT_TUPLETuple1t_ctid, HEAP_HOT_UPDATED51HOT UpdateHeap PageIndex PagePageHeadPageHeadLinePoint1LinePoint2Free SpaceLP_REDIRECTItemPointTuple2HEAP_HOT_TUPLETuple1Dead,vacuumed52DataFile Storage LayoutDataFileOne DataFile(s) Per Table or Index . BlockID :sequentially, 0 to 0xFF

40、FFFFFEInitilized Block 0x00000000Initilized Block 0x00000001Initilized Block 0x00000002Initilized Block 0x00000003Initilized Block 0x00000004Initilized Block 0xFFFFFFFE53Page LayoutPageHeaderData(24 Bytes)ItemIdData(Array of (offset,length) pairs pointing to the actual items. 4 bytes per item)Free s

41、pace(The unallocated space.New item pointers are allocated from the start of this area, new items from the end.)ONEPAGEItems (The actual items themselves.)Special space (Index access method specific data.Different methods store different data. Emptyin ordinary tables.)(an access method should always

42、initialize its pages with PageInitand then set its own opaque fields.)54PageHeader Layout55FieldTypeLengthDescriptionpd_lsnXLogRecPtr8 bytesLSN: next byte after last byte of xlog record for last change to this pagepd_tliuint162 bytesTimeLineID of last change (only its lowest 16 bits)pd_flagsuint162

43、bytesFlag bitspd_lowerLocationIndex2 bytesOffset to start of free spacepd_upperLocationIndex2 bytesOffset to end of free spacepd_specialLocationIndex2 bytesOffset to start of special spacepd_pagesize_versionuint162 bytesPage size and layout version number informationpd_prune_xidTransactionId4 bytesO

44、ldest unpruned XMAX on page, or zero if nonepd_flags define/* pd_flags contains the following flag bits. Undefined bits are initialized* to zero and may be used in the future.* PD_HAS_FREE_LINES is set if there are any LP_UNUSED line pointers before* pd_lower. This should be considered a hint rather

45、 than the truth, since* changes to it are not WAL-logged.* PD_PAGE_FULL is set if an UPDATE doesn't find enough free space in the* page for its new tuple version; this suggests that a prune is needed.* Again, this is just a hint.*/nnnnnnnnnnnnnnnn#define PD_HAS_FREE_LINES #define PD_PAGE_FULL #d

46、efine PD_ALL_VISIBLE#define PD_VALID_FLAG_BITS0x00010x00020x00040x0007/* are there any unused line pointers? */* not enough free space for new tuple? */* all tuples on page are visible to everyone */* OR of all valid pd_flags bits */56ItemIdData Layout/* An item pointer (also called line pointer) on

47、 a buffer page */* In some cases an item pointer is "in use" but does not have any associated */* storage on the page. By convention, lp_len = 0 in every item pointer */* that does not have storage, independently of its lp_flags state. */ typedef struct ItemIdDatannnnnnnnnnnnnnnnunsignedlp

48、_off:15, lp_flags:2,lp_len:15;/* offset to tuple (from start of page) */* state of item pointer, see below */* byte length of tuple */ ItemIdData;/* lp_flags has these possible states. An UNUSED line pointer is available */* for immediate re-use, the other states are not. */#define LP_UNUSED #define

49、 LP_NORMAL #define LP_REDIRECT#define LP_DEAD0123/* unused (should always have lp_len=0) */* used (should always have lp_len>0) */* HOT redirect (should have lp_len=0) */* dead, may or may not have storage */57Tuple LayoutHeapTupleHeaderData(23 Bytes on most machine)null bitmap(Optional,The null

50、bitmap is only presentif the HEAP_HASNULL bit is set in t_infomask. occupies enough bytes to have one bit per data column. a 1 bit indicates not-null, a 0 bit is a null)Padding(Optional, Any padding needed to make t_hoff a MAXALIGNmultiple will appear between the null bitmap and the object ID)object

51、 ID(Optional, only present if theHEAP_HASOID bit is set in t_infomask)ColumnData58One TupleHeapTupleHeader Layout59FieldTypeLengthDescriptiont_xminTransactionId4 bytesinsert XID stampt_xmaxTransactionId4 bytesdelete XID stampt_cidCommandId4 bytesinsert and/or delete CID stamp (overlays with t_xvac)t

52、_xvacTransactionId4 bytesXID for VACUUM operation moving a row versiont_ctidItemPointerData6 bytescurrent TID of this or newer row versiont_infomask2int162 bytesnumber of attributes, plus various flag bitst_infomaskuint162 bytesvarious flag bitst_hoffuint81 byteoffset to user dataEvaluate a Table Si

53、zen Tablen 24 bytes: each Pagen fillfactor: The fillfactor for a table is a percentage between 10 and100. 100 (complete packing) is the default.n 24 bytes: each row header (approximate)n 4 bytes: pointer on page to tuplen Column Sizesn NULLs are stored as bitmapsn long values may be compressed transparently.n Indexn Indexes do not require as much overhead, but do contain the datathat is being indexed, so they can be large also.60PostgreSQL Limitedum size for a database? unlimited um size for a table? 32 TBum size for a row? 400 GB um size for a field? 1 GBum number of rows in a

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论