Data Warehouse Netezza Annoyances

May 27, 2018 | Author: Matthew Lawler | Category: Data Warehouse, Business Intelligence, Database Schema, Databases, Oracle Database



Datawarehouse Netezza Annoyances
Matthew Lawler [email protected]
D:\D\Documents\DW Me\0 Publish\DW Netezza Annoyances.docx February 13, 2018

INTRODUCTION
NETEZZA BACKGROUND
FROM ORACLE TO NETEZZA
NETEZZA SITES
TIME PATTERNS
SQLEXT FUNCTIONS
NETEZZA SYSTEM VIEWS
SQL STANDARDS FOR DISTRIBUTION KEYS
NETEZZA LINE COMMANDS

Introduction

Licence
As these are generic software documentation standards, they will be covered by the 'Creative Commons Zero v1.0 Universal' CC0 licence.

Warranty
The author does not make any warranty, express or implied, that any statements in this document are free of error, or are consistent with any particular standard of merchantability, or that they will meet the requirements for any particular application or environment. They should not be relied on for solving a problem whose incorrect solution could result in injury or loss of property. If you do use this material in such a manner, it is at your own risk. The author disclaims all liability for direct or consequential damage resulting from its use.

Purpose
The primary goal of this document is to provide a reference on Netezza. This is traditionally called an Annoyances document. Hopefully, this document will reduce the learning curve.

Audience
This is primarily for developers who are unfamiliar with Netezza.

Assumptions
It will be assumed that the reader is familiar with Oracle, SQL and DB metadata concepts.

Approach
The approach is to document Netezza resources, as well as any tips, etc. Much has been collected from the internet, as well as from bitter experience.

Related Documents
All these documents are published by Netezza.
O | Name | Subject
1 | Aginity_Netezza_Workbench_Documentation | Overview
2 | Aginity_Workbench_for_Netezza_Functionality_Overview | Detailed Introduction
3 | IBM_Netezza_In-Database_Analytics_Reference_Guide | Analytics functions
4 | IBM Netezza User-Defined Functions Developer's Guide
5 | Netezza RedGuide | Overview
6 | Netezza_advanced_security_admin_guide | Security
7 | Netezza_data_loading_guide | Data Loading
8 | Netezza_database_users_guide | SQL user guide
9 | Netezza_getting_started_tips | Background information and tips
10 | Netezza_odbc_jdbc_guide | ODBC, JDBC clients, or the OLE DB connector.
11 | Netezza_Spatial_Package_Developers_Guide | Spatial Analytics functions
12 | Netezza_Spatial_Package_Reference_Guide | Spatial Analytics functions
13 | Netezza_Spatial_Package_Users_Guide-3.0.0 | Spatial Geometric Functions
14 | Netezza_stored_procedures_guide | Developers Guide for Stored procs
15 | Netezza_system_admin_guide | Official guide to distribution keys
16 | Netezza_udf_dev_guide | Developers Guide for Functions
17 | Netezza-Basics | Course Outline
18 | netezza-fpga | Overview

Definitions

Term | Source | Definition
API | DB | Application Programming Interface
AWB | Netezza | Aginity Work Bench
BI | DW | Business Intelligence
BLOB | DB | Binary Large Object
Cardinality | DB | The number of unique values for an attribute in a table. Low cardinality refers to a limited number of values, relative to the overall number of rows in the table.
CLOB | DB | Character Large Object
Data Warehouse | DW | An implementation of an informational database used to store sharable data sourced from an operational database-of-record.
Dimension | Kimball | An independent entity in a dimensional model that serves as an entry point or as a mechanism for slicing and dicing the additive measure located in the fact table of the dimensional model. For example, all months, quarters, years, etc., make up a time dimension. Based on Measure Theory.
Distribution Key | NZ | The distribution key is used to determine the placement of table rows across multiple Netezza nodes. The distribution key is part of the table definition.
DK | NZ | Distribution Key
DW | DW | Data Warehouse
DW Appliance | DW | This is an integrated hardware, software and DBMS platform, designed for high performance reporting.
ELT | DW | Extract, Load and Transform
Epoch Time | Unix | Also known as Unix time. A system for describing instants in time, defined as the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, not counting leap seconds.
ETL | DW | Extract, Transform and Load
Fact | Kimball | A business performance measurement, typically numeric and additive, that is stored in a fact table. Based on Measure Theory.
FK | DB | Foreign Key
Foreign Key | DB | A foreign key is the primary key of one data structure that is placed into a related data structure to represent a relationship among those structures. Foreign keys resolve relationships, and support navigation among data structures.
Imperative | DB | With imperative programming, like PL/SQL, the coder defines how the code will proceed, step by step. In declarative programming, like SQL, the coder defines what the code will do, and lets the compiler determine the best way to implement this.
Join | Codd | A join is a binary operator on two relations or database tables.
Key Source | | This represents cleansed Source tables that have had a surrogate and/or a distribution key added. A surrogate key is needed for Fact/Dimension joins. A distribution key is critical for adequate Netezza performance. Note that these tables would still retain their source names.
KPI | DW | Key Performance Indicator
Massively Parallel Processing | DW | This refers to the use of a large number of processors (or separate computers) to perform a set of coordinated computations in parallel (simultaneously).
Metadata | DB | Metadata is "data about data". While not often used in reporting, these tables are important in DW standards, and for generating and describing DW components.
MPP | DW | Massively Parallel Processing
Netezza | Netezza | Netezza designs and markets high-performance data warehouse appliances and advanced analytics applications for uses including enterprise data warehousing, business intelligence, predictive analytics and business continuity planning.
NPS | Netezza | Netezza Performance Server
NULL | DB | NULL indicates that a data value does not exist in a particular column row pair in the database. In other words, if the value of a column in a row is NULL, then the value is undefined.
NZ | NZ | Used as short form for Netezza. Not to be confused with the shaky isles.
Pivot | DB | A pivot table is the transformation of an EAV table into columnar form.
PK | DB | Primary Key
Primary Key | DB | A column or combination of columns whose values uniquely identify a row or record in the table. The primary key(s) will have a unique value for each record or row in the table. That is, their cardinality will be 1.
Referential Integrity | DB | Referential integrity is a property of data which, when satisfied, requires every value of one attribute (column) of a relation (table) to exist as a value of another attribute in a different (or the same) relation (table).
RI | DB | Referential Integrity
Set | Maths | In mathematics, a set is a collection of distinct objects, considered as an object in its own right. The set has no duplicates, and each object has an identifier (or key).
Snippet Processing Unit | NZ | The independent node that performs functions on a data subset in parallel with other SPUs.
SPU | NZ | Snippet Processing Unit
SQL | DB | Structured Query Language
UDA | NZ | User defined aggregates
UDF | NZ | User Defined Functions
UDTF | NZ | User defined table functions
UDX | NZ | This is generally used by Netezza developers to refer to user developed code. This covers user defined functions (UDF), user defined aggregates (UDA), and user defined table functions (UDTF).
Union | Maths | Union of the sets A and B, denoted A U B, is the set of all objects that are a member of A, or B, or both.
View | DB | A projection. Apart from the display of data returned, a view can be considered to be a pure function.
XML | DB | Extensible Mark-up Language

Tags
Business Intelligence ; Data Design ; Data Load ; Data Mapping ; Data Model ; Data Transformation ; Data Vault ; Data Warehouse ; Database ; Database Design ; DW Appliance ; Extract Load Transform - ELT ; Extract Transform Load - ETL ; Fact / Dimension ; Inmon ; Kimball ; Massive Parallel Processing - MPP ; Master Data Management ; Metadata ; Netezza ; Oracle ; SQL ; Standards ; Teradata ; Data Architect ; Data Architecture

Netezza Background

Netezza is based on a branch from Postgres. However, it is not a strict subset of Postgres, as there are many additional features. In addition, many unneeded changes were introduced, such as differences between Postgres and Netezza column data types. This means that using Postgres documentation is generally not useful.

IBM bought Netezza to fill the DW platform gap in their DB offering. However, it seems clear from the many gaps in the Netezza platform that IBM have not spent much on upgrading the platform. This even affects their online help, which can be difficult to navigate.

Because the documentation for the later versions is incomplete, documentation for earlier versions remains useful. It stays relevant because of the natural inertia in the fundamental design of a tool: very few companies completely re-engineer the underlying technology.

Some of the more important gaps are:
1. There is no CONNECT BY statement. "Recursive queries for the WITH clause are not supported." See https://www-304.ibm.com/support/knowledgecenter/SSULQD_7.2.0/com.ibm.nz.dbu.doc/r_dbuser_with_clause.html
2. There is no PIVOT statement.
3. The Optimiser does not do much, as it does not automatically match on common distribution keys. These joins have to be done manually.
4. Transaction commits do not apply until the end of a PROC and cannot be triggered manually. This affects PROC design.
5. Many OOB functions are missing, when compared with Oracle. This was resolved when the SQLEXT package was installed. It covers function types including XML, encryption, hashing, date and time comparisons, regular expressions, arrays, etc.

From Oracle to Netezza

As most developers will be more familiar with Oracle than Netezza, this table gives a feature-by-feature availability comparison, which may be a useful starting point.
Name | Type | Oracle | Netezza
Based on | Diff | Oracle | Postgres
Bulk (set) transactions | Diff | Some | Best
Db Ranking | Diff | 1 | 27
Description | Diff | Widely used RDBMS | DW appliance
Developer | Diff | Oracle | IBM
Initial release | Diff | 1980 | 2000
Partitioning methods | Diff | Horizontal partitioning | Sharding
Replication methods | Diff | Master-master and Master-slave replication | Master-slave replication
Scales to | Diff | Terabytes | Petabytes
Appliance | Netezza | Not Available | Available
Implicit Casting | Netezza | Not Available | Available
Map Reduce | Netezza | Not Available | Available
Multi Db per Environment | Netezza | Not Available | Available
Query Tuning not needed | Netezza | Not Available | Available
Any Hardware platform | Oracle | Available | Not Available
APIs - Oracle | Oracle | ODP.NET, Oracle Call Interface (OCI), | Not Available
BLOB | Oracle | Available | Not Available
CONNECT BY | Oracle | Available | Not Available
Correlated sub queries | Oracle | Available | Not Available
Cursors | Oracle | Available | Not Available
Foreign keys | Oracle | Available | Available, not enforced.
Index | Oracle | Available | Not Available
PIVOT | Oracle | Available | Not Available
PL/SQL | Oracle | Available | Not Available
Row level transactions | Oracle | Available | Not Available
Triggers | Oracle | Available | Not Available
XML support | Oracle | Available | Not Available
APIs - Common | Same | JDBC, ODBC, OLE DB | JDBC, ODBC, OLE DB
Concurrency | Same | Available | Available
Data scheme | Same | Available | Available
Database model | Same | Relational DBMS | Relational DBMS
Function | Same | Available | Available
SQL | Same | Available | Available
Supported languages | Same | C, Java, Python, Perl, R | C, Java, Python, Perl, R
Transaction concepts | Same | ACID | ACID
Typing | Same | Available | Available

Netezza Sites

These sites have been used in the preparation of this document.

O | Subject | URL
1 | ACM Article | http://dl.acm.org/citation.cfm?id=2093965
2 | DB Ranking | http://db-engines.com/en/ranking
3 | Hash | http://preshing.com/20110504/hash-collision-probabilities/
4 | IT Blog | http://www.norvig.com/
5 | IT Blog | http://www.paulgraham.com/icad.html
6 | MWB | http://www.ibm.com/developerworks/data/library/techarticle/dm-1001datalineageinfosphereworkbench/
7 | NZ Blog | http://colbran.co.za/wordpress/category/netezza/
8 | NZ Blog | http://netezzaonline.blogspot.com.au/
9 | NZ Blog | http://nztips.com/
10 | NZ CLI | https://www-01.ibm.com/support/knowledgecenter/SSULQD_7.2.0/com.ibm.nz.adm.doc/r_sysadm_cmd_summary.html
11 | NZ Commit Rollback | https://www-01.ibm.com/support/knowledgecenter/SSULQD_7.2.0/com.ibm.nz.sproc.doc/c_sproc_transaction_commits_and_rollbacks.html
12 | NZ Data Type | https://www-304.ibm.com/connections/forums/html/topic?id=ec181859-bafe-4fc4-8bb1-4714d920aa7b
13 | NZ DK | http://expertintegratedsystemsblog.com/2014/11/choosing-distribution-keys-a-five-step-methodology/
14 | NZ DK | http://nztips.com/2011/03/distributed-joins-the-basics/
15 | NZ DK | https://www.ibm.com/developerworks/community/blogs/Netezza/tags/distribution?lang=en
17 | NZ Forum | https://www-304.ibm.com/connections/forums/html/forum?id=f6bffd40-f80d-4331-9d4e-b265bc0c1da6
18 | NZ Issues | https://www-304.ibm.com/support/knowledgecenter/SSULQD_7.2.0/com.ibm.nz.reln.doc/c_relnotes_known_issues.html
19 | NZ manual | https://www.ibm.com/developerworks/community/files/form/anonymous/api/library/1b6a2624-dc86-4856-b4ed-cdda6bfdecda/document/9da6e781-392b-41dc-9b43-962ae1c41716/media/01_ch.pdf
20 | NZ PLSQL | https://www-01.ibm.com/support/knowledgecenter/SSULQD_7.2.0/com.ibm.nz.sproc.doc/r_sproc_nzplqsl_language.html
21 | NZ Return Values | https://www-01.ibm.com/support/knowledgecenter/SSULQD_7.2.0/com.ibm.nz.sproc.doc/c_sproc_returning_a_result_set.html
22 | NZ Table Mods | https://www-304.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.dbu.doc/r_dbuser_alter_table.html
23 | NZ With | https://www-304.ibm.com/support/knowledgecenter/SSULQD_7.2.0/com.ibm.nz.dbu.doc/r_dbuser_with_clause.html
24 | RI DP | http://stackoverflow.com/questions/5649297/how-to-overcome-netezzas-lack-of-unique-constraint-referential-integrity-enforc
25 | Vendor | http://www.aginity.com/workbench/netezza/
26 | Vendor | http://www.postgresonline.com/
27 | Vendor | http://www-01.ibm.com/support/knowledgecenter/SSULQD/SSULQD_welcome.html

Time Patterns

Time Casting

The sources used different ways to represent dates. Some stored dates as strings, so these needed to be transformed back into dates to be usable by reporting users. Remedy used UNIX time (Epoch) to store dates. These cast functions were applied wherever needed. Note that the pattern handles null values as well. This works in Netezza.
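The arithmetic behind the Unix (Epoch) cast in this section can be checked outside the database. The following is a minimal Python sketch, not Netezza code. The function name remedy_epoch_to_timestamp is hypothetical, and the 18000-second offset is copied as-is from the SQL pattern; this document does not explain it, so reading it as a fixed five-hour time-zone shift is only an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

EPOCH = datetime(1970, 1, 1)
# Offset taken verbatim from the SQL cast pattern; its meaning
# (possibly a fixed 5-hour time-zone shift) is an assumption.
OFFSET_SECONDS = 18000


def remedy_epoch_to_timestamp(epoch_seconds: Optional[int]) -> datetime:
    """Mirror the SQL pattern: a NULL input falls back to 1970-01-01,
    otherwise the offset is subtracted and the seconds are added to the epoch."""
    if epoch_seconds is None:
        return EPOCH
    return EPOCH + timedelta(seconds=epoch_seconds - OFFSET_SECONDS)


if __name__ == "__main__":
    print(remedy_epoch_to_timestamp(None))
    print(remedy_epoch_to_timestamp(OFFSET_SECONDS + 86400))
```

Comparing a handful of source values against this function is a cheap way to confirm that a cast in an ETL step is behaving as intended before it is applied across a whole table.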
Date Type | Source Type | Cast
Char YYYY-MM-DD | String |
    COALESCE(
      CASE WHEN SQLEXT.REGEXP_LIKE(SUBSTR(<date column>, 1, 10), '^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$')
           THEN TO_DATE(SUBSTR(<date column>, 1, 10), 'YYYY-MM-DD')
           ELSE TO_DATE('1970-01-01', 'YYYY-MM-DD')
      END,
      TO_DATE('1970-01-01', 'YYYY-MM-DD'))
Unix (Epoch) | Remedy |
    COALESCE(
      TO_TIMESTAMP(
        TO_CHAR(TO_DATE('19700101000000', 'YYYYMMDDHH24MISS') + ((<date column> - 18000) / (60*60*24)), 'YYYYMMDDHH24MISS'),
        'YYYYMMDDHH24MISS'),
      TO_TIMESTAMP('1970-01-01', 'YYYY-MM-DD'))

Time SQL

The dimension for Date is SCHEMA2.D_DATE. As some end user tools cannot use a BETWEEN in their join, some alternate patterns can be used.

Purpose | Example
Point in time |
    INNER JOIN SCHEMA2.D_DATE DD ON DD.CALENDAR_MONTH_QTY = EXTRACT(MONTH FROM <timestamp_column>)
Period of time (with a GROUP BY to eliminate duplicate rows) |
    INNER JOIN SCHEMA2.D_DATE DD ON DD.DATE_DT > <start_timestamp_column> AND DD.DATE_DT < <end_timestamp_column>
Difference in hours |
    EXTRACT(EPOCH FROM <DateCol2> - <DateCol1>)/3600
Between |
    SELECT DD.DATE_DT, MONTH_START_DT, MONTH_END_DT,
           SQLEXT.DAYS_BETWEEN (DD.MONTH_START_DT, DD.DATE_DT) DBST,
           SQLEXT.DAYS_BETWEEN (DD.DATE_DT, DD.MONTH_END_DT) DBEND
    FROM DB1.SCHEMA2.D_DATE DD
    LIMIT 100;

SQLEXT Functions

SQLEXT functions are needed to provide equivalent functionality to Oracle OOB functions. See below for a full set of Netezza SQLEXT functions. These functions need to be referenced by prefixing the function name with SQLEXT, for example SQLEXT.REGEXP_LIKE.
Function Type | Function Name
Array | add_element()
Array | array()
Array | array_combine()
Array | array_concat()
Array | array_count()
Array | array_split()
Array | array_type()
Array | delete_element()
Array | element_name()
Array | get_value_type()
Array | narray_combine()
Array | nelement_name()
Array | replace_element()
Collection | collection()
Collection | element_type()
Data transformation | compress()
Data transformation | decompress()
Data transformation | decrypt()
Data transformation | encrypt()
Data transformation | fpe_decrypt()
Data transformation | fpe_encrypt()
Data transformation | uudecode()
Data transformation | uuencode()
Date and time | day()
Date and time | days_between()
Date and time | hour()
Date and time | hours_between()
Date and time | minute()
Date and time | minutes_between()
Date and time | month()
Date and time | next_month()
Date and time | next_quarter()
Date and time | next_week()
Date and time | next_year()
Date and time | second()
Date and time | seconds_between()
Date and time | this_month()
Date and time | this_quarter()
Date and time | this_week()
Date and time | this_year()
Date and time | weeks_between()
Date and time | year()
Hashing | hash()
Hashing | hash4()
Hashing | hash8()
Miscellaneous | corr()
Miscellaneous | covar_pop()
Miscellaneous | covar_samp()
Miscellaneous | greatest()
Miscellaneous | least()
Miscellaneous | mt_random()
Regular expression | regexp_extract()
Regular expression | regexp_extract_all()
Regular expression | regexp_extract_all_sp()
Regular expression | regexp_extract_sp()
Regular expression | regexp_instr()
Regular expression | regexp_like()
Regular expression | regexp_match_count()
Regular expression | regexp_replace()
Regular expression | regexp_replace_sp()
Text utility | hextoraw()
Text utility | rawtohex()
Text utility | replace()
Text utility | strleft()
Text utility | strright()
Word comparison | word_diff()
Word comparison | word_find()
Word comparison | word_key()
Word comparison | word_key_tochar()
Word comparison | word_keys_diff()
Word comparison | word_stem()
XML | IsValidXML()
XML | IsXML()
XML | XMLAgg() aggregate
XML | XMLAttributes()
XML | XMLConcat()
XML | XMLElement()
XML | XMLExistsNode()
XML | XMLExtract()
XML | XMLExtractValue()
XML | XMLParse()
XML | XMLRoot()
XML | XMLSerialize()
XML | XMLUpdate()

Netezza System Views

These views are not the complete set of metadata. They are ones that have actual data, and that the BIA_DSM user has permission to see. Note that these views are not visible under the SYSTEM schema in Aginity. In general, all system views can be seen by running:

    SELECT * FROM _V_SYS_VIEW

NAME | For | Description
_T_CLASS | Class
_T_OBJECT | Object
_T_OBJECT_CLASSES | Object Classes
_T_PRIORITY | Priority
_T_TYPE | Data Type
_T_USER | User
_V_AGGREGATE | Aggregate | Returns a list of all defined aggregates
_V_ATTRIBUTE | Column
_V_AUTHENTICATION | System Parameter
_V_CLASS | Class
_V_CLASS2 | Class
_V_CONSTRAINT_DEPENDS | Constraint
_V_DATABASE | Database | Returns a list of all databases
_V_DATATYPE | Data Type | Returns a list of all system data types
_V_DEPEND | Object 2 Object; e.g. View to function
_V_DISK | System Parameter
_V_DISKENCLOSURE | System Parameter
_V_DOTNET_SCHEMAS1 | Schema
_V_DUAL | Dual
_V_ENVIRON | System Parameter
_V_ETHSW | System Parameter
_V_EVRULE | System Parameter
_V_EXTERNAL | External Table
_V_EXTERNAL_XDB | External Table
_V_EXTOBJECT | External Table
_V_EXTOBJECT_XDB | External Table
_V_FAN | System Parameter
_V_FUNCTION | Function | Returns a list of all defined functions
_V_GROOM_HISTORY | Groom
_V_GROUP | Group | Returns a list of all groups
_V_GROUPUSERS | Group | Returns a list of all users of a group
_V_HOST_TX | System Parameter
_V_HOST_VERSION | System Version
_V_HWCOMP | System Parameter
_V_HWROLETEXT | System Parameter
_V_HWSTATETEXT | System Parameter
_V_HWTYPETEXT | System Parameter
_V_INDEX | Index | Returns a list of all user indexes
_V_JDBC_BESTROWIDENTIFIER1 | Key
_V_JDBC_FEATURE | System Parameter
_V_JDBC_PRIMARYKEYS1 | Key
_V_JDBC_PROCEDURE_COLUMNS1 | Procedure
_V_JDBC_TABLETYPES1 | Table Type
_V_JDBC_TYPEINFO1 | Data Type
_V_MM | System Parameter
_V_OBJ_CONSTRAINT | Constraint
_V_OBJ_CONSTRAINT_XDB | Constraint
_V_OBJ_DATABASE | Database
_V_OBJ_MISCOBJS | Function
_V_OBJ_RELATION | Table
_V_OBJ_RELATION_XDB | Function
_V_OBJ_SCHEMA | Schema
_V_OBJ_SCHEMA_XDB | Schema
_V_OBJ_USER | User
_V_OBJECT | Object
_V_OBJECT_CAST | Class
_V_OBJECT_DATA | Object
_V_OBJECT_DATA_NO_OWNER | Object
_V_OBJECTS | Object
_V_OBJS_OWNED | Object Owner
_V_ODBC_COLUMNS1 | Column
_V_ODBC_FEATURE | System Parameter
_V_ODBC_GETTYPEINFO1 | Data Type
_V_ODBC_PRIMARYKEYS1 | Key
_V_ODBC_PROCEDURECOLUMNS1 | Procedure
_V_ODBC_PROCEDURES1 | Procedure
_V_ODBC_SCHEMA3 | Schema
_V_OLEDB_COLUMNS1 | Column
_V_OLEDB_FEATURE | System Parameter
_V_OLEDB_PRIMARYKEYS1 | Key
_V_OLEDB_PROCEDURES1 | Procedure
_V_OLEDB_SCHEMAS1 | Schema
_V_OLEDB_SCHEMAS2 | Schema
_V_OLEDB_TABLES1 | Table
_V_OLEDB_TYPES1 | Data Type
_V_OLEDB_VIEWS1 | View
_V_OPERATOR | Operator | Returns a list of all defined operators
_V_PG_TIME_OFFSET | System Parameter
_V_PLAN_RESOURCE | Statistics
_V_PLANSTATUS | Statistics
_V_PRIORITY | Priority
_V_PROCEDURE | Procedure | Returns a list of all the stored procedures and their attributes
_V_QRYHIST | Query
_V_QRYSTAT | Query
_V_QUERYSTATUS | Statistics
_V_RACK | System Parameter
_V_RELATION_COLUMN | Column | Returns a list of all attributes of a relation (table, view, index). Deprecated. Use _V_OBJECTS and _V_ATTRIBUTE instead.
_V_RELATION_COLUMN_DEF | Column | Returns a list of all attributes of a relation that have defined defaults
_V_RELATION_COLUMN_DEF_XDB | Column
_V_RELATION_COLUMN_XDB | Column
_V_RELATION_KEYDATA | Key
_V_RELATION_KEYDATA_XDB | Key
_V_RELOBJCLASSES | Class
_V_SCHED_GRA | Statistics
_V_SCHED_GRA_EXT | Statistics
_V_SCHED_GRA_EXT_LATEST | Statistics
_V_SCHED_GRA_LATEST | Statistics
_V_SCHED_SN | Statistics
_V_SCHED_SN_EXT | Statistics
_V_SCHED_SN_EXT_LATEST | Statistics
_V_SCHED_SN_LATEST | Statistics
_V_SCHEMA | Schema
_V_SCHEMA_XDB | Schema
_V_SEQUENCE | Sequence | Returns a list of all defined sequences
_V_SEQUENCE_XDB | Sequence
_V_SESSION | Session | Returns a list of all active sessions
_V_SESSION_BRIEF | Session
_V_SESSION_DETAIL | Session
_V_SESSION_DETAIL_TX | Session
_V_SESSION_VERSION | System Parameter
_V_SPA | System Parameter
_V_SPU | SPU data
_V_SPUDEVICEMAP | SPU data
_V_SPUPARTITION | SPU data
_V_STATISTIC | Statistic
_V_STATISTIC_XDB | Statistics
_V_SYNONYM | Synonym
_V_SYNONYM_XDB | Synonyms
_V_SYS_COLUMNS | Column
_V_SYS_COLUMNS_XDB | Column
_V_SYS_CONSTRAINT | Constraint
_V_SYS_CONSTRAINT_XDB | Constraint
_V_SYS_DATABASE | Database
_V_SYS_DATATYPE | Data Type
_V_SYS_DB_OWNER | Database
_V_SYS_DB_OWNER_XDB | Database
_V_SYS_GROUP | Group
_V_SYS_GROUP_PRIV | Group | Returns a list of all defined group privileges
_V_SYS_INDEX | Index | Returns a list of all system indexes
_V_SYS_MISCOBJS | Function
_V_SYS_OBJECT_DATA | Object
_V_SYS_OBJECT_STORAGE_SIZE | Statistic
_V_SYS_PRIV | User | Returns a list of all user privileges. This list is a cumulative list of all groups and user-specific privileges.
_V_SYS_RELATION | Column
_V_SYS_RELATION_XDB | Function
_V_SYS_SCHEMA | Schema
_V_SYS_SCHEMA_XDB | Schema
_V_SYS_TABLE | Table | Returns a list of all system tables
_V_SYS_USER | User
_V_SYS_USER_PRIV | User | Returns a list of all defined user privileges
_V_SYS_VIEW | View | Returns a list of all system views
_V_SYSTEM_INFO | System Version
_V_SYSTEM_UTIL | Utilisation
_V_SYSTEMDEF | System Parameter
_V_TABLE | Table | Returns a list of all user tables
_V_TABLE_CONSTRAINT | Constraint
_V_TABLE_CONSTRAINT_XDB | Constraint
_V_TABLE_DIST | Column
_V_TABLE_DIST_MAP | Table | Returns a list of all fields that are used to determine the table's data distribution
_V_TABLE_DIST_MAP_XDB | Column
_V_TABLE_DS_EXTTYPEPAGES | Statistic
_V_TABLE_DS_ORGSTATE | Groom
_V_TABLE_EXTTYPEPAGES | Statistic
_V_TABLE_GROOMSTATE | Groom
_V_TABLE_INDEX | Table | Returns a list of all user table indexes
_V_TABLE_ONLY_STORAGE_STAT | Statistics
_V_TABLE_ORGANIZE_COLUMN | Column
_V_TABLE_ORGSTATE | Groom
_V_TABLE_STORAGE_STAT | Statistics
_V_TABLE_XDB | Table
_V_TABLE_ZMAP_COLUMN | Column
_V_USER | User | Returns a list of all users
_V_USER_PRIV | User
_V_USERGROUPS | User | Returns a list of all groups of which the user is a member
_V_VIEW | View | Returns a list of all user views
_V_VIEW_XDB | View
COLUMNS | Column
DOMAINS | Data Type
ROUTINES | Procedure
SCHEMATA | Schema
SEQUENCES | Sequence
TABLE_CONSTRAINTS | Constraint
TABLES | Table
VIEWS | View

Metadata Queries

This shows some handy metadata queries.

O | Purpose | SQL
1 | Column |
    SELECT ATC.OWNER SN, ATC.NAME TN, ATC.ATTNUM ORD, ATC.ATTNAME CN, ATC.FORMAT_TYPE DT, ATC.ATTCOLLENG DL,
           ROUND(ATC.ATTDISPERSION * AT.RELTUPLES) AS ND,
           CASE WHEN ATC.ATTNOTNULL THEN 'F' ELSE 'T' END AS ISNL,
           CASE WHEN INSTR(ATC.FORMAT_TYPE, 'NUMERIC') > 0
                THEN SUBSTR(ATC.FORMAT_TYPE, 9, INSTR(ATC.FORMAT_TYPE, ',') - 9)
                ELSE '0' END AS DP,
           CASE WHEN INSTR(ATC.FORMAT_TYPE, 'NUMERIC') > 0
                THEN SUBSTR(ATC.FORMAT_TYPE, INSTR(ATC.FORMAT_TYPE, ',') + 1, LENGTH(ATC.FORMAT_TYPE) - INSTR(ATC.FORMAT_TYPE, ',') - 1)
                ELSE '0' END AS DS,
           CASE WHEN ATC.TYPE = 'TABLE' THEN 'T' ELSE 'F' END AS IST
    FROM _V_RELATION_COLUMN ATC
    INNER JOIN _V_TABLE AT ON ATC.OWNER = AT.OWNER AND ATC.NAME = AT.TABLENAME
    WHERE ATC.OWNER = 'SCHEMA1'
    ORDER BY ATC.OWNER, ATC.NAME, ATC.ATTNUM
2 | Column |
    SELECT * FROM _V_SYS_COLUMNS
    WHERE (COLUMN_NAME LIKE '%OWNER%' OR COLUMN_NAME LIKE '%USER%' OR COLUMN_NAME LIKE '%USR%')
      AND TABLE_SCHEM = 'ADMIN'
    ORDER BY TABLE_NAME;
3 | Column |
    SELECT ATTNUM, ATTNAME FROM _V_RELATION_COLUMN
    WHERE NAME = UPPER('<TABLE NAME>')
    ORDER BY ATTNUM ASC;
4 | Object Owner |
    SELECT O.OBJNAME, U.USENAME, OC.CLASSNAME, D.DATABASE, D.OWNER
    FROM _T_OBJECT O
    JOIN _T_USER U ON O.OBJOWNER = U.USESYSID
    JOIN _T_OBJECT_CLASSES OC ON O.OBJCLASS = OC.OBJCLASS
    JOIN _V_DATABASE D ON O.OBJDB = D.OBJID
    WHERE U.USENAME = 'SCHEMA1';
5 | Procedure |
    SHOW PROCEDURE ALL
6 | Query |
    SELECT _V_QRYHIST.QH_DATABASE, _V_QRYHIST.QH_TSTART, _V_QRYHIST.QH_TEND,
           EXTRACT(EPOCH FROM (_V_QRYHIST.QH_TEND - _V_QRYHIST.QH_TSTART)) * INTERVAL '1 SECOND' AS DIFF_HHMMSS
    FROM _V_QRYHIST
    WHERE 1=1
      AND QH_TSUBMIT > CURRENT_DATE
    -- AND _V_QRYHIST.QH_DATABASE = 'SYSTEM'
    -- AND _V_QRYHIST.QH_SQL LIKE '%INCR%'
    -- ORDER BY QH_ESTCOST ASC
7 | Query |
    SELECT * FROM _V_QRYHIST WHERE 1=1 AND QH_TSUBMIT > CURRENT_DATE
8 | Table |
    SELECT AT.OWNER, AT.TABLENAME, AT.RELTUPLES FROM _V_TABLE AT
9 | Table |
    SELECT RELNAME TABLE_NAME,
           CASE WHEN RELTUPLES < 0
                THEN ((2^32) * RELREFS) + ((2^32) + RELTUPLES)
                ELSE ((2^32) * RELREFS) + (RELTUPLES)
           END NUM_ROWS
    FROM _T_CLASS, _T_OBJECT
    WHERE _T_OBJECT.OBJID = _T_CLASS.OID AND _T_OBJECT.OBJCLASS = 4905
10 | Table |
    SELECT TABLENAME, OWNER, CREATEDATE FROM _V_TABLE WHERE OBJTYPE = 'TABLE';
11 | Table |
    SELECT TABLENAME, OBJTYPE, OWNER, CREATEDATE, USED_BYTES, USED_BYTES/1073741824 AS USED_GB, RELTUPLES AS "ROWS"
    FROM _V_TABLE_ONLY_STORAGE_STAT
    WHERE OBJCLASS = 4905 OR OBJCLASS = 4911
    ORDER BY TABLENAME;
12 | Table skew |
    SELECT TABLENAME, OBJTYPE, OWNER, CREATEDATE, USED_BYTES, SKEW
    FROM _V_TABLE_ONLY_STORAGE_STAT
    WHERE OBJCLASS = 4905 OR OBJCLASS = 4911
    ORDER BY TABLENAME;
13 | Time |
    SELECT CREATEDBYEXT,
           SQLEXT.REGEXP_EXTRACT(CREATEDBYEXT, '0[1-9]|1[012]') AS CREATEDBYUSMTH,
           MODIFIEDDATE, CREATEDDATE,
           SQLEXT.DAYS_BETWEEN (MODIFIEDDATE, CREATEDDATE) AS DAYDIFF,
           UPDATEDBYEMAILADDR, INSTR (UPDATEDBYEMAILADDR, '@')
    FROM DB1.SCHEM1.TABLE1
14 | User |
    SELECT GROUPNAME, OWNER, USERNAME FROM _V_GROUPUSERS;
15 | User group |
    SELECT GROUPNAME, OWNER, CREATEDATE, ROWLIMIT, SESSIONTIMEOUT, QUERYTIMEOUT, DEF_PRIORITY, MAX_PRIORITY FROM _V_GROUP;
16 | View |
    SELECT VIEWNAME, OWNER, CREATEDATE, DEFINITION FROM _V_VIEW WHERE OBJTYPE = 'VIEW';

SQL Standards for distribution keys

Create each table with a distribution key with the following clause:

    CREATE TABLE <Tablename> (
      <ForeignKeyName>_DK_ID NUMERIC (19, 0) NOT NULL,
      <PrimaryKeyName> <PrimaryKeyType>,
      <ForeignKeyName> <ForeignKeyType>,
      ... other source Columns
    ) DISTRIBUTE ON ( <ForeignKeyName>_DK_ID ) ;

For a distribution based on VARCHAR source, use an integer sequence to populate this column. Note that the column name is based on the foreign key, not the primary key. This will enable automatic matching of these distribution key columns in the end user BI tool.

For a distribution based on NUMBER source and a foreign key, add this function to the ETL:

    <ForeignKeyName>_DK_ID = CAST( <ForeignKeyName> AS NUMERIC (19,0));

For a distribution based on NUMBER source and a primary key, add this function to the ETL:

    <ForeignKeyName>_DK_ID = CAST( <PrimaryKeyName> AS NUMERIC (19,0));

For RANDOM distribution keys

Create each table without a distribution key with the following clause:

    CREATE TABLE <Tablename> (
      <PrimaryKeyName> <PrimaryKeyType>,
      <ForeignKeyName> <ForeignKeyType>,
      ... other source Columns
    ) DISTRIBUTE ON RANDOM ;

See: https://www-304.ibm.com/connections/forums/html/topic?id=ec181859-bafe-4fc4-8bb1-4714d920aa7b

The more compressible the data, the better the performance. Compressed data eliminates I/O. If you have something that is 4:1 compressed, 75% of I/O has just been eliminated. Some data types compress better than others, some not at all. The double/float is not compressible so will negatively affect performance.

NUMERIC values are scaled integers and compress well. As fixed types they improve performance (the NPS examines row content for data, so fixed-length values at the front of the record improve the efficiency, and any improvement in efficiency affects performance). A BIGINT is 19 digits wide ( NUMERIC (19,0) ) and the NUMERIC (38) is two big integers. NUMERIC (38) can represent a larger number.
DOUBLE can represent significantly larger numbers. However, DOUBLE is rarely used for joining and is extremely inefficient as a join key. It is also a less efficient (if not highly inefficient) representation, so don't choose it when NUMERIC could work instead. NUMERIC (38) is only marginally less efficient than BIGINT. BIGINT is the most efficient data type in Netezza, as everything is optimized around it. VARCHAR is the least efficient but can be the "largest" data type.

The performance difference between BIGINT and VARCHAR as a join key is typically 100:1 in favour of BIGINT, because a BIGINT comparison is a binary subtraction on a CPU register, whereas a VARCHAR comparison requires an (albeit efficient) looping algorithm. Choice of data types is most definitely a performance decision. If you have a VARCHAR key whose values are all numeric, convert it to an integer or NUMERIC and it will perform orders of magnitude better.

Move fixed-length fields to the front of the record. Align data types for joining or comparison: don't compare or join unlike types, since this forces the query to perform on-demand conversion. With millions or billions of rows to compare or join, on-demand conversion in the WHERE clause is a performance drain.

Never use a VARCHAR for a distribution key. Hash8 the thing if you have to. DOUBLE or VARCHAR do not work as zone mapped columns. VARCHAR can be used in Organize On for zone-mapped columns. NUMERIC, INTEGER and DATE always work as zone mapped columns. Never use DATE as distribution keys. Never use low-cardinality columns as distribution keys.

Zone mapping and distribution are key performance strategies in Netezza. Choose data types or convert data types to align with Netezza's performance model.
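The advice against low-cardinality distribution keys can be illustrated with a small simulation. The following is a minimal Python sketch, not Netezza code. It assumes a hypothetical system with 24 data slices and uses SHA-256 as a stand-in for Netezza's internal distribution hash, which is not described in this document. It then compares how evenly rows land on the slices for a unique key and for a key with only three distinct values. All names and figures are illustrative.

```python
import hashlib
from collections import Counter

NUM_SLICES = 24  # hypothetical number of data slices


def slice_for(key: int) -> int:
    """Map a key to a data slice. SHA-256 is a stand-in, not Netezza's hash."""
    digest = hashlib.sha256(str(key).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SLICES


def rows_per_slice(keys) -> list:
    """Count how many rows land on each slice for the given key values."""
    counts = Counter(slice_for(k) for k in keys)
    return [counts.get(s, 0) for s in range(NUM_SLICES)]


def skew_ratio(per_slice) -> float:
    """Busiest slice divided by the average slice; 1.0 is perfectly even."""
    return max(per_slice) / (sum(per_slice) / len(per_slice))


ROWS = 240_000
# High cardinality: every row has its own key value.
high_card = rows_per_slice(range(ROWS))
# Low cardinality: only three distinct key values across all rows.
low_card = rows_per_slice(i % 3 for i in range(ROWS))

print("high-cardinality skew ratio:", round(skew_ratio(high_card), 2))
print("low-cardinality skew ratio: ", round(skew_ratio(low_card), 2))
print("idle slices with low-cardinality key:", low_card.count(0))
```

With three distinct key values, at most three slices can receive rows, so most of the system sits idle while a few slices do all the work. The table skew query against _V_TABLE_ONLY_STORAGE_STAT listed earlier reports the same effect on a real system.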
Netezza Line Commands

The following table describes the commands that you can use to monitor and manage the IBM® Netezza® system. These commands are in the /nz/kit/bin directory on the Netezza host. A subset of these commands is also installed with the Netezza client kits and can be run from a remote client system, as described in Command locations.

To run these commands from the Netezza system, you must be able to log in as a valid Linux user on the Netezza system. Most users typically log in as the nz user. In addition, many commands require you to specify a valid database user account and password; the database user might require special privileges, as described in Command privileges.

Throughout this section, some command examples show the database user and password options on the command line, and some examples omit them on the assumption that the user and password were cached, for example by using nzpassword.

Command          Description
nzbackup         Backs up an existing database.
nzcontents       Displays the revision and build number of all the executable files, plus the checksum of Netezza binaries.
nzconvert        Converts character encodings for loading with the nzload command or external tables.
nzds             Manages and displays information about the data slices on the system.
nzevent          Displays and manages event rules.
nzhistcleanupdb  Deletes old history information from a history database.
nzhistcreatedb   Creates a history database with all its tables, views, and objects for history collection and reporting.
nzhostbackup     Backs up the host information, including users and groups.
nzhostrestore    Restores the host information.
nzhw             Manages system hardware components.
nzinventory      This command is obsolete in Release 5.0.
nzkey            Creates and manages authentication keys for self-encrypting drives (SEDs).
nzkeydb          Creates and manages a key store for SED authentication keys.
nzkeybackup      Creates a backup of a key store for SED authentication keys.
nzkeyrestore     Restores from a backup of a key store for SED authentication keys.
nzload           Loads data into database files.
nzpassword       Stores a local copy of the user password.
nzreclaim        Grooms databases and tables to remove deleted or outdated rows, and also reorganizes the tables based on their organizing keys.
nzrestore        Restores the contents of a database backup.
nzrev            Displays the current software revision for any Netezza software release.
nzsession        Shows a list of current system sessions (load, client, and sql). Supports filtering by session type or user, aborting sessions, and changing the current job list for a queued session job.
nzsfi            This command is obsolete in Release 5.0.
nzspu            This command is obsolete in Release 5.0.
nzspupart        Shows a list of all the SPU partitions and the disks that support them.
nzsql            Runs the SQL command interpreter.
nzstart          Starts the system.
nzstate          Displays the current system state or waits for a specific system state to occur before it returns.
nzstats          Displays system level statistics.
nzstop           Stops the system.
nzsystem         Changes the system state or displays the current system information.
nztopology       This command is obsolete in Release 5.0.

https://www-01.ibm.com/support/knowledgecenter/SSULQD_7.2.0/com.ibm.nz.adm.doc/r_sysadm_cmd_summary.html