Netezza Data Loading Guide

Netezza Corporation
Corporate Headquarters
26 Forest St., Marlborough, Massachusetts 01752
tel 508.382.8200 fax 508.382.8300
www.netezza.com

Netezza Data Loading Guide
Document Number: 20525-1 Rev. 1
Software Release: 6.0.x
Revised: October 6, 2010

The specifications and information regarding the products described in this manual are subject to change without notice. All statements, information, and recommendations in this manual are believed to be accurate. Netezza makes no representations or warranties of any kind, express or implied, including, without limitation, those of merchantability, fitness for a particular purpose, and non infringement, regarding this manual or the products' use or performance. In no event will Netezza be liable for indirect, incidental, consequential, special, or economic damages (including lost business profits, business interruption, loss or damage of data, and the like) arising out of the use or inability to use this manual or the products, regardless of the form of action, whether in contract, tort (including negligence), breach of warranty, or otherwise, even if Netezza has been advised of the possibility of such damages.

Netezza, the Netezza logo, Netezza TwinFin, TwinFin, Snippet Blades, S-Blades, NPS, Snippet, Snippet Processing Unit, SPU, Snippet Processing Array, SPA, Performance Server, Netezza Performance Server, Asymmetric Massively Parallel Processing, AMPP, Intelligent Query Streaming and other marks are trademarks or registered trademarks of Netezza Corporation in the United States and/or other countries. All rights reserved.

Red Hat is a trademark or registered trademark of Red Hat, Inc. in the United States and/or other countries. Linux is a trademark or registered trademark of Linus Torvalds in the United States and/or other countries. D-CC, D-C++, Diab+, FastJ, pSOS+, SingleStep, Tornado, VxWorks, Wind River, and the Wind River logo are trademarks, registered trademarks, or service marks of Wind River Systems, Inc.
Tornado patent pending. APC and the APC logo are trademarks or registered trademarks of American Power Conversion Corporation.

All document files and software of the above named third-party suppliers are provided "as is" and may contain deficiencies. Netezza and its suppliers disclaim all warranties of any kind, express or implied, including, without limitation, those of merchantability, fitness for a particular purpose, and non infringement. In no event will Netezza or its suppliers be liable for indirect, incidental, consequential, special, or economic damages (including lost business profits, business interruption, loss or damage of data, and the like), or the use or inability to use the above-named third-party products, even if Netezza or its suppliers have been advised of the possibility of such damages.

All other trademarks mentioned in this document are the property of their respective owners.

Document Number: 20525-1
Software Release Number: 6.0.x
Netezza Data Loading Guide
Copyright © 2001-2010 Netezza Corporation. All rights reserved.

PostgreSQL
Portions of this publication were derived from PostgreSQL documentation. For those portions of the documentation that were derived originally from PostgreSQL documentation, and only for those portions, the following applies:

PostgreSQL is copyright © 1996-2001 by the PostgreSQL global development group and is distributed under the terms of the license of the University of California below. Postgres95 is copyright © 1994-5 by the Regents of the University of California.

Permission to use, copy, modify, and distribute this documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.
In no event shall the University of California be liable to any party for direct, indirect, special, incidental, or consequential damages, including lost profits, arising out of the use of this documentation, even if the University of California has been advised of the possibility of such damage.

The University of California specifically disclaims any warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The documentation provided hereunder is on an "as-is" basis, and the University of California has no obligations to provide maintenance, support, updates, enhancements, or modifications.

ICU Library
The Netezza implementation of the ICU library is an adaptation of an open source library Copyright (c) 1995-2003 International Business Machines Corporation and others.

ICU License - ICU 1.8.1 and later
COPYRIGHT AND PERMISSION NOTICE
Copyright (c) 1995-2003 International Business Machines Corporation and others
All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, provided that the above copyright notice(s) and this permission notice appear in all copies of the Software and that both the above copyright notice(s) and this permission notice appear in supporting documentation.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Except as contained in this notice, the name of a copyright holder shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Software without prior written authorization of the copyright holder.

ODBC Driver
The Netezza implementation of the ODBC driver is an adaptation of an open source driver, Copyright © 2000, 2001, Great Bridge LLC. The source code for this driver and the object code of any Netezza software that links with it are available upon request to [email protected]

Botan License
Copyright (C)
1999-2008 Jack Lloyd
2001 Peter J Jones
2004-2007 Justin Karneges
2005 Matthew Gregan
2005-2006 Matt Johnston
2006 Luca Piccarreta
2007 Yves Jerschow
2007-2008 FlexSecure GmbH
2007-2008 Technische Universitat Darmstadt
2007-2008 Falko Strenzke
2007-2008 Martin Doering
2007 Manuel Hartl
2007 Christoph Ludwig
2007 Patrick Sona
All rights reserved.

Redistribution and use in source and binary forms, for any use, with or without modification, of Botan (http://botan.randombit.net/license.html) is permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions, and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR(S) "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR(S) OR CONTRIBUTOR(S) BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Regulatory Notices
Install the NPS system in a restricted-access location. Ensure that only those trained to operate or service the equipment have physical access to it. Install each AC power outlet near the NPS rack that plugs into it, and keep it freely accessible. Provide approved 30A circuit breakers on all power sources.

Product may be powered by redundant power sources. Disconnect ALL power sources before servicing.

High leakage current. Earth connection essential before connecting supply.
Courant de fuite élevé. Raccordement à la terre indispensable avant le raccordement au réseau.

FCC - Industry Canada Statement
This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to part 15 of the FCC rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment. This equipment generates, uses, and can radiate radio-frequency energy and, if not installed and used in accordance with the instruction manual, may cause harmful interference to radio communications.
Operation of this equipment in a residential area is likely to cause harmful interference, in which case users will be required to correct the interference at their own expense.

This Class A digital apparatus meets all requirements of the Canadian Interference-Causing Equipment Regulations.
Cet appareil numérique de la classe A respecte toutes les exigences du Règlement sur le matériel brouilleur du Canada.

WEEE
Netezza Corporation is committed to meeting the requirements of the European Union (EU) Waste Electrical and Electronic Equipment (WEEE) Directive. This Directive requires producers of electrical and electronic equipment to finance the takeback, for reuse or recycling, of their products placed on the EU market after August 13, 2005.

CE Statement (Europe)
This product complies with the European Low Voltage Directive 73/23/EEC and EMC Directive 89/336/EEC as amended by European Directive 93/68/EEC.
Warning: This is a class A product. In a domestic environment this product may cause radio interference in which case the user may be required to take adequate measures.

VCCI Statement
この装置は、情報処理装置等電波障害自主規制協議会（VCCI）の基準に基づくクラス A 情報技術装置です。この装置を家庭環境で使用すると電波妨害を引き起こすことがあります。この場合には使用者が適切な対策を講ずるよう要求されることがあります。

Table of Contents

Preface

1 Overview
  Data Loading Components ... 1-1
  Data Loading Formats ... 1-2
  New Decimal Delimiter Option ... 1-2

2 External Tables
  About External Tables ... 2-1
    Privileges Required ... 2-2
    Displaying External Table Information ... 2-2
    Log Files ... 2-2
    Usage ... 2-2
    Parsing ... 2-3
    Backing Up and Restoring ... 2-4
  Command Syntax ... 2-4
  Transient External Tables ... 2-4
    Syntax ... 2-4
    Explicit Schema Definition ... 2-5
    Implicit Schema Definition ... 2-5
    Exporting Data Using Transient External Tables ... 2-5
    Remote Transient External Tables ... 2-5
  Supported Data Types ... 2-6
    Integer Data Types ... 2-7
    Fixed-Point Data Types ... 2-7
    Floating-Point Data Types ... 2-8
    Character Strings ... 2-10
    Time Data Types ... 2-11
  Restrictions ... 2-13
  Best Practices ... 2-13
  Examples ... 2-15
    Transient External Table ... 2-15
    Fixed-Length Format ... 2-16
    Standard Unloading and Reloading ... 2-16
    Back up and Restore a User Table ... 2-17

3 External Table Options
  Options ... 3-1
  Option Details ... 3-3
    BoolStyle ... 3-3
    Compress ... 3-4
    CRinString ... 3-4
    CtrlChars ... 3-4
    DataObject ... 3-4
    DateDelim ... 3-5
    DateStyle ... 3-5
    DecimalDelim ... 3-6
    Delimiter ... 3-6
    Encoding ... 3-7
    EscapeChar ... 3-7
    FillRecord ... 3-8
    Format ... 3-8
    IgnoreZero ... 3-8
    IncludeZeroSeconds ... 3-8
    Layout ... 3-8
    LogDir ... 3-9
    MaxErrors ... 3-9
    MaxRows ... 3-9
    NullValue ... 3-9
    QuotedValue ... 3-10
    RecordDelim ... 3-10
    RecordLength ... 3-11
    RemoteSource ... 3-11
    RequireQuotes ... 3-11
    SkipRows ... 3-11
    SocketBufSize ... 3-12
    TimeDelim ... 3-12
    TimeRoundNanos ... 3-12
    TimeStyle ... 3-12
    TruncString ... 3-12
    Y2Base ... 3-13
  Option Processing ... 3-13
    Counting Rows ... 3-13
    Handling Bad Rows ... 3-14
    Delineating Input Rows ... 3-14
    Matching Input Fields to Table Columns ... 3-14
    Using String and Non-string Fields ... 3-14
    Handling the Absence of a Value ... 3-14
    Enabling Load Continuation ... 3-15
    Handling Legal Characters ... 3-15
  Session Variables ... 3-16

4 Using nzload
  How the nzload Command Works ... 4-1
    Protection and Privileges ... 4-1
    Concurrency and Transactions ... 4-2
    Program Invocation ... 4-2
  Using the nzload Command ... 4-2
    Syntax ... 4-2
    Inputs ... 4-3
    Additional Options ... 4-3
    Outputs ... 4-4
    Using a Control File ... 4-5
  Configuration File Example ... 4-6

5 Unloading Data
  Unloading Options ... 5-1
  Unloading Data to a Remote Client System ... 5-2

6 Using Fixed-Length Format
  Formatting Background ... 6-1
    Fixed-Length Format ... 6-1
    Data Attributes ... 6-2
  Format Options ... 6-2
    New Options ... 6-2
    Changed Options ... 6-3
    Unsupported Options ... 6-3
    Default Values ... 6-4
  Layout Definitions ... 6-4
  Building the Fixed-Length Format Definition ... 6-6
    End-of-Record ... 6-6
    Record Length ... 6-6
    Skipping Fields ... 6-6
    Temporal Values ... 6-6
    Numeric Values ... 6-7
    Logical Values ... 6-8
    Null Values ... 6-8

Appendix A: Examples and Grammar
  The nzload Command ... A-1
    Specifying nzload Arguments ... A-1
    Using Named Pipes ... A-2
    Sample nzload Usage ... A-2
  Reference Examples ... A-4
  Decimal Delimiter Examples ... A-4
  SQL Grammar ... A-5
  Fixed-Length Format Definition ... A-7

Appendix B: Troubleshooting
  Tips for Successful Loading ... B-1
    Create Your Table ... B-1
    Determine Your Data Format ... B-1
    Consider the Load Source ... B-2
    Run the Job ... B-2
    Troubleshoot ... B-3
    Handle Exceptions ... B-3
    Validate the Results ... B-3
    Generate Statistics ... B-3
    Test Performance ... B-3
  nzload Error Handling ... B-4
    Reporting Errors ... B-4
    Understanding nzload Log Files ... B-4

Appendix C: Option Names
  Specifying Options ... C-1

Index

Preface

The Netezza Data Loading Guide describes the Netezza functionality for data loading.

Audience
The Netezza Data Loading Guide is written for administrators using data loading features.

About this Guide
This guide contains the following information:

Topics | See the following
Introduction to Data Loading Concepts and Terms | Chapter 1, “Overview”
How to use External Tables | Chapter 2, “External Tables”
External Table options to use, and how the system processes them | Chapter 3, “External Table Options”
Details on the nzload command | Chapter 4, “Using nzload”
Details on unloading data | Chapter 5, “Unloading Data”
Details on the Fixed-Length format | Chapter 6, “Using Fixed-Length Format”
Examples of commands, format, and usage | Appendix A, “Examples and Grammar”
Command and Task Tips | Appendix B, “Troubleshooting”
How to enter external table options on the command line, in a control file, or in a SQL command | Appendix C, “Option Names”

Symbols and Conventions
This guide uses the following typographical conventions:
Italics for emphasis on terms and user-defined values such as user input
Upper case for SQL commands; for example INSERT, DELETE
Bold for command line input; for example, nzsystem stop

If You Need Help
If you are having trouble using the Netezza appliance, you should:
1. Retry the action, carefully following the instructions given for that task in the documentation.
2. Go to the Netezza Knowledge Base at https://knowledge.netezza.com. Enter your support username and password. You can search the knowledge base or the latest updates to the product documentation. Click Netezza HelpDesk to submit a support request.
3. If you are unable to access the Netezza Knowledge Base, you can also contact Netezza Support at the following telephone numbers:
   North American Toll-Free: +1.877.810.4441
   United Kingdom Free-Phone: +0.800.032.8382
   International Direct: +1.508.620.2281

Refer to your Netezza maintenance agreement for details about your support plan choices and coverage.

Netezza Welcomes Your Comments
Let us know what you like or dislike about our manuals. To help us with future versions of our manuals, we want to know about any corrections or clarifications that you would find useful. Include the following information:
The name and version of the manual that you are using
Any comments that you have about the manual
Your name, address, and phone number

Send us an e-mail message at the following address: [email protected]
The doc alias is reserved exclusively for reporting errors and omissions in our documentation. We appreciate your suggestions.

CHAPTER 1
Overview

What’s in this chapter
Data Loading Components
Data Loading Formats
New Decimal Delimiter Option

This chapter provides general information about the data loading methods now available. Note that loading data takes a significant allocation of system resources, which may affect performance.

Data Loading Components
Within the Netezza environment, data loading means simply to transfer data to the Netezza appliance. Within this framework, there are a number of components:

External Tables – These are tables stored as flat files on the host or client systems and not in the Netezza appliance database. These tables can be used to load data into the Netezza appliance. For more information, see Chapter 2, “External Tables.”

nzload – This is a command that provides an easy method for using external tables and getting data into the Netezza appliance. For more information, see Chapter 4, “Using nzload.”

Format Options – These are options for formatting the data load to and from external tables.
Since data comes in different forms, Netezza provides different ways of setting up the load. For more information, see Chapter 4, “Using nzload,” and Chapter 6, “Using Fixed-Length Format.”

Backup and Restore – There are different methods for doing backups and restores to transfer data between systems. One method is to create external tables and use nzload, described in Chapter 2, “External Tables,” and Chapter 4, “Using nzload.” For more information on backups and restores, see “Backing Up and Restoring Databases” in the Netezza Performance Server System Administrator’s Guide.

nz_migrate – This is a separate tool, not part of the Netezza software package. The utility is a script that can migrate (copy) a database or table from one Netezza appliance to another, or make a copy of a database or table on the same server. Run the following command to see the help text for the command, including syntax and usage:

    nz_migrate -?

Data Loading Formats
In a database environment, there is always a need to load data into a table from external sources such as files, pipes, or sockets. These external sources use a variety of formats to represent each of the data types individually, and together as records or rows. Database-like applications, such as an RDBMS, a web server, or some other structured data store, may export data into files or streams in different formats. The following formats are used with the Netezza environment:

Text-Delimited – The format commonly used for data loading, in which every field or column value ends with a delimiter, and each set of values that makes up a row or record ends with an end-of-record delimiter, typically a newline character. Until this release, this has been the preferred format for loading data into external tables.
Fixed-Length – The new format, which allows for a more expressive form of external table definition, thus increasing the kinds of data formats and layouts that can be loaded.

Compressed Binary – This Netezza proprietary format compresses the data before a backup or restore to benefit performance. It typically yields smaller data files, retains information about the Netezza appliance topology, and thus is often faster to back up and restore. Compress the data before loading, and uncompress before unloading. For more information, see the Netezza Performance Server System Administrator’s Guide.

New Decimal Delimiter Option
In the 6.0 release, a new option allows you to specify a comma as the decimal separator, in addition to the period (the default value). This option is available for external tables and for nzload, and helps you load data directly without an extra pre-load conversion.

For the text-delimited format, and for unloading data, this option is available only at the table level. For the fixed-length format, you can specify this option at the column level, making it possible to have a mix of comma and period decimal separators.

The option is available for the following data types, for both text-delimited and fixed-length formats:
Numeric
Float
Double
Time
Timetz
Timestamp

Option usage for each data type is explained in the section that describes that data type. For examples of how to use this new option, see Appendix A, “Examples and Grammar.”

CHAPTER 2
External Tables

What’s in this chapter
About External Tables
Command Syntax
Transient External Tables
Supported Data Types
Restrictions
Best Practices
Examples

This chapter describes external tables, as well as best practices and restrictions for using them.
For the options available with external tables, see Chapter 3, “External Table Options.” For examples of how to use external tables, see Appendix A, “Examples and Grammar.”

The Netezza environment has the following types of tables:

• System tables – Stored on the host
• User tables – Stored on the SPUs
• External tables – Stored as flat files on the host or client systems

About External Tables

An external table allows Netezza to treat an external file as a database table. An external table has a definition (a table schema), but the actual data exists outside of the Netezza appliance database. External tables can be used to access files that are stored on the Netezza host server. With a remote external table, Netezza can also treat a file on a client system as an external table (see the REMOTESOURCE option).

After you have created the external table definition, you can use INSERT INTO statements to load data from the external file into a database table, or SELECT FROM statements to query the external table.

Privileges Required

To create an external table, you must have the LIST privilege on the database and the CREATE EXTERNAL TABLE administration privilege. The database user who issues the CREATE EXTERNAL TABLE command owns the resulting table. The operating system user must have the proper permission on the data object (READ permission for loading, WRITE permission for unloading).

Displaying External Table Information

To display information about external tables, use the \d command from the nzsql prompt. To list all external tables in the current database, use the \dx command. For example:

    dev(admin)=> \dx
           List of relations
        Name     |   Type    | Owner
    -------------+-----------+-------
     extlineitem | ext table | admin
     xlineitem   | ext table | admin
    (2 rows)

To list the options defined in an external table, use the \d <external_tablename> command.
For example:

    dev(admin)=> \d extlineitem

Log Files

By default, loading errors go into the following log files:

• nzbad – <tablename>.<dbname>.nzbad
• nzlog – <tablename>.<dbname>.nzlog

To override the defaults, specify a file name with the following:

• bf <filename> for nzbad
• lf <filename> for nzlog

Usage

Use external tables to do the following:

• Load data into the Netezza appliance from an external table, structuring the load to manipulate the data by using casts, joins, dropping columns, and so on.
• Store data outside the Netezza appliance, either to transfer to another application or as a table backup. See “Backing Up and Restoring” on page 2-4.
• Create an external table and use its data as part of a SQL query.

The power of external tables is that the entire Extraction-Transformation-Loading (ETL) process is mapped to plain SQL. Because a SQL-based ETL process can be initiated and executed from any SQL client that can talk to the Netezza appliance, it reduces or avoids the need for specialized ETL tools.

To load an external data file into the Netezza appliance as an external table, you can do either of the following:

• Use the external table in the FROM clause of a SELECT statement, like any normal table.
• Use the external table in the WHERE clause of an UPDATE or DELETE statement.

To unload data into an external data file, use the external table as the target table in any of the following SQL statements:

• INSERT
• SELECT INTO
• CREATE TABLE AS SELECT

All references to columns in the external table can be complex SQL expressions that transform the external data during a load or unload. For more information, see “Restrictions” on page 2-13.

Parsing

For loads, the rows are parsed one by one from the external data file and converted into internal records of the external table. Errors can occur while parsing each row or each column.
For example, the parser might fail to identify the column value itself, as in the case of a missing delimiter. Or the conversion from the external format to the internal record might fail, as when alphabetic characters are supplied for an integer column in Text-Delimited format. Each error is logged in detail in an nzlog file, and bad rows are logged in an nzbad file. These files help you identify bad rows in the external data file and correct them for reloading.

Depending on the load options of the external table in use, a bad row causes either that row to be skipped or the entire load to be aborted. Similarly, a bad column can cause the rest of its row to be ignored or, if recovery is possible, the load can continue to parse the subsequent columns of the same row.

Note that if there is an error in the project-expression on the external table columns, the entire load is aborted and the transaction is rolled back. Errors of this nature are not logged in the nzbad or nzlog files, because they are outside the scope of the external table load mechanism. Once processing reaches the normal SQL engine, the external table is treated as if it were a normal table. Unlike an external table, whose rows are in an ordered sequence, normal user tables have no implicit row order other than the hidden rowid columns. A user who is not using rowids therefore has no way to identify the bad row in the SQL engine. In this case, the Netezza system returns an error stating that a particular column caused an error, without identifying the bad row. It is as if the query were selecting from one normal table and inserting into another, with some row causing the error during insertion.

Backing Up and Restoring

You can use external tables to back up a table in the system database.
While the Netezza appliance database backup utility, nzbackup, enables you to create backups of an entire database, the external table backup method allows you to back up a single table and later restore it to the database as needed.

To back up table data using an external table, create an external table definition for each user table and then use SQL to insert into the external table. To restore table data, create a table definition (if it does not exist) and then use SQL to insert into the table from the external table.

Command Syntax

The CREATE EXTERNAL TABLE command has the following syntax.

To create an external table based on another table:

    CREATE EXTERNAL TABLE table_name
    SAMEAS table_name
    USING external_table_options

To create an external table by defining columns:

    CREATE EXTERNAL TABLE table_name
    ({ column_name type [ column_constraint [ ... ] ]} [, ... ] )
    [USING external_table_options]

Note: Although you can specify column constraints, they are ignored, and must be defined elsewhere. For more information, see “Column Constraint Rules for Empty Strings” on page 2-10.

Transient External Tables

Transient external tables (TET) provide a way to define an external table that exists only for the duration of a single query. Transient external tables have the same capabilities and limitations as normal external tables. A special feature of a TET is that the schema does not have to be defined when the TET is used to load data into a table or when the TET is created as the target of a SELECT statement.

Syntax

The following is the syntax for a TET:

    INSERT INTO <table> SELECT <column_list | *>
    FROM EXTERNAL 'filename' [(schema_definition)]
    [USING (external_table_options)];

    CREATE EXTERNAL TABLE 'filename' [USING (external_table_options)]
    AS select_statement;

    SELECT <column_list | *>
    FROM EXTERNAL 'filename' (schema_definition)
    [USING (external_table_options)];

Explicit Schema Definition

The schema of a transient external table can be explicitly defined in a query. When defined this way, the schema definition is the same as the one used with CREATE TABLE.

    SELECT x, y, NVL(dt, current_date) AS dt
    FROM EXTERNAL '/tmp/test.txt' ( x integer, y numeric(18,4), dt date )
    USING (DELIM ',');

The explicit schema definition feature can be used to specify fixed-length formats.

    SELECT * FROM EXTERNAL '/tmp/fixed.txt' ( x integer, y numeric(18,4), dt date )
    USING (FORMAT 'fixed' LAYOUT (bytes 4, bytes 20, bytes 10));

The SAMEAS keyword can also be used to specify that the schema of the external table is identical to that of another table that currently exists in the database.

    SELECT * FROM EXTERNAL '/tmp/test.txt' SAMEAS test_table
    USING (DELIM ',');

Implicit Schema Definition

If the schema is not explicitly defined, the schema for a transient external table is determined by the query being executed.

When a TET is used as the data source for an INSERT statement, the external table takes on the schema of the target table. The columns in the external data file must be in the same order as in the target table, and every column in the target table must also exist in the external data file.

    INSERT INTO target SELECT * FROM external '/tmp/data.txt'
    USING (DELIM '|');

Exporting Data Using Transient External Tables

A transient external table can also be used to export data out of the database. In this case the schema of the external table is based on the query being executed.
Example:

    CREATE EXTERNAL TABLE '/tmp/export.csv' USING (DELIM ',') AS
    SELECT foo.x, bar.y, bar.dt FROM foo, bar WHERE foo.x = bar.x;

Remote Transient External Tables

A session connected to Netezza using ODBC, JDBC, or OLE DB from a client system can import and export data using a remote transient external table, which is defined by using the REMOTESOURCE option in the USING clause.

For example, the following SQL statement loads data from a file on a Windows system into a TEMP table on Netezza, using an ODBC connection.

    CREATE TEMP TABLE mydata AS
    SELECT cust_id, upper(cust_name) as name
    from external 'c:\customer\data.csv' (cust_id integer, cust_name varchar(100))
    USING (DELIM ',' REMOTESOURCE 'ODBC');

Remote external table loads work by sending the contents of a file from the client system to the Netezza server, where the data is then parsed. This method minimizes CPU usage on the client system during a remote external table load.

Supported Data Types

Table 2-1 describes the Netezza supported data types for external tables. For more information about the specific data types, see the Netezza Performance Server Database User’s Guide.

Table 2-1: Supported Data Types

Data Type               Example               Description
----------------------  --------------------  ------------------------------------------------
byteint, smallint,      120, 0, 256,          See “Integer Data Types” on page 2-7.
integer, bigint         1290985
numeric, decimal        -99.56, 123.679       See “Fixed-Point Data Types” on page 2-7.
real, double precision  -81293.35             See “Floating-Point Data Types” on page 2-8.
char (n)                salary                See “Character Strings” on page 2-10 and “Column
                                              Constraint Rules for Empty Strings” on page 2-10.
varchar (n)             I am John             See “Character Strings” on page 2-10 and “Column
                                              Constraint Rules for Empty Strings” on page 2-10.
boolean                 true                  An ASCII string that contains any of the
                                              following values:
                                              [true|false]|[yes|no]|[1|0]|[t|f]|[y|n]
                                              See “BoolStyle” on page 3-3.
date                    2002-02-04            The date is an exact four-byte data type. The
                                              system recognizes a range of dates composed of
                                              year, month, and day. See “DateStyle” on page 3-5.
time                    01:59:45, 23:00:01    See “Time” on page 2-11.
time with time zone     01:15:33 -05          See “Time with time zone” on page 2-12.
timestamp               2002-02-04 01:15:33   See “Timestamp” on page 2-12.

Integer Data Types

Integer types are exact data types. The system generates an error if an input field’s value cannot be expressed without loss of accuracy in the target table. Table 2-2 describes the integer syntax. Table 2-3 describes the integer handling.

Table 2-2: Integer Description

Syntax:
    [ '+' | '-' ] <digit>…
Description:
    • Optional leading sign
    • Unlimited leading zeros
    • At least one decimal digit
Limitation:
    • No thousands-separator commas
    • No support for exponential notation

Table 2-3: Integer Handling

SQL       Alias        Representation   Minimum Value         Maximum Value
--------  -----------  ---------------  --------------------  -------------------
byteint   int1         1 byte, signed   -128                  127
smallint  int2         2 bytes, signed  -32768                32767
integer   int or int4  4 bytes, signed  -2147483648           2147483647
bigint    int8         8 bytes, signed  -9223372036854775808  9223372036854775807

Fixed-Point Data Types

The fixed-point data types are exact data types. The system generates an error if an input field’s value cannot be expressed without loss of accuracy in the target table or database. Table 2-4 lists and describes the fixed-point syntax.

Table 2-4: Fixed-Point Description

Syntax:
    [ '+' | '-' ] <digit>… [ '.' [ <digit>… ] ]
    [ '+' | '-' ] '.' <digit>…
    [ '+' | '-' ] <digit>… [ ',' [ <digit>… ] ]
    [ '+' | '-' ] ',' <digit>…
Description:
    • Optional leading sign
    • Unlimited leading zeros
    • At least one decimal digit
Limitation:
    • No thousands-separator commas
    • No support for exponential notation

The syntax of fixed-point values is the same as the syntax of integer values, with the addition of an optional decimal point that can occur anywhere from before the first decimal digit to after the last decimal digit. If there is at least one decimal digit before the decimal point, the decimal point can be followed by zero or more decimal digits; if there are no decimal digits before it, the decimal point must be followed by one or more decimal digits. If there is no explicit decimal point, the system assumes a decimal point immediately following the last decimal digit.

You can also specify a comma as the decimal separator, using it like the decimal point. For examples of how to do this, see “Decimal Delimiter Examples” on page A-4.

Table 2-5 describes the fixed-point precision and representation:

Table 2-5: Fixed-Point Precision

Precision (P)  Representation
-------------  ----------------
P ≤ 9          4 bytes, signed
9 < P ≤ 18     8 bytes, signed
18 < P ≤ 38    16 bytes, signed

The following result in system errors:

• Precision – Having more decimal digits before the decimal point than the declaration allows (P-S).
• Scale – Having more decimal digits following the decimal point than the declared scale (S).

Note: Because fixed-point is an exact data type, when there are too many digits following the decimal point, the system does not round the number.

Floating-Point Data Types

The floating-point data types are approximate data types. The system rounds the significand if more precision is present than it can represent. Table 2-6 lists the floating-point syntax.

Table 2-6: Floating-Point Description

Syntax:
    [ '+' | '-' ] <digit>… [ '.' [ <digit>… ] ] [ ( 'e' | 'E' ) [ '+' | '-' ] <digit>… ]
    [ '+' | '-' ] '.' <digit>… [ ( 'e' | 'E' ) [ '+' | '-' ] <digit>… ]
    [ '+' | '-' ] <digit>… [ ',' [ <digit>… ] ] [ ( 'e' | 'E' ) [ '+' | '-' ] <digit>… ]
    [ '+' | '-' ] ',' <digit>… [ ( 'e' | 'E' ) [ '+' | '-' ] <digit>… ]
Description:
    • Optional leading sign
    • Unlimited leading zeros
    • At least one decimal digit
    • Decimal point or comma, if needed
    • Optional ‘e’ or ‘E’ introducing an exponent followed by an optional sign and one or more digits
Limitation:
    • No thousands-separator commas
    • No support for loading exceptional values (Not a Number (NaN) values and infinities)

The syntax of floating-point values is the same as the syntax of fixed-point values, augmented by an optional trailing exponent specification. If there is at least one decimal digit before the decimal point, the decimal point can be followed by zero or more decimal digits; if there are no decimal digits before it, the decimal point must be followed by one or more decimal digits. If there is no explicit decimal point, the system assumes a decimal point immediately following the last decimal digit.

You can also specify a comma as the decimal separator, using it like the decimal point. For examples of how to do this, see “Decimal Delimiter Examples” on page A-4.

The optional power-of-ten exponent is introduced by ‘e’ (lowercase or uppercase), followed by an optional sign and a non-empty sequence of decimal digits.

Table 2-7 describes the floating-point precision and representation:

Table 2-7: Floating-Point Precision

Type                                  Real                        Double
------------------------------------  --------------------------  --------------------------
Representation                        4 byte IEEE floating point  8 byte IEEE floating point
Approx. largest normalized value      ±3.40e+38                   ±1.79e+308
Approx. smallest normalized value     ±1.18e-38                   ±2.23e-308
Approx. smallest denormalized value   ±7.01e-46                   ±2.50e-324

The following result in system errors:

• Overflow – If the field exceeds the largest representable value (maximal exponent and maximal significand)
• Underflow – If the number is too small to approximate in the denormalized range

Character Strings

Char(n)/nchar(n) are character strings of length n. Varchar(n)/nvarchar(n) are variable-length character strings of maximum length n. A valid character is between the ASCII values 32 and 255.

System Handling of Characters

Table 2-8 describes how the system handles char, nchar, varchar, and nvarchar characters.
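As a rough illustration of the numeric input syntaxes given earlier in this chapter (Tables 2-2, 2-4, and 2-6), the following Python sketch expresses them as regular expressions. It is not part of the Netezza software and does not reproduce the loader's implementation. The function names (numeric_patterns, matches, fits_numeric) are hypothetical, and the treatment of leading zeros in the precision check is an assumption made for the example.

```python
import re


def numeric_patterns(decimal_delim="."):
    """Compile regexes for the integer, fixed-point, and floating-point syntaxes.

    decimal_delim is '.' (the default) or ',' as allowed by the decimal
    delimiter option. All names in this sketch are illustrative only.
    """
    d = re.escape(decimal_delim)
    sign = r"[+-]?"
    # Integer: optional sign, then one or more digits (leading zeros allowed).
    integer = sign + r"[0-9]+"
    # Fixed-point: digits with an optional delimiter and fraction,
    # or a delimiter followed by at least one digit.
    fixed = sign + r"(?:[0-9]+(?:" + d + r"[0-9]*)?|" + d + r"[0-9]+)"
    # Floating-point: the fixed-point form plus an optional exponent.
    floating = fixed + r"(?:[eE][+-]?[0-9]+)?"
    return {
        "integer": re.compile(integer),
        "fixed": re.compile(fixed),
        "float": re.compile(floating),
    }


def matches(kind, text, decimal_delim="."):
    """Return True if the whole field matches the named syntax."""
    return numeric_patterns(decimal_delim)[kind].fullmatch(text) is not None


def fits_numeric(text, precision, scale, decimal_delim="."):
    """Check a fixed-point field against a numeric(precision, scale) column.

    Assumption: leading zeros do not count toward the P-S digits allowed
    before the decimal point. No rounding is attempted, because the
    fixed-point types are exact.
    """
    if not matches("fixed", text, decimal_delim):
        return False
    whole, _, frac = text.lstrip("+-").partition(decimal_delim)
    return len(whole.lstrip("0")) <= precision - scale and len(frac) <= scale
```

For instance, matches("fixed", "12,5", decimal_delim=",") accepts a comma-separated value, while matches("integer", "1e3") rejects exponential notation, and fits_numeric("123.6789", 6, 3) fails because the field has more digits after the decimal point than the declared scale.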
Table 2-8: Character Handling

Char, Nchar, Varchar, and Nvarchar – How Handled

Padding:
    • Char/Nchar – Padded with spaces to the full column length
    • Varchar/Nvarchar – Not padded
Truncation:
    If the data is longer than the field:
    • The system writes the record to the nzbad file.
    • The system writes the record and column number to the nzlog file.
    Note that you can turn on automatic truncation with the -truncString option.
    Note: If you use this option for Unicode character data, it could truncate combined NFC characters if they exceed the specified column length. The switch does not attempt to keep any grapheme clusters; it truncates data as necessary to fit in the specified column size.

Column Constraint Rules for Empty Strings

For all char(n) and varchar(n) data types, the result of inserting an empty string and filling in missing data values depends upon whether the columns are declared null-able (the default) or not null-able (declared with the constraint not null). Table 2-9 describes the different cases.

Table 2-9: Column Constraining Rule for Empty Strings

Data Type               Column      Null Token Exists                               Null Token Does Not Exist
                        Constraint  Input: null token  Input: "" (empty string)     Input: "" (empty string)
----------------------  ----------  -----------------  ---------------------------  -------------------------
Char/Nchar,             NULL        NULL               char/nchar: space filled;    NULL
Varchar/Nvarchar                                       varchar/nvarchar: zero-
                                                       length string
Char/Nchar,             NOT NULL    ERROR              char/nchar: space filled;    ERROR
Varchar/Nvarchar                                       varchar/nvarchar: zero-
                                                       length string
Bool, Date,             NULL        NULL               NULL                         NULL
Int (1,2,4,8),
Numeric(), Float (4,8),
Time, Timestamp, Timetz
Bool, Date,             NOT NULL    ERROR              ERROR                        ERROR
Int (1,2,4,8),
Numeric(), Float (4,8),
Time, Timestamp, Timetz

If the record contains fewer data values than the columns defined in the table’s schema, the system writes an error to the nzlog file and discards the record. To override this behavior, use the -fillRecord option, which applies to the entire load operation. The -fillRecord option tells the system to use a null value in place of any missing fields. You can use this option as long as the columns whose values are missing allow nulls. If these columns are defined as not null, the system writes an error to the nzlog file and discards the record. You must resolve this conflict by changing the schema to allow null values or by modifying the data file to include a valid non-null value.

Time Data Types

The system supports time, timestamp, and time with time zone. These data types are exact types, stored to an accuracy of 1 microsecond (1/1,000,000 of a second).

You can also specify a comma as the decimal separator in time data types, using it like the decimal point. For examples of how to do this, see “Decimal Delimiter Examples” on page A-4.

Time

The Netezza appliance time is an exact, eight-byte data type stored internally as a signed integer representing the number of microseconds since midnight.

The system accepts both 24-hour and 12-hour AM/PM time values. You can specify the format with the -timeStyle option. The default is the 24-hour format.

The time format consists of five components: hour, minute, second, fraction of a second, and AM/PM token. Hour and minute are required; second and fraction of a second are optional. The AM or PM token is required for the 12-hour format and not allowed for the 24-hour format.

The time options have the following formats. Note that the delimited examples use the default time delimiter, which is a colon (:).

• 12-hour delimited – HH:MM:SS.FFF [AM | PM] (such as 10:12 PM or 1:02:46.12345 AM)
• 12-hour undelimited – HHMMSS.FFF [AM | PM] (such as 1012 PM or 010246.12345 PM)
• 24-hour delimited – HH:MM:SS.FFF (such as 19:15 or 1:15:00.1234)
• 24-hour undelimited – HHMMSS.FFF (such as 1915 or 010246.12345)

In these formats, note the following:

• HH is a one- or two-digit hour value from 1–12 in the 12-hour notation or 0–23 in the 24-hour notation.
  In undelimited format, you must specify two digits, such as 01, 02, and so on.
• MM is a one- or two-digit minute value from 0–59. In undelimited format, you must specify two digits, such as 01, 02, and so on.
• SS is a one- or two-digit seconds value from 0–59. In undelimited format, you must specify two digits, such as 01, 02, and so on.
• FFF specifies a fraction of a second. If you specify a fractional value, you must precede it with a decimal point. If the value can be stored without loss of precision, it is accepted; otherwise, it is rejected. You can use the -timeRoundNanos option to allow rounding when the full precision of the fractional digits cannot be preserved, as described in “Using the -timeRoundNanos Option” on page 10-22.

Time with time zone

The Netezza time with time zone (timetz) is an exact data type stored in 12 bytes. Internally, the Netezza appliance stores it as a time and an offset. The stored offset has the same resolution as time, even though the input is restricted to a one-minute resolution.

Syntax

The input format of a time with time zone value is identical to that of a simple time, followed by a trailing signed offset from Coordinated Universal Time (UTC, formerly Greenwich Mean Time, GMT). The time section must conform to the -timeStyle and -timeDelim settings in effect during the nzload job. You must specify a signed time-zone hour; the time-zone minute is optional. If you use the minute, separate it with a colon (the default timeDelim character).

Note: You cannot use named time zones, such as EST.

Table 2-10 lists the time with time zone syntax.

Errors

The following are time and range errors:

• Time – The same errors as the time data type.
• Range – The time zone offset is restricted to -13:00 to +12:59.

Timestamp

The Netezza appliance timestamp is an exact data type stored as eight bytes. The stored offset has the same resolution as the time data type.
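The storage and range rules described above for time values (microseconds since midnight, no silent loss of precision, and a time-zone offset between -13:00 and +12:59) can be sketched in Python as follows. This is an illustration only, not Netezza code: the function names are hypothetical, the sketch covers only the 24-hour delimited format with the default colon delimiter, it assumes conventional clock ranges (0-23 hours, 0-59 minutes and seconds), and it does not model the -timeRoundNanos option.

```python
import re

# 24-hour delimited time with the default ':' delimiter: HH:MM[:SS[.FFF]]
TIME_24H = re.compile(
    r"([0-9]{1,2}):([0-9]{1,2})(?::([0-9]{1,2})(?:\.([0-9]+))?)?"
)
# Offset as in the time with time zone syntax: a sign, one or two hour
# digits, and an optional ':' plus minutes after a two-digit hour.
OFFSET = re.compile(r"([+-])([0-9])(?:([0-9])(?::([0-9]{1,2}))?)?")


def time_to_micros(text):
    """Convert a 24-hour delimited time to microseconds since midnight."""
    m = TIME_24H.fullmatch(text)
    if m is None:
        raise ValueError(f"not a 24-hour delimited time: {text!r}")
    hour, minute = int(m.group(1)), int(m.group(2))
    second = int(m.group(3) or 0)
    # Assumption: conventional clock ranges.
    if hour > 23 or minute > 59 or second > 59:
        raise ValueError(f"time component out of range: {text!r}")
    # Trailing zeros carry no information, so only the remaining digits
    # must fit in the six digits that microsecond resolution provides.
    frac = (m.group(4) or "").rstrip("0")
    if len(frac) > 6:
        raise ValueError(f"fraction would lose precision: {text!r}")
    micros = int(frac.ljust(6, "0")) if frac else 0
    return ((hour * 60 + minute) * 60 + second) * 1_000_000 + micros


def offset_to_minutes(text):
    """Convert a signed time-zone offset to minutes, enforcing the range."""
    m = OFFSET.fullmatch(text)
    if m is None:
        raise ValueError(f"not a signed time-zone offset: {text!r}")
    sign = -1 if m.group(1) == "-" else 1
    hours = int(m.group(2) + (m.group(3) or ""))
    minutes = int(m.group(4) or 0)
    if minutes > 59:
        raise ValueError(f"offset minutes out of range: {text!r}")
    total = sign * (hours * 60 + minutes)
    # The documented range is -13:00 to +12:59.
    if not -13 * 60 <= total <= 12 * 60 + 59:
        raise ValueError(f"offset outside -13:00 to +12:59: {text!r}")
    return total
```

With this sketch, time_to_micros("1:15:00.1234") succeeds because four fractional digits fit within microsecond resolution, whereas a value with seven significant fractional digits raises an error instead of being rounded.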
Syntax

The input format of a timestamp value is a date value followed by a time value. You can have optional spaces between the date and the time. The date section must conform to the -dateStyle and -dateDelim settings in effect during the load job.

Table 2-10: Time With Time Zone Description

time with time zone:
    <time> ( '+' | '-' ) <digit> [ <digit> [ ':' <digit> [ <digit> ] ] ]

Table 2-11 lists the timestamp syntax.

Errors

The following are date and time errors:

• Date – The same errors as the date data type.
• Time – The same errors as the time data type.

Restrictions

The following are restrictions and considerations for use with external tables:

• Always consider your source and target systems, and whether the data is properly formatted for loading.
• To insert into and drop an external table, use the INSERT and DROP commands. You cannot delete, truncate, or update an external table.
• After creating an external table, you can alter as well as drop the table definition. (Dropping an external table deletes the table definition, but it does not delete the data file that is associated with the table.)
• You can select the rows in the table, as well as insert rows into the table (following a table truncation).
• While you cannot select from more than one external table at a time in a query or subquery, you can move data from one external table to another, such as by using SELECT and INSERT.
• The system displays an error if you incorrectly specify multiple external tables in a SQL query, or if you reference the same external table more than once in a query:

    *ERROR: Multiple external table references in a query not allowed*

  To use data from more than one external table in a query, load the data into a non-external table and specify this table in the query.
• You cannot perform a union operation involving two or more external tables.
• You cannot back up external tables using the nzbackup command, and attempting to do so displays a warning message.
• You cannot use the limit clause with compressed external tables.
• You can run a maximum of 300 loads concurrently.

Table 2-11: Timestamp Description

timestamp:
    <date> <time>

Best Practices

When specifying external tables, note the following:

• An external table reference can appear as the source table of a SELECT FROM statement. Note that a transient external table reference in a SELECT FROM clause infers its shape from the preceding INSERT INTO clause.
• In Netezza Release 4.6 and later, the system catalog data types TEXT and NAME are treated as NVARCHAR. If these types are used in the table that is referenced in the select_clause, include the encoding option in the CREATE EXTERNAL TABLE command to specify internal encoding. Otherwise you could receive the error “LATIN9 encoding cannot be specified with NCHAR/NVARCHAR column definitions.” For example:

    create external table '/tmp/ext1' using (encoding 'internal') as select username from _t_user;

• The CREATE EXTERNAL TABLE AS statement supports an optional table name. If you do not provide a table name, the table is transient, which means the external table definition does not persist in the system catalog. If you supply a table name, the external table becomes a named object in the system catalog.
• The USING clause in the inline external statement is optional. If you omit it, the resulting external table has the default settings. Note that you must specify the USING clause in the CREATE EXTERNAL TABLE SAMEAS statement, because the SAMEAS table might be another external table.
• When you insert data into an external table that references an existing data file, the system truncates the file before inserting the external table’s data.
• You cannot use external tables in complex SQL statements. If the statement is not supported, the system displays an error.
• Before you reload an external table, verify that the destination table in the database is empty or that it does not already contain the rows in the external table that you are about to reload. If the destination table already contains those rows, unintended problems can occur. The same problems can occur if you accidentally reload the external table more than once.

  For example, loading a text-format external table into a destination table that already contains the same data creates duplicate data in the database. The rows will have unique row IDs, but the data will be duplicated. To fix this problem, you would have to delete the duplicate rows, or truncate the database table and reload the external table (but only once).

  If you load a compressed binary format external table into a destination table that already has the same rows, you create duplicate rows with duplicate row IDs in the database table. The system restores the rows using the same row IDs saved in the compressed binary format file. Duplicate row IDs can cause incorrect query results and could lead to problems in the database. You can check for duplicate row IDs using the rowid keyword as follows:

    SELECT rowid FROM employee_table GROUP BY rowid HAVING count(rowid) > 1;

  If the query returns any rows, the table contains rows that share the same row ID. Truncate the database table and reload the external table (but only once).

• After you load data from an external table into a user table, run GENERATE STATISTICS to update the statistics for the user table. This improves the performance of queries that run against that table.

Examples

The following examples show how to use the CREATE EXTERNAL TABLE command.
To create an external table, enter:

    CREATE EXTERNAL TABLE ext_orders(ord_num INT, ord_dt TIMESTAMP)
    USING (dataobject('/tmp/order.tbl') DELIMITER '|');

To create an external table that uses column definitions from an existing table, enter:

    CREATE EXTERNAL TABLE demo_ext SAMEAS emp
    USING (dataobject ('/tmp/demo.out') DELIMITER '|');

To create an external table and specify the escape character (‘\’), enter:

    CREATE EXTERNAL TABLE extemp SAMEAS emp
    USING (dataobject ('/tmp/extemp.dat') DELIMITER '|' escapechar '\');

To unload data from your database into a file by using an insert statement, enter:

    INSERT INTO demo_ext SELECT * FROM weather;

To drop an external table, enter:

    DROP TABLE extemp

The system removes only the external table’s schema information from the system catalog. The file defined in the dataobject option remains unaffected in the filesystem.

To back up by creating an external table, enter:

    CREATE EXTERNAL TABLE '/path/extfile'
    USING (FORMAT 'internal' COMPRESS true) AS SELECT * FROM source_table;

To restore from an external table, enter:

    INSERT INTO t_desttbl SELECT * FROM EXTERNAL '/path/extfile'
    USING (FORMAT 'internal' COMPRESS true);

Transient External Table

The following examples show how to specify the shape of a transient external table:

To take on the schema of the target table:

    insert into <table> select * from external '<file>' [USING(...)]

To take on the schema of the query:

    create external table '<file>' [USING (...)] as <QUERY>

To take on the schema of <table>:

    select * from external '<file>' sameas <table> [USING(...)]

To take on the schema as defined:

    select * from external '<file>' (schema) [USING(...)]

To take on the schema as defined:

    create external table '<file>' (schema) [USING(...)]

To make the source file FIXED format with the schema as defined:

    select * from external '<file>' (schema) USING (FORMAT 'FIXED' LAYOUT (...))

To make the source file FIXED format and the table takes on the schema of the target table:

    insert into <table> select * from external '<file>' USING (FORMAT 'FIXED' LAYOUT (...))

The following example will not work, because you cannot unload data into a FIXED format external table:

    create external table '<file>' [(schema)] USING (FORMAT 'FIXED' LAYOUT ... )

Fixed-Length Format

The following examples show how to use Fixed-Length format with external tables:

To load data in fixed format, enter:

    INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed'
    USING ( FORMAT 'FIXED' LAYOUT (BYTES 20, REF BYTES 3, BYTES @2) )

To load data with different date/time delimiters for different zones, enter:

    INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed'
    USING ( FORMAT 'FIXED' LAYOUT ( YMD '-' BYTES 15, DMY '/' BYTES 15 ) )

To load spatial data (binary data into VARCHAR), enter:

    INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed'
    USING ( FORMAT 'FIXED' CTRLCHARS true LAYOUT ( BYTES 100, REF BYTES 4, BYTES @2) )

To load fixed format data with record-length and no record-delimiter, enter:

    INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed'
    USING ( FORMAT 'FIXED' RECORDDELIM '' RECORDLENGTH @1
    LAYOUT( REF BYTES 2, BYTES 120, REF BYTES 2, BYTES @3) )

To load data with different NULLIF clauses for different zones, enter:

    INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed'
    USING ( FORMAT 'FIXED' LAYOUT ( BYTES 15 NULLIF '2000-10-10', BYTES 2 NULLIF & = '12') )

To load data with NULLIF clauses referring other zones, enter:

    INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed'
    USING ( FORMAT 'FIXED' LAYOUT ( REF BYTES 2, BYTES @1 NULLIF @1 = -1,
    REF BYTES 4, BYTES 100 NULLIF &&3 = 'null' ) )

Standard Unloading and Reloading

The following examples unload and load a user table to an external table in text-delimited format. Unloading is not supported for Fixed-Length format.
To create a text-format external table, enter:

    CREATE EXTERNAL TABLE extemp SAMEAS emp USING (DATAOBJECT ('/tmp/emp.dat'));

To unload data in user table EMP to the external table EXTEMP, enter:

    INSERT INTO extemp SELECT * FROM emp;

To load data into user table EMP from external table EXTEMP, enter:

    TRUNCATE TABLE emp;
    INSERT INTO emp SELECT * FROM extemp;

Back up and Restore a User Table

The following examples show how to back up and restore the user table EMP to an external table in binary compressed format.

To create a compressed binary format external table definition called emp_backup for the table emp, enter:

    CREATE EXTERNAL TABLE emp_backup SAMEAS emp
    USING ( DATAOBJECT ('/tmp/emp.bck') COMPRESS true FORMAT 'internal');

To back up the emp table data into emp_backup, enter:

    INSERT INTO emp_backup SELECT * FROM emp;

To restore the emp table from emp_backup, make sure that the emp table is empty and enter:

    TRUNCATE TABLE emp;
    INSERT INTO emp SELECT * FROM emp_backup;

CHAPTER 3
External Table Options

What's in this chapter
    Options
    Option Details
    Option Processing
    Session Variables

This chapter describes the options used with external tables. For examples of how to use external tables, see Appendix A, "Examples and Grammar."

Options

When you create an external table definition, you can specify options. There are different types of options: some are for records/rows, some are for fields, and some are for loads. Use these options when loading from an external table or when using the external table directly in a SQL query.

Note: The best method to verify that the load processing has been successful is to ensure the system records any errors to the nzlog and nzbad files. Check these files occasionally.

Table 3-1 lists the external table options, and a description of each option follows.
In the Valid Formats column, Text refers to Text-Delimited format and Fixed refers to Fixed-Length format. In the Data Type column, enumeration means that the system accepts one of a specified set of quoted or unquoted string values.

Table 3-1: External Table Options

Option                         Valid Formats  Values                  Default          Unload  Data Type
                                                                                       Y/N
BoolStyle                      Text, Fixed    1_0/T_F/Y_N…            NULL, 1_0        Y       enumeration
Compress                       Text, Fixed    True/False              False            Y       boolean
CRinString                     Text, Fixed    True/False              NULL, False      Y       boolean
CtrlChars                      Text, Fixed    True/False              NULL, False      N       boolean
DataObject                     Text, Fixed    Existing file path      No default       Y       filename
DateDelim                      Text, Fixed    1-byte                  NULL, "-"        Y       string
DateStyle                      Text, Fixed    YMD/MDY/DMY…            NULL, YMD        Y       enumeration
DecimalDelim                   Text, Fixed    1-byte                  '.'              Y       string
Delimiter                      Text           1-byte                  NULL, "|"        Y       string
Encoding                       Text           Internal/Latin9/Utf8    NULL, Internal   Y       enumeration
EscapeChar                     Text           1-byte                  NULL             Y       string
FillRecord                     Text           True/False              NULL, False      N       boolean
Format                         Text, Fixed    Text/Internal/Fixed     Text             Y       enumeration
IgnoreZero                     Text           True/False              NULL, False      N       boolean
IncludeZeroSeconds             Text           True/False              NULL, False      Y       boolean
Layout                         Text, Fixed    Zone definitions        NULL, Inherit    N       none
LogDir                         Text, Fixed    existing dir path       NULL, /tmp       N       string
MaxErrors                      Text, Fixed    >=0                     NULL, 1          N       integer
MaxRows                        Text, Fixed    >=0                     NULL, 0          N       integer
NullValue                      Text, Fixed    4-bytes                 NULL, "NULL"     Y       string
QuotedValue                    Text           No/Yes/Single/Double    NULL, No         N       enumeration
RecordDelim                    Text, Fixed    4-bytes                 NULL, /newline   N       string
RecordLength                   Fixed          Integer/Zone-ref expr   NULL             N       integer
RemoteSource                   Text, Fixed    ODBC/JDBC               NULL             Y       enumeration
RequireQuotes                  Text           True/False              NULL, False      N       boolean
SkipRows                       Text, Fixed    >=0                     NULL, 0          N       bigint
SocketBufSize                  Text, Fixed    64KB-2GB                8MB              Y       integer
TimeDelim                      Text, Fixed    1-byte                  NULL, ":"        Y       string
TimeRoundNanos/TimeExtraZeros  Text           True/False              NULL, False      N       boolean
TimeStyle                      Text, Fixed    24hour/12hour           NULL, 24hour     Y       enumeration
TruncString                    Text           True/False              NULL, False      N       boolean
Y2Base                         Text, Fixed    >=0                     NULL, 0          N       integer

Option Details

The following sections detail the different options.

BoolStyle

Specifies the boolean style. During a load, the loader can handle only a specific style of boolean values. Table 3-2 lists the styles and their values.

Table 3-2: Boolean Values

Style Name    Value
1_0           1 or 0
T_F           T or F
Y_N           Y or N
YES_NO        YES or NO
TRUE_FALSE    TRUE or FALSE

The default style is 1_0. The values can be expressed in mixed case, so 'true' can be 'True' or 'TRUE' or 'tRuE'.

If you specify the YES_NO option on the command line, the system assumes that the data in the Boolean field is in the form yes or no. If the data is any of the other values (true, false, 1, 0, t, f, y, or n), the system discards the record to the nzbad file and logs an error with the record number in the nzlog file.

Compress

Specifies whether the source datafile data is compressed or not. The valid values are true or on, false or off. The default is false. This can only be true if the format is set to 'internal'.

CRinString

Specifies whether to allow unescaped carriage returns in char/varchar and nchar/nvarchar fields. Acceptable values are true or false, on or off. Do not put quotes around the value.

    False (default): Treats all CR or CRLF as end-of-record.
    True: Accepts unescaped CR in char/varchar fields (LF becomes the only end of row).

Note: This option is different for Fixed-Length format. For more information, see "Changed Options" on page 6-3.

CtrlChars

Specifies whether to allow an ASCII value 1-31 in char/varchar and nchar/nvarchar fields. You must escape NULL, CR, and LF characters. Acceptable values are: true or false, on or off. The default is false. Do not insert quotes around the value.

Note: This option is different for Fixed-Length format. For more information, see "Changed Options" on page 6-3.

DataObject

Specifies the OS path to the source datafile (or any media that can be treated as a file). There is no default, and this must be specified. When the remotesource option is not set (or set to empty string), this path has to be an absolute path and not a relative path. The filename must be a valid UTF-8 string.

For loads, this file has to be an existing file with READ permission for the OS user initiating the load.

For unloads, the parent directory of this file has to have READ-WRITE permissions for the OS user initiating the unload, and the data file is overwritten if it already exists.

DateDelim

Specifies the delimiter character that separates the date components, and is used with the dateStyle option. The default is '-' for all dateStyles except MONDY[2], where the default is ' ' (space). This is a single-byte string.

If you specify the option as an empty string, which means that there is no delimiter between the date components, you must specify days and months as two-digit numbers. Single-digit months and days are not supported.

With MonDY or MonDY2, the default dateDelim option is space. With days and months less than 10, use either one or two digits, or a space followed by a single digit. With the dateDelim option as a space, the system allows a comma after the day.

With any component (day, month, year) as zero, or any day/month inconsistency, such as August 32 or February 30, the system returns an error.

Table 3-3 lists dateDelim option examples.
Table 3-3: The -dateDelim

No dateDelim    -dateDelim ','    -dateDelim ' ' (space)
Jan 01 2003     Jan 01,2003       Jan 01, 2003
Jan 1 2003      Jan 1,2003        Jan 1, 2003
Jan  1 2003     Jan  1,2003       Jan  1, 2003

Note: If not using delimiters, the date will be determined as in the following example for June 12, 2009: 06122009

DateStyle

Specifies how to interpret the date format. The date style settings are 'YMD', 'MDY', 'DMY', 'DMONY', and 'MONDY'. The default is YMD.

Note: The two-digit year formats (Y2MD, MDY2, DMY2, DMONY2 and MONDY2) are not supported for unloads.

The dateStyle options are shown in Table 3-4.

Table 3-4: DateStyle

Sequence of Date Components    Four-digit Year    Two-digit Year
Year Numeric-month Day         YMD                Y2MD
Day Numeric-month Year         DMY                DMY2
Numeric-month Day Year         MDY                MDY2
Alphabetic-month Day Year      MonDY              MonDY2
Day Alphabetic-month Year      DMonY              DMonY2

The SQL standard stipulates that the legal years are 0001 to 9999. There is no provision in SQL for years prior to 0001 AD or later than 9999 AD.

Date example: In the data file jan-01.data, the data are specified as the following:

    14255932|30/06/2002|20238|20127|40662|157|

Because the date value is using the DD/MM/YYYY format, specify the following dateStyle and dateDelim options:

    nzload -t agg_month -df jan-01.data -delim '|' -dateStyle DMY -dateDelim '/'

DecimalDelim

Specifies the decimal delimiter for the following data types, for both text-delimited and fixed-length formats: float, double, numeric, time, timetz, and timestamp. Default is '.'. For examples of usage, see "Decimal Delimiter Examples" on page A-4.

Delimiter

Specifies the field delimiter. The default is the pipe character '|'. You can specify characters in the 7-bit ASCII range using either a quoted value (for example: delimiter '|') or the unquoted decimal number (delimiter 124). To specify a byte value above 127, use the decimal number. This is a single-byte string.

Note: For nzload, the default is '\t' (tab).

Note: This option is not supported for Fixed-Length format.

The system processes an input row by identifying the successive fields within that row. A single character field delimiter separates adjacent fields. The lack of a field delimiter between fields is an error. You can use a trailing field delimiter following the last field in a row (but it is not required).

You can specify the following delimiters:

    Numeric: 0xNN or NN, where NN is a number in either hexadecimal or decimal.
    Control characters: ^A - ^Z (low-order 5 bits) and ^a - ^z (low-order 5 bits).
    Symbols: \b backspace (8), \t horizontal tab (9), \n line feed (10), \f form feed (12), \r carriage return (13), \\ backslash, \' single quote, \" double quote.
    Literal: Any character, such as c (the non-control character c).

To use a character other than a 7-bit ASCII character as a delimiter, make sure that you specify it as a decimal or hex number. Do not specify a character literal, which could result in errors from encoding transformation. For example, to use the hex value 0xe9 as a delimiter (which is é in Latin9), use -d 0xe9 as the value. Do not use -d 'é'.

Although the system accepts alphanumeric characters, to avoid ambiguity do not select a delimiter that conflicts with the data in a field. Also, if you use the dateDelim and timeDelim options, select different delimiters for each type.

Note: When you are using the nzload wrapper you can enter escape characters on the command line, such as \b. If you use the CREATE EXTERNAL TABLE command, the only special character you can specify is \t ("\t").

Encoding

Specifies the encoding of the datafile for the character set. The default is 'internal'.
The system supports single-byte characters in Latin9 encoding, and Unicode data in the multi-byte UTF-8 encoding. Use the encoding option to specify the type of data in the file. The encoding option has three values:

    'latin9': The whole file is Latin-9 char/varchar data and has no nchar/nvarchar data. (If the file contains any nchar/nvarchar data, it will be rejected by the load operation.)
    'utf8': The whole file is in UTF-8 encoding and has only nchar/nvarchar data and no char/varchar data. (If the file contains any char/varchar data, it will be rejected by the load operation.)
    'internal': The file could have either or both Latin-9 and UTF-8 data using any or all of the char, varchar, nchar, or nvarchar data types. As a best practice, use 'internal' if you are not certain of the data encoding.

For more information, see the "Using International Character Sets" chapter in the Netezza Performance Server Database User's Guide.

Use the nzconvert command to convert character encoding before loading with external tables. For the command options and examples, refer to "Converting Legacy Formats" in the Netezza Performance Server Database User's Guide.

Note: This option is not supported for Fixed-Length format.

EscapeChar

Specifies the use of an escape character. The character immediately following the '\' is escaped. The only supported value is '\', and the default is no escaping.

By default, the system expects fields to be delimited by a field-delimiter character or by an end-of-row sequence. The system assumes all other characters are part of the field's value.
Although efficient, this representation has the drawback that string fields may not contain instances of the field delimiters. In addition, one value typically becomes inexpressible because you have used it to convey the absence of any value (that is, that column is null).

One solution is to use an escape character for the delimiter. For example, the following command line demonstrates using the escapeChar option.

    nzload -escapeChar '\' -nullValue 'NULL' -delim '|'

    |NULL|     A null input field
    |\NULL|    A non-null input field containing the text NULL
    |\||       A non-null input field containing the single character |
    |\\|       A non-null input field containing the single character \

Note: This option is not supported for Fixed-Length format.

FillRecord

Specifies whether to allow an input line with fewer columns than the table definition. Missing or trailing input fields should be treated as nulls if the columns are nullable. The default is false.

The system expects one input field for every column in the target table's schema, and rejects a row with fewer fields. If you specify the fillRecord option, the system allows omitting one or more trailing (rightmost) fields, as long as all corresponding columns can be null.

Note: This option is not supported for Fixed-Length format.

Format

Specifies the data format of the source file to load and unload. The valid values are as follows:

    'text' (default): Data in Text-Delimited format
    'fixed': Data in new Fixed-Length format
    'internal': Data in compressed binary format (to use this, the compress option must be set to true)

IgnoreZero

Specifies discarding byte value zero in char() and varchar() fields. The default is false. If true, the command accepts binary value zeroes in input fields and discards them.

Note: This option is not supported for Fixed-Length format.

IncludeZeroSeconds

Specifies that "00" seconds values will be unloaded to the external table.
For example, a time value such as 12:34:00 or 12:34 will be unloaded to the external table in the format 12:34:00. The default is false.

Note: This option is not supported for Fixed-Length format, and is only for unloading.

Layout

Specifies the zone definitions.

Note: This option is used only with the Fixed-Length format. For more information, see "New Options" on page 6-2.

LogDir

Specifies the directory to which nzlog and nzbad files are generated for loads. This is not used for unloads. The default value is '/tmp'. Note that when doing remote loads from Windows clients (through ODBC/JDBC), the default output directory is mapped to "C:\". The directory name must be a valid UTF-8 string.

MaxErrors

Specifies the number of errors at which the system stops processing rows. If the count of rejected rows reaches this threshold, the system immediately aborts and rolls back the load. The default value is 1. This default has the effect of committing a load only if it contains no errors.

A maxErrors value n (where n is greater than 1) allows the first n-1 row rejections to be recoverable errors, not including the number of rows processed in the skipped row range. Use this option to specify a different value, from 0 (unlimited errors) up to 2,147,483,647 (the largest signed 32-bit integer).

Note: This option is different for Fixed-Length format. For more information, see "Changed Options" on page 6-3.

MaxRows

Specifies the number of initial rows after which the system stops processing. To limit the data that is loaded, you can also use a limit clause with the select statement. The default is 0 (load all rows).

After processing a row (whether inserted, skipped or rejected), the system decides whether to look for another input row:

    If you did not specify the maxRows option, the system attempts to locate the next input row.
    If you specified the maxRows option and the input row counter is equal to the maxRows count, the system ends the load and commits all inserted records, not including the rows processed in the skipped row range.
    Otherwise, the system attempts to locate the next input row.

NullValue

Specifies the string to use for the null value, with a maximum 4-byte UTF-8 string. The default is 'NULL'. You can specify a value such as a space (' ') or any string up to four characters.

Conceptually a field contains either a value or an indication that there is no value. The system provides some flexibility in how you indicate that a field contains no value. For more information about how the system handles nulls, see "Column Constraint Rules for Empty Strings" on page 2-10.

The system determines a field's type and whether it is null by inspecting the corresponding column declaration:

    If there is no value, the system sets the corresponding value in the candidate binary record to null.
    If you declared the target column "not null," then an absence of a value is an error.
    If a field does not indicate null, the system assumes it contains a value. The system analyzes the contents of that field, converts its textual input representation to binary, and sets the corresponding value in the candidate binary record to that value.

QuotedValue

Specifies whether data values are quoted or not. The default is NO. Specify SINGLE or YES to require single quotes or DOUBLE to require double quotation marks. You can precede the opening quote or follow the closing quote with spaces. You can use the actual quote characters if you enclose them in double quotes.

The system recognizes the end of the field by a field-delimiter character or an end-of-row sequence. The system recognizes a quoted value when the first non-space character is the quote character specified in the quotedValue option.
If the first non-space character is not the specified quote character, then the system handles it according to the normal rules. In particular, leading or trailing spaces in string fields are considered part of the string's value.

For example, the following command line demonstrates using the quotedValue option.

    nzload -quotedValue SINGLE -nullValue 'NULL' -delim '|'

    |NULL|        A null input field
    |'NULL'|      A null input field
    | I'm |       A non-null input field containing the text " I'm "
    | 'I''m' |    A non-null input field containing the text "I'm"
    | '|' |       A non-null input field containing the single character "|"
    |' '|         A non-null input field containing a single space
    | |           A non-null input field containing a single space
    | '' |        A non-null input field containing a zero-length string
    ||            A non-null input field containing a zero-length string

Note that unlike the escapeChar option, the quotedValue option is not able to force the system to accept the nullValue token as a valid non-null input value.

The system overhead for processing quoted value syntax is much greater than the default unquoted syntax. In addition, except for strings containing three or more field delimiters that need to be escaped and no embedded quotes, using the quotedValue option results in more bytes of input data than the escapeChar option. When you have a choice, use unquoted syntax.

If you expect all values in all input fields (string or otherwise) to be uniformly enclosed in quotes, then use the requireQuotes option to cause the system to enforce this usage. Using the requireQuotes option reduces the parsing overhead and provides extra robustness.

Note: This option is not supported for Fixed-Length format.

RecordDelim

Specifies that the row/record delimiter to be used is the string literal. Valid values must be a maximum 8-byte UTF-8 string.

Note: This option is used only with the Fixed-Length format. For more information, see "New Options" on page 6-2.

RecordLength

Specifies the length of the entire record. The value includes the length itself, but does not include the record delimiter.

Note: This option is used only with the Fixed-Length format. For more information, see "New Options" on page 6-2.

RemoteSource

Specifies that the source datafile is remote, and takes the following values: ODBC, JDBC or empty string. External tables created with the remote source set to ODBC or JDBC are usable only through ODBC or JDBC respectively. External tables created with the remote source not set (or set to empty string) are usable from any client (the source datafile path is assumed to be on the Netezza host, even if the load/unload is initiated remotely from a different host).

Note that nzsql does not support remote loads/unloads using external tables (you can only create external tables remotely), though it does support loads/unloads locally on the host.

This option is automatically set to ODBC if the hostname option is set to anything but localhost or the reserved IP address (127.0.0.1).

RequireQuotes

Specifies whether quotes are mandatory. The default is false. If set to true, the quotedValue option must be set to YES, SINGLE, or DOUBLE. See "QuotedValue" on page 3-10.

Note: This option is not supported for Fixed-Length format.

SkipRows

Specifies the number of initial rows to skip before loading the data. The default is 0 (none).

After the system has a candidate binary record from an input row, it determines whether to insert that record into the target table:

    If you did not specify this option, the system inserts every record.
    If you specified this option and the input row counter is less than or equal to the skipRows count, the system discards the candidate binary record (skipped).
    Otherwise, the system inserts the record.

Note: If you use the skipRows option, the system skips that number of rows, and then begins the count for the maxErrors and/or maxRows options (if you have specified them).
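The way the skipRows, maxRows, and maxErrors counters interact can be modeled in a few lines of Python. This is only a simplified sketch of the rules as this chapter words them, not the loader's implementation: the function name simulate_load and its return dictionary are hypothetical, each input row is reduced to a flag saying whether it parsed cleanly, and errors inside the skipped range are not modeled at all.

```python
def simulate_load(rows_ok, skip_rows=0, max_rows=0, max_errors=1):
    """Model the row counters described for skipRows, maxRows and maxErrors.

    rows_ok    -- one boolean per input row: True if the row has no errors
    skip_rows  -- initial rows discarded before counting starts (0 = none)
    max_rows   -- stop after this many counted rows (0 = load all rows)
    max_errors -- abort when this many rows are rejected (0 = unlimited)
    """
    inserted = rejected = counted = 0
    for row_number, ok in enumerate(rows_ok, start=1):
        if row_number <= skip_rows:
            # Skipped rows do not count toward maxRows or maxErrors.
            continue
        counted += 1
        if ok:
            inserted += 1
        else:
            rejected += 1
            if max_errors and rejected >= max_errors:
                # Threshold reached: the load aborts and is rolled back.
                return {"status": "rolled back", "inserted": 0, "rejected": rejected}
        if max_rows and counted == max_rows:
            break  # maxRows reached: end the load and commit
    return {"status": "committed", "inserted": inserted, "rejected": rejected}


if __name__ == "__main__":
    # One skipped row, then one rejected and one good row before maxRows stops the load.
    print(simulate_load([True, False, True, True], skip_rows=1, max_rows=2, max_errors=2))
```

With the default maxErrors of 1, the first rejected row rolls the whole load back, which is why the default commits a load only if it contains no errors.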
Note that this option cannot be used for 'header' row processing in a datafile. Even the skipped rows are processed first, so the data in the header rows must be valid with respect to the external table definition. This option can be used for doing a dry-run to validate that the datafile is correct, before loading into a user table, by setting a maximum value.

SocketBufSize

Specifies the chunk size at which to read the data from the source file, expressed in bytes. Valid values range from 64KB to 800MB, with a default value of 8MB. Values outside this range result in a system notice that the value will be reset to the appropriate minimum or maximum level. This is used to fine-tune the performance of loads, depending on the speed at which the source data is available for loads.

TimeDelim

Specifies the single-byte character that separates the time components. The default is ':'.

If you specify the timeDelim option as an empty string, you must specify the hour, minutes, and optional seconds as two-digit numbers.

If you specify the 12-hour format, you can precede the AM or PM token with a single space. Note that the tokens AM and PM are case-insensitive.

The system checks for syntax and range errors. If an error occurs, the system discards the record to the nzbad file and logs an error with the record number in the nzlog file.

TimeRoundNanos

Rounds the time value to six fractional seconds digits. Use the timeRoundNanos option to accept, and round, values that have non-zero digits beyond microsecond precision.

If you do not use the timeRoundNanos option, a value is accepted as long as it can be stored without loss of precision. If you specify this option, the value is accepted even when full precision of any fractional seconds cannot be preserved. In this case, the value is rounded.
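The rounding to six fractional digits can be sketched in Python as shown here. The function name round_to_micros is hypothetical, and round-half-up is an assumption: this guide does not state which rounding mode the loader uses. The sketch shows only the arithmetic, including how a rounded-up fraction can carry into the seconds, the day, and the year.

```python
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP


def round_to_micros(date_part, time_part):
    """Round a timestamp with sub-microsecond digits to microsecond precision.

    date_part -- date as YYYY/MM/DD
    time_part -- time as HH:MM:SS[.fraction], with any number of fraction digits
    """
    whole, _, fraction = time_part.partition(".")
    base = datetime.strptime(f"{date_part} {whole}", "%Y/%m/%d %H:%M:%S")
    # Decimal arithmetic avoids binary floating-point error at the 7th digit.
    seconds = Decimal("0." + (fraction or "0"))
    micros = int((seconds * 1_000_000).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    # timedelta carries a rounded-up fraction into the next second if needed.
    return base + timedelta(microseconds=micros)


if __name__ == "__main__":
    print(round_to_micros("1999/12/31", "23:59:59.9999994"))
    print(round_to_micros("1999/12/31", "23:59:59.9999995"))
```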
For example, consider the following timestamps:

    1999/12/31 23:59:59.9999994
    1999/12/31 23:59:59.9999995

Both of these timestamps specify finer than microsecond resolution. Without the option, each would be rejected. Using the option, the first sample timestamp would round to:

    1999/12/31 23:59:59.999999

The second sample would round to:

    2000/01/01 00:00:00.0

Note: This option is not supported for Fixed-Length format, and is also referred to as the TimeExtraZeros option.

TimeStyle

Specifies the time format ('24HOUR', '12HOUR') used in the data file. The default is '24HOUR'.

TruncString

Specifies truncating a string and inserting it into the declared storage.

    False (default): The system reports an error when a string exceeds its declared storage.
    True: Truncate any string value that exceeds its declared char/varchar storage.

Note: This option is not supported for Fixed-Length format.

Y2Base

If you specify the Y2-style date, use the -y2Base option to specify the start of the 100-year range. Table 3-5 provides some examples of date ranges and their corresponding input values.

Table 3-5: The -y2Base Option

Desired Range    1900…1999       1923…2022       1976…2075       2000…2099
Option           -y2Base 1900    -y2Base 1923    -y2Base 1976    -y2Base 2000
Y2 input
00               1900            2000            2000            2000
01               1901            2001            2001            2001
02               1902            2002            2002            2002
…
24               1924            1924            2024            2024
25               1925            1925            2025            2025
…
76               1976            1976            1976            2076
77               1977            1977            1977            2077
…
98               1998            1998            1998            2098
99               1999            1999            1999            2099

Option Processing

This section contains additional information on how the system processes the options.

Counting Rows

The system uses a line-oriented input format: one line of text is an input row. It operates by isolating successive rows in the input stream. Every time it finds a new row, it increments a row counter (starting with number 1) and analyzes the contents of the row. During analysis two sorts of errors can occur:

    The input text may not match the expected format.
    A field value might fail to meet a requirement imposed by the target table schema.

If a row contains no errors, the system converts the row into a candidate binary record.

Handling Bad Rows

When the system encounters an error, it stops analyzing the row, appends the row to the bad rows file, writes a supporting diagnostic message to the nzlog file describing the position and nature of the error, and increments a rejected rows counter.

Delineating Input Rows

Input rows are separated by any of the common end-of-line conventions: <CR><LF>, <LF><CR>, <CR>, or <LF>. In UNIX environments <LF> is commonly known as NewLine. The last row/line need not have an end-of-line character.

Neither of the pairs <CR><CR> nor <LF><LF> is a valid end-of-line sequence. Instead each pair encloses an empty row containing no values. The system considers such an empty row valid only if you specified the fillRecord option, and every column in the target table is capable of being set to null.

Matching Input Fields to Table Columns

The system determines the shape of input rows by inspecting the target table's schema. The fields are paired up left-to-right with the columns in the target schema. Once the system has located the start of a field, the declared type of the corresponding target column guides further processing.

Note: It is an error for a row to contain more fields than the target table contains columns.

Using String and Non-string Fields

If an input field corresponds to a column declared char, nchar, varchar, or nvarchar, the system considers it a string field, with all other types as non-string fields. This distinction is important because spaces are significant within string fields, but not elsewhere.
Note: An empty field or a field containing only spaces can represent a legitimate string value, but can never be a legitimate non-string value.

The system uses the following rules based on whether the field is a string field:

    If the field is a string field: All characters from the beginning of the field to the terminating delimiter or end of row sequence contribute to the field's value.
    If the field is a non-string field: The system skips any leading spaces, interprets or converts the field's contents, and skips any trailing spaces.

The string/non-string distinction also affects the details of how a field indicates that it is null. For more information, see "Handling the Absence of a Value" on page 3-14.

Handling the Absence of a Value

In SQL, a record must include a value if a column is declared not null. When a record contains no value for a column, the column is considered to be null. The system provides an explicit and implicit method for conveying nullness.

The explicit method includes a specific token in the field instead of a value. By default, this token is the word "null" (case insensitive). You can use the nullValue option to change this token to any other 1-4 character alphabetic token.

You can precede or follow an occurrence of the explicit null token in a non-string field with adjacent spaces. For the system to recognize an explicit null token in a string field, the token cannot have preceding or trailing adjacent spaces. The explicit null token method makes it impossible to express a string consisting of exactly the text of the null token.

The implicit method interprets an empty field as null. This method is always available to non-string fields independent of any nullValue option setting and works even if the non-string field contains spaces. You can use the implicit method on string fields only if you have set the nullValue option to the empty string ('').
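The explicit and implicit null rules for string and non-string fields can be condensed into a small Python predicate. This is a sketch of the rules as worded above, not the system's parser: the name field_is_null is hypothetical, and treating a custom nullValue token as case-insensitive is an assumption, because the text states case insensitivity only for the default token.

```python
def field_is_null(raw, is_string, null_value="NULL"):
    """Decide whether one raw input field conveys null.

    raw        -- field text exactly as found between delimiters
    is_string  -- True for char, nchar, varchar and nvarchar columns
    null_value -- the nullValue token ('' enables implicit nulls for strings)
    """
    if is_string:
        if null_value == "":
            # Implicit method: only a truly zero-length string field is null.
            return raw == ""
        # Explicit method: the token must fill the field, no adjacent spaces.
        return raw.lower() == null_value.lower()
    # Non-string fields: surrounding spaces are not significant.
    stripped = raw.strip(" ")
    if stripped == "":
        return True  # implicit method, always available to non-string fields
    return null_value != "" and stripped.lower() == null_value.lower()


if __name__ == "__main__":
    for raw, is_string in [("NULL", True), (" NULL", True), (" null ", False), ("   ", False)]:
        print(repr(raw), "string" if is_string else "non-string", field_is_null(raw, is_string))
```

An empty string field is null only when the third argument is the empty string, which is the trade-off the nullValue setting controls.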
The system considers a string field empty (potentially null) only if it contains truly zero characters (no spaces). Setting nullValue to the empty string makes it impossible to set any character varying (alias varchar(n)) column to an empty, zero-length string. In other words, if the system encounters an empty string and the nullValue is set to '', then the system treats the empty string as a null value.

Enabling Load Continuation

If you enable load continuation with the allowReplay option, or set the session variable LOAD_REPLAY_REGION to true, the system ensures that a simple load using external tables has the ability to continue after the system has been paused and resumed. You do not have to abort and resubmit the load.

The allowReplay option takes a value n. If no value is specified for the allowReplay option, or n is 0, the system defaults to the postgres default setting. If n is a valid non-zero number, it specifies the number of allowable query restarts.

The system accomplishes this automatic resumption by holding records to be sent to the SPU in the replay region in host memory. After the system sends the data in this region to the SPUs, it does a partial commit that forces all the unwritten data to the SPUs' disks and allows the system to re-use the replay region's data buffers. If an SPU reboots or resets, the system rolls back to the last partial commit, and reprocesses and resends the data.

Note: Setting this option has a performance impact which depends on the speed of the incoming data. In addition, system memory is used for the data buffering that enables loads to be continued. When the buffer memory is exhausted, new loads will pend until needed memory becomes available.

Load continuation cannot operate on any table that has one or more materialized views in an active state. Before enabling load continuation, suspend the associated materialized views. You can suspend active materialized views either through the NzAdmin tool or by issuing the ALTER VIEWS command.
Sample syntax for ALTER VIEWS follows.

    ALTER VIEWS ON <table> MATERIALIZE SUSPEND

Once loading has completed, you can update and activate the materialized views for the table. Sample syntax follows.

    ALTER VIEWS ON <table> MATERIALIZE REFRESH

For more information, see the Netezza Performance Server System Administrator’s Guide.

Handling Legal Characters

Input is composed of the printing characters (bytes 33-255), space (byte 32), horizontal tab (byte 9), line feed (byte 10), and carriage return (byte 13). By default you cannot use the nonprinting control characters.

Specify the ctrlChars option to permit control characters (bytes 1-8, 11-12, and 14-31) to appear within strings. In this case, only bytes 0, 10, and 13 are not allowed.

Specify the crInString option to permit unescaped carriage returns (CR) in char/varchar fields. If you specify the crInString option, line feed (LF) becomes the default end-of-row indicator.

Specify the escapeChar option so that any character preceded by a backslash (\) is interpreted as an escaped character. In this way, a string can contain the zero byte (byte 0), line feed (byte 10), carriage return (byte 13), or the closing delimiter.

Specify the ignoreZero option to cause the system to check every character for a zero byte. The system skips over each zero byte it finds and considers the next character. If you specify this option, you cannot include a zero byte in a string. For example, assume <nul> is a null byte, the field delimiter is '|', and you have specified ignoreZero:

    ..|<nul>AB<nul>CDEF<nul>|..    fills a char(6) column with 'ABCDEF'.
    ..|<nul>127<nul>|..            fills a byteint column with binary 01111111 (= 0x7F).

Table 3-6 lists the end-of-row and control characters that are permitted with the different nzload system options. The mark indicates that the option is specified or allowed.

Note: In Fixed-Length format, control characters are treated differently.
For more information, see Chapter 6, “Using Fixed-Length Format.”

Session Variables

The following session variables work as nzload options.
• LOAD_REPLAY_REGION – See “Enabling Load Continuation” on page 3-15.
• MAX_QUERY_RESTARTS – The number of restarts allowed for load continuation. See “Enabling Load Continuation” on page 3-15.
• LOAD_LOG_MAX_FILESIZE – The maximum allowed size in MB for the log file.

Table 3-6: Control Characters and End of Record Characters
    Options: -crInString, -ctrlChars
    End of Record: lf, cr, crlf, lfcr
    Control Characters Allowed within Strings: 0, 1-8, ht, lf, 11, 12, cr, 14-31

CHAPTER 4
Using nzload

What’s in this chapter
• How the nzload Command Works
• Using the nzload Command
• Configuration File Example

This chapter describes the nzload command. Netezza SQL is the Netezza Structured Query Language (SQL), which runs on the Netezza data warehouse appliance. Throughout this document, the term SQL refers to Netezza’s SQL implementation. For nzload usage examples, see Appendix A, “Examples and Grammar.”

How the nzload Command Works

The nzload command is a SQL CLI client application that allows you to load data from the local host or a remote client, on all the supported client platforms. The nzload command processes command-line load options to send queries to the host to create an external table definition, run the insert/select query to load data, and when the load completes, drop the external table.

The nzload command connects to a database with a user name and password, just like any other Netezza appliance client application. The user name specifies an account with a particular set of privileges, and the system uses this account to verify access.

Note: While you can use the nzload command as an ODBC client application, it neither requires nor works with a Data Source Name (DSN). It bypasses the ODBC Driver Manager and connects directly to the Netezza ODBC driver.
Protection and Privileges

To run the nzload command, you must have the CREATE EXTERNAL TABLE privilege and access privileges to the target table or database (LIST, INSERT, SELECT). For more information, see the Netezza Performance Server System Administrator’s Guide.

Note: If you issue the nzload command from the Netezza appliance host itself, and the user issuing the command is not the user nz, you must do one of the following:
• Ensure that the user nz has READ permissions for the data file to load.
• Use the -host option with the nzload command (such as nzload -host <hostname>).

Concurrency and Transactions

You can run multiple nzload jobs in parallel, with each job adding records to the same tables. While loading, you can run concurrent queries, inserts, updates, and deletes against committed records in the target tables.

The nzload command performs all insertions into the target table within a single transaction. The nzload command commits the transaction at the end of the job, provided it does not detect any fatal errors. Only after the nzload command has committed the transaction are the newly loaded records visible to other queries. If a load error occurs while multiple concurrent loads are running, only the load with the error fails to complete.

While the nzload job is running, it sends records to the SPUs along with the current transaction ID. When an SPU receives new records, it immediately allocates resources and writes the records to the database or the table on the disk. If the nzload command cannot commit the transaction, these storage resources remain allocated. To free up this disk space, use the nzreclaim command on the specific table or database. For more information about the nzreclaim command, see the Netezza Performance Server System Administrator’s Guide.

If you cancel an nzload job, the nzload command does not commit the transaction.
Program Invocation

The nzload command is a command-line program that accepts input values from multiple sources. The precedence order is the following:
1. Command line
2. Control file. Without a control file, you can do only one load at a time; using a control file allows multiple loads. See “Using a Control File” on page 4-5.
3. Environment variables (only used for user, password, database, and host)
4. Built-in defaults

Option names are case insensitive. Every option has a standard name for use in either the command line or the control file. For more information about the input values, see Table 4-1 on page 4-3.

Many options include a token argument, which you can enclose in either single or double quotes. The nzload command treats alphabetic characters in option token arguments as case-insensitive (for example, -boolStyle YES_NO is equivalent to -boolStyle yes_no).

Note: You must quote options that require a punctuation character as a token, and use an escape character if quotes are part of the argument.

Using the nzload Command

The nzload command takes options and arguments. You can accept the defaults or specify options on the command line, in the control file, or through environment variables. For a complete listing of all options, see Appendix C, “Option Names.”

Syntax

The nzload command uses the following syntax:

    nzload [-h|-rev] [options]

Inputs

The nzload command uses many of the options for external tables, as detailed in Chapter 3, “External Table Options.” Particular options for nzload are shown in Table 4-1.

Table 4-1: The nzload Options

Option                         Description
-cf filename                   Specifies the control file. For more information, see “Using a Control File” on page 4-5.
-df filename                   Specifies the data file to load. If you do not specify a path, the system uses the special token <stdin> to store the filepath string. Corresponds to the DataObject external table option.
-lf filename                   Specifies the log file name. If the file exists, the log is appended to it.
-bf filename                   Specifies the bad/rejected rows file name. If the file exists, it is overwritten.
-outputDir dir                 Specifies the output directory for the log and bad/rejected rows files. Corresponds to the LogDir external table option.
-logFileSize n                 Session variable (LOAD_LOG_MAX_FILESIZE) that specifies the size (in MB) of the log and bad/rejected rows files. The default is 2000 MB (2 GB).
-fileBufSize                   Specifies the chunk size (MB for fileBufSize or bytes for fileBufByteSize) at which to read the data from the source file.
-fileBufByteSize               Corresponds to the SocketBufSize external table option.
-allowReplay                   Session variables (LOAD_REPLAY_REGION and MAX_QUERY_RESTARTS) that specify the number of query restarts for load continuation if an SPU has reset or failed over.
-allowReplay n                 If n is a valid non-zero number, it specifies the number of allowable query restarts. If no value is specified, or n is 0, the system defaults to the postgres default setting.

Additional Options

The nzload command takes the following additional options:

Table 4-2: nzload Additional Options

Option                         Description
-u user                        Specifies the logon user name [NZ_USER].
-pw password                   Specifies the user’s password [NZ_PASSWORD].
-host name                     Specifies the hostname or IP address [NZ_HOST]. Runs on the local host if not specified here. If you set this to any name but localhost or any IP address but the reserved one (127.0.0.1), the system sets the remotesource option to ODBC.
-caCertFile path               Specifies the pathname of the root CA certificate file on the client system. This argument is used by Netezza clients who use peer authentication to verify the Netezza host system. The default value is NULL, which skips the peer authentication process.
-securityLevel level           Specifies the security level that you want to use for the session. The argument has four values:
                               • 0 – preferredUnsecured – This is the default value. Specify this option when you would prefer an unsecured connection, but you will accept a secured connection if the Netezza system requires one.
                               • 1 – onlyUnsecured – Specify this option when you want an unsecured connection to the Netezza system. If the Netezza system requires a secured connection, the connection will be rejected.
                               • 2 – preferredSecured – Specify this option when you want a secured connection to the Netezza system, but you will accept an unsecured connection if the Netezza system is configured to use only unsecured connections.
                               • 3 – onlySecured – Specify this option when you want a secured connection to the Netezza system. If the Netezza system accepts only unsecured connections, or if you are attempting to connect to a Netezza system that is running a release prior to 4.5, the connection will be rejected.
                               Note: If you specify an invalid value for the -securityLevel argument of the nzload command, the command defaults to the preferredUnsecured (0) level.
-db database                   Specifies the database to load [NZ_DATABASE].
-t table                       Specifies the table name. You can specify a fully qualified name for this value.

Outputs

The nzload command exits with the following codes:
• 0 – Successful, all input records were inserted.
• 1 – Failed, no records were inserted due to an error or errors found during the load.
• 2 – Successful; errors were found in the input but did not exceed the error threshold (-maxErrors), and good records were inserted.

Using a Control File

An nzload control file allows you to define load operations in a text file without having to specify the options on the nzload command line. You can also use control files to run multiple concurrent loads, with different options, in one command instance. Each load is a separate transaction; if one load is rolled back, any transactions that already completed remain committed.
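The precedence order given under “Program Invocation” (command line, then control file, then environment variables, then built-in defaults) can be modelled as a layered lookup. The following Python sketch is only an illustration of that ordering, not how nzload is implemented. The function name, the normalized option keys (u, pw, db, host), and any default values passed in are hypothetical; the environment variable names are the ones listed in Table 4-2.

```python
from collections import ChainMap

# Only these four options can come from the environment (names from Table 4-2).
# The short keys on the left are an illustrative normalization.
ENV_OPTIONS = {"u": "NZ_USER", "pw": "NZ_PASSWORD", "db": "NZ_DATABASE", "host": "NZ_HOST"}

def resolve_options(command_line, control_file, environ, defaults):
    """Merge option sources so that earlier sources override later ones."""
    from_env = {opt: environ[var] for opt, var in ENV_OPTIONS.items() if var in environ}
    # ChainMap searches its maps left to right, so the command line wins,
    # then the control file, then the environment, then the defaults.
    return dict(ChainMap(command_line, control_file, from_env, defaults))
```

For example, a database named on the command line overrides one set in the control file or in NZ_DATABASE, which is what lets you reuse a control file unchanged.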
Options

Within a control file, you can specify the following options:
• Any of the valid options for an external table. For more information, see Appendix C, “Option Names.” You can specify the long format name of the option or the short format name.
• Database – Specifies the name of the database to load.
• Table – Specifies the name of the table into which to load the data.
• Badfile (bf) – Specifies the name of the nzbad file, which contains any records that could not be loaded. The default is table.database.nzbad.
• Logfile (lf) – Specifies the name of the nzload log file, which contains messages and errors that occurred during the load processing. The default is table.database.nzlog.
• Datafile – Specifies the pathname of the file that you want to load into the specified table and database. The datafile option must be the first line of the control file, followed by a list of control file options in curly braces {}. You can specify more than one datafile, each with its own set of options, in the control file.
• Decimal delimiter – Specifies that a comma is used instead of a period as the decimal delimiter. The default delimiter is a period.

The options in a control file are case-insensitive. For example, you could specify the option in letter formats such as database, DataBase, Database, or DATABASE.

Note that command line options take precedence over any equivalent options specified in a control file. This allows you to override any control file options as necessary without changing the control file. If you specify a control file for the nzload command, you cannot specify a data file argument (-df) on the command line.

Syntax

The syntax for using a control file is as follows, where each sequence can be another load:

    DATAFILE <filename>
    {
    [<option name> <option value>]*
    }

For example, the following control file options load the data from customer.dat into the customer table:

    DATAFILE /home/operation/data/customer.dat
    {
    Database dev
    TableName customer
    }
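When several loads share most of their options, it can be convenient to generate the control file text from a script. The following Python sketch renders DATAFILE blocks in the layout of the example above. It is a minimal illustration: the function name is hypothetical, option values are written exactly as given (so any quoting, such as a delimiter in single quotes, is left to the caller), and placing one option per line is an assumption based on the example layout rather than a documented requirement.

```python
def render_control_file(loads):
    """Render control file text from (datafile, options) pairs."""
    blocks = []
    for datafile, options in loads:
        lines = [f"DATAFILE {datafile}", "{"]
        # One "<option name> <option value>" pair per line, in insertion order.
        lines += [f"{name} {value}" for name, value in options.items()]
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"

control_text = render_control_file([
    ("/home/operation/data/customer.dat", {"Database": "dev", "TableName": "customer"}),
])
```

Here control_text holds the same customer.dat control file contents as the example above; passing more pairs produces one DATAFILE block per load.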
If you save the control file contents as a text file (named cust_control.txt in this example), you can specify it using the nzload command as follows:

    nzload -cf /home/nz/sample/cust_control.txt
    Load session of table 'CUSTOMER' completed successfully

When you use the nzload command, note that you cannot specify both the -cf and -df options in the same command. You can load from a specified data file, or load from a control file, but not both in one command.

The following control file options define two data sets to load. Note that the options can vary for each data set.

    DATAFILE /home/operation/data/customer.dat
    {
    Database dev
    TableName customer
    Delimiter '|'
    Logfile operation.log
    Badfile customer.bad
    }

    DATAFILE /home/imports/data/inventory.dat
    {
    Database dev
    TableName inventory
    Delimiter '#'
    Logfile importload.log
    Badfile inventory.bad
    }

If you save these control file contents as a text file (named import_def.txt in this example), you can specify it using the nzload command as follows:

    nzload -cf /home/nz/sample/import_def.txt
    Load session of table 'CUSTOMER' completed successfully
    Load session of table 'INVENTORY' completed successfully

Configuration File Example

The following is an example of a fixed format configuration file.

    {
    outputdir /home/nzuser
    crinstring 'true'
    ctrlchars 'true'
    decimaldelim '.'
    format fixed
    recordlength 10
    maxerrors 0
    tablename refnull
    layout (
    fld1 bool 1_0 bytes 1 ,
    fld2 char(5) bytes 5 ,
    fld3 char(4) bytes 4)
    }

CHAPTER 5
Unloading Data

What’s in this chapter
• Unloading Options
• Unloading Data to a Remote Client System

This chapter describes the options for unloading data. For usage examples, see Appendix A, “Examples and Grammar.”

Unloading Options

The following external table options are not supported for unloads.
For a complete list of external table options, see Chapter 3, “External Table Options.”
• CtrlChars
• FillRecord
• IgnoreZero
• Layout
• LogDir
• MaxErrors
• MaxRows
• QuotedValue
• RecordDelim
• RecordLength
• RequireQuotes
• SkipRows
• TimeRoundNanos/TimeExtraZeros
• TruncString
• Y2Base

The IncludeZeroSeconds external table option is used only for unloads. The 2-digit format of the DateStyle external table option is not supported for unloads.

Unloading Data to a Remote Client System

A special use of the CREATE EXTERNAL TABLE/INSERT INTO commands is to stream data from a Netezza database file on a Netezza host system to a remote client. This unload does not remove rows from the database, but rather stores the unloaded data in a flat file that is suitable for loading back into a Netezza database.

You can unload data to any of the supported Netezza appliance clients, which include Windows, Linux, Solaris, AIX, and HP-UX (PA-RISC and Itanium). You can unload all data types (including Unicode) and file types (uncompressed and compressed formats).

Note: You must be the admin user or have the Create External Table administration privilege to create an external table, and you must have permission to write to the data object’s path. Both the Netezza appliance host and remote client must be running Netezza release 3.1 or greater.

Note: Unloading for Fixed-Length format is not supported.

To unload to a remote client, do the following:

1. Establish an ODBC or JDBC connection between the client machine and the Netezza appliance host. For example, on a Linux or UNIX client, type:

    >isql

2. Use the CREATE EXTERNAL TABLE command to create an external table. An example follows:

    CREATE EXTERNAL TABLE emp_backup SAMEAS emp USING (
    DATAOBJECT ('/tmp/emp.dat')
    REMOTESOURCE 'ODBC');
    INSERT INTO emp_backup SELECT * FROM emp;

In the example, note that the DATAOBJECT file specification must be a valid file on the receiving machine.
REMOTESOURCE must be either ODBC or JDBC. The ODBC/JDBC client must be connected with the corresponding Netezza appliance library. If you do not specify a remote source, the system unloads the data to a file on the Netezza appliance host.

3. To reload the data in the external table, you can use a SQL query such as:

    INSERT INTO emp SELECT * FROM emp_backup;

Verify that emp is empty before you reload the data.

CHAPTER 6
Using Fixed-Length Format

What’s in this chapter
• Formatting Background
• Format Options
• Layout Definitions
• Building the Fixed-Length Format Definition

This chapter describes the fixed-length format for loading data into external tables.

Formatting Background

All data is a series of byte-sequences and has an associated data type, used here as a conceptual or abstract attribute of the data. Without an associated data type, a byte-sequence can be interpreted in too many ways.

A single data type can be represented in different forms. For example, an integer data type can be represented or stored in various types of binary format, or in human-readable text/character format (typically ASCII). Similarly, dates, times, and other data types have multiple representations used by different programs, languages, and environments. At some point, though, these data types must be represented in readable form, so users can do something with the data.

Data for loading into the data warehouse typically is presented in either delimited format or fixed-length format, using either ASCII or UTF-8.

Fixed-Length Format

Fixed-length format files use ordinal positions, which are offsets to identify where fields are within the record. There are no field delimiters, and there may be no end-of-record delimiter. Data in fixed-length format files seldom has decimal or time delimiters, as these are not necessary, and take up space.
Because the fields are fixed in size, the locations of any delimiters are fixed and are specified in the layout definition, which accompanies the fixed-length format data file.

Loading fixed format data into the database requires that you define the target data type for the field, as well as the location within the record. Not all fields in a fixed-length format file need to be loaded; you can skip fields by using the ‘filler’ specification.

The order of fields in the data file must match the order of the target table, or an external table definition must be defined that specifies the order of the fields as database columns. Using an external table definition in combination with an insert-select statement allows field order to be changed.

Unknown or null values are typically represented by known data patterns, which are classified as representing null. The Netezza system identifies and acts on these values.

Data Attributes

The typical data attributes in fixed-length format files are as follows:
• Data Type – The data at a given offset in a record is always of the same type.
• Representation – The representation is constant, and each field has a fixed width. Data within a field is always presented in the same way. Certain items such as radix points, time separators, and date delimiters are always at the same place and are typically implied, rather than being actually present in the data file.
• Value – The value can be an actual value or a null indicator. Data representations that indicate a null value are specified by the layout definition, assuming null is allowed.
• Length – There is no length specification within the data file, as length in the file is fixed, and the length attribute is specified by the layout definition.
• Null-ness – Null-ness is identified in the layout definition as either a specific data pattern, such as “all spaces,” or as being “flagged” by a value in another column.
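To make the idea of ordinal positions concrete, the following Python sketch slices a fixed-length record by byte offsets, skips a filler field, and treats an “all spaces” field as null. It is a simplified model for illustration only: the layout, field names, and function name are hypothetical, the data is assumed to be ASCII, and the loader’s actual layout syntax is described later in this chapter.

```python
# Hypothetical layout: (field name, width in bytes). In this sketch, names
# that start with "filler" mark fields that are skipped rather than loaded.
LAYOUT = [("cust_id", 5), ("filler1", 2), ("region", 3)]

def parse_fixed_record(record: bytes, layout=LAYOUT) -> dict:
    """Split one fixed-length record into fields using byte offsets."""
    expected = sum(width for _, width in layout)
    if len(record) != expected:
        raise ValueError(f"record length {len(record)} != layout length {expected}")
    fields = {}
    offset = 0
    for name, width in layout:
        raw = record[offset:offset + width]
        offset += width
        if name.startswith("filler"):
            continue  # filler bytes occupy space in the record but are not loaded
        text = raw.decode("ascii")
        # An "all spaces" pattern is treated as the null indicator here.
        fields[name] = None if text.strip(" ") == "" else text
    return fields
```

Because every record has the same length and the same offsets, no delimiters are needed; a record whose length does not match the layout is rejected outright.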
Format Options

For the fixed-length format, new options have been added, and some have been changed.

New Options

The following added external table options are valid only for the fixed-length format.
• RecordLength – The length of the entire record, including null-indicator bytes (if any) and excluding the record delimiter (if any).
    • No default value
    • Constant integer
• RecordDelim – The row/record delimiter.
    • Default is '\n' (new-line). Note that the field is literally interpreted, so '\n' looks for those characters, and not ‘new-line’.
    • The end-of-record delimiter is entered between single quotes.
    • The end-of-record indicator can be up to a maximum of 8 bytes long.
    • The omission of a record delimiter is defined by side-by-side single quotes.
• Layout – Mandatory for fixed-length format. Used to define the location of fields of the input record.
    • No default value
    • Comma-separated zone definitions within braces

Changed Options

The following external table options have a different meaning for the fixed-length format:

Table 6-1: Changed Option Meanings

Option        Meaning
CtrlChars     Text-Delimited: If False (default), unescaped control characters (except \t) error out. Exception: If CtrlChars is False and CrInString is True, \r (carriage return) can be used without error. If True, unescaped control characters \0 and \n error out (also \r if CrInString is False).
              Fixed-Length: If True, all unescaped characters allowed. If False (default), unescaped characters error out. Exceptions: \t, \n (and \r if CrInString is ON).
CrInString    Text-Delimited: Augments CtrlChars behaviors.
              Fixed-Length: Used only when CtrlChars is OFF.
MaxErrors     Sets the maximum number of allowed (non-fatal) errors before aborting the load. Since the parser now reports errors for each field or zone rather than just one error for the row, multiple errors can be reported for the same row, so this limit must be set accordingly. When the parser sees an error in a field/zone, it recovers (using the field/zone length) and continues from the next field/zone, until the End-of-Record, a fatal error, or this maxerrors limit is reached. Fatal errors include the following:
              • RecordLength mis-match
              • RecordDelimiter not found
              • RecordLength invalid (negative values or zero)
              • Zone length invalid (negative values)
              • UTF-8 initial byte is invalid
              • UTF-8 continuation bytes are invalid

Unsupported Options

The following external table options are not supported for fixed-length format, and if set, result in an error:
• Encoding
• FillRecord
• IgnoreZero
• TimeExtraZeros
• TruncString
• AdjustDistZeroInt
• IncludeZeroSeconds
• Delimiter
• EscapeChar
• QuotedValue
• RequireQuotes

Default Values

The following existing external table options work as default values for zone definitions:
• NullValue – Default for the ‘NULLIF’ clause of all zones.
• DateStyle, DateDelim, TimeStyle, TimeDelim, BoolStyle – Default for zone style for corresponding date, time, and bool zones.

Layout Definitions

Layout is an ordered collection of zone (field) definitions, and is a required option for fixed-length format. Each zone (field) definition is made up of mutually exclusive (non-overlapping) clauses. These clauses must be in the following order, although some are optional and can be empty:
• Use-type – Indicates whether a zone is a normal (data) zone or a filler zone. For data zones, this value is omitted. Filler zones can only be specified in bytes. Other use-types exist, but are not used for fixed-length format data.
• Name – The name of the zone. Duplicate zone names are not allowed. This definition is not currently used, but is typically provided to identify the field.
• Type – Defines the zone type. When not specified, type is defaulted to the corresponding table column’s type. Filler-zones must have a zone type of INT. Valid values are as follows:
    CHAR
    VARCHAR
    NCHAR
    NVARCHAR
    INT1
    INT2
    INT4
    INT8
    INT
    UINT1
    UINT2
    UINT4
    UINT8
    UINT
    FLOATING
    DOUBLE
    NUMERIC
    BOOL
    DATE
    TIME
    TIMESTAMP
    TIMETZ
• Style – Defines the zone representation, and is optional. This is defaulted based on the zone-type and ‘Format’ option. Apart from INTERNAL, styles are valid only for their corresponding non-textual zone-types. The style has to be in accordance with the format. Valid values are the following:
    • INTERNAL – Valid only for textual zones (CHAR/VARCHAR/NCHAR/NVARCHAR)
    • DECIMAL – Valid for integer/numeric zone types
    • DECIMALDELIM – Valid for numeric, float, double, and time-style (time, timetz, and timestamp) zone types
    • FLOATING – Valid for float or double zone type
    • SCIENTIFIC – Valid for float or double zone type
    • YMD <'date-delim'> (and other date-styles currently supported in external table options DateStyle and DateDelim; valid for date zones)
    • 12Hour <'time-delim'> (and other time-styles currently supported in external table options TimeStyle and TimeDelim; valid for time zones)
    • 24Hour <'time-delim'> (and other time-styles currently supported in external table options TimeStyle and TimeDelim; valid for time zones)
    • YMD <'date-delim'> 24Hour <'time-delim'> (and other combinations of date and time styles currently supported for external table options DateStyle, DateDelim, TimeStyle, and TimeDelim; valid for timestamp and timetz zones)
    • TRUE_FALSE, Y_N, 1_0 (and other boolean styles currently supported for external table option BoolStyle; valid for boolean zones)
• Length – Specified in bytes.
• Nullif – Defines the zone null-ness attribute. For fixed format files this clause specifies a known data pattern within the field which, when present, signifies the field is null. Nulls are detailed in Table 6-2:

Table 6-2: Layout Example

Use Type   Name   Type       Style      Length     Nullness
NA         f1     Int4       DECIMAL    Bytes 10   Nullif @ = 0
NA         f2     Date       YMD        Bytes 10   Nullif &= '2000-10-10'
NA         f3     Char(20)   INTERNAL   Chars 10   Nullif && =”
Filler     f4     Char(4)    NA         Bytes 10   NA

Building the Fixed-Length Format Definition

Fixed-length format files must have a format definition. This section shows examples of fixed-length format definitions for typical data types.

End-of-Record

When fixed format records end in a newline character, no action is required; newline is the default end-of-record delimiter. When there is no record separator, use single quotes side by side, as in the following example:

    RecordDelim ''

RecordDelim is a literal sequence of up to 8 bytes, which does not translate common escape representations or support functions like CHAR(8).

Record Length

Record Length is optional, but can provide feedback that the format definition has the correct length. This excludes the end-of-record delimiter. The following is an example:

    Recordlength NNN

Skipping Fields

The following clause skips four bytes:

    filler char(4) bytes 4

However, the preferred method is to name the field being skipped, as in the following example:

    filler fld_name char(4) bytes 4

Temporal Values

Temporal values in fixed-length format files often omit delimiters. Table 6-3 shows clauses that load dates, times, and timestamps without delimiters.

Table 6-3: Temporal Values

Datatype    Value                      Format Clause
Date        20101231                   date1 date YMD'' bytes 8
Time        231559                     time1 time(6) 24hour '' bytes 6
Timestamp   20101231231559             stamp1 timestamp(6) 24hour '' bytes 14
Timestamp   20101231231559000001       (Load as char(24), then use insert-select) to_timestamp(col,'YYYYMMDDHH24MISSUS')
Date        2010-12-31                 date2 date YMD'-' bytes 10
Time        23.15.59                   time2 time(6) 24hour '.' bytes 8
Timestamp   2010-12-31 23:15:59        tms2 timestamp(6) YMD '-' 24hour ':' bytes 19
Timestamp   2010-12-31 23:15:59.0001   tms3 timestamp(6) YMD '-' 24hour ':' bytes 26
Timetz      12:30:45+03:00             Tz1 TIMETZ(6) 24HOUR ':' bytes 14
Timetz      123045+-0300               (Load as char(11) then use insert-select) (substring(col1,1,2)||':'||substring(col1,3,2)||':'||substring(col1,5,5)||':'||substring(col1,10,2))::timetz

Numeric Values

Table 6-4 shows numeric values.

Table 6-4: Numeric Values

Datatype   Value              Format Clause
Integer    32767              int1 int2 bytes 5
Int8       9123456789123456   int2 int8 bytes 16
Numeric    2315.59            num1 numeric(6,2) bytes 7
Numeric    231559             (Load as char(6) then use insert-select) (col/100)::numeric(6,2)
Floating   1.2345678          flt1 floating bytes 9
Floating   12345678           (Load as char(8) then use insert-select) (substring(col1,1,1)||'.'||substring(col1,2,7))::float
Double     1.2345678          flt1 double bytes 9
Double     12345678           (Load as char(8) then use insert-select) (substring(col1,1,1)||'.'||substring(col1,2,7))::double

Logical Values

Table 6-5 shows logical values.

Table 6-5: Logical Values

Datatype   Value            Format Clause
Boolean    Y or y, N or n   BOOL Y_N BYTES 1
Boolean    1, 0             BOOL 1_0 BYTES 1
Boolean    T or t, F or f   BOOL T_F BYTES 1

Null Values

Fixed-length format files typically use ‘magic’ values to represent nulls. Adding a nullif clause to any specification allows the column to be checked for null. A nullif clause has the following parts:
• The keyword “nullif”
• The column reference
• The test expression

As an example, a file specification where field1 is a date and is considered null if it has the value '99991231' would have the following characteristics:
• The nullif specification would be as follows:
    nullif &='99991231'
• The entire specification would be as follows:
    fld1 date YMD'' bytes 8 nullif &='99991231'

All format specifications support the nullif clause. In addition to &=, which evaluates to ‘string must exactly match,’ the nullif clause also supports &&=, which allows substring matching. This is useful in cases where the string may occur anywhere in a field with space padding. For example, nullif &&='N' matches the different expressions “ N ”, “N ”, “ N”. Table 6-6 shows null values:

Table 6-6: Null Values

Datatype   Null Value             Format Clause
Boolean    ' ' (1 space)          BOOL Y_N BYTES 1 NULLIF &=' ' (1 space)
DATE       000000                 DATE YMD '' BYTES 6 NULLIF &='000000'
INT        '      ' (6 spaces)    INT BYTES 6 NULLIF &='      ' (6 spaces)

APPENDIX A
Examples and Grammar

What’s in this appendix
• The nzload Command
• Reference Examples
• Decimal Delimiter Examples
• SQL Grammar
• Fixed-Length Format Definition

This appendix includes examples for using external tables, the nzload command, SQL grammar, and references.

The nzload Command

The following examples describe how to specify nzload arguments, how to use named pipes, and sample ways of using nzload.

Specifying nzload Arguments

The following examples show how to specify the user and password or accept the defaults.

To load the table repeat_cust, delimited by |, and specifying the input file clickstream.dat, enter:

    nzload -t repeat_cust -delim '|' -df clickstream.dat

This example uses default values for the -u, -pw, and -db options. For more information about these default values, see Table 4-1 on page 4-3.

To load the database dev as user admin with the password production, specifying the table name areacode, using tab delimiters, and specifying the input file as phone-prefix.dat, enter:

    nzload -u admin -pw production -db dev -t areacode -delim '\t' -df phone-prefix.dat

Note: To ensure optimum performance, run the GENERATE STATISTICS command after you have loaded a table or database. For more information about the generate statistics command, see the Netezza Performance Server System Administrator’s Guide.
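If you drive nzload from a script rather than typing the commands above, you can build the argument list programmatically and interpret the exit code afterwards. The following Python sketch is one possible wrapper, not part of the product. The function names are hypothetical; the options used (-u, -pw, -db, -t, -delim, -df) are the ones shown in the examples above, and the exit code meanings paraphrase the “Outputs” list in Chapter 4. It assumes that the nzload executable is on the PATH when run_nzload is called.

```python
import subprocess

# Exit codes as listed under "Outputs" in Chapter 4 (wording paraphrased).
EXIT_MEANINGS = {
    0: "successful, all input records were inserted",
    1: "failed, no records were inserted",
    2: "successful, errors stayed within the -maxErrors threshold",
}

def build_nzload_args(table, datafile, delim, user=None, password=None, database=None):
    """Build an nzload argument list; omitted values fall back to nzload's defaults."""
    args = ["nzload"]
    if user is not None:
        args += ["-u", user]
    if password is not None:
        args += ["-pw", password]
    if database is not None:
        args += ["-db", database]
    args += ["-t", table, "-delim", delim, "-df", datafile]
    return args

def run_nzload(args):
    """Run nzload and return a description of its exit code."""
    completed = subprocess.run(args)
    code = completed.returncode
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")
```

Passing the arguments as a list avoids shell quoting problems: the delimiter is handed to nzload exactly as written, so the tab delimiter from the second example is passed as the two characters backslash and t, just as the single-quoted '\t' is on the command line.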
Using Named Pipes

To load a large amount of data, use a named pipe to stream the data to external tables or to the nzload command. The nzload command loads the data as it fills the pipe, and does not exit until it receives the end-of-file indicator. Note that stdin is supported for nzload.

To use a named pipe to load tables with the nzload command, do the following:

1. Create a zero-length, named pipe file, using the Linux command mkfifo:

    mkfifo mypipe

2. Do the following in a background session:

    nzload -db <my_db> -t my_table -delim '|' -df /export/home/<my_db>/mypipe

3. Do the following in a foreground session:

    cat /export/home/nz/<my_db>/my_table.dat > mypipe

Sample nzload Usage

The following provides some sample nzload usage.

To specify the name of the load file, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -df /tmp/daily/Import.bad

To specify the boolean style, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -df /tmp -boolStyle yes_no

To specify the name of the control file, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -cf /tmp/daily/control.file

To allow unescaped carriage returns in char() and varchar() fields, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -df /tmp -crinString

To allow an ASCII value 1-31 in char() and varchar() fields, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -ctrlChars

To specify the delimiter to use with the dateStyle option, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -dateDelim '/' -dateStyle MDY

To specify how to interpret the date format, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -dateDelim '/' -dateStyle MDY

To specify the field delimiter, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -delim ','

To specify using an escape character, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -df /tmp -escapeChar '\\'

To specify an input line with fewer columns than the table definition, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -fillRecord

To specify discarding the byte value zero in the char() and varchar() fields, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -ignoreZero no

To specify the log file name, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -lf /tmp/daily/import.log

To specify the maximum number of errors, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -maxErrors 100

To specify stopping processing when the specified number of records are in the database, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -maxRows 100

To specify the string to use for the null value, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -nullValue 'none'

To specify the output directory for the log files, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -outputDir /tmp/daily

To specify that quotes are mandatory, except for null values, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -requireQuotes quoted value YES

To specify the delimiter to use for time formats, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -timeDelim '.'

To specify allowing but rounding non-zero digits with smaller than microsecond resolution, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -timeRoundNanos

To specify the time style value in the data file, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -timeStyle 12hour

To specify truncating a string and inserting it into the declared storage, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -truncString

To specify the first year in the YY format, enter:

    nzload -u admin -pw password -host nzhost -y2Base 2000

To enable load continuation, enter:

    nzload -u admin -pw password -host nzhost -db emp -t name -allowReplay

Reference Examples

Examples for references are as follows:

Table A-1: Reference Examples

Reference                  Meaning
BYTES &2                   Error: only internal @ reference allowed for
                           length-clause (in any format/zone-type).
BYTES @                    Error: a length-clause cannot refer to itself.
NULLIF & = '123'           Self-reference (no number) is valid in null-clause.
                           The length has to be BYTES/CHARS 3, for text-styles.
                           Matches (nullif evaluates to 'true') ONLY '123'
                           (a row in the external file containing '123').
NULLIF && = '123'          Matches (nullif evaluates to 'true') '123',
                           ' 123 ' ' 123 ', if SPACE is skipped. Length has to
                           be at least BYTES 3 (text-styles) or BYTES 4.
NULLIF @ = 123             Valid for numerical zones. Matches '123', ' 123 '
                           and so on, in text format, with spaces skipped.
NULLIF @ = '2000-01-01'    Valid for date zones.

Decimal Delimiter Examples

The following are examples of how to use the new decimal delimiter option.

For text-delimited format for the table level:

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (delim 'delim' decimalDelim ',');

For fixed-length format for the table level:

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (decimalDelim ',' format 'fixed' layout (c1 int bytes 4, c2 float bytes 6, c3 numeric(10,2) bytes 11, c4 time 24HOUR ':' bytes 11));

For fixed-length format for the column level.

For numeric data type:

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (format 'fixed' layout (c1 int bytes 4, c2 float bytes 6, c3 numeric(10,2) decimal ',' bytes 11));
    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (format 'fixed' layout (c1 int bytes 4, c2 float bytes 6, c3 numeric(10,2) decimal decimalDelim ',' bytes 11));

For float data type:

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (format 'fixed' layout (c1 int bytes 4, c2 float floating ',' bytes 6, c3 numeric(10,2) bytes 11));

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (format 'fixed' layout (c1 int bytes 4, c2 float floating decimalDelim ',' bytes 6, c3 numeric(10,2) bytes 11));

For double data type:

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (format 'fixed' layout (c1 int bytes 4, c2 double exponential ',' bytes 6, c3 numeric(10,2) bytes 11));

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (format 'fixed' layout (c1 int bytes 4, c2 float exponential decimalDelim ',' bytes 6, c3 numeric(10,2) bytes 11));

For time data types (time, timetz, timestamp):

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (format 'fixed' layout (c1 int bytes 4, c2 time 12HOUR decimalDelim ',' bytes 12, c3 numeric(10,2) bytes 11));

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (format 'fixed' layout (c1 int bytes 4, c2 time timeDelim '-' decimalDelim ',' bytes 12, c3 numeric(10,2) bytes 11));

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (format 'fixed' layout (c1 int bytes 4, c2 time timeDelim '-' ',' bytes 12, c3 numeric(10,2) bytes 11));

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (format 'fixed' layout (c1 int bytes 4, c2 time 12HOUR '-' decimalDelim ',' bytes 12, c3 numeric(10,2) bytes 11));

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (format 'fixed' layout (c1 int bytes 4, c2 time 12HOUR '-' ',' bytes 12, c3 numeric(10,2) bytes 11));

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (format 'fixed' layout (c1 int bytes 4, c2 time 12HOUR timeDelim '-' decimalDelim ',' bytes 12, c3 numeric(10,2) bytes 11));

    INSERT INTO <target-table> SELECT * FROM '<external-table>'
    USING (format 'fixed' layout (c1 int bytes 4, c2 time 12HOUR timeDelim '-' ',' bytes 12, c3 numeric(10,2) bytes 11));

SQL Grammar

This section provides an explanation of the SQL grammar used for CREATE EXTERNAL TABLE.

    [INSERT INTO <normal-table>] SELECT <col-list> FROM EXTERNAL [name] '<data-file>' [USING '(' <Load-options> ')']

    CREATE EXTERNAL TABLE <ext-table-name><External-table-shape> (<External-table-shape> | SAMEAS <tablename>) USING '(' <Load-options> ')'

    CREATE EXTERNAL TABLE [name] 'file path' [USING '(' load-options ')'] AS SELECT-statement

    Load-options:
        Load-option | Load-option Load-options
        // space separated list of USING clause options

    Load-option:
        FORMAT TEXT | INTERNAL | FIXED
        | RECORDLENGTH <n> | Length-ref-expr
        | RECORDDELIM <string-literal-max-8-bytes>
        | LAYOUT ( Zone-definitions )
        …..

    Zone-definitions:
        Zone-def | Zone-def ',' Zone-definitions
        // comma-separated lists of zone definitions

    Zone-def:
        [Zone-use-type] [Zone-name] [Zone-type] [Zone-style] [Zone-len] [Nullness]

    Zone-use-type:
        REF | FILLER

    Zone-name:
        Identifier

    Zone-type:
        CHAR | VARCHAR | NCHAR | NVARCHAR | BOOL
        | INT1 | INT2 | INT4 | INT8 | INT
        | UINT1 | UINT2 | UINT4 | UINT8 | UINT
        | NUMERIC | FLOATING | DOUBLE
        | DATE | TIME | TIMESTAMP | TIMETZ

    Zone-style:
        INTERNAL
        | DECIMAL ['decimal-delim']
        | FLOATING | SCIENTIFIC ['decimal-delim']
        | Date-format | Time-format | Date-format Time-format

    Date-format:
        | DateStyle ['date-delim']
        | DATE DELIM 'date-delim'

    Time-format:
        | TimeStyle ['time-delim'] ['decimal-delim']
        | TIME DELIM 'time-delim' DecimalDelim 'decimal-delim'

    Date-style:
        YMD | DMY | MDY |..   // all date styles

    Time-style:
        12HOUR | 24HOUR

    Zone-len:
        BYTES <n> | <Length-ref-expr>
        | CHARACTERS <n> | <Length-ref-expr>

    Zone-ref:
        External-ref | Isolated-ref | Internal-ref

    External-ref:
        &[n]    // 1 based absolute position of zones, 0, negative values for relative positions backwards

    Isolated-ref:
        &&[n]   // 1 based absolute position of zones 0, negative values for relative positions backwards

    Internal-ref:
        @[n]    // 1 based absolute position of zones, 0, negative values for relative positions backwards

    Length-ref-expr:
        Internal-ref [ Operator <n> ]

    Operator:
        + | -

Fixed-Length Format Definition

The following is a sample data record:

    20011228YF2001122814313425 Forest St   Marlborough  MA017525083828200600

The record is defined by the following column layout:

Columns         Content                                    Null rule
Columns 1-8     Date, format YYYYMMDD                      Null when value is '99991231'
Column 9        Boolean Y/N                                Null when value is space ' '
Column 10       Boolean T/F                                Null when value is space ' '
Columns 11-24   Time stamp, format YYYYMMDDHHMMSS          Null when value is '99991231000000'
Columns 25-39   Character, Address                         Null when value is all spaces
Columns 40-52   Character, City                            Null when value is '****NULL*****'
Columns 53-54   Character, State                           Null when value is '##'
Columns 55-59   Number, zipcode                            Null when value is all zeroes
Columns 60-69   Character, Phone                           Null when value is all zeroes
Columns 70-72   Number(3,2); example 600 would be 6.00     Never null
Column 73       Newline                                    End of record
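To make the column layout concrete, the following Python sketch slices the sample record into named fields and applies the null rules. It is an illustration only, not how the Netezza parser works. The field names and the helper parse_record are hypothetical, the space padding of the address and city fields is rebuilt from the stated widths, and the sketch assumes 10 bytes for the phone number and 3 bytes for the final number, which matches the 10-digit phone value and the 72-byte record length used in the external table definition for this data.

```python
from decimal import Decimal

# (hypothetical field name, width in bytes, magic value meaning null;
#  None means the field is never null)
LAYOUT = [
    ("load_date", 8, "99991231"),
    ("flag_yn", 1, " "),
    ("flag_tf", 1, " "),
    ("stamp", 14, "99991231000000"),
    ("address", 15, " " * 15),
    ("city", 13, "****NULL*****"),
    ("state", 2, "##"),
    ("zipcode", 5, "00000"),
    ("phone", 10, "0" * 10),
    ("amount", 3, None),
]


def parse_record(record: str) -> dict:
    """Slice one fixed-length record into fields, mapping magic values to None."""
    expected = sum(width for _, width, _ in LAYOUT)
    if len(record) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(record)}")
    fields, offset = {}, 0
    for name, width, null_value in LAYOUT:
        raw = record[offset:offset + width]
        offset += width
        fields[name] = None if raw == null_value else raw.rstrip(" ")
    # The last field carries an implied decimal point: 600 means 6.00.
    fields["amount"] = Decimal(fields["amount"]).scaleb(-2)
    return fields


# The sample record, rebuilt with its space padding (72 bytes, no newline).
SAMPLE = "".join([
    "20011228", "Y", "F", "20011228143134",
    "25 Forest St".ljust(15), "Marlborough".ljust(13),
    "MA", "01752", "5083828200", "600",
])

row = parse_record(SAMPLE)
for name, value in row.items():
    print(f"{name:10} {value!r}")
```

Driving the slicing from a single table of widths keeps the offsets consistent with the record length, which is the same check the recordlength option performs on the loader side.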
The following is an example of the Netezza External Table definition for this data:

    CREATE EXTERNAL TABLE sample_ext
    (
        Col01 DATE,
        Col09 BOOL,
        /* Skipped col10 */
        Col11 TIMESTAMP,
        Col26 Char(12),
        Col38 Char(10),
        Col48 Char(2),
        Col50 Int4,
        Col56 CHAR(10),
        Col67 CHAR(3)   /* Numeric(3,2) cannot be loaded directly */
    )
    USING
    (
        dataobject('/home/test/sample.fixed')
        logdir '/home/test'
        recordlength 72   /* does not include end of record delimiter */
        recorddelim '
    '   /* This is actually a newline between the single quotes, really not needed as newline is default */
        format 'fixed'
        layout
        (
            Col01 DATE YMD '' bytes 8 nullif &='99991231',
            Col09 BOOL Y_N bytes 1 nullif &=' ',
            FILLER Char(1) Bytes 1,   /* was col10 space */
            Col11 TIMESTAMP YMD '' 24HOUR '' bytes 14 nullif &='99991231000000',
            Col26 CHAR(15) bytes 15 nullif &='               ',   /* 15 spaces */
            Col38 CHAR(13) bytes 13 nullif &='****NULL*****',
            Col48 CHAR(2) bytes 2 nullif &='##',
            Col50 CHAR(5) bytes 5 nullif &='00000',
            Col56 CHAR(10) bytes 10 nullif &='0000000000',
            Col67 CHAR(3) bytes 3   /* We cannot load this directly, so we use an insert-select */
        )   /* end layout */
    );   /* end external table definition. */

    INSERT INTO sampleTable
    SELECT Col01, Col09, Col11, Col26, Col38, Col48, Col50, Col56,
           (Col67/100)::numeric(3,2) as Col67   /* convert char to numeric(3,2) */
    FROM sample_ext;

APPENDIX B
Troubleshooting

What's in this appendix

- Tips for Successful Loading
- nzload Error Handling

This section contains examples to aid you in troubleshooting data loading.

Tips for Successful Loading

The following sections describe how to analyze your data, how to set up loading and how to troubleshoot any problems that might arise.

Create Your Table

Before you create your table, check the following:

- Choose a distribution key. If you know the primary key or a column that is used frequently in joins, use that one.
  Use a distribution key with the highest selectivity. For more information about distribution keys, see the Netezza Performance Server System Administrator's Guide.
- Check that any column that does not contain null data (or should not contain null data) is declared as not null. The system processes not null columns more quickly.
- Check if you have number fields. Are they declared as int8, int4, smallint, byteint, or numeric(p,s)? The smaller the storage, the better for large tables.

Determine Your Data Format

Consider the following when determining the format of your data:

- Check how many data fields there are in each input line of the data file. Are there the same number of columns defined in the target-table definition?
  - If there are fewer columns than fields, is it acceptable to extend the schema to have filler columns? If not, then the load will not succeed.
  - If there are more columns than fields, is it acceptable to use null values to insert into those columns? If it is acceptable, specify the -fillRecord option.
- Check the field delimiter. It should be a character used to separate one field value from another. This field delimiter should be unique and should not appear in a field value, especially in a char or varchar string. Use the -delim option to specify the field delimiter.
- Check whether there are any NULL values in the data source. How is the null value expressed in the data file? The RDBMS industry convention is to use the string "null" to represent a null value. If the data file uses a different representation, use the -nullValue option to override the default null value. The new value can be an empty string or a value in the range of a-z or A-Z and no longer than four characters.
- Check whether there are any date, time, time with time zone, or timestamp data types in the table schema. If there are, what style is the date value? The style of these data type values must be consistent throughout the nzload job.
- Check the handling of string fields for char() or varchar() data types. Does the longest or largest value fit into the storage of the char() or varchar() declaration? If not, is it possible to alter the schema to accommodate the longest string? If the schema cannot be altered, is truncating a string an acceptable solution? If truncation is acceptable, specify the -truncString option. If neither is acceptable, the nzload command treats the record with the long string as an error record. The nzload command discards the record to the nzbad file and logs an error with the record and column numbers in the nzlog file.
- See whether there are any special characters used in the string fields, for example, CR, CRLF, or a character in a string that is the same as the field delimiter. This violates the unique character rule. If there are special characters, can you regenerate the data file to have an escape character added to these special characters? If so, then use the -escapeChar '\\' option to process the strings. If you cannot regenerate the data file, then the load will contain incomplete and invalid records.

Consider the Load Source

- See whether you are using pipes. If so, are they from another local feed or from across a network? The preferred method is to read from a named pipe, rather than to read from stdin/stdout.
- Look at the file. Is the file on an NFS mounted directory? If so, remember that your load performance is constrained by the speed of the network.

Run the Job

- Make a copy of your source table before beginning the load if you are running on a production system. Making a backup is extremely fast within the Netezza appliance and is better than reloading from a backup. For example, the syntax for making a copy is as follows:

    CREATE TABLE loan_backup AS SELECT * FROM loan;

- Stage the data before moving it to a production system. Create a new table, load it, validate it, then use the ALTER TABLE command to move the tables to production. For example:

    ALTER TABLE loan RENAME TO loan_lastmonth;
    ALTER TABLE loan_stage RENAME TO loan;

- If you are running multiple nzload jobs to load into a table, use unique names for your nzbad files. The nzload command generates the default file name by using the <tablename>.<databasename> and appending the extension .nzbad. Loading into the data table of the dev database uses the default file name "data.dev.nzbad" for the nzbad file. Each instance of the nzload command overwrites the existing file. If you want to preserve the bad records that are stored in this file, use the -bf <file name> option to specify a different name for each nzload job.

  Note: If your default system case is uppercase, the system displays lowercase table names as uppercase in nzlog files, for example, DATA.DEV.nzlog and DATA.DEV.nzbad.

- Run the Linux top command on the host to monitor CPU resources. Consider running more loads concurrently if resources are available.

Troubleshoot

- If you see the error message, "Too many data fields for table," use the Linux command head -1 on the data file to get the first row, which may contain the extracted column names. Compare these to your create table's DDL and see if their physical positions match.
- If you see the error message, "Data type mismatch on column 5," use the Linux command cut -d^ -f 5 inputfile | more to look at the individual data values in the source file, and then compare them to your DDL.

Handle Exceptions

Repeat the load on the -bf file. If there are many exceptions, fix them and re-extract from the source system. If they are few, use a text editor to change data. To make large substitutions, use the Linux sed or awk commands.

Validate the Results

After the load completes, validate the results by comparing them with the source system.
Count the number of rows and select min/max/sum of each numeric and min/max of each date column in the table.

Generate Statistics

Remember to run the generate statistics command on your tables and/or database after you have loaded new data.

Test Performance

If your data is evenly distributed, you should see peak loading performance of at least 75 percent CPU utilization on the host. You can monitor utilization by running the Linux top command during the load.

If you see lower CPU utilization, either the data is skewed so that not all SPUs are sharing the workload, or the parser is waiting for data.

- If your input data is skewed, that is, all records are being sent to a small number of SPUs, those SPUs become the performance bottleneck.
- If your CPU utilization is less than 75 percent and the data is evenly distributed, you might have a streaming problem:
  - If the load is running from the local host, determine the source of the data. Look for other concurrent database activities, such as activities that are SPU-to-SPU broadcast intensive or SPU disk I/O intensive.
  - If the data is not locally staged or is on a SAN / NFS mount, determine if the bottleneck is the remote source of the data or the network.

The performance of the Netezza appliance system depends on the number of SPUs. If, however, data is being streamed across an external network, then the performance is limited by the speed of the network. Test the network by using the FTP command to send a file between the source and the local host, and measure the transfer rate. Under optimal conditions, a Gig-E network transfers at a rate of ~1000 Mb/second, or ~125 MB/second, or ~450 GB/hour.

nzload Error Handling

The nzload command does extensive error checking. This section describes how the nzload command interprets different data types and the way it handles syntax errors.

Reporting Errors

The nzload command returns standard error status when it completes.
- 0 – The load was successful; all input records were inserted.
- 1 – The load failed; no records were inserted due to error(s) found during the load.
- 2 – The load was successful, but the system found errors in the input that did not exceed the error threshold (-maxErrors), so the good records were inserted.

The nzload command writes high-level errors to the terminal (stderr), nzlog file, and nzbad file. You can specify the nzlog and nzbad filenames on the command line or through the use of a control file. For more information, see "Using a Control File" on page 4-5.

Note: Periodically delete log files to free up disk space.

Understanding nzload Log Files

The system creates the following nzlog file as the result of the command line:

    nzload -u admin -pw password -t member_profile -db dev -maxErrors 10 -delim '\t'

- -maxErrors allows the nzload command to continue processing until it has found 10 errors.
- -delim '\t' specifies the TAB delimiter.

The system appends to the nzlog file for every nzload command that loads the same table into the same database. The system names the nzlog file based on the table and the database name with the extension .nzlog. So, in this example, the file name is:

    member_profile.dev.nzlog

There is also a member_profile.dev.nzbad file that contains the record(s) that caused the error(s). The system overwrites this file each time you invoke the nzload command for the same table and database name (unlike the behavior of the nzlog file).

APPENDIX C
Option Names

What's in this appendix

- Specifying Options

This section details the different methods of using options.

Specifying Options

Table C-1 shows how to enter the external table options when using the command line method (used for nzload), in a control file, or as part of a SQL command.
Table C-1: Specifying External Table Options

Option               Command Line                       Control File                     SQL
AllowReplay          -allowreplay                       NA                               LOAD_REPLAY_REGION, MAX_QUERY_RESTARTS
BadFile              -bf                                badfile                          NA
BoolStyle            -boolStyle                         boolstyle                        BOOLSTYLE
Compress             -compress                          compress                         COMPRESS
CRinString           -crInString                        crinstring                       CRINSTRING
CtrlChars            -CtrlChars                         ctrlchars                        CTRLCHARS
Database             -db                                database                         NA
Datafile             -df                                datafile                         DATAOBJECT
DateDelim            -dateDelim                         datedelim                        DATEDELIM
DateStyle            -dateStyle                         datestyle                        DATESTYLE
DecimalDelim         -decimaldelim                      decimaldelim                     DECIMALDELIM
Delimiter            -delim, -delimiter                 delim, delimiter                 DELIM, DELIMITER
Encoding             -encoding                          encoding                         ENCODING
EscapeChar           -escape, -escapeChar               escape, escapechar               ESCAPE, ESCAPECHAR
FillRecord           -fillRecord                        fillrecord                       FILLRECORD
Format               -format                            format                           FORMAT
IgnoreZero           -ignoreZero                        ignorezero                       IGNOREZERO
IncludeZeroSeconds   NA                                 NA                               INCLUDEZEROSECONDS
Layout               -layout                            layout                           LAYOUT
LogDir               -outputDir                         outputdir                        LOGDIR
LogFile              -lf                                logfile                          NA
LogFileSize          -logFileSize                       NA                               LOAD_LOG_MAX_FILESIZE
MaxErrors            -maxErrors                         maxerrors                        MAXERRORS
MaxRows              -maxRows                           maxrows                          MAXROWS
NullValue            -nullValue                         nullvalue                        NULLVALUE
QuotedValue          -quotedValue                       quotedvalue                      QUOTEDVALUE
RecordDelim          -recdelim                          recdelim                         RECDELIM
RecordLength         -reclength                         recordlength                     RECLENGTH
RemoteSource         -host                              NA                               REMOTESOURCE
RequireQuotes        -requireQuotes                     requirequotes                    REQUIREQUOTES
SkipRows             -skipRows                          skiprows                         SKIPROWS
SocketBufSize        -fileBufSize, -fileBufByteSize     socketbufsize                    SOCKETBUFSIZE
SuspendMviews        -suspendMviews                     NA                               NA
Tablename            -t                                 tablename                        NA
TimeDelim            -timeDelim                         timedelim                        TIMEDELIM
TimeRoundNanos,      -timeRoundNanos, -timeExtraZeros   timeroundnanos, timeextrazeros   TIMEROUNDNANOS, TIMEEXTRAZEROS
TimeExtraZeros
TimeStyle            -timeStyle                         timestyle                        TIMESTYLE
TruncString          -truncString                       truncstring                      TRUNCSTRING
Y2Base               -y2Base                            y2base                           Y2BASE