Informatica Question & Answer Set

DWBIConcepts.com
Master Informatica Questions and Answer Set
Version 2.5
The one stop master manual of Informatica™ interview questions and answers
www.dwbiconcepts.com – Community of DWBI Professionals
© www.dwbiconcepts.com – All rights reserved.

Copyright Notice

Informatica Master Question and Answer Set is copyright © DWBIConcepts 2013. All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means – electronic, mechanical, photocopying, recording, or otherwise – without written permission from the publisher. No patent liability is assumed with respect to the use of the information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Neither is any liability assumed for damages resulting from the use of the information contained herein.

Trademarks

All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. New Riders Publishing cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

Warning and disclaimer

Every effort has been made to make this book as complete and as accurate as possible, but no warranty of fitness is implied. The information is provided on an “as is” basis. The author and the publisher shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book.

How this book should be used

This book contains various questions and answers pertaining to Informatica PowerCenter™ and allied tools as commonly asked in job interviews.
As such, the book is written for candidates who are preparing for job interviews. It is suggested that the candidate start preparing from the material at least one week in advance so that s/he can finish reading the entire content before appearing for the interview. If the candidate is stuck on any question or answer, is not clear on something, or has a doubt, s/he can interact with the experts through the DWBIConcepts forum.

For the help of the readers, we have tagged certain questions as shown below:
- Common / Frequently Asked Questions
- Harder Questions
- Additional Information

Table of Contents

COPYRIGHT NOTICE
TRADEMARKS
WARNING AND DISCLAIMER
HOW THIS BOOK SHOULD BE USED

1. AGGREGATOR TRANSFORMATION
    1. WHAT IS AN AGGREGATOR TRANSFORMATION?
    2. HOW DOES AN EXPRESSION TRANSFORMATION DIFFER FROM AN AGGREGATOR TRANSFORMATION?
    3. DOES AN AGGREGATOR TRANSFORMATION SUPPORT ONLY AGGREGATE EXPRESSIONS?
    4. GIVE ONE EXAMPLE FOR EACH OF CONDITIONAL AGGREGATION, NON-AGGREGATE EXPRESSION AND NESTED AGGREGATION.
    5. HOW DOES AGGREGATOR TRANSFORMATION HANDLE NULL VALUES?
    6. WHAT ARE THE PERFORMANCE CONSIDERATIONS WHEN WORKING WITH AGGREGATOR TRANSFORMATION?
    7. WHAT ARE THE USES OF INDEX AND DATA CACHE?
    8. WHAT DIFFERS WHEN WE CHOOSE SORTED INPUT FOR AGGREGATOR TRANSFORMATION?
    9. UNDER WHAT CONDITIONS WILL SELECTING SORTED INPUT IN AGGREGATOR STILL NOT BOOST SESSION PERFORMANCE?
    10. UNDER WHAT CONDITION MAY SELECTING SORTED INPUT IN AGGREGATOR FAIL THE SESSION?
    11. SUPPOSE WE DO NOT GROUP BY ON ANY PORTS OF THE AGGREGATOR. WHAT WILL BE THE OUTPUT?
    12. WHAT IS THE EXPECTED VALUE IF THE COLUMN IN AN AGGREGATOR TRANSFORMATION IS NEITHER A GROUP BY NOR AN AGGREGATE EXPRESSION?
    13. WHAT IS INCREMENTAL AGGREGATION?
    14. SORTED INPUT FOR AGGREGATOR TRANSFORMATION WILL IMPROVE PERFORMANCE OF MAPPING.
        HOWEVER, IF SORTED INPUT IS USED FOR NESTED AGGREGATE EXPRESSION OR INCREMENTAL AGGREGATION, THEN THE MAPPING MAY RESULT IN SESSION FAILURE. EXPLAIN WHY?
    15. HOW CAN WE DELETE DUPLICATE RECORDS USING INFORMATICA AGGREGATOR?
    16. SCENARIO IMPLEMENTATION 1
    17. SCENARIO IMPLEMENTATION 2

2. EXPRESSION TRANSFORMATION
    1. WHAT IS AN EXPRESSION TRANSFORM?
    2. HOW MANY TYPES OF PORTS ARE THERE IN EXPRESSION TRANSFORM?
    3. WHAT IS THE EXECUTION ORDER OF THE PORTS IN AN EXPRESSION?
    4. DESCRIBE THE APPROACH FOR THE REQUIREMENT. SUPPOSE THE INPUT IS:
    5. HOW CAN WE IMPLEMENT AGGREGATION OPERATION WITHOUT USING AN AGGREGATOR TRANSFORMATION IN INFORMATICA?
    6. SCENARIO IMPLEMENTATION 1
    7. SCENARIO IMPLEMENTATION 2
    8. SCENARIO IMPLEMENTATION 3
    9. SCENARIO IMPLEMENTATION 4
    10. SCENARIO IMPLEMENTATION 5

3. FILTER TRANSFORMATION
    1. WHAT IS A FILTER TRANSFORMATION AND WHY IT IS AN ACTIVE ONE?
    2. WHAT IS THE DIFFERENCE BETWEEN SOURCE QUALIFIER TRANSFORMATIONS SOURCE FILTER OPTION AND FILTER TRANSFORMATION?

4. JOINER TRANSFORMATION
    1. WHAT IS A JOINER TRANSFORMATION AND WHY IT IS AN ACTIVE ONE?
    2. STATE THE LIMITATIONS WHERE WE CANNOT USE JOINER IN THE MAPPING PIPELINE.
    3. OUT OF THE TWO INPUT PIPELINES OF A JOINER, WHICH ONE WILL WE SET AS THE MASTER PIPELINE?
    4. WHAT ARE THE DIFFERENT TYPES OF JOINS AVAILABLE IN JOINER TRANSFORMATION?
    5. DEFINE THE VARIOUS JOIN TYPES OF JOINER TRANSFORMATION.
    6. DESCRIBE THE IMPACT OF NUMBER OF JOIN CONDITIONS AND JOIN ORDER IN A JOINER.
    7. HOW DOES JOINER TRANSFORMATION TREAT NULL VALUE MATCHING?
    8. WHEN WE CONFIGURE THE JOIN CONDITION, WHAT ARE THE GUIDELINES WE NEED TO FOLLOW TO MAINTAIN THE SORT ORDER?
    9. WHAT ARE THE TRANSFORMATIONS THAT CANNOT BE PLACED BETWEEN THE SORT ORIGIN AND THE JOINER TRANSFORMATION SO THAT WE DO NOT LOSE THE INPUT SORT ORDER?
    10. WHAT IS THE USE OF SORTED INPUT IN JOINER TRANSFORMATION?
    11. CAN WE JOIN TWO TABLES BASED ON A JOIN COLUMN HAVING DIFFERENT DATA TYPE?
    12. IMPLEMENTATION SCENARIO 1 - JOINER TRANSFORMATION IS JOINING TWO TABLES S1 AND S2. S1 HAS 10,000 ROWS AND S2 HAS 1000 ROWS. WHICH TABLE YOU WILL SET MASTER FOR BETTER PERFORMANCE OF JOINER TRANSFORMATION? WHY?

5. LOOKUP TRANSFORMATION
    1. WHAT IS A LOOKUP TRANSFORM?
    2. WHAT ARE THE DIFFERENCES BETWEEN CONNECTED AND UNCONNECTED LOOKUP?
    3. WHAT ARE THE DIFFERENT LOOKUP CACHE(S)?
    4. IS LOOKUP AN ACTIVE OR PASSIVE TRANSFORMATION?
    5. WHAT IS THE DIFFERENCE BETWEEN STATIC AND DYNAMIC LOOKUP CACHE?
    6. WHAT ARE THE USES OF INDEX AND DATA CACHES?
    7. WHAT IS PERSISTENT LOOKUP CACHE?
    8. WHAT TYPE OF JOIN DOES LOOKUP SUPPORT?
    9. EXPLAIN HOW LOOKUP TRANSFORMATION WORKS LIKE SQL LEFT OUTER JOIN.
    10. WHERE AND WHY DO WE USE UNCONNECTED LOOKUP INSTEAD OF CONNECTED LOOKUP?
    11. HOW CAN WE IDENTIFY PERSISTENT CACHE FILES IN INFORMATICA SERVER?
    12. HOW TO CONFIGURE A LOOKUP ON A FLAT FILE WITH HEADER?
    13. WHAT IS THE DIFFERENCE BETWEEN PERSISTENT CACHE AND SHARED CACHE?
    14. DESCRIBE HOW TO RETURN MULTIPLE PORT VALUES FROM UNCONNECTED LOOKUP IN INFORMATICA.
    15. HOW TO MAKE THE PERSISTENT LOOKUP CACHE IN SYNC WITH LOOKUP TABLE?
    16. IF WE USE PERSISTENT CACHE FOR A DYNAMIC LOOKUP, WILL THE CACHE FILE BE UPDATED OR INSERTED AS REQUIRED?
    17. IS THERE ANYTHING WRONG IN SHARING A PERSISTENT CACHE BETWEEN STATIC AND DYNAMIC LOOKUP?
    18. WHAT IS THE DIFFERENCE BETWEEN THE TWO UPDATE PROPERTIES - UPDATE ELSE INSERT, INSERT ELSE UPDATE IN DYNAMIC LOOKUP CACHE?
    19. IF THE DEFAULT VALUE FOR THE LOOKUP RETURN PORT IS NOT SET, WHAT WILL BE THE OUTPUT WHEN THE LOOKUP CONDITION FAILS?
    20. HOW CAN WE ENSURE DATA IS NOT DUPLICATED IN THE TARGET WHEN THE SOURCE HAS DUPLICATE RECORDS, USING LOOKUP TRANSFORMATION?

6. NORMALIZER TRANSFORMATION
    1. WHAT IS A NORMALIZER TRANSFORMATION?
    2. SCENARIO IMPLEMENTATION 1
    3. WHAT ARE LEVELS IN NORMALIZER TRANSFORMATION?
    4. WHAT IS THE PURPOSE OF GCID AND GK IN A NORMALIZER TRANSFORMATION?

7. RANK TRANSFORMATION
    1. WHAT IS A RANK TRANSFORM?
    2. HOW DOES A RANK TRANSFORM DIFFER FROM AGGREGATOR TRANSFORM FUNCTIONS MAX AND MIN?
    3. HOW DOES A RANK CACHE WORKS?
    4. WHAT IS A RANK PORT AND RANKINDEX?
    5. HOW CAN YOU GET RANKS BASED ON DIFFERENT GROUPS?
    6. WHAT HAPPENS IF TWO RANK VALUES MATCH?
    7. WHAT ARE THE RESTRICTIONS OF RANK TRANSFORMATION?
    8. HOW DOES RANK TRANSFORMATION HANDLE STRING VALUES?
    9. WHAT IS DENSE RANK AND DOES INFORMATICA SUPPORTS DENSE RANK?
    10. HOW DO WE ACHIEVE DENSE_RANK IN INFORMATICA?
    11. SOURCE TABLE HAS 5 ROWS. RANK IN RANK TRANSFORMATION IS SET TO 10. HOW MANY ROWS THE RANK TRANSFORMATION WILL OUTPUT?
    12. HOW YOU WILL LOAD UNIQUE RECORD INTO TARGET FLAT FILE FROM SOURCE FLAT FILES HAS DUPLICATE DATA?

8. ROUTER TRANSFORMATION
    1. WHAT IS THE DIFFERENCE BETWEEN ROUTER AND FILTER?
    2. WHAT IS THE MINIMUM NUMBER OF GROUPS WE CAN DECLARE IN A ROUTER TRANSFORMATION?
    3. SCENARIO IMPLEMENTATION 1
    4. SCENARIO IMPLEMENTATION 2
    5. SCENARIO IMPLEMENTATION 3

9. SEQUENCE GENERATOR TRANSFORMATION
    1. WHAT IS A SEQUENCE GENERATOR TRANSFORMATION?
    2. DEFINE THE PROPERTIES AVAILABLE IN SEQUENCE GENERATOR TRANSFORMATION IN BRIEF.
    3. SCENARIO IMPLEMENTATION 1
    4. SCENARIO IMPLEMENTATION 2
    5. WHAT ARE THE CHANGES WE OBSERVE WHEN WE PROMOTE A NON-REUSABLE SEQUENCE GENERATOR TO A REUSABLE ONE?
        AND WHAT HAPPENS IF WE SET THE NUMBER OF CACHED VALUES TO 0 FOR A REUSABLE TRANSFORMATION?
    6. HOW SEQUENCE GENERATOR IN THE MAPPING IS HANDLED WHEN WE MIGRATE THE MAPPING FROM ONE ENVIRONMENT TO ANOTHER?
    7. SCENARIO IMPLEMENTATION 3
    8. HOW DO I GET A SEQUENCE GENERATOR TO "PICK UP" WHERE ANOTHER "LEFT OFF"?

10. STORED PROCEDURE TRANSFORMATION
    1. WHAT IS A STORED PROCEDURE TRANSFORMATION?
    2. HOW MANY TYPES OF STORED PROCEDURE TRANSFORMATION ARE THERE?
    3. HOW DO WE CALL AN UNCONNECTED STORED PROCEDURE TRANSFORMATION?
    4. HOW DO WE SET THE EXECUTION ORDER OF PRE-POST LOAD STORED PROCEDURE?
    5. HOW DO WE SET THE CALL TEXT FOR STORED PROCEDURE TRANSFORMATION?
    6. HOW DO WE RECEIVE OUTPUT/RETURN PARAMETERS FROM UNCONNECTED STORED PROCEDURE?

11. SORTER TRANSFORMATION
    1. WHAT IS A SORTER TRANSFORMATION?
    2. WHY IS SORTER AN ACTIVE TRANSFORMATION?
    3. HOW DOES SORTER HANDLE CASE SENSITIVE SORTING?
    4. HOW DOES SORTER HANDLE NULL VALUES?
    5. HOW DOES A SORTER CACHE WORKS?
    6. HOW TO DELETE DUPLICATE RECORDS OR RATHER TO SELECT DISTINCT ROWS FOR FLAT FILE SOURCES?

12. UNION TRANSFORMATION
    1. WHAT IS A UNION TRANSFORMATION?
    2. WHAT ARE THE RESTRICTIONS OF UNION TRANSFORMATION?
    3. HOW COME UNION TRANSFORMATION IS ACTIVE?

13. UPDATE STRATEGY TRANSFORMATION
    1. WHAT IS UPDATE STRATEGY TRANSFORM?
    2. WHAT ARE UPDATE STRATEGY CONSTANTS?
    3. HOW CAN WE UPDATE A RECORD IN TARGET TABLE WITHOUT USING UPDATE STRATEGY?
    4. WHAT IS DATA DRIVEN?
    5. WHAT HAPPENS WHEN DD_UPDATE IS DEFINED IN UPDATE STRATEGY AND TREAT SOURCE ROWS AS INSERT IS SELECTED IN SESSION?
    6. WHAT ARE THE THREE AREAS WHERE THE ROWS CAN BE FLAGGED FOR PARTICULAR TREATMENT?
    7. BY DEFAULT OPERATION CODE FOR ANY ROW IN INFORMATICA WITHOUT BEING ALTERED IS INSERT. THEN STATE WHEN DO WE NEED DD_INSERT?
    8. WHAT IS THE DIFFERENCE BETWEEN UPDATE STRATEGY AND FOLLOWING UPDATE OPTIONS IN TARGET?
    9. WHAT IS THE USE OF FORWARD REJECT ROWS IN MAPPING?
    10. SCENARIO IMPLEMENTATION 1

14. JAVA TRANSFORMATION
    1. SCENARIO IMPLEMENTATION 1
    2. SCENARIO IMPLEMENTATION 2

15. SOURCE QUALIFIER TRANSFORMATION
    1. WHAT IS A SOURCE QUALIFIER? WHAT ARE THE TASKS WE CAN PERFORM USING A SOURCE QUALIFIER AND WHY IT IS AN ACTIVE TRANSFORMATION?
    2. WHAT HAPPENS TO A MAPPING IF WE ALTER THE DATA TYPES BETWEEN SOURCE AND ITS CORRESPONDING SOURCE QUALIFIER?
    3. SUPPOSE WE HAVE USED THE SELECT DISTINCT AND THE NUMBER OF SORTED PORTS PROPERTY IN THE SOURCE QUALIFIER AND THEN WE ADD CUSTOM SQL QUERY. EXPLAIN WHAT WILL HAPPEN.
    4. DESCRIBE THE SITUATIONS WHERE WE WILL USE THE SOURCE FILTER, SELECT DISTINCT AND NUMBER OF SORTED PORTS PROPERTIES OF SOURCE QUALIFIER TRANSFORMATION.
    5. WHAT WILL HAPPEN IF THE SELECT LIST COLUMNS IN THE CUSTOM OVERRIDE SQL QUERY AND THE OUTPUT PORTS ORDER IN SOURCE QUALIFIER TRANSFORMATION DO NOT MATCH?
    6. WHAT HAPPENS IF IN THE SOURCE FILTER PROPERTY OF SQ TRANSFORMATION WE INCLUDE KEYWORD WHERE SAY, WHERE CUSTOMERS.CUSTOMER_ID > 1000.
    7. DESCRIBE THE SCENARIOS WHERE WE GO FOR JOINER TRANSFORMATION INSTEAD OF SOURCE QUALIFIER TRANSFORMATION.
    8. WHAT IS THE MAXIMUM NUMBER WE CAN USE IN NUMBER OF SORTED PORTS FOR SYBASE SOURCE SYSTEM?
    9. WHAT IS USE OF SOURCE QUALIFIER IN INFORMATICA? CAN WE CREATE A MAPPING WITHOUT A SOURCE QUALIFIER?
    10. SUPPOSE WE HAVE TWO TABLES OF SAME DATABASE TYPE, RESIDING IN DIFFERENT DATABASE INSTANCE. IF A DATABASE LINK IS AVAILABLE, HOW CAN WE JOIN THE TWO TABLES USING A SOURCE QUALIFIER IN INFORMATICA PROVIDED THERE ARE VALID JOIN COLUMNS.
    11. WHAT IS THE MEANING OF “OUTPUT IS DETERMINISTIC” PROPERTY IN SOURCE QUALIFIER TRANSFORMATION?
    12. SCENARIO IMPLEMENTATION 1

16. MISCELLANEOUS
    1. WHAT ARE THE NEW FEATURES OF INFORMATICA 9.X IN DEVELOPER LEVEL?
    2. NAME THE TRANSFORMATIONS WHICH CONVERTS ONE TO MANY ROWS I.E. INCREASES THE I/P: O/P ROW COUNT. ALSO WHAT IS THE NAME OF ITS REVERSE TRANSFORMATION?
    3. HOW MANY WAYS WE CAN FILTER RECORDS?
    4. WHAT ARE THE TRANSFORMATIONS THAT USE CACHE FOR PERFORMANCE?
    5. WHAT IS THE FORMULA FOR CALCULATION OF LOOKUP/RANK/AGGREGATOR INDEX & DATA CACHES?
    6. WHAT IS THE DIFFERENCE BETWEEN INFORMATICA POWERCENTER AND EXCHANGE AND MART?
    7. HOW DO WE HANDLE DELIMITER CHARACTER AS A PART OF THE DATA IN A DELIMITED SOURCE FILE?
    8. WE HAVE JUST RECEIVED SOURCE FILES FROM UNIX. WE WANT TO STAGE THAT DATA TO ETL PROCESS. WHAT ARE THE POINTS WE NEED TO LOOK FOR?
    9. WHAT IS THE DIFFERENCE BETWEEN JOINER AND LOOKUP. PERFORMANCE WISE WHICH ONE IS BETTER TO USE.
    10. WHAT IS THE B2B IN INFORMATICA? HOW CAN WE USE IT IN INFORMATICA?
    11. WHAT IS CDC, SCD AND MD5 IN INFORMATICA?
    12. HOW CAN WE IMPLEMENT AN SCD TYPE2 MAPPING WITHOUT USING A LOOKUP TRANSFORMATION?
    13. HOW DOES JOINER AND LOOKUP TRANSFORMATION TREAT NULL VALUE MATCHING?
    14. DOES MICROSOFT SQL SERVER SUPPORTS BULK LOADING? IF YES, WHAT HAPPENS WHEN YOU SPECIFY BULK MODE AND DATA DRIVEN FOR SQL SERVER TARGET
    15. HOW CAN YOU UTILIZE COM COMPONENTS IN INFORMATICA?
    16. WHAT IS SQL TRANSFORMATION IN INFORMATICA?
    17. WHAT IS A XML SOURCE QUALIFIER?
    18. WHAT IS THE “METADATA EXTENSIONS” TAB IN INFORMATICA?
    19. DESCRIBE SOME OF THE ETL BEST PRACTICES
    20. IS THERE A SCOPE OF CLOUD COMPUTING IN DATA WAREHOUSING TECHNOLOGY?

17. MAPPING
    1. SCENARIO IMPLEMENTATION 1
    2. WHAT ARE MAPPING PARAMETERS AND VARIABLES?
    4. WHAT ARE THE DEFAULT VALUES FOR VARIABLES?
    5. WHAT DOES FIRST COLUMN OF BAD FILE (REJECTED ROWS) INDICATES?
    6. OUT OF 100000 SOURCE ROWS SOME ROWS GET DISCARD AT TARGET, HOW WILL YOU TRACE THEM AND WHERE IT GETS LOADED?
    7. WHAT IS REJECT LOADING?
    8. WHY INFORMATICA WRITER THREAD MAY REJECT A RECORD?
    9. WHY TARGET DATABASE CAN REJECT A RECORD?
    10. DESCRIBE VARIOUS STEPS FOR LOADING REJECT FILE?
    11. VARIABLE V1 HAS VALUES SET AS 5 IN DESIGNER (DEFAULT), 10 IN PARAMETER FILE, AND 15 IN REPOSITORY. WHILE RUNNING SESSION WHICH VALUE INFORMATICA WILL READ?
    12. WHAT ARE SHORTCUTS? WHERE IT CAN BE USED? WHAT ARE THE ADVANTAGES?
    13. CAN WE HAVE AN INFORMATICA MAPPING WITH TWO PIPELINES, WHERE ONE FLOW IS HAVING A TRANSACTION CONTROL TRANSFORMATION AND ANOTHER NOT. EXPLAIN WHY?
    14. HOW CAN WE IMPLEMENT REVERSE PIVOTING USING INFORMATICA TRANSFORMATIONS?
    15. IS IT POSSIBLE TO UPDATE A TARGET TABLE WITHOUT ANY KEY COLUMN IN TARGET?

18. MAPPLET
    1. WHAT IS A MAPPLET?
    2. WHAT IS THE DIFFERENCE BETWEEN REUSABLE TRANSFORMATION AND MAPPLET?
    3. WHAT ARE THE TRANSFORMATIONS THAT ARE NOT SUPPORTED IN MAPPLET?
    4. IS IT POSSIBLE TO CONVERT REUSABLE TRANSFORMATION TO A NON-REUSABLE ONE?
    5. WHAT IS THE USE OF MAPPLET & WORKLET IN PROJECT?
    6. IS IT POSSIBLE TO HAVE A MAPPLET WITHIN A MAPPLET AND WORKLET WITHIN A WORKLET?

19. SESSION
    1. WHAT IS SESSION AND BATCHES?
    2. WHAT ARE VARIOUS SESSION TRACING LEVELS?
    3. CAN WE COPY A SESSION TO NEW FOLDER OR NEW REPOSITORY?
    4. IS IT POSSIBLE TO STORE ALL THE INFORMATICA SESSION LOG INFORMATION IN A DATABASE TABLE? NORMALLY THE SESSION LOG IS STORED AS A BINARY COMPRESSION .BIN FILE IN SESSLOGS DIRECTORY. CAN WE STORE THE SAME INFORMATION IN DATABASE TABLES FOR FUTURE ANALYSIS?
    5. CAN WE CALL A SHELL SCRIPT FROM SESSION PROPERTIES?
    6. CAN WE CHANGE THE SOURCE AND TARGET TABLE NAMES IN SESSION LEVEL?
    7. HOW TO WRITE FLAT FILE COLUMN NAMES IN TARGET?
    8. WHAT ARE THE ERROR TABLES PRESENT IN INFORMATICA?
    9. WHAT ARE THE ALTERNATE WAYS TO STOP A SESSION WITHOUT USING “STOP ON ERRORS” OPTION SET TO 1 IN SESSION PROPERTIES?
    10. SUPPOSE A SESSION FAILS AFTER LOADING OF 10,000 RECORDS IN THE TARGET. HOW CAN WE LOAD THE RECORDS FROM 10,001 WHEN WE RUN THE SESSION NEXT TIME?
    11. DEFINE THE TYPES OF COMMIT INTERVALS APART FROM USER DEFINED?
    12. SUPPOSE SESSION IS CONFIGURED WITH COMMIT INTERVAL OF 10,000 ROWS AND SOURCE HAS 50,000 ROWS EXPLAIN THE COMMIT POINTS FOR SOURCE BASED COMMIT & TARGET BASED COMMIT. ASSUME APPROPRIATE VALUE WHEREVER REQUIRED?
    13. HOW TO CAPTURE PERFORMANCE STATISTICS OF INDIVIDUAL TRANSFORMATION IN THE MAPPING AND EXPLAIN SOME IMPORTANT STATISTICS THAT CAN BE CAPTURED?
    14. HOW CAN WE PARAMETERIZE SUCCESS OR FAILURE EMAIL LIST?
    15. IS IT POSSIBLE THAT A SESSION FAILED BUT STILL THE WORKFLOW STATUS IS SHOWING SUCCESS?
    16. WHAT IS BUSY PERCENTAGE?
    17. CAN WE WRITE A PL/SQL BLOCK IN PRE AND POST SESSION OR IN TARGET QUERY OVERRIDE?
    18. WHENEVER A SESSION RUNS DOES THE DATA GETS OVERWRITTEN IN A FLAT FILE TARGET? IS IT POSSIBLE TO KEEP THE EXISTING DATA AND ADD THE NEW DATA TO THE TARGET FILE?
    19. CAN WE USE THE SAME SESSION TO LOAD A TARGET TABLE IN DIFFERENT DATABASES HAVING SAME TARGET DEFINITION?
    20. HOW DO YOU REMOVE THE CACHE FILES AFTER THE TRANSFORMATION?
    21. WHY DOESN'T A RUNNING SESSION QUIT WHEN ORACLE OR SYBASE RETURN FATAL ERRORS?

20. WORKFLOW
    1. WHAT IS THE DIFFERENCE BETWEEN STOP AND ABORT OPTIONS IN WORKFLOW?
    2. RUNNING INFORMATICA WORKFLOW CONTINUOUSLY – HOW TO RUN A WORKFLOW CONTINUOUSLY UNTIL A CERTAIN CONDITION IS MET?
    3. HOW DO WE SEND EMAILS FROM INFORMATICA AFTER THE SUCCESSFUL COMPLETION OF ONE SESSION? THE EMAIL WILL CONTAIN THE JOB NAME/ SESSION START TIME AND SESSION END TIME IN THE MESSAGE BODY.
    4. SCENARIO IMPLEMENTATION 1
    5. HOW CAN WE SEND TWO SEPARATE EMAILS AFTER A SUCCESSFUL SESSION RUN?
    6. WHAT IS COLD START IN INFORMATICA?
    7. SCENARIO IMPLEMENTATION 2
    8. WE KNOW THERE ARE 3 OPTIONS FOR SESSION RECOVERY STRATEGY - RESTART TASK, FAIL TASK AND CONTINUE RUNNING THE WORKFLOW, RESUME FROM LAST CHECKPOINT WHENEVER A SESSION FAILS. HOW DO WE RESTART A WORKFLOW AUTOMATICALLY WITHOUT ANY MANUAL INTERVENTION IN THE EVENT OF SESSION FAILURE?
    9. WHAT IS THE DIFFERENCE REAL-TIME AND CONTINUOUS WORKFLOWS?
    11. SCENARIO IMPLEMENTATION 3
    12. HOW DO WE SEND A SESSION FAILURE MAIL WITH THE WORKFLOW OR SESSION LOG AS ATTACHMENT?
    13. EXPLAIN DEADLOCK IN INFORMATICA AND HOW DO WE RESOLVE IT?
    14. SCENARIO IMPLEMENTATION 4
    15. HOW CAN WE PASS A VALUE FROM ONE WORKFLOW TO ANOTHER?

21. ADMINISTRATION
    1. WHAT IS LOAD MANAGER?
    2. WHAT IS DTM PROCESS? HOW MANY THREADS IT CREATES TO PROCESS DATA, EXPLAIN EACH THREAD IN BRIEF?
    3. CAN YOU CREATE A FOLDER WITHIN DESIGNER?
    4. HOW DO YOU TAKE CARE OF SECURITY USING A REPOSITORY MANAGER?
    5. WHAT ARE THE DIFFERENT USES OF A REPOSITORY MANAGER?
    6. WHAT ARE 2 MODES OF DATA MOVEMENT IN INFORMATICA SERVER?
    7. WHAT IS CODE PAGE USED FOR?
    8. WHAT IS CODE PAGE COMPATIBILITY?
    9. WHAT IS DEFAULT BLOCK BUFFER SIZE?
    10. WHAT IS DEFAULT LM SHARED MEMORY SIZE?
    11. DEFINE SERVER CONCEPTS WITH RESPECT TO MEMORY BUFFERS
    12. WHAT ARE THE TWO PROGRAMS THAT COMMUNICATE WITH THE INFORMATICA SERVER?

22. COMMAND LINE ARGUMENTS
    1. WHAT IS PMCMD COMMANDS?
    2. WHAT IS PMREP COMMANDS?
    3. HOW DO WE START & STOP SESSION FROM PMCMD COMMAND LINE?

23. METADATA REPOSITORY
    1. IS THERE ANY METADATA QUERY TO FIND THE LIST OF INFORMATICA FOLDER NAME, WORKFLOW NAMES WHICH ARE MIGRATED IN A PARTICULAR QUARTER?
    3. WRITE A METADATA QUERY TO IDENTIFY THE SESSIONS HAVING TRUNCATE OPTION ENABLED
    4. WHERE CAN I FIND A HISTORY / METRICS OF THE LOAD SESSIONS THAT HAVE OCCURRED IN INFORMATICA?
    5. HOW TO EXTRACT THE WORKFLOW MONITOR RECORD INFORMATION FROM INFORMATICA METADATA REPOSITORY?

24. REPOSITORY MANAGER
    1. DESCRIBE THE STEPS FOR EXPORT AND IMPORT?
    2. WHAT ARE THE VARIOUS METHODS OF CODE MIGRATION OR WHICH IS THE BEST WAY OF DEPLOYMENT?
    3. WHAT ARE THE VARIOUS OPTIONS FOR ETL CODE MIGRATION
    4. WHAT IS LABELING IN INFORMATICA?
    5. SUPPOSE HAVING INFORMATICA VERSION CONTROL IN PLACE, CAN WE REVERT BACK AN OBJECT TO A STATE OF TWO PREVIOUS VERSION.
    6. WHAT DO WE MEAN BY TEAM BASED DEVELOPMENT IN INFORMATICA?

25. SCENARIO QUESTIONS
    1. SUPPOSE WE HAVE TEN SOURCE FLAT FILES OF SAME STRUCTURE. HOW CAN WE LOAD ALL THE FILES IN TARGET DATABASE IN A SINGLE BATCH RUN USING A SINGLE MAPPING?
    2. SUPPOSE WE HAVE TWO SOURCE QUALIFIER TRANSFORMATIONS SQ1 AND SQ2 CONNECTED TO TARGET TABLES TGT1 AND TGT2 RESPECTIVELY. HOW DO YOU ENSURE TGT2 IS LOADED AFTER TGT1?
    3. SUPPOSE WE HAVE A SOURCE QUALIFIER TRANSFORMATION THAT POPULATES TWO TARGET TABLES. HOW DO YOU ENSURE TGT2 IS LOADED AFTER TGT1?
    4. SUPPOSE WE HAVE THE EMP TABLE AS OUR SOURCE. IN THE TARGET WE WANT TO VIEW THOSE EMPLOYEES WHOSE SALARY ARE GREATER THAN OR EQUAL TO THE AVERAGE SALARY FOR THEIR DEPARTMENTS. DESCRIBE YOUR MAPPING APPROACH.
    5. HOW CAN WE PERFORM CHANGED DATA CAPTURE BASED ON LOAD SEQUENCE NUMBER (INTEGER) COLUMN PRESENT IN THE SOURCE TABLE?
    6. SCENARIO IMPLEMENTATION 1
    7. HOW CAN WE LOAD ‘X’ RECORDS (USER DEFINED RECORD NUMBERS) OUT OF ‘N’ RECORDS FROM SOURCE DYNAMICALLY, WITHOUT USING FILTER AND SEQUENCE GENERATOR TRANSFORMATION?
    8. SUPPOSE WE HAVE ‘N’ NUMBER OF ROWS IN THE SOURCE AND WE HAVE TWO TARGET TABLES. HOW CAN WE LOAD ‘N/2’ I.E. FIRST HALF THE SOURCE DATA INTO ONE TARGET AND THE REMAINING HALF INTO THE NEXT TARGET?
    9. SUPPOSE WE HAVE A FLAT FILE WHICH HAS A HEADER RECORD WITH ‘FILE CREATION DATE’, AND DETAILED DATA RECORDS. DESCRIBE THE APPROACH TO LOAD THE 'FILE CREATION DATE' COLUMN ALONG WITH EACH AND EVERY DETAILED RECORD.
    10. SCENARIO IMPLEMENTATION 2
    11. SUPPOSE WE HAVE A FLAT FILE WHICH CONTAINS JUST A NUMERIC VALUE. WE NEED TO POPULATE THIS VALUE IN ONE COLUMN OF THE TARGET TABLE FOR EVERY SOURCE RECORD. HOW CAN WE ACHIEVE THIS?
    12. HOW WILL YOU LOAD A SOURCE FLAT FILE INTO A STAGING TABLE WHEN THE FILE NAME IS NOT FIXED? THE FILE NAME IS LIKE SALES_2013_02_22.TXT, I.E. DATE IS APPENDED AT THE END OF THE FILE AS A PART OF FILE NAME.
    13. SOLVE THE BELOW SCENARIO USING INFORMATICA AND DATABASE SQL.
    14. SUPPOSE WE HAVE A COLUMN IN SOURCE WITH VALUES AS BELOW:
    15. CAN WE PASS THE VALUE OF A MAPPING VARIABLE BETWEEN 2 PIPELINES UNDER THE SAME MAPPING? IF NOT HOW CAN WE ACHIEVE THIS?
    16. SCENARIO IMPLEMENTATION 3
    17. SCENARIO IMPLEMENTATION 4
    18. IMPLEMENT SLOWLY CHANGING DIMENSION OF TYPE 2 WHICH WILL LOAD CURRENT RECORD IN CURRENT TABLE AND OLD DATA IN LOG TABLE.

26. PERFORMANCE TUNING
    1. WHICH ONE IS FASTER CONNECTED OR UNCONNECTED LOOKUP?
    2. HOW WE CAN IMPROVE PERFORMANCE OF INFORMATICA NORMALIZATION TRANSFORMATION.
    3. HOW TO IMPROVE THE SESSION PERFORMANCE?
    4. HOW DO YOU IDENTIFY THE BOTTLENECKS IN MAPPINGS?
    5. HOW DO YOU HANDLE PERFORMANCE ISSUES IN INFORMATICA? WHERE CAN YOU MONITOR THE PERFORMANCE?
    6. WHAT ARE PERFORMANCE COUNTERS?
    7. HOW CAN WE INCREASE SESSION PERFORMANCE?
    8. SCENARIO IMPLEMENTATION 1

Topic Matrix:

Serial Number   Topic                     Questions
1               Aggregator                17
2               Expression                10
3               Filter                    2
4               Joiner                    12
5               Lookup                    20
6               Normalizer                4
7               Rank                      12
8               Router                    5
9               Sequence Generator        8
10              Stored Procedure          6
11              Sorter                    6
12              Union                     3
13              Update Strategy           10
14              Java                      2
15              Source Qualifier          12
16              Miscellaneous             20
17              Mapping                   12
18              Mapplet                   6
19              Session                   22
20              Workflow                  15
21              Administration            12
22              Command Line Arguments    3
23              Metadata Repository       5
24              Repository Manager        6
25              Scenario Questions        18
26              Performance Tuning        8

1. Aggregator Transformation

1. What is an Aggregator Transformation?
Answer: An Aggregator is an active, connected transformation which performs aggregate calculations like AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM and VARIANCE.

2. How does an Expression Transformation differ from an Aggregator Transformation?
Answer: An Expression Transformation performs calculations on a row-by-row basis, whereas an Aggregator Transformation performs calculations on groups of rows.

3. Does an Aggregator Transformation support only aggregate expressions?
Answer: No. Apart from aggregate expressions, the Aggregator transformation supports non-aggregate expressions and conditional clauses.

4. Give one example for each of Conditional Aggregation, Non-Aggregate expression and Nested Aggregation.
Answer:
- Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
    SUM(SALARY, JOB = 'CLERK')
- Use non-aggregate expressions in group by ports to modify or replace groups.
    IIF(PRODUCT = 'Brown Bread', 'Bread', PRODUCT)
- A nested aggregation expression can include one aggregate function within another aggregate function.
    MAX(COUNT(PRODUCT))

5. How does Aggregator Transformation handle NULL values?
Answer: By default, the Aggregator transformation treats null values as NULL in aggregate functions. However, we can specify that null values in aggregate functions be treated as either NULL or zero.

6. What are the performance considerations when working with Aggregator Transformation?
Answer:
- Filter the unnecessary data before aggregating it. Place a Filter transformation in the mapping before the Aggregator transformation to reduce unnecessary aggregation.
- Improve performance by connecting only the necessary input/output ports to subsequent transformations, thereby reducing the size of the data cache.
- Use sorted input, which reduces the amount of data cached and improves session performance. Aggregator performance improves dramatically if records are sorted before they are passed to the Aggregator and the “Sorted Input” option under the Aggregator properties is checked. The record set should be sorted on the columns that are used in the Group By operation. It is often a good idea to sort the record set at the database level, e.g. inside a Source Qualifier transformation, unless there is a chance that the already sorted records from the Source Qualifier can become unsorted again before reaching the Aggregator.

7. What are the uses of index and data cache?
Answer: Group data is stored in the index cache files, whereas row data is stored in the data cache files.

8. What differs when we choose Sorted Input for Aggregator Transformation?
Answer: The Integration Service creates the index and data caches in memory to process the Aggregator transformation. If the Integration Service requires more space than is allocated for the index and data cache sizes in the transformation properties, it stores overflow values in cache files, i.e. it pages to disk. One way to increase session performance is to increase the index and data cache sizes in the transformation properties. However, when we check Sorted Input, the Integration Service uses memory to process the Aggregator transformation and does not use cache files.

9. Under what conditions will selecting Sorted Input in Aggregator still not boost session performance?
Answer:
- The Incremental Aggregation session option is enabled.
- The aggregate expression contains nested aggregate functions.
- The session property Treat Source Rows As is set to Data Driven.

10. Under what condition may selecting Sorted Input in Aggregator fail the session?
Answer:
- If the input data is not sorted correctly, the session will fail.
- Even if the input data is properly sorted, the session may fail if the sort-order ports and the group by ports of the Aggregator are not in the same order.

11. Suppose we do not group by on any ports of the Aggregator. What will be the output?
Answer: If an input port is used neither in a group by nor in an aggregate expression, the Integration Service returns only the last row's value of that column for the input rows. For example, if 100 rows come from the source and no group by port is defined, the Aggregator outputs only the last record (the 100th record).

12. What is the expected value if the column in an aggregator transformation is neither a group by nor an aggregate expression?
Answer: The Integration Service produces one row for each group based on the group by ports.
The columns which are neither group by ports nor aggregate expressions return the corresponding value from the last record received for the group. However, if we explicitly use the FIRST function, the Integration Service returns the value from the first row of the group. So the default behaviour is that of the LAST function.

13. What is Incremental Aggregation?
Answer: We can enable the session option Incremental Aggregation for a session that includes an Aggregator Transformation. When the Integration Service performs incremental aggregation, it passes the changed source data through the mapping and uses the historical cache data to perform the aggregate calculations incrementally.

14. Sorted input for aggregator transformation will improve performance of mapping. However, if sorted input is used for nested aggregate expression or incremental aggregation, then the mapping may result in session failure. Explain why?
Answer: In the case of a nested aggregation, there are multiple levels of sorting involved, as each aggregation function requires one sorting pass. After the first level of aggregation, the sort order of the group by column may get jumbled up, so before the second level of aggregation Informatica must internally sort the data again. However, if we have already indicated that the input is sorted, Informatica will not do this sorting, resulting in failure.
In incremental aggregation, the aggregate calculations are stored in a historical cache on the server, and the data in this historical cache may not be in sorted order. If we give sorted input, the records come presorted for that particular run, but the data in the historical cache may still be unsorted, so the sorted-input assumption may not hold for the combined data.

15. How can we delete duplicate records using Informatica Aggregator?
Answer: One way to handle duplicate records in a source batch run is to use an Aggregator transformation and check the Group By checkbox on the ports that carry the duplicated data. This gives us the flexibility to select either the last or the first of the duplicate records.

16. Scenario Implementation 1
Suppose in our Source Table we have data as given below:

Student Name   Subject Name       Marks
Sam            Maths              100
Tom            Maths              80
Sam            Physical Science   80
John           Maths              75
Sam            Life Science       70
John           Life Science       100
John           Physical Science   85
Tom            Life Science       100
Tom            Physical Science   85

We want to load our Target Table as:

Student Name   Maths   Life Science   Physical Science
Sam            100     70             80
John           75      100            85
Tom            80      100            85

Describe your approach.
Answer: Here our scenario is to convert many rows into one row, and the transformation that helps us achieve this is the Aggregator.
Our Mapping will look like this:
[Figure: mapping diagram]
We will sort the source data on STUDENT_NAME ascending followed by SUBJECT ascending. Then, with STUDENT_NAME as the GROUP BY port, the output subject columns are populated as:
• MATHS: MAX( MARKS, SUBJECT = 'Maths' )
• LIFE_SC: MAX( MARKS, SUBJECT = 'Life Science' )
• PHY_SC: MAX( MARKS, SUBJECT = 'Physical Science' )

17. Scenario Implementation 2
Source:

100   XYZ   AAA
100   XYZ   BBB
100   XYZ   CCC

The expected output data:

100   XYZ   AAA BBB CCC

Which transformations are used for this?
Answer: Use an Aggregator transformation with a variable port.
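The answer to Scenario Implementation 2 can be pictured with a small simulation. The sketch below is plain Python, not Informatica code, and it only illustrates the idea under stated assumptions: the accumulator `acc` stands in for a variable port that keeps its value from row to row, and the last row of each group is what the Aggregator would emit. The function name `concat_by_group` and the tuple layout are hypothetical.

```python
from itertools import groupby


def concat_by_group(rows):
    """Concatenate the third column for each (id, name) group.

    rows must already be sorted on (id, name), mirroring sorted input.
    """
    output = []
    for key, group in groupby(rows, key=lambda r: (r[0], r[1])):
        acc = ""  # plays the role of the variable port (assumption)
        for _, _, value in group:
            # First row of the group starts the string, later rows append.
            acc = value if acc == "" else acc + " " + value
        # Only the value held after the last row of the group is emitted.
        output.append((key[0], key[1], acc))
    return output


rows = [(100, "XYZ", "AAA"), (100, "XYZ", "BBB"), (100, "XYZ", "CCC")]
result = concat_by_group(rows)
print(result)
```

In a real mapping the group boundary would be detected by comparing the current key with the previous one held in another variable port; `groupby` hides that step here.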
2. Expression Transformation

1. What is an Expression Transform?
Answer: Expression is a Passive, Connected transformation used to calculate values in a single row before we write to the target. We can use the Expression transformation to perform any non-aggregate calculation. We can also use it to test conditional statements before passing the results to target tables or other transformations. For example, we might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers.

2. How many types of ports are there in Expression transform?
Answer: There are three types of ports: INPUT, OUTPUT and VARIABLE.

3. What is the execution order of the ports in an expression?
Answer: All ports are executed TOP TO BOTTOM in serial physical order, but they are processed in the following groups:
• All input ports are pushed values first.
• Then all variables are executed (top to bottom physical ordering in the expression).
• Last, all output expressions are executed to push values to the output ports.
We can use this to our advantage by placing lookups into variables and then using the variables "later" in the execution cycle.

4. Describe the approach for the requirement.
Suppose the input is:

Col1   Col2
10     a
20     b
30     c
40
50     d

The desired output is:

Col1   Col2
10     a
20     a,b
30     a,b,c
40     a,b,c
50     a,b,c,d

Answer: Use an Expression transformation:

Port Name   Port Type   Expression
Col1        I/O
Col2        I
V_Seq       V           CUME(1)
V_Col2      V           IIF( V_Seq = 1, Col2, IIF( ISNULL(Col2), Prev_Col2, Prev_Col2 || ',' || Col2 ) )
Prev_Col2   V           V_Col2
Out_Col2    O           Prev_Col2

Keep in mind the string length of the variable and output ports.
The CUME function returns a running (cumulative) total of its argument. This means that if we call CUME with argument 1, i.e. CUME(1), the first call returns 1, the second call returns 2, the third call returns 3 and so on. Since Informatica processes data row by row, CUME(1) returns 1 when the first row is processed, 2 for the next row and so on.

5. How can we implement aggregation operation without using an Aggregator Transformation in Informatica?
Answer: We use a very basic property of the Expression transformation: at any time we can access the previous row's data as well as the data of the row currently being processed. All we need is a simple Sorter, Expression and Filter transformation to achieve aggregation at the Informatica level. For a detailed understanding, visit the article Aggregation without Aggregator.

6. Scenario Implementation 1
Source:

Col1   Col2
A      W
B      R
C      E
A      R
B      E

Target:

Col1   READ   WRITE   EXECUTE
A      1      1       0
B      1      0       1
C      0      0       1

In this scenario the source values W, R and E in Col2 stand for write, read and execute.
Answer: Take an Expression transformation followed by Aggregator transformation.
In Expression Transformation:

Port Name   Port Type   Expression
Col1        I/O
Col2        I/O
Read        O           IIF( Col2 = 'R', 1, 0 )
Write       O           IIF( Col2 = 'W', 1, 0 )
Execute     O           IIF( Col2 = 'E', 1, 0 )

In Aggregator Transform:

Port Name   Port Type   Expression
Col1        I/O         GROUP BY
Read        I/O         MAX( Read )
Write       I/O         MAX( Write )
Execute     I/O         MAX( Execute )

7. Scenario Implementation 2
Source data is like below:

Id   name1   name2
10   A       B
10   C       D
20   E       F

Desired Target data is like below:

Id   name
10   AB
10   CD
20   EF

Answer: Use an Expression transformation to concatenate both values as:
    name = name1 || name2

8. Scenario Implementation 3
Suppose we have a field named DATA in the source file. We need to output the first 9 characters of DATA for those records where the first 2 characters are alphabetic, i.e. (A-Z), and the next 7 characters are alphanumeric, i.e. (A-Z) or (0-9). The records which don't match the condition should be marked as "Invalid". How do we implement this? E.g.

DATA          OUTPUT
AB345GH6756   AB345GH67
CD56789PJ     CD56789PJ
56CHJK97889   Invalid
DG//*67DF     Invalid

Answer: Use the below logic in an output port of an Expression transformation:
    IIF( REG_MATCH( SUBSTR(DATA, 1, 2), '[[:alpha:]]{2}' ) = 1
         AND REG_MATCH( SUBSTR(DATA, 3, 7), '[[:alnum:]]{7}' ) = 1,
         SUBSTR(DATA, 1, 9), 'Invalid' )

9. Scenario Implementation 4
How do we convert a Date field coming as data type string from a flat file?
Answer: Use the date conversion functions:
    IIF( IS_DATE( Column1, 'YYYYMMDD' ) = 1, TO_DATE( Column1, 'YYYYMMDD' ), NULL )
In the above example, we have assumed that the format of the date field is 'YYYYMMDD'. If the format is something else (e.g. YYYY-MM-DD), we need to specify that format in both functions.

10.
Scenario Implementation 5
Source:

Col1   Col2
1      B
2      C
3      D
4      E

Target:

Col1   Col2   Col3   Col4
1      B      2      C
3      D      4      E

Describe the approach to the above scenario, where the 1st source record is loaded to target Col1, Col2, the 2nd record to Col3, Col4, the 3rd record again to Col1, Col2 and so on.
Answer: Use an Expression transformation:

Port Name   Port Type   Expression
Col1        I
Col2        I
V_ID        V           1 - MOD( Col1, 2 )
O_ID        O           V_ID
O_Col1      O           V_Col1
O_Col2      O           V_Col2
O_Col3      O           Col1
O_Col4      O           Col2
V_Col1      V           Col1
V_Col2      V           Col2

Next use a Filter transformation with the condition O_ID = 1.
Next map O_Col1, O_Col2, O_Col3, O_Col4 to Col1, Col2, Col3, Col4 of the target respectively.

3. Filter Transformation

1. What is a Filter Transformation and why it is an Active one?
Answer: A Filter transformation is an Active and Connected transformation that can filter rows in a mapping.
Only the rows that meet the Filter Condition pass through the Filter transformation to the next transformation in the pipeline. TRUE and FALSE are the implicit return values of any filter condition we set. If the filter condition evaluates to NULL, the row is treated as FALSE.
The numeric equivalent of FALSE is zero (0), and any non-zero value is the equivalent of TRUE.
As an ACTIVE transformation, the Filter transformation may change the number of rows passed through it. A filter condition returns TRUE or FALSE for each row that passes through the transformation, depending on whether the row meets the specified condition. Only rows that return TRUE pass through this transformation. Discarded rows do not appear in the session log or reject files.
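The truth rules described in this answer (NULL counts as FALSE, zero is FALSE, any non-zero value is TRUE) can be illustrated with a short simulation. This is a minimal Python sketch and not Informatica code; `None` stands in for NULL, and the names `passes_filter`, `filter_rows` and `salary_over_1000` are hypothetical.

```python
def passes_filter(condition_value):
    """Apply the Filter truth rules to the value a condition evaluates to."""
    if condition_value is None:   # NULL is treated as FALSE
        return False
    return condition_value != 0   # 0 is FALSE, any non-zero value is TRUE


def filter_rows(rows, condition):
    """Keep only the rows whose condition evaluates to TRUE."""
    return [row for row in rows if passes_filter(condition(row))]


def salary_over_1000(row):
    """Example condition: NULL salary gives NULL, otherwise 1 or 0."""
    if row["salary"] is None:
        return None
    return 1 if row["salary"] > 1000 else 0


rows = [
    {"id": 1, "salary": 5000},
    {"id": 2, "salary": None},
    {"id": 3, "salary": 0},
    {"id": 4, "salary": 1200},
]
kept = filter_rows(rows, salary_over_1000)
print([row["id"] for row in kept])
```

Four rows go in and fewer come out, which is the row-count change that makes the Filter an Active transformation.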
2. What is the difference between Source Qualifier transformations Source filter option and filter transformation?
Answer:

SQ Source Filter | Filter Transformation
Source Qualifier transformation filters rows when read from a source. | Filter transformation filters rows from within a mapping.
Source Qualifier transformation can only filter rows from relational sources. | Filter transformation filters rows coming from any type of source system at the mapping level.
Source Qualifier limits the row set extracted from a source. | Filter transformation limits the row set sent to a target.
Source Qualifier reduces the number of rows used throughout the mapping and hence provides better performance. | To maximize session performance, include the Filter transformation as close to the sources in the mapping as possible, to filter out unwanted data early in the flow of data from sources to targets.
The filter condition in the Source Qualifier transformation only uses standard SQL, as it runs in the database. | Filter transformation can define a condition using any statement or transformation function that returns either a TRUE or FALSE value.

4. Joiner Transformation

1. What is a Joiner Transformation and why it is an Active one?
Answer: A Joiner is an Active and Connected transformation used to join two source data streams coming from the same or heterogeneous databases or files.
The Joiner transformation joins sources with at least one matching column. It uses a condition that matches one or more pairs of columns between the two sources.
In the Joiner transformation, we must configure the transformation properties namely Join Condition, Join Type and optionally Sorted Input option to improve Integration Service performance.
The join condition contains ports from both input sources that must match for the Integration Service to join two rows. Depending on the join condition and the type of join selected, the Integration Service either adds the row to the result set or discards the row. For this reason, the number of rows in the Joiner output may not be equal to the number of rows in the Joiner input. This is why Joiner is considered an Active transformation.

2. State the limitations where we cannot use Joiner in the mapping pipeline.
Answer: The Joiner transformation accepts input from most transformations. However, the following limitations apply:
• Joiner transformation cannot be used when either of the input pipelines contains an Update Strategy transformation.
• Joiner transformation cannot be used if we connect a Sequence Generator transformation directly before the Joiner transformation.

3. Out of the two input pipelines of a joiner, which one will we set as the master pipeline?
Answer: During a session run, the Integration Service compares each row of the master source against the detail source. The master and detail sources need to be configured for optimal performance.
When the Integration Service processes an unsorted Joiner transformation, it blocks the detail source while it caches rows from the master source. Once the Integration Service finishes reading and caching all master rows, it unblocks the detail source and reads the detail rows. This is why, if we set the source containing fewer input rows as the master, the cache size will be smaller, thereby improving performance.
For a Sorted Joiner transformation, use the source with fewer duplicate key values as the master source for optimal performance and disk storage.
When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred keys at a time. If the master source contains many rows with the same key value, the Integration Service must cache more rows, and performance can be slowed.
Blocking logic is possible if the master and detail input to the Joiner transformation originate from different sources. Otherwise, the Integration Service does not use blocking logic; instead, it stores more rows in the cache.

4. What are the different types of Joins available in Joiner Transformation?
Answer: In SQL, a join is a relational operator that combines data from multiple tables into a single result set. The Joiner transformation is similar to an SQL join except that the data can originate from different types of sources.
The Joiner transformation supports the following types of joins:
• Normal
• Master Outer
• Detail Outer
• Full Outer
A normal or master outer join performs faster than a full outer or detail outer join.

5. Define the various Join Types of Joiner Transformation.
Answer:
• In a normal join, the Integration Service discards all rows of data from the master and detail sources that do not match, based on the join condition.
• A master outer join keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.
• A detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
• A full outer join keeps all rows of data from both the master and detail sources.

6. Describe the impact of number of join conditions and join order in a Joiner.
Answer: We can define one or more conditions based on equality between the specified master and detail sources. Both ports in a condition must have the same data type. If we need to use two ports with non-matching data types in the join condition, we must convert the data types so that they match. The Designer validates data types in a join condition.
Additional ports in the join condition increase the time necessary to join the two sources. The order of the ports in the join condition can also impact the performance of the Joiner transformation. If we use multiple ports in the join condition, the Integration Service compares the ports in the order we specified.
Only the equality operator is available in the Joiner join condition.

7. How does Joiner transformation treat NULL value matching?
Answer: The Joiner transformation does not match null values. For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not consider them a match and does not join the two rows.
To join rows with null values, replace null input with default values in the Ports tab of the Joiner, and then join on the default values.
If a result set includes fields that do not contain data in either of the sources, the Joiner transformation populates the empty fields with null values. If we know that a field will return a NULL and we do not want to insert NULLs in the target, we set a default value on the Ports tab for the corresponding port.

8. When we configure the join condition, what are the guidelines we need to follow to maintain the sort order? Suppose we configure Sorter transformations in the master and detail pipelines with the following sorted ports in order: ITEM_NO, ITEM_NAME and PRICE.
Answer: If we have sorted both the master and detail pipelines in the order of the ports ITEM_NO, ITEM_NAME and PRICE, we must ensure the following:
• Use ITEM_NO in the First Join Condition.
• If we add a Second Join Condition, we must use ITEM_NAME.
• If we want to use PRICE as a Join Condition apart from ITEM_NO, we must also use ITEM_NAME in the Second Join Condition.
• If we skip ITEM_NAME and join on ITEM_NO and PRICE, we will lose the input sort order and the Integration Service fails the session.

9. What are the transformations that cannot be placed between the sort origin and the Joiner transformation so that we do not lose the input sort order?
Answer: The best option is to place the Joiner transformation directly after the sort origin to maintain sorted data. However, do not place any of the following transformations between the sort origin and the Joiner transformation:
• Custom
• Unsorted Aggregator
• Normalizer
• Rank
• Union transformation
• XML Parser transformation
• XML Generator transformation
• Mapplet [if it contains any one of the above mentioned transformations]

10. What is the use of sorted input in joiner transformation?
Answer: It is recommended to join sorted data when possible. We can improve session performance by configuring the Joiner transformation to use sorted input. When we configure the Joiner transformation to use sorted data, it improves performance by minimizing disk input and output. We see great performance improvement when we work with large data sets.
For an unsorted Joiner transformation, designate the source with fewer rows as the master source for optimal performance and disk storage.
During a session, the Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds up the join process.

11. Can we join two tables based on a join column having different data type? For example table 1 EMPNO (string) and table 2 EMPNUM (number)
Answer: Yes, it is possible in this case. If we are using a Joiner, we can do the explicit data type conversion in an Expression transformation before joining the tables.

12. Implementation Scenario 1 - Joiner transformation is joining two tables s1 and s2. s1 has 10,000 rows and s2 has 1,000 rows. Which table you will set master for better performance of joiner transformation? Why?
Answer: Set table s2 as the master table, because the Informatica server has to keep the master table in the cache. Caching 1,000 rows instead of 10,000 rows gives better performance.

5. Lookup Transformation

1. What is a Lookup transform?
Answer: The Lookup transformation is used to look up data in a flat file, relational table, view or synonym. The Informatica server queries the lookup table based on the lookup ports in the transformation. It compares the Lookup transformation port values to the lookup table column values based on the lookup condition. The result is passed to other transformations and the target.
Uses:
• Get a related value
• Perform a calculation
• Update slowly changing dimension tables

2. What are the differences between Connected and Unconnected Lookup?
Answer: The differences are illustrated in the below table:

Connected Lookup | Unconnected Lookup
Connected lookup participates in the dataflow and receives input directly from the pipeline. | Unconnected lookup receives input values from the result of a :LKP expression in another transformation.
Connected lookup can use both dynamic and static cache. | Unconnected lookup cache can NOT be dynamic.
Connected lookup can return more than one column value (output port). | Unconnected lookup can return only one column value, i.e. one output port.
Connected lookup caches all lookup columns. | Unconnected lookup caches only the lookup output ports in the lookup conditions and the return port.
Supports user-defined default values (i.e. the value to return when the lookup conditions are not satisfied). | Does not support user-defined default values.

3. What are the different lookup cache(s)?
Answer: Informatica Lookups can be cached or un-cached (no cache). A cached lookup can be either static or dynamic.
A static cache is one which is not modified once it is built; the data remains the same during the session run. A dynamic cache, on the other hand, is refreshed during the session run by inserting or updating records in the cache based on the incoming source data. By default, the Informatica cache is a static cache.
A lookup cache can also be classified as persistent or non-persistent, based on whether Informatica retains the cache after the session run completes or deletes it.

4. Is lookup an active or passive transformation?
Answer: From Informatica 9x, the Lookup transformation can be configured as an "Active" transformation. See the article How to configure lookup as active transformation. In earlier versions of Informatica, however, Lookup is a passive transformation.

5.
What is the difference between Static and Dynamic Lookup Cache?
Answer: We can configure a Lookup transformation to cache the underlying lookup table. In case of a static or read-only lookup cache, the Integration Service caches the lookup table at the beginning of the session and does not update the lookup cache while it processes the Lookup transformation. Rows are not added dynamically to the cache.
In case of a dynamic lookup cache, the Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The dynamic cache is synchronized with the target. Basically, it caches the rows as and when they are passed.
In case you are wondering why we need to make a lookup cache dynamic, read the article on dynamic lookup.

6. What are the uses of index and data caches?
Answer: The lookup condition values are stored in the index cache, and the records from the lookup are stored in the data cache.

7. What is Persistent Lookup Cache?
Answer: If the cache generated for a Lookup needs to be preserved for subsequent use, a persistent cache is used. The index and data files are then not deleted. It is useful only if the lookup table remains constant.
Lookups are cached by default in Informatica. A lookup cache can be either non-persistent or persistent. The Integration Service saves or deletes the lookup cache files after a successful session run, based on whether the lookup cache is marked as persistent or not.

8. What type of join does Lookup support?
Answer: Lookup behaves like a SQL LEFT OUTER JOIN.

9. Explain how lookup transformation works like SQL Left Outer Join.
Answer: Lookup means that if the source input column value matches the lookup table comparison column value, it returns the valid values from the lookup table; otherwise it returns NULL.
Let's consider the EMP table as the source and the DEPT table as the lookup. We want to extract the location of each employee based on his or her department number. If the location details are not available in the DEPT table, we still want all the other information of the employee coming from the source EMP table, with NULL as the location, loaded into our target table.
So the equivalent SQL query looks like below:
    SELECT EMP.*, DEPT.LOC
    FROM EMP LEFT OUTER JOIN DEPT
    ON EMP.DEPTNO = DEPT.DEPTNO
Hence the Lookup is associated with the source table as a Left Outer Join.

10. Where and why do we use Unconnected Lookup instead of Connected Lookup?
Answer: The best part of an unconnected lookup is that we can call the lookup based on some condition and not for every row. That is, only when some condition is met do we invoke the unconnected lookup in an Expression transformation. In this way we may optimize the performance of a flow.
We may think of an unconnected lookup as a function in a procedural language. It takes multiple parameters as input, returns one value and can be used repeatedly. In the same way, an unconnected lookup can be used in any scenario where we need to use the lookup repeatedly, either in a single transformation or in multiple transformations. With the unconnected lookup, we get the performance benefit of not caching the same data multiple times. It is also a good coding practice.

11. How can we Identify Persistent Cache Files in Informatica Server?
Answer:
• Cache files are generated in the Cache directory of the Informatica Server for transformations like Aggregator, Joiner, Lookup, Rank and Sorter.
• Two types of cache files are generated, i.e. the data and index files, the exception being the Sorter transformation.
• The most important point is that Informatica automatically deletes all the generated .dat and .idx cache files after a session run is finished.
• So the files that are present in the Cache directory are basically the Persistent Cache files of Lookup transformations, the Aggregator cache files of Incremental Aggregation sessions, or the files of sessions that did not complete successfully.
• Informatica generated cache files are named as: PMAGG*.idx, PMAGG*.dat, PMJNR*.idx, PMJNR*.dat, PMLKP*.idx, PMLKP*.dat.
• Often, while handling a big data cache, Informatica creates multiple index and data files due to paging and appends a number to the end of the file names, e.g. PMAGG*.dat0, PMAGG*.idx0, PMAGG*.dat1, PMAGG*.idx1.
So if we have followed a particular naming convention for the Lookup Persistent Cache Name, e.g. table_name_PC, or the table names have a convention like GDW_, we can use shell commands accordingly to identify the cache files on the server.
In this context you can revisit the Lookup Persistent Cache and Incremental Aggregation articles.

12. How to configure a Lookup on a flat file with header?
Answer: When we create a Lookup transformation, we have the option to select the location of the Lookup Table from any of Source, Target, Source Qualifier, Import from Relational Table or Import from Flat File.
After we select the flat file as the lookup from the desired location, the Edit Transformation tab of the lookup shows the flat file information, where we choose between Delimited and Fixed Width and modify advanced properties like Column Delimiters, Code Page and Number of initial rows to skip.
Set Number of initial rows to skip as 1. Set the Lookup condition as required.
Apart from that, go to the Mapping tab of the corresponding session and select the Lookup transformation to configure the Lookup source file directory and filename and the Lookup source file type, i.e. Direct or Indirect.

13. What is the difference between persistent cache and shared cache?
Answer: Persistent cache is a type of Informatica lookup cache in which the cache file is stored on disk. We can configure the session to re-cache if necessary. It should be used only if we are sure that the lookup table will not change between sessions, which is mostly the case when the mapping uses static tables as lookups.
If the persistent cache is shared across mappings, we call it a shared (named) cache. We provide a name for this cache file.
If the lookup table is used in more than one transformation or mapping, the cache built for the first lookup can be used for the others. It can be used across mappings. For a shared cache we have to give the name in the Cache File Name Prefix property and use the same name in every lookup where we want to use the cache.
Unshared cache: Within a mapping, if the lookup table is used in more than one transformation, the cache built for the first lookup can be used for the others. It cannot be used across mappings.

14. Describe how to return multiple port values from unconnected lookup in Informatica.
Answer: An Informatica Unconnected Lookup by default supports only one return port. So, as an alternative, we can write a Lookup SQL override with the required port values concatenated into a single string as the return port value.
Call the unconnected lookup from the Expression transformation and use several output ports to retrieve the lookup values from the concatenated return value. Use the SUBSTR and INSTR functions to extract the column values from the concatenated return field.

15. How to make the persistent lookup cache in sync with lookup table?
Answer: To make the persistent cache in sync with the lookup table simply enable Re-cache option of the lookup transformation to rebuild the lookup cache from lookup table again.
While loading the target dimension table, we can choose to make the lookup cache dynamic and recache-persistent, so that once the dimension is loaded the persistent cache file is in sync and available during Fact table loading.

16. If we use persistent cache for a dynamic lookup, will the cache file be updated or inserted as required?
Answer: Having a persistent cache does not change the way the dynamic cache inserts into and updates the cache file. The only difference is that the cache file gets a proper name assigned through the persistent named cache and can be reused later.

17. Is there anything wrong in sharing a persistent cache between static and dynamic lookup?
Answer: Static and Dynamic lookups cannot share the same persistent cache.

18. What is the difference between the two update properties - update else insert, insert else update in dynamic lookup cache?
Answer: In Dynamic Cache:
• Update else Insert: In this scenario, if the incoming record already exists in the lookup cache, the record is updated in the cache and also in the target; else it is inserted.
• Insert else Update: In this scenario, if the incoming record does not exist in the lookup cache, the record is inserted in the cache and also in the target; else it is updated.
These options play a role in performance. If we know the nature of the source data, we can set the update option accordingly. If most of the source data is destined for insert, we select Insert else Update; otherwise we go for Update else Insert.
Also, if the number of duplicate records coming from the source is large, we go for Update else Insert; if there are only a few potential duplicates in the source, we go for Insert else Update for better performance.

19.
If the default value for the lookup return port is not set, what will be the output when the lookup condition fails? Answer: NULL will be returned from lookup transformation on lookup condition failure. 20. How can we ensure data is not duplicated in the target when the source has duplicate records, using lookup transformation? Answer: Using Dynamic lookup cache we can ensure duplicate records are not inserted in the target. That is through Using Dynamic Lookup Cache of the target table and associating the input ports with the lookup port and checking the Insert Else Update option will help to eliminate the duplicate records in source and hence loading unique records in the target. For more details check, Dynamic Lookup Cache D W B I C o n c e p t s D W B I C o n c e p t s D W B I C o n c e p t s D W B I C o n c e p t s www.dwbiconcepts.com – Community of DWBI Professionals © www.dwbiconcepts.com – All rights reserved. 36 6. Normalizer Transformation 1. What is a Normalizer transformation? Answer: The normalizer transformation normalizes records from COBOL and relational sources, allowing you to or- ganize the data according to your own needs. A Normalizer transformation can appear anywhere in a data flow when you normalize a relational source. Use a Normalizer transformation instead of the Source Qualifi- er transformation when you normalize COBOL source. When you drag a COBOL source into the Mapping De- signer Workspace, the Normalizer transformation appears, creating input and output ports for every columns in the source. 2. Scenario Implementation 1 Suppose in our Source Table we have data as given below: Student Name Math Life Science Physical Science Sam 100 70 80 John 75 100 85 Tom 80 100 85 We want to load our Target Table as: Student Name Subject Name Marks Sam Math 100 Sam Life Science 70 Sam Physical Science 80 John Math 75 John Life Science 100 John Physical Science 85 Tom Math 80 Tom Life Science 100 Tom Physical Science 85 Describe your approach. 
Answer: To convert the subject columns of each source row into separate rows we use a Normalizer transformation, followed by an Expression transformation that decodes which source column each generated row came from. For more details on how the mapping is built, see Working with Normalizer.

3. What are levels in Normalizer transformation?

Answer: The VSAM Normalizer transformation is the Source Qualifier for a COBOL source definition. A COBOL source can contain multiple-occurring data (a group of columns of the same type) and multiple types of records in the same file. Levels exist mainly to describe this structure.

The Normalizer tab defines the structure of the source data. A group of columns might define a record in a COBOL source, or it might define a group of multiple-occurring fields in the source. The column level number identifies groups of columns in the data, and level numbers define a data hierarchy. Columns in a group have the same level number and display sequentially below a group-level column. A group-level column has a lower level number and contains no data.

4. What is the purpose of GCID and GK in a Normalizer transformation?

Answer: Let's take an example. The source data is:

Name    FOOD  HOUSERENT  TRANSPORT
Saurav  1000  2000       500
Jenny   2000  2500       700

When we set the Occurs property of the Normalizer to 3, the Normalizer creates 3 input ports to receive data from the source. Say the 3 columns FOOD, HOUSERENT and TRANSPORT are connected to those 3 input ports. The GCID (generated column ID) then takes the values 1, 2 and 3, corresponding to the connected input columns FOOD, HOUSERENT and TRANSPORT, and the Normalizer generates 3 output rows for each source row, one per input column value. The GK (generated key), on the other hand, keeps a sequence value running from 1 to the number of source records.
It holds the sequence number of the source record being processed. The table below helps to visualize the Normalizer output in the GCID and GK fields:

Name    EXPENSEHEAD  GCID_EXPENSEHEAD  EXPENSE  GK_EXPENSEHEAD
Saurav  FOOD         1                 1000     1
Saurav  HOUSERENT    2                 2000     1
Saurav  TRANSPORT    3                 500      1
Jenny   FOOD         1                 2000     2
Jenny   HOUSERENT    2                 2500     2
Jenny   TRANSPORT    3                 700      2

7. Rank Transformation

1. What is a Rank Transform?

Answer: Rank is an Active, Connected transformation used to select a set of top or bottom values of data. In effect it filters the required number of records from the top or from the bottom.

2. How does a Rank Transform differ from Aggregator Transform functions MAX and MIN?

Answer: Like the Aggregator transformation, the Rank transformation also groups information. The Rank transformation lets us select a group of top or bottom values, not just the single value returned by the Aggregator MAX and MIN functions.

3. How does a Rank Cache work?

Answer: During a session, the Integration Service compares an input row with rows in the data cache. If the input row out-ranks a cached row, the Integration Service replaces the cached row with the input row. If we configure the Rank transformation to rank based on different groups, the Integration Service ranks incrementally for each group it finds. The Integration Service creates an index cache to store the group information and a data cache for the row data.

4. What is a RANK port and RANKINDEX?

Answer: The Rank port is an input/output port used to specify the column on which we want to rank the source values. By default Informatica creates an output port RANKINDEX for each Rank transformation. It stores the ranking position of each row in a group.

5. How can you get ranks based on different groups?
Answer: The Rank transformation lets us group information. We can configure one of its input/output ports as a group by port. For each unique value in the group port, the transformation creates a group of rows falling within the rank definition (top or bottom, and a particular number in each rank).

6. What happens if two rank values match?

Answer: If two rank values match, they receive the same value in the rank index and the transformation skips the next value.

7. What are the restrictions of Rank Transformation?

Answer:
• We can connect ports from only one transformation to the Rank transformation.
• We can select the top or bottom rank.
• We need to select the number of records in each rank.
• We can designate only one Rank port in a Rank transformation.

8. How does Rank transformation handle string values?

Answer: The Rank transformation can return the strings at the top or the bottom of a session sort order. When the Integration Service runs in Unicode mode, it sorts character data in the session using the selected sort order associated with the code page of the Integration Service, which may be French, German, etc. When the Integration Service runs in ASCII mode, it ignores this setting and uses a binary sort order to sort character data.

9. What is Dense Rank and does Informatica support Dense Rank?

Answer: With a normal RANK, when multiple rows share the same rank the next rank in the sequence is not consecutive. DENSE RANK, on the other hand, assigns consecutive ranks. Take the following example: let's say we want to see the top 2 highest salaries of each department.
DEPTNO  SAL  RANK  DENSE_RANK
10      400  1     1
10      400  1     1
10      300  3     2
10      100  4     3
20      550  1     1
20      550  1     1
20      150  3     2
30      200  1     1
40      600  1     1

So the normal RANK generates a result set in which a rank can be missing (here RANK = 2 is missing for department 10) because multiple records share the same rank. DENSE RANK, on the other hand, generates all the consecutive ranks.

The Informatica Rank transformation performs a simple RANK, not DENSE RANK, so with the Rank transformation we may miss consecutive ranks.

10. How do we achieve DENSE_RANK in Informatica?

Answer: To achieve the DENSE RANK functionality in Informatica we use a combination of Sorter, Expression and Filter transformations. Based on the previous example data set, let's say we want to get the top 2 highest salaries of each department as per DENSE RANK.
• Use a SORTER transformation: DEPTNO ASC, SAL DESC
• After the Sorter place an EXPRESSION transformation with the following ports, in this order:

PORT_NAME    TYPE  EXPRESSION
DEPT         I/O
SAL          I/O
V_COMP       V     IIF (DEPT <> V_DEPT_PREV, 1, IIF (DEPT = V_DEPT_PREV AND SAL <> V_SAL_PREV, V_COMP + 1, V_COMP))
RANK         O     V_COMP
V_DEPT_PREV  V     DEPT
V_SAL_PREV   V     SAL

V_COMP refers to its own value from the previous row, because an output port such as RANK cannot be referenced in another port's expression. V_DEPT_PREV and V_SAL_PREV are placed after V_COMP so that they still hold the previous row's values when V_COMP is evaluated.
• Next use a FILTER transformation. FILTER CONDITION: RANK < 3

11. Source table has 5 rows. Rank in rank transformation is set to 10. How many rows will the rank transformation output?

Answer: 5. The Rank transformation cannot return more rows than it receives, so all five rows are output.

12. How will you load unique records into a target flat file when the source flat files have duplicate data?

Answer: In a Rank transformation, use a group by port to group the records and then set the number of ranks to 1. The Rank transformation returns one value from each group.
That value will be a unique one.

8. Router Transformation

1. What is the difference between Router and Filter?

Answer: The following differences can be noted:

• Router: divides the incoming records into multiple groups based on some conditions. Such groups can be mutually inclusive (different groups may contain the same record).
  Filter: restricts or blocks the incoming record set based on one given condition.
• Router: does not itself block any record. If a record does not match any of the routing conditions, it is routed to the default group.
  Filter: has no default group. If a record does not match the filter condition, it is blocked.
• Router: acts like a CASE... WHEN statement in SQL (or a switch statement in C).
  Filter: acts like a WHERE condition in SQL.

In a Filter transformation the records are filtered on the condition and rejected rows are discarded. In a Router multiple conditions are defined, and the rows rejected by all of them can still be passed on through the default group.

2. What is the minimum number of groups we can declare in a Router transformation?

Answer: We can define a minimum of 1 group condition for a Router transformation. It automatically creates another group called Default to pass the records that do not satisfy the condition of the defined group.

3. Scenario Implementation 1

Loading Multiple Target Tables Based on Conditions: suppose we have some serial numbers in a flat file source. We want to load the serial numbers in two target files, one containing the EVEN serial numbers and the other the ODD ones.

Answer: After the Source Qualifier place a Router Transformation.
Create two groups, namely EVEN and ODD, with filter conditions as:
• MOD(SERIAL_NO,2)=0
• MOD(SERIAL_NO,2)=1
Then output the two groups into two flat file targets.

4. Scenario Implementation 2

Suppose we have a source table and we want to load three target tables based on source rows such that the first row moves to the first target table, the second row to the second target table, the third row to the third target table, the fourth row again to the first target table, and so on. Describe your approach.

Answer: We clearly need a Router transformation to route or filter the source data to the three target tables. The question is what the filter conditions should be. First we need an Expression transformation that carries all the source table columns plus another input/output port, say SEQ_NUM, which gets a sequence number for each source row from the NEXTVAL port of a Sequence Generator with start value 0 and increment by 1. The filter conditions for the three router groups are then:
• MOD(SEQ_NUM,3)=1 connected to 1st target table
• MOD(SEQ_NUM,3)=2 connected to 2nd target table
• MOD(SEQ_NUM,3)=0 connected to 3rd target table

5. Scenario Implementation 3

How can we distribute and load 'n' number of source records equally into two target tables, so that each has 'n/2' records?

Answer:
• After the Source Qualifier use an Expression transformation.
• In the Expression transformation create a counter variable:
  V_COUNTER = V_COUNTER + 1 (variable port)
  O_COUNTER = V_COUNTER (output port)
  This counter variable is incremented by 1 for every new record that comes in.
• Router Transformation, with the group filter conditions:
  Group_ODD: MOD(O_COUNTER, 2) = 1
  Group_EVEN: MOD(O_COUNTER, 2) = 0
  Half of the records (all odd-numbered records) go to Group_ODD and the rest to Group_EVEN.
• Finally the target tables.

9. Sequence Generator Transformation

1. What is a Sequence Generator Transformation?

Answer: A Sequence Generator is a Passive and Connected transformation that generates numeric values. It is used to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers. By default this transformation contains only two OUTPUT ports, namely CURRVAL and NEXTVAL. We can neither edit nor delete these ports, nor can we add ports to this transformation. We can create approximately two billion unique numeric values, the widest range being 1 to 2147483647.

2. Define the Properties available in Sequence Generator transformation in brief.

Answer:

Sequence Generator Properties and Description

Start Value: Start value of the generated sequence that we want the Integration Service to use if we use the Cycle option. If we select Cycle, the Integration Service cycles back to this value when it reaches the end value. Default is 0.

Increment By: Difference between two consecutive values from the NEXTVAL port. Default is 1.

End Value: Maximum value generated by the Sequence Generator. After reaching this value the session fails if the Sequence Generator is not configured to cycle. Default is 2147483647.

Current Value: Current value of the sequence. Enter the value we want the Integration Service to use as the first value in the sequence. Default is 1.
Cycle: If selected, when the Integration Service reaches the configured end value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value.

Number of Cached Values: Number of sequential values the Integration Service caches at a time. Default value for a standard Sequence Generator is 0. Default value for a reusable Sequence Generator is 1,000.

Reset: Restarts the sequence at the current value each time a session runs. This option is disabled for reusable Sequence Generator transformations.

3. Scenario Implementation 1

Suppose we have a source table populating two target tables. We connect the NEXTVAL port of the Sequence Generator to the surrogate keys of both the target tables. Will the surrogate keys in both the target tables be the same? If not, how can we flow the same sequence values into both of them?

Answer: When we connect the NEXTVAL output port of the Sequence Generator directly to the surrogate key columns of the target tables, the sequence numbers will not be the same. A block of sequence numbers is sent to the surrogate key column of one target table. The second target receives a block of sequence numbers from the Sequence Generator transformation only after the first target table has received its block.

Suppose we have 5 rows coming from the source. The targets will then have the sequence values TGT1 (1,2,3,4,5) and TGT2 (6,7,8,9,10), taking Start Value 0, Current Value 1 and Increment By 1.

Now suppose the requirement is to have the same surrogate keys in both the targets. The easiest way to handle this is to put an Expression transformation between the Sequence Generator and the target tables.
The Sequence Generator passes unique values to the Expression transformation, and the rows are then routed from the Expression transformation to the targets.

4. Scenario Implementation 2

Suppose we have 100 records coming from the source. For a target column population we use a Sequence Generator. Suppose the Current Value is 0 and the End Value of the Sequence Generator is set to 80. What will happen?

Answer: End Value is the maximum value the Sequence Generator will generate. After it reaches the End Value the session fails with the following error message:

TT_11009 Sequence Generator Transformation: Overflow error.

The session failure can be avoided if the Sequence Generator is configured to Cycle through the sequence, i.e. whenever the Integration Service reaches the configured end value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value.

5. What are the changes we observe when we promote a non-reusable Sequence Generator to a reusable one? And what happens if we set the Number of Cached Values to 0 for a reusable transformation?

Answer: When we convert a non-reusable Sequence Generator to a reusable one, we observe that the Number of Cached Values is set to 1000 by default and the Reset property is disabled.

When we try to set the Number of Cached Values property of a reusable Sequence Generator to 0 in the Transformation Developer, we encounter the following error message:

The number of cached values must be greater than zero for reusable sequence transformation.

6. How is a Sequence Generator in the mapping handled when we migrate the mapping from one environment to another?
Answer: While promoting the Informatica objects using the Copy Folder Wizard we can choose to retain the existing values or to replace them with values from the source folder. Generally we retain the current values for the Sequence Generator transformation in the destination folder; otherwise we may end up with duplicate values in the sequence-generated column, which may lead to session failure.

The Informatica metadata query below lists the current value of each Sequence Generator transformation:

SELECT OPB_SUBJECT.SUBJ_NAME          AS "FOLDER NAME",
       OPB_MAPPING.MAPPING_NAME       AS "MAPPING NAME",
       REP_WIDGET_INST.INSTANCE_NAME  AS "SEQ NAME",
       OPB_WIDGET_ATTR.ATTR_VALUE     AS "CURRENT VALUE"
FROM REP_WIDGET_INST
INNER JOIN OPB_MAPPING
   ON (REP_WIDGET_INST.MAPPING_ID = OPB_MAPPING.MAPPING_ID)
INNER JOIN OPB_WIDGET_ATTR
   ON (REP_WIDGET_INST.WIDGET_TYPE = OPB_WIDGET_ATTR.WIDGET_TYPE
       AND REP_WIDGET_INST.WIDGET_ID = OPB_WIDGET_ATTR.WIDGET_ID)
INNER JOIN OPB_SUBJECT
   ON (OPB_MAPPING.SUBJECT_ID = OPB_SUBJECT.SUBJ_ID)
WHERE REP_WIDGET_INST.WIDGET_TYPE_NAME LIKE 'Sequence%'
  AND OPB_WIDGET_ATTR.ATTR_ID = 4  -- Current Value
ORDER BY OPB_MAPPING.MAPPING_NAME

7. Scenario Implementation 3

Consider that we have two mappings that populate a single target table from two different source systems. Both mappings have a Sequence Generator transformation to generate the surrogate key in the target table. How can we ensure that the surrogate key generated is consistent and that no duplicate values are generated when populating data from the two mappings?

Answer: We should use a reusable Sequence Generator in both the mappings to generate the target surrogate keys.

8. How do I get a Sequence Generator to "pick up" where another "left off"?
Answer: Use an unconnected Lookup on the sequence ID of the target table. Set the lookup policy to "LAST VALUE", with an ID as the input port and the condition SEQ_ID >= input_ID. Then, in an Expression transformation, connect a NEW self-resetting Sequence Generator to a new input port and set up a variable port whose expression reads:

IIF( v_seq = 0 OR ISNULL(v_seq) = true, :LKP.lkp_sequence(1), v_seq)

Then set up an output port whose expression reads: v_seq + input_seq (input_seq comes from the resetting Sequence Generator). With this you have completed an "append" without a break in sequence numbers.

10. Stored Procedure Transformation

1. What is a Stored Procedure Transformation?

Answer: Stored Procedure is a Passive transformation used to execute, through Informatica ETL, stored procedures pre-built on the database. It can also be used to call functions that return calculated values.

2. How many types of Stored Procedure transformation are there?

Answer: Based on how they are called, there are two types of Stored Procedure transformation: Connected and Unconnected. Based on the execution order they can be classified as Source Pre Load, Source Post Load, Normal, Target Pre Load and Target Post Load. A Normal Stored Procedure transformation can be configured as either connected or unconnected, whereas Pre-Post Load Stored Procedures are unconnected ones.

3. How do we call an Unconnected Stored Procedure transformation?

Answer: The unconnected Stored Procedure transformation is called from an Expression transformation using :SP.<Stored_Procedure_Name>(Argument1, Argument2). Unlike the connected one, an unconnected Stored Procedure allows conditional execution of the stored procedure.

4. How do we set the Execution order of Pre-Post Load Stored Procedure?
Answer: We set the execution order using the Stored Procedure Plan from the mapping property.

5. How do we set the Call Text for Stored Procedure transformation?

Answer: Once we specify a Stored Procedure Type other than Normal, the Call Text attribute in the Properties tab is enabled. Here we specify how the procedure has to be called, along with the arguments to be passed, e.g. <Stored_Procedure_Name>(Argument1, Argument2).

6. How do we receive output/return parameters from Unconnected Stored Procedure?

Answer: Configure the expression to send any input parameters and capture any output parameters or return value. You must know whether the parameters shown in the Expression Editor are input or output parameters. You insert variables or port names between the parentheses in the exact order in which they appear in the stored procedure itself. The datatypes of the ports and variables must match those of the parameters passed to the stored procedure.

For example, when you click the stored procedure, something similar to the following appears:

:SP.GET_NAME_FROM_ID()

This particular stored procedure requires an integer value as an input parameter and returns a string value as an output parameter. How the output parameter or return value is captured depends on the number of output parameters and on whether the return value needs to be captured.

If the stored procedure returns a single output parameter or a return value (but not both), you should use the reserved variable PROC_RESULT as the output variable. In the previous example, the expression would appear as:

:SP.GET_NAME_FROM_ID(inID, PROC_RESULT)

inID can be either an input port for the transformation or a variable in the transformation. The value of PROC_RESULT is applied to the output port for the expression.
If the stored procedure returns multiple output parameters, you must create variables for each output parameter. For example, if you created a port called varOUTPUT2 for the stored procedure expression, and a variable called varOUTPUT1, the expression would appear as:

:SP.GET_NAME_FROM_ID (inID, varOUTPUT1, PROC_RESULT)

The value of the second output port is applied to the output port for the expression, and the value of the first output port is applied to varOUTPUT1. The output parameters are returned in the order in which they are declared in the stored procedure itself.

With all these expressions, the datatypes of the ports and variables must match the datatypes of the input/output variables and the return value.

11. Sorter Transformation

1. What is a Sorter Transformation?

Answer: Sorter is an Active, Connected transformation used to sort data in ascending or descending order according to specified sort keys. The Sorter transformation contains only input/output ports.

2. Why is Sorter an Active Transformation?

Answer: Because we can select the "distinct" option in the Sorter properties. When the Sorter transformation is configured to treat output rows as distinct, it assigns all ports as part of the sort key, and the Integration Service discards duplicate rows compared during the sort operation. The number of output rows can therefore differ from the number of input rows, and hence it is an Active transformation.

3. How does Sorter handle Case Sensitive sorting?

Answer: The Case Sensitive property determines whether the Integration Service considers case when sorting data. When we enable the Case Sensitive property, the Integration Service sorts uppercase characters higher than lowercase characters.

4. How does Sorter handle NULL values?
Answer: We can configure the way the Sorter transformation treats null values. Enable the property Null Treated Low if we want null values to be treated as lower than any other value during the sort operation. Disable this option if we want the Integration Service to treat null values as higher than any other value.

5. How does a Sorter Cache work?

Answer: The Integration Service passes all incoming data into the Sorter cache before the Sorter transformation performs the sort operation. The Integration Service uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation. If it cannot allocate enough memory, the Integration Service fails the session. For best performance, configure the Sorter cache size with a value less than or equal to the amount of available physical RAM on the Integration Service machine.

If the amount of incoming data is greater than the Sorter cache size, the Integration Service temporarily stores data in the Sorter transformation work directory. The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory.

6. How to delete duplicate records or rather to select distinct rows for flat file sources?

Answer: Since the source is a flat file, we cannot select the Distinct option in the Source Qualifier; it is disabled for flat file sources. The next approach is therefore to use a Sorter transformation and check its Distinct option. When we select the Distinct option, all the columns are selected as keys, in ascending order by default.
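As a rough illustration of the Distinct option described above, the following Python sketch treats every column as part of the sort key and drops duplicate rows while sorting. The function name `sort_distinct` and the sample rows are hypothetical. The sketch only mimics the observable behaviour and makes no claim about how the Integration Service implements it.

```python
from itertools import groupby


def sort_distinct(rows):
    """Sort rows on all columns (ascending) and keep one copy of each duplicate."""
    ordered = sorted(rows)  # every column takes part in the sort key
    # After sorting, identical rows sit next to each other, so groupby collapses them.
    return [key for key, _ in groupby(ordered)]


if __name__ == "__main__":
    # Hypothetical flat file rows: (student name, marks)
    flat_file_rows = [
        ("Tom", 80),
        ("Sam", 100),
        ("Tom", 80),  # duplicate of the first row
        ("John", 75),
    ]
    for row in sort_distinct(flat_file_rows):
        print(row)
```

Because fewer rows can come out than went in, the sketch also shows why a Sorter with Distinct enabled counts as an Active transformation.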
12. Union Transformation

1. What is a Union Transformation?

Answer: Union is an Active, Connected, non-blocking, multiple input group transformation used to merge data from multiple pipelines or sources into one pipeline branch. Like the UNION ALL SQL statement, the Union transformation does not remove duplicate rows.

2. What are the restrictions of Union Transformation?

Answer:
• All input groups and the output group must have matching ports. The precision, data type, and scale must be identical across all groups.
• We can create multiple input groups, but only one default output group.
• The Union transformation does not remove duplicate rows.
• We cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
• The Union transformation does not generate transactions.

3. How come union transformation is active?

Answer: Active transformations are those that may change the number or position of rows in the data stream. Any transformation that splits or combines data streams, or reduces, expands or sorts data, is an active transformation, because it cannot be guaranteed that the number of rows and their position in the data stream remain unchanged when data passes through it. Union is an active transformation because it combines two or more data streams into one. Though the total number of rows passing into the Union is the same as the total number of rows passing out of it, and the sequence of rows from any given input stream is preserved in the output, the positions of the rows are not preserved, i.e. row number 1 from input stream 1 might not be row number 1 in the output stream. Union does not even guarantee that the output is repeatable.
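The point about row positions can be checked with a small Python simulation. It rests on stated assumptions: `union_all` is a hypothetical helper, and the seeded random interleaving merely stands in for whatever arrival order the Integration Service happens to produce. The sketch shows that the total row count and the relative order within each input are kept, while absolute row positions are not.

```python
import random


def union_all(stream_a, stream_b, seed=0):
    """Merge two row streams like UNION ALL, in an arbitrary interleaving."""
    rng = random.Random(seed)  # assumption: random choice models arrival order
    a, b = list(stream_a), list(stream_b)
    i = j = 0
    merged = []
    while i < len(a) or j < len(b):
        # Take from stream A if B is exhausted, or if A still has rows and wins the draw.
        take_a = j >= len(b) or (i < len(a) and rng.random() < 0.5)
        if take_a:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    return merged


if __name__ == "__main__":
    source_1 = [("S1", n) for n in range(1, 4)]
    source_2 = [("S2", n) for n in range(1, 6)]
    out = union_all(source_1, source_2)
    print(len(out))  # 8: no rows are added or removed
    # Rows of each source keep their relative order in the merged output.
    print([row for row in out if row[0] == "S1"] == source_1)
    print([row for row in out if row[0] == "S2"] == source_2)
```

Changing `seed` gives a different interleaving with the same row count, which mirrors the remark that the output is not guaranteed to be repeatable.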
Another argument is that, for a Union, the number of rows in an input group does not match the number of output rows. Suppose we have two sources with 10 and 20 rows. Neither input count equals the 30 rows that come out. We could compare this to a Joiner performing a Full Outer Join on 10 and 20 rows with no matching columns, which also gives all the rows as output.

Why the Union transformation is Active is a debatable topic. The Union transformation is derived from the Multigroup External transformation. As the Multigroup External transformation is Active, the Union transformation can be termed active.

13. Update Strategy Transformation

1. What is Update Strategy transform?

Answer: The Update Strategy transformation flags source rows for insert, update, delete or reject at the targets.

2. What are Update Strategy Constants?

Answer:
• DD_INSERT - 0
• DD_UPDATE - 1
• DD_DELETE - 2
• DD_REJECT - 3

3. How can we update a record in target table without using Update strategy?

Answer: A target table can also be updated without using an Update Strategy. For this, we need to define the key of the target table at the Informatica level and then connect the key and the field we want to update in the mapping target. At the session level, we should set the target property to "Update as Update" and enable the "Update" check-box.

Let's assume we have a target table "Customer" with the fields "Customer ID", "Customer Name" and "Customer Address". Suppose we want to update "Customer Address" without an Update Strategy. Then we have to define "Customer ID" as the primary key at the Informatica level and connect the Customer ID and Customer Address fields in the mapping.
If the session properties are set correctly as described above, the mapping will update only the customer address field for all matching customer IDs.

4. What is Data Driven?
Answer: Update strategy flags source rows for insert, update, delete, or reject at the targets.
Treat Source Rows As: Data Driven is the default session property option selected when an Update Strategy transformation is used in a mapping. The Integration Service follows the instructions coded in the mapping to flag the rows for insert, update, delete or reject.

5. What happens when DD_UPDATE is defined in update strategy and Treat source rows as INSERT is selected in Session?
Answer: If anything other than DATA DRIVEN is selected in the session, the Update Strategy in the mapping is ignored.

6. What are the three areas where the rows can be flagged for particular treatment?
Answer:
• In Mapping - Update Strategy
• In Session - Treat Source Rows As
• In Session - Target Insert / Update / Delete Options

7. By default operation code for any row in Informatica without being altered is INSERT. Then state when do we need DD_INSERT?
Answer: When we handle data insertion, updating, deletion and/or rejection in a single mapping, we use the Update Strategy transformation to flag the rows for Insert, Update, Delete or Reject. We flag a row either by providing the values 0, 1, 2, 3 respectively or by using DD_INSERT, DD_UPDATE, DD_DELETE or DD_REJECT in the Update Strategy transformation. By default the transformation has the value '0' and hence it performs insertion.
Suppose we want to insert into or update the target table in a single pipeline. Then we can write the below expression in the Update Strategy transformation to insert or update based on the incoming row.
    IIF (ISNULL(LKP_EMPLOYEE_ID), DD_INSERT, DD_UPDATE)

If we can use more than one pipeline, this is not a problem: for the insert part we do not even need an explicit Update Strategy transformation (DD_INSERT); we can map it straight to the target.

8. What is the difference between update strategy and following update options in target?
• Update as Update
• Update as Insert
• Update else Insert
Even if we do not use update strategy we can still update the target by setting, for example, Update as Update and treating target rows as data driven. So what's the difference here?
Answer: The operations for the following options are done at the database level:
• Update as Update
• Update as Insert
• Update else Insert
It will write a 'select' statement on the target table and compare the result with the source. Accordingly, if the record already exists it will do an update, else it will insert.
With Update Strategy, on the other hand, the operations are decided at the Informatica level itself. Update Strategy also gives a conditional update option, wherein based on some condition we can update, insert or even reject the rows. Such conditional options are not available in target-based updates (which will either “update” or perform “update else insert” based on the keys defined at the Informatica level).

9. What is the use of Forward Rejected Rows in Mapping?
Answer: If DD_REJECT is used in the Update Strategy, then we need to select this option to generate the Reject/Bad file.

10. Scenario Implementation 1
Suppose we have a source employee table and we want to load employees who belong to department 10 to Target 1, 20 to Target 2 and 30 to Target 3. Describe the approach without using FILTER or ROUTER Transformations.
Answer: We will use three separate Update Strategy transformations, one before each of the target tables (T1, T2, T3), and provide the below conditions in their expression editors:

    UPD_T1: IIF (DEPTNO = 10, DD_INSERT, DD_REJECT)
    UPD_T2: IIF (DEPTNO = 20, DD_INSERT, DD_REJECT)
    UPD_T3: IIF (DEPTNO = 30, DD_INSERT, DD_REJECT)

14. Java Transformation

1. Scenario Implementation 1

Source:
    Col1  Col2
    A     3
    B     2
    C     2

Target:
    Col1  Col2
    A     3
    A     3
    A     3
    B     2
    B     2
    C     2
    C     2

Answer: Using a Java transformation in Informatica we can generate as many records as the requirement calls for. Here goes the Java code.

    // generateRow() is available only in an active Java transformation.
    In_Col1 = Col1;
    In_Col2 = Col2;
    // Emit one output row per unit of Col2.
    for (int i = 0; i < In_Col2; i++) {
        Out_Col1 = In_Col1;
        Out_Col2 = In_Col2;
        generateRow();
    }

2. Scenario Implementation 2
How can I replace characters, e.g. A to Z, in a particular string with their ASCII values? E.g. Input string: AB123C1; Output string: 6566123671
Answer: If the INPUT string has a fixed size of 9 characters, use the below code as the expression in an Output port of an Informatica Expression transformation.
Alternatively you can use an Informatica User-Defined Function with the INPUT string as an argument:

    IIF( IS_NUMBER( SUBSTR( INPUT, 1, 1 ) ) = 1, SUBSTR( INPUT, 1, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 1, 1 ) ) ) ) ||
    IIF( IS_NUMBER( SUBSTR( INPUT, 2, 1 ) ) = 1, SUBSTR( INPUT, 2, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 2, 1 ) ) ) ) ||
    IIF( IS_NUMBER( SUBSTR( INPUT, 3, 1 ) ) = 1, SUBSTR( INPUT, 3, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 3, 1 ) ) ) ) ||
    IIF( IS_NUMBER( SUBSTR( INPUT, 4, 1 ) ) = 1, SUBSTR( INPUT, 4, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 4, 1 ) ) ) ) ||
    IIF( IS_NUMBER( SUBSTR( INPUT, 5, 1 ) ) = 1, SUBSTR( INPUT, 5, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 5, 1 ) ) ) ) ||
    IIF( IS_NUMBER( SUBSTR( INPUT, 6, 1 ) ) = 1, SUBSTR( INPUT, 6, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 6, 1 ) ) ) ) ||
    IIF( IS_NUMBER( SUBSTR( INPUT, 7, 1 ) ) = 1, SUBSTR( INPUT, 7, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 7, 1 ) ) ) ) ||
    IIF( IS_NUMBER( SUBSTR( INPUT, 8, 1 ) ) = 1, SUBSTR( INPUT, 8, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 8, 1 ) ) ) ) ||
    IIF( IS_NUMBER( SUBSTR( INPUT, 9, 1 ) ) = 1, SUBSTR( INPUT, 9, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 9, 1 ) ) ) )

As per the requirement we want to convert just the characters in an input string to their ASCII equivalents, not the digits. If the requirement were to convert a single character to its ASCII equivalent, then the built-in ASCII function of Informatica would have been enough, e.g. ASCII(inp_chr).
But since this is a string and we need the ASCII equivalent of each character in it, i.e. we must parse each character, the concept of a loop comes into the picture. So use an Informatica Java transformation.
Use a Passive Java transformation with an input port named INPUT and an output port named OUTPUT.
On the Java Code tab of the Java transformation use the below Java code:

    String inp = INPUT;
    String out = "";
    for (int i = 0; i < inp.length(); i++) {
        char c = inp.charAt(i);
        if (!Character.isDigit(c)) {
            out = out + (int) c;   // non-digit: append its ASCII code
        } else {
            out = out + c;         // digit: keep it unchanged
        }
    }
    OUTPUT = out;

15. Source Qualifier Transformation

1. What is a Source Qualifier? What are the tasks we can perform using a Source Qualifier and why it is an ACTIVE transformation?
Answer: A Source Qualifier is an Active and Connected transformation that reads the rows from a relational database or flat file source.
• We can configure the SQ to join (both INNER and OUTER JOIN) data originating from the same source database.
• We can use a source filter to reduce the number of rows the Integration Service queries.
• We can specify a number for sorted ports and the Integration Service adds an ORDER BY clause to the default SQL query.
• We can choose the Select Distinct option for relational databases and the Integration Service adds a SELECT DISTINCT clause to the default SQL query.
• We can write a Custom/User-Defined SQL query which overrides the default query in the Source Qualifier, by changing the default settings of the transformation properties for relational databases.
• We have the option to write Pre as well as Post SQL statements to be executed before and after the Source Qualifier query in the source database.
The transformation provides the Select Distinct property; when it is selected, the Integration Service adds a SELECT DISTINCT clause to the default SQL query, which in turn affects the number of rows returned by the database to the Integration Service. Hence it is an Active transformation.

2. What happens to a mapping if we alter the data types between Source and its corresponding Source Qualifier?
Answer: The Source Qualifier transformation displays the Informatica data types. The transformation data types determine how the source database binds data when the Integration Service reads it.
If we alter the data types in the Source Qualifier transformation, or the data types in the Source definition and Source Qualifier transformation do not match, the Designer marks the mapping as invalid when we save it.

3. Suppose we have used the Select Distinct and the Number of Sorted Ports property in the Source Qualifier and then we add Custom SQL Query. Explain what will happen.
Answer: Whenever we add a Custom SQL or SQL override query, it overrides the User-Defined Join, Source Filter, Number of Sorted Ports, and Select Distinct settings in the Source Qualifier transformation. Hence only the user-defined SQL query will be fired in the database and all the other options will be ignored.

4. Describe the situations where we will use the Source Filter, Select Distinct and Number of Sorted Ports properties of Source Qualifier transformation.
Answer: The Source Filter option is used to reduce the number of rows the Integration Service queries, so as to improve performance.
The Select Distinct option is used when we want the Integration Service to select unique values from a source. Filtering out unnecessary data earlier in the data flow improves performance.
The Number Of Sorted Ports option is used when we want the source data to arrive sorted, so that following transformations such as Aggregator or Joiner, when configured for sorted input, perform better.

5. What will happen if the SELECT list COLUMNS in the Custom override SQL Query and the OUTPUT PORTS order in Source Qualifier transformation do not match?
Answer: If the list or order of the selected columns in the SQL Query override does not match the connected output ports of the Source Qualifier, the ports may receive unexpected values when the data types happen to match; otherwise the session fails.

6. What happens if in the Source Filter property of SQ transformation we include keyword WHERE say, WHERE CUSTOMERS.CUSTOMER_ID > 1000.
Answer: We use the Source Filter to reduce the number of source records. If we include the string WHERE in the source filter, the Integration Service fails the session. In the above case, the correct syntax will be CUSTOMERS.CUSTOMER_ID > 1000

7. Describe the scenarios where we go for Joiner transformation instead of Source Qualifier transformation.
Answer: We use the Joiner transformation to join source data from heterogeneous sources as well as to join flat files. Use the Joiner transformation when we need to join the following types of sources:
• Join data from different Relational Databases.
• Join data from different Flat Files.
• Join relational sources and flat files.

8. What is the maximum number we can use in Number of Sorted Ports for Sybase source system?
Answer: Sybase supports a maximum of 16 columns in an ORDER BY clause. So if the source is Sybase, do not sort more than 16 columns.

9. What is use of Source Qualifier in Informatica? Can we create a mapping without a source qualifier?
Answer: The Source Qualifier converts the data types of the heterogeneous source objects supported by Informatica to native Informatica data types, so that the following objects in a mapping are processed with consistent Informatica data types. For relational tables, the Source Qualifier also helps to join multiple tables from the same database and allows Pre or Post SQL operations.
We cannot create a mapping without a Source Qualifier; it is the first transformation in Informatica that is attached to the source table or source flat file instance.

10. Suppose we have two tables of same database type, residing in different Database instance. If a Database Link is available, how can we join the two tables using a Source Qualifier in Informatica provided there are valid join columns.
Answer: Source Qualifier Override:

    SELECT e.empno, e.ename, s.salary, s.comm
    FROM emp e, sal@dblinkname s
    WHERE e.empno = s.empno

It is advisable to create a Public Synonym in the database for the remote tables so that we can avoid using the syntax TableName@DBLinkName.

11. What is the meaning of “output is deterministic” property in source qualifier transformation?
Answer: Output is deterministic means we are informing Informatica that the output does not change (for the same input) across session runs.
Why is this required? Consider that the source is relational and we have enabled the session for recovery. The session fails and we resume it. In this case, if we have set the source as deterministic, the session would have created a cache (on disk) of the source during the normal run to be used for recovery. This saves time during recovery because we need not issue the SQL command to the source database again.
If this is not set, the source data cache is not created during the normal run and the SQL will be reissued during recovery. In some cases, if this property is not set you will not be able to enable recovery for the session.

12. Scenario Implementation 1
How to delete duplicate rows present in relational database using Informatica? Suppose we have duplicate records in Source System and we want to load only the unique records in the Target System eliminating the duplicate rows. What will be the approach?
Answer: Assuming that the source system is a Relational Database, to eliminate duplicate records we can check the Distinct option of the Source Qualifier of the source table and load the target accordingly.

16. Miscellaneous

1. What are the new features of Informatica 9.x in developer level?
Answer: From a developer's perspective, some of the new features in Informatica 9.x are as follows:
• Lookup can now be configured as an active transformation: it can return multiple rows on a successful match.
• You can now write a SQL override on an un-cached lookup as well. Previously you could do it only on a cached lookup.
• You can control the size of your session log. In a real-time environment you can control the session log file size or time.
• Database deadlock resilience: your session does not immediately fail if it encounters a database deadlock; it will retry the operation. You can configure the number of retry attempts.
• Cache can be updated based on a condition or expression.
• New interface for the admin console, from now on called Informatica Administrator.
  (Create connection objects, grant permission on database connections, deploy or configure deployment units from the Informatica Administrator.)
• PowerCenter licensing is from now on based on the number of CPUs and repositories.

2. Name the transformations which converts one to many rows i.e. increases the I/P: O/P row count. Also what is the name of its reverse transformation?
Answer: Normalizer and Router are two Active transformations that can produce more output rows than input rows. The Aggregator transformation performs the reverse action of the Normalizer transformation.

3. How many ways we can filter records?
Answer:
• Source Qualifier
• Filter transformation
• Router transformation
• Update strategy

4. What are the transformations that use cache for performance?
Answer: Aggregator, Sorter, Lookup, Joiner and Rank transformations use cache.

5. What is the formula for calculation of Lookup/Rank/Aggregator index & data caches?
Answer:
• Index cache size = Total no. of rows * size of the column in the lookup condition (for example, 50 * 4)
• Aggregator/Rank transformation Data Cache size = (Total no. of rows * size of the column in the lookup condition) + (Total no. of rows * size of the connected output ports)
• Aggregator Index Cache: #Groups * [(Σ column size) + 7]
• Aggregator Data Cache: #Groups * [(Σ column size) + 7]
• Lookup Index Cache: #Rows in lookup table * [(Σ column size) + 16]
• Lookup Data Cache: #Rows in lookup table * [(Σ column size) + 8]
• Joiner Index Cache: #Master rows * [(Σ column size) + 16]
• Joiner Data Cache: #Master rows * [(Σ column size) + 8]
• Rank Index Cache: #Groups * [(Σ column size) + 7]
• Rank Data Cache: #Groups * [(#Ranks * (Σ column size + 10)) + 20]

6. What is the difference between Informatica PowerCenter, PowerExchange and PowerMart?
Answer:
PowerCenter:
• PowerCenter can have many repositories.
• It supports the Global Repository and networked local repositories.
• PowerCenter can connect to all native legacy source systems such as Mainframe, ERP, CRM, EAI (TIBCO, MSMQ, JMQ).
• High Availability and load sharing on multiple servers in the grid.
• Informatica session level Partitioning is available.
• Informatica Pushdown Optimizer is available.
PowerMart:
• PowerMart supports only one repository.
• PowerMart can connect to relational and flat file sources.
PowerExchange:
• PowerExchange Client and PowerExchange ODBC are PowerExchange interfaces to extract and load data for a variety of data types on a variety of platforms (relational, non-relational, and changed data) in batch mode or real time using PowerCenter.
• The PowerExchange Client for PowerCenter is installed with PowerCenter and integrates PowerExchange (separate license for the required source system; check Sources -> Import from PowerExchange) and PowerCenter to extract relational, non-relational, and changed data.

7. How do we handle delimiter character as a part of the data in a delimited source file?
Answer: In delimited files the delimiter is the separator that identifies the data values of the fields present in the file. So if the data file contains the delimiter character as part of a field value, the field value is either enclosed within double or single quotes, or an escape character precedes the delimiter that is to be treated as a normal character.
To handle such flat files in Informatica, use the following options as per the data file format while defining the file structure.
1. Set Optional Quotes to Double or Single Quote. The column delimiters within the quote characters are ignored.
2. Set the Escape Character used to escape the delimiter or quote character. An escape character preceding the delimiter character in an unquoted string, or the quote character in a quoted string, makes that character be treated as a regular character.

8. We have just received source files from UNIX. We want to stage that data to ETL process. What are the points we need to look for?
Answer: When a source flat file is loaded to a staging database table, generally we focus on the below items:
• Define the proper file format for the input file (Delimited/Fixed-width), Code Page etc.
• Check any processing date in the Header information against sysdate or some other business logic.
• Check the count of detail records in the file against the Trailer information, if any.
• Check that the sum of any measure fields of the detail records matches the Header/Trailer information, if any.
• In case of Indirect Loading we can add the filename and the record number in the file as columns in the staging table.
Basically everything depends on your business requirement.

9. What is the difference between Joiner and Lookup. Performance wise which one is better to use.
Answer:
Joiner:
• Only the “=” operator can be used in the join condition
• Supports normal, full outer, left/right outer join
• Active transformation
• No SQL override
• Connected transformation
• No dynamic cache
• Heterogeneous source
Lookup:
• =, <, <=, >, >=, != operators can be used in the lookup condition
• Supports left outer join
• Earlier a Passive transformation, from version 9 onwards it can be Active (can return more than one record in case of multiple matches)
• Supports SQL override
• Connected/Unconnected
• Supports dynamic cache update
• Relational/Flat File source/target
• Pipeline Lookup
Selection between these two transformations depends entirely on the project requirement. Which of the two performs better is a debatable topic.

10. What is the B2B in Informatica? How can we use it in Informatica?
Answer: B2B allows us to parse and read unstructured data such as PDF, Excel, HTML etc. It can also read binary data such as messages, EBCDIC files etc. and has a very large list of supported formats.
B2B Data Transformation Studio is the developer tool with which the parsing (reading) of the unstructured data is done. B2B mostly gives the output as an XML file.
B2B Data Transformation is integrated with Informatica PowerCenter through the "Unstructured Data Transformation". This transformation can receive the output of B2B Data Transformation Studio and load it into any target supported by PowerCenter.

11. What is CDC, SCD and MD5 in Informatica?
Answer:
• CDC - Changed Data Capture: how only the changed data is captured from the Source System.
• SCD - Slowly Changing Dimension: how history data is maintained in the Dimension tables.
• MD5 - MD5 Checksum Encoding: it generates a 32-character hexadecimal code that can be used to decide the Insert/Update strategy for target records.
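The MD5 point above can be illustrated outside PowerCenter. The following is a minimal Java sketch, not an Informatica API: the class name Md5ChangeDetector, the methods md5Hex and decide, and the pipe-delimited row text are all assumptions made for illustration. It hashes the concatenated column values of an incoming row and compares the result with the checksum stored for the same key in the target.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: decides insert/update by comparing MD5 checksums.
class Md5ChangeDetector {

    // Returns the MD5 digest of the text as a 32-character lowercase hex string.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // storedHash is the checksum kept in the target for this key (null if the key is new).
    // incomingRow is the concatenation of the incoming column values (assumed format).
    static String decide(String storedHash, String incomingRow) {
        if (storedHash == null) {
            return "INSERT";
        }
        return storedHash.equals(md5Hex(incomingRow)) ? "UNCHANGED" : "UPDATE";
    }

    public static void main(String[] args) {
        String stored = md5Hex("101|Ann|Pune");
        System.out.println(decide(null, "101|Ann|Pune"));
        System.out.println(decide(stored, "101|Ann|Pune"));
        System.out.println(decide(stored, "101|Ann|Mumbai"));
    }
}
```

Inside a mapping the same comparison would normally be made with the built-in MD5 function in an Expression transformation, with the result feeding an Update Strategy; the sketch only shows the decision logic.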
12. How can we implement an SCD Type2 mapping without using a lookup transformation?
Answer: The entire implementation will be the same as that using a lookup. The only change is to replace the Lookup transformation with a Joiner transformation. In the Joiner transformation the Source table will be used as Master and the Target table as Detail. The join condition will be the same as the lookup condition, and the join type will be Detail Outer Join.

13. How does Joiner and Lookup transformation treat NULL value matching?
Answer: A NULL value is not equal to another NULL value in a Joiner, whereas the Lookup transformation matches null values.

14. Does Microsoft SQL server supports bulk loading? If yes, What happens when you specify bulk mode and data driven for SQL server target
Answer: Yes, MS SQL Server supports Bulk Loading. But if we select Treat Source Rows As Data Driven with the Target Load Type as Bulk, the session will fail. We have to select Normal load with Data Driven source records.

15. How can you utilize COM components in Informatica?
Answer: By writing C++, VB or VC++ code in an External Stored Procedure transformation.

16. What is SQL transformation in Informatica?
Answer: A SQL transformation can process SQL queries midstream in an Informatica pipeline. It supports almost all DDL, DML, DCL and TCL statements. For quick reference, following are some important notes:
• We can configure the SQL transformation in two modes, which make it Active or Passive.
• Query mode, which is Active, fires the SQL query in the database defined in the transformation.
• Script mode, which is Passive, can call external SQL scripts to be executed.
• Query mode can be configured to handle a Static SQL Query (i.e. the SQL query is the same, with bind variables) or a Dynamic SQL Query (i.e. different query statements for each input row).
• In case of a Dynamic Query, substituting the entire SQL query through the Query_Port is called Full Query, and substituting a portion of the query statement is called Partial Query.
• We can configure the SQL transformation to connect to a database with a Static Connection (i.e. selecting a particular connection object) or a Dynamic Connection (i.e. based on the logic it will dynamically select the connection object to connect to a database). We can also pass the entire database connection information (i.e. username, password, connect string, code page), which is called Full Database Connection.

17. What is a XML source qualifier?
Answer: The XML source qualifier represents the data elements that the Informatica server reads when it runs a session with XML sources.

18. What is the “metadata extensions” tab in Informatica?
Answer: PowerCenter allows end users and partners to extend the metadata stored in the repository by associating information with individual objects in the repository. That is why it is called Metadata Extension.
For example, when we create a mapping, we can store information like the mapping functionality, business user information and CR information. Similarly, for a Session we can store schedule information and the contact person for a failed session.
We basically associate the information with repository metadata using metadata extensions.
When we create reusable metadata extensions for a repository object using the Repository Manager, the metadata extension becomes part of the properties of that type of object.
For example, we can create a reusable metadata extension for source definitions called SourceCreator. When we create or edit any source definition in the Designer, the SourceCreator extension appears on the Metadata Extensions tab. Anyone who creates or edits a source can enter the name of the person that created the source into this field.
PowerCenter Client applications can contain the following types of metadata extensions:
• Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. We can view and change the values of vendor-defined metadata extensions, but we cannot create, delete, or redefine them.
• User-defined. We create user-defined metadata extensions using PowerCenter. We can create, edit, delete, and view user-defined metadata extensions. We can also change the values of user-defined extensions.
All metadata extensions exist within a domain. We see the domains when we create, edit, or view metadata extensions.
Vendor-defined metadata extensions exist within a particular vendor domain. If we use third-party applications or other Informatica products, we may see domains such as Ariba or PowerExchange for Siebel. We cannot edit vendor-defined domains or change the metadata extensions in them.
User-defined metadata extensions exist within the User Defined Metadata Domain. When we create metadata extensions for repository objects, we add them to this domain.
Both vendor-defined and user-defined metadata extensions can exist for the following repository objects: Source definitions, Target definitions, Transformations, Mappings, Mapplets, Sessions, Tasks, Workflows, Worklets.

19. Describe some of the ETL Best Practices
Answer: Many best practices are applicable to one tool and pointless for another.
At a very high level and in a tool-independent way:
• Naming conventions for ETL objects
• Naming conventions for Database objects
• Parameterization of connections (so that moving from one environment to another is easy)
• Maintaining an ETL job log, ideally automated through logging of each job run
• Handling of rejected records (and logging)
• Data reconciliation
• Metadata management, e.g. maintaining metadata columns in tables (use of audit columns such as load date, load user, batch id etc.)
• Error reporting
• ETL job performance evaluation
• Following generic coding standards
• Documentation
• Decomposing complex logic into multiple ETL stages, load balancing (pushdown optimization wherever applicable) etc.
• Removal of unwanted ports from the different transformations used in a mapping
• Using Shortcuts for sources, targets and lookups
• Using mapplets and worklets as and when required
• Writing comments for every transformation
• Using the DECODE function rather than “if then else” (IIF) logic
• Making sure that sorted data is passed into the Aggregator transformation
• If the target table has indexes, loading data into it will be slower; in such situations, use pre SQL to drop the indexes before loading the data into the target table, and re-create the indexes using post SQL once the data is loaded.

20. Is there a scope of cloud computing in Data warehousing technology?
Answer: This is not only possible; in fact, this is the way to go for many of the providers of modern BI tools. There are certain advantages and benefits of using cloud computing for Business Intelligence applications and this is a big topic of discussion today. I will quickly touch upon a few points that substantiate the need for Cloud BI, and in the future I will try to post a comprehensive article on this website with more details.
First, if you look at the current state of BI, there are these typical characteristics:
• High infrastructure requirement, leading to high upfront investment
• High development cost (needs special talent) as well as high maintenance cost
• Unpredictable workload (data volume) and skewed business growth pattern
All these lead to the issues of longer cycle time and limited adoption of BI solutions.
A cloud platform, as opposed to a typical in-house software platform, is basically an alternative delivery method for the software service. When you deliver the software or platform or infrastructure (as a service) through the cloud, you can instantly start to get the following benefits:
• Lower entry cost
• Lower maintenance cost (pay as you use)
• Faster deployment
• Reduced risk
• Lower TCO (total cost of ownership)
• Multiple deployment models etc.
Moreover, small and medium enterprises (SMEs) can easily adapt to this model given their typical small-business constraints. Companies like Pentaho are already “in” with their products in the SaaS (software as a service) model of cloud computing.
But cloud models like SaaS have some typical problems (e.g. no flexibility of design, security concerns etc.). As opposed to the SaaS model, we have another cloud model called PaaS (Platform as a service), which has the benefit of design flexibility. PaaS is very suitable for custom applications and even enterprise level BI applications. This cloud service is being offered by almost everyone in the BI market:
- BusinessObjects
- SAS
- Microsoft Azure (check here: http://en.wikipedia.org/wiki/SQL_Azure )
- Vertica
- Greenplum etc.
17. Mapping

1. Scenario Implementation 1
Suppose we have a source port called ename with data type varchar(20) and the corresponding target port ename, also varchar(20). The data type is now altered to varchar(50) in both the source and the target database. Describe the changes required to modify the mapping.
Answer: Re-import the source and target definitions. Next, open the mapping, right-click the source port ename and use the "Propagate Attribute" option. This option lets us change the properties of one port across multiple transformations without manually modifying the port in each and every transformation. We can choose the direction of propagation (forward / backward / both) and can also select the attributes to propagate, e.g. data type, scale, precision.

2. What are mapping parameters and variables?
Answer: A mapping parameter is a user-definable constant that takes its value before the session runs. It can be used in Source Qualifier (SQ) expressions, Expression transformations, etc. A mapping variable is defined in a similar way, except that its value is subject to change. It picks up its value in the following order:
• From the session parameter file
• As stored in the repository from the previous run
• As defined in the initial value in the Designer
• Data type default value

3. Which type of variables or parameters can be declared in parameter file? $, $$, $$$ - Can all be declared or not.
Answer: There is a difference between a variable and a parameter.
• A variable, as the name suggests, holds a value that can change within a session run.
• Parameters are fixed; their values do not change during the session run.
• $ - for session level parameters, which can be declared in parameter files.
• $$ - for mapping level parameters, which can be declared in parameter files.
• $$$ - built-in Informatica system variables, which cannot be declared in parameter files, e.g. $$$SessStartTime. These are constant throughout the mapping and cannot be changed.
Read this article to get a detailed understanding: http://www.dwbiconcepts.com/etl/14-etl-informatica/74-stop-hardcoding-follow-parameterization-technique.html

4. What are the default values for variables?
Answer:
• String = empty string
• Number = 0
• Date = 1/1/1753

5. What does the first column of the bad file (rejected rows) indicate?
Answer:
• First column: row indicator (0, 1, 2, 3)
• Second column: column indicator (D, O, N, T)

6. Out of 100000 source rows some rows get discarded at the target. How will you trace them and where do they get loaded?
Answer:
• Rejected records are written to bad files (reject files). Each record carries a row (record) indicator and column indicators.
• The row indicator is one of 0 (insert), 1 (update), 2 (delete), 3 (reject).
• The column indicator is one of D (valid), O (overflow), N (null), T (truncated).
• Data can be rejected for different reasons, often because of the transformation logic.

7. What is Reject loading?
Answer: During a session, the Informatica server creates a reject file for each target instance in the mapping. If the writer or the target rejects data, the Informatica server writes the rejected row into the reject file. The reject file and the session log contain information that helps you determine the cause of the reject. You can correct reject files and load them to relational targets using the Informatica reject load utility. The reject loader also creates another reject file for the data that the writer or target rejects during the reject loading.
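The indicator codes listed in the answers above can be decoded mechanically. The following is a minimal Python sketch, not part of Informatica: it assumes each reject row is comma separated and laid out as row indicator, column indicator, then alternating data value and column indicator. The function name `parse_reject_row` and the sample row are hypothetical.

```python
import csv
from io import StringIO

# Indicator meanings as listed in the answers above.
ROW_INDICATORS = {"0": "insert", "1": "update", "2": "delete", "3": "reject"}
COLUMN_INDICATORS = {"D": "valid", "O": "overflow", "N": "null", "T": "truncated"}


def parse_reject_row(line):
    """Decode one reject-file row (hypothetical helper).

    Assumed layout: row indicator, column indicator, then repeating
    pairs of (data value, column indicator).
    """
    fields = next(csv.reader(StringIO(line)))
    row_action = ROW_INDICATORS.get(fields[0], "unknown")
    # fields[1] is the column indicator that follows the row indicator.
    rest = fields[2:]
    columns = []
    for i in range(0, len(rest) - 1, 2):
        value, indicator = rest[i], rest[i + 1]
        columns.append((value, COLUMN_INDICATORS.get(indicator, "unknown")))
    return row_action, columns


if __name__ == "__main__":
    # Hypothetical sample row: an insert whose last column is NULL.
    action, cols = parse_reject_row("0,D,1921,D,Nelson,D,,N")
    print(action)
    for value, meaning in cols:
        print(repr(value), meaning)
```

A row whose action decodes to "reject" (row indicator 3) was rejected by the writer, so its column indicators should be read with that in mind before any value is "corrected".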
Reject Loading
During a session, the server creates a reject file for each target instance in the mapping. If the writer or the target rejects data, the server writes the rejected rows into the reject file. You can correct the rejected data and re-load it to relational targets using the reject loading utility. (You cannot load rejected data into a flat file target.) Each time you run a session, the server appends the rejected data to the reject file.

Locating the Bad Files
• $PMBadFileDir/Filename.bad
When you run a partitioned session, the server creates a separate reject file for each partition.

Reading Rejected Data
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
Two things help us find the reason for a rejection.
• Row indicator - tells the writer what to do with the row of wrong data.

Row indicator   Meaning   Rejected by
0               Insert    Writer or target
1               Update    Writer or target
2               Delete    Writer or target
3               Reject    Writer

If the row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject.
• Column indicator - the first column indicator is followed by the first column of data and then another column indicator. Column indicators appear after every column of data and define the type of the data preceding them.

Column indicator   Meaning      Writer treats as
D                  Valid data   Good data. The target accepts it unless a database error occurs, such as finding a duplicate key.
O                  Overflow     Bad data.
N                  Null         Bad data.
T                  Truncated    Bad data.

NOTE: NULL columns appear in the reject file with commas marking their column.

Correcting Reject File
Use the reject file and the session log to determine the cause of the rejected data. Keep in mind that correcting the reject file does not necessarily correct the source of the reject.
Correct the mapping and target database to eliminate some of the rejected data when you run the session again. Trying to correct target-rejected rows before correcting writer-rejected rows is not recommended, since they may contain misleading column indicators. For example, a series of "N" indicators might lead you to believe the target database does not accept NULL values, so you decide to change those NULL values to zero. However, if those rows also had a 3 in the row indicator column, the row was rejected by the writer because of an update strategy expression, not because of a target database restriction. If you try to load the corrected file to the target, the writer will again reject those rows, and they will contain inaccurate 0 values in place of NULL values.

8. Why may the Informatica writer thread reject a record?
Answer:
• Data overflowed column constraints
• An update strategy expression

9. Why can the target database reject a record?
Answer:
• Data contains a NULL column
• Database errors, such as key violations

10. Describe the various steps for loading a reject file.
Answer:
• After correcting the rejected data, rename the reject file to reject_file.in
• The reject loader uses the data movement mode configured for the server. It also uses the code page of the server/OS. Hence do not change either of these in the middle of reject loading.
• Use the reject loader utility:
pmrejldr pmserver.cfg [folder name] [session name]

11. Variable v1 has values set as 5 in designer (default), 10 in parameter file, and 15 in repository. While running the session, which value will Informatica read?
Answer: Informatica reads the value 10 from the parameter file. A value in the parameter file takes precedence over the value saved in the repository, which in turn takes precedence over the initial value defined in the Designer.

12. What are shortcuts? Where can they be used? What are the advantages?
Answer: There are two types of shortcut, local and global. A local shortcut is used within the local repository; a global shortcut refers to an object in the global repository. The advantage is that an object is reused without creating multiple copies of it. For example, if a source definition is needed in 10 mappings in 10 different folders, we create 10 shortcuts to it instead of 10 copies of the source.

13. Can we have an Informatica mapping with two pipelines, where one flow has a Transaction Control transformation and the other does not? Explain why.
Answer: No, it is not possible. Whenever we have a Transaction Control transformation in a mapping, the session commit type is 'User Defined'. For a pipeline without the Transaction Control transformation, the session expects the commit type to be either source-based or target-based. Hence we cannot have both pipelines in a single mapping; we have to develop a separate mapping for each pipeline.

14. How can we implement Reverse Pivoting using Informatica transformations?
Answer: Pivoting can be done using the Normalizer transformation. For reverse pivoting we need to use an Aggregator transformation, as below.

From:
Col1   Col2
A      10
B      20

To:
Col1   Col2
A      B
10     20

This can be done using one Expression transformation and one Aggregator transformation. In the Expression transformation, create two output ports, o_col_a and o_col_b:
o_col_a = IIF(col1 = 'A', col2, 0)
o_col_b = IIF(col1 = 'B', col2, 0)
Next, in the Aggregator transformation, take the MAX() of o_col_a and o_col_b and map the results to the target columns A and B. (We may need to take SUM() instead of MAX() if we have multiple A and B rows.)

15. Is it possible to update a Target table without any key column in target?
Answer: Yes, it is possible to update the target table, either by defining keys at the Informatica level in the Warehouse Designer or by using Update Override.

18. Mapplet

1. What is a Mapplet?
Answer: Mapplets are reusable objects that represent a collection of transformations.

2. What is the difference between Reusable transformation and Mapplet?
Answer: A Reusable Transformation is any Informatica transformation that can be used in multiple mappings. It is either created in the Transformation Developer or is a non-reusable transformation promoted to reusable in the Mapping Designer. When we add a reusable transformation to a mapping, we actually add an instance of the transformation. Since the instance of a reusable transformation is a pointer to that transformation, when we change the transformation in the Transformation Developer, its instances reflect these changes.
A Mapplet is a reusable object created in the Mapplet Designer which contains a set of transformations and lets us reuse the transformation logic in multiple mappings. A Mapplet can contain as many transformations as we need. As with a reusable transformation, when we use a mapplet in a mapping we use an instance of the mapplet, and any change made to the mapplet in the Mapplet Designer is inherited by all instances of the mapplet.

3. What are the transformations that are not supported in Mapplet?
Answer:
• Normalizer
• Cobol sources
• XML sources
• XML Source Qualifier
• Target definitions
• Pre- and post-session Stored Procedures
• Other Mapplets

4. Is it possible to convert reusable transformation to a non-reusable one?
Answer: Reusable transformations are created in the Transformation Developer. Another way is to promote a non-reusable transformation in a Mapping/Mapplet to a reusable one.
Note: Converting a non-reusable transformation into a reusable transformation is not reversible. But we can use a reusable transformation as a non-reusable one in any mapping or mapplet by dragging the selected reusable transformation from the Repository Navigator and pressing the Ctrl key just before dropping the object in the Mapplet/Mapping Designer. The same applies to creating a non-reusable session from a reusable one in the Worklet/Workflow Designer.

5. What is the use of Mapplet & Worklet in project?
Answer: Mapplets and Worklets allow you to create reusable objects and thus make your Informatica code reusable. Just like a procedure or function in a procedural language, a mapplet or worklet can be built to hold a piece of business logic that is then used again and again in different mappings and workflows. Mapplets are created in the PowerCenter Designer and reused in mappings. Worklets are created in the Workflow Manager and reused in workflows.

6. Is it possible to have a mapplet within a mapplet and worklet within a worklet?
Answer: Informatica does not support a mapplet within a mapplet, but it does support a worklet within a worklet.

19. Session

1. What is Session and Batches?
Answer: SESSION - A Session is a set of instructions that tells the Informatica Server / Integration Service how and when to move data from sources to targets.
After creating the session, we can use either the Server Manager or the command line program pmcmd to start or stop the session.
BATCHES - A batch provides a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:
• SEQUENTIAL - runs sessions one after the other.
• CONCURRENT - runs sessions at the same time.

2. What are various session tracing levels?
Answer:
Normal (default) - logs initialization and status information, errors encountered and rows skipped due to transformation errors, and summarizes session results, but not at the row level.
Terse - logs initialization information, error messages and notification of rejected data.
Verbose Initialization - in addition to Normal tracing, logs additional initialization information, the names of the index and data files used, and detailed transformation statistics.
Verbose Data - in addition to Verbose Initialization tracing, records row-level logs.

3. Can we copy a session to new folder or new repository?
Answer: Yes, we can copy a session to a new folder or repository, provided the corresponding mapping is already in that folder or repository.

4. Is it possible to store all the Informatica session log information in a database table? Normally the session log is stored as a compressed binary .bin file in the SessLogs directory. Can we store the same information in database tables for future analysis?
Answer: It is not possible to store all the session log information in a table. Along with error-related information, we can get some other session-related information from metadata repository tables like REP_SESS_LOG. To capture error data, we can configure the session as below:
Go to Session -> Config Object -> Error Handling section and give the following settings.
Error Log Type: Relational Database.
Error Log DB Connection: The database connection where we want to store the error tables.
Error Log Table Name Prefix: Prefix for the error tables. By default, Informatica creates 4 different error tables. If we provide a prefix here, the error tables are created with that prefix in the database.
Log Row Data: Logs the row data at the point where the error happened.
Log Source Row Data: Captures the source data for the error record.
Data Column Delimiter: Error data is stored in a single column of the database table. We can specify the delimiter for the source data here.

List of error tables created by Informatica:
PMERR_DATA. Stores data and metadata about a transformation row error and its corresponding source row.
PMERR_MSG. Stores metadata about an error and the error message.
PMERR_SESS. Stores metadata about the session.
PMERR_TRANS. Stores metadata about the source and transformation ports, such as name and data type, when a transformation error occurs.

The above tables are specifically used to store information about exception (error) records, e.g. records in the reject file. We can use this as a base for an error handling strategy. But they do not contain all the information that is present in the session log, such as performance details (thread busy percentage) or details of the transformations invoked in the session. We can also check the contents of the REP_SESS_LOG view under the Informatica repository schema; however, that too does not contain all the information.

5. Can we call a shell script from session properties?
Answer: The Integration Service can execute shell commands at the beginning or at the end of the session. The Workflow Manager provides the following types of shell commands for each Session task:
• Pre-session command
• Post-session success command
• Post-session failure command
Use any valid UNIX command or shell script for UNIX nodes, or any valid DOS command or batch file for Windows nodes.
Configure the session to run the pre- or post-session shell commands.

6. Can we change the Source and Target table names in Session level?
Answer: Yes, we can change the source and target table names at the session level. Go to the session and navigate to the Mapping tab. Select the source or target to be changed: for a target, enter the new table name in "Target Table Name"; for a source, use "Source Table Name". A more suitable method is to parameterize the source and target table names. We can then run the same mapping concurrently using different parameter files. For this we have to enable concurrent run mode at the workflow level. Also find more information regarding parameterization.

7. How to write flat file column names in target?
Answer: There are two options available in the session properties to take care of this requirement. Go to the Mapping tab, open the target properties and set the header option to either "Output Field Names" or "Use header command output".
Option 1 creates the output file with a header record whose column names are the same as the target port names.
Option 2 lets us supply our own command to generate the header record text. We can use an 'echo' command for this. Here is an example:
echo '"Employee ID"|"Department ID"'
The second option is recommended, as it gives more flexibility in writing the column names.

8. What are the ERROR tables present in Informatica?
Answer:
• PMERR_DATA - Stores data and metadata about a transformation row error and its corresponding source row.
• PMERR_MSG - Stores metadata about an error and the error message.
• PMERR_SESS - Stores metadata about the session.
• PMERR_TRANS - Stores metadata about the source and transformation ports, such as name and data type, when a transformation error occurs.

9. What are the alternate ways to stop a session without using "STOP ON ERRORS" option set to 1 in session properties?
Answer: We can use the ABORT() function in an Expression transformation to stop the execution of a session based on a user-defined condition. The ERROR() function can be used in the same way to raise a transformation error for a row.

10. Suppose a session fails after loading of 10,000 records in the target. How can we load the records from 10,001 when we run the session next time?
Answer: Configure the session for Normal load rather than Bulk load, and in the session properties choose the recovery strategy "Resume from last checkpoint". We can then rerun the session from the last commit point. If the commit interval is 10,000 and the Integration Service issued a commit after loading 10,000 records, the next run loads the records from 10,001 onwards.
If 9,999 rows were loaded when the session failed, the Integration Service had not yet issued any commit (the commit interval being 10,000), so we cannot perform recovery. In this case truncate the target table and restart the session.

11. Define the types of Commit intervals apart from user defined?
Answer: The different commit types are:
Target-based commit. The Informatica Server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
Source-based commit. The Informatica Server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.

12.
Suppose a session is configured with a commit interval of 10,000 rows and the source has 50,000 rows. Explain the commit points for source-based commit and target-based commit. Assume appropriate values wherever required.
Answer:
• Target-based commit (assuming the writer buffer fills every 7,500 rows: first at 7,500, next at 15,000): commits at 15000, 22500, 30000, 40000, 50000
• Source-based commit (not affected by the rows held in the buffer): commits at 10000, 20000, 30000, 40000, 50000

13. How to capture performance statistics of individual transformation in the mapping and explain some important statistics that can be captured?
Answer: Use the tracing level Verbose Data.

14. How can we parameterize success or failure email list?
Answer: We can parameterize the email user list and modify the values in the parameter file. Use $PMSuccessEmailUser and $PMFailureEmailUser. We can also use the pmrep command to update the email task:
updateemailaddr -d <folder_name> -s <session_name> -u <success_email_address> -f <failure_email_address>

15. Is it possible that a session failed but still the workflow status is showing success?
Answer: Yes. If the workflow completes, it shows an execution status of success irrespective of whether any session within it failed. By default the workflow status has nothing to do with session failure. Only if we set the session general option "Fail parent if this task fails" in the Workflow Designer will the workflow status show as failed when the session fails.

16. What is Busy Percentage?
Answer: It is the duration for which a thread was occupied, compared to the total run time of the mapping. Say we have one writer thread; this thread is internally responsible for writing data to the target table or file.
Now if our mapping runs for 100 seconds but the time taken to write the data to the target is only 20 seconds (because the rest of the time was spent reading and transforming the data), then the busy percentage of the writer thread is 20 / 100 = 20%.

17. Can we write a PL/SQL block in pre and post session or in target query override?
Answer: Yes, we can. Remember always to put a backslash (\) before any semicolon (;) we use in the PL/SQL block.

18. Whenever a session runs does the data gets overwritten in a flat file target? Is it possible to keep the existing data and add the new data to the target file?
Answer: Normally the target file data is overwritten with every session run. The exception is when we select the "Append if Exists" option (8.x onwards) in the target session properties, which appends the new data to the existing data in the flat file target.

19. Can we use the same session to load a target table in different databases having same target definition?
Answer: Yes, we can use the same session to load the same target definition in different databases with the help of parameterization, i.e. by using different parameter files with different values for the parameterized target connection object ($DBConnection_TGT) and for the owner/schema name in Table Name Prefix ($Param_Tgt_Tablename).
To run a single workflow with this session and load two different database target tables, we can consider using concurrent workflow instances with different parameter files. We can even load two instances of the same target connected in the same pipeline; at the session level, use a different relational connection object for each database.

20. How do you remove the cache files after the transformation?
Answer: After the session completes, the DTM releases the cache memory and deletes the cache files.
If a persistent cache or incremental aggregation is used, the cache files are saved.

21. Why doesn't a running session QUIT when Oracle or Sybase return fatal errors?
Answer: The session will only QUIT when its threshold "Stop on errors" is set to 1. Otherwise the session continues to run.

22. If we have written a source override query in source qualifier in mapping level but have modified the query in session level SQL override then how integration service behaves.
Answer: The Informatica Integration Service treats the session level query as final during the session run. If the two queries are different, the Integration Service executes the session level query and ignores the mapping level query.

20. Workflow

1. What is the difference between STOP and ABORT options in Workflow?
Answer: When we issue the STOP command on an executing session task, the Integration Service stops reading data from the source. It continues processing, writing and committing the data to the targets. If the Integration Service cannot finish processing and committing data, we can issue the ABORT command.
In contrast, the ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.
We can stop or abort tasks and worklets within a workflow from the Workflow Monitor, from a Control task in the workflow, or from a Command task by using the pmcmd stop or abort command. We can also call the ABORT function at the mapping level.
When we stop or abort a task, the Integration Service stops processing the task and any other tasks in the path of the stopped or aborted task. The Integration Service however continues processing concurrent tasks in the workflow. If the Integration Service cannot stop the task, we can abort the task. The Integration Service aborts any workflow if the Repository Service process shuts down.

2. Running Informatica Workflow continuously: how to run a workflow continuously until a certain condition is met?
Answer: We can schedule a workflow to run continuously. A continuous workflow starts as soon as the Integration Service initializes. If we schedule a real-time session to run as a continuous workflow, the Integration Service starts the next run of the workflow as soon as it finishes the first. When the workflow stops, it restarts immediately.
Alternatively, for a normal batch scenario we can create a conditional-continuous workflow as below.
Suppose wf_Bus contains the business session that we want to run continuously until a certain condition is met, such as the presence of a file or a particular value of a workflow variable. Modify the workflow so that the Start task is followed by a Decision task which evaluates a condition to TRUE or FALSE. Based on this condition the workflow will run or stop.
Next, use a link to connect the business session for $Decision.Condition=TRUE. For the other branch, use a Command task for $Decision.Condition=FALSE. In the Command task, create a command that calls a dummy workflow using pmcmd, e.g.
"C:\Informatica\PowerCenter8.6.0\server\bin\pmcmd.exe" startworkflow -sv IS_info_repo8x -d Domain_hp -u info_repo8x -p info_repo8x -f WorkFolder wf_dummy
Next, create the dummy workflow and name it wf_dummy. Place a Command task after the Start task.
Within the Command task, put the pmcmd command as
"C:\Informatica\PowerCenter8.6.0\server\bin\pmcmd.exe" startworkflow -sv IS_info_repo8x -d Domain_sauravhp -u info_repo8x -p info_repo8x -f WorkFolder wf_Bus
In this way we can manage to run a workflow continuously. The basic concept is to use two workflows and make them call each other.

3. How do we send emails from Informatica after the successful completion of one session? The email will contain the job name/ session start time and session end time in the message body.
Answer: The first thing is to have the "mail" utility configured on the Informatica server (UNIX/Windows). After that, we use the Informatica Email task. We can create an Email task and call it at the session level in "On Success Email". Here we can use Informatica pre-built variables such as mapping name (%m), session start time (%b), etc.

4. Scenario Implementation 1
How to pass a value calculated in a mapping variable to the email message? The email will be sent in HTML format with a predefined message in which one value will be populated from one mapping variable. Suppose the predefined message is:
<html>
<body>
The last transaction service ID is: <informatica_variable>
</body>
</html>
In place of <informatica_variable>, the value of the mapping variable at the end of the session should appear.
Answer: We cannot use a mapping variable at the workflow or session level; it is local to a mapping. Instead, we have to use a workflow variable for this purpose. But we cannot pass the value of the mapping variable to the workflow variable directly from the mapping.
1) Write the calculated value to a flat file from the mapping, say "value.txt".
2) Create a shell script, say "mail.sh", to send the 2nd mail.
Read the value from "value.txt" into a variable in "mail.sh". Use this variable in the body of the mail.
3) Create a Command task at the workflow level. Call "mail.sh" in that Command task.
4) Place this Command task after the actual session and link it on the session's success.

5. How can we send two separate emails after a successful session run?
Answer: The problem is that we cannot call two Email tasks from one session, i.e. from the session level "On Success Email". So, for the second email, we can place another Email task after the session and connect the two with a link whose execution condition is status=SUCCEEDED.

6. What is Cold Start in Informatica?
Answer: In general terms, "Cold Start" means 'to start a program from the very beginning, without being able to continue the processing that was occurring previously when the system was interrupted.'
With respect to Informatica, we can resume a stopped or failed real-time session. To resume a session, we must restart or recover the session. The Integration Service can recover a session automatically if the session is enabled for automatic task recovery. When we restart a session, the Integration Service resumes the session based on the real-time source. Depending on the real-time source, it restarts the session with or without recovery.
We can restart a task or workflow in cold start mode. When we restart a task or workflow in cold start mode, the Integration Service discards the recovery information and restarts the task or workflow.
For example, suppose a workflow failed midway and we do not want to recover data because we have manually cleaned up the data in the impacted target tables. If workflow recovery is enabled, we can opt for a cold start, which skips the recovery task.
- Cold start removes any recovery data that was stored when the session failed.
- When we restart a stopped or failed task or workflow that has recovery enabled in cold start mode, the Integration Service discards the recovery information and restarts the task or workflow.
- The Cold Start Task, Cold Start Workflow and Cold Start Workflow from Task commands can be executed from the Workflow Manager, the Workflow Monitor, or the pmcmd command line program.
- If we restart a session in cold start mode, targets may receive duplicate rows. To prevent data duplication, avoid cold start and restart the session with recovery.
- If recovery is not enabled in a session, there is no difference between cold start and restart.

7. Scenario Implementation 2
Email: I have a list of 10 people who receive an email after a session failure. Can we edit the list of email addresses dynamically? That is, can we add or delete an email ID without touching the mapping?

Answer:
We can parameterize the email user list and modify the values in the parameter file. Use $PMSuccessEmailUser and $PMFailureEmailUser.

We can also use the pmrep command to update the email addresses of a session:

    updateemailaddr -d <folder_name> -s <session_name> -u <success_email_address> -f <failure_email_address>

Alternatively, create a distribution list (DL) and use that DL in the session failure email. Every address listed in the DL will receive the mail. Later on we can add or remove addresses in the DL as required.

8. We know there are 3 options for the session recovery strategy: Restart task, Fail task and continue running the workflow, and Resume from last checkpoint. How do we restart a workflow automatically, without any manual intervention, in the event of a session failure?
Answer:
Select the "Automatically recover terminated tasks" option in the workflow properties. We can also specify the maximum number of automatic attempts in the workflow property "Maximum automatic recovery attempts".

9. What is the difference between real-time and continuous workflows?

Answer:
A real-time workflow is triggered by a source XML message, whereas a continuous workflow runs without stopping, for example by using two workflows that call each other through command line arguments.

11. Scenario Implementation 3
Suppose we have two workflows in the same folder: workflow 1 (wf1) with two sessions (s1, s2) and workflow 2 (wf2) with three sessions (s3, s4, s5), like below:

    wf1: s1, s2
    wf2: s3, s4, s5

How can we run s1 first, then s3, then s2, and then s4 and s5, without using the pmcmd command or a UNIX script?

Answer:
Use a Command task or a Post-Session Command to create a touch file, and use an Event Wait task to wait for that file (Filewatch Name). The combination of Command task and Event Wait task solves the problem.

    WF1----->S1------>CMD1----->EW2------>S2------->CMD3
    WF2----->EW1--->S3--------->CMD2----->EW3---->S4------>S5

Start both workflows. Session s1 starts and, after successful execution, calls Command task cmd1, which generates a touch file, say s3.txt. Control in wf1 then passes to Event Wait ew2.
As soon as s3.txt exists, Event Wait ew1 in wf2 fires and session s3 starts. After s3 succeeds, control passes to Command task cmd2, which generates a touch file, say s2.txt, and then to Event Wait ew3.
At the same time, Event Wait ew2 detects s2.txt and passes control to session s2. After s2 completes, it triggers Command task cmd3, which generates the wait file s4.txt, and workflow wf1 ends.
Meanwhile, Event Wait ew3 fires once the wait file s4.txt is in place and calls session s4, which on success triggers the last session s5, and workflow wf2 completes.

12. How do we send a session failure mail with the workflow or session log as an attachment?

Answer:
Design an Informatica Email task to send an email in the event of session failure, and use the email variable %g to attach the corresponding session log.
Email variables:
- %g: attaches the session log.
- %a<>: attaches any file; the absolute path must be given between the angle brackets.

13. Explain deadlock in Informatica and how do we resolve it?

Answer:
At the database level, a deadlock normally arises when two concurrent user sessions try to apply DML to the same rows of a table. For example, suppose user1 executes the query below in session1:

    update emp set deptno=20 where deptno=10;

Before user1 commits the transaction, user2 in session2 executes a query against the same rows:

    update emp set deptno=30 where deptno=10;

The second statement has to wait for the lock held by user1, and when each session also holds a lock that the other one needs, the database reports a deadlock error.

In Informatica, a deadlock normally occurs when two sessions update or delete records in the same table in parallel (parallel insert is not a problem). One option to avoid deadlock is to identify those sessions and run them sequentially. Another option is to make use of session-level properties such as 'deadlock retry limits' and 'deadlock recovery option'.

14. Scenario Implementation 4
Busy Percentage is given by (runtime - idle time) * 100 / runtime. A thread with 0 idle time therefore has the highest Busy Percentage. Do we need to tune that thread component, and why? In other words, should we tune the thread whose busy percentage (BP) is higher, or the one with more idle time?
Answer:
Three persons are asked to run 1 mile each, and each is allotted 20 minutes. The first person completes the mile in 5 minutes and stands idle for the other 15 minutes of the allotted time. The second person completes it in 10 minutes and sits idle for the remaining 10 minutes. The last one takes all 20 minutes and is idle for 0 minutes. Who is the worst performer? It is the last person, who had no idle time. The same holds for a thread with 0 idle time: the thread with the highest busy percentage is the bottleneck, and it is the one to tune.

15. How can we pass a value from one workflow to another?

Answer:
Assign the workflow variable to a session variable through the pre-session variable assignment, and from there to a mapping parameter. Then develop a mapping that generates a parameter file holding the desired value as a workflow variable; the next workflow picks the value up from this parameter file.
Alternatively, develop the mapping to store the value in a flat file or database table. Then create another mapping in the next workflow that reads the value and passes it to the session through the post-session variable assignment, and from there to the workflow level if required.

21. Administration

1. What is Load Manager?

Answer:
The Load Manager performs the following tasks:
- Manages session and batch scheduling.
- Locks the session and reads the session properties.
- Reads the parameter file.
- Expands the server and session variables and parameters.
- Verifies permissions and privileges.
- Validates source and target code pages.
- Creates the session log file.
- Creates the Data Transformation Manager (DTM), which executes the session.

2. What is the DTM process? How many threads does it create to process data? Explain each thread in brief.
Answer:
After the Load Manager performs validations for the session, it creates the DTM process. The DTM process is the second process associated with the session run. Its primary purpose is to create and manage the threads that carry out the session tasks.

The DTM allocates process memory for the session and divides it into buffers. This is also known as buffer memory. It creates the main thread, which is called the master thread. The master thread creates and manages all other threads.

If we partition a session, the DTM creates a set of threads for each partition to allow concurrent processing. When the Informatica Server writes messages to the session log, it includes the thread type and thread ID.

The DTM creates the following types of threads:
- MASTER THREAD: main thread of the DTM process. Creates and manages all other threads.
- MAPPING THREAD: one thread for each session. Fetches session and mapping information.
- PRE- AND POST-SESSION THREAD: one thread each to perform pre-session and post-session operations.
- READER THREAD: one thread for each partition of each source pipeline.
- WRITER THREAD: one thread for each partition, if a target exists in the source pipeline, to write to the target.
- TRANSFORMATION THREAD: one or more transformation threads for each partition.

3. Can you create a folder within Designer?

Answer:
Not possible. Folders are created in the Repository Manager.

4. How do you take care of security using the Repository Manager?

Answer:
Using repository privileges, folder permissions and locking:
- Repository privileges (Session operator, Use designer, Browse repository, Create session and batches, Administer repository, Administer server, Super user)
- Folder permissions (owner, groups, users)
- Locking (Read, Write, Execute, Fetch, Save)

5. What are the different uses of the Repository Manager?
Answer:
The Repository Manager is used to create the repository, which contains the metadata that Informatica uses to transform data from source to target. It is also used to create Informatica users and folders, and to copy, back up and restore the repository.

6. What are the two data movement modes of the Informatica Server?

Answer:
The data movement mode depends on whether the Informatica Server should process single-byte or multi-byte character data. This mode selection can affect the enforcement of code page relationships and code page validation in the Informatica Client and Server.
- Unicode: the server allows 2 bytes for each character and uses an additional byte for each non-ASCII character (such as Japanese characters).
- ASCII: the server holds all data in a single byte.
The data movement mode can be changed in the Informatica Server configuration parameters. The change takes effect once the Informatica Server is restarted.

7. What is a Code Page used for?

Answer:
A code page contains the encoding to specify characters in a set of one or more languages. An encoding is the assignment of a number to a character in the character set. The code page is used to identify characters that might be in different languages. If you are importing Japanese data into a mapping, then you must select the Japanese code page for the source data.

8. What is Code Page Compatibility?

Answer:
Compatibility between code pages is needed for accurate data movement when the Informatica Server runs in the Unicode data movement mode. If the code pages are identical, there will not be any data loss. One code page can be a subset or superset of another. For accurate data movement, the target code page must be a superset of the source code page.
Superset: a code page is a superset of another code page when it contains all the characters encoded in the other code page. It also contains additional characters not contained in the other code page.
Subset: a code page is a subset of another code page when all characters in the code page are encoded in the other code page.

9. What is the default block buffer size?

Answer:
64K

10. What is the default LM shared memory size?

Answer:
2MB

11. Define Server Concepts with respect to memory buffers

Answer:
The Informatica Server uses three system resources: CPU, shared memory and buffer memory. It uses shared memory, buffer memory and cache memory for session information and to move data between session threads.

LM Shared Memory: the Load Manager uses both process and shared memory. The LM keeps the Informatica Server's list of sessions and batches, and the schedule queue, in process memory. Once a session starts, the LM uses shared memory to store session details for the duration of the session run or session schedule. This shared memory appears as the configurable parameter LMSharedMemory, and the server allots 2,000,000 bytes by default. This allows you to schedule or run approximately 10 sessions at one time.

DTM Buffer Memory: the DTM process allocates buffer memory to the session based on the DTM buffer pool size setting in the session properties. By default, it allocates 12,000,000 bytes of memory to the session. The DTM divides this memory into buffer blocks as configured in the buffer block size setting (default: 64,000 bytes per block).

12. What are the two programs that communicate with the Informatica Server?
Answer:
Informatica provides the Server Manager and pmcmd programs to communicate with the Informatica Server:
- Server Manager: a client application used to create and manage sessions and batches, and to monitor and stop the Informatica Server. You can use information provided through the Server Manager to troubleshoot sessions and improve session performance.
- pmcmd: a command-line program that allows you to start and stop sessions and batches, stop the Informatica Server, and verify whether the Informatica Server is running.

22. Command Line Arguments

1. What is the pmcmd command?

Answer:
pmcmd is a command line program to communicate with the Informatica Server. It does not replace the Server Manager, since there are many tasks that you can perform only with the Server Manager. Operations that you can perform using pmcmd include starting, stopping and aborting a session.

2. What is the pmrep command?

Answer:
You can use pmrep to create or delete repository users and groups. You can also use pmrep to modify repository privileges assigned to users and groups.

3. How do we start and stop a session from the pmcmd command line?
Answer:
Use the following syntax to ping the Informatica Server on a UNIX system:

    pmcmd ping [{user_name | %user_env_var} {password | %password_env_var}] [hostname:]portno

Use the following syntax to start a session or batch on a UNIX system:

    pmcmd start {user_name | %user_env_var} {password | %password_env_var} [hostname:]portno [folder_name:]{session_name | batch_name} [:pf=param_file] session_flag wait_flag

Use the following syntax to stop a session or batch on a UNIX system:

    pmcmd stop {user_name | %user_env_var} {password | %password_env_var} [hostname:]portno [folder_name:]{session_name | batch_name} session_flag

Use the following syntax to stop the Informatica Server on a UNIX system:

    pmcmd stopserver {user_name | %user_env_var} {password | %password_env_var} [hostname:]portno

23. Metadata Repository

1. Is there any metadata query to find the list of Informatica folder names and workflow names which were migrated in a particular quarter?

Answer:
The SQL below gives the list of folders, workflows and their last saved dates.

    SELECT W.SUBJECT_AREA FOLDER_NAME,
           W.WORKFLOW_NAME,
           W.WORKFLOW_LAST_SAVED
    FROM REP_WORKFLOWS W
    ORDER BY TO_DATE(W.WORKFLOW_LAST_SAVED, 'MM/DD/YYYY HH24:MI:SS') DESC

2. How can I run metadata queries in Informatica PowerCenter?

Answer:
Informatica metadata is stored in a database repository. This can be the same database that holds our source, staging or target tables, or it may be a completely different database (which is generally the case). We can execute user-defined metadata queries only on this database. We may need to ask the Informatica administrator for the database login credentials; a username and password with read access are sufficient. After that we can connect to the database and run the metadata queries.
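To narrow the result of question 1 to a single quarter, the last-saved timestamp can be parsed and bucketed. The following is only a rough sketch: it uses an in-memory SQLite table as a stand-in for the REP_WORKFLOWS view, treats "last saved" as a proxy for the migration date, and assumes the timestamp is stored as text in the MM/DD/YYYY HH24:MI:SS layout used in the query above. The helper names and all sample rows except WorkFolder / wf_Bus are hypothetical.

```python
import sqlite3
from datetime import datetime


def quarter_of(ts):
    """Return (year, quarter) for a 'MM/DD/YYYY HH:MI:SS' text timestamp."""
    d = datetime.strptime(ts, "%m/%d/%Y %H:%M:%S")
    return d.year, (d.month - 1) // 3 + 1


def workflows_saved_in_quarter(conn, year, quarter):
    """List (folder, workflow, last_saved) rows saved in the given quarter, newest first."""
    rows = conn.execute(
        "SELECT SUBJECT_AREA, WORKFLOW_NAME, WORKFLOW_LAST_SAVED FROM REP_WORKFLOWS"
    ).fetchall()
    hits = [r for r in rows if quarter_of(r[2]) == (year, quarter)]
    # Sort on the parsed timestamp, not on the raw text.
    hits.sort(key=lambda r: datetime.strptime(r[2], "%m/%d/%Y %H:%M:%S"), reverse=True)
    return hits


# Stand-in for the repository view; a real repository is queried with
# the read-only database login mentioned above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE REP_WORKFLOWS "
    "(SUBJECT_AREA TEXT, WORKFLOW_NAME TEXT, WORKFLOW_LAST_SAVED TEXT)"
)
conn.executemany(
    "INSERT INTO REP_WORKFLOWS VALUES (?, ?, ?)",
    [
        ("WorkFolder", "wf_Bus", "02/14/2013 10:15:00"),
        ("WorkFolder", "wf_Daily", "05/03/2013 23:59:59"),  # hypothetical row
        ("Finance", "wf_GL", "03/31/2013 08:00:00"),        # hypothetical row
    ],
)

q1 = workflows_saved_in_quarter(conn, 2013, 1)
for folder, workflow, saved in q1:
    print(folder, workflow, saved)
```

Filtering in the client keeps the sketch independent of any database-specific date functions; against an Oracle repository the same quarter test could instead be pushed into the WHERE clause.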
3. Write a metadata query to identify the sessions having the truncate option enabled

Answer:

    select task_name,
           'Truncate Target Table' ATTR,
           decode(attr_value, 1, 'Yes', 'No') Value
    from OPB_EXTN_ATTR OEA, REP_ALL_TASKS RAT
    where OEA.SESSION_ID = RAT.TASK_ID
      and attr_id = 9

4. Where can I find a history / metrics of the load sessions that have occurred in Informatica?

Answer:
The tables which house this information are OPB_LOAD_SESSION, OPB_SESSION_LOG, and OPB_SESS_TARG_LOG. OPB_LOAD_SESSION contains the single session entries, and OPB_SESSION_LOG contains a historical log of all session runs that have taken place. OPB_SESS_TARG_LOG keeps track of the errors and of the target tables which have been loaded. Keep in mind that these tables are tied together by Session_ID. If a session is deleted from OPB_LOAD_SESSION, its history is not necessarily deleted from OPB_SESSION_LOG or from OPB_SESS_TARG_LOG. Unfortunately, this leaves unidentified session IDs in these tables. However, when you can join them together, you can get the start and completion times of each session.

5. How do we extract the Workflow Monitor record information from the Informatica metadata repository?
Answer:

    SELECT DISTINCT
           FOLDER_NAME, WORKFLOW_NAME, SESSION_NAME,
           START_DATE, START_TIME, END_DATE, END_TIME,
           DURATION "DURATION IN DD:HH:MI:SS",
           SOURCE_ROWS, TARGET_ROWS, REJECTED_ROWS, REJECTED_STATUS,
           STATUS, FAILED_REASON
    FROM (
        SELECT t.SUBJECT_AREA FOLDER_NAME,
               t.WORKFLOW_NAME,
               t.SESSION_NAME,
               DECODE(t.RUN_STATUS_CODE, 2, NULL, TO_CHAR(t.ACTUAL_START, 'DD-MON-YYYY')) START_DATE,
               DECODE(t.RUN_STATUS_CODE, 2, NULL, TO_CHAR(t.ACTUAL_START, 'HH24:MI:SS')) START_TIME,
               DECODE(t.RUN_STATUS_CODE, 2, NULL, TO_CHAR(t.SESSION_TIMESTAMP, 'DD-MON-YYYY')) END_DATE,
               DECODE(t.RUN_STATUS_CODE, 2, NULL, TO_CHAR(t.SESSION_TIMESTAMP, 'HH24:MI:SS')) END_TIME,
               DECODE(t.RUN_STATUS_CODE, 2, NULL,
                   TRUNC((((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)/60)/24) || ':' ||
                   (TRUNC(((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)/60) - 24*(TRUNC((((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)/60)/24))) || ':' ||
                   (TRUNC((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60) - 60*(TRUNC(((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)/60))) || ':' ||
                   (TRUNC(86400*(SESSION_TIMESTAMP-ACTUAL_START)) - 60*(TRUNC((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)))) DURATION,
               DECODE(t.RUN_STATUS_CODE, 2, NULL, t.SUCCESSFUL_SOURCE_ROWS) SOURCE_ROWS,
               DECODE(t.RUN_STATUS_CODE, 2, NULL, t.SUCCESSFUL_ROWS) TARGET_ROWS,
               DECODE(t.RUN_STATUS_CODE, 2, NULL, t.FAILED_ROWS) REJECTED_ROWS,
               DECODE(t.RUN_STATUS_CODE, 2, NULL,
                   CASE WHEN t.SUCCESSFUL_SOURCE_ROWS <> t.SUCCESSFUL_ROWS
                        THEN 'VALIDATE THE MISMATCH' END) REJECTED_STATUS,
               DECODE(t.RUN_STATUS_CODE,
                   1, 'Succeeded', 2, 'Disabled', 3, 'Failed', 4, 'Stopped',
                   5, 'Aborted', 6, 'Running', 7, 'Suspending', 8, 'Suspended',
                   9, 'Stopping', 10, 'Aborting', 11, 'Waiting', 15, 'Terminated') AS STATUS,
               REPLACE(REPLACE(t.FIRST_ERROR_MSG, CHR(10), ' '), 'No errors encountered.', '') AS FAILED_REASON,
               RANK() OVER (PARTITION BY session_name ORDER BY t.SESSION_TIMESTAMP DESC) rnk
        FROM REP_SESS_LOG t
        WHERE t.SUBJECT_AREA = '<<informatica_folder_name>>'
    ) sess_run
    WHERE sess_run.rnk = 1
    ORDER BY START_DATE, START_TIME

Don't forget to put the Informatica folder name in the SUBJECT_AREA filter above. We might also need to make some other small adjustments to better suit our purpose or Informatica version.

24. Repository Manager

1. Describe the steps for export and import.

Answer:
- Open the folder which contains the mapping.
- Check out the mapping to be exported.
- Click Repository --> Export Objects and save the file to your local drive.
- Open the folder into which you want to import the mapping.
- Click Repository --> Import Objects, select the mapping XML file and click Import.
- Once the mapping is imported into the new folder, save it and check it in.

2. What are the various methods of code migration, and which is the best way of deployment?

Answer:
The best way is, arguably, XML export and import, as it is very easy. But it all depends upon the requirement: if we want to migrate some workflows with their dependent objects in one shot, then the suggested way is XML export and import.

If we need to migrate only a few small objects (say some Designer or Workflow Manager objects), then we can copy them through the Repository Manager, through the Designer (for Designer objects) or through the Workflow Manager (for Workflow Manager objects) itself.
For this, however, we have to be connected to both repositories while copying.

Sometimes we may need to migrate an entire project and want a complete log of the deployment. In that case we can create a Deployment Group using the Deployment Wizard.

Deployment can also be driven from the command line with pmrep. To use the deployment commands, we must create a control file with all the specifications that the Copy Wizard requires. A deployment control file is an XML file, defined by the depcntl.dtd file, that we use with the DeployFolder and DeployDeploymentGroup pmrep commands to deploy a folder or deployment group. We can create a deployment control file manually to provide parameters for deployment, or create it with the Copy Wizard. If we create the deployment control file manually, it must conform to the depcntl.dtd file that is installed with the PowerCenter Client, and it must include the location of the depcntl.dtd file.

One good thing is that we can roll back a deployment to purge the deployed versions from the target repository or folder. When we roll back a deployment, we roll back all the objects in a deployment group that we deployed at a specific date and time. We cannot roll back part of a deployment.

In the PowerCenter Client, we can export repository objects to an XML file and then import repository objects from the XML file. Use the following client applications to export and import repository objects:
- Repository Manager: you can export and import both Designer and Workflow Manager objects.
- Designer: you can export and import Designer objects.
- Workflow Manager: you can export and import Workflow Manager objects.
- pmrep: you can export and import both Designer and Workflow Manager objects. You might use pmrep to automate exporting objects on a daily or weekly basis.

3. What are the various options for ETL code migration?

Answer:
There are a couple of options available for code migration. If you have a versioned repository, as the first step check in all the workflows and dependent objects. Then there are several ways to achieve the migration.

Option 1. Export the workflow from the Repository Manager using the Export Objects option to export it as XML, and then import it into QA using the Repository Manager Import Objects option.
Option 2. If Dev and QA are in the same repository, you can simply drag and drop. Open both the Dev and QA folders in the Repository Manager and drag the objects from Dev to QA.
Option 3. Create a Deployment Group using the Repository Manager, attach all the workflows you need to migrate to the deployment group, and migrate the deployment group.
Option 4. Migrate the entire folder.

When to use each option:
Option 1. Use this option when only a few workflows need to be migrated. If you do not have a versioned repository, the exported XML files can also be used to keep your versions.
Option 2. Use this option when only a small number of workflows need to be migrated.
Option 3. Use this option when a large number of objects are migrated together. It keeps the list of objects migrated as a group, which makes a rollback easy if one is required.
Option 4. Mostly used when a project with a large number of workflows is migrated to QA for the first time.

4. What is labeling in Informatica?

Answer:
We see the label concept in many places, for example in our mailbox. Sometimes we group some of our mails under different labels, such as marking some mails as personal.
In Informatica, a label is a global object that you can associate with any versioned object or group of versioned objects in a repository. You may want to apply labels to versioned objects to achieve the following results:
- Track versioned objects during development.
- Improve query results.
- Associate groups of objects for deployment.
- Associate groups of objects for import and export.

For example, you might apply a label to the sources, targets, mappings and sessions associated with a workflow so that you can deploy the workflow to another repository without breaking any dependency.

You can apply the label to multiple versions of an object, or you can specify that the label applies to only one version of the object.

You can create and modify labels in the Label Browser. From the Repository Manager, click Versioning > Labels to browse for a label.

Informatica version control is nothing but a team-based development methodology in which we create copies of the actual objects to track modifications using the check-in and check-out options.

5. Suppose Informatica version control is in place. Can we revert an object to its state of two versions earlier?

Answer:
- From the Version History of the object, open the required version of the object in the workspace.
- Export the XML metadata of the object.
- Check out the object.
- Import the metadata exported earlier.
- Save and check in the object.

6. What do we mean by team-based development in Informatica?

Answer:
Team-based development is nothing but version control for the metadata objects. If we have the team-based development option, we can enable version control for the repository. A versioned repository stores multiple versions of an object. Each version is a separate object with unique properties.
The PowerCenter version control feature allows us to efficiently develop, test and deploy metadata into production.

During development, we can perform the following change management tasks to create and manage multiple versions of objects in the repository:
- Check out and check in versioned objects.
- Compare objects.
- Track changes to an object.
- Delete or purge a version.
- Use global objects such as queries, deployment groups and labels to group versioned objects.

25. Scenario Questions

1. Suppose we have ten source flat files of the same structure. How can we load all the files into the target database in a single batch run using a single mapping?

Answer:
After we create a mapping to load data into the target database from the source flat file definition, we move on to the session properties of the Source Qualifier. To load a set of source files, we create a file, say final.txt, containing the names of the source flat files (ten in our case) and set the Source filetype option to Indirect. Then we point the session at final.txt through the Source file directory and Source filename properties.

2. Suppose we have two Source Qualifier transformations SQ1 and SQ2 connected to target tables TGT1 and TGT2 respectively. How do you ensure TGT2 is loaded after TGT1?

Answer:
If we have multiple Source Qualifier transformations connected to multiple targets, we can designate the order in which the Integration Service loads data into the targets.
In the Mapping Designer, we configure the Target Load Plan, based on the Source Qualifier transformations in the mapping, to specify the required loading order.

The Target Load Plan defines the order in which the Informatica Server loads data into the targets. This avoids integrity constraint violations.

3. Suppose we have a Source Qualifier transformation that populates two target tables. How do you ensure TGT2 is loaded after TGT1?

Answer:
In the Workflow Manager, we can configure constraint-based load ordering for a session. The Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table and then to the foreign key table.

Hence, if we have one Source Qualifier transformation that provides data for multiple target tables with primary and foreign key relationships, we go for constraint-based load ordering.

4. Suppose we have the EMP table as our source. In the target we want to view those employees whose salary is greater than or equal to the average salary of their department. Describe your mapping approach.

Answer:
In outline, our mapping is: Source Qualifier, Sorter, then two branches of the sorted data (one through an Aggregator) that meet in a Joiner, followed by a Filter and the Target.

To start with the mapping we need the following transformations:
After the Source Qualifier of the EMP table, place a Sorter transformation. Sort on the DEPTNO port.
Next we place a sorted Aggregator transformation. Here we find the average salary for each DEPTNO (GROUP BY DEPTNO).
When we perform this aggregation, we lose the data for individual employees. To maintain employee data, we must pass one branch of the pipeline to the Aggregator transformation and pass another branch with the same sorted source data to the Joiner transformation. When we join both branches of the pipeline, we join the aggregated data with the original data.

So next we need a sorted Joiner transformation to join the sorted aggregated data with the original data, based on DEPTNO. Here we take the aggregated pipeline as the Master and the original data flow as the Detail pipeline.

After that we need a Filter transformation to filter out the employees whose salary is less than the average salary of their department.
Filter Condition: SAL >= AVG_SAL

Finally we place the Target table instance.

5. How can we perform changed data capture based on a load sequence number (integer) column present in the source table?

Answer:
Create a mapping variable with integer data type and aggregation type MAX. Set the value of this mapping variable in any of these transformations: Expression, Filter, Router or Update Strategy. Use the SETMAXVARIABLE( $$Variable, load_seq_column ) function.
This function assigns the maximum sequence number of that particular load to the variable $$Variable. The function executes only if a row is marked as insert; SETMAXVARIABLE ignores all other row types, and the current value remains unchanged. For each record, the function sets the current value of the mapping variable to the higher of two values: the current value of the variable or the value from the source column.

At the end of a successful session, the Integration Service saves the final current value to the repository. When used with a session that contains multiple partitions, the Integration Service generates different current values for each partition. At the end of the session, it saves the highest current value across all partitions to the repository. Unless overridden, it uses the saved value as the initial value of the variable for the next session run.

Since the maximum sequence number of the previous load is captured in this mapping variable and saved in the repository, we can use the variable as a filter in the Source Qualifier query. The next time we run the workflow, it extracts only those records whose load sequence number is greater than the saved value.

6. Scenario Implementation 1

In my mapping I have 3 tables that we are joining. In the source query we want to filter the data based on a value that is stored in one of our target tables. Is there a way of pulling that one particular value from the target table and using it in the filter in the Source Qualifier? Basically the value is a load sequence number that gets incremented with each session run, so when the session runs again we only pull records that are greater than that load sequence number.

Answer: There are different options to solve the problem.
Option 1: Assumption: the source and target tables cannot be accessed using a single DB connection, and the "load sequence number" is modified by the current process.

In this case you can use a mapping variable in the mapping and set its value to the highest/current value using the SETMAXVARIABLE function. This value is stored in the Informatica repository, and the same value can be used in the Source Qualifier filter for the next session run. If the workflow fails, the value of the mapping variable is not updated.

Steps
• Define a mapping variable with aggregation type MAX.
• Use the SETMAXVARIABLE($$Variable, "Current load Sequence Number") function to store the value in the repository.
• Use the variable $$Variable in the Source Qualifier filter.

We can provide a default value for the variable and change it during code migration to set the starting value.

Option 2: Assumption: the source and target tables cannot be accessed using a single DB connection, and the "load sequence number" is modified by a different process.

In this case you can create a mapping parameter and pass the value in through a parameter file.

Steps
• Create a workflow that gets the latest "load sequence number" and creates a parameter file. This workflow writes a flat file that contains the parameter value, e.g.
  [wf_DAILY_INCR_LOAD]
  $$Variable=100
• In the actual mapping, define a mapping parameter $$Variable and use $$Variable in the Source Qualifier.

Each time, the workflow that creates the parameter file must run before the actual workflow.

Option 3: Assumption: the source and target tables can be accessed using a single DB connection.

If both the source and target tables are reachable through a single DB connection, we can write the filter that gets the latest data in the Source Qualifier itself, joining all the tables.
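The mapping-variable approach of Option 1 can be sketched outside Informatica as follows. This is only a minimal Python simulation under stated assumptions, not Informatica code: the function name run_session, the row layout and the load_seq field are hypothetical, and the saved_max argument stands in for the value that the Integration Service saves to the repository after a successful run.

```python
def run_session(source_rows, saved_max):
    """Simulate one session run (hypothetical helper, not an Informatica API).

    Rows are extracted only if their load_seq exceeds saved_max, which mimics
    the Source Qualifier filter on $$Variable. The running maximum mimics what
    SETMAXVARIABLE keeps as the current value of the variable.
    """
    current = saved_max
    loaded = []
    for row in source_rows:
        if row["load_seq"] > saved_max:              # Source Qualifier filter
            loaded.append(row)
            current = max(current, row["load_seq"])  # SETMAXVARIABLE-like step
    # On success, the returned maximum becomes the saved value for the next run.
    return loaded, current


# Example: two runs against a growing source table (made-up data).
source = [{"id": 1, "load_seq": 100}, {"id": 2, "load_seq": 101}]
first_load, saved = run_session(source, saved_max=0)

source.append({"id": 3, "load_seq": 102})
second_load, saved = run_session(source, saved_max=saved)

print(len(first_load), len(second_load), saved)  # prints: 2 1 102
```

The second run picks up only the row added after the first run, because the saved maximum (101) is used as the lower bound of the extraction filter.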
7. How can we load 'x' records (a user-defined number of records) out of 'N' records from the source dynamically, without using Filter and Sequence Generator transformations?

Answer:
• Take a mapping parameter, say $$CNT, to pass the number of records we want to load. Change its value in the parameter file each time before the session run.
• Next, after the Source Qualifier, use an Expression transformation and create one output port, say CNTR, with the value CUME(1).
• Next, use an Update Strategy with the condition IIF($$CNT >= CNTR, DD_INSERT, DD_REJECT).

8. Suppose we have 'n' rows in the source and we have two target tables. How can we load 'n/2', i.e. the first half of the source data, into one target and the remaining half into the other target?

Answer: Use an Expression transformation with an output port ROWNUM defined by the expression CUME(1). Next use a Router with 2 groups having the conditions below:

MOD(ROWNUM, 2) = 0
MOD(ROWNUM, 2) = 1

Connect the groups to the corresponding target instances. Note that this sends alternate rows to each target, so each target receives about half of the rows, but not strictly the first half and the second half.

Alternatively, to split strictly into the first half and the remaining half, below are the implementation steps in Informatica.
• First place the source table and its corresponding Source Qualifier in the mapping.
• Next split the data into two flows: one going to an Expression transformation with all the ports, and the other going with any one column to an Aggregator transformation.
• In the Aggregator, add a numeric output port, say CNT, with the expression COUNT(1), and do not group by any input port.
• Propagate this output column CNT to an Expression transformation. In this Expression transformation create another numeric output port, JN, with the expression value 1.
• Now let's go back to the first Expression transformation having all the source columns.
Introduce a Sequence Generator transformation with the RESET property enabled and propagate the NEXTVAL port to this Expression transformation. Also add one more numeric output port, JN, with the expression value 1.
• Now take a Joiner transformation and check the property Sorted Input.
• Bring in all the columns from the Expression transformation next to the Source Qualifier. The other flow to the Joiner comes from the Expression with the two columns CNT and JN. The join condition is based on the JN ports.
• After the Joiner, place a Router transformation. Create one group, say FST, with the condition NEXTVAL <= (CNT/2), so that the rows numbered 1 to n/2 fall into this group.
• Next introduce two target tables, first and second. Propagate the columns of the FST group of the Router to the first target, and propagate the columns of the Default group of the Router to the second target.

9. Suppose we have a flat file which has a header record with 'file creation date', and detailed data records. Describe the approach to load the 'file creation date' column along with each and every detailed record.

Answer:
• We can use the shell command below as a pre-session command to write the header information to another flat file.
  head -1 Sourc_File.dat > header.txt
• Next, use this flat file header.txt as a Lookup in the mapping.
• Create an output port in an Expression transformation with the value 'H', or whatever tag in the source data file identifies the header record.
• Use this port in the Lookup condition, get the file creation date as the return field, and populate it in your target table.

10. Scenario Implementation 2

Suppose we have the two tables below. What will be the output if we select Table 1 as the source and use a Joiner and a Lookup transformation on Table 2 based on the column ID?
Table 1
ID
10

Table 2
ID   Name
10   A
10   B
10   C

Answer: When we use a Joiner transformation with an inner join on the column ID, we get 3 rows as output.

When we use a passive Lookup transformation, we get 1 row as output. In this case of multiple lookup matches, the Lookup returns either the first or the last matching row, as configured in the "on multiple matches" property of the transformation.

When we use an active Lookup transformation, we get 3 rows as output, as an active Lookup returns all the matching rows on multiple lookup matches.

11. Suppose we have a flat file which contains just a numeric value. We need to populate this value in one column of the target table for every source record. How can we achieve this?

Answer:
• Use an Expression transformation and create a decimal output port, say 'DUMMY', with a very high number, along with the other I/O ports from the source table. Say, DUMMY = 99999999999. [Note: use a number that can never appear in the lookup flat file.]
• Now use a Lookup transformation based on the flat file. Say the column name in the lookup is 'VALUE'.
• Map DUMMY from the Expression to the Lookup and use the lookup condition DUMMY != VALUE. Because DUMMY never equals the value in the file, the condition is true for every source row.
• Next use the VALUE column of the Lookup to populate the target column.

12. How will you load a source flat file into a staging table when the file name is not fixed? The file name is like sales_2013_02_22.txt, i.e. the date is appended at the end of the file name.

Answer: The generic file name is like sales_YYYY_MM_DD.txt.

One option is to rename the file in a pre-session task. We use an OS-level command to rename the file to a fixed name. We then set the Informatica source filename to this fixed name and load the file. For example,
in Unix:
$> mv sales_*.txt sales.txt

Another option is to use indirect loading with a fixed file name. The fixed file acts as a file list: its content is the actual filename to be processed. E.g. in Unix:
$> ls sales_*.txt > sales.txt

13. Solve the below scenario using Informatica and database SQL.

Source
PRODUCT_ID   PRODUCT_NAME   PRODUCT_PRICE
10           Lux            100
10           Dove           200
20           Cinthol        400
20           Dettol         500
30           Fiama          600

Target
PRODUCT_ID   PRODUCT_NAME   PRODUCT_PRICE   SUM_PRODUCT_PRICE
10           Lux            100             300
10           Dove           200             300
20           Cinthol        400             900
20           Dettol         500             900
30           Fiama          600             600

Answer:
Using Informatica: In one pipeline, calculate SUM(PRODUCT_PRICE) grouped by PRODUCT_ID using an Aggregator transformation. In the other flow bring all the data through unchanged, then join the first flow with the second using a Joiner transformation, with PRODUCT_ID as the join column and inner join as the join type.

Using SQL:
SELECT M.*, N.SUM_PRODUCT_PRICE
FROM SOURCE M,
     (SELECT SUM(PRODUCT_PRICE) SUM_PRODUCT_PRICE, PRODUCT_ID
      FROM SOURCE
      GROUP BY PRODUCT_ID) N
WHERE M.PRODUCT_ID = N.PRODUCT_ID

14. Suppose we have a source with values as below:

EMPNO   ENAME    SAL
1       Tom      100
2       Jack     200
3       Peter    150
4       Donald   230
999     TEST     999
6       Eric     300

If we encounter EMPNO = 999, then the whole record set should not be loaded into the target table. Describe the approach.

Answer: From the source create two flows:
1: Source -> Expression -> Sorter
2: Source -> Filter -> Expression -> Sorter

1.1 In the Expression create the output field dummy_M as 'X'
1.2 Sort on dummy_M
2.1 In the Filter set the filter condition as EMPNO = 999
2.2 In the Expression create the output field dummy_D as 'X'
2.3 Sort on dummy_D
3. Next use a Joiner transformation: Set the first flow as Master and the second flow as Detail. Set the join condition as dummy_M = dummy_D. Set Join Type as Detail Outer Join.
Use Sorted Input.
4. Next use a Filter transformation: set the filter condition as dummy_D IS NULL. If any row with EMPNO = 999 exists, every joined row has dummy_D = 'X' and nothing passes the filter.
And finally connect your target.

15. Can we pass the value of a mapping variable between 2 pipelines under the same mapping? If not, how can we achieve this?

Answer: We cannot pass the value of an Informatica mapping variable between 2 pipelines in the same mapping.

Mapping variables are values that can change between sessions. The Integration Service saves the latest value of a mapping variable to the repository only at the end of each successful session run. When we have two pipelines under the same mapping, the mapping has a single session, and the value of the mapping variable is saved to the repository only when this session succeeds, that is, when the execution of both pipelines completes.

The alternative method to solve this scenario is as below:
1. Split the pipelines into two different mappings, say "map1" and "map2".
2. Create a mapping variable, say "var1", in "map1" and set the value of the variable using the SETVARIABLE() function. Our goal is then to pass the value of "var1" at the end of the successful session run to "map2".
3. Create a mapping variable, say "var2", in "map2" and use it in the mapping wherever the value of the variable "var1" from the first mapping is required.
4. Create the workflow with a workflow variable, say "wfvar".
5. Create two non-reusable sessions, say "ses1" and "ses2", for "map1" and "map2" respectively.
6. In the post-session on success variable assignment of "ses1", assign the value of the mapping variable "var1" to the workflow variable "wfvar".
7. In the pre-session variable assignment of "ses2", assign the value of the workflow variable "wfvar" to the mapping variable "var2".
With this approach, we are able to pass the value from the first session to the second session.

16. Scenario Implementation 3

Suppose we have a huge (size in GB) flat file as source. The flat file contains 22 columns, out of which 4 columns are considered "key" columns: CUST_SRC_ID, PRODUCT_ID, FF_ID, SNM_ID. There is one more column in the flat file relevant to the discussion, DATE_ID, which stores a date in YYYY-MM-DD format.

The flat file contains duplicate records based on the above 4 columns (that is, the records are not entirely duplicated; some values may differ in other columns). The requirement is to choose all the unique records from the flat file based on the uniqueness of the above-mentioned "keys". If there are duplicate records, we must select the record for which the DATE_ID column contains the latest value.

So suppose we get the following records in the flat file:

CUST_SRC_ID   PRODUCT_ID   FF_ID   SNM_ID   DATE_ID      OTHER COLUMNS
123           P1           F1      S1       2013-01-02   X, Y, Z
123           P1           F1      S1       2013-01-06   P, Q, R
123           P1           F1      S1       2013-01-02   S, T, U

In the above case we want the following row in the target:

CUST_SRC_ID   PRODUCT_ID   FF_ID   SNM_ID   DATE_ID      OTHER COLUMNS
123           P1           F1      S1       2013-01-06   P, Q, R

How can we achieve this in a single mapping?

Answer: Use a Sorter transformation after the Source Qualifier.
The sort keys, in order, are:
• CUST_SRC_ID, ascending
• PRODUCT_ID, ascending
• FF_ID, ascending
• SNM_ID, ascending
• DATE_ID, descending

Next use an Expression transformation and create 3 variable ports and 1 output port in the order below:
• V_Keys = CUST_SRC_ID || PRODUCT_ID || FF_ID || SNM_ID
• V_FLAG = IIF(V_Keys != V_Keys_PREV, 1, 0)
• V_Keys_PREV = V_Keys
• O_FLAG = V_FLAG (output port)

Now use a Filter transformation with the filter condition below:
• O_FLAG = 1

After sorting, the first record of every group of identical keys has the latest date, because we sorted on DATE_ID descending. With this expression logic, the first record of every group (the one with the latest date) gets O_FLAG = 1 and all the others get 0. The Filter transformation then removes the unwanted duplicate records.

17. Scenario Implementation 4

I have a flat file with just one column, as given below:

C1
L1
C2
L2
C3
L3

Values starting with C denote a company name and values starting with L denote the location of that company. We have to load this data into the target table (using Informatica) as:

C1, L1
C2, L2
C3, L3

Answer: One way to achieve this requirement is as follows.

1. After the Source Qualifier, in an Expression transformation (this is tricky; use variable port logic):
• generate a unique sequence number for each group,
• generate a unique number for each record within the group,
• duplicate the column once.

After the Expression the output will be as below:

Col1, Col2, Col3, Col4
1, 1, C1, C1
1, 2, L1, L1
2, 1, C2, C2
2, 2, L2, L2
3, 1, C3, C3
3, 2, L3, L3

2. Add an Aggregator with group by on the first column and two conditional aggregate expressions:
MAX(Col3, Col2 = 1)
MAX(Col3, Col2 = 2)

18. Implement a slowly changing dimension of Type 2 which loads the current record into a Current table and the old data into a Log table.
Answer:
• Use a Joiner transformation to join the source and the Current table with a full outer join.
• Next use an Expression transformation to mark each row as new or old, and correspondingly assign a value of 0 or 1 in a new output port.
• Pass all the columns to a Router transformation and route on the new port.
• For rows flagged 0 (new), use an Update Strategy transformation with DD_INSERT to insert into the Current table.
• For rows flagged 1 (old), use an Update Strategy transformation with DD_UPDATE to update the Current table.
• For rows flagged 1, also load the existing data from the Current table into the Log table.

26. Performance Tuning

1. Which one is faster, a Connected or an Unconnected Lookup?

Answer: There can be some very specific situations where an unconnected lookup adds some performance benefit to the total execution. If you call the unconnected lookup only under some condition (e.g. from an Expression transformation only when a specific condition is met, as opposed to a connected lookup, which is called for every row), then you may save some calls to the lookup, thereby marginally improving performance. The improvement is more apparent when the data volume is really huge.

Keep the "Pre-build Lookup Cache" option set to "Always disallowed" for the lookup, so that the lookup is not even cached if it is never called. This technique has other disadvantages, however; check http://www.dwbiconcepts.com/etl/14-etl-informatica/46-tuning-informatica-lookup.html , especially the points under the following subheadings:
- Effect of choosing connected OR Unconnected Lookup, and
- WHEN TO set Pre-build Lookup Cache OPTION (AND WHEN NOT TO)

2. How can we improve the performance of the Informatica Normalizer transformation?
Answer: As such, there is no way to improve the performance of a session by using a Normalizer. The Normalizer is a transformation used to pivot or normalize datasets and has nothing to do with performance. In fact, the Normalizer does not impact performance much (apart from taking a little more memory).

3. How to improve session performance?

Answer:
• Run concurrent sessions.
• Partition the session (PowerCenter).
• Tune parameters: DTM buffer pool, buffer block size, index cache size, data cache size, commit interval, tracing level (Normal, Terse, Verbose Initialization, Verbose Data).
• The session has memory to hold 83 sources and targets. If there are more, the DTM buffer size can be increased.
• The Informatica server uses the index and data caches for the Aggregator, Rank, Lookup and Joiner transformations. The server stores the transformed data from these transformations in the data cache before returning it to the data flow, and stores their group information in the index cache. If the allocated data or index cache is not large enough to store the data, the server stores the data in a temporary disk file as it processes the session data. Each time the server pages to disk, performance slows; this can be seen from the counters. Since the data cache generally holds more than the index cache, it should be sized larger than the index cache.
• Remove the staging area.
• Turn off session recovery.
• Reduce error tracing.

4. How do you identify the bottlenecks in mappings?

Answer: Bottlenecks can occur in:

• Targets: The most common performance bottleneck occurs when the Informatica server writes to a target database. You can identify a target bottleneck by configuring the session to write to a flat file target.
If the session performance increases significantly when you write to a flat file, you have a target bottleneck.
Solution:
  o Drop or disable indexes or constraints.
  o Perform bulk load (bypasses the database log).
  o Increase the commit interval (recovery is compromised).
  o Tune the database (rollback segments (RBS), dynamic extension, etc.).

• Sources: Add a Filter transformation after each Source Qualifier with a condition that lets no records through. If the time taken is the same as before, there is a source problem. You can also identify a source problem in these ways:
  o Read test session: copy the mapping, keep only the sources and Source Qualifiers, remove all other transformations and connect to a flat file target. If the performance is the same, there is a source bottleneck.
  o Database query: copy the read query directly from the session log and execute it against the source database with a query tool. If the time it takes to execute the query and the time to fetch the first row are significantly different, the query can be modified using optimizer hints.
Solution:
  o Optimize queries using hints.
  o Use indexes wherever possible.

• Mapping: If both source and target are OK, then the problem could be in the mapping. Add a Filter transformation before the target; if the time taken is the same, there is a mapping problem. Alternatively, look at the performance monitor in the session property sheet and view the counters. High values for the error rows and rows-in-lookup-cache counters indicate a mapping bottleneck.
Solutions:
• Optimize single-pass reading.
• Optimize the Lookup transformation:
  o Caching the lookup table: When caching is enabled, the Informatica server caches the lookup table and queries the cache during the session. When this option is not enabled, the server queries the lookup table on a row-by-row basis. Cache types are static, dynamic, shared, unshared and persistent.
  o Optimizing the lookup condition: Whenever multiple conditions are placed, the conditions with an equality sign should come first.
  o Indexing the lookup table: For a cached lookup, the lookup table should be indexed on the ORDER BY columns; the session log contains the ORDER BY statement. For an uncached lookup, since the server issues a SELECT statement for each row passing into the Lookup transformation, it is better to index the lookup table on the columns in the lookup condition.
• Optimize the Filter transformation: You can improve efficiency by filtering early in the data flow. Instead of using a Filter transformation halfway through the mapping to remove a sizable amount of data, use a Source Qualifier filter to remove those same rows at the source. If it is not possible to move the filter into the Source Qualifier, move the Filter transformation as close to the Source Qualifier as possible, to remove unnecessary data early in the data flow.
• Optimize the Aggregator transformation:
  o Group by simpler columns, preferably numeric columns.
  o Use Sorted Input. Sorted input decreases the use of aggregate caches. The server assumes all input data is sorted and performs the aggregate calculations as it reads.
  o Use incremental aggregation in the session property sheet.
• Optimize the Sequence Generator transformation:
  o Try creating a reusable Sequence Generator transformation and use it in multiple mappings.
  o The Number of Cached Values property determines the number of values the Informatica server caches at one time.
• Optimize the Expression transformation:
  o Factor out common logic.
  o Minimize aggregate function calls.
  o Replace common sub-expressions with local variables.
  o Use operators instead of functions.

• Sessions: If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck. You can identify a session bottleneck by using the performance details.
The Informatica server creates performance details when you enable Collect Performance Data on the General tab of the session properties. Performance details display information about each Source Qualifier, target definition, and individual transformation. All transformations have some basic counters that indicate the number of input rows, output rows, and error rows. Any value other than zero in the readfromdisk and writetodisk counters for Aggregator, Joiner, or Rank transformations indicates a session bottleneck. Low BufferInput_efficiency and BufferOutput_efficiency counters also indicate a session bottleneck. Small cache size, low buffer memory, and small commit intervals can cause session bottlenecks.

• System (networks)

5. How do you handle performance issues in Informatica? Where can you monitor the performance?

Answer: There are several aspects to performance handling. Some of them are:
• Source tuning
• Target tuning
• Repository tuning
• Session performance tuning
• Incremental change identification on the source side
• Software, hardware (use multiple servers) and network tuning
• Bulk loading
• Using the appropriate transformation

To monitor performance:
• Set performance detail criteria.
• Enable performance monitoring.
• Monitor the session at runtime and/or check the performance monitor file.

6. What are performance counters?

Answer: The performance details provide counters that help you understand the session and mapping efficiency. Each Source Qualifier, target definition, and individual transformation appears in the performance details, along with counters that display performance information about each transformation.

Understanding Performance Counters

All transformations have some basic counters that indicate the number of input rows, output rows, and error rows.
Source Qualifiers, Normalizers, and targets have additional counters that indicate the efficiency of data moving into and out of buffers. You can use these counters to locate performance bottlenecks.

Some transformations have counters specific to their functionality. For example, each Lookup transformation has a counter that indicates the number of rows stored in the lookup cache.

When you read performance details, the first column displays the transformation name as it appears in the mapping, the second column contains the counter name, and the third column holds the resulting number or efficiency percentage.

When you partition a source, the Informatica server generates one set of counters for each partition. The following performance counters illustrate two partitions for an Expression transformation:

Transformation   Counter                  Value
EXPTRANS [1]     Expression_input rows    8
                 Expression_output rows   8
EXPTRANS [2]     Expression_input rows    16
                 Expression_output rows   16

Note: When you partition a session, the number of aggregate or rank input rows may be different from the number of output rows from the previous transformation.

7. How can we increase session performance?

Answer:
• Use minimum logging (Terse).
• Partition the source data.
• Perform ETL for each partition in parallel. (For this, multiple CPUs are needed.)
• Add indexes.
• Change the commit level.
• Use a Filter transformation to remove unwanted data movement.
• Increase buffer memory when the data volume is large.
• Multiple lookups can reduce performance. Verify the largest lookup table and tune the expressions.
• At session level, the causes are small cache size, low buffer memory and small commit interval.

At system level:
• Windows NT/2000: use the Task Manager
• UNIX: vmstat, iostat

Hierarchy of optimization
• Target
• Source
• Mapping
• Session
• System

Optimizing target databases:
• Drop indexes/constraints
• Increase checkpoint intervals
• Use bulk loading/external loading
• Turn off recovery
• Increase database network packet size

Source level
• Optimize the query (for example, its GROUP BY clause)
• Use conditional filters
• Connect to the RDBMS using the IPC protocol

Mapping
• Optimize data type conversions
• Eliminate transformation errors
• Optimize transformations/expressions

Session
• Concurrent batches
• Partition sessions
• Reduce error tracing
• Tune session parameters

System
• Improve network speed
• Use multiple Informatica servers on separate systems
• Reduce paging

8. Scenario Implementation 1

What would be the best approach to update a huge table (more than 200 million records) using Informatica? The table does not contain any primary key; however, there are a few indexes defined on it. The target table is partitioned. On the other hand, the source table contains only a few records (less than a thousand) that will go to the target and update it. Is there any better approach than just doing it with an Update Strategy transformation?

Answer: Since the target busy percentage is 99.99%, it is very clear that the bottleneck is on the target, so we need to tune the target. There are a couple of options.

1. Since the target table is partitioned on time_id, you need to include time_id in the WHERE clause of the SQL fired by Informatica. For that, you can define the time_id column as a primary key in the target definition. With this, the update query will have time_id in the WHERE clause.

2.
With the Informatica Update Strategy, an UPDATE statement is fired for every row that the Update Strategy marks for update. To avoid multiple UPDATE statements, you can INSERT all the records that are meant to be updated into a temporary table, and then use a correlated SQL statement to update the records in the actual table (the 200M-row table). This query can be fired as a post-session SQL. Please see the sample SQL:

UPDATE TGT_TABLE U
SET (U.COLUMNS_LIST /*Column List to be updated*/) =
    (SELECT I.COLUMNS_LIST /*Column List to be updated*/
     FROM UPD_TABLE I
     WHERE I.KEYS = U.KEYS
       AND I.TIME_ID = U.TIME_ID)
WHERE EXISTS
    (SELECT 1
     FROM UPD_TABLE I
     WHERE I.KEYS = U.KEYS
       AND I.TIME_ID = U.TIME_ID)

TGT_TABLE: actual table with 200M records
UPD_TABLE: table with the records meant for UPDATE (about 1K records)

We need to make sure that the indexes are up to date and statistics are collected. Since this has more to do with database performance, you may also need the help of a DBA to check the DB throughput, SQL cost, etc.
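The staging-table approach of option 2 can be tried out on a small scale with the sketch below. It is only an illustration under stated assumptions: SQLite (through Python's sqlite3 module) stands in for the real target database, the columns key_id and amount are hypothetical stand-ins for KEYS and COLUMNS_LIST, and the helper name apply_staged_updates is made up. The real post-session SQL would be written in the dialect of the target database.

```python
import sqlite3


def apply_staged_updates(conn):
    """Apply every row staged in upd_table to tgt_table with one UPDATE.

    Hypothetical helper: a single set-based statement replaces the
    row-by-row updates that an Update Strategy would otherwise issue.
    """
    cur = conn.execute(
        """
        UPDATE tgt_table
        SET amount = (SELECT i.amount
                      FROM upd_table i
                      WHERE i.key_id = tgt_table.key_id
                        AND i.time_id = tgt_table.time_id)
        WHERE EXISTS (SELECT 1
                      FROM upd_table i
                      WHERE i.key_id = tgt_table.key_id
                        AND i.time_id = tgt_table.time_id)
        """
    )
    conn.commit()
    return cur.rowcount  # number of target rows that were updated


# Tiny in-memory stand-ins for the 200M-row target and the staging table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tgt_table (key_id INTEGER, time_id INTEGER, amount INTEGER);
    CREATE TABLE upd_table (key_id INTEGER, time_id INTEGER, amount INTEGER);
    INSERT INTO tgt_table VALUES (1, 20130101, 10);
    INSERT INTO tgt_table VALUES (2, 20130101, 20);
    INSERT INTO tgt_table VALUES (3, 20130102, 30);
    INSERT INTO upd_table VALUES (2, 20130101, 99);
    """
)
updated = apply_staged_updates(conn)
rows = conn.execute(
    "SELECT key_id, amount FROM tgt_table ORDER BY key_id"
).fetchall()
```

The WHERE EXISTS clause matters: without it, target rows that have no match in the staging table would have their column set to NULL by the correlated subquery.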
