ETL Tools

ETL tools extract, transform, and load data into the data warehouse for decision making. Before ETL tools existed, the ETL process was done manually with SQL code written by programmers. That approach was cumbersome and tedious, since it involved many resources, complex coding, and long work hours, and maintaining the code was a great challenge for the programmers. Compared to the old method, ETL tools are very powerful and offer many advantages at every stage of the ETL process: execution, data cleansing (purification), data profiling, transformation, debugging, and loading the data into the data warehouse.

Popular ETL tools:

Tool Name                   Company Name
Informatica                 Informatica Corporation
Data Stage                  IBM
Ab Initio                   Ab Initio Software Corporation
Oracle Warehouse Builder    Oracle Corporation

Why Informatica?

Informatica is a powerful ETL tool from Informatica Corporation. In Informatica, all the metadata about source systems, target systems, and transformations is stored in the Informatica repository. The Informatica PowerCenter client and the repository server access this repository to store and retrieve metadata. Informatica is very strong at designing and building data warehouses: it can connect to several sources and targets, extract metadata from them, and transform and load the data into the target systems.

Basic details about Informatica:

Informatica was founded in 1993 by Diaz Nesamoney and Gaurav Dhillon.
Its headquarters are in Redwood City, California.
Sohaib Abbasi is the chairman and CEO.
The current version is 9.5.

Informatica versions:

Version                            Release Date    Comments
Informatica PowerCenter 4.1
Informatica PowerCenter 5.1
Informatica PowerCenter 6.1.2
Informatica PowerCenter 7.1.2      Nov 2003
Informatica PowerCenter 8.1        Jun 2006        Service-oriented architecture
Informatica PowerCenter 8.5        Nov 2007
Informatica PowerCenter 8.6        Jun 2008
Informatica PowerCenter 9.1        Jun 2011
Informatica PowerCenter 9.5        May 2012

Informatica PowerCenter contains the tools below:

Repository Manager
Designer
Workflow Manager
Workflow Monitor

Repository Manager is used to:

Configure domains
Create the repository
Deploy the code
Change passwords

Designer has the tools below:

Source Analyzer: import or create source definitions for flat file, XML, COBOL, application, and relational sources.
Target Designer: import or create target definitions.
Transformation Developer: create reusable transformations.
Mapplet Designer: create mapplets.
Mapping Designer: create mappings.

Workflow Manager Tools

To create a workflow, you first create tasks such as a session, which contains the mapping you build in the Designer. You then connect tasks with conditional links to specify the order of execution for the tasks you created. The Workflow Manager consists of three tools to help you develop a workflow:

Task Developer: use the Task Developer to create the tasks you want to run in the workflow.
Workflow Designer: use the Workflow Designer to create a workflow by connecting tasks with links. You can also create tasks in the Workflow Designer as you develop the workflow.
Worklet Designer: used to create worklets.

Workflow Tasks

You can create the following types of tasks in the Workflow Manager:

Assignment: assigns a value to a workflow variable.
Command: specifies a shell command to run during the workflow.
Control: stops or aborts the workflow.
Decision: specifies a condition to evaluate.
Email: sends email during the workflow.
Event-Raise: notifies the Event-Wait task that an event has occurred.
Event-Wait: waits for an event to occur before executing the next task.
Session: runs a mapping you create in the Designer.
Timer: waits for a timed event to trigger.

A workflow is a set of instructions that tells the Informatica server how to execute the tasks. The Workflow Monitor is used to monitor workflows and tasks; we can run, stop, abort, and resume workflows there.

Informatica PowerCenter client tools:

Designer
Repository Manager
Workflow Manager
Workflow Monitor

Informatica PowerCenter Repository

The repository is the heart of the Informatica tool. It is a kind of database where all the data related to mappings, sources, and targets is stored, and it can be treated as the backend of Informatica. All the client tools and the Informatica server fetch data from the repository. An Informatica client and server without a repository is the same as a PC without memory or a hard disk: it has the ability to process data but no data to process.

Basic definitions:

Mapping: represents the data flow from sources to targets.
Mapplet: a set of transformations that can be used in one or more mappings.
Session: a set of instructions to move data from sources to targets.
Worklet: an object that represents a set of tasks.

Informatica PowerCenter Server

The server is where all the executions take place. The server makes physical connections to the sources and targets, fetches data, applies the transformations mentioned in the mapping, and loads the data into the target system.

Informatica Architecture

The Informatica ETL product, known as Informatica PowerCenter, consists of three main components: the PowerCenter client tools, the PowerCenter repository, and the PowerCenter server. The architecture is visually explained in a sources-to-targets diagram; the systems it connects are:

Sources — Standard: RDBMS, flat files, XML, ODBC. Applications: SAP R/3, SAP BW, PeopleSoft, Siebel, JD Edwards, i2. EAI: MQ Series, Tibco, JMS, Web Services. Legacy: mainframes (DB2, VSAM, IMS, IDMS, Adabas), AS400 (DB2, flat file). Remote sources.
Targets — Standard: RDBMS, flat files, XML, ODBC. Applications: SAP R/3, SAP BW, PeopleSoft, Siebel, JD Edwards, i2. EAI: MQ Series, Tibco, JMS, Web Services. Legacy: mainframes (DB2), AS400 (DB2). Remote targets.

How the components work in the Informatica architecture:

Repository: the repository is nothing but a relational database which stores all the metadata created in PowerCenter. Whenever you develop a mapping, session, or workflow, entries are made in the repository.
PowerCenter client tools: the PowerCenter client consists of multiple tools. They are used to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create workflows to run the mapping logic. The PowerCenter client connects to the repository through the Repository Service to fetch details, and it connects to the Integration Service to start workflows. So essentially the client tools are used to code and give instructions to the PowerCenter server.
Repository Service: it connects to the repository, fetches data from the repository, and sends it back to the requesting components (mostly the client tools and the Integration Service).
Integration Service: it extracts data from sources, processes it as per the business logic, and loads data into targets.
PowerCenter Administration Console: this is simply a web-based administration tool you can use to administer the PowerCenter installation.
What are the functionalities we can do with the Source Qualifier transformation?

The Source Qualifier is an active and connected transformation. It represents the rows that the Integration Service reads when it runs a session. It can be used for the following:

1. Joins: you can join two or more tables from the same source database.
2. Filter rows: you can filter rows from the source database.
3. Sorted input: you can sort the source data by specifying the number of sorted ports. The Integration Service adds an ORDER BY clause to the default SQL query.
4. Distinct rows: you can get distinct rows from the source by choosing the "Select Distinct" property. The Integration Service adds a SELECT DISTINCT statement to the default SQL query.
5. Custom SQL query: you can write your own SQL query to do calculations.

The Source Qualifier transformation also converts the source data types to the Informatica native data types.
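Putting those properties together, here is a hedged sketch — not taken from the original — of the shape of SQL the Source Qualifier could generate for a hypothetical EMPLOYEES source with Select Distinct, a source filter, and two sorted ports configured (the table and column names are illustrative assumptions):

    -- Default generated query: a plain SELECT of every connected port, e.g.
    -- SELECT EMPLOYEES.EMP_ID, EMPLOYEES.DEPT_ID, EMPLOYEES.SALARY FROM EMPLOYEES
    -- With Select Distinct, a source filter, and "Number of Sorted Ports" = 2:
    SELECT DISTINCT
        EMPLOYEES.EMP_ID,
        EMPLOYEES.DEPT_ID,
        EMPLOYEES.SALARY
    FROM EMPLOYEES
    WHERE EMPLOYEES.SALARY > 1000      -- source filter condition
    ORDER BY EMPLOYEES.EMP_ID,
             EMPLOYEES.DEPT_ID         -- the first two connected ports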
Joiner Transformation

The Joiner transformation is an active and connected transformation used to join two heterogeneous sources.

Join Type

The Joiner transformation supports the following four types of joins:

Normal Join
Master Outer Join
Detail Outer Join
Full Outer Join

We will learn about each join type with an example. Say we have the following Students and Subjects tables as the source.

Table Name: Subjects

Subject_Id   Subject_Name
-------------------------
1            Maths
2            Chemistry
3            Physics

Table Name: Students

Student_Id   Subject_Id
-----------------------
10           1
20           2
30           NULL

Assume that the Subjects source is the master, the Students source is the detail, and we join these sources on the Subject_Id port.

Normal Join: the Joiner transformation outputs only the records that match the join condition and discards all the rows that do not match it. The output of the normal join is:

Master Ports               | Detail Ports
-----------------------------------------------------
Subject_Id   Subject_Name  | Student_Id   Subject_Id
-----------------------------------------------------
1            Maths         | 10           1
2            Chemistry     | 20           2

Master Outer Join: in a master outer join, the Joiner transformation keeps all the records from the detail source and only the matching rows from the master source. It discards the unmatched rows from the master source. The output of the master outer join is:

Master Ports               | Detail Ports
-----------------------------------------------------
Subject_Id   Subject_Name  | Student_Id   Subject_Id
-----------------------------------------------------
1            Maths         | 10           1
2            Chemistry     | 20           2
NULL         NULL          | 30           NULL

Detail Outer Join: in a detail outer join, the Joiner transformation keeps all the records from the master source and only the matching rows from the detail source. It discards the unmatched rows from the detail source. The output of the detail outer join is:

Master Ports               | Detail Ports
-----------------------------------------------------
Subject_Id   Subject_Name  | Student_Id   Subject_Id
-----------------------------------------------------
1            Maths         | 10           1
2            Chemistry     | 20           2
3            Physics       | NULL         NULL

Full Outer Join: the full outer join first brings the matching rows from both sources and then also keeps the non-matched records from both the master and the detail source. The output of the full outer join is:

Master Ports               | Detail Ports
-----------------------------------------------------
Subject_Id   Subject_Name  | Student_Id   Subject_Id
-----------------------------------------------------
1            Maths         | 10           1
2            Chemistry     | 20           2
3            Physics       | NULL         NULL
NULL         NULL          | 30           NULL

Joiner Transformation Performance Improvement Tips

To improve the performance of a Joiner transformation, follow the tips below:

If possible, perform joins in a database. Performing joins in a database is faster than performing joins in a session.
Specify the source with fewer rows and with fewer duplicate keys as the master, and the other source as the detail.
You can improve session performance by configuring the Sorted Input option in the Joiner transformation properties tab.

Limitations of the Joiner Transformation

You cannot use a Joiner transformation when the input pipeline contains an Update Strategy transformation.
You cannot connect a Sequence Generator transformation directly to the Joiner transformation.
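As a cross-check, the four Informatica join types correspond to ordinary SQL joins, with "master" and "detail" swapped relative to SQL's LEFT/RIGHT naming. A hedged sketch in ANSI SQL against the Subjects (master) and Students (detail) tables above (note that student 30 has a NULL Subject_Id, and a NULL key never matches):

    -- Normal join: only matching rows
    SELECT sub.Subject_Id, sub.Subject_Name, st.Student_Id, st.Subject_Id
    FROM Students st
    JOIN Subjects sub ON sub.Subject_Id = st.Subject_Id;

    -- Master outer join: all detail (Students) rows, matching master rows
    SELECT sub.Subject_Id, sub.Subject_Name, st.Student_Id, st.Subject_Id
    FROM Students st
    LEFT OUTER JOIN Subjects sub ON sub.Subject_Id = st.Subject_Id;

    -- Detail outer join: all master (Subjects) rows, matching detail rows
    SELECT sub.Subject_Id, sub.Subject_Name, st.Student_Id, st.Subject_Id
    FROM Students st
    RIGHT OUTER JOIN Subjects sub ON sub.Subject_Id = st.Subject_Id;

    -- Full outer join: all rows from both sources
    SELECT sub.Subject_Id, sub.Subject_Name, st.Student_Id, st.Subject_Id
    FROM Students st
    FULL OUTER JOIN Subjects sub ON sub.Subject_Id = st.Subject_Id;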
Why do we need Lookup?

The Lookup transformation is used to look up data in a flat file, relational table, view, or synonym. Lookup is a passive/active transformation (from Informatica version 9 onwards it can be active) and can be used in both connected and unconnected modes. The Lookup transformation can return a single row or multiple rows. It is used to perform the following tasks:

Get a related value: you can get a value from the lookup table based on the source value. As an example, we can get a related value such as the city name for a zip code value.
Get multiple values: you can get multiple rows from a lookup table. As an example, get all the states in a country.
Perform a calculation: we can take the value from the lookup table and use it in calculations.
Update slowly changing dimension tables: a Lookup transformation can be used to determine whether a row already exists in the target or not.

What is the difference between a static and a dynamic lookup cache?

We can configure a Lookup transformation to cache the underlying lookup table. In the case of a static (read-only) lookup cache, the Integration Service caches the lookup table at the beginning of the session and does not update the lookup cache while it processes the Lookup transformation. In the case of a dynamic lookup cache, the Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target; the dynamic cache stays synchronized with the target.

Difference between the Joiner transformation and the Lookup transformation:

Joiner                                              Lookup
Active transformation                               Active/passive transformation
We cannot override the query in the Joiner         We can override the query in the Lookup to fetch data from multiple tables
Supports equi-joins only                            Supports equi-joins and non-equi-joins
Cannot be configured to use persistent cache,       Can be configured to use persistent cache, shared
shared cache, uncached, or dynamic cache            cache, uncached, or dynamic cache
We can perform an outer join in the Joiner          We cannot perform an outer join in the Lookup
Used only on sources                                Can be used on a source as well as a target

Difference between Active and Passive Transformations

What is a transformation? A transformation is a repository object which reads the data, modifies the data, and passes the data on. Transformations can be classified as active or passive, and as connected or unconnected.

Active transformations: a transformation is active if it performs any of the following actions:

Changes the number of rows: for example, the Filter transformation is active because it removes the rows that do not meet the filter condition.
Changes the transaction boundary: the Transaction Control transformation is active because it defines a commit or rollback transaction.
Changes the row type: the Update Strategy is active because it flags rows for insert, delete, update, or reject.

Passive transformations: transformations which do not change the number of rows passed through them and which maintain the transaction boundary and row type are called passive transformations.

List of active transformations (an active transformation changes the number of rows that pass through the mapping):

1. Source Qualifier transformation
2. Sorter transformation
3. Aggregator transformation
4. Filter transformation
5. Union transformation
6. Joiner transformation
7. Normalizer transformation
8. Rank transformation
9. Router transformation
10. Update Strategy transformation
11. Advanced External Procedure transformation

List of passive transformations (passive transformations do not change the number of rows that pass through the mapping):

1. Expression transformation
2. Sequence Generator transformation
3. Lookup transformation
4. Stored Procedure transformation
5. XML Source Qualifier transformation
6. External Procedure transformation
7. Input transformation (mapplet)
8. Output transformation (mapplet)

Is Lookup an active transformation?

Before Informatica version 9.1, the Lookup transformation was passive: for each input row passed to the Lookup transformation, we could get only one output row, even if multiple rows matched. The "Lookup policy on multiple matches" property determines which row to return when the Lookup transformation finds multiple rows that match the lookup condition. Select one of the following values:

Report Error: the Integration Service reports an error and does not return a row.
Use First Value: returns the first row that matches the lookup condition.
Use Last Value: returns the last row that matches the lookup condition.
Use Any Value: the Integration Service returns the first value that matches the lookup condition; it creates an index based on the key ports instead of all the Lookup transformation ports.

From Informatica 9.1 onwards, the Lookup transformation can return multiple rows as output, and a new value was added to the above property:

Use All Values: returns all matching rows.

Hence the Lookup transformation can be configured as an active or a passive transformation.
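To picture what the lookup actually does, here is a hedged SQL sketch of the "get a related value" case, assuming a hypothetical ZIP_CITY lookup table (names are illustrative, not from the original):

    -- Building a static lookup cache is roughly a one-time scan:
    SELECT ZIP_CODE, CITY_NAME
    FROM ZIP_CITY;             -- cached once when the session starts

    -- An uncached lookup behaves like a per-row probe such as:
    SELECT CITY_NAME
    FROM ZIP_CITY
    WHERE ZIP_CODE = '30301';  -- value arriving from the current source row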
Rank Transformation

The Rank transformation is an active and connected transformation. It is used to select a top or bottom rank of data: it allows us to select a group of top or bottom values, not just one value.

Update Strategy Transformation

It is an active and connected transformation. It is used to insert, update, and delete records in the target table, and it can also reject records before they reach the target table.

Flagging rows in a mapping with Update Strategy: you have to flag each row for insert, update, delete, or reject. The constants and their numeric equivalents for each database operation are listed below:

DD_INSERT: numeric value is 0. Used for flagging the row as an insert.
DD_UPDATE: numeric value is 1. Used for flagging the row as an update.
DD_DELETE: numeric value is 2. Used for flagging the row as a delete.
DD_REJECT: numeric value is 3. Used for flagging the row as a reject.

The Integration Service treats any other numeric value as an insert.

Important note: the Update Strategy works only when we have a primary key on the target table. If there is no primary key available on the target table, then you have to specify a primary key in the target definition in the mapping for the Update Strategy transformation to work.
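The flag on each row decides which DML statement the writer effectively issues against the target. A hedged sketch of what each flagged row turns into, for a hypothetical CUSTOMER_DIM target keyed on CUST_ID (table and column names are illustrative assumptions):

    -- DD_INSERT (0): the row becomes an INSERT
    INSERT INTO CUSTOMER_DIM (CUST_ID, CUST_NAME) VALUES (101, 'Smith');

    -- DD_UPDATE (1): the row becomes an UPDATE keyed on the primary key
    UPDATE CUSTOMER_DIM SET CUST_NAME = 'Smith' WHERE CUST_ID = 101;

    -- DD_DELETE (2): the row becomes a DELETE keyed on the primary key
    DELETE FROM CUSTOMER_DIM WHERE CUST_ID = 101;

    -- DD_REJECT (3): the row is not written at all;
    -- it goes to the reject (bad) file instead.

This also makes the "primary key required" note above concrete: without a key, the UPDATE and DELETE statements have no WHERE clause to target the right row.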
Stored Procedure Transformation

It is a passive transformation. It can work as a connected or an unconnected transformation, and it is used to run a stored procedure in the database. You can perform the tasks below with a Stored Procedure transformation:

Check the status of the target database before loading data into it.
Determine whether enough space exists in the database.
Perform a specialized calculation.
Drop and re-create indexes.

Stored Procedure Transformation Overview: one of the important features of a stored procedure is that you can send data to the stored procedure and receive data from it. There are three types of data which pass between the Integration Service and the stored procedure:

Input/output parameters: used to send data to and receive data from the stored procedure.
Return values: after running a stored procedure, most databases return a value. This value can either be user-definable, which means it can act similar to a single output parameter, or it may only be an integer value. If a stored procedure returns a result set rather than a single return value, the Stored Procedure transformation takes only the first value returned from the procedure.
Status codes: status codes provide error handling for the Integration Service during a workflow. The stored procedure issues a status code that notifies whether or not it completed successfully. You cannot see this value.

Connected and Unconnected Stored Procedure Transformation:

Connected Stored Procedure transformation: the Stored Procedure transformation is connected to other transformations in the flow of the mapping. Use a connected Stored Procedure transformation when you need data from an input port sent as an input parameter to the stored procedure, or the results of a stored procedure sent as an output parameter to another transformation.

Unconnected Stored Procedure transformation: the Stored Procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session, or is called by an expression in another transformation in the mapping.

Specifying when the stored procedure runs: the "Stored Procedure Type" property is used to specify when the stored procedure runs. The different values of this property are shown below:

Normal: the Stored Procedure transformation runs for each row passed through the mapping. This is useful when running a calculation against an input port. Connected stored procedures run only in normal mode.
Pre-load of the Source: runs before the session reads data from the source. Useful for verifying the existence of tables or performing joins of data in a temporary table.
Post-load of the Source: runs after reading data from the source. Useful for removing temporary tables.
Pre-load of the Target: runs before the session sends data to the target. This is useful for verifying target tables or disk space on the target system.
Post-load of the Target: runs after loading data into the target. This is useful for re-creating indexes on the database.

Connected and Unconnected Lookup

Connected Lookup:
Receives input values directly from the pipeline.
We can use a dynamic or static cache.
The cache includes all lookup columns used in the mapping.
Supports user-defined default values.
If there is a match for the lookup condition, the PowerCenter server returns the result of the lookup condition for all lookup/output ports; if there is no match, it returns the default value for all output ports.
Can pass multiple output values to another transformation.

Unconnected Lookup:
Receives input values from the result of a :LKP expression in another transformation.
We can use a static cache only.
The cache includes all lookup/output ports in the lookup condition and the lookup/return port.
Does not support user-defined default values.
If there is a match for the lookup condition, the PowerCenter server returns the result of the lookup condition into the return port; if there is no match, it returns NULL.
Passes one output value to another transformation.

Number of transformations in Informatica? Around 30.

Normalizer Transformation in Informatica

It is an active transformation. It can output multiple rows from each input row, and it can transpose data (transposing columns to rows).

Transposing data using the Normalizer: let's imagine we have a table like the one below that stores the sales figures for four quarters of a year in four different columns. As you can see, each row represents one shop and the columns represent the corresponding sales. The following source rows contain four quarters of sales by store:

Source Table

Store    Quarter1   Quarter2   Quarter3   Quarter4
Shop 1   100        300        500        700
Shop 2   250        450        650        850

Our task is to generate a result set with a separate row for every quarter. We can configure a Normalizer transformation to return a separate row for each quarter, like below: the Normalizer returns a row for each shop and sales combination. It also returns an index, called the GCID (we will learn about it in detail later), that identifies the quarter number:

Target Table

Shop     Sales   Quarter
Shop 1   100     1
Shop 1   300     2
Shop 1   500     3
Shop 1   700     4
Shop 2   250     1
Shop 2   450     2
Shop 2   650     3
Shop 2   850     4
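The same transpose can be expressed in plain SQL, which is a handy way to sanity-check Normalizer output. A hedged sketch using UNION ALL over the source table above (SALES_SRC is an assumed name for that source):

    SELECT Store AS Shop, Quarter1 AS Sales, 1 AS Quarter FROM SALES_SRC
    UNION ALL
    SELECT Store, Quarter2, 2 FROM SALES_SRC
    UNION ALL
    SELECT Store, Quarter3, 3 FROM SALES_SRC
    UNION ALL
    SELECT Store, Quarter4, 4 FROM SALES_SRC
    ORDER BY Shop, Quarter;   -- reproduces the target table row for row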
Persistent cache

Lookups are cached by default in Informatica: by default, Informatica brings the entire data of the lookup table from the database server to the Informatica server as part of the lookup cache building activity during the session run, and deletes that cache when the session completes. With a persistent cache, the Integration Service saves the lookup cache files after a successful session run. If by mistake the persistent cache gets deleted, the Integration Service generates the cache file again the next time the session runs.

A persistent cache is required when the lookup table size is huge and the same lookup table is used in different mappings. If the incoming source data volume is high, the lookup table's data volume that needs to be cached is also high, and each time we run the session it has to rebuild the cache; the time for building the cache depends on the size of the lookup table (a huge table requires more time, a small one less). To avoid this performance degradation, the solution is a Named Persistent Cache: build the cache once and use the same cache wherever it is required.

In the first mapping, we create the Named Persistent Cache file by setting three properties in the Properties tab of the Lookup transformation:

Lookup cache persistent: to be checked, i.e. a Named Persistent Cache will be used.
Cache File Name Prefix: user_defined_cache_file_name, i.e. the Named Persistent Cache file name that will be used in all the other mappings using the same lookup table. Enter the prefix name only; do not enter .idx or .dat.
Re-cache from lookup source: to be checked, i.e. the Named Persistent Cache file will be rebuilt or refreshed with the current data of the lookup table.

Next, in all the mappings where we want to use the same already-built Named Persistent Cache, we set two properties in the Properties tab of the Lookup transformation:

Lookup cache persistent: to be checked, i.e. the lookup will use a Named Persistent Cache that is already saved in the cache directory. If the cache file is not there, the session will not fail; it will just create the cache file instead.
Cache File Name Prefix: user_defined_cache_file_name, i.e. the Named Persistent Cache file name that was defined in the mapping where the persistent cache file was created.

Note: if there is any lookup SQL override, the SQL statement in all the lookups should match exactly; even an extra blank space will fail a session that uses the already-built persistent cache file. If the data in the lookup table changes, the persistent lookup cache needs to be refreshed.

Interview questions:

1. Why should we go for a Stored Procedure transformation in Informatica — why can't we build the same logic in Informatica itself?
Yes, we can build most logic in Informatica as well, but there are situations where we should go with a Stored Procedure transformation: if complex logic has to be built and we try to build it through Informatica, we end up keeping a large number of transformations in the mapping, which makes it more complex to understand.

2. What are the main properties of the Joiner transformation, and how does the join work?
It joins more than one table. The table with fewer records should be the master table and the table with more records the detail table; Informatica automatically keeps the table with fewer records as the master table.

3. Connected or unconnected lookup — which is better performance-wise?
Unconnected is better than connected. If you are calling the lookup transformation multiple times in the same mapping, go with unconnected: the unconnected lookup builds the cache once, so it performs better than a connected transformation.

4. What are the functionalities we can do in the Source Qualifier transformation?
Filtering of records if the source is an RDBMS; joins of tables; removal of duplicate records through the Distinct option; ordering of the records.

5. What will be your approach when a workflow runs for many hours?
Check which session is taking a long time and which step is taking more time: building the cache, reading data from the source, or loading data into the target. Compare today's source count with the regular source count. Check whether there is any database lock on the source or target table.

6. How do you handle duplicate records to be loaded into the target?
If the source system is an RDBMS, we can check the DISTINCT option of the Source Qualifier on the source table and load the data into the target. If the source system is a flat file, we can use a Sorter transformation to eliminate the duplicate records.

7. A workflow failed: out of 200 records, 100 records got loaded. How do you re-run the workflow?
If there is a lookup on the target that checks whether a record is new (insert it) or existing (update it), we can re-run the workflow from the start itself. If the data is already completely loaded into the target, re-running the session is not required.

8. How do you update the target table without using an Update Strategy transformation?
Yes, we can update the target table without using an Update Strategy transformation. For this, we need to define the key in the target table at the Informatica level and then connect the key and the field we want to update in the mapping target. At the session level, we should set the target property to "Update as Update" and check the "Update" check-box in the session properties.

9. For SCD Type 2, is a lookup required?
Yes, it is required, because each time we run the session it has to check whether the record already exists, and for that it has to build the cache. While building the cache, we need to check whether we have enough space on the server. If the lookup table has huge data and is used in many mappings, building the cache on every run degrades performance; in this scenario we should go for a persistent cache — build the cache once and use the same, already created, persistent named cache whenever required.

10. What is the difference between a persistent cache and a non-persistent cache? What if the persistent cache gets deleted by mistake?
A persistent cache is not deleted even after the session has completed, whereas a non-persistent cache is. If the persistent cache gets deleted by mistake, the Integration Service generates the cache file again the next time the session runs.

11. Which Informatica version have you worked with?
Informatica 9.1.

12. Which scheduler have you used?
The Informatica scheduler.

13. How do you load data into the dimension and fact tables?
First we load the data from the source to the stage tables, then from stage to the dimension tables, and then from stage to the fact table.

14. Which schema does your project follow?
Star schema.

15. How many mappings have you created in the project?
Around 40 to 50.

16. How many fact and dimension tables are in your project?
Fact tables: 1. Dimension tables: 6.

17. How many mapplets have you created in your project?
Around 10.

18. List of errors faced in the project:

Issue: "Repository Error ([REP_55101] Connection to the Repository Service [INF_EN2_REP] is broken.)" / "Error updating sequence generator [SEQTRANS]."
Cause: the sequence generator value is not updated.
Ticket to other teams: raise an SC with the GAMS team.
Solution: the sequence generator value should be updated with max(primary key) in that mapping.

Issue: ERROR 22/05/2010 12:11:26 p.m., node NODE_INF_EN2, thread READER_1_1_1, message code RR_4035 — "SQL Error [ORA-01652: unable to extend temp segment by 128 in tablespace TEMP]... Function Name: Execute"
Cause: low table space.
Ticket to other teams: raise an SC with the DB team to increase the table space.

Issue: "CMN_1022 [Database driver error... Function Name: Logon ORA-01017: invalid username/password; logon denied... Function Name: Connect Database Error: Failed to connect to database using user [csdwh_inf] and connection string [EUERDWP1.AE.GE.COM]"
Cause: invalid login details for this user.
Ticket to other teams: raise an SC with the DB team to check whether there were any recent password changes for this user.
Issue: "SQL Error [ORA-01722: invalid number]"
Cause: this error occurs when invalid data comes from the ERP.
Ticket to other teams: raise an SC with the GAMS team.
Solution: re-run the session once the issue is fixed.

Issue: "ORA-01089: immediate shutdown in progress — no operations are permitted"
Cause: network issue.
Solution: re-run the session.

Issue: "Error connecting to database [Database driver error... Function Name: Logon ORA-12541: TNS: no listener... Function Name: Connect Database Error: Failed to connect to database using user [MKT_APP] and connection string [GPSOR252]]"
Cause: the DB was changed to etpsblp2 from GPSOR252.
Ticket to other teams: raise an SC with the ERP DB team; the DB team has to add the TNS entry on the new server, and the connection string should be changed to etpsblp2.

Issue: "SQL Error [ORA-00942: table or view does not exist"
Cause: the GEPSREADONLY user was replaced with GEPSCSIDW.
Ticket to other teams: a work order should be raised with the ERP DB team to get access to the tables for the GEPSCSIDW user.

Issue: "SQL Error [ORA-02019: connection description for remote database not found"
Cause: the GEPSCSIDW user doesn't have access to the P76ORP99R DB link.
Ticket to other teams: raise a service call with the ERP DB team to get access to the DB link for the GEPSCSIDW user.

Issue: "Database driver error... Function Name: Logon ORA-03113: end-of-file on communication channel... Function Name: Connect Database Error: Failed to connect to database using user [GEPSREADONLY] and connection string [gpsesp76]"
Cause: DB is down.
Solution: check with the DB team whether the DB is fine, then re-run the session.

Issue: "FATAL ERROR: Failed to allocate memory (out of virtual memory)."
Cause: low space on the Informatica server.
Solution: re-run the session.

Issue: "FATAL ERROR: Aborting the DTM process due to fatal signal/exception."
Cause: DB issue.
Solution: re-run the session.

Issue: Transformation Evaluation Error [<<Expression Error>> [TO_DATE]: invalid string for converting to Date TO_DATE(s:LTRIM(s:RTRIM(s:'10.09am', s:' '), s:' '), s:'MM-DD-YYYY HH12:MI:SS AM')]
Cause: issue with the mapping (the incoming string does not match the date format).

Issue: unique constraint violated error.
Cause: the sequence generator value is not updated.
Ticket to other teams: raise an SC with the GAMS team.
Solution: the sequence generator value must be updated with max(primary key) in that mapping.

Issue: "Database driver error... CMN_1022 [Database driver error... Function Name: Logon ORA-12541: TNS: no listener... Function Name: Connect Database Error: Failed to connect to database using user [epsuser] and connection string [atlorp38]]"
Cause: TNS entries changed on the server.
Ticket to other teams: raise an SC with the ERP DB team to get the TNS details.
Solution: check with the DB team, and if the instance is up, re-run the session.

Issue: "Error connecting to database [Database driver error... Function Name: Logon ORA-12537: TNS: connection closed... Function Name: Connect Database Error: Failed to connect to database using user [gepsreadonly] and connection string [gpsescp1]]"
Cause: DB is down.

Issue: "Error opening file [/dba96/d031/staging/SrcFiles/CSDWH01/INFA/CSDWH_TC_REQUEST_30Jun2008_28sep2008.dat]. Operating system error message [No such file or directory]."
Cause: the file is not available on the server.
Ticket to other teams: raise an SC with the GAMS team.

Issue: "Database driver error... Function Name: Logon ORA-12545: Connect failed because target host or object does not exist... Function Name: Connect Database Error: Failed to connect to database using user [epsuser] and connection string [ATLORP38]"
Cause: change in the login details.
Ticket to other teams: raise an SC with the ERP DB team for the login details.

Issue: Writer initialization failed [Error 13 opening session bad (reject) file [/dba96/d031/staging/Prod/BadFiles/csdwh_etech_doc_t1.bad]] [Error 13 opening session bad (reject) file [/dba96/d031/staging/Prod/BadFiles/tgt_csdwh_etech_milestone_d_upd1.bad]]
Cause: the bad-file path in the corresponding session is not correct.

Project Architecture: Source Systems -> Staging Layer -> Core Database

What kind of enhancements have you worked on?

I have worked on many enhancements as per business requirements. In some enhancements the sources were flat files (.dat and .csv files): we develop a mapping to load the data from the source systems to the staging environment after performing all field-level validations, while all file-level validations are done through UNIX shell scripting. From stage to the base tables, we call a PL/SQL procedure to load the data. In a few enhancements our source was a database: we fetch the data from the table, generate files, and send them to the integration team.

SESSSTARTTIME vs SYSDATE:

SESSSTARTTIME returns the current date and time value on the node that runs the session at the moment the Integration Service initializes the session. You can reference SESSSTARTTIME only within the expression language; use it in a mapping or a mapplet.
SYSDATE returns the current date and time, up to seconds, on the node that runs the session, for each row passing through the transformation. To capture a static system date, use the SESSSTARTTIME variable instead of SYSDATE.

What is the most complex mapping you have developed?

There were a few mappings which I feel were quite complex. I used around 10 different transformations in the mapping: Source Qualifier, Expression, Lookup, Stored Procedure, Filter, Update Strategy, Router, and Union. I used a lookup SQL override to fetch data by querying multiple tables: for example, we get the country code, then we do a lookup on the PITSTBT_CNTRY master table and get the country name. A calendar date arrives as a string, and we convert it to a date/time value using the TO_DATE function.

Field-level validations include:

Trimming spaces with the LTRIM and RTRIM functions
Checking whether a value is NULL with the ISNULL function
Checking whether a value contains only spaces with the IS_SPACES function
Checking whether the length is greater than a specified limit with the LENGTH function
Checking whether a value is a number with the IS_NUMBER function (NOT IS_NUMBER catches invalid numbers)
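For intuition, the same field-level checks have close analogues in generic SQL, sketched below against a hypothetical STG_CUSTOMER staging table (the table, columns, and the length limit of 10 are illustrative assumptions, not the project's actual rules):

    SELECT
        TRIM(cust_name)                                    AS cust_name_trimmed,  -- LTRIM/RTRIM
        CASE WHEN cust_name IS NULL THEN 'Y' ELSE 'N' END  AS is_null_flag,       -- ISNULL
        CASE WHEN LENGTH(cust_code) > 10 THEN 'Y'
             ELSE 'N' END                                  AS too_long_flag       -- LENGTH check
    FROM STG_CUSTOMER;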
Pushdown Optimization

When you run a session configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends those SQL queries to the database. Consider a mapping containing transformation logic that can be pushed to the source database: the mapping contains an Expression transformation that creates an item ID based on the store number 5419 and the item ID from the source. It concatenates the store number 5419, an underscore, and the original item ID to get the new item ID. To push the transformation logic to the database, the Integration Service generates the following SQL statement:

    INSERT INTO T_ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC)
    SELECT CAST((CASE WHEN 5419 IS NULL THEN '' ELSE 5419 END) + '_' +
                (CASE WHEN ITEMS.ITEM_ID IS NULL THEN '' ELSE ITEMS.ITEM_ID END) AS INTEGER),
           ITEMS.ITEM_NAME,
           ITEMS.ITEM_DESC
    FROM ITEMS2 ITEMS

The Integration Service generates an INSERT SELECT statement to retrieve the ID, name, and description values from the source table, create the new item IDs, and insert the values into the ITEM_ID, ITEM_NAME, and ITEM_DESC columns in the target table.

Pushdown Optimization Types

You can configure the following types of pushdown optimization:

Source-side pushdown optimization: the Integration Service pushes as much transformation logic as possible to the source database.
Target-side pushdown optimization: the Integration Service pushes as much transformation logic as possible to the target database.
Full pushdown optimization: the Integration Service attempts to push all transformation logic to the source and target databases. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.

Running Source-Side Pushdown Optimization Sessions

When you run a session configured for source-side pushdown optimization, the Integration Service analyzes the mapping from the source to the target, or until it reaches a downstream transformation it cannot push to the source database. It generates and executes a SELECT statement based on the transformation logic for each transformation it can push to the database. Then it reads the results of this SQL query and processes the remaining transformations.
Running Target-Side Pushdown Optimization Sessions

When you run a session configured for target-side pushdown optimization, the Integration Service analyzes the mapping from the target to the source. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the target database. The Integration Service processes the transformation logic up to the point where it can push the logic to the database, and then executes the generated SQL on the target database.

Running Full Pushdown Optimization Sessions

To use full pushdown optimization, the source and target databases must be in the same relational database management system. When you run a session configured for full pushdown optimization, the Integration Service analyzes the mapping from the source to the target, or until it reaches a downstream transformation it cannot push to the target database. It generates and executes an INSERT SELECT, DELETE, or UPDATE statement for each database to which it pushes transformation logic. When you configure a session for full optimization and the Integration Service cannot push all transformation logic to the source or target, it processes the intermediate transformations that it cannot push to any database itself and pushes the remaining transformation logic to the source and target databases. The Integration Service does not fail the session if it can push only part of the transformation logic to the database.

For example, a mapping contains a Source Qualifier, an Aggregator, a Rank transformation, an Expression transformation, and a target. The Rank transformation cannot be pushed to the source or target database. If you configure the session for full pushdown optimization, the Integration Service pushes the Source Qualifier transformation and the Aggregator transformation to the source, processes the Rank transformation itself, and pushes the Expression transformation and the target to the target database.

What are the types of loading in Informatica?

There are mainly two types of loading in Informatica:

1. Normal load: records are loaded into the target table one by one, and a database log is generated for each, so it takes more time.
2. Bulk load: a batch of records is loaded into the target table at once, bypassing the database log, so it takes less time.

Difference between Full Load and Incremental Load?

In a full load (also called a one-time load or history load), the complete data from the source table is loaded into the target table in a single run: it truncates all rows and loads from scratch, and it usually contains historical data. By full load we mean that all the data in the source table(s) is processed. Once the historical data is loaded, we keep doing incremental loads to process the data that arrives after the one-time load.

In an incremental load, only the difference between the target and source data is loaded, at regular intervals. The timestamp of the previous load has to be maintained, and the load takes less time.
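A hedged sketch of the usual incremental-load filter, assuming a hypothetical ORDERS source with a LAST_UPDATE_TS audit column and a control table that stores the previous load's timestamp (all names are illustrative):

    -- Read only the rows that changed since the previous load
    SELECT o.*
    FROM ORDERS o
    WHERE o.LAST_UPDATE_TS >
          (SELECT MAX(LAST_LOAD_TS)
           FROM ETL_LOAD_CONTROL
           WHERE JOB_NAME = 'ORDERS_INCR');

    -- After a successful run, advance the control row
    UPDATE ETL_LOAD_CONTROL
    SET LAST_LOAD_TS = CURRENT_TIMESTAMP
    WHERE JOB_NAME = 'ORDERS_INCR';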
PMCMD Command

pmcmd is a command-line utility provided by Informatica to perform the following tasks: start workflows, start a workflow from a specific task, stop or abort workflows and sessions, and schedule workflows.

How to use the pmcmd command in Informatica:

1. Start a workflow. The following pmcmd command starts the specified workflow:
pmcmd startworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

2. Stop a workflow. The pmcmd command to stop an Informatica workflow is shown below:
pmcmd stopworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

3. Start a workflow from a task. You can start the workflow from a specified task, as shown below:
pmcmd starttask -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name -startfrom task-name

4. Stop a task. The following pmcmd command stops the specified task instance:
pmcmd stoptask -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name task-name

5. Abort a workflow or task. The following pmcmd commands abort a workflow or a task in a workflow:
pmcmd abortworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name
pmcmd aborttask -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name task-name

6. Schedule a workflow. The pmcmd syntax for scheduling a workflow is shown below. You cannot specify the scheduling options here; this command just schedules the workflow for its next run:
pmcmd scheduleworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

Tried with the command below:
pmcmd startworkflow -service Alpha_Dev_UTF8_IS -d Capital_Americas_Development -u 502197909 -p D502197909 -f CFDW_LOAN_LAYER -wait wflow_test_1001

Partitioning in Informatica

Partitioning is used to improve performance in Informatica. It is done at the session level: by adding partitions to the pipeline, you use more of the system hardware and achieve performance through parallel data processing. A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier. When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.

Partitioning attributes:

1. Partition points. By default, the Integration Service sets partition points at various transformations in the pipeline.

2. Number of partitions. We can define up to 64 partitions at any partition point in a pipeline. When we increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline. The number of partitions we create equals the number of connections to the source or target: for one partition, one database connection is used. Increasing the number of partitions or partition points increases the number of threads.

3. Partition types. The Integration Service creates a default partition type at each partition point. If we have the Partitioning option (which is purchased separately), we can change the partition type. The partition type controls how the Integration Service distributes data among partitions at partition points.

Types of partition:

Database partitioning: the Integration Service queries the Oracle database system for table partition information and reads the partitioned data from the corresponding nodes in the database.
Hash auto-keys: the Integration Service uses a hash function to group rows of data among partitions.
Hash user keys: the Integration Service uses a hash function to group rows of data among partitions based on a partition key; you define the number of ports that generate the partition key.
Key range: with key range partitioning, the Integration Service distributes rows of data based on a port or set of ports that you define as the partition key. For each port you define a range of values, and the Integration Service uses the key and ranges to send rows to the appropriate partition. Use key range partitioning when the sources or targets in the pipeline are partitioned by key range.
Pass-through: in pass-through partitioning, the Integration Service processes data without redistributing rows among partitions; all rows in a single partition stay in that partition after crossing a pass-through partition point. Choose pass-through partitioning when you want to create an additional pipeline stage to improve performance but do not want to change the distribution of data across partitions.
Round-robin: the Integration Service distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.
New Features in Informatica 9.1

Database deadlock resilience: this ensures that your session does not immediately fail if it encounters a database deadlock. If a deadlock occurs, Informatica now retries the operation, and you can configure the number of retry attempts.

Multiple rows return: Lookups can now be configured as an active transformation to return multiple rows. We can configure the Lookup transformation to return all rows that match a lookup condition.

Limit the session log: you can limit the size of session logs for real-time sessions.

SQL transformation passive mode: we can configure the SQL transformation to run in passive mode instead of active mode. When the SQL transformation runs in passive mode, it returns one output row for each input row.

Important notes on the above features:

Database Deadlock Resilience: when a database deadlock error occurs, the session does not fail. The Integration Service attempts to re-execute the last statement for a specified retry period, waits for a delay period between each retry attempt, and logs a message in the session log whenever it retries a statement. If all attempts fail due to deadlock, the session fails. Configure the following Integration Service properties (you can override these values at the session level as custom properties):

NumOfDeadlockRetries: the number of times the PowerCenter Integration Service retries a target write on a database deadlock. Minimum is 0, default is 10. If you want the session to fail on deadlock, set NumOfDeadlockRetries to zero.
DeadlockSleep: the number of seconds the PowerCenter Integration Service waits before retrying a target write on a database deadlock.

DTM Process

DTM means Data Transformation Manager. In Informatica, this is the main background process, which runs after the Load Manager completes. When the PowerCenter server runs a workflow, it initializes the Load Manager, and the Load Manager is responsible for the tasks below:

1. Locks the workflow and reads the workflow properties.
2. Reads the parameter file and expands the workflow variables.
3. Creates the workflow log file.
4. Distributes the session to worker servers.
5. Starts the session and creates the DTM process.

Once the DTM process starts, it performs the tasks below:

1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates the source and target code pages.
5. Verifies the connection object permissions.
6. Runs pre- and post-session commands.
7. Creates the three threads — the reader thread, the transformation thread, and the writer thread — to extract, transform, and load the data.

Expression Transformation

It is a passive and connected transformation. It is used to calculate values on a single row. Examples of calculations are concatenating the first and last name, or adjusting employee salaries. An Expression transformation can also be used to test conditional statements before passing the data to other transformations.
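The row-level calculations an Expression transformation performs map naturally onto computed columns in SQL. A hedged sketch against a hypothetical EMP table (names and the 10% adjustment are illustrative assumptions):

    SELECT
        first_name || ' ' || last_name                      AS full_name,        -- concatenation
        salary * 1.10                                       AS adjusted_salary,  -- salary adjustment
        CASE WHEN salary > 5000 THEN 'HIGH' ELSE 'LOW' END  AS salary_band       -- conditional test
    FROM EMP;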
Sorter Transformation

It is an active and connected transformation. It is used to sort data, from relational or flat file sources, in ascending or descending order. The Sorter transformation can also be used for case-sensitive sorting and to specify whether the output rows should be distinct or not.

Performance improvement tip: use the Sorter transformation before the Aggregator and Joiner transformations and sort the data for better performance.

Filter Transformation

It is an active and connected transformation. It is used to filter out rows in the mapping.

Note: the input ports to the Filter transformation must come from a single transformation; you cannot connect ports from more than one transformation to the filter.

Performance tuning tips:

Use the Filter transformation as close as possible to the sources in the mapping. This reduces the number of rows to be processed in the downstream transformations.
If possible, use the Source Qualifier transformation to filter the rows instead. This reduces the number of rows to be read from the source.

Aggregator Transformation

It is an active and connected transformation. It is used to perform calculations such as sums, averages, and counts on groups of data.

Configuring the Aggregator transformation — you can configure the following components:

Aggregate cache: the Integration Service stores the group values in the index cache and the row data in the data cache.
Aggregate expression: you can enter expressions in an output port or a variable port.
Group by port: this tells the Integration Service how to create groups. You can configure input, input/output, or variable ports for the group.
Sorted input: this option can be used to improve session performance. You can use it only when the input to the Aggregator transformation is sorted on the group by ports.

Sorted input: you can improve the performance of the Aggregator transformation by specifying sorted input. The Integration Service then assumes all the data is sorted by group and performs aggregate calculations as it reads the rows for a group. If you specify the sorted input option without actually sorting the data, the Integration Service fails the session.

Incremental aggregation: after you create a session that includes an Aggregator transformation, you can enable the Incremental Aggregation session option. When the Integration Service performs incremental aggregation, it passes the source data through the mapping and uses historical cache data to perform the aggregation calculations incrementally.
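The Aggregator's group-by behavior corresponds to a SQL GROUP BY, and the sorted-input optimization amounts to feeding it pre-ordered data. A hedged sketch against a hypothetical SALES table (names are illustrative assumptions):

    -- The aggregation the transformation performs, expressed in SQL:
    SELECT dept_id,
           SUM(amount) AS total_amount,
           AVG(amount) AS avg_amount,
           COUNT(*)    AS row_count
    FROM SALES
    GROUP BY dept_id;

    -- The Sorted Input option corresponds to the source handing over
    -- data already ordered by the group-by ports:
    SELECT dept_id, amount
    FROM SALES
    ORDER BY dept_id;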
"Stored Procedure Type" is used to specify when the stored procedure runs. Stored Procedure Transformation is used to run the stored procedure in the database.Stored Procedure Transformation It is passive and can be acts as connected or unconnected transformation. Check the status of a target database before loading data into it. Dropping and recreating indexes Connected Stored Procedure Transformation It is connected to other transformations in the mapping. Determine if enough space exists in a database. This is useful when running a calculation against an input port. Perform a specialized calculation. Use connected stored procedure transformation when data from an port sent as input parameters to the stored procedure transformation. GE MSAT Internal . Specifying when the Stored Procedure Runs: The property. It runs either before or after the session or is being called by an expression in other transformation in the mapping. Some of the Tasks we can perform through Stored Procedure Transformation. Useful for verifying the existence of tables or performing joins of data in a temporary table. Used for flagging the row as Update. DD_DELETE: Numeric value is 2. Used for flagging the row as Reject. Used for flagging the row as Insert. Update Strategy Transformation It is an Active and connected transformation. The constants and their numeric equivalents for each database operation are listed below. This is useful for verifying target tables or disk space on the target system. you can set the update strategy at two different levels: Session Level: Configuring at session level instructs the integration service to either treat all rows in the same way (Insert or update or delete). Post-load of the Target: Runs after loading data into the target. In the Informatica. then you have to specify a primary key in the target definition in the mapping for update strategy transformation to work. It can also reject the records without reaching target tables. Useful for removing temporary tables. Pre-load of the Target: Runs before the session sends data to the target. update and delete the records in the target table. The integration service treats any other numeric value as an insert Important Note: Update strategy works only when we have a primary key on the target table. Flagging Rows in Mapping with Update Strategy: You have to flag each row for inserting. update. Used for flagging the row as Delete. GE MSAT Internal . If there is no primary key available on the target table. DD_REJECT: Numeric value is 3. Mapping Level: Use update strategy transformation to flag rows for insert. DD_UPDATE: Numeric value is 1. It is used to insert. This is useful for re-creating indexes on the database. DD_INSERT: Numeric value is 0. updating. deleting or rejecting. delete or reject. Post-load of the Source: Runs after reading data from the source. performs a lookup and returns data to the pipeline. From Informatica 9 onwards. The lookup transformation is used to perform the following tasks: Get a Related Value: You can get a value from the lookup table based on the source value. get all the states in a country. you can use a dynamic or static cache. You can lookup values in the cache to determine if the values exist in the target. GE MSAT Internal . It is used to look up data in flat file or relational database. Get Multiple Values: You can get multiple rows from a lookup table. then you can mark the row for insert or update in the target. By default. we can get the related value like city name for the zip code value. 
Lookup Transformation
It is a passive/active transformation: from Informatica 9 onwards a lookup can return a single row or multiple rows, so it can also act as an active transformation. It is used to look up data in a flat file or a relational database, and it can be used in both connected and unconnected modes.

The Lookup transformation is used to perform the following tasks:
Get a Related Value: You can get a value from the lookup table based on the source value. As an example, we can get a related value such as the city name for a zip code value.
Get Multiple Values: You can get multiple rows from a lookup table. As an example, get all the states in a country.
Perform a Calculation: We can take a value from the lookup table and use it in calculations.
Update Slowly Changing Dimension Tables: A Lookup transformation can be used to determine whether a row already exists in the target; based on that, you can mark the row for insert or update in the target.

Connected or Unconnected Lookup
A connected lookup receives source data, performs a lookup and returns data to the pipeline. An unconnected lookup is not connected to a source, a target or any other transformation; instead, a transformation in the pipeline calls the lookup with a :LKP expression. The unconnected lookup returns one column to the calling transformation.

Cached or Un-cached Lookup
You can improve the performance of the lookup by caching the lookup source. If you cache the lookup source, you can use a dynamic or static cache. By default the lookup cache is static, and the cache does not change during the session. If you use a dynamic cache, the Integration Service inserts or updates rows in the cache. You can look up values in the cache to determine if the values exist in the target.
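A minimal sketch of calling an unconnected lookup from an Expression transformation output port (the lookup name LKP_GET_CITY and the port names are hypothetical; the lookup returns its single return port to the caller):

CITY_NAME = :LKP.LKP_GET_CITY(ZIP_CODE)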
Union Transformation
It is an active and connected transformation. It is used to merge the data from multiple pipelines into a single pipeline. It is the same as a UNION ALL in SQL: it does not remove any duplicate rows.

Why is the Union transformation active?
Union is an active transformation because it combines two or more data streams into one. Though the total number of rows passing into the Union is the same as the total number of rows passing out of it, and the sequence of rows from any given input stream is preserved in the output, the positions of the rows are not preserved, i.e. row number 1 from input stream 1 might not be row number 1 in the output stream. Union does not even guarantee that the output is repeatable.

Joiner Transformation
It is an active and connected transformation. It is used to join two heterogeneous sources. The two input pipelines include a master and a detail pipeline. To join more than two sources, you need to join the output of the joiner transformation with another source; to join n sources, we have to use n-1 joiner transformations in the mapping.

Drag the ports from the first source into the Joiner transformation; by default the Designer creates the input/output ports for those source fields as detail fields. Now drag the ports from the second source into the Joiner transformation; by default the Designer configures the second source's ports as master fields.

Join Condition
The Integration Service joins both input sources based on the join condition. The join condition contains ports from both input sources that must match. As an example, if you want to join the employees and departments tables, you specify the join condition as department_id1 = department_id, where department_id1 is the port of the departments source and department_id is the port of the employees source.

Important Notes
You can specify only the equal (=) operator between the join columns. Other operators are not allowed in the join condition.

Join Type
The Joiner transformation supports the following four types of joins:
Normal Join
Master Outer Join
Detail Outer Join
Full Outer Join

We will learn about each join type with an example. Let's say I have the following Subjects and Students tables as the sources:

Table Name: Subjects
Subject_Id  Subject_Name
------------------------
1           Maths
2           Chemistry
3           Physics

Table Name: Students
Student_Id  Subject_Id
----------------------
10          1
20          2
30          NULL

Assume that the subjects source is the master and the students source is the detail, and we will join these sources on the subject_id port.

Normal Join: The Joiner transformation outputs only the records that match the join condition and discards all the rows that do not match the join condition. The output of the normal join is:

Master Ports             | Detail Ports
Subject_Id  Subject_Name | Student_Id  Subject_Id
--------------------------------------------------
1           Maths        | 10          1
2           Chemistry    | 20          2

Master Outer Join: In a master outer join, the Joiner transformation keeps all the records from the detail source and only the matching rows from the master source. It discards the unmatched rows from the master source. The output of the master outer join is:

Master Ports             | Detail Ports
Subject_Id  Subject_Name | Student_Id  Subject_Id
--------------------------------------------------
1           Maths        | 10          1
2           Chemistry    | 20          2
NULL        NULL         | 30          NULL

Detail Outer Join: In a detail outer join, the Joiner transformation keeps all the records from the master source and only the matching rows from the detail source. It discards the unmatched rows from the detail source. The output of the detail outer join is:

Master Ports             | Detail Ports
Subject_Id  Subject_Name | Student_Id  Subject_Id
--------------------------------------------------
1           Maths        | 10          1
2           Chemistry    | 20          2
3           Physics      | NULL        NULL

Full Outer Join: The full outer join first brings the matching rows from both sources, and then it also keeps the non-matched records from both the master and detail sources. The output of the full outer join is:

Master Ports             | Detail Ports
Subject_Id  Subject_Name | Student_Id  Subject_Id
--------------------------------------------------
1           Maths        | 10          1
2           Chemistry    | 20          2
3           Physics      | NULL        NULL
NULL        NULL         | 30          NULL

Joiner Transformation Performance Improvement Tips
To improve the performance of a Joiner transformation, follow the tips below:
If possible, perform joins in a database. Performing joins in a database is faster than performing joins in a session.
You can improve the session performance by configuring the Sorted Input option in the Joiner transformation properties tab.
Specify the source with fewer rows and fewer duplicate keys as the master and the other source as the detail.
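For reference, a sketch of the SQL equivalents of the four join types on the example tables above (the joiner itself accepts only the = operator; treating the master outer join as a right outer join from the master side and the detail outer join as a left outer join matches the join list given later in this document):

-- Normal join: only rows satisfying the join condition
SELECT su.subject_id, su.subject_name, st.student_id, st.subject_id
FROM subjects su
INNER JOIN students st ON su.subject_id = st.subject_id;

-- With the same select list, swap the join clause for the other types:
--   Master outer join (all detail rows kept):
--     FROM subjects su RIGHT OUTER JOIN students st ON su.subject_id = st.subject_id
--   Detail outer join (all master rows kept):
--     FROM subjects su LEFT OUTER JOIN students st ON su.subject_id = st.subject_id
--   Full outer join (non-matched rows from both sides kept):
--     FROM subjects su FULL OUTER JOIN students st ON su.subject_id = st.subject_id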
Router Transformation
The Router transformation is an active and connected transformation. Like the Filter transformation, it is used to filter data based on conditions. In a Filter transformation, you can specify only one condition, and it drops the rows that do not satisfy that condition. In a Router transformation, you can specify more than one condition, and it provides the ability to route the data that meets each test condition.

Advantages of Using Router over Filter Transformation
Use a Router transformation to test multiple conditions on the same input data. If you use more than one Filter transformation, the Integration Service needs to process the input for each Filter transformation. In case of a Router transformation, the Integration Service processes the input data only once, thereby improving performance.

Normalizer Transformation
It is an active and connected transformation. It returns multiple rows for a source row. The Normalizer creates two new ports:
GK field: generates a sequence number, starting from the value defined in the sequence field.
GCID field: holds the column number of the occurrence field.

Transaction Control Transformation
It is an active and connected transformation. It is used to control the commit and rollback of transactions. We can define the transactions at the following levels:
Mapping Level: Use the Transaction Control transformation to define the transactions.
Session Level: Specify the Commit Type in the session Properties tab.

Use the following built-in variables in the expression editor of the Transaction Control transformation:
TC_CONTINUE_TRANSACTION: The Integration Service does not perform any transaction change for this row. This is the default value of the expression.
TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
TC_COMMIT_AFTER: The Integration Service writes the current row to the target, commits the transaction, and begins a new transaction. The current row is in the committed transaction.
TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target, rolls back the transaction, and begins a new transaction. The current row is in the rolled back transaction.
If the transaction control expression evaluates to a value other than commit, rollback or continue, the Integration Service fails the session.
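A minimal sketch of a transaction control expression (the port NEW_GROUP_FLAG is hypothetical; the idea is to commit the open transaction before each row that starts a new group, and otherwise make no transaction change):

IIF(NEW_GROUP_FLAG = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)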
SQL Transformation
The SQL transformation is a connected transformation used to process SQL queries in the midstream of a pipeline. We can insert, update, delete and retrieve rows from a database at run time using the SQL transformation. It processes external SQL scripts or SQL queries created in the SQL editor. You can also pass the database connection information to the SQL transformation as input data at run time.

Configuring SQL Transformation
The following options can be used to configure an SQL transformation:
Mode: The SQL transformation runs either in script mode or query mode.
Active/Passive: By default, the SQL transformation is an active transformation. You can configure it as a passive transformation.
Database Type: The type of database that the SQL transformation connects to.
Connection Type: You can use a connection object, or pass the database connection information to the transformation at run time.

Mapplet
A mapplet contains a set of transformations and lets us reuse that transformation logic in multiple mappings. It is a reusable object that we create in the Mapplet Designer.
Mapplet Input: We use the Mapplet Input transformation to give input to the mapplet. It is optional.
Mapplet Output: We must use the Mapplet Output transformation to store the mapplet output.

Mapping Parameters and Mapping Variables
Mapping Parameters
A mapping parameter represents a constant value that we define before running a session. It retains the same value throughout the session run.

Mapping Variables
Mapping variables are values that can change between sessions. The Integration Service saves the latest value of a mapping variable to the repository at the end of each successful session. We use a mapping variable to perform an incremental read of the source.

Variable Functions
Variable functions determine how the Integration Service calculates the current value of a mapping variable in a pipeline:
SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores rows marked for update, delete or reject. Aggregation type is set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows marked for update, delete or reject. Aggregation type is set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the variable value when a row is marked for insertion and subtracts one when the row is marked for deletion. It ignores rows marked for update or reject. Aggregation type is set to Count.
SetVariable: Sets the variable to the configured value. At the end of a session, it compares the final current value of the variable to the start value of the variable and, based on the aggregation type of the variable, saves a final value to the repository.
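A minimal sketch of the incremental-read pattern mentioned above, assuming a mapping variable $$LAST_RUN_DATE and a source column UPDATED_DATE (both names hypothetical): filter the source to read only new rows, and move the variable forward in an Expression transformation so that the value saved after each successful run becomes the next run's starting point.

Source Qualifier source filter:
  UPDATED_DATE > TO_DATE('$$LAST_RUN_DATE', 'MM/DD/YYYY HH24:MI:SS')
Expression transformation port:
  SETMAXVARIABLE($$LAST_RUN_DATE, UPDATED_DATE)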
SESSION TASK
A session is a set of instructions that tells the PowerCenter Server how and when to move data from sources to targets. To run a session, we must first create a workflow to contain the Session task. We can run as many sessions in a workflow as we need, and we can run the Session tasks sequentially or concurrently, depending on our needs.

EMAIL TASK
The Workflow Manager provides an Email task that allows us to send email during a workflow.

COMMAND TASK
The Command task allows us to specify one or more shell commands in UNIX, or DOS commands in Windows, to run during the workflow. For example, we can specify shell commands in the Command task to delete reject files, copy a file, or archive target files.

Ways of using the Command task:
1. Standalone Command task: We can use a Command task anywhere in the workflow or worklet to run shell commands.
2. Pre- and post-session shell command: We can call a Command task as the pre- or post-session shell command for a Session task. We can run it as the Pre-Session Command, the Post-Session Success Command or the Post-Session Failure Command. This is done in the COMPONENTS tab of a session; select the Value and Type options as we did in the Email task.

Example: to copy a file sample.txt from the D drive to the E drive in Windows:
Command: COPY D:\sample.txt E:\

Worklet
A worklet is a set of tasks. If a certain set of tasks has to be reused in many workflows, we use a worklet. To execute a worklet, it has to be placed inside a workflow.

Session Parameters
Session parameters represent values you might want to change between sessions, such as a DB connection or a source file. We use session parameters in a session property sheet and define their values in a session parameter file. Session parameters do not have default values; when the server cannot find a value for a session parameter, it fails to initialize the session.

The user-defined session parameters are:
(a) DB Connection
(b) Source file directory
(c) Target file directory
(d) Reject file directory

Description: Use session parameters to make sessions more flexible. For example, suppose you have the same type of transactional data written to two different databases, and you use the database connections TransDB1 and TransDB2 to connect to them. You want to use the same mapping for both tables. Instead of creating two sessions for the same mapping, you can create a database connection parameter, like $DBConnectionSource, and use it as the source database connection for the session. When you create a parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session. After it completes, set the value to TransDB2 and run the session again.

NOTE: You can use several parameters together to make session management easier.
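A sketch of the corresponding session parameter file entry (the folder, workflow and session names are hypothetical; the heading format follows the parameter file sample shown later in this document):

[Folder_Name.WF:wf_load_trans.ST:s_m_load_trans]
$DBConnectionSource=TransDB1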
Workflow Variables
Workflow variables are used to reference values and record information. Use workflow variables when you configure the following types of tasks:
1. Assignment tasks: Use an Assignment task to assign a value to a user-defined workflow variable. For example, you can increment a user-defined counter variable by setting the variable to its current value plus 1.
2. Decision tasks: Decision tasks determine how the Integration Service runs a workflow. For example, use the Status variable to run a second session only if the first session completes successfully.
3. Links: Links connect each workflow task. Use workflow variables in links to create branches in the workflow. For example, after a Decision task, you can create one link to follow when the decision condition evaluates to true, and another link to follow when it evaluates to false.
4. Timer tasks: Timer tasks specify when the Integration Service begins to run the next task in the workflow. Use a user-defined date/time variable to specify the time the Integration Service starts to run the next task.

Types of Workflow Variables
Predefined workflow variables: The Workflow Manager provides predefined workflow variables for tasks within a workflow.
User-defined workflow variables: You create user-defined workflow variables when you create a workflow.

Predefined Workflow Variables
Each workflow contains a set of predefined variables that you use to evaluate workflow and task conditions. Use the following types of predefined variables:
Task-specific variables: The Workflow Manager provides a set of task-specific variables for each task in the workflow. Use task-specific variables in a link condition to control the path the Integration Service takes when running the workflow. The Workflow Manager lists task-specific variables under the task name in the Expression Editor.
Built-in variables: Use built-in variables in a workflow to return run-time or system information such as the folder name, Integration Service name, system date, or workflow start time. The Workflow Manager lists built-in variables under the Built-in node in the Expression Editor.

Task-Specific Variables
Condition (Decision tasks; Integer): Evaluation result of the decision condition expression. If the task fails, the Workflow Manager keeps the condition set to null. Sample syntax: $Dec_TaskStatus.Condition = <TRUE | FALSE | NULL | any integer>
EndTime (All tasks; Date/Time): Date and time the associated task ended. Precision is to the second. Sample syntax: $s_item_summary.EndTime > TO_DATE('11/10/2004 08:13:25')
ErrorCode (All tasks; Integer): Last error code for the associated task. If there is no error, the Integration Service sets ErrorCode to 0 when the task completes. Sample syntax: $s_item_summary.ErrorCode = 24013. Note: You might use this variable when a task consistently fails with this final error message.
ErrorMsg (All tasks; Nstring): Last error message for the associated task. If there is no error, the Integration Service sets ErrorMsg to an empty string when the task completes. Variables of type Nstring can have a maximum length of 600 characters. Sample syntax: $s_item_summary.ErrorMsg = 'PETL_24013 Session run completed with failure'. Note: You might use this variable when a task consistently fails with this final error message.
FirstErrorCode (Session; Integer): Error code for the first error message in the session. If there is no error, the Integration Service sets FirstErrorCode to 0 when the session completes. Sample syntax: $s_item_summary.FirstErrorCode = 7086
FirstErrorMsg (Session; Nstring): First error message in the session. If there is no error, the Integration Service sets FirstErrorMsg to an empty string when the task completes. Variables of type Nstring can have a maximum length of 600 characters. Sample syntax: $s_item_summary.FirstErrorMsg = 'TE_7086 Tscrubber: Debug info… Failed to evalWrapUp'
PrevTaskStatus (All tasks; Integer): Status of the previous task in the workflow that the Integration Service ran. Statuses include ABORTED, FAILED, STOPPED and SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the previous task. Sample syntax: $Dec_TaskStatus.PrevTaskStatus = FAILED
SrcFailedRows (Session; Integer): Total number of rows the Integration Service failed to read from the source. Sample syntax: $s_dist_loc.SrcFailedRows = 0
SrcSuccessRows (Session; Integer): Total number of rows successfully read from the sources. Sample syntax: $s_dist_loc.SrcSuccessRows > 2500
StartTime (All tasks; Date/Time): Date and time the associated task started. Precision is to the second. Sample syntax: $s_item_summary.StartTime > TO_DATE('11/10/2004 08:13:25')
Status (All tasks; Integer): Status of the associated task. Statuses include ABORTED, DISABLED, FAILED, NOTSTARTED, STARTED, STOPPED and SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the current task. Sample syntax: $s_dist_loc.Status = SUCCEEDED
TgtFailedRows (Session; Integer): Total number of rows the Integration Service failed to write to the target. Sample syntax: $s_dist_loc.TgtFailedRows = 0
TgtSuccessRows (Session; Integer): Total number of rows successfully written to the target. Sample syntax: $s_dist_loc.TgtSuccessRows > 0
TotalTransErrors (Session; Integer): Total number of transformation errors. Sample syntax: $s_dist_loc.TotalTransErrors = 5
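Putting these together, a link that should be followed only when a session succeeded and every row reached the target could use a condition like the sketch below (the session name is taken from the samples above):

$s_dist_loc.Status = SUCCEEDED AND $s_dist_loc.TgtFailedRows = 0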
User-Defined Workflow Variables
We can create variables within a workflow. When you create a variable in a workflow, it is valid only in that workflow. Use user-defined variables when you need to make a workflow decision based on criteria you specify.

WORKING WITH EVENT TASKS
We can define events in the workflow to specify the sequence of task execution.

Types of events:
Pre-defined event: A pre-defined event is a file-watch event. It waits for a specified file to arrive at a given location.
User-defined event: A user-defined event is a sequence of tasks in the workflow. We create events and then raise them as per need.

Steps for creating a user-defined event:
1. Open any workflow where we want to create an event.
2. Click Workflow -> Edit -> Events tab.
3. Click the Add button to add events and give them names as per need.
4. Click Apply -> Ok, then validate the workflow and save it.

Types of Event tasks:
EVENT RAISE: The Event-Raise task represents a user-defined event. We use this task to raise a user-defined event; it notifies the Event-Wait task that the event has occurred.
EVENT WAIT: The Event-Wait task waits for a file-watcher event or a user-defined event to occur before executing the next task in the workflow.
Example: Use an Event-Wait task to make sure that session s_filter_example runs only when the file abc.txt is present in the D:\FILES folder.

TIMER TASK
The Timer task allows us to specify the period of time to wait before the PowerCenter Server runs the next task in the workflow. The next task in the workflow runs as per the date and time specified. The Timer task has two types of settings:
Absolute time: We specify the exact date and time, or we choose a user-defined workflow variable to specify the exact time.
Relative time: We instruct the PowerCenter Server to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts.
Example: Run session s_m_filter_example 1 minute after the Timer task.

DECISION TASK
The Decision task allows us to enter a condition that determines the execution of the workflow, similar to a link condition. The Decision task has a predefined variable called $Decision_task_name.Condition that represents the result of the decision condition. The PowerCenter Server evaluates the condition in the Decision task and sets the predefined condition variable to True (1) or False (0). We can specify one decision condition per Decision task.
Example: A Command task should run only if either s_m_filter_example or S_M_TOTAL_SAL_EXAMPLE succeeds. If either of them fails, then S_m_sample_mapping_EMP should run instead.
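A sketch of how this example can be wired up (task names come from the example above; the variable syntax follows the predefined-variable samples shown earlier):

Decision task condition:
  $s_m_filter_example.Status = SUCCEEDED OR $S_M_TOTAL_SAL_EXAMPLE.Status = SUCCEEDED
Link from the Decision task to the Command task:
  $Dec_TaskStatus.Condition = 1
Link from the Decision task to S_m_sample_mapping_EMP:
  $Dec_TaskStatus.Condition = 0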
CONTROL TASK
We can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow based on an input link condition. A parent workflow or worklet is the workflow or worklet that contains the Control task. We give the condition on the link connected to the Control task.

ASSIGNMENT TASK
The Assignment task allows us to assign a value to a user-defined workflow variable.

Parameter File
A parameter file provides us with the flexibility to change parameter and variable values every time we run a session or workflow. A parameter file contains a list of parameters and variables with their assigned values. Each heading section identifies the Integration Service, folder, workflow, worklet, or session to which the parameters or variables apply, for example:

[Global]
[email protected]
[Folder_Name.WF:Workflow_Name.WT:Worklet_Name.ST:Session_Name]
$$LOAD_SRC=SAP
$$DOJ=01/01/2011 00:00:01
[Session_Name]
$PMBadFileDir=<null>
$PMCacheDir=

To assign a null value, set the parameter or variable value to <null> or simply leave the value blank, as with $PMBadFileDir and $PMCacheDir above.

Difference between Mapping Parameters and Variables
A mapping parameter represents a constant value that we can define before running a session. A mapping parameter retains the same value throughout the entire session. If we want to change the value of a mapping parameter between session runs, we need to update the parameter file.
A mapping variable represents a value that can change through the session. The Integration Service saves the value of a mapping variable to the repository at the end of each successful session run and uses that value the next time we run the session. Variable functions like SetMaxVariable, SetMinVariable, SetVariable and SetCountVariable are used in the mapping to change the value of the variable. At the beginning of a session, the Integration Service evaluates references to a variable to determine the start value. At the end of a successful session, it saves the final value of the variable to the repository. The next time we run the session, the Integration Service evaluates references to the variable using the saved value. To override the saved value, define the start value of the variable in the parameter file.

Constraint Based Loading
Constraint based load ordering is used to load the data first into a parent table and then into the child tables. You can specify the constraint based load ordering option in the Config Object tab of the session. For every row generated by the active source, the Integration Service first loads the row into the primary key table and then into the foreign key tables. Constraint based loading is helpful to normalize data from a denormalized source.

The constraint based load ordering option applies only to insert operations; you cannot update or delete rows using it. You have to define the primary key and foreign key relationships for the targets in the Target Designer, and the target tables must be in the same target connection group.

Complete Constraint Based Load Ordering
There is a workaround to do updates and deletes using constraint based load ordering. Informatica PowerCenter provides an option called complete constraint-based loading for inserts, updates and deletes in the target tables. To enable it, specify FullCBLOSupport=Yes in the Custom Properties attribute on the Config Object tab of the session.

If you don't check the constraint based load ordering option, the workflow will still succeed in two cases:
1. When there is no primary key constraint on the departments table.
2. When you have only unique values of department_id in the source.
If you have a primary key and foreign key relationship between the tables, then you always have to insert a record into the parent table (departments) first and then into the child table (employees). Constraint based load ordering takes care of this.
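In plain SQL terms, the ordering requirement looks like the sketch below (the table and column names follow the departments/employees example; the values are made up):

INSERT INTO departments (department_id, department_name) VALUES (10, 'Sales');
-- Only after the parent row exists can the child row be inserted without violating the foreign key:
INSERT INTO employees (employee_id, employee_name, department_id) VALUES (101, 'Smith', 10);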
Target Load Order
Target load order is used to specify the order in which the Integration Service loads the targets. If you have multiple Source Qualifier transformations connected to multiple targets, you can specify the order in which the Integration Service loads data into the targets.

Target Load Order Group: A target load order group is the collection of source qualifiers, transformations and targets linked together in a mapping. (The figure in the original document shows a single mapping containing two target load order groups.)

Use of Target Load Order: Target load order is useful when the data of one target depends on the data of another target. For example, the employees table data depends on the departments data because of the primary-key and foreign-key relationship, so the departments table should be loaded first and then the employees table. Target load order is useful when you want to maintain referential integrity while inserting, deleting or updating tables that have primary key and foreign key constraints.

Incremental Aggregation in Informatica
Incremental aggregation is the process of capturing the changes in the source and calculating the aggregations in a session. This makes the Integration Service update the target incrementally and avoids recalculating the aggregations on the entire source. Consider the sales table below as an example and see how incremental aggregation works. For simplicity, only the year and price columns of the sales table are used; we need to aggregate and find the total price in each year.

Source:
YEAR PRICE
----------
2010 100
2010 200
2010 300
2011 500
2011 600
2012 700

When you run the session for the first time using incremental aggregation, the Integration Service processes the entire source and stores the data in two files, an index file and a data file. The Integration Service creates the files in the cache directory specified in the Aggregator transformation properties. After the aggregation, the target table will have the data below:

Target:
YEAR PRICE
----------
2010 600
2011 1100
2012 700
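For reference, the aggregation computed in this example is equivalent to the SQL below:

SELECT year, SUM(price) AS price
FROM sales
GROUP BY year;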
Now assume that the next day a few more rows are added to the source table:

Source:
YEAR PRICE
----------
2010 100
2010 200
2010 300
2011 500
2011 600
2012 700
2010 400
2011 100
2012 200
2013 800

For the second run, the source should contain only the last four records: with incremental aggregation you have to pass only the new data changes. Incremental aggregation uses the data stored in the cache and calculates the aggregation. Once the aggregation is done, the Integration Service writes the changes to the target and the cache. The target table will then contain the data below:

Target:
YEAR PRICE
----------
2010 1000
2011 1200
2012 900
2013 800

Points to remember
1. When you use incremental aggregation, the first time you run the session with the complete source data; in subsequent runs you pass only the changes in the source data.
2. Use incremental aggregation only if the target is not going to change significantly. If the incremental aggregation process changes more than half of the data in the target, the session performance may not benefit; in that case, go for normal aggregation.

Note: The Integration Service creates a new aggregate cache when:
A new version of the mapping is saved
You configure the session to reinitialize the aggregate cache
You move or delete the aggregate cache files
You decrease the number of partitions

Configuring the mapping for incremental aggregation: Before enabling the incremental aggregation option, make sure that you capture the changes in the source data. You can use a Lookup transformation or a Stored Procedure transformation to remove the data which is already processed. You can also create a trigger on the source database and read only the source changes in the mapping.

Performance Tuning
The performance tuning process identifies the bottlenecks and eliminates them to get a better ETL load time. Tuning starts with the identification of bottlenecks in the source, target and mapping, and proceeds further to session tuning; it might also need tuning of the system resources on which the Informatica PowerCenter services are running. The first step in performance tuning is therefore to identify the performance bottlenecks, which can occur in the source, the target, the mapping, the session, and the system.

When a PowerCenter session is triggered, the Integration Service starts the Data Transformation Manager (DTM), which is responsible for starting the reader thread, transformation threads and writer thread. The reader thread is responsible for reading data from the sources. Transformation threads process data according to the transformation logic in the mapping, and the writer thread connects to the target and loads the data. Any data processing delay in these threads leads to a performance issue. (See http://www.disoln.org/2013/08/Informatica-PowerCenter-Performance-Turning-A-to-Z-Guide.html.)

Source Bottlenecks
Performance bottlenecks can occur when the Integration Service reads from a source database. An inefficient query can cause source bottlenecks.

Target Bottlenecks
When a target bottleneck occurs, the writer thread cannot free up space for the reader and transformer threads until the data is written to the target, so the reader and transformer threads wait for free blocks. This causes the entire session to run slower.

Mapping Bottlenecks
Complex or poorly written mapping logic can lead to a mapping bottleneck. With a mapping bottleneck, the transformation thread runs slower, causing the reader thread to wait for free blocks and the writer thread to wait for blocks filled up for writing to the target.

Session Bottlenecks
If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck. A session bottleneck normally occurs when the session memory configuration is not tuned correctly. Small cache sizes, low buffer memory, and small commit intervals can cause session bottlenecks.

System Bottlenecks
After you tune the source, target, mapping and session, consider tuning the system to prevent system bottlenecks. The Integration Service uses system resources to process transformations, run sessions, and read and write data. It also uses system memory to create cache files for transformations such as Aggregator, Joiner, Lookup, Sorter, XML and Rank.
1. Buffer Memory Optimization
When the Integration Service initializes a session, it allocates blocks of memory to hold source and target data. Sessions that use a large number of sources and targets might require additional memory blocks. Not having enough buffer memory for the DTM process can slow down the reading, transforming or writing process; adding extra memory blocks can keep the threads busy and improve session performance. You can do this by adjusting the buffer block size and the DTM buffer size:
Optimizing the Buffer Block Size: Depending on the source and target data, you might need to increase or decrease the buffer block size.
Increasing the DTM Buffer Size: When you increase the DTM buffer memory, the Integration Service creates more buffer blocks, which improves performance.

II. Caches Memory Optimization
Transformations such as Aggregator, Rank and Lookup use cache memory, which includes an index cache and a data cache, to store transformed data. If the allocated cache memory is not large enough to store the data, the Integration Service stores the data in a temporary cache file. Session performance slows each time the Integration Service reads from the temporary cache file.
Increasing the Cache Sizes: You can increase the allocated cache sizes so that the transformation is processed in cache memory itself and the Integration Service does not have to read from a cache file. You can update the cache size in the session properties of the transformation.

III. Optimizing the Target
1. Using Bulk Loads: You can use bulk loading to improve the performance of a session that inserts a large amount of data into an Oracle or Microsoft SQL Server database. When bulk loading, the Integration Service bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback, and as a result you may not be able to perform recovery.
2. Increasing Database Checkpoint Intervals: The Integration Service performance slows each time it waits for the database to perform a checkpoint. To increase performance, increase the checkpoint interval in the database.
3. Dropping Indexes and Key Constraints: When you define key constraints or indexes in target tables, you slow the loading of data to those tables. To improve performance, drop the indexes and key constraints before you run the session. You can rebuild those indexes and key constraints after the session completes.

IV. Optimizing the Source
1. Optimizing the Query: Performance bottlenecks can occur when the Integration Service reads from a source database, and an inefficient query can cause source bottlenecks. If a session joins multiple source tables in one Source Qualifier, you might be able to improve performance by optimizing the query with optimizer hints.

Optimizing Transformations
Each transformation is different, and the tuning required for each is different. But generally, you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize them.
Filter transformations: If your session contains a Filter transformation, create that Filter transformation nearer to the sources, or use a filter condition in the Source Qualifier.
Lookup transformations: If the session contains a Lookup transformation, you can improve the session performance by enabling the lookup cache. The cache improves the speed by saving the previous data, so there is no need to load it again.
Group transformations: Aggregator, Rank and Joiner transformations may often decrease the session performance, because they must group data before processing it. To improve session performance in this case, use the sorted ports / Sorted Input option, i.e. sort the data before it reaches the transformation.
Incremental aggregation: In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.
Partitioning: Partitioning the session improves the session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
SCD
Slowly changing dimensions are dimensions whose data changes slowly. How to record such changes is a common concern in data warehousing. To deal with this issue, we have the following SCD types:
1. SCD Type 1
2. SCD Type 2
3. SCD Type 3

SCD Problem: Rajkumar is a customer of ABC Inc. He first lived in Chennai, then moved to Vizag in Dec 2008. The original entry in the customer lookup table has the following record:

Customer Key  Name      City
------------------------------
1001          Rajkumar  Chennai

How should ABC Inc. now modify its customer table to reflect this change? This is the "Slowly Changing Dimension" problem.

SCD Type 1: The SCD Type 1 methodology is used when there is no need to store historical data in the dimension table. This method overwrites the old data in the dimension table with the new data; it is used to correct data errors in the dimension. As an example, take the customer table below, where the customer name is misspelt: it should be Marston instead of Marspton.

surrogate_key customer_id customer_name Location
------------------------------------------------
1             1           Marspton      Illions

If you use the Type 1 method, it simply overwrites the data. The data in the updated table will be:

surrogate_key customer_id customer_name Location
------------------------------------------------
1             1           Marston       Illions

The advantage of Type 1 is ease of maintenance and less space occupied. The disadvantage is that no historical data is kept in the data warehouse.

SCD Type 2: It stores the entire history of the data in the dimension table. In Type 2, you can store the data in three different ways:
Versioning
Flagging
Effective Date

SCD Type 2 Versioning: In the versioning method, a sequence number is used to represent the change. The latest sequence number always represents the current row, and the previous sequence numbers represent the past data. As an example, let's use the same customer who changes location. Initially the customer is in Illions, and the data in the dimension table looks as follows:

surrogate_key customer_id customer_name Location Version
---------------------------------------------------------
1             1           Marston       Illions  1

The customer moves from Illions to Seattle, and the version number is incremented: a new record is inserted into the dimension table with the next version number.

surrogate_key customer_id customer_name Location Version
---------------------------------------------------------
1             1           Marston       Illions  1
2             1           Marston       Seattle  2

Now again, if the customer moves to another location, a new record will be inserted into the dimension table with the next version number.

SCD Type 2 Flagging: In the flagging method, a flag column is created in the dimension table. The current record will have the flag value 1, and the previous records will have the flag value 0. Initially, the customer dimension looks as follows:

surrogate_key customer_id customer_name Location flag
------------------------------------------------------
1             1           Marston       Illions  1

Now when the customer moves to a new location, the old record is updated with flag value 0 and the latest record gets flag value 1:

surrogate_key customer_id customer_name Location flag
------------------------------------------------------
1             1           Marston       Illions  0
2             1           Marston       Seattle  1
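In SQL terms, the flagging change above amounts to the sketch below (the table name customer_dim is hypothetical; the values come from the example):

UPDATE customer_dim SET flag = 0 WHERE customer_id = 1 AND flag = 1;
INSERT INTO customer_dim (surrogate_key, customer_id, customer_name, location, flag)
VALUES (2, 1, 'Marston', 'Seattle', 1);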
SCD Type 2 Effective Date: In the effective date method, the period of the change is tracked using the Start_date and End_date columns in the dimension table. A NULL in the End_date indicates the current version of the data, and the remaining records indicate the past data:

surrogate_key customer_id customer_name Location Start_date  End_date
-----------------------------------------------------------------------
1             1           Marston       Illions  01-Mar-2010 20-Feb-2011
2             1           Marston       Seattle  21-Feb-2011 NULL

SCD Type 3: In the Type 3 method, only the current status and the previous status of the row are maintained in the table. To track these changes, two separate columns are created in the table. The customer dimension table in the Type 3 method will look as follows:

surrogate_key customer_id customer_name Current_Location Previous_Location
----------------------------------------------------------------------------
1             1           Marston       Illions          NULL

Let's say the customer moves from Illions to Seattle; the updated table will look as:

surrogate_key customer_id customer_name Current_Location Previous_Location
----------------------------------------------------------------------------
1             1           Marston       Seattle          Illions

Now again, if the customer moves from Seattle to NewYork, the updated table will be:

surrogate_key customer_id customer_name Current_Location Previous_Location
----------------------------------------------------------------------------
1             1           Marston       NewYork          Seattle

The Type 3 method keeps only limited history, and how much depends on the number of columns you create.

Things to know:
In an SCD Type 0 dimension table, we just keep the data as it is and it never changes.
SCD Type 4 provides a solution to handle rapid changes in the dimension tables.

Stopping or Aborting a Session Task
When you issue a stop command on a session, the Integration Service first stops reading data from the sources. It continues processing and writing data to the targets and then commits the data. An abort command is handled the same way as a stop command, except that the abort command has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.

Load Manager and DTM Process in Informatica
While running a workflow, the PowerCenter Server uses the Load Manager process and the Data Transformation Manager (DTM) process to run the workflow and carry out workflow tasks.

When the PowerCenter Server runs a workflow, the Load Manager performs the following tasks:
1. Locks the workflow and reads workflow properties.
2. Reads the parameter file and expands workflow variables.
3. Creates the workflow log file.
4. Runs workflow tasks.
5. Starts the DTM to run sessions.
6. Sends post-session email if the DTM terminates abnormally.

When the PowerCenter Server runs a session, the DTM performs the following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Verifies connection object permissions.
5. Runs pre-session shell commands.
6. Runs pre-session stored procedures and SQL.
7. Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.
8. Runs post-session stored procedures and SQL.
9. Runs post-session shell commands.
10. Sends post-session email.

DTM (Data Transformation Manager)
The PowerCenter Integration Service process starts the DTM process to run a session. The DTM is the process associated with the session task. The DTM process performs the tasks below.

Read the Session Information: The DTM retrieves the mapping and session metadata from the repository and validates it.
Perform Pushdown Optimization: If the session is configured for pushdown optimization, the DTM runs an SQL statement to push transformation logic to the source or target database.

Expand Variables and Parameters: If the workflow uses a parameter file, the PowerCenter Integration Service process sends the parameter file to the DTM when it starts the DTM. The DTM creates and expands session-level, service-level, and mapping-level variables and parameters.

Create the Session Log: The DTM creates logs for the session. The session log contains a complete history of the session run, including initialization, transformation, status, and error messages.

Verify Connection Object Permissions: The DTM verifies that the user who started or scheduled the workflow has execute permissions for the connection objects associated with the session.

Run Pre-Session Operations: After verifying connection object permissions, the DTM runs pre-session shell commands. The DTM then runs pre-session stored procedures and SQL commands.

Run the Processing Threads: After initializing the session, the DTM uses reader, transformation, and writer threads to extract, transform, and load data.

Run Post-Session Operations: After the DTM runs the processing threads, it runs post-session SQL commands and stored procedures. The DTM then runs post-session shell commands.

Send Post-Session Email: When the session finishes, the DTM composes and sends an email that reports session completion or failure.

Test Load
We can configure the Integration Service to perform a test load. With a test load, the Integration Service reads and transforms the data without writing it to the targets. The Integration Service generates all session files and performs all pre- and post-session functions. For relational targets, the Integration Service writes the data but rolls it back when the session completes; for other targets, it does not write data at all. Enter the number of source rows you want to test in the Number of Rows to Test field.

Project Process (End-to-End)
Requirement Gathering: We get the FSD or BRD document from the client. Based on that, we do the analysis; if there is any doubt in the requirement, we get it clarified first from the BA and then from the client. We then develop the mapping and prepare the mapping document, followed by testing.

Code Migration from Dev to QA: Raise a Change Request with the Informatica Design team (L3 Informatica Architecture; the SLA is 3 days) and attach the documents below in the ticket:
1. CRF (workflows, sessions, mappings)
2. Load statistics (to be provided in a separate document)
3. Volumetrics (list of tables, frequency of data)
4. Session logs and workflow logs
Informatica Design team approval is needed: they review the session logs and provide the approval, then assign the ticket to the Informatica Admin team (L2 Informatica Admin team; the SLA is 2 days). The Admin team works on it, and only the changes listed in the CRF document are migrated to QA.
Note: The process for Code Migration from QA to Production is the same as Code Migration from Dev to QA, except that IM approval is required here.

Unit Testing
1. Validate source and target: We review the mapping field by field from the source to the target and ensure that the required transformation logic is applied. We generally check the source and target counts for each mapping. The minimum number of test records we test with is 100.
2. Analyse and validate your transformation business rules.
3. Calculate the load time: Run the session and view the statistics. We use the session and workflow logs to capture the load statistics and observe how much time is taken by the reader and the writer.
After unit testing, we prepare the Unit Test Case document.

Integration Testing
After unit testing is done, we perform integration testing. Integration testing covers end-to-end testing for the data warehouse.

Mapplet vs Reusable Transformation
A mapplet is a reusable object that can contain as many transformations as you need; it is reusable by default and represents reusable logic that you can use across different mappings. A reusable transformation has to be made reusable while creating the transformation, and it is a single transformation.

What transformations can we not use in a mapplet?
You cannot include the following objects in a mapplet:
Target definitions: A mapplet is just a set of reusable transformations; it is not used to load data into a target. That is why the target definition is not allowed.
Other mapplets.
Non-reusable Sequence Generator transformations: Since the mapplet is a reusable object, a Sequence Generator used in a mapplet should itself be reusable.
Pre- and post-session stored procedures: These are session-level properties, so they cannot be part of a mapplet.
Normalizer transformations: The Normalizer is a dynamic transformation which converts rows to columns or vice versa, so it is dependent on its input; it is not fixed logic that you can reuse in other mappings. Informatica might never see such a business scenario or functionality requirement, so this option was limited.
COBOL sources.
XML Source Qualifier transformations.
XML sources.

Why should we not use Bulk Load on targets having indexes or constraints?
Bulk load is used to improve the performance of the session. The Integration Service loads bulk data into the target table but bypasses writing to the database logs. As a result, the target database cannot perform rollback, recovery is not possible, and the Integration Service cannot recover the session. There should not be any constraints or indexes on the target table; if there are, disable them before loading and enable them after loading. When bulk loading to an Oracle database, define a large commit interval to increase performance.
Note: If your mapping has an Update Strategy, then your session will be data driven. In this case, even if you use bulk mode, Informatica will treat the load as a normal load.

What is the use of the cached values property in the Sequence Generator transformation?
Cached values increase performance by reducing the amount of time spent contacting the repository to get the next sequence value. It affects the sequence: if the transformation caches values 1 to 10 and the session completes at sequence 7, the remaining cached sequence values are discarded, and in the next run the sequence starts from 11.

Code Page
The code page in Informatica is used to specify the character encoding. It is selected based on the source data.
The most commonly selected encoding systems are ASCII, UTF-8 and UTF-32. An encoding is the assignment of a number to a character in the character set. You use code pages to identify data that might be in different languages. For example, if you create a mapping to process Japanese data, the code page is selected to support Japanese text: if the source data contains Japanese characters, you must select a Japanese code page for the source data to avoid data loss.

Q1. If a session fails after loading 10000 records into the target, how can we start loading into the target from the 10001st record?
We can run the session with a recovery strategy. Choose one of the following recovery strategies:
Resume from the last checkpoint: The Integration Service saves the session state of operation and maintains target recovery tables.
Restart: The Integration Service runs the session again when it recovers the workflow.
Fail session and continue the workflow: The Integration Service cannot recover the session, but it continues the workflow. This is the default session recovery strategy.

Informatica and Oracle Interview Questions

Can you copy a session to a different folder or repository?
Yes. By using the copy session wizard, you can copy a session into a different folder or repository.

What are the types of mappings in the Getting Started Wizard?
Simple pass-through mapping
Slowly growing target mapping

What are the different types of Type 2 slowly changing dimensions?
There are three types of Type 2 slowly changing dimensions:
SCD with versioning
SCD with flags
SCD with date

What are the different threads in the DTM process?
Master thread
Mapping thread
Reader thread
Writer thread
Pre- and post-session threads

What are active and passive transformations?
An active transformation can change the number of rows that pass through it. A passive transformation does not change the number of rows that pass through it.

How can we store previous session logs?
Run the session in time stamp mode; then the session log will not overwrite the current session log.

What are the scheduling options to run a session?
The different scheduling options are:
Run only on demand: the Informatica server runs the session only when the user starts the session explicitly.
Run once: the Informatica server runs the session only once at a specified date and time.
Run every: the Informatica server runs the session at regular intervals, as configured.
Customized repeat: the Informatica server runs the session at the dates and times specified in the repeat dialog box.

What is the command used to run a batch?
pmcmd is used to start a batch.
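A sketch of starting a workflow with pmcmd from the command line (the service, domain, user, folder and workflow names are hypothetical, and the exact options can vary by version):

pmcmd startworkflow -sv IntService -d Domain_Dev -u user1 -p password1 -f MyFolder wf_load_sales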
Suppose there is an object which is used by many users in the company. What is the use of Shared Object? It is easier to explain this using an example. if the object is made as shared. then the update has to be done to the object and all other users get the update. What is Dual table? Dual table is a table that is created by oracle along with data dictionary.If there are x lot number of records with y number of columns from source data and we need to extract z number of columns only (very less) then the cache stores those columns for respective records in the $PMCACHEDIR of Informatica Server so that we don’t need to extract each record from database and load into Informatica.X version on Windows 7? GE MSAT Internal . What are the types of variables in Informatica? There are three types of variables in Informatica Predefined variable represented by $ User defined variable represented with $$ System variable denoted by $$$ Difference between Informatica normal join and Oracle Equi join? Equi join in Oracle is performed on oracle sources (relational sources) while Informatica Equi joins can be performed on non-relational sources too (oracle and flat file etc). Design. What is requirements gathering? It is carried out by Business Analyst. What is degenerate dimension? Dimension which has no dimension table of its own and is derived from the fact table. Implementation and Testing and finally Maintenance are carried on. It is nothing but interacting with end users and getting to know what his requirements are. GE MSAT Internal .x on Windows 7. the rest of the phases Analysis. We can Install Informatica 9. Based on his requirements.No we can’t install Informatica on Power Center. What are the types of joins in Informatica and in Oracle? There are four types of joins in oracle equi join non equi join self join outer join Joins in informatica master join (right outer join) detailed join (left outer join) GE MSAT Internal .What is Junk dimension? The dimension that is formed by lumping of smaller dimensions is called Junk dimension. What is Staging Area? Staging Area is indeed a database where data from different source systems are brought together and this database acts as an input to Data Cleansing. we take the table with lesser number of rows as master while the more number of rows as detailed. If Versioning is turned off. mappings etc. we will not be able to track the changes for the respective Sessions/Mappings/Workflows. What is tracing level? What are the types of tracing level? Tracing level is the amount of information that Informatica server writes into a log file. Types of tracing level Normal Terse Verbose init Verbose data In joiner transformation. each and every row of the master is compared with every row of the detailed and so. in Repository? The format of files for Informatica Objects in Repository is XML Where can we find Versioning in Informatica? What happens if Versioning is turned off? In Informatica. the less is the number of iterations and so better is the performance of the system. we can find Versioning in Repository Manager. Why? In joiner. the less number of rows in master. What are all the databases the Informatica server on windows can connect to? Informatica server on windows can connect to SQL server database Oracle Sybase Teradata MS Access MS Excel Informix DB2 What are the databases the Informatica server on UNIX can connect to? 
What are the databases the Informatica server on UNIX can connect to?
The Informatica server on UNIX can connect to
Sybase
Teradata
Informix
DB2
Oracle

What is an overview window?
It is a window where you can see all the transformations in a mapping.

What is a session?
A session is a set of instructions that tells the Informatica server when and how to move the data from sources to targets.

How does the server recognize the source and target databases?
By using ODBC connections if they are relational, and FTP connections if they are flat files.

What are the differences between Informatica 6 and 7?
Informatica 7 added
the pmcmd command
Union and Custom transformations
version controlling

What is the use of bitmap indexes?
Bitmap indexes are used to join large fact tables to smaller dimension tables.

How can we delete duplicate rows from flat files?
We can make use of a Sorter transformation and select the Distinct option.

If a session fails after loading 10,000 records into the target, how can we start loading from the 10,001st record?
We can run the session in recovery mode, with a recovery strategy configured.

What are the different things we can do using the pmcmd command?
We can START, STOP and ABORT a session using the pmcmd command.

Can you start batches within a batch?
No, you can't.

In how many ways can we update a source definition?
Two ways:
We can reimport the source definition.
We can edit the source definition.

What is a mapping?
A mapping is nothing but the data flow between sources and targets.

What are the types of groups in a Router transformation?
There are three types of groups in a Router transformation:
Input group
Output group
Default group

What are batches? What are the types of batches?
Batches provide a way to group sessions for either sequential or parallel execution by the Informatica server:
Concurrent batches, which run at the same time
Sequential batches, which run one after the other

What is the rank index in a group?
The PowerCenter Designer automatically creates a RANK INDEX port when a Rank transformation is used. The purpose of this RANK INDEX port is to store the ranking for the column(s) we are interested in.

What is the PowerCenter Repository?
The PowerCenter Repository allows you to share metadata among different repositories and to create a data mart domain.

What are the constants or flags for each database operation, and their numeric equivalents, in the Update Strategy?
Insert: DD_INSERT (0)
Update: DD_UPDATE (1)
Delete: DD_DELETE (2)
Reject: DD_REJECT (3)

Can you generate reports using Informatica?
PowerCenter itself is just an ETL tool and does not generate reports, apart from metadata reports; however, reports can be generated using the Informatica Data Analyzer tool.

What are the types of metadata that are stored in the repository?
Source definitions
Target definitions
Mappings
Mapplets
Transformations

What are Synonyms?
Synonyms are alternative names for database objects such as tables, views, stored procedures, etc.
Syntax:
CREATE [OR REPLACE] SYNONYM synonym_name FOR [schema.]object_name;
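A minimal sketch of creating and using a synonym (assuming a SCOTT.EMP table exists; the names are illustrative):

    CREATE OR REPLACE SYNONYM emp_syn FOR scott.emp;   -- alternative name for scott.emp
    SELECT COUNT(*) FROM emp_syn;                      -- queries scott.emp through the synonym
    DROP SYNONYM emp_syn;                              -- removes the alias; the table is untouched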
What is page code compatibility?
It is nothing but compatibility of code pages for maintaining data accuracy. It comes into the picture when data is in different languages.

What is a code page?
A code page consists of the encodings used to specify characters in a set of one or more languages, and it is selected based on the source language.

What are the types of Lookup cache?
Static cache
Dynamic cache
Persistent cache
Recache from database
Shared cache

What are the differences between a dynamic cache and a static cache?
In the case of a dynamic cache, when we want to insert a new row, the Integration Service first looks in the lookup cache; if the row is not present in the cache, it inserts the row into both the cache and the target. In the case of a static cache, the row is stored only in the target table, not in the cache.

What are the various stages of the SDLC?
Requirements Gathering, Analysis, Design, Implementation, Testing and Maintenance.

What is the use of the Source Qualifier?
The Source Qualifier is used to convert different data types to Informatica-compatible data types.

What is Physical Data Modeling?
Physical data modeling is a type of data modeling which includes all the required tables, columns and relationships for the physical implementation of a database.

What is Logical Data Modeling?
Logical data modeling is a type of data modeling which represents the business requirements of an organization.

What is an operational data store (ODS)?
An operational data store is defined to be a structure that is subject-oriented, integrated and volatile, holding current data that is a day or perhaps a month old.

What is the difference between a mapplet and a reusable transformation?
Using a mapplet we can make a whole set of transformations reusable, whereas a reusable transformation makes only one transformation reusable.

What is data cleansing?
It is the process of converting data from different formats of files or databases into a single required format.

What is throughput in Informatica?
Throughput is the rate at which the Informatica server reads the data from sources and writes it successfully to the target.

Where can we find the throughput option in Informatica?
We can view it in the Workflow Monitor: right-click on the session, click Get Run Properties, and the throughput appears under Source/Target Statistics.

What are the types of loading in Informatica?
The two types of loading available in Informatica are
Bulk loading
Normal loading

What is the difference between a local index and a global index?
A global index is a single index covering all partitions, whereas a local index has a separate index for each partition.

What are the types of files created by the Informatica server while running a session?
Cache files
Session log file
Informatica server log file
Output file
Reject file

What are the types of repositories created by the Informatica Repository Manager?
Four types of repositories can be created using the Repository Manager:
Standalone repository
Global repository
Local repository
Versioned repository

What are the two types of processes that run a session?
Load Manager
DTM process (Data Transformation Manager)

What is the difference between the Union and Lookup transformations?
The Union transformation is active while the Lookup transformation is passive. A Lookup can run on source or target tables, while a Union works only on sources. For a Union transformation, the source tables or data must have a similar structure, which is not the case with the Lookup transformation.

What is a Star Schema?
A star schema is the simplest form of schema, with one fact table and at least one dimension table. The dimensions here are denormalized.

What is a Snowflake Schema?
In a snowflake schema, the dimensions are further divided into sub-dimensions. The dimensions here are normalized.

What is a Fact table?
It is the centralized table in a star schema. A fact table has two types of columns: the first type holds the measures, and the second type holds the foreign keys to the dimension tables.

What is a Dimension table?
A dimension table is one that describes the business entities of an enterprise.
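To make the star schema concrete, here is a minimal DDL sketch with one fact table and two dimension tables (all names hypothetical), showing the two kinds of fact table columns described above:

    CREATE TABLE date_dim (
        date_key      NUMBER PRIMARY KEY,
        calendar_date DATE
    );

    CREATE TABLE product_dim (
        product_key   NUMBER PRIMARY KEY,
        product_name  VARCHAR2(100)
    );

    CREATE TABLE sales_fact (
        date_key      NUMBER REFERENCES date_dim(date_key),       -- foreign key to a dimension
        product_key   NUMBER REFERENCES product_dim(product_key), -- foreign key to a dimension
        sales_amount  NUMBER(12,2),                                -- measure
        quantity_sold NUMBER                                       -- measure
    );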
What is a Complex Mapping?
A complex mapping will have the following features:
Difficult requirements
A large number of transformations
Complex business logic

In how many ways can you add ports?
Two ways:
Copy them from another transformation
Click the Add Port button

How many sessions can you group in a batch?
Any number of sessions, but the fewer the sessions in a batch, the easier the migration.

What is the similarity between the Router and Filter transformations?
Both Router and Filter transformations are used to filter data based on a condition. Both are active transformations, and both are connected.

What is the difference between the Filter transformation and the Router transformation?
The Filter transformation drops the data that does not meet the condition, whereas the Router transformation captures the data even when the condition is not met and saves it in the default output group. The Filter transformation works on a single condition and gives only one output, while the Router transformation works on multiple conditions and can give more than one output.

What is the difference between the Source Qualifier transformation and the Joiner transformation?
The Source Qualifier transformation is used to join data from homogeneous sources, while the Joiner transformation is used to join data from heterogeneous sources, as well as homogeneous sources in different schemas. We need matching keys to join two relational sources in a Source Qualifier transformation, which is not the case with the Joiner transformation.

Which transformation should we use to normalize COBOL and relational sources?
We need to make use of the Normalizer transformation.

What are the transformations we cannot use in a Mapplet?
Normalizer transformation
XML Source Qualifier transformation
Target definitions
COBOL sources
Pre- and post-session stored procedures

What is a PARAM file?
A parameter file is an ordinary text file where we can define values for the parameters defined in a session. These parameter files are specified in the session properties.

What is Data Driven?
The Informatica server follows the instructions coded in the Update Strategy transformations within the session mapping, which determine how to flag the records for insert, update, delete or reject.

What is the aggregate cache?
The aggregator cache is a temporary location which stores the input data values while the aggregation calculation is being carried out.

How can we improve session performance with an Aggregator transformation?
We can increase session performance by sending sorted input to the Aggregator transformation.

What is the difference between the Aggregator transformation and the Expression transformation?
The Aggregator transformation uses aggregate functions and performs calculations on an entire group, whereas the Expression transformation performs calculations on a row-by-row basis.
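As a rough SQL analogy (hypothetical EMP table, not from the original text), the Aggregator behaves like a GROUP BY query producing one row per group, while the Expression behaves like a per-row calculation:

    -- Aggregator-style: one output row per department
    SELECT deptno, SUM(sal) AS total_sal
    FROM   emp
    GROUP  BY deptno;

    -- Expression-style: one output row per input row
    SELECT empno, sal, sal * 1.10 AS proposed_sal
    FROM   emp;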
What is the difference between the Joiner transformation and the Lookup transformation?
The Joiner is an active transformation while the Lookup is a passive transformation.
The Joiner supports equi joins only, while the Lookup supports equi joins as well as non equi joins.
The Joiner works on source data only, while the Lookup works on source as well as target data.
The Joiner is connected, while the Lookup can be either connected or unconnected.

What is the difference between a connected Lookup and an unconnected Lookup?
A connected lookup receives input directly from the mapping pipeline, whereas an unconnected lookup receives input from a :LKP expression in another transformation.
A connected lookup returns more than one column per row, whereas an unconnected lookup returns only one column per row.
A connected lookup supports user-defined default values, while an unconnected lookup does not.
A connected lookup is not reusable, whereas an unconnected lookup is.
The performance of a connected lookup is lower compared to an unconnected lookup.

What is the use of the Lookup transformation?
The Lookup transformation is used to check whether a matching record already exists in the target table; if a matching record does not exist, the row can be inserted accordingly.

Can a Lookup be done on flat files?
Yes.

Which transformation is used to load 4 flat files of similar structure to a single target?
We can make use of the Union transformation.

What is the difference between the direct and indirect loading options in sessions?
Direct loading can be used on a single source file, while indirect loading is used for multiple source files. In direct loading we can perform the recovery process, while in indirect loading we cannot.

What are the various techniques for implementing a dimensional model?
Star schema
Snowflake schema

What are the types of dimensions?
There are three types of dimensions:
Slowly changing dimensions
Conformed dimensions
Casual dimensions

What is a Mapplet?
A mapplet is an object which consists of a set of reusable transformations which can be used in different mappings.

What are the default values for variables?
Number = 0
String = NULL
Date = 1/1/1753

What are the types of data movement modes in the Informatica server?
There are two modes:
ASCII mode
Unicode mode

What happens to the discarded records in a Filter transformation?
Discarded rows do not appear in the session log or reject files.

What happens if a flat file that comes through FTP has not arrived?
The session is going to fail with a fatal error.

Why do we need to reinitialize the aggregate cache?
We need to reinitialize the aggregate cache to remove the previous data present in the aggregator cache, so that the cache can be used for the new source data.

What are the various mapping objects available in Informatica?
Source definitions
Target definitions
Links
Transformations
Mapplets

What is the use of the Update Strategy transformation?
The Update Strategy is used to perform DML operations (insert, update, delete and reject) on already populated targets.

What is the default source option for the Update Strategy?
Data Driven.

How can we increase the performance of a session that uses a Filter transformation?
Place the Filter transformation as close as possible to the sources in the mapping.

What is the default join that the Joiner transformation provides?
Normal join.

What is SQL override?
It is nothing but overriding the default SQL in a Source Qualifier or Lookup transformation to add additional logic.
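A hedged sketch of what a Source Qualifier SQL override might look like (hypothetical CUSTOMERS source; keep in mind that the selected columns generally must match the Source Qualifier ports in number and order):

    SELECT cust_id,
           cust_name,
           UPPER(country) AS country      -- extra logic pushed to the database
    FROM   customers
    WHERE  status = 'ACTIVE'              -- additional filter not in the default query
    ORDER  BY cust_id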
What are the differences between a unique key and a primary key?
A primary key cannot contain null values, whereas a unique key can contain one (and only one) null value.
In SQL Server, with default options, a primary key is created as a clustered index while a unique key is created as a non-clustered index.
A unique key is similar to a primary key, but we can have more than one unique key per table.

What is the difference between RowId and RowNum?
RowId is the physical address of a row; if we know the RowId, we can read the entire row. RowNum is a temporary number allocated during query execution.

What are the various types of statements in Oracle?
The various statements that are available in Oracle are
Data Manipulation Language (DML) statements
Data Definition Language (DDL) statements
Transaction Control statements
Session Control statements
System Control statements

What are the various types of DML statements?
Select
Insert
Update
Delete
Merge

What are the various aggregate functions?
SUM
AVG
MIN
MAX
COUNT
STDDEV
VARIANCE
FIRST
LAST

What are pseudo columns? What are the various types of pseudo columns?
Pseudo columns are columns which are not physically present in the table but can be used in SQL queries as if they were part of the table:
RowNum
RowId
Sysdate
User
Currval
Nextval
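Minimal examples of these pseudo columns in use (hypothetical EMP table, and assuming a sequence MY_SEQ exists):

    SELECT ROWNUM, ROWID, SYSDATE, USER
    FROM   emp
    WHERE  ROWNUM <= 5;                  -- ROWNUM also limits the rows returned

    SELECT my_seq.NEXTVAL FROM dual;     -- advances the sequence and returns the new value
    SELECT my_seq.CURRVAL FROM dual;     -- valid only after NEXTVAL in the same session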
What are the types of joins in Oracle?
Equi joins (also called simple or inner joins)
Non-equi joins
Outer joins
Self joins

What is the difference between char and varchar?
Char is a fixed-length data type, whereas varchar is a variable-length data type. With char, the maximum size is allocated even if we enter a shorter value for the column; with varchar, only the entered length is allocated.

What is the difference between varchar and varchar2?
Varchar and varchar2 are both variable-length data types. Varchar has a maximum size of 2000 bytes, whereas varchar2 has a maximum size of 4000 bytes. Varchar stores ASCII data, whereas varchar2 supports Unicode.

What is Normalisation? Define 1NF, 2NF and 3NF.
Normalisation is the process of reducing a complex data structure into a simpler one by removing redundancy.
First normal form: each field is atomic.
Second normal form: data redundancy can be reduced if all the non-key attributes that depend on only part of a composite primary key are put into a separate table along with that part of the key. The table should also satisfy 1NF.
Third normal form: if a dependency exists between non-key attributes, those attributes are moved to a separate table. For example, if a table has the attributes partid, city, state and country, and city and state depend on country (a non-key attribute), then the table is separated into two tables: one with partid and country, and another with country, city and state. This should also satisfy 2NF.

How did you implement performance tuning in Informatica?
To answer this, we can say that we did not work on performance tuning specifically, but we took care of it while implementing the mappings. Some of the steps are below:
1. Instead of doing a lookup on an entire table, we used a lookup override to fetch data only for a particular business.
2. We had multiple sources from relational tables; instead of using a Joiner transformation, we performed the join in the Source Qualifier itself.
3. If the source was relational, then instead of using a Sorter transformation to sort the records, we used the Source Qualifier to sort the data.

Shared Cache
You can share the lookup cache between multiple Lookup transformations, by configuring multiple Lookup transformations in a mapping to share a single lookup cache. The Integration Service builds the cache when it processes the first Lookup transformation, and it uses the same cache to perform lookups for subsequent Lookup transformations that share it. You can share an unnamed cache between multiple Lookup transformations in the same mapping, and a named cache between Lookup transformations in the same mapping or in different mappings.

Role of ETL Developer
The daily routines of the developer in a support role:
1) Do lights-on activity, i.e. check whether the Informatica services are running and whether any session has failed; if so, find the reason for the failure. If a file failed, work out how many records were loaded, how many were not, and figure out how to load the rest.
2) Create high-level and low-level documents for the mappings created.
3) Optimise the current mappings and workflows.
4) Ensure the daily repository backup is taken.
5) Follow the SLA.
In a development role:
1) Develop mappings and workflows as per the provided requirements.
2) Develop Informatica mappings and workflows for new change requests.
3) Test those mappings and prepare test cases; ask for peer review and create documentation for the same.
4) Send daily updates to the customer.

Question: What is 3rd normal form? {L} Give me an example of a situation where the tables are not in 3rd NF, then make it 3rd NF. {M}
Answer: In third normal form, no column is transitively dependent on the primary key. For example, suppose column1 is the key, column2 depends on column1, and column3 depends on column2; in this case column3 is "transitively dependent" on column1. To make it 3rd NF we need to split the table into two: table1, which has column1 and column2, and table2, which has column2 and column3.
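A minimal DDL sketch of the split described above (names kept as in the example; column types are assumptions):

    -- Before: one table where column3 depends on column2, not on the key column1
    -- After: two tables, removing the transitive dependency
    CREATE TABLE table2 (
        column2 NUMBER PRIMARY KEY,
        column3 VARCHAR2(50)
    );

    CREATE TABLE table1 (
        column1 NUMBER PRIMARY KEY,
        column2 NUMBER REFERENCES table2(column2)
    );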
Question: Tell me how to design a data warehouse, i.e. what are the steps of doing dimensional modelling? {M}
Answer: There are many ways, but it should not be too far from this order:
1. Understand the business process.
2. Declare the grain of the fact table.
3. Create the dimension tables, including attributes.
4. Add the measures to the fact tables.
(From Kimball's Toolkit book, chapter 2.) Steps 3 and 4 could be reversed (add the facts first, then create the dimensions), but steps 1 and 2 must be done in that order: understanding the business process must always be first, and declaring the grain must always be second.

Declaring the grain means saying exactly what a fact table record represents. Remember that a fact table record captures a measurement. Example declarations of the grain include:
An individual line item on a customer's retail sales ticket, as measured by a scanner device
An individual transaction against an insurance policy
A line item on a bill received from a doctor
A boarding pass used by someone on an airplane flight
An inventory measurement taken every week for every product in every store

How to remove junk characters from a column through Informatica mappings
Data cleansing is an important aspect of ETL. There are many cases where we get non-printable and special characters in the input file or table, so here is how we can avoid them when loading the target tables. Informatica regular expressions and functions are very handy here: use the REG_REPLACE function for handling non-printable characters, and REPLACESTR for replacing multiple special characters in a field.

Non Printable:
Syntax:
REG_REPLACE( subject, pattern, replace, numReplacements )
Arguments:
subject (required, string): passes the string you want to search.
pattern (required, string): passes the character string to be replaced. You must use perl-compatible regular expression syntax and enclose the pattern in single quotes.
replace (required, string): passes the new character string.
numReplacements (optional, numeric): specifies the number of occurrences you want to replace. If you omit this option, REG_REPLACE replaces all occurrences of the character string.

Example:
REG_REPLACE(PRODUCT_DESC, '[^[:print:]]', '')
Here the function looks for '[^[:print:]]', which matches any non-printable character in the field we are passing, and replaces it with '' (an empty string). Sometimes, if we use REG_REPLACE(PRODUCT_DESC, '[^[:print:]]', NULL), Informatica doesn't replace the non-printable characters with NULLs, so it is better to use '' instead of NULL. If we have characters like ¿ à  � Ó Ø in the field, they will be removed; for instance, if the PRODUCT_DESC field looks like ABCD�XYZ¿® C-PQRSTî, it will be cleansed and the output will be ABCDXYZ-CPQRST.
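For comparison only (not from the original text): if the source is an Oracle table, the same non-printable cleanup can be pushed down to the database with Oracle's REGEXP_REPLACE, assuming a hypothetical PRODUCTS table:

    SELECT REGEXP_REPLACE(product_desc, '[^[:print:]]', '') AS product_desc_clean
    FROM   products;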
Special Characters:
There are some cases where we are asked to replace special characters in a field. We might think of the REPLACECHR() function here, but it replaces only one specific character from the field. When we need to replace a whole set of special characters in the input field, we go for the REPLACESTR() function, which can handle multiple special characters.

Syntax:
REPLACESTR ( CaseFlag, InputString, OldString1, [OldString2, ..., OldStringN,] NewString )
Arguments:
CaseFlag (required, integer): determines whether the arguments in the function are case sensitive. When CaseFlag is a number other than 0, the function is case sensitive; when CaseFlag is a null value or 0, it is not. You can enter any valid transformation expression.
InputString (required, string): passes the string you want to search. If InputString is NULL, REPLACESTR returns NULL. You can enter any valid transformation expression.
OldString (required, string): the string(s) you want to replace. You can enter one or more characters per OldString argument, and you must enter at least one OldString argument. If you pass a numeric value, the function converts it to a character string. When REPLACESTR contains multiple OldString arguments and one or more of them is NULL or empty, REPLACESTR ignores that argument; when all OldString arguments are NULL or empty, REPLACESTR returns InputString. The function replaces the OldString arguments in the order they appear: the first OldString argument has precedence over the second, and the second over the third. When REPLACESTR replaces a string, it places the cursor after the replaced characters in InputString before searching for the next match.
NewString (required, string): passes the new character string. If NewString is NULL or empty, REPLACESTR removes all occurrences of OldString from InputString. If you pass a numeric value, the function converts it to a character string. You can also enter a text literal enclosed within single quotation marks, for example 'abc'.

REPLACESTR is very useful for replacing multiple characters in the input field. For instance, take the expression
REPLACESTR(1, PRODUCT_DESC, '"', '.', '!', '#', '?', '+', '/', '$', '%', '~', '`', '^', '')
Here all occurrences of the listed special characters (" . ! # ? + / $ % ~ ` ^) will be replaced with '', i.e. removed. If PRODUCT_DESC is 'ABC~`DEF^%GH$%XYZ#!', the output of the expression will be 'ABCDEFGHXYZ'. This is how we can handle special characters in the input field.

By using REG_REPLACE and REPLACESTR together, we can take care of both non-printable and special characters in the input field, like below:
REG_REPLACE(REPLACESTR(1, PRODUCT_DESC, '"', '.', '!', '#', '?', '+', '/', '$', '%', '~', '`', '^', ''), '[^[:print:]]', '')

Important Note: Use a relational connection with code page UTF-8 here.

Hope this will help us all in data cleansing. Happy learning!!!! Corrections and suggestions are always welcome :)