Complete Reference to Informatica

Introduction

ETL Life Cycle
The typical real-life ETL cycle consists of the following execution steps:
1. Cycle initiation
2. Build reference data
3. Extract (from sources)
4. Validate
5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)
6. Stage (load into staging tables, if used)
7. Audit reports (for example, on compliance with business rules; in case of failure, they also help to diagnose and repair)
8. Publish (to target tables)
9. Archive
10. Clean up

Best Practices

Four-layered approach for ETL architecture design:
• Functional layer: Core functional ETL processing (extract, transform, and load).
• Operational management layer: Job-stream definition and management, parameters, scheduling, monitoring, communication, and alerting.
• Audit, balance and control (ABC) layer: Job-execution statistics, balancing and controls, rejects- and error-handling, codes management.
• Utility layer: Common components supporting all other layers.

Use file-based ETL processing where possible:
• Storage costs relatively little.
• Intermediate files serve multiple purposes: they are used for testing and debugging, for restart and recovery processing, and to calculate control statistics.
• Helps to reduce dependencies and enables modular programming.
• Allows flexibility for job execution and scheduling.
• Better performance if coded properly, and can take advantage of parallel processing capabilities when the need arises.

Use data-driven methods and minimize custom ETL coding:
• Parameter-driven jobs, functions, and job control.
• Code definitions and mappings in the database.
• Consideration for data-driven tables to support more complex code mappings and business-rule application.

Qualities of a good ETL architecture design:
• Performance
• Scalable
• Migratable
• Recoverable (run_id, ...)
• Operable (completion codes for phases, re-running from checkpoints, etc.)
• Auditable (in two dimensions: business requirements and technical troubleshooting)

What is Informatica

Informatica Power Center is a powerful ETL tool from Informatica Corporation. Informatica Corporation products include:
• Informatica Power Center
• Informatica On Demand
• Informatica B2B Data Exchange
• Informatica Data Quality
• Informatica Data Explorer

Informatica Power Center is a single, unified enterprise data integration platform for accessing, discovering, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise at any speed.

Informatica Power Center Editions
Because every data integration project is different and includes many variables, such as data volumes, latency requirements, IT infrastructure, and methodologies, Informatica offers three Power Center Editions and a suite of Power Center Options to meet your project's and organization's specific needs:
• Standard Edition
• Real Time Edition
• Advanced Edition

Informatica Power Center Standard Edition
Power Center Standard Edition is a single, unified enterprise data integration platform for discovering, accessing, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise to improve operational efficiency. Key features include:
• A high-performance data integration server
• A global metadata infrastructure
• Visual tools for development and centralized administration
• Productivity tools to facilitate collaboration among architects, analysts, and developers
Informatica Power Center Real Time Edition
Packaged for simplicity and flexibility, Power Center Real Time Edition extends Power Center Standard Edition with additional capabilities for integrating and provisioning transactional or operational data in real time. Power Center Real Time Edition provides the ideal platform for developing sophisticated data services and delivering timely information as a service to support all business needs. It provides the perfect real-time data integration complement to service-oriented architectures and application integration approaches such as enterprise application integration (EAI), enterprise service buses (ESB), and business process management (BPM). Key features include:
• Change data capture for relational data sources
• Integration with messaging systems
• Built-in support for Web services
• Dynamic partitioning with data smart parallelism
• Process orchestration and human workflow capabilities

Informatica Power Center Advanced Edition
Power Center Advanced Edition addresses requirements for organizations that are standardizing data integration at an enterprise level, across a number of projects and departments. It combines all the capabilities of Power Center Standard Edition and features additional capabilities that are ideal for data governance and Integration Competency Centers. Key features include:
• Dynamic partitioning with data smart parallelism
• Powerful metadata analysis capabilities
• Web-based data profiling and reporting capabilities

Power Center includes the following components:
• Power Center domain
• Administration Console
• Power Center repository
• Power Center Client
• Repository Service
• Integration Service
• Web Services Hub
• SAP BW Service
• Data Analyzer
• Metadata Manager
• Power Center Repository Reports

POWERCENTER CLIENT
The Power Center Client consists of the following applications that we use to manage the repository, design mappings and mapplets, and create sessions to load the data:
1. Designer
2. Data Stencil
3. Repository Manager
4. Workflow Manager
5. Workflow Monitor

1. Designer
Use the Designer to create mappings that contain transformation instructions for the Integration Service. The Designer has the following tools that you use to analyze sources, design target schemas, and build source-to-target mappings:
• Source Analyzer: Import or create source definitions.
• Target Designer: Import or create target definitions.
• Transformation Developer: Develop transformations to use in mappings.
• Mapplet Designer: Create sets of transformations to use in mappings.
• Mapping Designer: Create mappings that the Integration Service uses to extract, transform, and load data.
You can also develop user-defined functions to use in expressions.
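As a small illustration of that last point, a user-defined function created in the Designer is called from any expression with the :UDF. prefix. The sketch below assumes a UDF named REMOVE_SPACES has already been created in the folder; the function name and the ENAME port are placeholders, not part of the original example:

    -- Informatica expression syntax; UDFs are referenced with the :UDF. prefix
    :UDF.REMOVE_SPACES(ENAME)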
2. Data Stencil
Use the Data Stencil to create mapping templates that can be used to generate multiple mappings. Data Stencil uses the Microsoft Office Visio interface to create mapping templates. It is not usually used by a developer.

3. Repository Manager
Use the Repository Manager to administer repositories. You can navigate through multiple folders and repositories and complete the following tasks:
• Manage users and groups: Create, edit, and delete repository users and user groups. We can assign and revoke repository privileges and folder permissions.
• Perform folder functions: Create, edit, copy, and delete folders. Work we perform in the Designer and Workflow Manager is stored in folders. If we want to share metadata, you can configure a folder to be shared.
• View metadata: Analyze sources, targets, mappings, and shortcut dependencies, search by keyword, and view the properties of repository objects.

We create repository objects using the Designer and Workflow Manager client tools. We can view the following objects in the Navigator window of the Repository Manager:
• Source definitions: Definitions of database objects (tables, views, synonyms) or files that provide source data.
• Target definitions: Definitions of database objects or files that contain the target data.
• Mappings: A set of source and target definitions along with transformations containing the business logic that you build into the transformation. These are the instructions that the Integration Service uses to transform and move data.
• Reusable transformations: Transformations that we use in multiple mappings.
• Mapplets: A set of transformations that you use in multiple mappings.
• Sessions and workflows: Sessions and workflows store information about how and when the Integration Service moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.

4. Workflow Manager
Use the Workflow Manager to create, schedule, and run workflows. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. The Workflow Manager has the following tools to help us develop a workflow:
• Task Developer: Create tasks we want to accomplish in the workflow.
• Worklet Designer: Create a worklet in the Worklet Designer. A worklet is an object that groups a set of tasks. A worklet is similar to a workflow, but without scheduling information. We can nest worklets inside a workflow.
• Workflow Designer: Create a workflow by connecting tasks with links in the Workflow Designer. You can also create tasks in the Workflow Designer as you develop the workflow.
When we create a workflow in the Workflow Designer, we add tasks to the workflow. The Workflow Manager includes tasks such as the Session task, the Command task, and the Email task so you can design a workflow. The Session task is based on a mapping we build in the Designer. We then connect tasks with links to specify the order of execution for the tasks we created. Use conditional links and workflow variables to create branches in the workflow.

5. Workflow Monitor
Use the Workflow Monitor to monitor scheduled and running workflows for each Integration Service. We can view details about a workflow or task in Gantt Chart view or Task view. We can run, stop, abort, and resume workflows from the Workflow Monitor. We can view session and workflow log events in the Workflow Monitor Log Viewer. The Workflow Monitor displays workflows that have run at least once. It continuously receives information from the Integration Service and Repository Service, and it also fetches information from the repository to display historic information.

Services Behind the Scenes

INTEGRATION SERVICE PROCESS
The Integration Service starts an Integration Service process to run and monitor workflows. The Integration Service process accepts requests from the Power Center Client and from pmcmd. It performs the following tasks:
• Manages workflow scheduling.
• Locks and reads the workflow.
• Reads the parameter file.
• Creates the workflow log.
• Runs workflow tasks and evaluates the conditional links connecting tasks.
• Starts the DTM process or processes to run the session.
• Writes historical run information to the repository.
• Sends post-session email in the event of a DTM failure.

LOAD BALANCER
The Load Balancer is a component of the Integration Service that dispatches tasks to achieve optimal performance and scalability. When we run a workflow, the Load Balancer dispatches the Session, Command, and predefined Event-Wait tasks within the workflow. The Load Balancer dispatches tasks in the order it receives them. When the Load Balancer needs to dispatch more Session and Command tasks than the Integration Service can run, it places the tasks it cannot run in a queue. When nodes become available, the Load Balancer dispatches tasks from the queue in the order determined by the workflow service level.

DTM PROCESS
When the workflow reaches a session, the Integration Service process starts the DTM process. The DTM is the process associated with the session task. The DTM process performs the following tasks:
• Retrieves and validates session information from the repository.
• Performs pushdown optimization when the session is configured for pushdown optimization.
• Adds partitions to the session when the session is configured for dynamic partitioning.
• Expands the service process variables, session parameters, and mapping variables and parameters.
• Creates the session log.
• Validates source and target code pages.
• Verifies connection object permissions.
• Runs pre-session shell commands, stored procedures, and SQL.
• Sends a request to start worker DTM processes on other nodes when the session is configured to run on a grid.
• Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.
• Runs post-session stored procedures, SQL, and shell commands.
• Sends post-session email.

PROCESSING THREADS
The DTM allocates process memory for the session and divides it into buffers. This is also known as buffer memory. The default memory allocation is 12,000,000 bytes. The DTM uses multiple threads to process data in a session. The main DTM thread is called the master thread. The master thread can create the following types of threads:
• Mapping threads: One mapping thread for each session.
• Pre- and post-session threads: One thread created.
• Reader threads: One thread for each partition.
• Transformation threads: One thread for each partition.
• Writer threads: One thread for each partition.

CODE PAGES AND DATA MOVEMENT
A code page contains the encoding to specify characters in a set of one or more languages. An encoding is the assignment of a number to a character in the character set. The Integration Service can move data in either ASCII or Unicode data movement mode. These modes determine how the Integration Service handles character data. We choose the data movement mode in the Integration Service configuration settings. If we want to move multibyte data, choose Unicode data movement mode.
• ASCII data movement mode: In ASCII mode, the Integration Service recognizes 7-bit ASCII and EBCDIC characters and stores each character in a single byte.
• Unicode data movement mode: Use Unicode data movement mode when sources or targets use 8-bit or multibyte character sets and contain character data.
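Because the Integration Service process also accepts requests from pmcmd, workflows can be started and checked from the command line or an external scheduler. A minimal sketch, assuming a domain Domain_Dev, an Integration Service IS_Dev, and a workflow wf_Load_DW in folder Practice (all names and the password are placeholders):

    pmcmd startworkflow -sv IS_Dev -d Domain_Dev -u Administrator -p password -f Practice wf_Load_DW
    pmcmd getworkflowdetails -sv IS_Dev -d Domain_Dev -u Administrator -p password -f Practice wf_Load_DW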
Try Your Hands on the Admin Console

Repository Manager
Repository Manager tasks:
• Add domain connection information
• Add and connect to a repository
• Work with Power Center domain and repository connections
• Search for repository objects or keywords
• View object dependencies
• Compare repository objects
• Truncate session and workflow log entries
• View user connections
• Release locks
• Exchange metadata with other business intelligence tools

Add a repository to the Navigator, and then configure the domain connection information when we connect to the repository.

1. Adding a Repository to the Navigator
1. In any of the Power Center Client tools, click Repository > Add.
2. Enter the name of the repository and a valid repository user name.
3. Click OK.
Before we can connect to the repository for the first time, we must configure the connection information for the domain that the repository belongs to.

2. Configuring a Domain Connection
1. In a Power Center Client tool, select the Repositories node in the Navigator.
2. Click Repository > Configure Domains to open the Configure Domains dialog box.
3. Click the Add button. The Add Domain dialog box appears. (We can also click the More button to add, change, or view domain information.)
4. Enter the domain name, gateway host name, and gateway port number.
5. Click OK to add the domain connection.

3. Connecting to a Repository
1. Launch a Power Center Client tool.
2. Select the repository in the Navigator and click Repository > Connect, or double-click the repository.
3. Enter a valid repository user name and password.
4. Click Connect.

4. Viewing Object Dependencies
Before we change or delete repository objects, we can view dependencies to see the impact on other objects. For example, before we remove a session, we can find out which workflows use the session. We can view dependencies for repository objects in the Repository Manager, Workflow Manager, and Designer tools. Steps:
1. In the Repository Manager, connect to the repository.
2. Select the object of interest in the Navigator.
3. Click Analyze and select the dependency we want to view.

5. Validating Multiple Objects
We can validate multiple objects in the repository without fetching them into the workspace. We can validate sessions, mappings, mapplets, workflows, and worklets. We can save and optionally check in objects that change from invalid to valid status as a result of the validation. Steps:
1. Select the objects you want to validate.
2. Click Analyze and select Validate.
3. Select validation options from the Validate Objects dialog box.
4. Click Validate.
5. Click a link to view the objects in the results group.

6. Comparing Repository Objects
We can compare two repository objects of the same type to identify differences between the objects. For example, we can compare two sessions to check for differences. When we compare two objects, the Repository Manager displays their attributes. Steps:
1. In the Repository Manager, connect to the repository.
2. In the Navigator, select the object you want to compare.
3. Click Edit > Compare Objects.
4. Click Compare in the dialog box displayed.

7. Truncating Workflow and Session Log Entries
When we configure a session or workflow to archive session logs or workflow logs, the Integration Service saves those logs in local directories. The repository also creates an entry for each saved workflow log and session log. If we move or delete a session log or workflow log from the workflow log directory or session log directory, we can remove the entries from the repository. Steps:
1. In the Repository Manager, select the workflow in the Navigator window or in the Main window.
2. Choose Edit > Truncate Log. The Truncate Workflow Log dialog box appears.
3. Choose to delete all workflow and session log entries, or to delete all workflow and session log entries with an end time before a particular date.
4. If you want to delete all entries older than a certain date, enter the date and time.
5. Click OK.

8. Managing User Connections and Locks
In the Repository Manager, we can view and manage the following items:
• Repository object locks: The repository locks repository objects and folders by user. The Repository Service locks and unlocks all objects in the repository. The repository creates different types of locks depending on the task:
  1. In-use lock: Placed on objects we want to view.
  2. Write-intent lock: Placed on objects we want to modify.
  3. Execute lock: Placed on objects we want to run, such as workflows and sessions.
• User connections: Use the Repository Manager to monitor user connections to the repository. We can end connections when necessary.
Steps:
1. Launch the Repository Manager and connect to the repository.
2. Click Edit > Show User Connections or Show Locks.
3. The locks or user connections will be displayed in a window.

9. Managing Users and Groups
Steps:
1. In the Repository Manager, connect to the repository.
2. Click Security > Manage Users and Privileges.
3. Click the Groups tab to create groups.
4. Click the Users tab to create users.
5. Click the Privileges tab to give permissions to groups and users.
From these tabs, select the options available to add, edit, and remove users and groups.
There are two default repository user groups:
• Administrators: This group initially contains two users that are created by default. The default users are Administrator and the database user that created the repository. We cannot delete these users from the repository or remove them from the Administrators group.
• Public: The Repository Manager does not create any default users in the Public group.

10. Working with Folders
We can create, edit, or delete folders as per our need:
1. In the Repository Manager, connect to the repository.
2. Click Folder > Create.
3. Enter the folder information.
4. Click OK.

Difference Between 7.1 and 8.6
1. Target from transformation: In Informatica 8.x we can create a target from a transformation by dragging the transformation into the Target Designer.
2. Pushdown optimization: Increases performance by pushing transformation logic to the database. The Integration Service analyzes the transformations and issues SQL statements to sources and targets, and only processes the transformation logic that it cannot push to the database.
3. New functions in the Expression Editor: New functions have been introduced in Informatica 8.x, such as reg_extract and reg_match.
4. Repository queries are available in both versioned and non-versioned repositories; previously they were available only for versioned repositories.
5. UDF (user-defined function), similar to a macro in Excel.
6. FTP: We can have partitioned FTP targets and indirect FTP file sources (with a file list).
7. Propagating port descriptions: In Informatica 8 we can edit a port description and propagate the description to other transformations in the mapping.
8. Environment SQL enhancements: Environment SQL can still be used to execute an SQL statement at the start of a connection to the database. We can use SQL commands that depend upon a transaction being opened during the entire read or write process. For example, the following SQL command modifies how the session handles characters: Alter session set NLS_DATE_FORMAT='DD/MM/YYYY'.
9. Concurrently write to multiple files in a session with partitioned targets.
10. Flat file enhancements:
• Reduced conversion of data types
• Delimited file performance has improved
• Flat files can now have integer and double data types
• Data can be appended to existing flat files

Informatica Power Center 8 has the following features, which make it more powerful and easier to use and manage than previous versions:
• Supports service-oriented architecture
• Access to structured, unstructured, and semi-structured data
• Support for grid computing
• High availability
• Pushdown optimization
• Dynamic partitioning
• Metadata exchange enhancements
• Team-based development
• Global web-based Admin Console
• New transformations
• 23 new functions
• User-defined functions
• Custom transformation enhancements
• Flat file enhancements
• New Data Federation option
• Enterprise grid

Testing

Unit Testing
Unit testing can be broadly classified into two categories.

Quantitative Testing
Validate your source and target:
a) Ensure that your connectors are configured properly.
b) If you are using a flat file, make sure you have enough read/write permission on the file share.
c) You need to document all the connector information.

Analyze the load time:
a) Execute the session and review the session statistics.
b) Check the Read and Write counters. How long does it take to perform the load?
c) Use the session and workflow logs to capture the load statistics.
d) You need to document all the load timing information.

Analyze the success rows and rejections:
a) Have customized SQL queries to check the source/targets; here we will perform the record count verification.
b) Analyze the rejections and build a process to handle those rejections. This requires a clear business requirement from the business on how to handle the data rejections: do we need to reload, or reject and inform, etc.? Discussions are required and an appropriate process must be developed.

Performance improvement:
a) Network performance
b) Session performance
c) Database performance
d) Analyze and, if required, define the Informatica and database partitioning requirements.

Qualitative Testing
Analyze and validate your transformation business rules. This is more of a functional test.
e) You need to review field by field from source to target and ensure that the required transformation logic is applied.
f) If you are making changes to existing mappings, make use of the data lineage feature available with Informatica Power Center. This will help you find the consequences of altering or deleting a port from an existing mapping.
g) Ensure that appropriate dimension lookups have been used and your development is in sync with your business requirements.

Integration Testing
After unit testing is complete, it should form the basis of starting integration testing. Integration testing should test out initial and incremental loading of the data warehouse. Integration testing will involve the following:
1. Sequence of ETL jobs in batch.
2. Initial loading of records on the data warehouse.
3. Incremental loading of records at a later date to verify the newly inserted or updated data.
4. Testing the rejected records that don't fulfill transformation rules.
5. Error log generation.
Integration testing would cover end-to-end testing for the DWH.
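For the record count verification mentioned above, a pair of simple reconciliation queries against source and target is usually enough. A minimal sketch, assuming a source table SRC_ORDERS and a warehouse table DWH_ORDERS loaded for one business date (table and column names are illustrative only, not from the original project):

    -- Row counts should match, or differ only by the documented rejects
    SELECT COUNT(*) FROM SRC_ORDERS WHERE ORDER_DATE = TO_DATE('01-08-2004','DD-MM-YYYY');
    SELECT COUNT(*) FROM DWH_ORDERS WHERE ORDER_DATE = TO_DATE('01-08-2004','DD-MM-YYYY');

    -- Compare an aggregate as a simple control total as well
    SELECT SUM(ORDER_AMOUNT) FROM SRC_ORDERS WHERE ORDER_DATE = TO_DATE('01-08-2004','DD-MM-YYYY');
    SELECT SUM(ORDER_AMOUNT) FROM DWH_ORDERS WHERE ORDER_DATE = TO_DATE('01-08-2004','DD-MM-YYYY');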
The coverage of these tests would include the following:

Count Validation
Record count verification: DWH backend/reporting queries against source and target as an initial check.

Dimensional Analysis
Data integrity between the various source tables and relationships.

Statistical Analysis
Validation for various calculations.
• When you validate the calculations, you do not need to load all the rows into the target and validate them. Instead, use the Enable Test Load feature available in Informatica Power Center (see Test Load Options below).
• Enter the number of source rows you want to test in the Number of Rows to Test field.

Data Quality Validation
Check for missing data, negatives, and consistency. Field-by-field data verification can be done to check the consistency of source and target data. Typical validity checks include:
• Overflow checks: A limit check based on the capacity of a data field or data file area to accept data. This programming technique can be used to detect the truncation of a financial or quantity data field value after computation (for example, addition, multiplication, and division). Usually, the first digit is the one lost.
• Format checks: Used to determine that data are entered in the proper mode, as numeric or alphabetical characters, within designated fields of information. The proper mode in each case depends on the data field definition.
• Sign test: A test for a numeric data field containing a designation of an algebraic sign, + or -, which can be used to denote, for example, debits or credits for financial data fields.
• Size test: Used to test the full size of the data field. For example, a social security number in the United States should have nine digits.
• Limit checks: The program tests specified data fields against defined high or low value limits (for example, quantities or dollars) for acceptability before further processing.
• Control totals: To ensure accuracy in data entry and processing, control totals can be compared by the system with manually entered or otherwise calculated control totals using data fields such as quantities, line items, documents, or dollars, or simple record counts.
• Hash totals: A technique for improving data accuracy, whereby totals are obtained on identifier fields (i.e., fields for which it would logically be meaningless to construct a total), such as account number, social security number, part number, or employee number. These totals have no significance other than for internal system control purposes.

Granularity
Validate at the lowest granular level possible.

Other validations
Audit trails, transaction logs, error logs, and validity checks.

Note: Based on your project and business needs you might have additional testing requirements.

User Acceptance Test
In this phase you will involve the user to test the end results and ensure that the business is satisfied with the quality of the data. Any changes to the business requirement will follow the change management process, and eventually those changes have to follow the SDLC process.

Optimize Development, Testing, and Training Systems
• Dramatically accelerate development and test cycles and reduce storage costs by creating fully functional, smaller targeted data subsets for development, testing, and training systems, while maintaining full data integrity.
• Quickly build and update nonproduction systems with a small subset of production data and replicate current subsets of nonproduction copies faster.
• Simplify test data management and shrink the footprint of nonproduction systems to significantly reduce IT infrastructure and maintenance costs.
• Reduce application and upgrade deployment risks by properly testing configuration updates with up-to-date, realistic data before introducing them into production.
• Lower training costs by standardizing on one approach and one infrastructure.
• Train employees effectively using reliable, production-like data in training systems.

Support Corporate Divestitures and Reorganizations
• Untangle complex operational systems and separate data along business lines to quickly build the divested organization's system.
• Accelerate the provisioning of new systems by using only data that is relevant to the divested organization.
• Decrease the cost and time of data divestiture with no reimplementation costs.
• Easily customize provisioning rules to meet each organization's changing business requirements.

Reduce the Total Cost of Storage Ownership
• Dramatically increase an IT team's productivity by reusing a comprehensive list of data objects for data selection and updating processes across multiple projects, instead of coding by hand, which is expensive, resource intensive, and time consuming.
• Accelerate application delivery by decreasing R&D cycle time and streamlining test data management.
• Improve the reliability of application delivery by ensuring IT teams have ready access to updated, quality production data.
• Lower administration costs by centrally managing data growth solutions across all packaged and custom applications.
• Substantially accelerate time to value for subsets of packaged applications.
• Decrease maintenance costs by eliminating custom code and scripting.

Informatica Power Center Testing
• Debugger: A very useful tool for debugging a valid mapping to gain troubleshooting information about data and error conditions. Refer to the Informatica documentation to know more about the Debugger tool.
• Test Load Options – Relational Targets:
  Property: Enable Test Load. Description: You can configure the Integration Service to perform a test load. With a test load, the Integration Service reads and transforms data without writing to targets. The Integration Service generates all session files and performs all pre- and post-session functions, as if running the full session. The Integration Service writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the Integration Service does not write data to the targets. You can perform a test load for relational targets when you configure a session for normal mode. If you configure the session for bulk mode, the session fails. You cannot perform a test load on sessions using XML sources.
  Property: Number of Rows to Test. Description: Enter the number of source rows you want the Integration Service to test load. The Integration Service reads the number of rows you configure for the test load.
• Running the Integration Service in Safe Mode:
  - Test a development environment. Run the Integration Service in safe mode to test a development environment before migrating to production.
  - Troubleshoot the Integration Service. Configure the Integration Service to fail over in safe mode and troubleshoot errors when you migrate or test a production environment configured for high availability. After the Integration Service fails over in safe mode, you can correct the error that caused the Integration Service to fail over.
• Syntax testing: Test your customized queries using your source qualifier before executing the session.
• Share metadata: You can share metadata with a third party. For example, you want to send a mapping to someone else for testing or analysis, but you do not want to disclose repository connection information for security reasons. You can export the mapping to an XML file and edit the repository connection information before sending the XML file. The third party can import the mapping from the XML file and analyze the metadata.

Performance Testing for identifying the following bottlenecks:
• Target
• Source
• Mapping
• Session
• System
Use the following methods to identify performance bottlenecks:
• Run test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.
• Analyze performance details, such as performance counters, to determine where session performance decreases.
• Analyze thread statistics to determine the optimal number of partition points.
• Monitor system performance. You can use system monitoring tools to view the percentage of CPU use, I/O waits, and paging to identify system bottlenecks. You can also use the Workflow Monitor to view system resource usage.
• Use the Power Center conditional filter in the Source Qualifier to improve performance.
• Debugger. You can debug a valid mapping to gain troubleshooting information about data and error conditions. To debug a mapping, you configure and run the Debugger from within the Mapping Designer. The Debugger uses a session to run the mapping on the Integration Service. When you run the Debugger, it pauses at breakpoints and you can view and edit transformation output data.
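As an illustration of the conditional-filter point above, pushing a filter into the Source Qualifier keeps unneeded rows out of the pipeline entirely. A minimal sketch of a source filter and an equivalent SQL override, assuming a source table ORDERS with a STATUS column (all names are illustrative):

    -- Source Qualifier: Source Filter property (entered without the WHERE keyword)
    ORDERS.STATUS = 'ACTIVE'

    -- Or as a full SQL override in the Source Qualifier
    SELECT ORDERS.ORDER_ID, ORDERS.CUSTOMER_ID, ORDERS.ORDER_AMOUNT
    FROM ORDERS
    WHERE ORDERS.STATUS = 'ACTIVE'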
Debugger
You might want to run the Debugger in the following situations:
• Before you run a session. After you save a mapping, you can run some initial tests with a debug session before you create and configure a session in the Workflow Manager.
• After you run a session. If a session fails or if you receive unexpected results in the target, you can run the Debugger against the session. You might also want to run the Debugger against a session if you want to debug the mapping using the configured session properties.

Debugger Session Types
You can select three different Debugger session types when you configure the Debugger. The Debugger runs a workflow for each session type. You can choose from the following Debugger session types:
• Use an existing non-reusable session. The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs the non-reusable session and the existing workflow. The Debugger does not suspend on error.
• Use an existing reusable session. The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs a debug instance of the reusable session and creates and runs a debug workflow for the session.
• Create a debug session instance. You can configure source, target, and session configuration properties through the Debugger Wizard. When you run the Debugger, the Integration Service runs a debug instance of the debug workflow and creates and runs a debug workflow for the session. When you create a debug session, you configure a subset of session properties within the Debugger Wizard, such as source and target location. You can also choose to load or discard target data.

Debug Process
To debug a mapping, complete the following steps:
1. Create breakpoints. Create breakpoints in a mapping where you want the Integration Service to evaluate data and error conditions.
2. Configure the Debugger. Use the Debugger Wizard to configure the Debugger for the mapping. Select the session type the Integration Service uses when it runs the Debugger.
3. Run the Debugger. Run the Debugger from within the Mapping Designer. When you run the Debugger, the Designer connects to the Integration Service. The Integration Service initializes the Debugger and runs the debugging session and workflow. The Integration Service reads the breakpoints and pauses the Debugger when the breakpoints evaluate to true.
4. Monitor the Debugger. While you run the Debugger, you can monitor the target data, transformation and mapplet output data, the debug log, and the session log. When you run the Debugger, the Designer displays the following windows:
• Debug log. View messages from the Debugger.
• Target window. View target data.
• Instance window. View transformation data.
5. Modify data and breakpoints. When the Debugger pauses, you can modify data and see the effect on transformations, mapplets, and targets as the data moves through the pipeline. You can also modify breakpoint information.
The Designer saves mapping breakpoint and Debugger information in the workspace files. You can copy breakpoint information and the Debugger configuration to another mapping. If you want to run the Debugger from another Power Center Client machine, you can copy the breakpoint information and the Debugger configuration to the other Power Center Client machine.

Running the Debugger
When you complete the Debugger Wizard, the Integration Service starts the session and initializes the Debugger. After initialization, the Debugger moves in and out of running and paused states based on breakpoints and commands that you issue from the Mapping Designer. The Debugger can be in one of the following states:
• Initializing. The Designer connects to the Integration Service.
• Running. The Integration Service processes the data.
• Paused. The Integration Service encounters a break and pauses the Debugger.
Note: To enable multiple users to debug the same mapping at the same time, each user must configure different port numbers in the Tools > Options > Debug tab. The Debugger does not use the high availability functionality.

Monitoring the Debugger
When you run the Debugger, you can monitor the following information:
• Session status. Monitor the status of the session.
• Data movement. Monitor data as it moves through transformations.
• Breakpoints. Monitor data that meets breakpoint conditions.
• Target data. Monitor target data on a row-by-row basis.
The Mapping Designer displays windows and debug indicators that help you monitor the session:
• Debug indicators. Debug indicators on transformations help you follow breakpoints and data flow.
• Instance window. When the Debugger pauses, you can view transformation data and row information in the Instance window.
• Target window. View target data for each target in the mapping.
• Output window. The Integration Service writes messages to the following tabs in the Output window:
  - Debugger tab. The debug log displays in the Debugger tab.
  - Session Log tab. The session log displays in the Session Log tab.
  - Notifications tab. Displays messages from the Repository Service.
While you monitor the Debugger, you might want to change the transformation output data to see the effect on subsequent transformations or targets in the data flow. You might also want to edit or add more breakpoint information to monitor the session more closely.

Restrictions
You cannot change data for the following output ports:
• Normalizer transformation: Generated Keys and Generated Column ID ports.
• Rank transformation: RANKINDEX port.
• Router transformation: All output ports.
• Sequence Generator transformation: CURRVAL and NEXTVAL ports.
• Lookup transformation: NewLookupRow port for a Lookup transformation configured to use a dynamic cache.
• Custom transformation: Ports in output groups other than the current output group.
• Java transformation: Ports in output groups other than the current output group.
Additionally, you cannot change data associated with the following:
• Mapplets that are not selected for debugging
• Input or input/output ports
• Output ports when the Debugger pauses on an error breakpoint

Constraint-Based Loading
In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:
• Active source: Related target tables must have the same active source.
• Key relationships: Target tables must have key relationships.
• Target connection groups: Targets must be in one target connection group.
• Treat rows as insert: Use constraint-based loading when the session option Treat Source Rows As is set to Insert.

Active Source
When target tables receive rows from different active sources, the Integration Service reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets receive data from different active sources, the Integration Service reverts to normal loading for both targets. The third pipeline contains a source, a Normalizer, and two targets. Since these two targets share a single active source (the Normalizer), the Integration Service performs constraint-based loading: loading the primary key table first, then the foreign key table.

Key Relationships
When target tables have no key relationships, the Integration Service does not perform constraint-based loading. Similarly, when target tables have circular key relationships, the Integration Service reverts to a normal load. For example, you have one target containing a primary key and a foreign key related to the primary key in a second target. The second target also contains a foreign key that references the primary key in the first target. The Integration Service cannot enforce constraint-based loading for these tables; it reverts to a normal load.

Target Connection Groups
The Integration Service enforces constraint-based loading for targets in the same target connection group. If you want to specify constraint-based loading for multiple targets that receive data from the same active source, you must verify that the tables are in the same target connection group. If the tables with the primary key-foreign key relationship are in different target connection groups, the Integration Service cannot enforce constraint-based loading when you run the workflow. To verify that all targets are in the same target connection group, complete the following tasks:
• Verify all targets are in the same target load order group and receive data from the same active source.
• Use the default partition properties and do not add partitions or partition points.
• Define the same target type for all targets in the session properties.
• Define the same database connection name for all targets in the session properties.
• Choose normal mode for the target load type for all targets in the session properties.

Treat Rows as Insert
Use constraint-based loading when the session option Treat Source Rows As is set to Insert; use this option when you insert into the target. You might get inconsistent data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading. When the mapping contains Update Strategy transformations and you need to load data to a primary key table first, split the mapping using one of the following options:
• Load the primary key table in one mapping and the dependent tables in another mapping. Use constraint-based loading to load the primary table.
• Perform inserts in one mapping and updates in another mapping.
Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order in which the Integration Service reads the sources in each target load order group in the mapping. A target load order group is a collection of source qualifiers, transformations, and targets linked together in a mapping. Constraint-based loading establishes the order in which the Integration Service loads individual targets within a set of targets receiving data from a single source qualifier.

Example
The following mapping is configured to perform constraint-based loading:
In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key. Since these tables receive records from a single active source, the Integration Service loads rows to the targets in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The Integration Service loads T_1 first because it has no foreign key dependencies and contains a primary key referenced by T_2 and T_3. The Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no dependencies on each other, they are not loaded in any particular order. The Integration Service loads T_4 last, because it has a foreign key that references a primary key in T_3.
T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target and you use the default partition properties. T_5 and T_6 are in another target connection group together if you use the same database connection for each target and you use the default partition properties. The Integration Service includes T_5 and T_6 in a different target connection group because they are in a different target load order group from the first four targets.
After loading the first set of targets, the Integration Service begins reading source B. If T_6 has a foreign key that references a primary key in T_5, then since these tables receive records from a single active source, the Integration Service loads rows to the tables in the following order:
1. T_5
2. T_6
If there are no key relationships between T_5 and T_6, the Integration Service reverts to a normal load for these targets.

Enabling Constraint-Based Loading
To enable constraint-based loading:
1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property.
2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.
3. Click OK.

Target Load Order
When you use a mapplet in a mapping, the Mapping Designer lets you set the target load plan for sources within the mapplet. In the Designer, you can set the order in which the Integration Service sends rows to targets in different target load order groups in a mapping. A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping. You can set the target load order if you want to maintain referential integrity when inserting, deleting, or updating tables that have primary key and foreign key constraints.
The Integration Service reads sources in a target load order group concurrently, and it processes target load order groups sequentially. To specify the order in which the Integration Service sends data to targets, create one source qualifier for each target within a mapping. To set the target load order, you then determine the order in which the Integration Service reads each source in the mapping.
The following figure shows two target load order groups in one mapping. In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and T_ITEMS. The second target load order group includes all other objects in the mapping, including the TOTAL_ORDERS target and the Aggregator AGGTRANS. The Integration Service processes the first target load order group, and then the second target load order group. When it processes the second target load order group, it reads data from both sources at the same time.

Setting the Target Load Order
You can configure the target load order for a mapping containing any type of target definition.
To set the target load order:
1. Create a mapping that contains multiple target load order groups.
2. Click Mappings > Target Load Plan. The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping and the targets that receive data from each source qualifier.
3. Select a source qualifier from the list.
4. Click the Up and Down buttons to move the source qualifier within the load order.
5. Repeat steps 3 to 4 for other source qualifiers you want to reorder.
6. Click OK.
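Both constraint-based loading and target load ordering only matter because of key constraints like the ones below. A minimal SQL sketch of the T_1/T_2 style relationship used in the example above; the column definitions are illustrative and not taken from the original mapping:

    CREATE TABLE T_1 (
        ITEM_ID   NUMBER PRIMARY KEY,
        ITEM_NAME VARCHAR2(50)
    );

    CREATE TABLE T_2 (
        ORDER_ID  NUMBER PRIMARY KEY,
        ITEM_ID   NUMBER REFERENCES T_1 (ITEM_ID)   -- child rows need the T_1 row to exist first
    );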
Advanced Concepts

MAPPING PARAMETERS AND VARIABLES
Mapping parameters and variables represent values in mappings and mapplets. When we use a mapping parameter or variable in a mapping, we first declare the mapping parameter or variable for use in each mapplet or mapping. Then we define a value for the mapping parameter or variable before we run the session.

MAPPING PARAMETERS
• A mapping parameter represents a constant value that we can define before running a session.
• A mapping parameter retains the same value throughout the entire session.
• After we create a parameter, it appears in the Expression Editor. We can then use the parameter in any expression in the mapplet or mapping.
• We can also use parameters in a source qualifier filter, user-defined join, or extract override, and in the Expression Editor of reusable transformations.
Example: When we want to extract records of a particular month during the ETL process, we will create a mapping parameter of date/time data type and use it in the query to compare it with the timestamp field in the SQL override.

MAPPING VARIABLES
• Unlike mapping parameters, mapping variables are values that can change between sessions.
• The Integration Service saves the latest value of a mapping variable to the repository at the end of each successful session.
• We might use a mapping variable to perform an incremental read of the source.
• We can override a saved value with the parameter file.
• We can also clear all saved values for the session in the Workflow Manager.
Mapping variables are used in the following transformations:
• Expression
• Filter
• Router
• Update Strategy

Initial and Default Value
When we declare a mapping parameter or variable in a mapping or a mapplet, we can enter an initial value. When the Integration Service needs an initial value and we did not declare one for the parameter or variable, the Integration Service uses a default value based on the data type of the parameter or variable:
Data type -> Default value
Numeric -> 0
String -> Empty string
Date/time -> 1/1/1

Variable Values: Start value and current value of a mapping variable

Start Value
The start value is the value of the variable at the start of the session. The Integration Service looks for the start value in the following order:
1. Value in parameter file
2. Value saved in the repository
3. Initial value
4. Default value

Current Value
The current value is the value of the variable as the session progresses. When a session starts, the current value of a variable is the same as the start value. The final current value for a variable is saved to the repository at the end of a successful session. When a session fails to complete, the Integration Service does not update the value of the variable in the repository.

Example: We have a source table containing time-stamped transactions and we want to evaluate the transactions on a daily basis. Instead of manually entering a session override to filter source data each time we run the session, we can create a mapping variable, $$IncludeDateTime. In the source qualifier, create a filter to read only rows whose transaction date equals $$IncludeDateTime, such as:
TIMESTAMP = $$IncludeDateTime
In the mapping, use a variable function to set the variable value to increment one day each time the session runs. If we set the initial value of $$IncludeDateTime to 8/1/2004, the first time the Integration Service runs the session, it reads only rows dated 8/1/2004. During the session, the Integration Service sets $$IncludeDateTime to 8/2/2004 and saves 8/2/2004 to the repository at the end of the session. The next time it runs the session, it reads only rows from August 2, 2004.

Variable Data Type and Aggregation Type
When we declare a mapping variable in a mapping, we need to configure the data type and aggregation type for the variable. The Integration Service uses the aggregation type of a mapping variable to determine the final current value of the mapping variable. At the end of a session, it compares the final current value of the variable to the start value of the variable and, based on the aggregation type, saves a final value to the repository. Aggregation types are:
• Count: Only integer and small integer data types are valid.
• Max: All transformation data types except the binary data type are valid.
• Min: All transformation data types except the binary data type are valid.

Variable Functions
Variable functions determine how the Integration Service calculates the current value of a mapping variable in a pipeline.
• SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type is set to Max.
• SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type is set to Min.
• SetCountVariable: Increments the variable value by one. It adds one to the variable value when a row is marked for insertion, and subtracts one when the row is marked for deletion. It ignores rows marked for update or reject. Aggregation type is set to Count.
• SetVariable: Sets the variable to the configured value.
Note: If a variable function is not used to calculate the current value of a mapping variable, the start value of the variable is saved to the repository.

Creating Mapping Parameters and Variables
1. Open the folder where we want to create the parameter or variable.
2. In the Mapping Designer, click Mappings > Parameters and Variables (in the Mapplet Designer, click Mapplet > Parameters and Variables).
3. Click the Add button.
4. Enter the name. Do not remove $$ from the name.
5. Select Type and Data type. Select Aggregation type for mapping variables (Count is visible only when the data type is INT or SMALLINT).
6. Give an initial value.
7. Click OK.

Example: Use of Mapping Parameters and Variables
• EMP will be the source table.
• Create a target table MP_MV_EXAMPLE having columns: EMPNO, ENAME, DEPTNO, TOTAL_SAL, MAX_VAR, MIN_VAR, COUNT_VAR and SET_VAR.
• TOTAL_SAL = SAL + COMM + $$Bonus ($$Bonus is a mapping parameter that changes every month).
• SET_VAR: We will add one month to the HIREDATE of every employee.
• Create shortcuts as necessary.

Creating the Mapping
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_mp_mv_example
4. Drag EMP and the target table.
5. Transformation -> Create -> Select Expression from the list -> Create -> Done.
6. Drag EMPNO, ENAME, HIREDATE, SAL, COMM and DEPTNO to the Expression.
7. Create parameter $$Bonus and give the initial value as 200.
8. Create variable $$var_max of MAX aggregation type and initial value 1500.
9. Create variable $$var_min of MIN aggregation type and initial value 1500.
10. Create variable $$var_count of COUNT aggregation type and initial value 0.
11. Create variable $$var_set of MAX aggregation type.
12. Create 5 output ports: out_TOTAL_SAL, out_MAX_VAR, out_MIN_VAR, out_COUNT_VAR and out_SET_VAR.
13. Open the Expression Editor for out_TOTAL_SAL. Do the same as we did earlier for SAL + COMM; to add $$Bonus to it, select the variable tab and select the parameter from the mapping parameters:
SAL + COMM + $$Bonus
14. Validate the expression and click OK.
15. Open the Expression Editor for out_MAX_VAR. Select the variable function SETMAXVARIABLE from the left-side pane, then select $$var_max from the variable tab and SAL from the ports tab, as shown below:
SETMAXVARIABLE($$var_max, SAL)
16. Validate the expression and click OK.
17. Open the Expression Editor for out_MIN_VAR and write the following expression, then validate it:
SETMINVARIABLE($$var_min, SAL)
18. Open the Expression Editor for out_COUNT_VAR and write the following expression, then validate it:
SETCOUNTVARIABLE($$var_count)
19. Open the Expression Editor for out_SET_VAR and write the following expression, then validate it and click OK:
SETVARIABLE($$var_set, ADD_TO_DATE(HIREDATE, 'MM', 1))
20. Link all ports from the Expression transformation to the target, then validate the mapping and save it. See the mapping picture on the next page.
21. Make a session and workflow. Give connection information for the source and target tables.
22. Run the workflow and see the result.

PARAMETER FILE
• A parameter file is a list of parameters and associated values for a workflow, worklet, or session.
• Parameter files provide flexibility to change these values each time we run a workflow or session.
• We can create multiple parameter files and change the file we use for a session or workflow.
• We can create a parameter file using a text editor such as WordPad or Notepad.
• Enter the parameter file name and directory in the workflow or session properties.

A parameter file contains the following types of parameters and variables:
• Workflow variable: References values and records information in a workflow.
• Worklet variable: References values and records information in a worklet. Use predefined worklet variables in a parent workflow, but we cannot use workflow variables from the parent workflow in a worklet.
• Session parameter: Defines a value that can change from session to session, such as a database connection or file name.
• Mapping parameter and mapping variable.

USING A PARAMETER FILE
Parameter files contain several sections, each preceded by a heading. The heading identifies the Integration Service, Integration Service process, workflow, worklet, or session to which we want to assign parameters or variables. In the parameter file, folder and session names are case sensitive.

Sample parameter file for our example: Create a text file in Notepad with the name Para_File.txt:
[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
$$var_min=1200
$$var_count=0
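Tying the pieces together, a mapping parameter defined in a file like the one above can also appear directly in the Source Qualifier SQL override or source filter, so the extraction window changes with the parameter file rather than with a code change. The Integration Service expands the parameter as literal text before sending the query. A minimal sketch against the EMP source, assuming a hypothetical date/time parameter $$Load_Month_Start supplied in the parameter file (the WHERE clause and format mask are illustrative only):

    -- Source Qualifier SQL override referencing a mapping parameter
    SELECT EMPNO, ENAME, HIREDATE, SAL, COMM, DEPTNO
    FROM EMP
    WHERE HIREDATE >= TO_DATE('$$Load_Month_Start', 'MM/DD/YYYY')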
It contains a set of transformations and lets us reuse that transformation logic in multiple mappings.txt or $PMSourceFileDir\Para_File. Pass all ports from expression to Mapplet output. Example1: We will join EMP and DEPT table. 3. Contain unused ports: We do not have to connect all mapplet input and output ports in a mapping. Now Transformation -> Create -> Select Mapplet Out from list –> Create -> Give name and then done. Click Tools -> Mapplet Designer. 9. Use of Mapplet Input transformation is optional. We can create multiple pipelines in a mapplet. Then calculate total salary. 5. We give target table name and key column name as input to mapplet and get the Surrogate key as output. 4. Ex: mplt_example1 4. Accept data from sources in a mapping Include multiple transformations: As many transformations as we need. • • We use Mapplet Input transformation to give input to mapplet. we create a mapplet of these 5 transformations. Click Mapplets-> Create-> Give name. Transformation -> Create -> Select Expression for list -> Create -> Done 7. 2.txt 5. Mapplet Output: The output of a mapplet is not connected to any target table. Now we use this mapplet in all 10 mappings. Pass all ports from joiner to expression and then calculate total salary as described in expression transformation. 2. Repository -> Save Use of mapplet in mapping: . Click Workflows > Edit. Drag EMP and DEPT table.We can specify the parameter file name and directory in the workflow or session properties. When we use the mapplet in a mapping. When the Integration Service runs the session. 5. Drag all ports from mplt_example1 to filter and give filter condition. 2. IS sets partition points at various transformations in the pipeline. . 3.• • • We can mapplet in mapping by just dragging the mapplet from mapplet folder on left pane as we drag source and target tables. Transformation. Number of Partitions • we can define up to 64 partitions at any partition point in a pipeline. Partition points mark thread boundaries and divide the pipeline into stages. PARTITIONING ATTRIBUTES 1. and load for each partition in parallel. Give connection information for mapplet source tables. · Create target table same as Mapplet_out transformation as in picture above. Click Tools -> Mapping Designer. Ex: m_mplt_example1 4. Give connection information for target table. A pipeline consists of a source qualifier and all the transformations and Targets that receive data from that source qualifier. Transformation -> Create -> Select Filter for list -> Create -> Done. Creating Mapping 1. or Writer thread. 7. We can add more transformations after filter if needed. 8. Partition points • • • By default. These are referred to as the mapplet input and mapplet output ports. Click Mapping-> Create-> Give name. • • • • • • Make session and workflow. Run workflow and see result. · mplt_example1 will be source. transformation. Connect all ports from filter to target. Making a mapping: We will use mplt_example1. 2. 6. Validate mapping and Save it. By default. Open folder where we want to create the mapping. The number of partitions in any pipeline stage equals the number of Threads in the stage. PARTITIONING A partition is a pipeline stage that executes in a single reader. it can achieve higher Performance by partitioning the pipeline and performing the extract. Drag mplt_Example1 and target table. Make sure to give correct connection information in session. the Integration Service creates one partition in every pipeline stage. 
A stage is a section of a pipeline between any two partition points. the mapplet object displays only the ports from the Input and Output transformations. and then create a filter transformation to filter records whose Total Salary is >= 1500. the Workflow Manager increases or decreases the number of partitions at all Partition points in the pipeline. 3 rd Session partition will receive Data from the remaining 1 DB partition. All rows in a single partition stay in that partition after crossing a pass-Through partition point.• • • When we increase or decrease the number of partitions at any partition point. Use any number of pipeline partitions and any number of database partitions. the Integration Service processes data without Redistributing rows among partitions. The partition type controls how the Integration Service distributes data among partitions at partition points. If we have the Partitioning option. Partition types • • • The Integration Service creates a default partition type at each partition point. PARTITIONING TYPES 1. one of the session partitions receives no data. we can change the partition type. Partitioning a Source Qualifier with Multiple Sources Tables The Integration Service creates SQL queries for database partitions based on the Number of partitions in the database table with the most partitions. one database connection will be used. We can improve performance when the number of pipeline partitions equals the number of database partitions. Database Partitioning Partition Type • • • Database Partitioning with One Source When we use database partitioning with a source qualifier with one source. the Integration Service distributes rows of data evenly to all partitions. the Integration Service generates SQL queries for each database partition and distributes the data from the database partitions among the session partitions Equally. 2. If the session has three partitions and the database table has two partitions. Use pass-through partitioning when we want to increase data throughput. Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets only. increasing the number of partitions or partition points increases the number of threads. The number of partitions we create equals the number of connections to the source or target. Use round-robin partitioning when we need to distribute rows evenly and do not need to group data among partitions. Thus four DB partitions used. Each partition processes approximately the same number of rows. Round Robin Partition Type • • • In round-robin partitioning. 1 st and 2nd session partitions will receive data from 2 database partitions each. 3. but we do not want to increase the number of partitions. For one partition. In pass-through partitioning. when a session has three partitions and the database has five partitions. . For example. This option is purchased separately. Pass-Through Partition Type • • • 3. and Unsorted Aggregator transformations to ensure that rows are grouped Properly before they enter these transformations. . Example: Customer 1-100 in one partition. Steps: 1. Validate the expression using the Validate button. The Expression Editor provides predefined workflow variables. In the Workflow Designer workspace. Hash Auto-Keys Partition Type • • The Integration Service uses all grouped or sorted ports as a compound Partition key. Joiner. In the Expression Editor. enter the link condition. The Expression Editor appears. the Integration Service runs the next task in the workflow by default.4. 2. 
we define the number of ports to generate the partition key. The Integration Service uses a hash function to group rows of data among Partitions. If we do not specify conditions for each link. Key range Partition Type WORKING WITH LINKS • • • Use links to connect each workflow task. We specify one or more ports to form a compound partition key. Valid Workflow : Example of loop: Specifying Link Conditions: • • • Once we create links between tasks. double-click the link you want to specify. 3. Use hash auto-keys partitioning at or before Rank. Use key range partitioning where the sources or targets in the pipeline are Partitioned by key range. user-defined workflow variables. Hash User-Keys Partition Type • • • • • • • 6. Each link in the workflow can run only once. 101-200 in another and so on. Sorter. The Integration Service passes data to each partition depending on the Ranges we specify for each port. The Workflow Manager does not allow us to use links to create loops in the workflow. We Define the range for each partition. 5. we choose the ports that define the partition key . we can specify conditions for each link to determine the order of execution in the workflow. We can specify conditions with links to create branches in the workflow. 4. Use predefined or user-defined workflow variables in the link condition. variable functions. and Boolean and arithmetic operators. Click Apply and OK. all workflows that use the deleted scheduler becomes invalid. Click Add to add a new scheduler. Run Continuously 3. Configure the scheduler settings in the Scheduler tab. There are 3 run options: 1. We can change the schedule settings by editing the scheduler. Configuring Scheduler Settings Configure the Schedule tab of the scheduler to set run options. Scheduler can be non-reusable or reusable. repeat at a given time or interval. Run on Demand 2. We remove the workflow from the schedule The Integration Service is running in safe mode For each folder. 6. we must edit them and replace the missing scheduler. If we delete a folder. 5. enter a name for the scheduler. it reschedules all workflows. The Integration Service does not run the workflow if: The prior workflow run fails. Creating a Reusable Scheduler Steps: 1. schedule options. By default. the workflow runs on demand. In the General tab. Use a reusable scheduler so we do not need to configure the same set of scheduling settings in each workflow. 2.Using the Expression Editor: The Workflow Manager provides an Expression Editor for any expressions in the workflow. the Integration Service removes workflows from the schedule. When we delete a reusable scheduler. Run on Server initialization . start options. and end options for the schedule. In the Workflow Designer. We can enter expressions using the Expression Editor for the following: • • • Link conditions Decision task Assignment task SCHEDULERS We can schedule a workflow to run continuously. 3. The Integration Service runs a scheduled workflow as configured. click Workflows > Schedulers. To make the workflows valid. the Workflow Manager lets us create reusable schedulers so we can reuse the same set of scheduling settings for workflows in the folder. the Integration Service reschedules the workflow according to the new settings. 4. Open the folder where we want to create the scheduler. The Workflow Manager marks a workflow invalid if we delete the scheduler associated with the workflow. • • • • • • • • • • • • A scheduler is a repository object that contains a set of schedule settings. 
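Returning for a moment to link conditions described above: a condition normally tests one or more predefined variables of the task at the start of the link. A small sketch, using the filter session that appears later in this document together with the predefined Status and TgtSuccessRows variables:

$S_M_FILTER_EXAMPLE.Status = SUCCEEDED AND $S_M_FILTER_EXAMPLE.TgtSuccessRows > 0

The downstream task runs only when the whole expression evaluates to true.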
or we can manually start a workflow. If we choose a different Integration Service for the workflow or restart the Integration Service. If we change schedule settings. The Integration Service then starts the next run of the workflow according to settings in Schedule Options. Click the right side of the Scheduler field to edit scheduling settings for the non. Customized Repeat: Integration Service runs the workflow on the dates and times specified in the Repeat dialog box. Note: If we do not have a reusable scheduler in the folder. choose a reusable scheduler from the Scheduler 8. If we select Reusable. 3. Start options for Run on Server initialization: · Start Date · Start Time End options for Run on Server initialization: • • • • End on: IS stops scheduling the workflow in the selected date. we must 5. Click Ok. Run on Server initialization Integration Service runs the workflow as soon as the service is initialized. create one before we choose Reusable. Run Continuously: Integration Service runs the workflow as soon as the service initializes. The Integration Service then starts the next run of the workflow as soon as it finishes the previous run.reusable scheduler 7. End After: IS stops scheduling the workflow after the set number of workflow runs. Run every: Run the workflow at regular intervals. Schedule options for Run on Server initialization: • • • Run Once: To run the workflow just once. Points to Ponder : . open the workflow. 4. Run on Demand: Integration Service runs the workflow when we start the workflow manually. choose Non-reusable. Forever: IS schedules the workflow as long as the workflow does not fail. 3. 9. as configured. Click Workflows > Edit. In the Workflow Designer. Creating a Non-Reusable Scheduler 1.1. In the Scheduler tab. Browser dialog box. Select Reusable if we want to select an existing reusable scheduler for the workflow. 2. 2. 6. See On Success Email Option there and configure it. Enter the fully qualified email address of the mail recipient in the Email User Name field. EMAIL TASK • • Steps: 1. We can set the option to send email on success or failure in components tab of a session task. 5. 2. 4. Select an Email task and enter a name for the task.• • To remove a workflow from its schedule. Edit Session task and go to Components tab. Double-click the Email task in the workspace. The Power Center Server creates several files and in-memory caches depending on the transformations and options used in the session. 5. To run a session. In Value. We can run the Session tasks sequentially or concurrently. Created by Administrator usually and we just drag and use it in our mapping. 6. 8. Click the Properties tab. We can create reusable tasks in the Task Developer. We can run as many sessions in a workflow as we need. 7. Click Create. right-click the workflow in the Navigator window and choose Schedule Workflow. 4. . Enter the subject of the email in the Email Subject field. Example: To send an email when a session completes: Steps: 1. Click OK twice to save your changes. 6. Types of tasks: Task Type Session Email Command Event-Raise Event-Wait Timer Decision Assignment Control SESSION TASK • • • • Tool where task can Reusable or not be created Task Developer Workflow Designer Worklet Designer Workflow Designer Worklet Designer Yes Yes Yes No No No No No No A session is a set of instructions that tells the Power Center Server how and when to move data from sources to targets. Create a workflow wf_sample_email 2. In the Task Developer or Workflow Designer. 
Validate workflow and Repository -> Save • • We can also drag the email task and use as per need. Click the Open button in the Email Text field to open the Email Editor. The Workflow Manager provides an Email task that allows us to send email during a workflow. depending on our needs. choose Tasks-Create. right-click the workflow in the Navigator window and choose Unscheduled Workflow. select the email task to be used. you can leave this field blank. In Type select reusable or Non-reusable. Drag any session task to workspace. To reschedule a workflow on its original schedule. The Edit Tasks dialog box appears. 3. Click Apply -> Ok. 7. WORKING WITH TASKS –Part 1 The Workflow Manager contains many types of tasks to help you build workflows and worklets. Or. Click Done. 8. 9. 3. we must first create a workflow to contain the Session task. Click OK to close the Command Editor. In the Commands tab. Click Create. Steps to create the workflow using command task: 1. Command: COPY D:\sample. or archive target files. Standalone Command task: We can use a Command task anywhere in the workflow or worklet to run shell commands.txt from D drive to E. Double click link between Session and Command and give condition in editor as 6. This is done in COMPONENTS TAB of a session. Pre. 2. In the Name field. Workflow -> Create -> Give name and click ok. 2. Click OK. Steps for creating workflow: .and post-session shell command: We can call a Command task as the pre. 3. 6. enter a name for the new command. Click to Add button to add events and give the names as per need.txt E:\ in windows Steps for creating command task: 1. EVENT WAIT: Event-Wait task waits for a file watcher event or user defined event to occur before executing the next session in the workflow. Open Workflow Designer. Workflow-> Validate 8. 8. 3. Repository –> Save WORKING WITH EVENT TASKS We can define events in the workflow to specify the sequence of task execution. copy a file. Then click done. 2. 4. In the Task Developer or Workflow Designer.txt file is present in D:\FILES folder. Example1: Use an event wait task and make sure that session s_filter_example runs when abc. click the Edit button to open the Command Editor.or post-session shell command for a Session task. We create events and then raise them as per need. 4. 7. Click Workflow-> Edit -> Events tab. Double-click the Command task. 10. Link Start to Session task and Session to Command Task. Select the Value and Type option as we did in Email task.Status=SUCCEEDED 7. we can specify shell commands in the Command task to delete reject files. click the Add button to add a command. 5. Create a task using the above steps to copy a file in Task Developer. User-defined event: A user-defined event is a sequence of tasks in the Workflow. Enter a name for the Command task. choose Tasks-Create. Start is displayed. 3. $S_M_FILTER_EXAMPLE. We use this task to raise a user defined event. We can run it in Pre-Session Command or Post Session Success Command or Post Session Failure Command. 9. For example. Steps for creating User Defined Event: 1. This event Waits for a specified file to arrive at a given location. 5. Types of Events: • • Pre-defined event: A pre-defined event is a file-watch event. Ways of using command task: 1. Types of Events Tasks: • • EVENT RAISE: Event-Raise task represents a user-defined event.COMMAND TASK The Command task allows us to specify one or more shell commands in UNIX or DOS commands in Windows to run during the workflow. 11. 
Drag session say s_m_Filter_example and command task. Select Command Task for the task type. In the Command field. Repeat steps 5-9 to add more commands in the task. Open any workflow where we want to create an event. Validate the workflow and Save it. Enter only one command in the Command Editor. 2. Click Apply -> Ok. Example: to copy a file sample. Go to commands tab. 4. Example: D:\FILES\abc. 7. Right click on event wait task and click EDIT -> EVENTS tab. Click Tasks -> Create -> Select EVENT RAISE from list. WORKING WITH TASKS –Part 2 TIMER TASK The Timer task allows us to specify the period of time to wait before the Power Center Server runs the next task in the workflow. Example 2: Raise a user defined event when session s_m_filter_example succeeds. Right click ER_Example -> EDIT -> Properties Tab -> Open Value for User Defined Event and Select EVENT1 from the list displayed.Status=SUCCEEDED 8. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click ok. 16. the parent workflow.tct 7. 6. ER_Example. 2. Drag s_filter_example to workspace and link it to event wait task. Workflow -> Edit -> Events Tab and add events EVENT1 there. Click link between ER_Example and s_m_filter_example and give the condition $S_M_FILTER_EXAMPLE. Give name. 13. 14. 10. Repository -> Save.Link ER_Example to s_m_filter_example. Apply -> OK. 2. 4. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click ok. Example: Run session s_m_filter_example relative to 1 min after the timer task. Link EW_WAIT to START task. 5. Drag s_m_filter_example and link it to START task. Click create and done. In the blank space. Click Create and then done. Relative time: We instruct the Power Center Server to wait for a specified period of time after the Timer task. give directory and filename to watch. Link Start to Event Wait task. The next task in workflow will run as per the date and time specified. Select User Defined there. 3. Apply -> OK. 12. Right click EW_WAIT -> EDIT-> EVENTS tab. Give name 5. 11. 4. . 9. Capture this event in event wait task and run session S_M_TOTAL_SAL_EXAMPLE Steps for creating workflow: 1. Select the Event1 by clicking Browse Events button. or the top-level workflow starts. Click Create and then done. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT. Select Pre Defined option there. The Timer task has two types of settings: • • Absolute time: We specify the exact date and time or we can choose a user-defined workflow variable to specify the exact time. 3. 6.1. Workflow validate and Repository Save. Click Tasks -> Create -> Select EVENT WAIT from list. Mapping -> Validate 15. Give name EW_WAIT. Run workflow and see. Task -> Create -> Select Event Wait. Give name TIMER_Example. Double click link between S_m_sample_mapping_EMP & DECISION_Example & give the condition: $DECISION_Example. Apply and click OK. 4. Validate & click OK. 7. Link DECISION_Example to both s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE. Double click link between Command task and DECISION_Example and give the condition: $DECISION_Example. 6. 8. Drag s_m_filter_example and link it to TIMER_Example. Set ‘Treat Input Links As’ to OR. Workflow-> Validate and Repository -> Save.Condition = 0. Workflow -> Create -> Give name wf_timer_task_example -> Click ok. Select Relative Time Option and Give 1 min and Select ‘From start time of this task’ Option.condition that represents the result of the decision condition. Click Create and then done. Give name DECISION_Example. 12. 
Workflow -> Create -> Give name wf_decision_task_example -> Click ok. Default is AND. 4. We can specify one decision condition per Decision task. Validate the condition -> Click Apply -> OK. Drag s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE to workspace and link both of them to START task. If any of s_m_filter_example or S_M_TOTAL_SAL_EXAMPLE fails then S_m_sample_mapping_EMP should run. DECISION TASK • • • • The Decision task allows us to enter a condition that determines the execution of the workflow. The Decision task has a pre-defined variable called $Decision_task_name. The Power Center Server evaluates the condition in the Decision task and sets the pre-defined condition variable to True (1) or False (0). 3.Condition = 1. 11. Now edit decision task again and go to PROPERTIES Tab. Right click DECISION_Example-> EDIT -> GENERAL tab. Example: Command Task should run only if either s_m_filter_example or S_M_TOTAL_SAL_EXAMPLE succeeds. 5. Workflow Validate and repository Save. 5. 10. Apply -> OK. Click Tasks -> Create -> Select TIMER from list. Link TIMER_Example to START task. Open the Expression editor by clicking the VALUE section of Decision Name attribute and enter the following condition: $S_M_FILTER_EXAMPLE. Steps for creating workflow: 1. Right click TIMER_Example-> EDIT -> TIMER tab. 3. Click Tasks -> Create -> Select DECISION from list. 2. 9. 8.Steps for creating workflow: 1.Status = SUCCEEDED OR $S_M_TOTAL_SAL_EXAMPLE. 6. Drag command task and S_m_sample_mapping_EMP task to workspace and link them to DECISION_Example task. Run workflow and see the result.Status = SUCCEEDED 7. Click Create and then done. similar to a link condition. Validate and click OK. . 2. A parent workflow or worklet is the workflow or worklet that contains the Control task. To use an Assignment task in the workflow. Marks the status of the WF or worklet that contains the Control task as failed. Run workflow and see the result. Click Apply and OK. Steps for creating workflow: 1. or fail the top-level workflow or the parent workflow based on an input link condition. 9. 4. Example: Drag any 3 sessions and if anyone fails. Give name cntr_task. 5.CONTROL TASK • • • We can use the Control task to stop. We give the condition to the link connected to Control Task. 6. 3. Drag any 3 sessions to workspace and link all of them to START task. See Workflow variable topic to add user defined variables. abort. Click Tasks -> Create -> Select CONTROL from list.Status = SUCCEEDED. first create and add the . Repeat above step for remaining 2 sessions also. then Abort the top level workflow. Link all sessions to the control task cntr_task. Click Create and then done. Aborts the WF or worklet that contains the Control task. Workflow’ for Control Option. Default is AND. ASSIGNMENT TASK • • • The Assignment task allows us to assign a value to a user-defined workflow variable. Abort Top-Level WF Aborts the workflow that is running. Control Option Fail Me Fail Parent Stop Parent Abort Parent Fail Top-Level WF Stop Top-Level WF Stops the workflow that is running. Set ‘Treat Input Links As’ to OR. Workflow -> Create -> Give name wf_control_task_example -> Click ok. Right click cntr_task-> EDIT -> GENERAL tab. Go to PROPERTIES tab of cntr_task and select the value ‘Fail top level 10. 12. Fails the workflow that is running. Double click link between cntr_task and any session say s_m_filter_example and give the condition: $S_M_FILTER_EXAMPLE. Workflow Validate and repository Save. 7. 8. 2. 
Stops the WF or worklet that contains the Control task. Description Fails the control task. 11. Edit Workflow and add user defined variables. Now we need to transfer all the 10 files to same target. Change the Filename and Directory to give information of second file. 5. Names of files are say EMP1. Enter the value or expression you want to assign. Then configure the Assignment task to assign values or expressions to userdefined variables. Now make a parameter file and give the value of $InputFileName.txt 5. 5. Validate Session 7. Then click Done. Now edit parameter file and give value of second file. Save it to repository and run. click Add to add an assignment. Click OK. 4. 10. Select the variable for which you want to assign a value. 6. Import one flat file definition and make the mapping as per need. 12. Make Workflow. Do same for remaining files. INDIRECT LOADING FOR FLAT FILES Suppose. This is a session parameter. select Indirect. 4. give the name and location of above created file. Now open session after workflow completes.txt E:\EMP2. All the flat files have same number of columns and data type. 7. In Source file type field. Click the Open button in the User Defined Variables field. Import one flat file definition and make the mapping as per need. 6. 2. Select Assignment Task for the task type.• • Assignment task to the workflow. 3. Enter a name for the Assignment task. Choose Tasks-Create. 5. Run workflow again. Now in session give the Source File name and Source File Directory location of one file. Now in session give the Source Directory location of the files. 2. Solution1: 1. 3. 8. 2. On the Expressions tab. Click OK. SCD – Type 1 . Now make a notepad file that contains the location and name of each 10 flat files. Click Apply. 2. Do the above for all 10 files. Solution2: 1. Sample: D:\EMP1. you have 10 flat files of same structure. Now make a session and in Source file name and Source File Directory location fields. 11. 3. Click the Edit button in the Expression field to open the Expression Editor. 7. EMP2 and so on. Make workflow and run.txt E:\FILES\DWH\EMP3. 4. Steps to create Assignment Task: 1. Repeat steps 7-10 to add more variable assignments as necessary. Open any workflow where we want to use Assignment task. Click Create. Import one flat file definition and make the mapping as per need. Solution3: 1.txt and so on 3. 9. Run workflow again. 4. Now in Fieldname use $InputFileName. Double-click the Assignment task to open the Edit Task dialog box. We cannot assign values to pre-defined workflow. $InputFileName=EMP1. Run the workflow 6. Go to the targets Menu and click on generate and execute to confirm the creation of the target tables. and therefore does not track historical data at all.Slowly Changing Dimensions (SCDs) are dimensions that have data that changes slowly. that might give misleading information. (Assuming you won't ever need to know how it used to be misspelled in the past. Dealing with these issues involves SCD management methodologies: Type 1: The Type 1 methodology overwrites old data with new data. such as the spelling of a name. for example. Creating sales reports seems simple enough. Source Table: (01-02-11) Target Table: (01-02-11) Emp no 101 102 103 104 • Ename A B C D Sal 1000 2500 3000 4000 Empno 101 102 103 104 Ename A B C D Sal 1000 2500 3000 4000 In the second Month we have one more employee added up to the table with the Ename D and salary of the Employee is changed to the 2500 instead of 2000. 
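Before stepping through the Type 1 mapping, it helps to see the two flag ports its Expression transformation will carry, written out cleanly (EMPNO1 and SAL1 are the ports returned by the Lookup transformation created in the steps below):

insert = ISNULL(EMPNO1)
update = IIF(NOT ISNULL(EMPNO1) AND DECODE(SAL, SAL1, 1, 0) = 0, 1, 0)

The insert flag is true when the employee is not yet in the target; the update flag is true when the employee exists but the salary has changed.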
the joins will perform better on an integer than on a character string. However. Now imagine that this supplier moves their headquarters to Illinois. Step 1: Is to import Source Table and Target table. Import the source from the source analyzer. but that creates problems also. rather than changing on a timebased. you may have a dimension in your database that tracks the sales records of your company's salespeople. If the salesperson that was transferred used to work in a hot market where sales were easy.) Here is an example of a database table that keeps supplier information: Supplier_Key Supplier_Code Supplier_Name Supplier_State 123 ABC Acme Supply Co CA In this example. the surrogate key is not necessary. Explanation with an Example: Source Table: (01-01-11) Target Table: (01-01-11) Emp no 101 102 103 Ename A B C Sal 1000 2000 3000 Emp no 101 102 103 Ename A B C Sal 1000 2000 3000 The necessity of the lookup transformation is illustrated using the above source and target table. but if you use that to compare the performance of salesmen. and now works in a market where sales are infrequent. even if they are just as good. Technically. The updated table would simply overwrite this record: Supplier_Key Supplier_Code Supplier_Name Supplier_State 123 ABC Acme Supply Co IL The obvious disadvantage to this method of managing SCDs is that there is no historical record kept in the data warehouse. But an advantage to Type 1 SCDs is that they are very easy to maintain. You can't tell if your suppliers are tending to move to the Midwest. • • • • . How do you record such a change in your sales dimension? You could sum or average the sales by salesperson. In the same way as above create two target tables with the names emp_target1. Supplier_Code is the natural key and Supplier_Key is a surrogate key. Or you could create a second salesperson record and treat the transferred person as a new sales person. her totals will look much stronger than the other salespeople in her new region. emp_target2. Create a table by name emp_source with three columns as shown above in oracle. This is most appropriate when correcting certain types of data errors. since the table will be unique by the natural key (Supplier_Code). regular schedule For example. until a salesperson is transferred from one regional office to another. • In the Conditions tab (i) Click on Add a new condition (ii)Lookup Table Column should be Empno. Filter Transformation. • • The first thing that we are goanna do is to create a look up transformation and connect the Empno from the source qualifier to the transformation. Update Transformation. In the Properties tab (i) Lookup table name ->Emp_Target. delete or reject rows. Step 2: Design the mapping and apply the necessary transformation. update. The Input Port for the first column should be unchked where as the other ports like Output and lookup box should be checked.• The snap shot of the connections using different kinds of transformations are shown below. Transformation port should be Empno1 and Operator should ‘=’. Expression Transformation. Delete. In the Ports tab we should add a new column and name it as empno1 and this is column for which we are gonna connect from the Source Qualifier. . Update or reject the rows in to target table. (ii)Look up Policy on Multiple Mismatch -> use First Value. Necessity and the usage of all the transformations will be discussed in detail below. • Here in this transformation we are about to use four kinds of transformations namely Lookup transformation. 
Look up Transformation: The purpose of this transformation is to determine whether to insert. The snapshot of choosing the Target table is shown below. • • • • What Lookup transformation does in our mapping is it looks in to the target table (emp_table) and compares it with the Source Qualifier and determines whether to insert. (iii) Connection Information ->Oracle. For the newly created column only input and output boxes should be checked. Go to the Properties tab on the Edit transformation Filter Transformation: we are gonna have two filter transformations one to insert and other to update. Input à IsNull(EMPNO1) Output à iif(Not isnull (EMPNO1) and Decode(SAL. Sal from the expression transformation to both filter transformation. The Snap shot for the Edit transformation window is shown below. If there is any change in input data then filter transformation 2 forwards the complete input to the update strategy transformation 2 then it is gonna forward the updated input to the target table. • • • • (i) The value for the filter condition 1 is Insert.Expression Transformation: After we are done with the Lookup Transformation we are using an expression transformation to check whether we need to insert the records the same records or we need to update the records.1. Ename. Now double click on the Transformation and go to the Ports tab and create two new columns and name it as insert and update. . (ii) The value for the filter condition 1 is Update. • • The condition that we want to parse through our output data are listed below. • • Drag all the columns from both the source and the look up transformation and drop them all on to the Expression transformation. Later now connect the Empno. The steps to create an Expression Transformation are shown below. • • We are all done here .1.Click on apply and then OK.SAL1. Both these columns are gonna be our output data so we need to have check mark only in front of the Output check box. Connect the Insert column from the expression transformation to the insert column in the first filter transformation and in the same way we are gonna connect the update column in the expression transformation to the update column in the second filter. If there is no change in input data then filter transformation 1 forwards the complete input to update strategy transformation 1 and same output is gonna appear in the target table.0)=0.0) . • The Closer view of the filter Connection is shown below. Change Bulk to the Normal.. Step 3: Create the task and Run the work flow. delete. Run the work flow from task. Ename A B Sal 1000 2000 Source Table: (01-01-11) Emp no 101 102 . We are gonna use the SCD-2 style to extract and load the records in to target table. Now go to the Properties tab and the value for the update strategy expression is 1 (on the 2 nd update transformation).Update Strategy Transformation: Determines whether to insert. update or reject the rows.Ename. Now go to the Properties tab and the value for the update strategy expression is 0 (on the 1 st update transformation).. in the current month ie. Ename and Sal from the filter transformations and drop them on the respective Update Strategy Transformation.Sal). Don’t check the truncate table option. Step 4: Preview the Output in the target table. • • • • • • • Drag the respective Empno. We are all set here finally connect the outputs of the update transformations to the target table. 
Type 2 Let us drive the point home using a simple scenario.(01-01-2010) we are provided with an source table with the three columns and three rows in it like (EMpno. There is a new employee added and one change in the records in the month (01-02-2010). For eg. • The thing to be noticed here is if there is any update in the salary of any employee then the history of that employee is displayed with the current date as the start date and the previous date as the end date. . • • In The Target Table we are goanna add five columns (Skey. Step 2: Design the mapping and apply the necessary transformation. Necessity and the usage of all the transformations will be discussed in detail below. The snap shot of the connections using different kinds of transformations are shown below. Flag. Filter Transformation (2). Expression Transformation (3). Step 1: Is to import Source Table and Target table. Drag the Target table twice on to the mapping designer to facilitate insert or update process. Sequence Generator. Here in this transformation we are about to use four kinds of transformations namely Lookup transformation (1). • • • • • Create a table by name emp_source with three columns as shown above in oracle.103 C 3000 Target Table: (01-01-11) Skey 100 200 300 Source Table: (01-02-11) Emp no 101 102 103 104 Ename A B C D Sal 1000 2500 3000 4000 Emp no 101 102 103 Ename A B C Sal 1000 2000 3000 S-date 01-01-10 01-01-10 01-01-10 E-date Null Null Null Ver 1 1 1 Flag 1 1 1 Target Table: (01-02-11) Skey 100 200 300 201 400 Emp no 101 102 103 102 104 Ename A B C B D Sal 1000 2000 3000 2500 4000 S-date 01-02-10 01-02-10 01-02-10 01-02-10 01-02-10 E-date Null Null Null 01-01-10 Null Ver 1 1 1 2 1 Flag 1 1 1 0 1 In the second Month we have one more employee added up to the table with the Ename D and salary of the Employee is changed to the 2500 instead of 2000. Version. Import the source from the source analyzer. S_date . Go to the targets Menu and click on generate and execute to confirm the creation of the target tables. Look up Transformation: The purpose of this transformation is to Lookup on the target table and to compare the same with the Source using the Lookup Condition.E_Date). (ii)Look up Policy on Multiple Mismatch -> use Last Value.SAL1. Both these columns are goanna be our output data so we need to have unchecked input check box. Expression Transformation: After we are done with the Lookup Transformation we are using an expression transformation to find whether the data on the source table matches with the target table. Now double click on the Transformation and go to the Ports tab and create two new columns and name it as insert and update.0)=0. • We are all done here . The Snap shot for the Edit transformation window is shown below.1.0) . (iii) Connection Information ->Oracle. We specify the condition here whether to insert or to update the table. • The condition that we want to parse through our output data are listed below. • In the Conditions tab (i) Click on Add a new condition (ii)Lookup Table Column should be Empno. • • If there is no change in input data then filter transformation 1 forwards the complete input to Exp 1 and same output is goanna appear in the target table. Insert : IsNull(EmpNO1) Update: iif(Not isnull (Skey) and Decode(SAL. If there is any change in input data then filter transformation 2 forwards the complete input to the Exp 2 then it is gonna forward the updated input to the target table. . 
• • • Drag the Empno column from the Source Qualifier to the Lookup Transformation.1.• • The first thing that we are gonna do is to create a look up transformation and connect the Empno from the source qualifier to the transformation. Filter Transformation: We need two filter transformations the purpose the first filter is to filter out the records which we are goanna insert and the next is vice versa. In the Properties tab (i) Lookup table name ->Emp_Target. The steps to create an Expression Transformation are shown below. The Input Port for only the Empno1 should be checked. • • • Drag all the columns from both the source and the look up transformation and drop them all on to the Expression transformation. Transformation port should be Empno1 and Operator should ‘=’.Click on apply and then OK. The snapshot of choosing the Target table is shown below. Now add a new column as N_skey and the expression for it is gonna be Nextval1*100. Expression Transformation: Exp 1: It updates the target table with the skey values. Else the there is no modification done on the target table .• Go to the Properties tab on the Edit transformation (i) The value for the filter condition 1 is Insert. Flag is also made as output and expression parsed through it is 1. • • • • Drag all the columns from the filter 1 to the Exp 1. Point to be noticed here is skey gets multiplied by 100 and a new row is generated if there is any new EMP added to the list. .The purpose of this in our mapping is to increment the skey in the bandwidth of 100. Connect the output of the sequence transformation to the Exp 1. • • We are gonna have a sequence generator and the purpose of the sequence generator is to increment the values of the skey in the multiples of 100 (bandwidth of 100). • The closer view of the connections from the expression to the filter is shown below. (ii) The value for the filter condition 2 is Update. We are goanna make the s-date as the o/p and the expression for it is sysdate. Sequence Generator: We use this to generate an incremental cycle of sequential range of number. . Exp 2: If same employee is found with any updates in his records then Skey gets added by 1 and version changes to the next higher number. Both the S_date and E_date is gonna be sysdate. Don’t check the truncate table option.F • • • Drag all the columns from the filter 2 to the Exp 2. Update Strategy: This is place from where the update instruction is set on the target table.• Version is also made as output and expression parsed through it is 1. Create the task and run the work flow. Now add a new column as N_skey and the expression for it is gonna be Skey+1. Change Bulk to the Normal. Exp 3: If any record of in the source table gets updated then we make it only as the output. Run the work flow from task. The update strategy expression is set to 1. • • • • • • If change is found then we are gonna update the E_Date to S_Date. Step 3: Create the task and Run the work flow. Step 4: Preview the Output in the target table. Source table: (01-01-2011) Empno 101 102 103 Ename A B C Sal 1000 2000 3000 Target Table: (01-01-2011) Empno Ename 101 102 103 A B C C-sal 1000 2000 3000 P-sal - Source Table: (01-02-2011) Empno 101 102 103 Ename A B C Sal 1000 4566 3000 Target Table (01-02-2011): Empno Ename C-sal P-sal . and we are goanna use skey as the Primary key here. SCD Type 3 This Method has limited history preservation. Based on the Look up condition it decides whether we need to update. 
Step 2: here we are goanna see the purpose and usage of all the transformations that we have used in the above mapping. • • • As usually we are goanna connect Empno column from the Source Qualifier and connect it to look up transformation. In the Properties tab specify the Filter condition as update.0) Filter Transformation: We are goanna use two filter Transformation to filter out the data physically in to two separate sections one for insert and the other for the update process to happen. Add two Ports and Rename them as Insert. Next to this we are goanna specify the look up condition empno =empno1. Explanation of each and every Transformation is given below. Update Strategy 1: This is intended to insert in to the target table. expression. Insert: isnull(ENO1 ) Update: iif(not isnull(ENO1) and decode(SAL. Specify the below conditions in the Expression editor for the ports respectively. insert. These two ports are goanna be just output ports. .Stuff’s logically.1. Update Strategy: Finally we need the update strategy to insert or to update in to the target table.Curr_Sal. • Drag all the ports except the insert from the first filter in to this.0)=0. In the Properties tab specify the Filter condition as Insert. Expression Transformation: We are using the Expression Transformation to separate out the Insert-stuff’s and Update. filter. Prior to this Look up transformation has to look at the target table. update strategy to drive the purpose. Update. Finally specify that connection Information (Oracle) and look up policy on multiple mismatches (use last value) in the Properties tab. Look up Transformation: The look Transformation looks the target table and compares the same with the source table.1. Filter 2: • • Drag the update and other four ports which came from Look up in to the Expression in to Second filter. and delete the data from being loaded in to the target table. Filter 1: • • Drag the Insert and other three ports which came from source qualifier in to the Expression in to first filter.101 102 103 102 A B C B 1000 4566 3000 4544 Null 4566 So hope u got what I’m trying to do with the above tables: Step 1: Initially in the mapping designer I’m goanna create a mapping as below. • • • Drag all the ports from the Source Qualifier and Look up in to Expression. And in this mapping I’m using lookup. rather than forcing it to process the entire source and recalculate the same data each time you run the session. For each input record. the Integration Service processes the entire source. the index file and the data file. the Integration Service applies the changes to the existing target. Use incremental aggregation when the changes do not significantly change the target. you might have a session using a source that receives new data every day. the Integration Service performs the aggregate operation incrementally. The Integration Service creates the files in the cache directory specified in the Aggregator transformation properties. (iv) If the source changes significantly and you want the Integration Service to continue saving aggregate data for future incremental changes. when you run the session again. This allows the Integration Service to read and store the necessary aggregate data. If processing the incrementally changed source alters more than half the existing target. Integration Service Processing for Incremental Aggregation (i)The first time you run an incremental aggregation session. In the Properties tab specify the condition as the 1 or dd_update. 
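To recap the Type 3 logic end to end, these are the key expressions and flags used above:

Insert flag (Expression transformation): ISNULL(ENO1)
Update flag (Expression transformation): IIF(NOT ISNULL(ENO1) AND DECODE(SAL, Curr_Sal, 1, 0) = 0, 1, 0)
Update Strategy 1 (insert branch): 0, i.e. DD_INSERT
Update Strategy 2 (update branch): 1, i.e. DD_UPDATE

The constants 0 and 1 are the numeric equivalents of DD_INSERT and DD_UPDATE; DD_DELETE (2) and DD_REJECT (3) are not needed in this mapping.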
the Integration Service does not store incremental aggregation values for percentile and median functions in disk caches. If the source changes incrementally and you can capture changes. When using incremental aggregation. If it finds a corresponding group. It saves modified aggregate data in the index and data files to be used as historical data the next time you run the session. Note: Do not use incremental aggregation if the mapping contains percentile or median functions.• • • In the Properties tab specify the condition as the 0 or dd_insert. On March 2. you apply captured changes in the source to aggregate calculations in a session. You then enable incremental aggregation. the Integration Service stores aggregate data from that session run in two files. Step 4: Observe the output it would same as the second target table Incremental Aggregation: When we enable the session option-> Incremental Aggregation the Integration Service performs incremental aggregation. the session may not benefit from using incremental aggregation. Drag all the ports except the update from the second filter in to this. you filter out all the records except those time-stamped March 2. you use the incremental source changes in the session. Incremental changes do not significantly change the target.Consider using incremental aggregation in the following circumstances: • • You can capture new source data. At the end of the session. You can capture those incremental changes because you have added a filter condition to the mapping that removes pre-existing data from the flow of data. . Finally connect both the update strategy in to two instances of the target. When the session runs with incremental aggregation enabled for the first time on March 1. you can configure the session to process those changes. using the aggregate data for that group. This allows the Integration Service to update the target incrementally. In this case. drop the table and recreate the target with complete source data. As a result. The Integration Service then processes the new data and updates the target accordingly. you use the entire source. Use incremental aggregation when you can capture new source data each time you run the session. and saves the incremental change. Use a Stored Procedure or Filter transformation to process new data. For example. If it does not find a corresponding group. The Integration Service uses system memory to process these functions in addition to the cache memory you configure in the session properties. the Integration Service creates a new group and saves the record data. (iii)When writing to the target. (ii)Each subsequent time you run the session with incremental aggregation. Update Strategy 2: This is intended to update in to the target table. it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally. configure the Integration Service to overwrite existing aggregate data with new aggregate data. Step 3: Create a session for this mapping and Run the work flow. the Integration Service checks historical information in the index file for a corresponding group. Note: To protect the incremental aggregation files from file corruption or disk failure. Delete cache files. The Integration Service creates new aggregate data. • • (ii) Verify the incremental aggregation settings in the session properties. Verify that the Informatica Stencil and Informatica toolbar are available . 4.Each subsequent time you run a session with incremental aggregation. 
Configure the session to reinitialize the aggregate cache. To create a mapping template manually. Creating a Mapping Template Manually: You can use the Informatica Stencil and the Informatica toolbar to create a mapping template. (v)When you partition a session that uses incremental aggregation. The Informatica toolbar contains buttons for the tasks you can perform on mapping template. or you can create a mapping template by importing a Power Center mapping. Configuring the Session Use the following guidelines when you configure the session for incremental aggregation: (i) Verify the location where you want to store the aggregate files. • • Mapping Templates A mapping template is a drawing in Visio that represents a PowerCenter mapping. you need to configure both mapping and session properties: • • Implement mapping logic or filter to remove pre-existing data. when you perform one of the following tasks: • • • • • • Save a new version of the mapping. Drag the mapping objects from the Informatica Stencil to the drawing window:. When the Integration Service rebuilds incremental aggregation files. the Workflow Manager displays a warning indicating the Integration Service overwrites the existing cache and a reminder to clear this option after running the session. Integration Services rebuild incremental aggregation files they cannot find. $PMCacheDir. When an Integration Service rebuilds incremental aggregation files. However. Start Mapping Architect for Visio. When you run multiple sessions with incremental aggregation. the Integration Service creates one set of cache files for each partition. enter the appropriate directory for the process variable. . In a grid. Be sure the cache directory has enough disk space to store historical data for the session. If you choose to reinitialize the cache. Changing the cache directory without moving the files causes the Integration Service to reinitialize the aggregate cache and gather new aggregate data. Configure the session for incremental aggregation and verify that the file directory has enough disk space for the aggregate files. You can enter sessionspecific directories for the index and data files.Use the mapping objects to create visual representation of the mapping. periodically back up the files. decide where you want the files stored. the Integration Service creates a backup of the incremental aggregation files. Save and publish a mapping template to create the mapping template files. Use the Informatica Stencil and the Informatica toolbar in the Mapping Architect for Visio to create a mapping template. it loses aggregate history. Configuring the Mapping Before enabling incremental aggregation. you can easily change the cache directory when necessary by changing $PMCacheDir. You can create a mapping template manually. You can configure rules and parameters in a mapping template to specify the transformation logic. complete the following steps: 1. 2. You can also configure the session to reinitialize the aggregate cache. You can use a Filter or Stored Procedure transformation in the mapping to remove pre-existing source data during a session. instead of using historical data. Move the aggregate files without correcting the configured path or directory for the files in the session properties. by using the process variable for all sessions using incremental aggregation. • • The index and data files grow in proportion to the source data. you must capture changes in source data. 
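Returning to incremental aggregation for a moment: the capture filter mentioned earlier is usually driven by a timestamp. A minimal sketch, assuming a date-stamped source column named TRANS_DATE and a date/time mapping variable $$LastRunTime with Max aggregation (neither of which appears elsewhere in this document):

Filter transformation condition:  TRANS_DATE > $$LastRunTime
Expression output port:           SETMAXVARIABLE($$LastRunTime, TRANS_DATE)

Each run then aggregates only rows newer than the previous run, and the saved variable value moves the window forward for the next run.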
The cache directory for the Aggregator transformation must contain enough disk space for two sets of the files. Create links:. You can configure the session for incremental aggregation in the Performance settings on the Properties tab. 3. Then. in the Workflow Manager. the data in the previous files is lost.Create links to connect mapping objects. Change the configured path or directory for the aggregate files without moving the files to the new location. The Informatica Stencil contains shapes that represent mapping objects that you can use to create a mapping template. Preparing for Incremental Aggregation: When you use incremental aggregation. Decrease the number of partitions. To create multiple mappings. To run sessions on a grid. Configure link rules. you must have the Server grid option. the master service process runs the workflow and all tasks except Session. export the mapping to a mapping XML file and then use the mapping XML file to create a mapping template.Add a group or expression required by the transformations in the mapping template. 11. Save changes to the mapping template drawing file. When you run a workflow on a grid. Mapping Architect for Visio generates a mapping template XML file and a mapping template parameter file (param. Declare mapping parameters and variables to use when you run sessions in Power Center:. Verify links. you can use the mapping parameters and variables in the session or workflow. To run a workflow on a grid. complete the following steps: 1. The Integration Service distributes workflow tasks and session threads based on how you configure the workflow or session to run: • • Running workflows on a grid. Do not edit the mapping template XML file.xml). When you run a session on a grid. To import a mapping template from a Power Center mapping. and predefined Event-Wait tasks within workflows across the nodes in a grid. Save the mapping template:. Command. Note: To run workflows on a grid. Command. click the Create Template from Mapping XML button. 9. Publish the mapping template. 2. 6. 9. Configure the mapping objects. If you make any change to the mapping template after publishing. you need to publish the mapping template again. Publish the mapping template:. and predefined Event-Wait tasks. Mapping Architect for Visio determines the mapping objects and links included in the mapping and adds the appropriate objects to the drawing window. Grid Processing When a Power Center domain contains multiple nodes. 6.If you edit the mapping template drawing file after you publish it. Start Mapping Architect for Visio. Export a Power Center mapping. Declare mapping parameters and variables to use when you run the session in Power Center. you can configure workflows and sessions to run on a grid. Running Workflows on a Grid: When you run a workflow on a grid. Save the mapping template.When you publish the mapping template. Configure the mapping objects:. Modify or declare new mapping parameters and variables appropriate for running the new mappings created from the mapping template. After you import the mappings created from the mapping template into Power Center. 3. Create or verify links that connect mapping objects. Do not edit the mapping template XML file. Configure rules for each link in the mapping template to indicate how data moves from one mapping object to another. configure the session to run on the grid. Import the mapping. You create the grid and configure the Integration Service in the Administration Console. 
Grid Processing
When a Power Center domain contains multiple nodes, you can configure workflows and sessions to run on a grid. When you run a workflow on a grid, the Integration Service runs a service process on each available node of the grid to increase performance and scalability. When you run a session on a grid, the Integration Service distributes session threads to multiple DTM processes on nodes in the grid to increase performance and scalability. The Integration Service distributes workflow tasks and session threads based on how you configure the workflow or session to run:
• Running workflows on a grid. The Integration Service distributes workflows across the nodes in a grid. It also distributes the Session, Command, and predefined Event-Wait tasks within workflows across the nodes in a grid.
• Running sessions on a grid. The Integration Service distributes session threads across the nodes in a grid.
Note: To run workflows on a grid, you must have the Server grid option. To run sessions on a grid, you must have the Session on Grid option.
You create the grid and configure the Integration Service in the Administration Console. To run a workflow on a grid, you configure the workflow to run on the Integration Service associated with the grid. To run a session on a grid, you configure the session to run on the grid.

Running Workflows on a Grid: When you run a workflow on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks, which it may distribute to other nodes. The master service process is the Integration Service process that runs the workflow, monitors service processes running on other nodes, and runs the Load Balancer. The Scheduler runs on the master service process node, so it uses the date and time of the master service process node to start scheduled workflows. The Load Balancer is the component of the Integration Service that dispatches Session, Command, and predefined Event-Wait tasks to the nodes in the grid. The Load Balancer distributes tasks based on node availability. If the Integration Service is configured to check resources, the Load Balancer also distributes tasks based on resource availability.
For example, a workflow contains a Session task, a Decision task, and a Command task, and you specify a resource requirement for the Session task. The grid contains four nodes, and Node 4 is unavailable. The master service process runs the Start and Decision tasks, and the Load Balancer distributes the Session and Command tasks to nodes on the grid based on resource availability and node availability.

Running Sessions on a Grid: When you run a session on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks, as it does when you run a workflow on a grid. In addition, the Load Balancer distributes session threads to DTM processes running on different nodes. You might want to configure a session to run on a grid when the workflow contains a session that takes a long time to run. The Load Balancer distributes session threads based on the following factors:
• Node availability. The Load Balancer verifies which nodes are currently running, enabled, and available for task dispatch.
• Resource availability. If the Integration Service is configured to check resources, it identifies nodes that have the resources required by mapping objects in the session.
• Partitioning configuration. The Load Balancer dispatches groups of session threads to separate nodes based on the partitioning configuration.

Grid Connectivity and Recovery
When you run a workflow or session on a grid, service processes and DTM processes run on different nodes. Network failures can cause connectivity loss between processes running on separate nodes. Services may shut down unexpectedly, or you may disable the Integration Service or service processes while a workflow or session is running. The Integration Service failover and recovery behavior in these situations depends on the service process that is disabled, shuts down, or loses connectivity. Recovery behavior also depends on the following factors:
• High availability option. When you have high availability, workflows fail over to another node if the node or service shuts down, and the Integration Service can recover workflows and tasks on another node. If you do not have high availability, you can manually restart a workflow on another node to recover it.
• Recovery strategy. You can configure a workflow to suspend on error. When a workflow suspends, the recovery behavior depends on the recovery strategy you configure for each task in the workflow.
• Shutdown mode. When you disable an Integration Service or service process, you can specify that the service completes, aborts, or stops processes running on the service. Behavior differs when you disable the Integration Service or you disable a service process, and it also differs when you disable a master service process or a worker service process. The Integration Service or a service process may also shut down unexpectedly; in this case, the failover and recovery behavior depend on which service process shuts down and the configured recovery strategy.
• Running mode. If the workflow runs on a grid, recovery behavior also differs; in particular, if a session runs on a grid, you cannot configure a resume recovery strategy.
• Operating mode. If the Integration Service runs in safe mode, recovery is disabled for sessions and workflows.
Note: You cannot configure an Integration Service to fail over in safe mode if it runs on a grid.

Workflow Variables
You can create and use variables in a workflow to reference values and record information. For example, use a variable in a Decision task to determine whether the previous task ran properly. If it did, you can run the next task; if not, you can stop the workflow. Use the following types of workflow variables:
• Predefined workflow variables. The Workflow Manager provides predefined workflow variables for tasks within a workflow.
• User-defined workflow variables. You create user-defined workflow variables when you create a workflow.
Use workflow variables when you configure the following types of tasks:
• Assignment tasks. Use an Assignment task to assign a value to a user-defined workflow variable. For example, you can increment a user-defined counter variable by setting the variable to its current value plus 1.
• Decision tasks. Decision tasks determine how the Integration Service runs a workflow. For example, use the Status variable to run a second session only if the first session completes successfully.
• Links. Links connect each workflow task. Use workflow variables in links to create branches in the workflow. For example, after a Decision task, you can create one link to follow when the decision condition evaluates to true and another link to follow when the decision condition evaluates to false.
• Timer tasks. Timer tasks specify when the Integration Service begins to run the next task in the workflow. Use a user-defined date/time variable to specify the time the Integration Service starts to run the next task.
Use the following keywords to write expressions for user-defined and predefined workflow variables: AND, OR, NOT, TRUE, FALSE, NULL, SYSDATE.

Predefined Workflow Variables: Each workflow contains a set of predefined variables that you use to evaluate workflow and task conditions. Use the following types of predefined variables:
• Task-specific variables. The Workflow Manager provides a set of task-specific variables for each task in the workflow. Use task-specific variables in a link condition to control the path the Integration Service takes when running the workflow. The Workflow Manager lists task-specific variables under the task name in the Expression Editor.
• Built-in variables. Use built-in variables in a workflow to return run-time or system information such as the folder name, Integration Service name, system date, or workflow start time. The Workflow Manager lists built-in variables under the Built-in node in the Expression Editor.
Task-Specific Variables: The following task-specific variables are available. Each entry lists the task types that provide the variable and its data type.
• Condition (Decision, Integer). Evaluation result of the decision condition expression. If the task fails, the Workflow Manager keeps the condition set to null. Sample syntax: $Dec_TaskStatus.Condition = <TRUE | FALSE | NULL | any integer>
• EndTime (All tasks, Date/Time). Date and time the associated task ended. Precision is to the second. Sample syntax: $s_item_summary.EndTime > TO_DATE('11/10/2004 08:13:25')
• ErrorCode (All tasks, Integer). Last error code for the associated task. If there is no error, the Integration Service sets ErrorCode to 0 when the task completes. Sample syntax: $s_item_summary.ErrorCode = 24013. Note: You might use this variable when a task consistently fails with this final error message.
• ErrorMsg (All tasks, Nstring). Last error message for the associated task. If there is no error, the Integration Service sets ErrorMsg to an empty string when the task completes. Sample syntax: $s_item_summary.ErrorMsg = 'PETL_24013 Session run completed with failure'. Variables of type Nstring can have a maximum length of 600 characters. Note: You might use this variable when a task consistently fails with this final error message.
• FirstErrorCode (Session, Integer). Error code for the first error message in the session. If there is no error, the Integration Service sets FirstErrorCode to 0 when the session completes. Sample syntax: $s_item_summary.FirstErrorCode = 7086
• FirstErrorMsg (Session, Nstring). First error message in the session. If there is no error, the Integration Service sets FirstErrorMsg to an empty string when the task completes. Sample syntax: $s_item_summary.FirstErrorMsg = 'TE_7086 Tscrubber: Debug info… Failed to evalWrapUp'. Variables of type Nstring can have a maximum length of 600 characters.
• PrevTaskStatus (All tasks, Integer). Status of the previous task in the workflow that the Integration Service ran. Statuses include: ABORTED, FAILED, STOPPED, SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the previous task. Sample syntax: $Dec_TaskStatus.PrevTaskStatus = FAILED
• SrcFailedRows (Session, Integer). Total number of rows the Integration Service failed to read from the source. Sample syntax: $s_dist_loc.SrcFailedRows = 0
• SrcSuccessRows (Session, Integer). Total number of rows successfully read from the sources. Sample syntax: $s_dist_loc.SrcSuccessRows > 2500
• StartTime (All tasks, Date/Time). Date and time the associated task started. Precision is to the second. Sample syntax: $s_item_summary.StartTime > TO_DATE('11/10/2004 08:13:25')
• Status (All tasks, Integer). Status of the previous task in the workflow. Statuses include: ABORTED, DISABLED, FAILED, NOTSTARTED, STARTED, STOPPED, SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the current task. Sample syntax: $s_dist_loc.Status = SUCCEEDED
• TgtFailedRows (Session, Integer). Total number of rows the Integration Service failed to write to the target. Sample syntax: $s_dist_loc.TgtFailedRows = 0
• TgtSuccessRows (Session, Integer). Total number of rows successfully written to the target. Sample syntax: $s_dist_loc.TgtSuccessRows > 0
• TotalTransErrors (Session, Integer). Total number of transformation errors. Sample syntax: $s_dist_loc.TotalTransErrors = 5
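As an illustration of how these variables combine with the keywords listed earlier, the expressions below sketch typical link conditions. The session and Decision task names ($s_load_orders, $Dec_CheckLoad) are hypothetical placeholders, not objects defined in this document.

Link condition that runs a second session only if the first one succeeded and wrote rows:
  $s_load_orders.Status = SUCCEEDED AND $s_load_orders.TgtSuccessRows > 0
Link conditions on the two branches that follow a Decision task:
  $Dec_CheckLoad.Condition = TRUE
  $Dec_CheckLoad.Condition = FALSE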
User-Defined Workflow Variables: You can create variables within a workflow. When you create a variable in a workflow, it is valid only in that workflow. Use the variable in tasks within that workflow. You can edit and delete user-defined workflow variables. Use user-defined variables when you need to make a workflow decision based on criteria you specify.

For example, you create a workflow to load data to an orders database nightly. You also need to load a subset of this data to headquarters periodically, every tenth time you update the local orders database. Create separate sessions to update the local database and the one at headquarters, and use a user-defined variable, $$WorkflowCount, to represent the number of times the workflow has run and to determine when to run the session that updates the orders database at headquarters. To configure the workflow, complete the following steps:
1. Create a persistent workflow variable, $$WorkflowCount, to represent the number of times the workflow has run.
2. Add a Start task and both sessions to the workflow.
3. Place a Decision task after the session that updates the local orders database. Set up the decision condition to check whether the number of workflow runs is evenly divisible by 10. Use the modulus (MOD) function to do this (sample expressions follow at the end of this section).
4. Create an Assignment task to increment the $$WorkflowCount variable by one.
5. Link the Decision task to the session that updates the database at headquarters when the decision condition evaluates to true. Link it to the Assignment task when the decision condition evaluates to false.
When you configure workflow variables using conditions, the session that updates the local database runs every time the workflow runs, and the session that updates the database at headquarters runs every tenth time the workflow runs.

Creating User-Defined Workflow Variables: You can create workflow variables for a workflow in the workflow properties. To create a workflow variable:
1. In the Workflow Designer, create a new workflow or edit an existing one.
2. Select the Variables tab.
3. Click Add.
4. Enter the following information and click OK:
• Name. Variable name. The correct format is $$VariableName. Workflow variable names are not case sensitive. Do not use a single dollar sign ($) for a user-defined workflow variable; the single dollar sign is reserved for predefined workflow variables.
• Data type. Data type of the variable. You can select from the following data types: Date/Time, Double, Integer, Nstring.
• Persistent. Whether the variable is persistent. Enable this option if you want the value of the variable retained from one execution of the workflow to the next.
• Default Value. Default value of the variable. The Integration Service uses this value for the variable during sessions if you do not set a value for the variable in the parameter file and there is no value stored in the repository. Variables of type Date/Time can have the following formats: MM/DD/RR, MM/DD/YYYY, MM/DD/RR HH24:MI, MM/DD/YYYY HH24:MI, MM/DD/RR HH24:MI:SS, MM/DD/YYYY HH24:MI:SS, MM/DD/RR HH24:MI:SS.MS, MM/DD/YYYY HH24:MI:SS.MS, MM/DD/RR HH24:MI:SS.US, MM/DD/YYYY HH24:MI:SS.US, MM/DD/RR HH24:MI:SS.NS, MM/DD/YYYY HH24:MI:SS.NS. You can use the following separators: dash (-), slash (/), backslash (\), colon (:), period (.), and space. The Integration Service ignores extra spaces. You cannot use one- or three-digit values for year or the "HH12" format for hour. Variables of type Nstring can have a maximum length of 600 characters.
• Is Null. Whether the default value of the variable is null. If the default value is null, enable this option.
• Description. Description associated with the variable.
5. To validate the default value of the new workflow variable, click the Validate button.
6. Click Apply to save the new workflow variable.
7. Click OK.
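To make the $$WorkflowCount example above concrete, the expressions below sketch the Assignment task expression and the Decision task condition it describes. The Decision task name ($Dec_TenthRun) is a hypothetical placeholder.

Assignment task expression (adds one to the run counter):
  $$WorkflowCount = $$WorkflowCount + 1
Decision task condition (evaluates to true on every tenth run):
  MOD($$WorkflowCount, 10) = 0
Link to the headquarters session:
  $Dec_TenthRun.Condition = TRUE
Link to the Assignment task:
  $Dec_TenthRun.Condition = FALSE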
Interview Zone

Hi readers, these are the questions I would normally expect an interviewee to know when I sit on a panel. I would request my readers to start posting your answers to these questions in the discussion forum under the "informatica technical interview guidance" tag; I'll review them, and only valid answers will be kept while the rest will be deleted.

1. Explain your project.
2. What are your daily routines?
3. How many mappings have you created altogether in your project?
4. In which account does your project fall?
5. What is your reporting hierarchy?
6. How many complex mappings have you created? Could you please describe a situation for which you developed such a complex mapping?
7. What is your involvement in performance tuning of your project?
8. What is the schema of your project? And why did you opt for that particular schema?
9. What are your roles in this project?
10. Can you describe one situation where something you adopted improved performance dramatically?
11. Were you involved in more than two projects simultaneously?
12. Do you have any experience in production support?
13. What kinds of testing have you done on your project (Unit, Integration, System, or UAT)? And what enhancements were done after testing?
14. How many dimension tables are there in your project, and how are they linked to the fact table?
15. How do we do the fact load?
16. How did you implement CDC in your project?
17. What does your File-to-Load mapping look like?
18. What does your Load-to-Stage mapping look like?
19. What does your Stage-to-ODS mapping look like?
20. What is the size of your data warehouse?
21. What are your daily feed size and weekly feed size?
22. Which approach (top down or bottom up) was used in building your project?
23. How do you access your sources (are they flat files or relational)?
24. Have you developed any stored procedures or triggers in this project? How did you use them, and in which situation?
25. Did your project go live? What issues did you face while moving your project from the test environment to the production environment?
26. What is the biggest challenge that you encountered in this project?
27. What is the scheduler tool you have used in this project? How did you schedule jobs using it? (A sample pmcmd call of the kind schedulers use follows this list.)
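On the scheduler question above (number 27), third-party schedulers typically start Power Center jobs by calling the pmcmd command-line utility. The call below is only a sketch: the Integration Service, domain, folder, workflow, credential, and file names are placeholders, and option availability can vary slightly between Power Center versions, so confirm against the Command Reference for your release.

# invoked by cron/Control-M/Autosys at the scheduled time
pmcmd startworkflow -sv IS_DEV -d Domain_Dev -u etl_user -p etl_password -f DW_FOLDER -paramfile /opt/informatica/parms/wf_daily_load.par -wait wf_daily_load

With -wait, pmcmd blocks until the workflow finishes and returns a non-zero exit code on failure, which lets the scheduler detect failed runs.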
Informatica Experienced Interview Questions – part 1

1. Difference between Informatica 7.x and 8.x?
2. Difference between connected and unconnected Lookup transformations in Informatica?
3. Difference between stop and abort in Informatica?
4. Difference between static and dynamic caches?
5. What is a persistent lookup cache? What is its significance?
6. Difference between a reusable transformation and a mapplet?
7. How does the Informatica server sort string values in the Rank transformation?
8. Is the Sorter an active or a passive transformation? When do we consider it active, and when passive?
9. Explain the Informatica server architecture.
10. In an Update Strategy, which gives better performance, a relational table or a flat file? Why?
11. What are the output files that the Informatica server creates while running a session?
12. Can you explain what error tables in Informatica are and how we do error handling in Informatica?
13. Difference between constraint-based loading and a target load plan?
14. Difference between the IIF and DECODE functions?
15. How do you import an Oracle sequence into Informatica?
16. What is a parameter file?
17. Difference between normal load and bulk load?
18. How will you create a header and footer in the target using Informatica?
19. What are the session parameters?
20. Where does Informatica store rejected data? How do we view it?
21. What is the difference between partitioning of relational targets and file targets?
22. What are mapping parameters and variables, and in which situations can we use them?
23. What do you mean by direct loading and indirect loading in session properties?
24. How do we implement a recovery strategy while running concurrent batches?
25. Explain the versioning concept in Informatica.
26. What is data driven?
27. What is a batch? Explain the types of batches.
28. What are the types of metadata that the repository stores?
29. Can you use the mapping parameters or variables created in one mapping in another mapping?
30. Why did we use a stored procedure in our ETL application?
31. When we can join tables in the Source Qualifier itself, why do we go for the Joiner transformation?
32. What is the default join operation performed by the Lookup transformation?
33. What is a hash table in Informatica?
34. In a Joiner transformation, you should specify the table with fewer rows as the master table. Why?
35. Difference between a cached lookup and an uncached lookup?
36. Explain what the DTM does when you start a workflow.
37. Explain what the Load Manager does when you start a workflow.
38. In a sequential batch, how do I stop one particular session from running?
39. What are the types of aggregations available in Informatica?
40. How do I create indexes after the load process is done?
41. How do we improve the performance of the Aggregator transformation?
42. What are the different types of caches available in Informatica? Explain in detail.
43. What is polling?
44. What are the limitations of the Joiner transformation?
45. What is a mapplet?
46. What are active and passive transformations?
47. What are the options in the target session of an Update Strategy transformation?
48. What is a code page? Explain the types of code pages.
49. What do you mean by rank cache?
50. How can you delete duplicate rows without using a dynamic lookup? Tell me any other way of using a lookup to delete duplicate rows.
51. Can you copy a session into a different folder or repository?
52. What is tracing level, and what are its types?
53. What is the command used to run a batch?
54. What are the unsupported repository objects for a mapplet?
55. If your workflow is running slow, what is your approach towards performance tuning?
56. What are the types of mapping wizards available in Informatica?
57. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier, can we map these three ports directly to the target?
58. Why do we use the Stored Procedure transformation?
59. Which object is required by the Debugger to create a valid debug session?
60. Can we use an active transformation after an Update Strategy transformation?
61. Explain how we set the Update Strategy transformation at the mapping level and at the session level.
62. What is the exact use of the 'Online' and 'Offline' server connect options while defining a workflow in the Workflow Monitor? (The system hangs when the 'Online' server connect option is used.)
63. What is change data capture?
64. Write a session parameter file which will change the source and targets for every session, i.e. different sources and targets for each session run (a sample parameter file appears at the end of this list).
65. What are partition points?
66. What are the different threads in the DTM process?
67. Can we do ranking on two ports? If yes, explain how.
68. What is a transformation?
69. What does the Stored Procedure transformation do that is special compared to other transformations?
70. How do you recognize whether the newly added rows got inserted or updated?
71. What is data cleansing?
72. My flat file's size is 400 MB and I want to see the data inside the file without opening it. How do I do that?
73. Difference between Filter and Router?
74. How do you handle decimal places when you are importing a flat file?
75. What is the difference between $ and $$ in a mapping or parameter file? In which cases are they generally used?
76. While importing a relational source definition from a database, what metadata of the source do you import?
77. Difference between Power Mart and Power Center?
78. What kinds of sources and targets can be used in Informatica?
79. If a Sequence Generator (with an increment of 1) is connected to, say, 3 targets and each target uses the NEXTVAL port, what value will each target get?
80. What do you mean by SQL override?
81. What is a shortcut in Informatica?
82. How does Informatica do variable initialization (Number/String/Date)?
83. How many different locks are available for repository objects?
84. What are the transformations that use a cache for performance?
85. What is the use of Forward/Reject rows in a mapping?
86. In how many ways can you filter the records?
87. How do you delete duplicate records from a source database or flat files? Can we use post-SQL to delete these records? In the case of a flat file, how can you delete duplicates before loading starts?
88. You are required to perform bulk loading using Informatica on Oracle. What actions would you perform at the Informatica and Oracle levels for a successful load?
89. What precautions do you need to take when you use a reusable Sequence Generator transformation for concurrent sessions?
90. Is a negative increment possible in the Sequence Generator? If yes, how would you accomplish it?
91. In which directory does Informatica look for the parameter file, and what happens if it is missing when the session starts? Does the session stop after it starts?
92. Informatica is installed on a personal laptop and is complaining that the server could not be reached. What steps would you take?
93. You have more than five mappings that use the same lookup. How can you manage the lookup?
94. What will happen if you copy a mapping from one repository to another repository and there is no identical source?
95. How can you limit the number of running sessions in a workflow?
96. An Aggregator transformation has four ports (sum(col1), group by col2, col3); which port should be the output?
97. What is a dynamic lookup, and what is the significance of NewLookupRow? How will you use them for rejecting duplicate records?
98. If you have more than one pipeline in your mapping, how will you change the order of the load?
99. When you export a workflow from the Repository Manager, what does the XML contain? The workflow only?
100. Your session failed, and when you try to open a log file, it complains that the session details are not available. How would you trace the error? Which log file would you look for?
101. You want to attach a file as an email attachment from a particular directory using the Email task in Informatica. How will you do it?
102. You have a requirement to be alerted of any long-running sessions in your workflow. How can you create a workflow that will send you an email for sessions running more than 30 minutes? You can use any method: a shell script, a procedure, or an Informatica mapping or workflow control.
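Several of the questions above (for example 16, 19, 64, and 75) come back to parameter files. The fragment below is a hedged sketch of what a session parameter file can look like; the folder, workflow, session, connection, and file names are placeholders, and the exact parameter names depend on how the sessions and mappings are defined.

[DW_FOLDER.WF:wf_daily_load.ST:s_m_load_orders]
$DBConnection_Src=ORA_SRC_DEV
$DBConnection_Tgt=ORA_TGT_DEV
$InputFile_Orders=/data/in/orders_20180401.dat
$$LoadDate=04/01/2018

[DW_FOLDER.WF:wf_daily_load.ST:s_m_load_customers]
$DBConnection_Src=ORA_CRM_DEV
$DBConnection_Tgt=ORA_TGT_DEV
$InputFile_Customers=/data/in/customers_20180401.dat

Each [folder.WF:workflow.ST:session] heading scopes its parameters to one session, which is how a single file can point different sessions at different sources and targets. Names with a single $ are session parameters (connections, file names), while names with $$ are mapping parameters or variables declared in the mapping.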
Data Warehousing Concepts Based Interview Questions

1. What is a data warehouse?
2. What are data marts?
3. What is an ER diagram?
4. What is a star schema?
5. What is dimensional modelling?
6. What is a snowflake schema?
7. What are the different methods of loading dimension tables?
8. What are aggregate tables?
9. What is the difference between OLTP and OLAP?
10. What is ETL?
11. What are the various ETL tools in the market?
12. What are the various reporting tools in the market?
13. What is a fact table?
14. What is a dimension table?
15. What is a lookup table?
16. What is a general-purpose scheduling tool? Name some of them.
17. What modeling tools are available in the market? Name some of them.
18. What is real-time data warehousing?
19. What is data mining?
20. What is normalization? First normal form, second normal form, third normal form?
21. What is an ODS?
22. What type of indexing mechanism do we need to use for a typical data warehouse?
23. Which columns go to the fact table, and which columns go to the dimension table? (The user needs to see <data element> <data element> broken by <data element> <data element>; all elements before "broken by" are fact measures, and all elements after "broken by" are dimension elements.)
24. What is the level of granularity of a fact table? What does this signify? (With weekly-level summarization, there is no need to have the invoice number in the fact table anymore.)
25. How are dimension tables designed? (De-normalized, wide, short, using surrogate keys, and containing additional date fields and flags.)
26. What are slowly changing dimensions?
27. What are non-additive facts? (For example, inventory levels and account balances in a bank.)
28. What are conformed dimensions?
29. What is a VLDB? (If a database is too large to back up in the available time frame, it is a VLDB.)
30. What are SCD1, SCD2 and SCD3?