Ab Initio Interview Questions

What is the relation between EME, GDE and Co-operating system?
Ans: EME stands for Enterprise Meta Environment, GDE for Graphical Development Environment, and the Co>Operating System can be described as the Ab Initio server. The relation between the three is as follows. The Co>Operating System is the Ab Initio server; it is installed on a particular OS platform, which is called the native OS. The EME is like the repository in Informatica: it holds the metadata, transformations, DB config files, and source and target information. The GDE is the end-user environment where graphs are developed (a graph is like a mapping in Informatica). The designer uses the GDE to design graphs and saves them to the EME or to a sandbox. The GDE is on the user side, whereas the EME is on the server side.

What is the use of aggregation when we have rollup? As we know, the Rollup component in Ab Initio is used to summarize groups of data records, so where would we use aggregation?
Ans: Aggregate and Rollup can both summarize data, but Rollup is much more convenient to use, and it is easier to see how a particular summarization is done in a Rollup than in an Aggregate. Rollup can also do other things, such as input and output filtering of records. Aggregate and Rollup perform the same action, but Rollup displays intermediate results in main memory, while Aggregate does not support intermediate results.

What kinds of layouts does Ab Initio support?
Ans: Basically, Ab Initio supports serial and parallel layouts, and a graph can have both at the same time. The parallel layout depends on the degree of data parallelism. If the multifile system is 4-way parallel, then a component in a graph can run 4-way parallel, provided its layout is defined to match that degree of parallelism.

How can you run a graph infinitely?
Ans: To run a graph infinitely, the end script in the graph should call the .ksh file of the graph. Thus, if the name of the graph is abc.mp, then the end script of the graph should contain a call to abc.ksh. In this way the graph will run infinitely.

How do you add default rules in transformer?
Ans: Double-click the transform parameter on the Parameters tab of the component properties; this opens the Transform Editor. In the Transform Editor, click the Edit menu and select Add Default Rules from the dropdown. It shows two options: 1) Match Names 2) Wildcard.

Do you know what a local lookup is?
Ans: If your lookup file is a multifile and is partitioned/sorted on a particular key, then the local lookup function can be used in preference to the plain lookup function call. It is local to a particular partition, depending on the key. A lookup file consists of data records that can be held in main memory. This lets the transform function retrieve the records much faster than retrieving them from disk, and it allows the transform component to process the data records of multiple files quickly.

What is the difference between look-up file and look-up, with a relevant example?
Ans: Generally, a lookup file represents one or more serial files (flat files). The amount of data is small enough to be held in memory, which allows transform functions to retrieve records much more quickly than they could from disk. A lookup is a component of an Ab Initio graph where we can store data and retrieve it by using a key parameter. A lookup file is the physical file where the data for the lookup is stored.

How many components in your most complicated graph?
Ans: It depends on the type of components you use. Usually, avoid using very complicated transform functions in a graph.

Explain what is lookup?
Ans: A lookup is basically a specific dataset which is keyed. It can be used to map values according to the data present in a particular file (serial or multifile). The dataset can be static or dynamic (for example, when the lookup file is generated in a previous phase and used as a lookup file in the current phase).

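As a rough illustration of the keyed, in-memory lookup idea described above, the following Python sketch indexes a small reference dataset by key and uses it to enrich a larger stream of records. It is not Ab Initio DML; the function names (build_lookup, enrich), the field names and the sample data are all hypothetical and chosen only for this example.

```python
from typing import Dict, List

Record = Dict[str, str]


def build_lookup(records: List[Record], key: str) -> Dict[str, Record]:
    """Index a small reference dataset by key so it can be held in memory."""
    return {rec[key]: rec for rec in records}


def enrich(rows: List[Record], lookup: Dict[str, Record], key: str,
           field: str, default: str = "UNKNOWN") -> List[Record]:
    """Copy each input record and add one field fetched from the lookup."""
    out = []
    for row in rows:
        match = lookup.get(row[key])  # in-memory access, no disk read per record
        enriched = dict(row)
        enriched[field] = match[field] if match is not None else default
        out.append(enriched)
    return out


# Hypothetical sample data: a small keyed dataset and a stream of orders.
countries = [{"code": "IN", "name": "India"},
             {"code": "US", "name": "United States"}]
orders = [{"id": "1", "code": "US"}, {"id": "2", "code": "FR"}]

lookup = build_lookup(countries, "code")
result = enrich(orders, lookup, "code", "name")
```

Because the reference data is small, building the index once and probing it per record avoids a join against the larger input, which is the same trade-off the answers above describe.
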
Sometimes hash joins can be replaced by a reformat with a lookup, if one of the inputs to the join contains a small number of records with a slim record length. Ab Initio has built-in functions to retrieve values using the key for the lookup.

What is a ramp limit?
Ans: The limit parameter contains an integer that represents a number of reject events. The ramp parameter contains a real number that represents a rate of reject events in the number of records processed. Ramp is basically a percentage value (from 0 to 1). The two together provide the threshold for bad records:
number of bad records allowed = limit + number of records * ramp
For example, with limit = 10 and ramp = 0.01, a run that has processed 1,000 records tolerates 10 + 1000 * 0.01 = 20 rejected records.

Have you worked with packages?
Ans: Multistage transform components use packages by default. However, a user can create their own set of functions in a transform function and include it in other transform functions.

Have you used rollup component? Describe how.
Ans: If the user wants to group records on particular field values, then Rollup is the best way to do that. Rollup is a multistage transform function and it contains the following mandatory functions:
1. initialise
2. rollup
3. finalise
You also need to declare a temporary variable if you want to get counts for a particular group. For each group, it first calls the initialise function once, then calls the rollup function for each record in the group, and finally calls the finalise function once after the last rollup call.

How do you add default rules in transformer?
Ans: Add Default Rules opens the Add Default Rules dialog. Select one of the following:
Match Names: generates a set of rules that copies input fields to output fields with the same name.
Use Wildcard (.*) Rule: generates one rule that copies input fields to output fields with the same name.
The steps are:
1) If it is not already displayed, display the Transform Editor Grid.
2) Click the Business Rules tab if it is not already displayed.
3) Select Edit > Add Default Rules.

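The initialise / rollup / finalise calling order described in the rollup answer above can be sketched in Python. This is only an analogy to show the order of the calls and the role of the temporary variable; it is not Ab Initio transform code, and the record layout (customer, amount) and the run_rollup driver are assumptions made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def initialise(record):
    """Called once per group: create the temporary accumulator."""
    return {"count": 0, "total": 0.0}


def rollup(temp, record):
    """Called once for every record in the group: update the accumulator."""
    temp["count"] += 1
    temp["total"] += record["amount"]
    return temp


def finalise(temp, key):
    """Called once after the last record of the group: emit the summary."""
    return {"customer": key, "count": temp["count"], "total": temp["total"]}


def run_rollup(records, key_field):
    """Hypothetical driver: group records by key and apply the three stages."""
    ordered = sorted(records, key=itemgetter(key_field))
    output = []
    for key, group in groupby(ordered, key=itemgetter(key_field)):
        members = list(group)
        temp = initialise(members[0])
        for rec in members:
            temp = rollup(temp, rec)
        output.append(finalise(temp, key))
    return output


sales = [
    {"customer": "a", "amount": 10.0},
    {"customer": "b", "amount": 5.0},
    {"customer": "a", "amount": 2.5},
]
summary = run_rollup(sales, "customer")
```

Here the dictionary returned by initialise plays the part of the temporary variable that holds the per-group count.
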
In the case of a Reformat, if the destination field names are the same as, or a subset of, the source fields, then there is no need to write anything in the reformat xfr, as long as you do not need any real transform other than reducing the set of fields or splitting the flow into a number of flows.

What is the difference between partitioning with key and round robin?
Ans: Partition by Key (hash partition) is a partitioning technique used to partition data when the keys are diverse. If one key value is present in large volume, there can be large data skew. Even so, this method is used more often for parallel data processing. Round-robin partitioning is another technique, which distributes the data uniformly across the destination data partitions. The skew is zero in this case when the number of records is divisible by the number of partitions. A real-life example is how a pack of 52 cards is distributed among 4 players in a round-robin manner.

How do you improve the performance of a graph?
Ans: There are many ways the performance of the graph can be improved.
1) Use a limited number of components in a particular phase.
2) Use optimum max core values for sort and join components.
3) Minimise the number of sort components.
4) Minimise sorted join components and, if possible, replace them with in-memory join/hash join.
5) Use only the required fields in the sort, reformat and join components.
6) Use phasing/flow buffers in the case of merge and sorted joins.
7) If the two inputs are huge, then use sorted join; otherwise use hash join with the proper driving port.
8) For a large dataset, don't use broadcast as the partitioner.
9) Minimise the use of regular expression functions like re_index in the transform functions.
10) Avoid repartitioning data unnecessarily.
Try to run the graph in MFS for as long as possible. For this, the input files should be partitioned and, if possible, the output file should also be partitioned.

What is the difference between clustered and non-clustered indices? …and why do you use a clustered index?

What is an outer join?
Ans: An outer join is used when one wants to select all the records from a port, whether or not they satisfy the join criteria.

What are Cartesian joins?
Ans: A Cartesian join joins two tables without a join key. The key should be {}.

What is the function you would use to transfer a string into a decimal?
Ans: No specific function is required if the size of the string and the decimal is the same. A decimal cast with the size in the transform function will suffice. For example, if the source field is defined as string(8) and the destination as decimal(8), then (say the field name is field):
    out.field :: (decimal(8)) in.field
If the destination field size is smaller than the input, then the string_substring function can be used as follows. Say the destination field is decimal(5):
    out.field :: (decimal(5))string_lrtrim(string_substring(in.field, 1, 5)) /* string_lrtrim used to trim leading and trailing spaces */

What is the purpose of having stored procedures in a database?
Ans: The main purpose of a stored procedure is to reduce network traffic. All the SQL statements execute within a cursor, so execution is fast.

How do you truncate a table?
Ans: From Ab Initio, use the Run SQL component with the DDL "truncate table", or use the Truncate Table component in Ab Initio.

Have you ever encountered an error called "depth not equal"?
Ans: When two components are linked together and their layouts do not match, this problem can occur during the compilation of the graph. A solution would be to use a partitioning component in between if there was a change in layout.

What are primary keys and foreign keys?
Ans: In an RDBMS, the relationship between two tables is represented as a primary key and foreign key relationship. The primary key table is the parent table and the foreign key table is the child table. The criterion for both tables is that there should be a matching column.

Why might you create a stored procedure with the 'with recompile' option?
Ans: Recompile is useful when the tables referenced by the stored proc undergo a lot of modification, deletion or addition of data. Due to the heavy modification activity, the execution plan becomes outdated and the stored proc's performance goes down. If we create the stored proc with the recompile option, SQL Server won't cache a plan for this stored proc, and it will be recompiled every time it is run.

What is a cursor? Within a cursor, how would you update fields on the row just fetched?
Ans: The Oracle engine uses work areas for internal processing in order to execute SQL statements; such a work area is called a cursor. There are two types of cursors: implicit and explicit. An implicit cursor is used for internal processing, and an explicit cursor is opened by the user for the data required.

How would you find out whether a SQL query is using the indices you expect?
Ans: The explain plan can be reviewed to check the execution plan of the query. This shows whether the expected indexes are used or not.

How can you force the optimizer to use a particular index?
Ans: Use hints /*+ */; these act as directives to the optimizer.
    select /*+ index(a index_name) full(b) */ *
    from table1 a, table2 b
    where b.col1 = a.col1 and b.col2 = 'sid' and b.col3 = 1;

When using multiple DML statements to perform a single unit of work, is it preferable to use implicit or explicit transactions, and why?
Ans: Explicit. Implicit handling is used for internal processing, whereas an explicit transaction is opened by the user around the statements required, so the unit of work can be committed or rolled back as a whole.

Describe the elements you would review to ensure multiple scheduled "batch" jobs do not "collide" with each other.
Ans: Review the dependencies between the jobs, because every job depends on another job. For example, if the first job's result is successful then the next job will execute; otherwise it will not run.

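To make the key versus round-robin partitioning answer above more concrete, here is a small Python sketch that deals 52 records into 4 partitions both ways. It is an illustration only: zlib.crc32 is used as a stand-in hash and is not claimed to be what Ab Initio uses internally, and the record fields (card, suit) are invented for the example.

```python
import zlib


def partition_by_key(records, key_field, n):
    """Send every record with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        # crc32 is a stand-in hash chosen because it is stable across runs.
        idx = zlib.crc32(str(rec[key_field]).encode("utf-8")) % n
        parts[idx].append(rec)
    return parts


def partition_round_robin(records, n):
    """Deal records out one at a time, like cards to players."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


# 52 "cards" with a deliberately skewed key: 40 hearts and 12 spades.
deck = [{"card": i, "suit": "spades" if i >= 40 else "hearts"}
        for i in range(52)]

rr_sizes = [len(p) for p in partition_round_robin(deck, 4)]
key_parts = partition_by_key(deck, "suit", 4)
key_sizes = [len(p) for p in key_parts]
```

With 52 records and 4 partitions, round robin gives every partition 13 records, so the skew is zero. With only two distinct key values, key partitioning can fill at most two partitions, and the one holding "hearts" receives at least 40 records, which is the skew the answer warns about.
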
Describe the process steps you would perform when defragmenting a data table. This table contains mission critical data.
Ans: There are several ways to do this:
1) We can move the table in the same or another tablespace and rebuild all the indexes on the table. The activity "alter table table_name move" reclaims the fragmented space, and "analyze table table_name compute statistics" captures the updated table statistics.
2) A reorg could be done by taking a dump of the table, truncating the table, and importing the dump back into the table.

Describe the "Grant/Revoke" DDL facility and how it is implemented.
Ans: Basically, this is a part of the DBA's responsibilities. GRANT gives permissions, for example GRANT CREATE TABLE, CREATE VIEW and many more. REVOKE cancels the grant (permissions). So both the Grant and Revoke commands depend on the DBA.

What is the difference between a DB config and a CFG file?
Ans: A .dbc file has the information required for Ab Initio to connect to the database to extract or load tables or views, while a .cfg file is the table configuration file created by db_config when using components like Load DB Table.

Explain the difference between the "truncate" and "delete" commands.
Ans: TRUNCATE is a DDL command, whereas DELETE is a DML command. Rollback cannot be performed after a TRUNCATE statement, whereas it can be performed after a DELETE statement. A WHERE clause cannot be used with TRUNCATE, whereas it can be used with DELETE.

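The DELETE half of the truncate versus delete answer above (a WHERE clause is allowed and the change can be rolled back) can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite is only a convenient stand-in database, it has no TRUNCATE statement, so the DDL side is not shown, and the emp table and its rows are invented for the example.

```python
import sqlite3


def delete_then_rollback():
    """Delete some rows with a WHERE clause, then undo the delete."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT)")
    conn.executemany(
        "INSERT INTO emp (id, dept) VALUES (?, ?)",
        [(1, "hr"), (2, "it"), (3, "it")],
    )
    conn.commit()  # make the inserted rows permanent

    # DELETE is DML: it accepts a WHERE clause and runs inside a transaction.
    cur = conn.execute("DELETE FROM emp WHERE dept = ?", ("it",))
    deleted = cur.rowcount
    left_before = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

    conn.rollback()  # the uncommitted delete is undone
    left_after = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
    conn.close()
    return deleted, left_before, left_after


outcome = delete_then_rollback()
```

The function returns the number of rows deleted, the row count while the delete is pending, and the row count after the rollback, so the three values together show that the deleted rows come back.
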
What is the latest version that is available in Ab Initio?
How do you take the input data from an Excel sheet?
How will you test a dbc file from the command prompt?
Which is faster to process, fixed-length DMLs or delimited DMLs, and why?
What are the continuous components in Ab Initio?
What is meant by fencing in Ab Initio?
Describe how you would ensure that database object definitions (tables, indices, constraints, triggers, users, logins, connection options, server options, etc.) are consistent and repeatable between multiple database instances (i.e. a test and a production copy of a database).
What is a semi-join?
How do you get the DML using utilities in UNIX?
What is a driving port? When do you use it?
When running a stored procedure definition script, how would you guarantee the definition could be "rolled back" in the event of problems?
What is backward compatibility in Ab Initio?
What are the differences between different versions of the Co>Operating System?
How do you handle it if the DML changes dynamically in Ab Initio?
What are the different commands that you used when writing wrappers?
What do the hidden files in a sandbox represent, and what does start.ksh represent?
How can we test Ab Initio manually and with automation?
What is the difference between a sandbox and the EME? Can we perform check-in and check-out through the sandbox? Can anybody explain check-in and check-out?
What are local and formal parameters?
What are Broadcast and Replicate?
How do you create a repository in Ab Initio for a stand-alone system (LOCAL NT)?
What does dependency analysis mean in Ab Initio?
What value do you have to give for the Record Required parameter for a natural join?
When do you use Partition by Expression?
What is an ad hoc file system? Give me a scenario where you used it.
What does layout mean in terms of Ab Initio?
What are the different things that you have to consider when loading data into a table?
How do you create a surrogate key using Ab Initio?
Can anyone give me an example of a real-time start script in a graph?
What are the differences between the different GDE versions (1.10, 1.11, 1.12, 1.13 and 1.15)?
What do you mean by .profile in Ab Initio, and what does it contain?
What is data mapping and data modelling?
Can anyone tell me what happens when the graph runs, i.e. the Co>Operating System will be at the host and we are running the graph at some other place?
How would you do performance tuning for an already built graph? Can you let me know some examples?
How do you execute the graph from start to end stages? And how do you run a graph on a non-Ab Initio system?
What are the most commonly used components in an Ab Initio graph?
Can anybody give me a practical example of a transformation of data, say customer data in a credit card company, into meaningful output based on business rules?
How do you run the graph without the GDE?
What are the different versions and releases of Ab Initio (GDE and Co>Operating System versions)?
What is the difference between a DML expression and an XFR expression?
How does MAXCORE work?
What is $mpjret? Where is it used in Ab Initio?
How do you convert a 4-way MFS to an 8-way MFS?
What is skew and skew measurement?
What is the importance of the EME in Ab Initio?
What is the difference between a file and a table in Ab Initio?
How do you create a computer program that computes the monthly interest charge on a credit card account?
What is .abinitiorc and what does it contain?
Can we load multiple files?
Can anyone please explain the environment variables with an example?
What is the difference between rollup and scan?
How do you work with parameterized graphs?
Please give us insight on the Enterprise Meta Environment, and some possible questions on that.
What are a delta table and a master table?
What error would you get when you use Partition by Round Robin and Join?
How do you count the number of records in a flat file?
How do you connect the EME to the Ab Initio server?
Have you ever encountered an error called 'depth not equal'? (This occurs when you extensively create graphs; it is a trick question.)
In which scenarios would you use Partition by Key and Partition by Round Robin, and what are the differences between the two?
What are the different dimension tables that you used, and some columns in the fact table?
Explain the differences between API and utility mode.
Please let me know whether we have Ab Initio GDE version 1.14, and what the latest GDE and Co>Operating System versions are.
What are the graph parameters?
How do you find the number of arguments defined in a graph?
During the execution of a graph, let us say you lost the network connection. Would you have to start the process all over again, or does it start from where it stopped?
What are the different types of partitions, and scenarios for each?
For data parallelism, we can use partition components. For component parallelism, we can use the Replicate component. Similarly, which component(s) can we use for pipeline parallelism?
What is m_dump? What is the syntax of the m_dump command?
How do we run sequences of jobs, where the output of job A is the input to job B? How do we co-ordinate the jobs?
What is the difference between dbc and cfg? When do you use these two?
What are the compilation errors you came across while executing your graphs?
What is depth_error?
What is the difference between conventional loading and direct loading? When is it used in real time?
What is the AB_LOCAL expression, and where do you use it in Ab Initio?
What is meant by the Co>Operating System, and why is it special for Ab Initio?
How do you retrieve data from a database to a source, and which component is used for this?
How do you schedule graphs in Ab Initio, like workflow scheduling in Informatica? And where must we use Unix shell scripting in Ab Initio?
What does an unused port in a Join component do?
Define multifile system.
What is the difference between the Redefine Format and Reformat components?
Sometimes you have to use dynamic length strings. Can you give me one circumstance where you need it?
Did the Co>Operating System version 2.8 have a sandbox? If not, how would you store the respective files?
How did you do version control? Which tool did you use?
How do you troubleshoot performance issues in a graph?
What are the usual errors that you encounter during the ETL process, apart from compilation errors?
Were you involved in production support? What were the different kinds of problems that you encountered?
How do you count the number of records in a multifile system without using the GDE?
What do the Scan and Rollup components do? Give a scenario where you used them.
Did you ever use user-defined functions or packages? If yes, give a scenario.
Can you create a multifile system on the same server? Also, if you have a table that has Name, Address, Status and Position attributes, can Name and Address be on one partition and Status and Position on the other partition?
What is a sandbox?
How many parallelisms are in Ab Initio? Please give a definition of each.

Surrogate keys: we can retrieve the maximum surrogate key from the existing data; then, by using scan or next_in_sequence/reformat, we can generate further sequence values for new records.

Ab Initio Questions and Answers:

1 :: What does dependency analysis mean in Ab Initio?
Dependency analysis answers questions regarding data lineage, that is, where the data comes from, which applications produce and depend on this data, and so on.

2 :: When using multiple DML statements to perform a single unit of work, is it preferable to use implicit or explicit transactions, and why?
Explicit. Implicit handling is used for internal processing, whereas an explicit transaction is opened by the user around the statements required.

3 :: Describe the Grant/Revoke DDL facility and how it is implemented?
Basically, this is a part of the DBA's responsibilities. GRANT gives permissions, for example GRANT CREATE TABLE, CREATE VIEW and many more. REVOKE cancels the grant (permissions). So both the Grant and Revoke commands depend on the DBA.

4 :: What is the difference between rollup and scan?
Rollup cannot generate cumulative summary records; for that we use scan.

5 :: Describe the elements you would review to ensure multiple scheduled batch jobs do not collide with each other?
Every job depends on another job. For example, if the first job's result is successful then the next job will execute; otherwise it will not run.

6 :: How can I run the 2 GUI merge files?
Do you mean merging GUI map files in WR? Merging GUI map files in the GUI Map Editor won't create a corresponding test script, and without a test script you can't run a file. So it is impossible to run a file by merging 2 GUI map files.

7 :: Describe how you would ensure that database object definitions (tables, indices, constraints, triggers, users, logins, connection options, server options, etc.) are consistent and repeatable between multiple database instances (i.e. a test and a production copy of a database)?
Take an entire database backup and restore it in a different instance. Take statistics of all valid and invalid objects and match them. Periodically refresh

8 :: How would you find out whether a SQL query is using the indices you expect?
The explain plan can be reviewed to check the execution plan of the query. This shows whether the expected indexes are used or not.

9 :: How to create repository in abinitio for stand alone system (LOCAL NT)?
If you are installing Ab Initio on a stand-alone machine, then it is not necessary to create the repository. During installation it is created automatically for you under the abinitio folder (where you are installing Ab Initio).

10 :: When running a stored procedure definition script, how would you guarantee the definition could be rolled back in the event of problems?
Quite a few factors determine the approach, such as what type of version control is used, the size of the change, the impact of the change, whether it is a new procedure or a replacement for an existing one, and so on. If it is new, then just drop the wrong one. If it is a replacement, then consider how big the change is and what the possible impact will be. Depending on that, you can have the entire database backed up, or just create a script for your original procedure before changing it, or you can just do an ed, change the file back to the original and reapply. You may also rename the old procedure as old and then work on the new one. A few issues to keep in mind are synonyms, grants, dependencies, and any job calling the procedure at the time of the change. In a nutshell, the scenario can vary and the solution can vary with it.

11 :: Explain the difference between the truncate and delete commands?
Truncate: It is a DDL command, used to delete tables or clusters. Since it is a DDL command, it is auto-commit and rollback can't be performed. It is faster than delete.
Delete: It is a DML command, generally used to delete a record, clusters or tables. Rollback can be performed in order to retrieve the earlier deleted things. To make deletions permanent, the "commit" command should be used.

12 :: Describe the process steps you would perform when defragmenting a data table. This table contains mission critical data?
There are several ways to do this:
1) We can move the table in the same or another tablespace and rebuild all the indexes on the table.

17 :: What are Cartesian joins? A Cartesian join will get you a Cartesian product. 0 Yes 0 No Is This Answer Correct? 15 :: Why might you create a stored procedure with the with recompile option? Recompile is useful when the tables referenced by the stored proc undergoes a lot of modification/deletion/addition of data. 0 Yes 0 No Is This Answer Correct? Ab Initio Questions and Answers:  16 :: What is the purpose of having stored procedures in a database? Main Purpose of Stored Procedure for reduse the network trafic and all sql statement executing in cursor so speed too high. . these acts as directives to the optimizer 0 Yes 0 No Is This Answer Correct? 14 :: What is a cursor? Within a cursor.Implicit cursor is using for internal processing and Explicit cursor is using for user open for data required. You can also get one by joining every row of a table to every row of itself.There are two types of cursors like Implecit cursor and Explicit cursor. Due to the heavy modification activity the execute plan becomes outdated and hence the stored proc performance goes down. how would you update fields on the row just fetched? The oracle engine uses work areas for internal processing in order to the execute sql statement is called cursor. 0 Yes 0 No Is This Answer Correct? 13 :: How can you force the optimizer to use a particular index? Use hints /*+ <hint> */. truncate the table and import the dump back into the table.alter table analyze <table_name> table move table_name <tablespace_name> compute this activity statistics reclaims to the defragmented space the updated capture in the table statistics. 2)Reorg could be done by taking a dump of the table. the sql server wont cache a plan for this stored proc and it will be recompiled every time it is run. If we create the stored proc with recompile option. A Cartesian join is when you join every row of one table to every row of another table. 
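To make the Cartesian join answer concrete, here is a minimal sketch in plain shell (not SQL and not Ab Initio). The function name cartesian_product and the two sample files are made up for illustration. It pairs every line of one file with every line of the other, which is what a join without a join condition produces.

```shell
#!/bin/sh
# Illustration only: pair every line of file $1 with every line of file $2.
# cartesian_product and the sample files below are hypothetical names.
cartesian_product() {
    while IFS= read -r left; do
        # The inner loop re-reads the second file for every left-hand row.
        while IFS= read -r right; do
            printf '%s,%s\n' "$left" "$right"
        done < "$2"
    done < "$1"
}

tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/t1"   # "table" with 3 rows
printf '1\n2\n'    > "$tmpdir/t2"   # "table" with 2 rows

# 3 rows x 2 rows gives 6 output rows
cartesian_product "$tmpdir/t1" "$tmpdir/t2"
```

The row count of the result is always the product of the two input row counts, which is why an accidental Cartesian join on large tables is so expensive.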
18 :: What is an outer join?
An outer join is used when one wants to select all the records from a port, whether or not they satisfy the join criteria.

19 :: What are primary keys and foreign keys?
In an RDBMS the relationship between two tables is represented as a primary key and foreign key relationship. The primary key table is the parent table and the foreign key table is the child table. The criterion for both tables is that there should be a matching column.

20 :: Have you used rollup component? Describe how?
If the user wants to group the records on particular field values, then rollup is the best way to do that. Rollup is a multi-stage transform function and it contains the following mandatory functions:
1. initialise
2. rollup
3. finalise
You also need to declare one temporary variable if you want to get counts for a particular group. For each group, it first calls the initialise function once, then calls the rollup function for each of the records in the group, and finally calls the finalise function once after the last rollup call.

21 :: How do you convert 4-way MFS to 8-way MFS?
To convert a 4-way to an 8-way partition we need to change the layout in the partitioning component. There are separate parameters for each type of partitioning, e.g. AI_MFS_HOME, AI_MFS_MEDIUM_HOME, AI_MFS_WIDE_HOME, etc. The appropriate parameter needs to be selected in the component layout for the type of partitioning.

22 :: What is AB_LOCAL expression where do you use it in ab-initio?
ablocal_expr is a parameter of the Input Table component of Ab Initio. ABLOCAL() is replaced by the contents of ablocal_expr, which we can make use of in parallel unloads. There are two forms of the AB_LOCAL() construct: one with no arguments and one with a single argument, a table name (the driving table). The AB_LOCAL() construct is used because some complex SQL statements contain grammar that is not recognized by the Ab Initio parser when unloading in parallel. You can use the ABLOCAL() construct in this case to prevent the Input Table component from parsing the SQL (it will get passed through to the database). It also specifies which table to use for the parallel clause.

23 :: What is the latest version that is available in Ab-initio?
The latest version of GDE is 1.15 and of the Co>Operating System is 2.14.

25 :: I am unable to connect to the server database (Oracle) from GDE (db config file) on my local system. How can I set all these?
First check the properties in Internet Options, and then check from the command prompt with telnet to the Ab Initio IP address.

26 :: What is the difference between DML Expression and XFR Expression?
The main difference between DML and XFR is that DML represents the format of the metadata, whereas XFR represents the transform functions, which contain the business rules.

27 :: How does MAXCORE work?
Maxcore is a value (it will be in KB). Whenever a component is executed, it will take up to the amount of memory we specified for execution.

28 :: What is the syntax of m_dump command?
The general syntax is "m_dump metadata data [action]".

29 :: Can anyone give me an example of a realtime start script in the graph?
Here is a simple example of using a start script in a graph. In the start script, export a variable, say DT:
   export DT=`date '+%m%d%y'`
Now the variable DT will hold today's date before the graph is run. Somewhere in a graph transform we can use this variable as:
   out.process_dt :: $DT;
which provides the value from the shell.

30 :: What are the differences between different GDE versions (1.10, 1.11, 1.12, 1.13 and 1.15)? What are the differences between different versions of Co-op?
1.10 is a non-key version and the rest are key versions. A lot of components were added and revised in the following versions.

31 :: How to run the graph without GDE?
In GDE choose Run > Deploy > As script. It creates a .bat file in your host directory. Then run the .bat file from the command prompt.

32 :: What is local and formal parameter?
Both are graph-level parameters. For a local parameter you need to initialize the value at the time of declaration, whereas for a formal parameter there is no need to initialize it: the graph will prompt for that parameter at run time.

33 :: What is BROADCAST and REPLICATE?
Broadcast takes data from multiple inputs, combines it and sends it to all the output ports.
E.g. you have 2 incoming flows (this can be data parallelism or component parallelism) on the Broadcast component, one with 10 records and the other with 20 records. Then all the outgoing flows (there can be any number of them) will have 10 + 20 = 30 records.
Replicate replicates the data for a particular partition and sends it out to multiple out ports of the component, but maintains the partition integrity.
E.g. your incoming flow to Replicate has a data parallelism level of 2, with one partition having 10 records and the other one having 20 records. Now suppose you have 3 output flows from Replicate. Then each flow will have 2 data partitions with 10 and 20 records respectively.

34 :: What is the importance of EME in Ab Initio?
EME is a repository in Ab Initio. It is used for check-in and check-out of graphs and also maintains graph versions.

35 :: What is m_dump?
The m_dump command prints the data in a formatted way:
   m_dump <dml> <file.dat>

36 :: What is the difference between a Scan component and a Rollup component?
Rollup is for group-by and Scan is for successive totals. Basically, when we need to produce a cumulative summary we use Scan. Rollup is used to aggregate data.
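The Broadcast and Replicate record counts from question 33 can be checked with a small plain-shell analogy. This is only a sketch: partitions are modelled as files named part0, part1 in a directory, and the function names broadcast and replicate are made up here. They are not Ab Initio commands.

```shell
#!/bin/sh
# Analogy only: a "flow" is a directory, each partition is a file part0, part1, ...

# broadcast IN_DIR OUT_DIR: combine all input partitions into one data set.
# Every output flow of a Broadcast would receive this same combined data.
broadcast() {
    mkdir -p "$2"
    cat "$1"/part* > "$2/all"
}

# replicate IN_DIR OUT_DIR: copy each partition separately, so the
# partition boundaries are preserved on the output flow.
replicate() {
    mkdir -p "$2"
    for p in "$1"/part*; do
        cp "$p" "$2/$(basename "$p")"
    done
}

work=$(mktemp -d)
mkdir "$work/in"
seq 1 10 > "$work/in/part0"   # partition with 10 records
seq 1 20 > "$work/in/part1"   # partition with 20 records

broadcast "$work/in" "$work/bcast"
replicate "$work/in" "$work/repl"

wc -l < "$work/bcast/all"     # combined record count on a broadcast output
wc -l < "$work/repl/part0"    # first partition kept as is
wc -l < "$work/repl/part1"    # second partition kept as is
```

With the 10-record and 20-record partitions from the example, the broadcast output holds 30 records, while the replicated output still has two partitions of 10 and 20 records.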
24 :: What is $mpjret? Where is it used in Ab Initio?
You can use $mpjret in the end script, where a value of 0 is treated as success, for example:
   if [ 0 -eq $mpjret ]; then
      echo "success"
   else
      mailx -s "[graphname] failed" mailid
   fi

37 :: What is skew and skew measurement?
Skew is a measure of how evenly data flows to each partition. Suppose the input is roughly 1 GB of data coming from 4 files of 100 MB, 200 MB, 300 MB and 500 MB. The average partition is taken as 1000 MB / 4 = 250 MB, so the skew of the first partition is (100 - 250) / 500 = -150/500 = -0.3. Calculate the others yourself: the result comes out as a positive or a negative value. Skew is an indirect measure of the performance of the graph, and low skew is always desirable.

38 :: How to get DML using utilities in UNIX?
If your source is a COBOL copybook, there is a command in UNIX which generates the required DML for Ab Initio: cobol-to-dml.

The Latin term ab initio means "from the beginning". "Ab Initio Software LLC" is a company which excels in solving extreme data processing problems.

Ab Initio is not common. Many IT people have never heard of Ab Initio. Why? Well, first, because Ab Initio only works with a few clients who have extreme data processing problems. Second, Ab Initio never advertise themselves. They get lots of business by referral, in fact so much that they don't need any advertising.

Most of those people who have heard about Ab Initio think about it as an ETL provider. This is wrong. Yes, Ab Initio has excellent tools for ETL (Extract, Transform, Load). But for some problems they provide solutions which have nothing to do with databases. In fact, in many situations they recommend to STOP using a database at all for performance reasons. And they don't sell software. They sell solutions, and license the tools to provide those solutions. So it is more a solutions company, not a software company.

If you are a small or medium client, Ab Initio is overkill. But if you have thousands of transactions per second, big databases, or a huge transactional or accounting system, Ab Initio is a savior. Their pricing model is a bit unusual, but the long-term costs are reasonable.

They have very good, talented, devoted people. I've heard that when you call their customer service there is a 75% chance that you will speak with a Ph.D. It may very well be true. Ab Initio also uses its own people as well as independent consulting firms to build a proof of concept for a client: to define what they do and how, and then to guide clients in using their tools.

Unfortunately Ab Initio provides very little information about their solutions to the general public, and their web site is not very active. You can read a short description on Wikipedia, but as of this writing (around 2008-2009) this description doesn't give a good honest representation of the company (in my opinion).

http://en.wikipedia.org/wiki/Ab_Initio
http://www.abinitio.com
http://www.linkedin.com/companies/ab-initio
.com/Ab-Initio-SoftwareCorporation/Lexington/MA/301339/company/
http://www.bi-nerd.com/ab-initio-the-dark-horse-of-etl/
Patents: US6654907, US7047232, US7164422, US7167850

Ab Initio is a private company. Its main offices are in Lexington, Massachusetts (near Boston, USA, since 1994), but they have offices all over the world (as you can see on their web site). The company was formed by former employees of the Thinking Machines Corporation. Some key people: Craig W. Stanfill, Richard A. Shapiro, Stephen A. Kukolich.

The libraries are written in C++. Somewhere around 1997 Ab Initio introduced the Graphical Development Environment, a very powerful desktop software. You place components on the screen, connect them, and define what they do and how. So your application is a graph. You can create components which consist of other components which consist of other components, so effectively you can drill deeply into the diagram. I've seen this tool generate a powerful data processing application in less than 10 minutes.

You can run the application right from the IDE, or save it as a set of scripts (ksh for Unix). The scripts will call misc. component libraries. So, not getting into details, most of the AI functionality can be scripted using several commands which you can give from the prompt (with many options):
- m_* commands (for example, m_cp, m_mkfs, m_shutdown, etc.) are used for administering
- mp ... (some options): to define, establish, and run jobs
- air ... (some options): to work with EME (basically a specialized version control system)
The scripts can be easily integrated to work with external schedulers.

Some of the key elements of the system:
- "Co>Operating System"
- "Component Library"
- "Graphical Development Environment" (GDE)
- "Enterprise Meta>Environment" (EME)
- "Data Profiler"
- "Conduct>It"

The main power of Ab Initio, parallelism, is achieved via its "Co>Operating System", which provides the facilities for parallel execution (multiple CPUs and/or multiple boxes), platform independent data transport, check pointing, and process monitoring. A lot of attention is devoted to monitoring resources (CPU, memory).

Component Library: a set of software modules to perform sorting, data transforming, and high speed data loading and unloading tasks.

Ab Initio tools incorporate best practices, such as check-pointing, rerunnability, tagging everything with unique IDs, etc.

Unfortunately Ab Initio doesn't advertise or publish any information. So there are just bits and pieces here and there. Here is an interesting blog:
http://www.geekinterview.com/Interview-Questions/Data-Warehouse/Abinitio

Question / Answer
==========================================================

Phases vs Checkpoints
   Phases are used to break the graph into pieces. Temporary files created during a phase are deleted after its completion. Phases are used to manage the resource-consuming (memory, CPU, disk) parts of the application separately and effectively.
   Checkpoints are created for recovery purposes. These are points where everything is written to disk. You can recover to the latest saved point and rerun from it.
   You can have phase breaks with or without checkpoints.

xfr
   A new sandbox will have many directories: mp, dml, xfr, db, etc. xfr is a directory where you put files with extension .xfr containing your own custom functions (and then use: include "somepath/xfr/yourfile.xfr"). Usually XFR stores mapping.

three types of parallelism
   1) Data parallelism: partitioning of data into parallel streams for parallel processing.
   2) Component parallelism: components execute simultaneously on different branches of the graph.
   3) Pipeline parallelism (sequential).

Multi-File System (MFS)
   m_mkfs - create a multifile (m_mkfs ctrlfile mpfile1 ... mpfileN)
   m_ls - list all the multifiles
   m_rm - remove the multifile
   m_cp - copy a multifile
   m_mkdir - add more directories to an existing directory structure

Memory requirements of a graph
   - Each partition of a component uses: ~8 MB + max-core (if any).
   - Add the size of the lookup files used in the phase (if multiple components use the same lookup, only count it once).
   - Multiply by the degree of parallelism. Add up all components in a phase: that is how much memory is used in that phase.
   - Select the largest-memory phase in the graph.

How to calculate a SUM
   SCAN
   ROLLUP
   SCAN WITH ROLLUP
   Scan followed by Dedup Sort, selecting the last record

dedup sort with null key
   If we don't use any key in the sort component while using Dedup Sort, then the output depends on the keep parameter:
   - first: only the first record
   - last: only the last record
   - unique_only: there will be no records in the output file

join on partitioned flows
   file1 (A,B,C), file2 (A,B,D). We partition both files by "A", and then join by "A,B". Is it OK? Or should we partition by "A,B"? Partitioning by "A" is sufficient: records that agree on (A,B) also agree on A, so matching records land in the same partition.
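The memory rule of thumb from the "Memory requirements of a graph" entry can be turned into a small calculation. The sketch below follows that entry literally (8 MB plus max-core per component, lookup size added once, the sum multiplied by the degree of parallelism). The function names phase_memory_mb and largest_phase_mb and all the sample numbers are assumptions made for illustration, not Ab Initio utilities or measured values.

```shell
#!/bin/sh
# phase_memory_mb PARALLELISM LOOKUP_MB MAXCORE_MB...
# One MAXCORE_MB argument per component in the phase (use 0 if the component
# has no max-core). Hypothetical helper, for estimation only.
phase_memory_mb() {
    parallelism=$1
    lookup_mb=$2
    shift 2
    total=$lookup_mb                       # lookup files are counted once
    for maxcore in "$@"; do
        total=$((total + 8 + maxcore))     # ~8 MB + max-core per component
    done
    echo $((total * parallelism))          # multiply by degree of parallelism
}

# largest_phase_mb MB...: the graph needs as much memory as its largest phase.
largest_phase_mb() {
    max=0
    for mb in "$@"; do
        if [ "$mb" -gt "$max" ]; then
            max=$mb
        fi
    done
    echo "$max"
}

# Phase 1: 4-way parallel, 50 MB lookup, a Sort with 100 MB max-core, a Reformat.
p1=$(phase_memory_mb 4 50 100 0)
# Phase 2: 4-way parallel, no lookup, a single Rollup with 64 MB max-core.
p2=$(phase_memory_mb 4 0 64)
echo "phase 1: $p1 MB, phase 2: $p2 MB, graph: $(largest_phase_mb "$p1" "$p2") MB"
```

Because phases run one after another, only the largest phase determines the memory the graph needs at any one time, which is why breaking a heavy graph into phases can help.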
How to get records 50-75 out of 100
   - use scan and filter
   - m_dump <dml> <mfs file> -start 50 -end 75
   - number the records with next_in_sequence() and keep those from 50 to 75 with a Filter by Expression component (sequence number >= 50 && sequence number <= 75)

How to convert a serial file into MFS
   Create the MFS, then use a partition component.

project parameters vs. sandbox parameters
   When you check out a project into your sandbox, you get project parameters. Once in your sandbox, you can refer to them as sandbox parameters.

BadStraightflow
   Error you get when connecting mismatching components (for example, connecting a serial flow directly to an MFS flow without using a partition component).

merging graphs
   You can not merge two Ab Initio graphs. You can use the output of one graph as input for another. You can also copy/paste the contents between graphs. See also about using .plan.

partitioning, repartitioning, departitioning
   - partitioning: dividing a single flow of records (serial file, mfs) into multiple flows
   - departitioning: removing partitioning (Gather and Merge components)
   - repartitioning: changing the number of partitions (e.g. from 2 to 4 flows)

lookup file
   For large amounts of data use an MFS lookup file (instead of a serial one).

indexing
   No indexes as such. But there is "output indexing", using Reformat and doing the necessary coding in the transform part.

Environment project
   A special public project that exists in every Ab Initio environment. It contains all the environment parameters required by the private or public projects which constitute the AI Standard Environment.

Aggregate vs Rollup
   Aggregate - old component.
   Rollup - newer, extended, recommended to use instead of Aggregate (built-in functions like sum, count, avg, min, max, product, ...).

EME, GDE, Co-operating system
   - EME = Enterprise Metadata Environment. Functions: repository, version control, statistical analysis, dependency analysis. It is on the server side and holds all the projects (metadata of transformations, config info, source and target info: graph, dml, xfr, ksh, sql, etc.). This is where you check in and check out. The /Project dir of the EME contains common directories for all application sandboxes connected to it. It also helps in dependency analysis of code. Ab Initio has a series of air commands to manipulate repository objects.
   - GDE = Graphical Development Environment (on the client box).
   - Co-operating system = Ab Initio server installed on top of the native (Unix) OS on the server.

fencing
   Fencing - changing the priority of a job. Co-operating system fencing means controlling jobs on a priority basis. In AI it actually refers to customized phase breaking. A well fenced graph means that no matter what the source data volume is, the process will not run into deadlocks. It actually limits the number of simultaneous processes.
   Phasing - managing the resources to avoid deadlocks. For example, limiting the number of simultaneous processes (by breaking the graph into phases, only 1 of which can run at any given time).

Continuous components
   Produce useful output while running continuously. For example: Continuous Rollup, Continuous Update, Batch Subscribe.

checkin, checkout
   You can do checkin/checkout using the wizard right from the GDE, using versions and tags.

how to have different passwords for QA and production
   Parameterize the .dbc file, or use an environment variable.

deadlock
   Deadlock is when two or more processes are requesting the same resource. To avoid it, use phasing and resource pooling.

environment
   - AB_HOME - where the Co>Operating System is installed
   - AB_AIR_ROOT - default location for the EME datastore
   - sandbox standard environment: AI_SORT_MAX_CORE, AI_HOME, AI_SERIAL, AI_MFS, etc.
   - from the Unix prompt: env | grep AI

wrapper script
   Unix script to run graphs.

multistage component
   A multistage component is a component which transforms input records in 5 stages (1. input select, 2. temporary initialization, 3. processing, 4. finalize, 5. output selection). So it is a transform component which has packages. Examples: Rollup, Scan, Normalize and Denormalize Sorted.

Dynamic DML
   Dynamic DML is used if the input metadata can change. Example: at different times, different input files are received for processing which have different DML. In that case we can use a flag in the DML: the flag is read first in the received input file, and the corresponding DML is used according to the flag.

fan in, fan out
   - fan out: partition component (increases parallelism)
   - fan in: departition component (decreases parallelism)

lock
   A user can lock a graph for editing, so that others will see the message and can not edit the same graph.

join vs lookup
   Lookup is good for speed with small files (it loads the whole file in memory). For large files use Join. You may need to increase the max-core limit to handle big joins.
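For the "records 50-75 out of 100" entry, the same selection can be tried on an ordinary serial text file with standard Unix tools. This is a plain-shell analogy, not an Ab Initio command, and the function name records_between is made up for the example. Both boundary records are included.

```shell
#!/bin/sh
# records_between FIRST LAST: read records (lines) on stdin and print those
# whose record number lies between FIRST and LAST, both inclusive.
# Hypothetical helper for a serial text file, one record per line.
records_between() {
    awk -v first="$1" -v last="$2" 'NR >= first && NR <= last'
}

# 100 sample records numbered 1..100; keep records 50 through 75.
seq 1 100 | records_between 50 75
```

Note that an inclusive range from 50 to 75 contains 26 records, which is the usual off-by-one trap when the filter is written with > and < instead of >= and <=.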
Frequently used functions
   string_ltrim, string_lrtrim, string_substring, reinterpret_as, today(), now()

data validation
   is_valid, is_null, is_blank, is_defined

driving port
   When joining inputs (in0, in1, ...) one of the ports is used as the "driving" port (by default in0). The driving input is usually the largest one. The smallest one can have the "Sorted-Input" parameter set to "Input need not be sorted", because it will be loaded completely in memory.

scheduler
   - We can use Autosys, Control-M, or any other external scheduler.
   - We can take care of dependencies in many ways. For example, if scripts should run sequentially, we can arrange for this in Autosys, or we can create a wrapper script and put the commands there one after another (command1.ksh; command2.ksh; etc.), starting the wrapper itself with nohup if needed. Note that putting & after each command (nohup command1.ksh & nohup command2.ksh &) starts them in parallel, not in sequence. We can even create a special graph in Ab Initio to execute individual scripts as needed.

Api and Utility modes in input table
   These are database interfaces (api - uses SQL, utility - bulk loads, whatever the vendor provides).

lookup file
   - Lookup File component. Functions: lookup, lookup_count, lookup_next, lookup_match, lookup_local.
   - Lookups are always used in combination with Reformat components.

Calling stored proc in DB
   You can call a stored proc (for example, from an input component). In fact, you can even write an SP in Ab Initio. Make it "with recompile" to assure good performance.

multi update
   Multi Update executes SQL statements. It treats each input record as a completely separate piece of work.

Ab Initio vs Informatica for ETL
   Ab Initio benefits: parallelism built in, multifile system, handles huge amounts of data, easy to build and run. It generates scripts which can be easily modified as needed (if something couldn't be done in the ETL tool itself). The scripts can be easily scheduled using any external scheduler and easily integrated with other systems.
   Ab Initio doesn't require a dedicated administrator.
   Ab Initio doesn't have built-in CDC capabilities (CDC = Change Data Capture).
   Ab Initio allows you to attach error / reject files to each transformation and capture and analyze the message and data separately (as opposed to Informatica, which has just one huge log). Ab Initio provides immediate metrics for each component.

override key
   The override key option is used when we need to join 2 fields which have different field names.

control file
   The control file should be in the multifile directory (it contains the addresses of the serial files).

max-core
   The max-core parameter (for example, sort 100 MBytes) specifies the amount of memory used by a component (like Sort or Rollup), per partition, before spilling to disk. Usually you don't need to change it: just use the default value. Setting it too high may degrade performance because of OS swapping and degrade the performance of other components.

Input Parameters
   graph > select parameters tab > click "create", and create a parameter. Usage: $paramname. Edit > parameters. These parameters will be substituted during run time. You may need to declare your parameter scope as formal.

Error Trapping
   Each component has reject, error, and log ports. Reject captures rejected records, Error captures the corresponding error, and Log captures the execution statistics of the component. You can control the reject status of each component by setting the reject threshold to either Never Abort, Abort on first reject, or by setting ramp/limit. You can also use the force_error() function in a transform function.

How to see resource usage
   In GDE go to View > Tracking Details. You will see each component's CPU and memory usage, etc.

assign keys component
   Easy and saves development time. You need to understand how to feed parameters, and you can't control it easily.

Join in DB vs join in Ab Initio
   - Scenario 1 (preferred): we run a query which joins 2 tables in the DB and gives us the result in just 1 DB component.
   - Scenario 2 (much slower): we use 2 database components, extract all data, and join them in Ab Initio.

Join with DB component
   Not recommended if the number of records is big. It is better to retrieve the data out and then join in Ab Initio.

Data Skew
   Parameter showing how unevenly data is distributed between partitions.
   skew = (partition size - avg. partition size) * 100 / (size of the largest partition)
   Round-robin partitioning gives good balance.

dbc vs cfg
   .dbc - database configuration file (dbname, nodes, version, user/pwd). Resides in the db directory.
   .cfg - any type of config file, for example a remote connection config (name of the remote server, user/pwd to connect to the db, location of the OS on the remote machine, connection method). The .cfg file resides in the config dir.
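The skew formula from the "Data Skew" entry, skew = (partition size - average partition size) * 100 / (size of the largest partition), is easy to evaluate for a set of partition sizes. The sketch below assumes positive sizes in one consistent unit. The function name skew_report and the sample sizes are made up for illustration.

```shell
#!/bin/sh
# skew_report SIZE...: print the skew percentage of each partition using
#   skew = (size - average size) * 100 / largest size
# Hypothetical helper; sizes are assumed positive and in the same unit.
skew_report() {
    printf '%s\n' "$@" | awk '
        { size[NR] = $1; sum += $1; if ($1 > max) max = $1 }
        END {
            avg = sum / NR
            for (i = 1; i <= NR; i++)
                printf "partition %d: %.1f%%\n", i - 1, (size[i] - avg) * 100 / max
        }'
}

# Four partitions of 100, 200, 300 and 400 MB: average 250 MB, largest 400 MB.
skew_report 100 200 300 400
```

Partitions smaller than the average get a negative skew and larger ones a positive skew. A perfectly balanced layout, which round-robin partitioning approximates, gives 0 for every partition.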
compilation errors
   depth error: we get this error when two components are connected together but their layouts do not match. Others: depth not equal, data format error, etc.

types of partitions
   broadcast, partition by expression, partition by round-robin, partition by key, partition with load balance

unused port
   When joining, used records go to the output port and unused records go to the unused port.

tuning performance
   - Go parallel using partitioning.
   - Use the multifile system (MFS).
   - Use ad hoc MFS to read many serial files in parallel, and use the Concatenate component.
   - Once data is partitioned, do not switch it to serial and back. Repartition instead.
   - Do not access large files via NFS. Use FTP instead.
   - Use lookup_local rather than lookup (especially for big lookups).
   - Use Rollup and Filter as soon as possible to reduce the number of records. Ideally do it in the source (database?) before you get the data.
   - Remove unnecessary components. For example, instead of using Filter by Expression, you can implement the same function in Reformat/Join/Rollup. Another example: when joining data from 2 files, use the union function instead of adding an additional component for removing duplicates.
   - Use Gather instead of Concatenate.
   - It is faster to do a sort after a partition than to do a sort before a partition.
   - Try to avoid using a join with the "db" component.
   - When getting data from a database, make sure your queries are fast (use indexes, etc.). If possible, do the necessary selection / aggregation / sorting in the database before getting the data into Ab Initio.
   - Tune max-core for optimal performance (for Sort it depends on the size of the input file).
   - Note: if an in-memory join cannot fit its non-driving inputs in the provided max-core, it will drop all the inputs to disk, and in-memory does not make sense.
   - Using phase breaks lets you allocate more memory to individual components, thus improving performance.
   - Use a checkpoint after a sort to land the data on disk.
   - Use the in-memory feature of Join and Rollup.
   - When joining a very small dataset to a very large dataset, it is more efficient to broadcast the small dataset to MFS using the Broadcast component, or to use the small file as a lookup. But for a large dataset don't use Broadcast as a partitioner.
   - Use the Ab Initio layout instead of the database default to achieve parallel loads.
   - Change the AB_REPORT parameter to increase the monitoring duration.
   - Use catalogs for reusability.
   - Components like Join and Rollup should have the option "Input must be sorted" if they are placed after a Sort component.
   - Minimize the number of Sort components. Minimize the usage of sorted joins, and if possible replace them by an in-memory join / hash join.
   - Use only the required fields in the Sort, Reformat and Join components.
   - Use "Sort within Groups" instead of just Sort when the data was already presorted.
   - Use phasing / flow buffers in case of merge sorted joins.
   - Minimize the use of regular expression functions like re_index in the transform functions.
   - Avoid repartitioning data unnecessarily.
   - When splitting records into more than two flows, use Reformat rather than the Broadcast component.
   - For joining records from 2 flows, use the Concatenate component ONLY when there is a need to follow some specific order in joining records. If no order is required, it is preferable to use the Gather component.
   - Instead of putting many Reformat components consecutively, use the output indexes parameter in the first Reformat component and mention the condition there.

Master (or base) table
   A table on top of which we create a view.

delta table
   A delta table maintains the sequencer of each data table.

scan vs rollup
   rollup - performs aggregate calculations on groups
   scan - calculates cumulative totals
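The difference between a rollup (one aggregate per group) and a scan (a running total on every record) is easy to see on a tiny data set. The sketch below imitates both with awk on "key,amount" lines. It is an analogy in plain shell, not Ab Initio DML, and the function names rollup_sum and scan_sum are made up for the example.

```shell
#!/bin/sh
# Input records on stdin: "key,amount", one per line.

# rollup_sum: one output record per key with the group total,
# in order of first appearance of each key.
rollup_sum() {
    awk -F, '
        { if (!($1 in total)) order[++n] = $1; total[$1] += $2 }
        END { for (i = 1; i <= n; i++) print order[i] "," total[order[i]] }'
}

# scan_sum: one output record per input record, carrying the
# running (cumulative) total for its key so far.
scan_sum() {
    awk -F, '{ running[$1] += $2; print $1 "," $2 "," running[$1] }'
}

printf 'a,10\na,20\nb,5\nb,7\n' | rollup_sum   # one line per key
printf 'a,10\na,20\nb,5\nb,7\n' | scan_sum     # one line per record
```

For this input the rollup yields two records (a,30 and b,12), while the scan yields four records whose last column grows within each key: 10, 30, 5, 12. Taking the last scan record of each group gives the same totals as the rollup.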
Packages
Used in multistage components or transform components.

Reformat vs "Redefine Format"
- Reformat: deriving new data by adding/dropping fields.
- Redefine format: rename fields.

Conditional DML
DML which is separated based on a condition.

Sort within Groups
The prerequisite for using sortwithingroup is that the data is already sorted by the major key. sortwithingroup outputs the data once it has finished reading the major key group. It is like an implicit phase.

Passing a condition as a parameter
Define a Formal Keyword Parameter of type string. For example, you call it FilterCondition, and you want it to do filtering on COUNT > 0. Also in your graph, in your "Filter by expression" component, enter the following condition: $FilterCondition. Now on your command line or in a wrapper script give the following command (the condition is quoted so that the shell does not treat > as a redirection):

    YourGraphname.ksh -FilterCondition "COUNT > 0"

Passing file name as a parameter

    #!/bin/ksh
    # Running the set up script on environment
    typeset PROJ_DIR=$(cd $(dirname $0)/..; pwd)
    . $PROJ_DIR/ab_project_setup.ksh $PROJ_DIR
    # Exporting the script parameter1 to INPUT_FILE_NAME
    export INPUT_FILE_NAME=$1
    # This graph is using the input file
    cd $AI_RUN
    ./my_graph1.ksh
    # This graph also is using the input file.
    ./my_graph2.ksh
    exit 0

------------------------------------

    #!/bin/ksh
    # Running the set up script on environment
    typeset PROJ_DIR=$(cd $(dirname $0)/..; pwd)
    . $PROJ_DIR/ab_project_setup.ksh $PROJ_DIR
    # Both input file parameters must be supplied
    if [ $# -eq 2 ]; then
        INPUT_FILE_PARAMETER_1=$1
        INPUT_FILE_PARAMETER_2=$2
        # This graph is using the input file
        cd $AI_RUN
        ./my_graph1.ksh $INPUT_FILE_PARAMETER_1
        # This graph also is using the input file.
        ./my_graph2.ksh $INPUT_FILE_PARAMETER_2
        exit 0
    else
        echo Insufficient parameters
        exit 1
    fi

Surrogate key
There are many ways to create a surrogate key. For example, you can use the next_in_sequence() function in your transform. Or you can use the "Assign key values" component. Or you can write a stored procedure and call it. Note: if you use partitions, then do something like this: (next_in_sequence()-1)*no_of_partition()+this_partition(). For example, with 4 partitions, and assuming next_in_sequence() starts at 1 and this_partition() is zero-based, partition 0 produces 0, 4, 8, ... and partition 1 produces 1, 5, 9, ..., so keys never collide across partitions.

Vector
A vector is simply an array. It is an ordered set of elements of the same type (the type can be any type, including a vector or a record).

Dependency Analysis
Dependency analysis will answer the questions regarding data lineage, that is, where does the data come from, what applications produce and depend on this data, etc.

How to remove header and trailer lines?
Use conditional DML where you can separate detail from header and trailer. For validations use reformat with count: 3 (out0: header, out1: detail, out2: trailer).

How to create a multi file system on Windows
- First method: in GDE go to RUN > Execute Command and run m_mkfs c:control c:dp1 c:dp2 c:dp3 c:dp4
- Second method: double-click on the file component, and in the ports tab double-click on partitions. There you can enter the number of partitions.

.profile
Your ksh init file (environment, aliases, path variables, history file settings, command prompt settings, etc.).

.abinitiorc
This is a config file for Ab Initio, in the user's home directory and in $AB_HOME/Config. It sets the abinitio home path, configuration variables (AB_WORK_DIR, AB_DATA_DIR, etc.), login info (id, encrypted password), and login methods for hosts for execution (like EME host, etc.).

Data mapping, data modelling

How to execute the graph
From GDE: whole graph or by phases. From checkpoint. Also using ksh scripts.

Write Multiple Files
A component which allows to write simultaneously into multiple local files.

Testing
Run the graph and see the results. Use components from the Validate category.

Sandbox vs EME
Sandbox is your private area where you develop and test. Only one project and one version can be in the sandbox at any time. The EME Datastore contains all versions of the code that have been checked into it (source control).

Layout
Where the data files are and where the components are running. For example, for data: serial or partitioned (multi-file). The layout is defined by the location of the file (or a control file for the multifile). In the graph the layout can propagate automatically (for multifile you have to provide details).

Latest versions
April 2009: GDE ver. 1.15.6, Co-operative system ver 2.14.

Graph parameters
Menu edit > parameters allows you to specify private parameters for the graph. They can be of 2 types: local and formal.

Plan>It
You can define pre- and post-processes, triggers, etc. Also you can specify methods to run on success or on failure of the graphs.

Frequently used components
- input file / output file
- input table / output table
- lookup / lookup_local
- reformat
- gather / concatenate
- join
- runsql
- join with db
- compression components
- filter by expression
- sort (single or multiple keys)
- rollup
- trash
- partition by expression / partition by key

Running on hosts
The co>operating system is layered on top of the native OS (unix). When running from GDE, GDE generates a script (according to "run" settings). The co>op system will execute the scripts on different machines (using specified host settings and connection methods, like rexec, telnet, rsh, rlogin) and then return error or success codes back.

Conventional loading vs direct loading
This is basically an Oracle question regarding the SQLLDR (SQL Loader) utility.
Conventional load: using insert statements. All triggers will fire, all constraints will be checked, all indexes will be updated.
Direct load: data is written directly block by block. Can load into a specific partition. Some constraints are checked, indexes may be disabled; need to specify native options to skip index maintenance.

Semi-join
The abinitio online help gives 3 examples of joins: inner join, outer join, and semi join.
- For inner join the 'record_requiredN' parameter is true for all "in" ports.
- For outer join it is false for all the "in" ports.
- For semi join it is true for both ports (like inner join), but the dedup option is set only on one side.
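The three join flavours listed above can be compared on a tiny data set. The following is a rough sketch in plain shell and awk, not Ab Initio code: the function name join_demo, the inputs in0 and in1, and the record layout are all made up for illustration, and the semi join is modelled simply as "in1 deduplicated on the key, so each matching in0 record comes out once".

```shell
#!/bin/sh
# Made-up inputs: "key,value" records. in1 has a duplicate key (2),
# in0 has a key with no match (3), in1 has a key with no match (4).
in0='1,a
2,b
3,c'
in1='1,x
2,y
2,z
4,w'

# join_demo MODE, where MODE is inner, outer or semi (hypothetical helper).
join_demo() {
    { printf '%s\n' "$in1"; echo '--'; printf '%s\n' "$in0"; } |
    awk -F, -v mode="$1" '
        $0 == "--" { second = 1; next }
        !second { n1[$1]++; v1[$1, n1[$1]] = $2; next }   # load in1 by key
        {
            seen0[$1] = 1
            if (!($1 in n1)) {
                # unmatched in0 record: only an outer join keeps it
                if (mode == "outer") print $1 "," $2 ","
                next
            }
            # semi: in1 is deduplicated, so one output per in0 record
            limit = (mode == "semi") ? 1 : n1[$1]
            for (i = 1; i <= limit; i++)
                print $1 "," $2 (mode == "semi" ? "" : "," v1[$1, i])
        }
        END {
            # unmatched in1 records: only an outer join keeps them
            if (mode == "outer")
                for (k in n1) if (!(k in seen0))
                    for (i = 1; i <= n1[k]; i++) print k ",," v1[k, i]
        }'
}

for m in inner outer semi; do echo "$m:"; join_demo "$m"; done
```

With these inputs the inner join repeats key 2 once per matching in1 record, the outer join additionally keeps the unmatched keys 3 and 4, and the semi join emits each matching in0 record exactly once.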
