Analyst Interview Questions - AMAZON

There are 3 significant areas to cover. The assumption is that questions in these areas will provide data to assess leadership, culture fit, & communication skills.

First: Business Process Assessment
The candidate should be able to assess problems/opportunities in a "case" study method.

Second: Technical Depth
The candidate will need to retrieve, manipulate, & evaluate large sets of data efficiently.

Third: Self-directed/Leadership
Will this candidate look for business opportunities with passion?

1. Business Process Assessment

Quick Questions
- What are you reading currently?
- What has influenced your business behavior most heavily in the last year?
- What kind of process or project mgmt training have you had?
- What do you think of (five forces, rational, UML, competitive advantage, Ted Levitt's criticism of the "product lifecycle", Six Sigma)?
- How do you get to root cause for an issue such as

Looking for: Evidence that the person is growing, stretching in a direction that is successful at Amazon. Is the candidate reading The Learning Organization, The Innovator's Dilemma, or Who Moved My Cheese? Do they read The Economist, MIT Tech Review, HBR, or Newsweek? Do they recognize process terms?

Longer Questions
These should be of 2 types: simple "give me an equation for this situation" questions, and then more vague case types.

Simple Equations
- Profitability = Revenue - Costs
- Need Inventory = OH - Demand (bonus if time phased & includes forecasts, intransits)
- Predicted OH Inventory = OH + Intransits + POs - Demand - Forecast
- Healthy Inventory =

Start by just asking for a simple definition, then start discussing factors with the candidate. For example, in profitability, what factors go into costs? Looking for thoughtfulness & testing of assumptions.
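As a minimal sketch of where the "discuss the factors" conversation can start, the simple equations above can be written out with made-up numbers. The function names and figures below are illustrative assumptions only; the formulas are the ones listed above.

```python
def profitability(revenue, costs):
    """Profitability = Revenue - Costs."""
    return revenue - costs


def need_inventory(on_hand, demand):
    """Need Inventory = OH - Demand, as listed above (negative means a shortfall)."""
    return on_hand - demand


def predicted_on_hand(on_hand, in_transits, purchase_orders, demand, forecast):
    """Predicted OH Inventory = OH + Intransits + POs - Demand - Forecast."""
    return on_hand + in_transits + purchase_orders - demand - forecast


# Hypothetical numbers for one item, chosen only to exercise the formulas.
print(profitability(revenue=1000, costs=750))            # 250
print(need_inventory(on_hand=120, demand=150))           # -30
print(predicted_on_hand(on_hand=120, in_transits=40,
                        purchase_orders=60, demand=150,
                        forecast=50))                    # 20
```

Each argument is a natural follow-up question for the candidate, for example what goes into costs, or whether demand and forecast are time phased.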
Also consider how the candidate thinks through the question: systematically or ad hoc?

Cases
Skip suggested prefacing questions of this type with "there is no right answer, I want to use this as an example to see how you approach a problem".
<insert Nimrod, Janice, Skip questions here>

2. Technical Assessment Questions

Quick questions
From a list of orders over the last week, using the tool of your choice:
1. Rank the orders by quantity.
2. Avg quantity for each vendor.
3. # of distinct vendors per week.
4. Find & count lines in a log file that have a specific ASIN or user id.
(In an onsite, could have a data file on a laptop and say "show me".)

Looking for:
- Unix: cut, cat, find, grep, sort
- Excel/Access: basic functions, pivot tables, data structures, domain tables
- SQL: nested queries, functions, basic joins
- Perl: any scripting? RegExp
- Reference: does the candidate know how to find help and admit boundaries?

Longer SQL/Unix Questions

1. "I need to provide a report with the total units & the average cost of book orders by day of week over the last 10 weeks by country."

A good answer will look something like:

    select sum(quantity), avg(cost), product, day_of_week, country
    from (
        select quantity, (quantity * cost) as cost,
               to_char(order_day, 'DY') as day_of_week, product, country
        from order_items
        where product = 'books' and order_day between x and now
    )
    group by product, day_of_week, country

The candidate should recognize:
- cost must be calculated on an item basis before averaging (nested or inline query)
- SQL functions exist for total, average, & date manipulation
For extra credit add a join, such as book name.

2. "Say there's a text file of the form "userid-tab-command" that tracks all the commands that a given user runs. How would you find out how many times user "Bob" has run any command at all?"

A Good Answer: At its most basic: "grep -c Bob filename" or "cat filename | grep Bob | wc -l". If they understand that "Bob" could be part of the command, then the correct grep is actually: grep -c "^Bob" to anchor the user.
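The difference between the unanchored and the anchored pattern can be checked on a small sample. This is only a sketch: the log contents are made up, and the helper `count_matching_lines` is a hypothetical stand-in that imitates `grep -c` with Python's `re` module.

```python
import re

# Hypothetical sample of a "userid<TAB>command" log; the contents are made up.
LOG = "\n".join([
    "Bob\tls -l",
    "Alice\tmail Bob",      # "Bob" appears in the command, not as the user
    "BobH\tcat notes.txt",  # a different user whose id starts with "Bob"
    "Bob\tgrep foo bar",
])


def count_matching_lines(pattern, text):
    """Count lines that contain a match, similar to `grep -c pattern`."""
    regex = re.compile(pattern)
    return sum(1 for line in text.splitlines() if regex.search(line))


unanchored = count_matching_lines(r"Bob", LOG)   # like: grep -c Bob
anchored = count_matching_lines(r"^Bob", LOG)    # like: grep -c "^Bob"
print(unanchored, anchored)
```

On this sample the anchored pattern drops the line where "Bob" only appears in the command, but it still counts the line that belongs to user BobH.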
Even better, in case there's a user called "Bob" and another called "BobH", they should do: grep -c "^Bob<tab>".

3. Self-directed/Leadership

Assessing these behaviors may occur throughout the questions of other areas.
- Look for an opportunity to challenge the candidate on something that is obviously right or true.
- Do they hold their ground?
- Do they get angry or look to understand why they're being challenged?
- What does the resume say about the candidate?
- Did they found anything, start anything, volunteer on something huge? Is this person an autodidact?

Hiring Analysts

Companies that hire lots of analysts have the process down to a science, just as Amazon does for SDEs. The interview process at a big 5 consulting firm is very defined, behaviorally focused, looking at capabilities.

An Analyst is a unique creature but not impossible to find & assess. During the interview loop, in addition to culture fit & interpersonal skills, a candidate should be reviewed on how they've displayed analyst type competencies in the past AND should solve a problem to display the competency in actuality. Analysts are usually good presenters, so just asking them about the past may not display the limits of their abilities.

Here are a couple frameworks of core Analyst Competencies:

--
1. Think broad and deep: can take the big picture strategic business view and can also dive into the details to understand a problem
2. Problem solving skills: can they structure and frame a problem, make estimates when necessary, figure out the dataset needed (smallest, easiest dataset to draw solid conclusions), get and analyze the data, summarize the conclusions and their reasoning
3. Communication skills: clear, organized, concise, ability to adapt to audience (VP to SDE), think on the fly, thoughtful
4. Multi-tasking: can they juggle many issues at one time?
5. Independence: ability to work with minimal direction and ask for help when needed
6. Customer focus
7. Cultural fit: Team (COFS, SCOS,...) and Amazon
8. Leadership

--Find, Frame, Analyze, & Deliver within Amazon

--Find Problems/Opportunities
An analyst should be able to recognize broken processes, bad processes, troubleshoot processes. But also prioritize whether the proposal is polishing a pig or creating a golden cow. Building pretty toys with no ROI is a waste of time. Given the business maturity at Amazon, there are a lot of process improvements or new businesses where money can be saved/found.

Past Example
A candidate should be able to point to past projects where they:
- worked as support
- saved x USD, n Minutes as a result of a process change
Have them explain their role, then ask what upstream/downstream business impacts occurred after the change occurred. Drive into specifics.

Problem Solving
Provide a problem for them to solve.
- What should amazon add to its site to deliver more on the "Find Discover & Buy anything online"
- What is different for Amazon over Blockbuster video?
- What impacts to supply chain & customer experience would be felt by adding an Amazon Air Travel Store
- Given factors x, y, z, how would you calculate ROI on project Q to present to a sr. vice president, in an hour

This competency is a display of business competency: does the candidate see the big picture or get wrapped up into their project?

Potential Skills
- Process mapping: Can a candidate draw out a level 1, 2, 3 diagram? Understand ICOMs or do they move into systems & dataflow
- Basic Business Measures: ROI, Forecasting, DRP, S&OP

--Frame Model & Hypothesis
The analyst is part wizard, part math professor. They are called in to explain the past, look into the crystal ball about the future... and draw a cool looking formula to make you believe it. Acceptable skills vary here; an Ops Research type will need to demonstrate different skills than an MBA type or a Supply Chain type. But at the end of the day, an Amazon analyst will need to be able to stand at a whiteboard and draw some algebraic looking formula (sum of receive time ... confirm time for n asins * min fifo cost layer ...). Can they identify the opportunity (competency 1) then define and model it here?

Past Example
A candidate should be able to explain past projects where they:
- developed a model (forecasting, financial, ...)
- describe the tool(s) with which they've worked (AMPL, spreadsheet, Excel, Pkg Software)
Have them explain their role. Drive into specifics.

Problem Solving
Provide a problem for them to solve; tweak it for ecommerce.
- Why do split shipments matter?
- How would you build a forecasting model for new products with no history?
- What data does Amazon have that is unique, how can this be used in Supply Chain?
- How many customers does a 2% damage rate to the top 10 best selling items at the top 4 FCs impact?

This competency is a display of analytic skills: does the candidate set assumptions, and display the ability to draft a reasonable model? Could they build a metrics package?

Potential Skills
- Modeling: Can a candidate draw out a forecast equation, linear programming
- Advanced Business Measures: Time Value of Money

--Analyze
Once a candidate has built a model, no-one is going to go get data for you. The key elements here are abilities to:
* Retrieve Data
* Evaluate Data Quality
* Data Scale
So an analyst has found a good opportunity and determined how to quantify it; they then get the data themselves or negotiate for SDE time. Since SDE time is money, this is usually the less preferred choice. To succeed the analyst will need to identify and evaluate a data source. The tools on hand will be limited or perhaps not available, but how will the control be built ongoing?

Past Example
A candidate should be able to explain past projects where they:
- built a tool or heavily configured software
- what were the shortcomings? how did they drive through their weaknesses
- What data gathering tools were used, how big was the data set
Have them explain their role. Drive into specifics.

Problem Solving
Provide a problem for them to solve; tweak it for ecommerce.
- If SQL is a listed skill, ask for a query that tests aggregation, joins, functions, & business definitions, i.e. "Write a query from an order items table that results in average # of orders, average cost of orders by product line over the last 15 weeks." A great candidate should question the assumptions (why 15 wks, why average, why aggregated at all) and challenge the definitions. Follow up with "What decisions could I make with that data?"
- SDE design questions are good here too

This competency is a display of technical skills & business skills: could the candidate analyze a data set with 2 million rows? What conclusions do they draw from the results?

Potential Skills
- SQL
- Design

--Deliver
Once the analysis is completed, is it just a report on a shelf? What changed? Were cost reductions actually realized? What form did the analysis results take: powerpoint, email, 3 ring binder, whitepaper? Who saw them and what did they do? Is the candidate aware of good visualization guidelines (Tufte, W. Cleveland) or do they LOVE powerpoint? At Amazon, analysts often present their own results; will the work stand up to scrutiny?

Past Example
A candidate should be able to explain past projects where they:
- presented results in detail, in 15 minutes
- How did you get your points across in your allotted 10 minutes of executive time?
- What data presentation tools were used?

Problem Solving
Provide a problem for them to solve; tweak it for ecommerce.
- "You have 15 minutes tomorrow afternoon to report back to a VP about a question he asked you today regarding specific metric accuracy. Could you prepare an outline of your answer, what format would it be in, how would you followup on your recommendations?" Look for creativity.

Potential Skills
- Creativity
- Effective communication
- a get it done attitude

Data Engineer Interview Questions

Sample Interview Questions for Data Engineering Candidates

DW Concepts

- What are the advantages of star schema design?
  1. It is the simplest data warehouse schema.
  2. It is widely supported by a number of BI tools.
  3. Allows business entities to map directly with schema design for highly optimized performance when querying.

- Can you provide the different types of slowly changing dimensions (Type I, II, III)? What are the key differences in their implementation?
  1. Type I SCD's are dimensions where old data is overwritten with new data and no historical data is kept.
  2. Type II SCD's are dimensions where multiple records are kept to track historical data. 'Version' or 'Effective Date' are common ways to allow unlimited history preserved with each update/record.
  3. Type III SCD's are dimensions where a limited amount of history is preserved by using separate columns. 'Original' or 'Previous' columns for another column are common ways to track a limited number of changes.

- What are the difficulties in implementing a Type II dimension table? Give some examples.
  ANS: When new records are created to represent changes in a dimension table, the relationships between the fact tables and common keys can become inconsistent and lead to inaccurate results. Depending on the relationship between the dimension and fact table, the fact table may not capture all relevant dimensions when being queried.

- Given a type II dimension table having a 32bit guid as the natural key, how would you design the fact tables to support both point in time as well as current hierarchy reporting?
  ANS: Create a 'bridge' table to collect and assign keys to all unique combinations of the GUID and timestamp/level, and store the unique bridge key in the fact table.

- What are semi-additive facts?
  ANS: Semi-additive facts are facts that can be aggregated for some dimensions, but may not be logical for others.
  An example of this is Current Balance and Profit Margin. Current Balance is a semi-additive fact because it makes sense to add the Current Balance for all accounts at one point in time, but not for a period of time, whereas for Profit Margin, you may want both.
  Another example of a semi-additive fact is Local Net Revenue. Local Net Revenue can only be aggregated in the context of the local currency to give an accurate calculation. This semi-additive fact would need to be (Local Net Revenue * Conversion Rate) to be a fully additive fact.

- Take the source schema below.
  1. Product table: Product_Id, Product_Name, Launch_Dt, Product_Price
  2. Store table: Store_Id, Store_Name, Launch_Dt
  3. SalesRep table: SalesRep_Id, SalesRep_Name, DateofJoin, StoreId
  4. Orders table: Order_Id, Order_Date, SalesRep_Id, Store_Id, Total Amount, Total Quantity
  5. Order Items table: Order_Id, Item_Id, Quantity, Amount

  1. How would you approach building the DW schema for the above model?
     ANS: Star schema or Snowflake Schema or Same model as source
  2. What kind of factors should be considered while building the fact table? Would merging the Order table and Order Item table make more sense or not?
     ANS: Factors like storage, volume, maintenance, denormalization, less duplication, and backfill contribute to the design decision for the fact table in such a scenario.
  - How would you maintain the history of Product_Price?
    ANS: Type 2 SCD would solve the problem
  - What is the data type of the surrogate key?
    ANS: Integer or Numeric

- What is the difference between OLTP and DW systems?
  ANS: OLTP systems record transactions in real-time and aim to automate clerical data entry processes of a business entity. DW systems are a storage space of current and historical data extracted from external sources for aggregation and analytical querying.

- What is Full/Initial load & Incremental/Refresh load?
  ANS: Initial Load is when you are populating tables in the DW schema for the first time. Full load refers to populating the entire table, whether the first time or to overwrite data. Incremental/Refresh load refers to populating tables with records that were not already in the tables.

- What is a staging area? Do we need it? What is the purpose of a staging area?
  ANS: A staging area is intermediate storage space between external sources and the DW. Yes, we need staging areas. The purposes of staging areas are:
  1. Gathering data from different sources for transforming at different times
  2. OLTP's can quickly offload data when in need of free space
  3. Use data to compare against current datasets within DW
  4. For pre-joining and aggregating data as well as 'data cleansing'

- How to determine what records to extract in Incremental/Refresh load?
  ANS: By using Type II and Type III SCD's as a common key to determine the most recent or newly created records.

- What is a data mart?
  ANS: A data mart is a subset of the DW, usually as a separate schema so a specific business unit/group can modify and maintain the data within it, without affecting other data marts or the DW.

- You have an Item Orders fact table. Will you store the Product group of the item in the fact? If so why? Else why not?
  ANS: I would store the foreign key of the Product Group dimension table in the Item Orders fact table, so that it could be used for pivoting and reporting on Product Groups.

- What are push and pull ETL strategies?
  ANS: Push and pull ETL strategies refer to the way in which data is transferred from source to ETL tool. Push ETL is when the external source sends data to the ETL tool. Pull ETL is when the ETL tool requests/retrieves data from the source.

- What is Star, Snow Flake Schema? What is the difference?
  ANS: The Star schema consists of one or more Fact tables relating to any number of Dimension tables. The Snowflake schema is represented by centralized Fact tables related to dimensions on multiple levels. The main difference is in a Star schema, one dimension would have only one table. In a Snowflake schema, one dimension could have a subset of dimensions that are related to the fact tables.

- What does level of granularity in a fact table mean?
  ANS: The level of granularity in a fact table refers to the detail and precision at which a fact is captured within a given context.

Tuning

If you have a poorly performing report/etl process, how would you investigate and tune it going all the way back to table design...
- explain plans... when tuning, what do you look for in an explain plan that screams red flags?
- 'what if you didn't have indexes'
- What about partitioning...
- What about the oracle level join types (hash, nested loop) and when each should be used
- Different types of joins and when each should be used

SQL

DE 1 bar is questions 1-3. (The answers to questions 2-4 assume ORDER_ITEMS also carries an ORDER_DAY column.)

1. Given an orders (order_id, order_day) table, count(*) of orders last week

    SELECT COUNT(ORDER_ID) AS NUM_OF_ORDERS
    FROM ORDERS
    WHERE ORDER_DAY >= TRUNC(SYSDATE, 'DAY') - 7
      AND ORDER_DAY < TRUNC(SYSDATE, 'DAY')

2. Order_items table (item_id, order_id, qty), sum of qty by month.

    SELECT TO_CHAR(ORDER_DAY, 'MONTH') AS MONTH, SUM(QTY) AS MONTHLY_SUM
    FROM ORDER_ITEMS
    GROUP BY TO_CHAR(ORDER_DAY, 'MONTH')

3. Order_items table (item_id, order_id, qty), sum of qty by month when more than 50.

    SELECT TO_CHAR(ORDER_DAY, 'MONTH') AS MONTH, SUM(QTY) AS MONTHLY_SUM
    FROM ORDER_ITEMS
    GROUP BY TO_CHAR(ORDER_DAY, 'MONTH')
    HAVING SUM(QTY) > 50

4. Pivot:
   - using the data from #3, give me the data with the Months as columns instead of rows

    SELECT
        SUM(CASE WHEN MONTH = '01' THEN MONTHLY_SUM END) AS JAN,
        SUM(CASE WHEN MONTH = '02' THEN MONTHLY_SUM END) AS FEB,
        SUM(CASE WHEN MONTH = '03' THEN MONTHLY_SUM END) AS MAR
    FROM
        (SELECT TO_CHAR(ORDER_DAY, 'MM') AS MONTH, SUM(QTY) AS MONTHLY_SUM
         FROM ORDER_ITEMS
         GROUP BY TO_CHAR(ORDER_DAY, 'MM')
         HAVING SUM(QTY) > 50)

   or
   - given item_properties (asin, binding, value), provide sql that gives 1 row per asin

    SELECT
        ASIN,
        COUNT(CASE WHEN BINDING = 'DVD' THEN 1 END) AS NUM_OF_DVDS,
        SUM(VALUE) AS ITEM_VALUE
    FROM ITEM_PROPERTIES
    GROUP BY ASIN

- Given an orders table (order_id, order_day, qty): running sum total on day. Create a query whose result set contains a running total.

    SELECT ORDER_ID, QTY, SUM(QTY) OVER (ORDER BY ORDER_DAY) AS RUNNING_TOTAL
    FROM ORDERS

- What are the differences between aggregates and analytic functions, and how does oracle handle them differently?
  ANS: Aggregate functions return one result per group of the result set, whereas analytic functions return multiple results per group, i.e. using analytic functions we may display group results along with individual rows.

- orders (order_id, order_day, billing_address_id, customer_id). Provide the last billing address every customer used.

    SELECT CUSTOMER_ID,
           MAX(BILLING_ADDRESS_ID) KEEP (DENSE_RANK LAST ORDER BY ORDER_DAY) AS LAST_BILLING_ADDRESS_ID
    FROM ORDERS
    GROUP BY CUSTOMER_ID

- Query for customers that bought a year ago and yesterday.

- Given an orders table with order_id, customer_id and order_date with the sample data below.

  Example table:
    Order_id, Customer_id, order_date
    O1, C1, 01-Jan-2000
    O2, C2, 01-Jan-2002
    O3, C3, 01-Apr-2002
    O4, C4, 01-Apr-2003
    O5, C1, 01-Jan-2006
    O6, C4, 01-May-2006

  Give SQL for the list of customer_ids who placed more than 1 order.

    SELECT CUSTOMER_ID, COUNT(ORDER_ID)
    FROM ORDERS
    GROUP BY CUSTOMER_ID
    HAVING COUNT(ORDER_ID) > 1

  Give the SQL for the list of customer_ids who have placed at least 1 order in 2000 and at least 1 order in 2006.

    SELECT CUSTOMER_ID
    FROM ORDERS
    GROUP BY CUSTOMER_ID
    HAVING COUNT(CASE WHEN TO_CHAR(ORDER_DATE, 'YYYY') = '2000' THEN 1 END) >= 1
       AND COUNT(CASE WHEN TO_CHAR(ORDER_DATE, 'YYYY') = '2006' THEN 1 END) >= 1

  Please write a sql which can generate the number of Orders for each year, 2000 to 2006.

    SELECT
        COUNT(CASE WHEN TO_CHAR(ORDER_DATE, 'YYYY') = '2000' THEN 1 END) AS ORDERS_2000,
        COUNT(CASE WHEN TO_CHAR(ORDER_DATE, 'YYYY') = '2001' THEN 1 END) AS ORDERS_2001,
        COUNT(CASE WHEN TO_CHAR(ORDER_DATE, 'YYYY') = '2002' THEN 1 END) AS ORDERS_2002,
        COUNT(CASE WHEN TO_CHAR(ORDER_DATE, 'YYYY') = '2003' THEN 1 END) AS ORDERS_2003,
        COUNT(CASE WHEN TO_CHAR(ORDER_DATE, 'YYYY') = '2004' THEN 1 END) AS ORDERS_2004,
        COUNT(CASE WHEN TO_CHAR(ORDER_DATE, 'YYYY') = '2005' THEN 1 END) AS ORDERS_2005,
        COUNT(CASE WHEN TO_CHAR(ORDER_DATE, 'YYYY') = '2006' THEN 1 END) AS ORDERS_2006
    FROM ORDERS

- Display the employee records who joined the department before their manager.

    SELECT emp1.*
    FROM EMPLOYEES emp1, EMPLOYEES emp2
    WHERE emp1.MANAGER_ID = emp2.EMPLOYEE_ID
      AND emp1.EMPLOYEE_JOIN_DATE < emp2.EMPLOYEE_JOIN_DATE

- Display employee records getting more salary than the average salary in their department.

    SELECT DEPT, EMPLOYEE, SALARY
    FROM (SELECT DEPT, EMPLOYEE, SALARY,
                 AVG(SALARY) OVER (PARTITION BY DEPT) AS DEPT_AVG
          FROM EMPLOYEES)
    WHERE SALARY > DEPT_AVG

- Display the highest paid employee in each department.

    SELECT DEPT, EMPLOYEE, SALARY
    FROM (SELECT DEPT, EMPLOYEE, SALARY,
                 RANK() OVER (PARTITION BY DEPT ORDER BY SALARY DESC) AS SALARY_RANK
          FROM EMPLOYEES)
    WHERE SALARY_RANK = 1

- Display the 2nd highest paid employee in each department.

    SELECT DEPT, EMPLOYEE
    FROM (SELECT DEPT, EMPLOYEE, SALARY,
                 RANK() OVER (PARTITION BY DEPT ORDER BY SALARY DESC) AS SALARY_RANK
          FROM EMPLOYEES)
    WHERE SALARY_RANK = 2

- Select student_id, student_name from students where student_id = 1 and student_id = 2. What does the query return?
  ANS: It returns nothing, since the student_id of a single row cannot be both 1 and 2 at once.

- How do you find the number of rows in a Table?
  ANS: SELECT COUNT(*) FROM TABLE_NAME

- What is the use of DESC in SQL?
  ANS: DESC can be used to describe a schema, or arrange records in descending order.

- What is Cartesian product in SQL?
  ANS: A Cartesian product returns all the rows in all the tables listed in the query. Each row in one table is paired with all the rows in each of the rest of the tables. This happens when there is no relationship defined between tables.

- What is a view? What is a materialized view? What is the difference between view and materialized view?
  ANS: Views are virtual tables defined by a query; a view can combine data from multiple tables. Views are updated automatically whenever an underlying table is modified. Materialized views store the query result and have to be refreshed to contain updated data.

- Can you insert data into a view?
  ANS: Yes.

- What is a merge statement? What is the requirement for a merge statement? Is PK necessary for merge?
  ANS: The MERGE statement is used to select rows from one or more sources for update or insertion into a table or view. You can specify conditions to determine whether to update or insert into the target table or view. You must have the INSERT and UPDATE object privileges on the target table and the SELECT object privilege on the source table. To specify the DELETE clause of the merge_update_clause, you must also have the DELETE object privilege on the target table. Another requirement is you cannot update the same row of the target table multiple times in the same MERGE statement, so for this to take place, a unique/primary key is necessary.

- Give some examples where you have used analytic functions.
  ANS: Rank and Percent_Rank are good analytic functions when wanting to create column values based on the rest of the dataset and its relationship to each record.

- What is the difference between UNION and UNION ALL?
  ANS: UNION will filter duplicate values to give DISTINCT results, while UNION ALL will not.

- Differentiate between IN and EXISTS. Which is faster, IN or EXISTS?
  ANS: IN tells SQL to run an outer query using the list of values within the clause. EXISTS tells SQL to run an outer query on a list of values within the clause until there is a match. EXISTS can be faster because SQL stops executing that operation after the first match, whereas SQL has to look at all values in an IN clause.

Oracle

- What is the difference between UNIQUE and PRIMARY KEY constraints?
  ANS: You can have more than 1 UNIQUE constraint within a table and it can be NULL, whereas there can only be one PK constraint per table, and it cannot be NULL.

- Differentiate between TRUNCATE and DELETE.
  ANS: TRUNCATE is a DDL command and cannot be rolled back. DELETE is a DML command and can be rolled back. Both remove rows, except TRUNCATE removes all rows of the table and does it faster.

- What is dual? Is it a table? If so what columns does it have? What's the data type?
  ANS: DUAL is a special one-row table that Oracle creates automatically. The DUAL table has only one column named DUMMY, which is a VARCHAR2 data type.

- Difference between CHAR and VARCHAR2?
  ANS: CHAR is a fixed length data type. VARCHAR2 is a variable length data type and can free up unused space if possible.

- What is the NVL statement? How is it different from decode? Is it possible to implement NVL with Decode?
  ANS: The NVL statement says if FIELD_NAME is NULL, assign value X: NVL(FIELD_NAME, REPLACEMENT_VALUE). It is different from DECODE in that DECODE has an if-then-else structure. Yes, NVL can be implemented by DECODE using: DECODE(FIELD_NAME, NULL, REPLACEMENT_VALUE, FIELD_NAME).

- Difference between CASE and DECODE?
  ANS: DECODE can only work with scalar values. CASE can work with predicates and sub queries, and handles NULL differently.

- Is there any way we can change the column name in a table?
  ANS: Yes. By using the ALTER TABLE ... RENAME COLUMN command.

- Can a primary key contain more than one column?
  ANS: Yes, but it is then referred to as a Composite Primary Key. Primary key assumes only one column describes it.

- Which is faster, Insert or Delete?
  ANS: Insert.

- What does COMMIT do?
  ANS: COMMIT makes permanent the changes resulting from all SQL statements in the transaction.

- What does ROLLBACK do?
  ANS: The ROLLBACK statement is the inverse of the COMMIT statement. It undoes some or all database changes made during the current transaction.

- What are partitions?
  ANS: A table partition is a collection of rows that is a subset of a user-created table.

- What's the difference in 10G and 11g partitioning?
  ANS: New features of 11G allow INTERVAL partitions, which moves part of the functionality currently solved by ETL pre-wrappers to default processing of the RDBMS defined in Data dictionary metadata (automatic partition creation).

- What is meant by analyzing tables?
  ANS: Analyzing a table involves collecting and interpreting statistics on a table such as the following:
  - Collect or delete statistics about an index or index partition, table or table partition, index-organized table, cluster, or scalar object attribute.
  - Validate the structure of an index or index partition, table or table partition, index-organized table, cluster, or object reference (REF).
  - Identify migrated and chained rows of a table or cluster.

- What is an oracle hint? Is the hint a command or does Oracle use it optionally?
  ANS: A hint is a code snippet that is embedded into a SQL statement to suggest to Oracle how the statement should be executed.
  Note: Hints should only be used as a last resort if statistics were gathered and the query is still following a sub-optimal execution plan.

- What is an Explain Plan?
  ANS: An Explain Plan is an ordered set of steps used to access or modify information, based on a query, while estimating the time and cost of processing.

- Difference between hash and nested loop joins?
  ANS: Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.
  Nested loops join a small number of rows, with a good driving condition between the two tables. It drives from the outer loop to the inner loop. The inner loop is iterated for every row returned from the outer loop, ideally by an index scan.
  The difference is the performance with which these joins are conducted. Hash joins are optimal when joining large subsets of data together, whereas nested loops are more efficient for smaller datasets that preferably have an index to use. For the DW, hash joins are generally recommended as most tables are not small enough to utilize the nested loop join efficiently.

ETL

1. What is the best strategy to use when you have to delete 400 million from a billion row table?
   ANS: Create a new table and backfill it with the existing data in the original table. Delete the desired 400 million rows from the new table, and then publish that table to the users, while deprecating the original.
2. Add in world wide reporting. How would that affect your ETL?
   ANS: Your ETL will then have to be adjusted to ensure that the data is available for reporting, based on the different time zones.
3. Given a billion row table. How do you add a new column and backfill the data from source without impacting the user?
   ANS: You will create a new table with the additional column and then backfill the data from the existing table. Once the backfill is complete, you can then deprecate the original table and publish the new table with the additional column to the users. This approach causes no impact to the users, as you are creating a separate table to backfill (with the additional column) instead of attempting to perform an UPDATE on a billion row table.

Linux/Unix

1. cron
   ANS: Cron is the time-based job scheduler in Unix-like computer operating systems. cron enables users to schedule jobs (commands or shell scripts) to run periodically at certain times or dates. It is commonly used to automate system maintenance or administration, though its general-purpose nature means that it can be used for other purposes, such as connecting to the Internet and downloading email.
2. combine 2 files
   ANS: cat file1 file2 >> mergedfile
3. dedupe #2
   ANS: sort mergedfile | uniq
4. given a large 1 column (for example used 'names') file, get a list of the duplicate values
   ANS: sort file | uniq -d
5. remove a known row from a file too large for vi
   ANS: The SED command provides an effective and versatile way of deleting one or more lines from a designated file to match the needs of the user.
   Example: To remove the 3rd line in a file: sed '3d' fileName.txt
   Example: To remove the last line in a file: sed '$d' filename.txt
6. given a process named 'foo', find and kill it
   ANS: pkill foo
7. describe linux/unix permissions
8. pipes
   ANS: Pipes are a function of text filtering in Linux that can be used to construct a pipeline of commands where the output from one command is piped or redirected to be used as input to the next. Using pipelines in this way is not restricted to text streams, although that is often where they are used.

- What does ls do?
  ANS: The ls command lists the files in a directory.

- If a file has permissions 000, then who can access the file?
  ANS: Only root can read/write the file. No one can execute the file, while only the owner can change the file's permissions.

- What is the difference between grep and find commands?
  ANS: grep is used to search for patterns in a file, whereas find is used to search files or directories.

- Find a pattern in a file
  ANS: grep is used to search for patterns in a file.

- Count the number of lines in a file with a pattern given
  ANS: grep -c "pattern" file.txt

- What is redirection?
  ANS: Redirection is when you change the standard input and outputs of a command to a user-specified location.

- What is piping?
  ANS: Piping is when you are redirecting standard inputs and outputs of a command by using pipes. Pipes are generally used for redirection.

- Given that 3rd column is the primary key, how would you find if there are duplicates in the file?
  ANS: awk '{print $3}' file | sort | uniq -d

- How do you check for null value in a particular column in a file?
  ANS: You could use the awk NF variable, which gives you the number of fields in the record being processed. For example, if a file has 10 columns, then you would check to see if a line has NF<10.

Teradata

- Advantages of using Teradata over Oracle
  ANS: The advantage of Teradata is that it uses MPP architecture, so that a query running against large tables can run over multiple threads, hardware, etc.

- Disadvantages of using Teradata as compared to Oracle
  ANS: The disadvantage of Teradata is its limited ability to handle a large volume of simultaneous queries while processing them efficiently.

Data Modeling

The following are just definitions. Try to provide a real-life problem, like how would you model so you can report on delay times between order state statuses: pending, error, success, etc.

- What is the difference between a Type 1 and Type 2 Dimension?
  1. Type I: Replace the old record with a new record with updated data, thereby we lose the history. But a data warehouse has a responsibility to track the history effectively, where the Type I implementation fails.
  2. Type II: Create a new additional dimension table record with the new value. This way we can keep the history. We can determine which dimension row is current by adding a current record flag or by a time stamp on the dimensional row.

- Differentiate Primary Key and Partition Key?
  1. ANS: Primary key is the key we define on the table column or set of columns (composite pk) to make sure all the rows in a table are unique.
  2. Partition key is the key that we use to partition the table with.

- What are the primary differences between a transactional database vs a data warehouse database?
  1. Transaction Database is a Relational Database with normalized tables, whereas Data Warehouse is with denormalized tables.
  2. Transaction Database is highly volatile, designed to maintain transactions of the business, whereas Data Warehouse is non volatile with periodic updates.
  3. Transaction Database is functional data. Data Warehouse database is subject oriented.
  4. Transaction Database is OLTP. Data warehouse is for analysis.

- What is the difference between Snow flake and Star Schema? What are the benefits of each?
  1. Star Schema
     1. Star join is a primary key to foreign key join of the dimension tables to a fact table.
     2. Provides highly optimized performance for typical star queries.
  2. Snowflake Schema
     1. The dimension data has been grouped into multiple tables instead of one large table. That is, a product dimension table in a star schema might be normalized into a products table and a product_manufacturer table in a snowflake schema.
     2. It increases the number of dimension tables and requires more foreign key joins.

- What is the difference between dimensional modeling vs. ER modeling?
  1. ER Model is utilized for OLTP databases that use any of the 1st, 2nd, or 3rd normal forms.
  2. ... which measures similarities of data objects that are an abstraction from the real world, which can help determine the relevancy of data during consumption. This model is good to use if you need to identify and understand the similarities/differences between data objects.
  3. Whereas ER Model is not mapped for creating schemas and is not used in the conversion of normalized data into denormalized form.
Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.2. ER modeling? o ANS: Dimensional modelling is very flexible for the user perspective. For example. The result is more complex queries and reduced query performance. Dimensional data model is mapped for creating schemas. a product_category table. Normalize dimensions to eliminate redundancy. 2. Are widely supported by a large number of business intelligence tools. where as dimensional data model is used for data warehousing and uses .  What is a context-driven data model? When would you need one? o  ANS: A context-driven data model is based on contextual information to enhance the "understanding" of object-to-object associations. which may anticipate or even require that the data warehouse schema contain dimension tables. 2. to track each iteration of the rapid-changing dimensions. Ask to de-norm / dimensionalize the data? Key questions: how to partition the data? Context-driven approach? .Why? If not why not? Advantages/disadvantages. such as Last_Updated or Created. Provide dimensional data from an OLTP source in a key-value pair.  2nd normal form represents a table where no non-prime attribute in the table is functionally dependent on a proper subset of any candidate key.g: new/used books). Describe a scenario where you would have to snowflake a model? o   How do you design a data model for rapid changing dimensions? o  ANS: The normal forms of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. is if you have entities that have a parent-child relationship..3rd normal form.) Ask to build a data model? If header-detail approach. Where would the change be propagated to? 2.  3rd normal form represents a table where every nonprime attribute is non-transitively dependent on every candidate key in the table.  Describe the normal forms? What is BCNF? 
2nd normal form? 3rd normal form? o  Boyce–Codd normal form (BCNF) represents a table where every non-trivial functional dependency in the table is a dependency on a superkey. Provide candidate an OLTP model similar to amazon ordering with 2-3 dimensions (product/customer/merchant etc. In other words. ANS: Implement an internal audit column. no transitive dependency is allowed. The attributes that do not contribute to the description of the primary key are removed from the table. ANS: One scenario for creating a snowflake schema. o Add a value to the OLTP design that alters the grain of one associated dimension (e. ER model contains normalized data where as Dimensional model contains denormalized data. deletes by any other transaction o ANS: By setting the table to the serializable isolation level.e. . and insert into a specified table. o What kind of design will you propose in source as well as data warehouses for tables which have hard deletes occurring on regular basis in source systems. You are starting a transaction and reading from an oracle table and processing the data. Another solution. What kind of logging system would you design for sql and pl/sql scripts so that all errors get logged in error tables? Provide at least two design solutions o ANS: One design solution within Oracle is. How will you ensure that the select is consistent i. o ANS: One way is by using bridge tables that holds at least the 2 foreign keys from the 2 tables that have the M:M relationship. o ANS: A design where Type 2 SCD's are implemented in Dimension tables and temporal fact tables are created to give DW users the visibility of whether the facts have been hard deleted in the source systems. Also. 3. 2. non-repeatable reads and phantom reads from occurring during the initial select transaction.3. which is just one aspect of a well designed system. [edit][hide] Additional Questions for DEIII (Level 6) Bar [edit][hide] Oracle 1. 
and more difficult to manage (training and troubleshooting). and possibly even handle a larger load. using UNIX. How do you handle many to many relationships in star schema. you can create a stored procedure call that can be attached to any other package/procedure that would be able to gather data on an error/exception or user/system checkpoint. RAC systems usally only improve availability. On the other hand. can scale with less hardware. while another script uses the data from the controller file to pull more data from Oracle's error logs. RAC systems can be costly. o ANS: RAC systems allows closer to 100% uptime. Advantages and disadvantages of Oracle RAC systems. which will prevent dirty reads. you ignore inserts. would be to create one script as the controller file that captures the unique identifiers for each error/checkpoint. updates. or on-site. Each database system was designed with specific advantages and disadvantages that may outweigh or downplay the advantages of Oracle (which also depends on the intended application of the database system). quarterly(q-o-q) and yearly(Y-O-Y) metrics . Provide solutions which have a very low price performance. monthly (m-o-m).4. 2. The vendor portal will be accessed from the external systems. We also receive customer returns. Advantages of using Oracle vs other database systems o The advantages may differ. What kind of end to end architecture you would design for such a scenario. Follow up questions: What happens when a backfill happens? 1. 2. or phone screen questions (in decreasing order of difficulty). This data needs to be stored and sales commission needs to be paid based on the state based on the $ sales made. . database and non database. Daily volume is around 600 million rows. The objective is given that the base tables are there in the datawarehouse. Products get traffic. Amazon receives Products from Vendors. 
SQLInterviewQuestions I've created this page as a place to put SQL puzzles to assign candidates who claim strong SQL backgrounds as homework. Amazon cuts PO’s to vendors which are fulfilled. depending on which database system is being compared to Oracle. What kind of solutions will you provide. Data warehouse receives orders from multiple ordering systems. what are the major factors you need to consider followup: Have you architected a reporting solution. a Vendor portal needs to be created which provides daily (d-o-d). How would you implement such a solution? What are the major design decisions you will take to ensure that a payment is tracked . get ordered and then shipped out of warehouses. We need to design a data mart for storing and querying the raw data for a 2 year history. A mapping table is present which associates state to Sales manager. When designing a data warehouse solution (both etl and reporting) for a company having businesses around the world. weekly (w-ow). [edit][hide] Architecture and design 1. We get click stream data on a daily basis from source team. What were the challenges faced. Prove or disprove the following equation: ( X join(f(X. the result will include all rows of X? . select distinct o. Y. from Objects o join ( 9. join ObjectTag ot2 on o.Z)) Z ) where all the field names of X. Suppose also that I have a mapping table ObjectTag which represents a many-tomany relationship between Objects and Tags.. group by obj_id 13. from ObjectTag 11.obj_id 6. Suppose I have a table X with a numeric field N.id = ot.N = m.obj_id 2. given a finite input list of Tag ids.[edit][hide] Homework Questions 1. Now I wish to find.obj_id where ot.id = ot<n>. How do I write a single query with one numeric query parameter such that if the parameter is some number m.Y)) Y ) left join(g(Y. Incidentally. and if the parameter is null. Argument via set-theoretic calculation.obj_id 4. and Tags. the result will only contain rows where X. 5. 
having count( tag_id ) = <n> 14.id = ot.Y)) ( Y left join(g(Y.. join .* from Objects o join ObjectTag ot on o. where ot1. Can you do (a) and (b) with one query each? o o o o (a) select distinct o. select o. ) ot on o. and ot<n>.tag_id = <input 1> and . Oracle Corporation's query-plan optimizer team is in a state of denial about this equivalence..Z)) Z == X join(f(X.tag_id = <input n> 7.tag_id in ( <input list> ) (b) Two ways. [Answer: true.] [edit][hide] On-site Questions 1. and (b) all of the input tags [harder].* from Objects o join ObjectTag ot1 on o. where tag_id in ( <input list> ) 12. join ObjectTag ot<n> on o.id = ot2. select count( tag_id ) tag_count. Suppose I have two entities in my DB: Objects. the set of Objects which map to (a) any of the input tags [easy]. with the second worth many more points than the first in terms of elegance. Let n be the length of the input list: 1..id = ot1. and Z are distinct. 2. obj_id 10.* 8.obj_id 3. MySQL */ select * from X where N = COALESCE( ?. What's the fastest query possible? [Answer: select * from X where X.empid) Department deptid (primary key) deptname [edit][hide] Questions Type Question SQL . select * from X where N = NVL( ?.3. 4. Suppose my enormous table X has a uniquely indexed string field "name". Now I want to find all records in X whose name field starts with the prefix 'foo'. N ) /* PostGreSQL */ [edit][hide] Phone Screen Questions 1.name < 'fop' as opposed to the much more common response select * from X where X.deptid) mgrid (foreign key to Employee. N ) /* oracle */ select * from X where N = IFNULL( ?.name like 'foo%' SQL [edit][hide] Base tables Employee empid (primary key) name title salary deptid (foreign key to Department. N ) /* SQL Server.name >= 'foo' and X. deptid and salary = (select max(salary) from Employee) Join Self join select emp.mgrid = mgr.deptid group by deptname Group by having Dept Name with number of employees > 10 select deptname. count(empid) from Employee emp. 
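To try queries against the Employee and Department base tables, one option is an in-memory SQLite database driven from Python. The following is only a sketch under stated assumptions: the column types, the sample rows and the helper function names are invented for illustration, and SQLite stands in for Oracle, so Oracle-specific syntax (such as the (+) outer join) is written in its ANSI form.

```python
import sqlite3


def build_sample_db():
    """Create the Employee/Department base tables with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (
            deptid   INTEGER PRIMARY KEY,
            deptname TEXT
        );
        CREATE TABLE Employee (
            empid  INTEGER PRIMARY KEY,
            name   TEXT,
            title  TEXT,
            salary INTEGER,
            deptid INTEGER REFERENCES Department(deptid),
            mgrid  INTEGER REFERENCES Employee(empid)
        );
        -- Sample data is hypothetical; 'Empty' has no employees on purpose.
        INSERT INTO Department VALUES (1, 'GFS'), (2, 'Retail'), (3, 'Empty');
        INSERT INTO Employee VALUES
            (1, 'Ramya',  'Director', 200, 1, NULL),
            (2, 'Aneesh', 'Manager',  150, 1, 1),
            (3, 'Kim',    'Analyst',  160, 1, 2),
            (4, 'Lee',    'Analyst',  100, 2, 2);
    """)
    return conn


def employees_per_department(conn):
    """Department name with employee count, including empty departments."""
    sql = """
        SELECT d.deptname, COUNT(e.empid)
        FROM Department d
        LEFT JOIN Employee e ON e.deptid = d.deptid
        GROUP BY d.deptname
        ORDER BY d.deptname
    """
    return conn.execute(sql).fetchall()


def paid_more_than_manager(conn):
    """Names of employees whose salary exceeds their manager's salary."""
    sql = """
        SELECT emp.name
        FROM Employee emp
        WHERE emp.salary > (SELECT mgr.salary
                            FROM Employee mgr
                            WHERE mgr.empid = emp.mgrid)
        ORDER BY emp.name
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = build_sample_db()
    print(employees_per_department(conn))
    print(paid_more_than_manager(conn))
```

The top-most employee has a NULL mgrid, so the correlated subquery yields NULL for that row and the comparison filters it out; this is worth probing with candidates, since NULL handling is a common source of wrong answers in self-join questions.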
Questions
Type | Question | SQL
Join | All employees from department = GFS | select emp.* from Employee emp, Department dept where emp.deptid = dept.deptid and dept.deptname = 'GFS'
Group by | Dept Name with number of employees | select deptname, count(empid) from Employee emp, Department dept where emp.deptid = dept.deptid group by deptname
Group by having | Dept Name with number of employees > 10 | select deptname, count(empid) from Employee emp, Department dept where emp.deptid = dept.deptid group by deptname having count(empid) > 10
Outer Join | Dept Name with number of employees, include depts with no employees also | select deptname, count(empid) from Employee emp, Department dept where emp.deptid (+) = dept.deptid group by deptname
Sub query | Highest salary employee with dept name | select emp.*, dept.* from Employee emp, Department dept where emp.deptid = dept.deptid and salary = (select max(salary) from Employee)
Self join | Emp name & Mgr name | select emp.name, mgr.name from Employee emp, Employee mgr where emp.mgrid = mgr.empid
Self join | All employees reporting to Aneesh | select emp.name from Employee emp, Employee mgr where emp.mgrid = mgr.empid and mgr.name = 'Aneesh'
Correlated subquery | Employees with salary more than their managers | select emp.name from Employee emp where emp.salary > (select mgr.salary from Employee mgr where emp.mgrid = mgr.empid)
Hierarchical query | All employees reporting to Ramya (directly or indirectly) | select name, mgrid from Employee start with name = 'Ramya' connect by prior empid = mgrid
 | Find the top most employee | select emp.* from Employee emp where mgrid is null

The above will cover some basic scenarios. If you want multiple joining conditions, maybe add another table like address into the mix and create some joining conditions. Can ask about EXISTS, NOT EXISTS and other correlated subquery conditions. Maybe one question about giving hints in a sql query. Ask some question regarding partitioning - say we have tables: orders, customers. Orders has order date. Performance issues - how to improve. Should arrive at partitioning by date.

PipsInterviewQuestions
A few Interview questions in sections

Statistics
1. What is 'Simpson's paradox'? Give an example. followup: How might this paradox occur in continuous distributions?

SQL
1. Suppose you are aggregating shipping_addresses over customers. Each customer has a customer_id and each address has an address_id; customers may have multiple shipping addresses. We want to aggregate shipping address zip codes up to customers to choose a 'representative' zip code for each customer that can be used for model building. There are three tables:
   • purchases - has customer purchases including shipping_address_id (key is purchase_id)
   • addresses - has address_id and postal_code (for US customers postal_code = zip_code, key is address_id)
   • zipcode2000census - zip code and demographic data on zip-code level (key is zip_code)
   Create a sql query that returns the most recently used zip-code and the most commonly used zip code for each customer. Join the results of this query with the census table to get the medianHouseValue for the zip code for each customer.

Interview Question

Some Interview Questions
• What command would I use to search for a specific string or regular expression in a file?
• What command would I use to change the permissions of a file?
• What command would I use to find the names of all processes running as a specific user?
• How to kill a process
• What are some common data structures in Java?
• What is a binary tree?
• What is a BST?
• Some of the tree traversals?
• Give me a case where I would want to use a hash table?
• What is the time complexity of retrieving an element from a hash table?
• Give me a regex to match a 10-digit phone number of the form 555-555-5555.
• Write a method to print out a binary tree's nodes in level-order.
• Find the Nth element from the last in a linked list
• You are responsible for supporting a web service, ABC service. It is a distributed service where incoming requests are load-balanced between 10 different hosts. You get notification that the service is failing - assuming you do not know much about the service (but you know that you own it), how would you try to fix it?
• I gave the candidate the structure of two database tables, "employees" and "managers" and asked him to write a couple of SQL queries.
      employee => Eid, name, active, MiD
      manager => MId, name, active
   o Write a SQL query to list all active employees and their managers, or
   o Write a SQL query to find the total number of active employees.

DWInterviewCompetencies
DataWarehouse Interview Competencies

Competencies
Following are the competencies that are identified that each person should focus on for the DW Data Engineer role. Before looking into the competencies, please abide by the following:
• Please make sure you have only two Competencies from the list below, if it's more please ask your HM.
• Please make sure you have a sufficient number of questions in each competency to support your vote.
• Any skill set that you have a serious data point on and that is outside of your competency, please don't vote for it, instead keep it in Pros/Cons.

Data Engineering
This should include Operational Data Engineering skills that are required in the DW world. ONLY HIRING MANAGER WILL DO THIS
Candidates should be comfortable with partitioning, parallelism, huge backfills, multi terabyte data, billion rows, impacts to objects (indexes, MVs), exchange partitions, different granularity handling etc.
Example questions:
• Some section is corrupted, how will you backfill only those affected rows?
   o Expect for partitions, exchange partitions.
   o A DE II and DE III should be aware of impact to indexes, global/local.
• Tables in three Clusters are out of sync, how will you correct it?
   o Expect for more clarifying questions like which one is correct, how will he implement it? How will he make it fast? Any improvements on the only changed ones etc.
• Load errors during huge volumes
   o Duplicates
   o Data errors when they don't match column data type definition
• A big file of 500 M rows (200 GB), how will you load it into tables?
   o External table
   o How will he parallelize?
   o Unix/OS level familiarity to parallelize etc.
• Huge SQL producing 500 M rows, writing takes a long time, what is his thought process to make it better?
   o Again, articulation around partitions.

Data Modeling and Design
This includes both OLTP Data Modeling and DW Data Modeling and Design. ONLY HIRING MANAGER and BR WILL DO OLTP DM. OTHERS PLEASE ASK MORE DW DM and Design
• Please give Votes as follows:
   o RAISES - In addition to giving you a correct answer, the candidate
      • Resolves all ambiguities by himself
      • Asks lot of relevant questions
      • States his assumptions
      • Commits mistakes, but when probed understands and corrects.
      • Thinks more on extensibility, scalability (it's ok for you to probe him on this)
      • Doesn't give up when you make it more complex.
   o MEETS - Candidate just gives you the correct answer and he doesn't exhibit all the above, but he picks up your probing and solves.
   o LOWERS - You know. It took a lot of probing to get the above things cleared up.
OLTP Examples include:
• Give a use case and ask him to design a data model (some DMs that we ask are bookmyshow.com, car pooling, table management in a restaurant etc.)
• Any SQL questions from the DM and your judgement should go ONLY as Pros or Cons
DW DM and Design include:
• ETL Design (ex. W_ Tables, T_Changed) for denormalization of multiple tables into one where updates happen asynchronously, deleting old rows, marking it as deleted flag; ask for actual implementation.
• SCD type of implementation (Customer address change).
• Aggregate designs
   o How will you design a multi granular table with some measures against 3 dimensions?
   o How will you select a particular granularity row (see for bitmap indexes on booleans that describe the granularity of that row?)
• Daily query of 15 months scanning to do YoY.

Database Concepts
This strictly includes only Database concepts. Examples include:
• Execution plans
• Difference between Hash and Nested loop joins
• Partitioning concepts
• Parallelism concepts
• Consistent reads (ora-snapshot too old errors)
• Indexes, MVs etc.
• Distributed Databases (pros and cons)

Coding and Problem Solving
This includes giving candidates problems and observing the approach and SQL coding skills for the same. You can also give problems that require procedural coding (PL/SQL programming). My recommendation will be to start off with simple SQL coding skills, then move to medium to complex problems that require intermediate designs and implementing the above with SQL code as well. In SQL, please observe for minimal scans, effective joins, not too many subqueries.
Examples include:
• You can start off with Top 10 salaries in an employee table
• Self join type of questions, employees - manager relation in same employee table
• Joins, outer joins
• Case when statements/decode
• Analytical functions (lag, lead, ranks, rownumbers, first value, last value etc.). ITS FINE if the candidate DOESNT know analytical functions. Ask more on solving using group bys, set operators, temporary tables, With tables etc.
• Questions on rollup, cube, grouping sets etc.
• Pivoting, De-Pivoting
• Cartesian Joins (yes it serves some purpose too, row explosion!)
• Designing intermediate structures, temp tables etc
• Recursive functions in PL/SQL programming
• Efficient for looping in PL/SQL programs (if it's two for loops, can it be done in a single for loop etc)
• Data structure usage in PL/SQL programs (Cursors, tables, arrays etc).

Hiring Manager
• Project management
• Culture fit
• HM can also pick any skill set from above just to be comfortable and please include that in Pros/Cons. (HM should do one of the competencies as well).

Bar Raiser
• BR competencies.

DW Grid
DW Grid should look something like this:
Data Engineering | DM and Design (OLTP/DW) | Coding and PS | DB Concepts | HM | BR
2 Votes | 2 Votes | 3 Votes | 2 Votes | 2 Votes | 2 Votes

Total Interviewers: A, B, C, D, HM, BR
DE - A B
DM - C D
Coding - A B C
DB Concepts - D HM
HM Round - HM BR
BR - BR HM
So we need 4 onsite interviewers + a HM + a BR.

Competency - Interviewer Pool
Interviewer | Data Engineering | Coding and PS | DB Concepts | DM (OLTP/DW) | General HM/BR skill sets
Venkatesh Mohan | Y | Y | N | Y | Y
Abhishek Agrawal | Y | Y | Y | Y | Y
Rakesh Singh | Y | Y | Y | Y | N
Naidu Rongali | Y | Y | Y | N | N
Aniruddha Vishnupurikar | N | Y | N | Y | N
Paparao Chinthagunti | Y | Y | Y | Y | N
Ankush Kuhar | Y | Y | Y | Y | N
Samar Sodhi | N | Y | N | Y | N

AmazonAnalyticsDEInterviewsQuestions
Amazon Analytics Data Engineer Interview Questions
See Amazon_Analytics_DE_Interviews for summary of typical DE interview for BI Reporting.

Outline of Phone Screen
1. 5 min - Introduction (hello and quick "who you are", describe job position)
2. 5 min - Ask about background and most recent experience
3. 25 min - Deep dive with questions
4. 10 min - "Why amazon?", "any questions for me?"
5. 5 min - describe next steps

Questions by Subject Area

BI Tools
• Basic
   o What is the purpose of BI
• Intermediate
• Advanced

Reporting
1. What are drill down and drill across reports, what is the difference?
2. What is pivoting? How will you write a pivoting sql?
3. What is a dashboard?
4. What is scorecarding?
5. Explain the Dimension Hierarchy!
6. OBIEE Product overview
7. Request processing flow in OBIEE, role of each layer
8. Different type of cache?
9. Level based measures and Preferred drill path
10. What is shared logon property in the Connection pool setup
11. Connection pool optimization
12. Difference between online and offline repository
13. Steps for MUD development
14. Steps for LDAP setup
15. What is Guided navigation and how it works
16. Variable type and usage (Both at RPD and PS level)

SQL
1. What are the different join types?
2. Display the employee records who joined the department before their manager?
3. Display employee records getting more salary than the average salary in their department?
4. Display the highest paid employee in each department.
5. Display the 2nd highest paid employee in each department.
6. Select student_id, student_name from students where student_id = 1 and student_id = 2. What does the query return?
7. given order and order items tables, select customer ids of customers who placed orders with more than 3 items (having or subquery)
8. What is the use of DESC in SQL?
9. How do you find the number of rows in a Table?
10. What is Cartesian product in the SQL?
11. What is a view? What is materialized View? What is the difference between view and materialized view
12. Can you insert data into a view?
13. What is a merge statement? What is the requirement for a merge statement? Is PK necessary for merge?
14. What is dual? Is it a table? if so what columns does it have? What's the data type?
• Basic
   o Describe different joins
   o given order and order items tables, select customer ids of customers who placed orders with more than 3 items (having or subquery)
   o create buckets for if they placed more than 10 and more than 100 orders (case)
• Intermediate
   o Difference between hash join and nested loops
   o dedupe (analytics, temp tables, etc)
• Advanced
   o Describe explain plan for query (give query)

Data Modelling
1. How do you model a many-to-many relationship?
2. What is normalization? denormalization?
3. What is 3NF? How normalize/denormalize?
4. What is a type-2 dimension? How many types are there?

Unix
1. What does ls do?
2. If a file has permissions 000, then who can access the file?
3. What is the difference between grep and find commands?
4. What is redirection?
5. What is piping?
6. How would you dedupe a text file?
7. How do you view in-use ports?

Oracle DB Technology
1. What is difference between UNIQUE and PRIMARY KEY constraints?
2. Differentiate between TRUNCATE and DELETE
3. Differentiate between IN and EXISTS? Which is faster - IN or EXISTS?
4. What is the difference between UNION and UNION ALL?
5. Difference between CHAR and VARCHAR2?
6. What is the NVL statement? How is it different from decode? Is it possible to implement NVL with Decode?
7. What is COALESCE function?
8. Difference between CASE and DECODE?
9. Is there any way we can change the column name in a table
10. Which is faster Insert or Delete?
11. Can a primary key contain more than one column?
12. What does COMMIT do?
13. What does ROLLBACK do?
14. What are partitions?

Data Warehousing
1. What is the data type of the surrogate key?
2. What is the difference between OLTP and DW systems?
3. What is Full/Initial load & Incremental/Refresh load?
4. What is a staging area? Do we need it? What is the purpose of a staging area?
5. How do you determine what records to extract in Incremental/Refresh load?
6. What is a data mart?
7. What are Star & Snow Flake Schemas? What is the difference? When do you use one or the other?
8. You have an Item Orders fact table: Will you store the Product group of the item in the fact? If so why? Else why not?
9. What are push and pull ETL strategies?
10. What does level of granularity in a fact table mean?
11. What is the difference between Inmon and Kimball methodology?

Essbase
1. Difference between Block and Aggregate Storage?
2. What are Levels and Generations?
3. Can there be more than one Accounts Dimension in a cube?
4. Is it possible to have duplicate level-0 members in a cube?
5. If there are duplicate members in the cube to which member does Essbase attribute fact value?
6. What is a Rule file?
7. Is it possible to update dimension values during fact load?
8. What is MAXL?
9. What is MDX?
10. What is aggregation?
11. Why is aggregation needed?
12. If new data is added to the cube, without adding new dim members, is re-aggregation required?
13. What is query based aggregation and stop value based aggregation?
