Big Data Project Report

March 29, 2018 | Author: HemanthAroumougam | Category: Big Data, Data Model, Databases, Analytics, Sql



BIG DATA REPORT
2014/04/04

Confidential

Table of Contents:
Abstract
Chapter 1
• Definition and some facts of Big Data
• Variety
• Velocity
• Volume
• Overall Diagram of 3V's
Chapter 2
• Advantages and Disadvantages of Big Data
Chapter 3
3.1 Dialogue with Consumers
3.2 Re-develop your Products
3.3 Perform Risk Analysis
3.4 Keeping your data safe
Chapter 4
4.1. Car Makers (Toyota)
4.2. Finance (Visa)
4.3. Utilities (oil & gas) (Chevron Corporation)
4.4. General Manufacturing (General Motors India Limited, GM)
4.5. Policing (CBI)
4.6. Retail and Marketing (Air Jordan)
Conclusion

Abstract
This report explains what Big Data is, its advantages and disadvantages, some things that you can accomplish with it, how it is utilized in different fields, and a conclusion. The Utilization of Big Data chapter describes where each organization's data comes from, what the organization can do with that data, and how this benefits it.
The conclusion discusses what the future might look like with Big Data, what people will be doing when everything generates data, and finally what I want to do with Big Data.

Chapter 1
• Definition and some facts of Big Data:
• At first, company employees entered data into computer systems.
• Then came a second generation, in which we as online users started entering our own data into social networking sites.
• Now a third generation has arrived, in which machines in companies and factories automatically enter data into computer systems.
• Overall, Big Data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
• Big Data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured.
• Big Data has 3 Vs: Big Volume, Big Velocity and Big Variety. These are the defining properties, or dimensions, of Big Data.
• Volume refers to the amount of data.
• Variety refers to the number of types of data.
• Velocity refers to the speed of data processing.
• Big Volume: with simple (SQL) analytics or with complex (non-SQL) analytics.
• Big Velocity: drink from the fire hose.
• Big Variety: a large number of diverse data sources to integrate.
• SQL stands for Structured Query Language. SQL is a standardized query language for requesting information from a database.
• SQL first appeared in a commercial database system in 1979, released by the company now known as Oracle Corporation.
• Historically, SQL has been the favorite query language for database management systems running on minicomputers and mainframes.

Big data is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large that it is difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big, moves too fast, or exceeds current processing capacity.

“Big data is a term describing the storage and analysis of large and/or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.”

• Variety

Unstructured Data- Refers to information that either does not have a pre-defined data model or is not organized in a predefined manner. Unstructured information is typically text-heavy. In other words, unstructured data sits at the opposite end of the spectrum from structured data. It might take any form: text, audio or video. We cannot tell what the data means just by looking at it unless we apply human understanding to it.

Examples of Unstructured Data:
• Books
• Stories
• Heavy text
• Audio
• Video
• RSS feeds
• Word documents
• Excel spreadsheets
• Email messages

Structured Data- Data that resides in a fixed field within a record or file is called structured data. This includes data contained in relational databases and spreadsheets. Structured data has the advantage of being easily entered, stored, queried and analyzed.

Examples of Structured Data:
• Census records (birth, income, employment, place, etc.)
• Library catalogues (date, author, place, subject, etc.)
• Phone numbers (and the phone book)
• Economic data (GDP, PPI, ASX, etc.)
• XML-TEI (bringing structure to text by tagging particular elements, like versions of the word "canal" in 17th-century Dutch)
• Databases
• Data warehouses
• Enterprise systems (CRM, ERP, etc.)

Relational Data- Relational data is data that speaks for itself; typically this is the standard fare for data warehouses. It is extracted from ERP and other operational systems. We already know what the data means and what its structure is.

Semi-structured Data- Semi-structured data is a form of structured data that does not conform to the formal structure of data models associated with relational databases or other forms of data tables.

Examples of Semi-structured Data:
• Web pages
• Information integration
• XML

• Velocity

Velocity Rates:
• Real Time (Fastest)
• Near Real Time
• Periodic
• Batch (Slowest)

Real Time- A real-time big data analytics platform delivers ultra-fast, interactive analytical results with sub-second response times.

Batch- Batch processing handles data in accumulated groups instead of as it arrives, so it is slower than real time.

Benefits of Batch Processing:
• It can shift the time of job processing to when the computing resources are less busy.
• It avoids idling the computing resources with minute-by-minute manual intervention and supervision.
• By keeping a high overall rate of utilization, it amortizes the cost of the computer, especially an expensive one.
• It allows the system to use different priorities for batch and interactive work.
• Rather than running one program multiple times to process one transaction each time, batch processes run the program only once for many transactions, reducing system overhead.

• Volume

Volume refers to the amount of data, measured in units such as KB, MB, GB, TB and PB.
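To make these units concrete, here is a minimal sketch that converts a raw byte count into the largest fitting unit. It assumes decimal (power-of-1000) prefixes, and the function name `format_bytes` is invented for this illustration rather than taken from the report.

```python
# Units named in the text, smallest to largest.
# Assumption: decimal prefixes (1 KB = 1000 bytes), not binary ones.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def format_bytes(num_bytes: int) -> str:
    """Express num_bytes in the largest unit that keeps the value at 1 or more."""
    value = float(num_bytes)
    unit_index = 0
    # Step up one unit at a time until the value is small or units run out.
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    return f"{value:.1f} {UNITS[unit_index]}"


if __name__ == "__main__":
    # Hypothetical sizes: a small file, a photo, an enterprise store.
    for size in (512, 2_500_000, 3_000_000_000_000_000):
        print(size, "bytes =", format_bytes(size))
```

Each step up the list is a factor of 1000 under this assumption, which is why enterprise stores measured in terabytes and petabytes are so much harder to handle than ordinary files.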
Volume typically consists of:
• Records
• Transactions
• PB, TB, GB, MB, KB
• Tables, files

We currently see exponential growth in data storage, as data is now more than text. We can find data in the form of videos, music and large images on our social media channels. It is very common for enterprises to have storage systems of terabytes and petabytes. As the database grows, the applications and architecture built to support the data need to be re-evaluated quite often. Sometimes the same data is re-evaluated from multiple angles, and even though the original data is the same, the newly found intelligence creates an explosion of data. This big volume is indeed what Big Data represents.

[Figure: Overall Diagram of 3V's]

Chapter 2
• Advantages and Disadvantages of Big Data:

Advantages:
• Data mining makes it easier to find correlations.
• Results are more thoroughly calculated, so accuracy is higher.
• Data is now combined into one big mass, which allows links to be found. For example, a company with decades of information can use Big Data and data analysis to create competitive advantages and open new business opportunities.
• Big Data started because companies were finding it hard to manage all their data.
• It creates new growth opportunities and lots of jobs.

Disadvantages:
• Big risks to security and privacy.
• Challenges arise: it is expensive, and a lot must be spent to get it working.
• A lot of analysis is needed: uncovering patterns, applying algorithms, finding connections and relationships. Specialized analysts are still needed, and it is hard to find the right skill set.
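The first advantage listed above, finding correlations, can be made concrete with a small sketch. The example below computes the Pearson correlation coefficient between two made-up columns (advertising spend and sales). The data and the function name `pearson` are invented for illustration and do not come from the report.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalized; the common factor cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    ad_spend = [10, 20, 30, 40, 50]  # hypothetical values
    sales = [12, 24, 33, 46, 55]     # hypothetical values
    print(f"correlation = {pearson(ad_spend, sales):.3f}")
```

A value near 1 or -1 suggests a strong linear relationship. A correlation alone does not show that one quantity causes the other, which is part of why skilled analysts are still needed.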
Chapter 3
Things that you can accomplish with Big Data

3.1 Dialogue with Consumers-
• Today's consumers are a tough nut to crack. They look around a lot before they buy. You want customers to buy your products. Big Data allows you to profile these increasingly vocal and fickle little 'tyrants' in a far-reaching manner so that you can engage in an almost one-on-one, real-time conversation with them. This is not actually a luxury: if you don't treat them the way they want to be treated, they will leave you in the blink of an eye.

3.2 Re-develop your Products-
• Big Data can also help you understand how others perceive your products so that you can adapt them. Analysis of unstructured social media text allows you to uncover the sentiments of your customers and even segment those in different geographical locations or among different demographic groups.

3.3 Perform Risk Analysis-
• Success depends not only on how you run your company. Social and economic factors are crucial for your accomplishments as well. Predictive analytics, fueled by Big Data, allows you to scan and analyze newspaper reports or social media feeds so that you permanently keep up to speed on the latest developments in your industry and its environment. Detailed health tests on your suppliers and customers are another goodie that comes with Big Data. These allow you to take action when one of them is at risk of defaulting.

3.4 Keeping your data safe-
• You can map the entire data landscape across your company with Big Data tools, which allows you to analyze the threats that you face internally. You will be able to detect potentially sensitive information that is not protected in an appropriate manner and make sure it is stored according to regulatory requirements.

Chapter 4
Utilization of Big Data:
Big Data is used in many fields, such as:

4.1. Car Makers (Toyota):
• Fault logging and cost predictions- Car makers place hundreds of sensors on components around the car, which constantly log data on performance and faults. All of this data can be used to re-engineer designs for more efficient products and to predict the likely strain of warranty repairs on cost and manpower.

4.2. Finance (Visa):
• B2B supplier profiling- Finance professionals can use big data to check on the 'health' of their suppliers and business partners. They can monitor a variety of indicators, including when creditors pay their bills and whether there is any change.
• Fraud detection- Companies like Visa are using big data to create fraud detection models, which can flag potential fraudsters.

4.3. Utilities (oil & gas) (Chevron Corporation):
• Asset monitoring- As with the machines in manufacturing plants, utilities companies use big data to keep track of all of their assets spread across a country, a continent or the globe. This enables them to fix any broken asset (such as a sewage cleansing plant, a leaking pipe or a gas pump), perform pre-emptive running maintenance or isolate areas in which repair actions have been ineffective.

4.4. General Manufacturing (General Motors India Limited, GM):
• Simulations- Manufacturers can take real data from their products on the market and then run simulations of what would happen if they changed one particular component or design aspect. They can then find ways to make the product cheaper, more reliable or more environmentally friendly. Formula 1 racing teams are particularly adept in this area, as are advanced aerospace companies.
• Expanded product design modeling- Similarly, with new big-data-enabled computer-aided design programs, product designers can substitute components or materials from huge databases and then access in-depth information on how this affects the final product, including the ramifications
on cost, production processes, environmental effects, legislative requirements, supply chain and so on.

4.5. Policing (CBI):
• Suspect tracking- By combining CCTV images, facial recognition software, travel trends and identifiers on travel cards, police forces can capture criminals by automatically linking people to their likely destinations on buses and metro systems. This allows police to catch those they miss at the scene of the crime and also to control arrest statistics, for instance meeting targets for arrests in one London borough as needed.

4.6. Retail and Marketing (Air Jordan):
• Mood mapping- Retailers use feeds from social networks to build an understanding of how their products and company reputation are seen among the public. With the constant streams of opinions from Facebook, Twitter, Google+ and the like, companies are able to cheaply and quickly gather large samples of customer opinion.

Summary of the examples (Title, Where, Needs, Benefits):

1. Car Makers (Toyota)
Where: From the factories and from the sensors to the data center (headquarters).
Type of Data: What condition the car is in.
Needs: Safety and quality analysis.
Benefits: Feedback on the design.

2. Finance (Visa)
Where: Wherever customers buy.
Type of Data: What they buy, where they buy, when they buy and how much they buy it for.
Needs: Detecting fraud; understanding customers' behavior.
Benefits: Personal recommendations.

3. General Manufacturing (GM)
Where: From several branches to the headquarters in Gurgaon.
Type of Data: What condition the motor is in.
Needs: Safety and quality analysis.
Benefits: Awareness and indication of what to fix.

4. Policing (CBI or CID; CBI is the Central Bureau of Investigation and CID is the Criminal Investigation Department, both Indian investigative agencies)
Where: From several police departments to the main CBI headquarters located in Delhi. They mainly track people by their cellphones.
Type of Data: Details of the person they are tracking.
Needs: Detecting the person's behavior and actions.
Benefits: Awareness of what that person is going to do next. What is their next plan?

5. Utilities, oil & gas (Chevron Corporation)
Where: From the machines in the manufacturing plants to the data center (headquarters).
Type of Data: What is going on in the manufacturing plant.
Needs: Keeping track of what is going on in the manufacturing plants, such as broken pipes and leakage.
Benefits: Feedback on designs, so they know how to improve the construction of the manufacturing plant, which is their main source of oil and gas.

6. Retail and Marketing (Air Jordan)
Air Jordan is an American basketball shoe brand.
Where: From social media networking sites to the headquarters of the company (data center).
Type of Data: Customers' opinions or feedback on the product.
Needs: Customers' behavior (whether they like it or not); finding out consumers' opinions and feelings; feedback on the brand.
Benefits: Feedback on what customers are thinking about the product, and feedback from audiences to improve the product.

Conclusion

With Big Data what would the future be like?
As larger and more complex data sets emerge, it becomes increasingly difficult to process Big Data using on-hand database management tools or traditional data processing applications. To maximize the significant investments in these datacenter resources, companies must tackle Big Data with "Big Workflow," a term coined by Adaptive Computing to describe a comprehensive approach that maximizes datacenter resources and streamlines the simulation and data analysis process.

What could you do with Big Data that you couldn't do before?
With Big Data, one of the major things that we can do is predict the future. In today's world we are surrounded by predictions.
For instance, during political elections the main focus of the media and the public is not on the differences between the candidates' positions, but rather on the "horse race" aspect of the competition. The issues at stake are secondary to the main question: who is going to win? With the data trends that we receive, we can predict the future.

What am I going to do with Big Data?
In the future I am planning to become a robotics engineer. If I make products that benefit human lives, then I can use the Big Data collected to conduct safety and quality analysis tests on my products. This would give me feedback on my designs. I could also use Big Data to collect customers' opinions about my brand and my products. With this feedback and these opinions I could improve my products to satisfy customers in the future.
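The idea of predicting from data trends, raised in this conclusion, can be illustrated with a minimal sketch: fit a straight line to past observations by least squares and extend it one step ahead. The yearly figures and the function names `fit_line` and `predict` are invented for this illustration. Real predictive analytics relies on far richer models and far more data.

```python
def fit_line(xs, ys):
    """Return the least-squares (slope, intercept) for the points (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Value of the fitted line at x."""
    return slope * x + intercept


if __name__ == "__main__":
    years = [2010, 2011, 2012, 2013]
    units_sold = [100, 120, 140, 160]  # hypothetical past observations
    slope, intercept = fit_line(years, units_sold)
    print(f"forecast for 2014: {predict(slope, intercept, 2014):.0f}")
```

A straight line is the simplest possible trend. It only works when the past pattern continues, so a forecast like this is a starting point rather than a certainty.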