16469_17_2017_Lecture1-2_INT312

April 3, 2018 | Author: Avinash Singh | Category: Big Data, Analytics, Databases, Information Management, Technology






INT312::BIG DATA FUNDAMENTALS
Lecture #0

Course details
• LTP – 004 [Four Practicals/week] [BYOD]
▪ CA Category: A0304
▪ Course Orientation: RESEARCH, SOFTWARE SKILL
▪ Weightages: ATT: 5 CA: 25 MTT: 20 ETT: 50

Course details
▪ TEXT BOOKS
No Textbook for this course.
▪ REFERENCE BOOKS
1. BIG DATA by ANIL MAHESHWARI, MCGRAW HILL EDUCATION
2. BIG DATA AND ANALYTICS by SEEMA ACHARYA, SUBHASHINI CHELLAPPAN, WILEY
3. UNDERSTANDING BIG DATA: ANALYTICS FOR ENTERPRISE CLASS HADOOP AND STREAMING DATA by PAUL C ZIKOPOULOS, CHRIS EATON, PAUL ZIKOPOULOS, IBM, MC GRAW HILL
4. ORACLE BIG DATA HANDBOOK by TOM PLUNKETT, BRIAN MACDONALD, BRUCE NELSON, HELEN SUN, KHADER MOHIUDDIN, DEBRA HARDING, GOKULA MISHRA, MARK HORNICK, ROBERT STACKOWIAK, KEIT., MC GRAW HILL
5. PROFESSIONAL HADOOP SOLUTIONS by BORIS LUBLINSKY, KEVIN T. SMITH, ALEXEY YAKUBOVICH, WILEY

Course Objectives
• recognize the need and importance of fundamental concepts and principles of Big Data
• examine internal functioning of different modules of Big Data and Hadoop
• conceptualize the big data ecosystem and appreciate its key components

What you will learn?
• Big Data Fundamentals provides a path for
  • Introduction to Big Data
  • Introduction to Hadoop
  • Installation of Hadoop
  • Hadoop Architecture
  • Hadoop Ecosystem
  • HIVE and HBASE

Course Prerequisite
• Prerequisite:
  • Java Programming / C++
  • Database basics

What’s Big Data?
No single definition; here is from Wikipedia:
• Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
• The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

Big Data: 3V’s

Volume (Scale)
• Data Volume
  • 44x increase from 2009 to 2020
  • From 0.8 zettabytes to 35 zettabytes
• Data volume is increasing exponentially
[Figure: Exponential increase in collected/generated data]
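The "44x" figure on the Volume slide can be checked with a little arithmetic. The sketch below uses only the two numbers quoted on the slide (0.8 and 35 zettabytes, 2009 and 2020). The function names are illustrative, and the yearly rate assumes constant compound growth, which the slide does not claim.

```python
# Figures taken from the Volume slide: 0.8 ZB in 2009, 35 ZB projected for 2020.
START_YEAR, END_YEAR = 2009, 2020
START_ZB, END_ZB = 0.8, 35.0


def growth_factor(start: float, end: float) -> float:
    """Overall multiple between the starting and ending data volume."""
    return end / start


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that would turn `start` into `end`.

    Assumption for illustration only: real growth need not be constant.
    """
    return (end / start) ** (1 / years) - 1


factor = growth_factor(START_ZB, END_ZB)
rate = implied_annual_growth(START_ZB, END_ZB, END_YEAR - START_YEAR)

print(f"overall growth: {factor:.2f}x")        # 43.75x, rounded to 44x on the slide
print(f"implied annual growth: {rate:.0%}")
```

Under the constant-growth assumption, the volume grows by roughly two fifths every year, which is what "increasing exponentially" means in practice.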
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones world wide
• 100s of millions of GPS enabled devices sold annually
• 2+ billion people on the Web by end 2011
• 76 million smart meters in 2009… 200M by 2014

CERN’s Large Hadron Collider (LHC) generates 15 PB a year
(Photo: Maximilien Brice, © CERN)

Variety (Complexity)
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
  • Social Network, Semantic Web (RDF), …
• Streaming Data
  • You can only scan the data once
• A single application can be generating/collecting many types of data
• Big Public Data (online, weather, finance, etc)
To extract knowledge, all these types of data need to be linked together.

A Single View to the Customer
[Figure: the Customer at the center, linked to Social Media, Gaming, Entertain, Banking, Finance, Our Known History, and Purchase]

Velocity (Speed)
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions → missing opportunities
• Examples
  • E-Promotions: Based on your current location, your purchase history, what you like → send promotions right now for store next to you
  • Healthcare monitoring: sensors monitoring your activities and body → any abnormal measurements require immediate reaction

Real-time/Fast Data
• Mobile devices (tracking all objects all the time)
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Sensor technology and networks (measuring all kinds of data)

• Progress and innovation are no longer hindered by the ability to collect data
• But by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion

Some Make it 4V’s
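Two points from the Variety and Velocity slides, "you can only scan the data once" and "any abnormal measurements require immediate reaction", can be sketched together in a few lines. The example below is a minimal illustration, not a method from the lecture. It uses Welford's online algorithm to keep a running mean and standard deviation in constant memory, and it flags a reading as soon as it falls far from the baseline. The names `RunningStats` and `abnormal_readings`, the threshold of 3 standard deviations, the warm-up length, and the sample heart-rate values are all assumptions made for the example.

```python
import math
from typing import Iterable, Iterator


class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        """Sample standard deviation (0.0 until two readings have been seen)."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def abnormal_readings(
    stream: Iterable[float], threshold: float = 3.0, warmup: int = 5
) -> Iterator[float]:
    """Yield readings more than `threshold` standard deviations from the baseline.

    Each reading is examined exactly once and never stored.
    """
    stats = RunningStats()
    for x in stream:
        is_outlier = (
            stats.n >= warmup
            and stats.std > 0
            and abs(x - stats.mean) > threshold * stats.std
        )
        if is_outlier:
            yield x  # react immediately; keep the outlier out of the baseline
        else:
            stats.update(x)


# Hypothetical heart-rate readings; iter() makes the single pass explicit.
heart_rates = iter([72, 75, 71, 74, 73, 72, 140, 74, 73])
print(list(abnormal_readings(heart_rates)))  # [140]
```

Because the generator yields each alert the moment the reading arrives, the decision is made in stream order rather than after a batch has been collected, which is the point of the "late decisions → missing opportunities" bullet.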
Harnessing Big Data
• OLTP: Online Transaction Processing (DBMSs)
• OLAP: Online Analytical Processing (Data Warehousing)
• RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)

The Model Has Changed…
• The Model of Generating/Consuming Data has Changed
Old Model: Few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are consuming data

What’s driving Big Data
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets

- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time

Big Data Technology