CNIT131 Internet Basics & Beginning HTML Week 15

Big Data
http://fog.ccsf.edu/~hyip

What is Big Data?

Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications. (From Wikipedia)

Big Data refers to extremely vast amounts of multi-structured data that has typically been cost-prohibitive to store and analyze. (My view)

NOTE: Big data refers only to digital data, not the paper files stored in the basement at FBI headquarters or the piles of magnetic tapes in our data center.

Big Data: a Brief History

Types of Big Data

In the simplest terms, Big Data can be broken down into two basic types, structured and unstructured data. A third type, semi-structured data, falls between the two.

Structured: data with a predefined data type. Examples: spreadsheets, an Oracle relational database.

Unstructured: data that has no predefined data model or is not organized in a predefined manner. Examples: video, audio, images, metadata, etc.

Semi-structured: structured data embedded with some unstructured data. Examples: email, text messaging.
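The three types above can be illustrated with a small Python sketch. The sample rows, the email message, and all variable names are made up for this example; the point is only to show how much of the data's shape is known in advance in each case.

```python
import csv
import io
import json

# Structured: every record has the same predefined fields,
# like rows in a spreadsheet or a relational table (made-up sample rows).
csv_text = "customer_id,city,purchases\n1,Oakland,3\n2,San Francisco,7\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_purchases = sum(int(row["purchases"]) for row in rows)

# Semi-structured: fixed header fields wrapped around a free-text body,
# like an email (made-up sample message).
email = json.loads(
    '{"from": "student@example.com", "subject": "Week 15", '
    '"body": "Hi, is the final exam open book? Thanks!"}'
)
header_fields = sorted(key for key in email if key != "body")

# Unstructured: no predefined model. Software sees only raw bytes and
# must interpret them itself (decode audio, recognize images, parse text).
raw_bytes = email["body"].encode("utf-8")

print(total_purchases)  # 10
print(header_fields)    # ['from', 'subject']
```

The structured rows can be summed directly because the field names are known ahead of time, while the email body needs further interpretation before any analysis can be done on it.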

A Big Data Platform must:

Analyze a variety of different types of information. This information may be unrelated on its own, but when paired with other information it can reveal the causes of various events that the business can take advantage of.

Analyze information in motion. Various data types will be streaming and will contain a large number of "bursts". Ad hoc analysis needs to be done on the streaming data to search for relevant events.

Cover extremely large volumes of data. Due to the proliferation of devices in the network, how they are used, and customer interactions on smartphones and the web, a cost-efficient process to analyze petabytes of information is required.

A Big Data Platform must: (2)

Cover varying types of data sources. Data can be streaming, batch, structured, unstructured, or semi-structured, depending on the information type, where it comes from, and its primary use. Big Data must be able to accommodate all of these types of data on a very large scale.

Analytics. Big Data must provide mechanisms that allow ad hoc queries, data discovery, and experimentation on large data sets, so that various events and data types can be correlated effectively to reach an understanding of the data that is useful and addresses business needs.

Five Characteristics of Big Data

Big Data is defined by five characteristics:

Volume: Data created by and moving through today's services may describe tens of millions of customers, hundreds of millions of devices, and billions of transactions or statistical records. Such scale requires careful engineering, because even the number of CPU instructions, operating system events, and network messages per data item must be conserved. Parallel processing is a powerful tool to cope with scale. MapReduce computing frameworks like Hadoop and storage systems like HBase and Cassandra provide low-cost, practical system foundations. Analysis also requires efficient algorithms, because data in flight may only be observed one time, so conventional storage-based approaches may not work. Large volumes of data may require a mix of "move the data to the processing" and "move the processing to the data" architectural styles.

Velocity: Timeliness is often critical to the value of Big Data. For example, online customers may expect promotions (coupons) received on a mobile device to reflect their current location, or they may expect recommendations to reflect their most recent purchases or the media they most recently accessed. The business value of some data decays rapidly. Because raw data is often delivered in streams, or in small batches in near real time, the requirement to deliver rapid results can be demanding and does not mesh well with conventional data warehouse technology.

Five Characteristics of Big Data (2)

Variety: Big Data often means integrating and processing multiple types of data. We can consider most data sources as structured, semi-structured, or unstructured. Structured data refers to records with fixed fields and types. Unstructured data includes text, speech, and other multimedia. Semi-structured data may be a mixture of the above, such as web documents, or sparse records with many variants, such as personal medical records with well-defined but complex types.

Veracity: Data sources (even in the same domain) are of widely differing quality, with significant differences in the coverage, accuracy, and timeliness of the data provided. Per IBM's Big Data website, one in three business leaders does not trust the information they use to make decisions. Establishing trust in big data presents a huge challenge as the variety and number of sources grow.

Variability: Beyond the immediate implications of having many types of data, the variety of data may also be reflected in the frequency with which new data sources and types are introduced.

NOTE: Big Data generally includes data sets with sizes beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, from hundreds of terabytes to many petabytes of data in a single data set. To address this difficulty, new tool sets have arisen to make sense of these large quantities of data. Big data is difficult to work with using relational databases and desktop statistics and visualization packages, requiring instead "massively parallel software running on tens, hundreds, or even thousands of servers".

References

Discovering the Internet: Complete, Jennifer Campbell, Course Technology, Cengage Learning, 5th Edition, 2015, ISBN 978-1-28584540-1.
Basics of Web Design: HTML5 & CSS3, Second Edition, Terry Felke-Morris, Pearson, ISBN 978-0-13-312891-8.
A Very Short History of Big Data.
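As an illustration of the MapReduce style named under Volume, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names map_phase, shuffle, and reduce_phase and the sample text are assumptions made up for this example. It only shows the idea of splitting work into independent map steps whose results are grouped and then combined.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input, already split into chunks. On a real cluster each
# chunk could sit on a different server and be mapped there in parallel;
# here the chunks are simply processed one after another.
chunks = ["big data is big", "data in motion", "big volumes of data"]

mapped = []
for chunk in chunks:
    mapped.extend(map_phase(chunk))

word_counts = reduce_phase(shuffle(mapped))
print(word_counts["big"], word_counts["data"])  # 3 3
```

Because each call to map_phase looks only at its own chunk, the map work can be spread across many servers, which is one way to "move the processing to the data".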
