Big Data Technologies Based on MapReduce and Hadoop

3Vs
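- Volume: the size of the data
- Velocity: the speed at which data is generated and processed
- Variety: the diversity of the data (Structured, Semi-structured, Unstructured)
+ 2Vs (2010 onward)
- Veracity: the truthfulness of the data: accuracy, uncertainty, noise, errors
- Value: the value of the data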

Types of DB operations

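Hadoop: sometimes expanded as "High-Availability Distributed Object-Oriented Platform". This is a backronym; the name comes from a toy elephant that belonged to the son of Doug Cutting, one of Hadoop's creators.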

HDFS: Hadoop Distributed File System - distributed data storage
YARN: Hadoop's resource manager - manages the resources of the distributed cluster
MapReduce2: splits the JobTracker's responsibilities into separate services (a cluster-wide ResourceManager and a per-application ApplicationMaster), which avoids many of the scaling issues faced by MRv1
Spark2: RDD-based computing framework for high-speed processing
Tez: a framework for YARN-based data processing applications in Hadoop
Hive: SQL-like interface to query
HBase: NoSQL DB on top of Hadoop
Pig: script language to run MapReduce jobs on Hadoop
Oozie: workflow scheduler to manage Hadoop jobs
ZooKeeper: centralized operational services for a Hadoop cluster
Storm: real-time stream analytics system
Flume: collects, aggregates, and moves unstructured data such as log data into Hadoop
Kafka: stream data processing platform; distributed message broker
Sqoop: RDBMS ↔ Hadoop. Imports and exports structured data
Solr: search indexing
Zeppelin: web-based notebook for interactive data analysis
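
To make the MapReduce entry above more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word count as the example job. It is plain Python and does not use any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for illustration. On a real cluster the framework distributes these phases across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for lines read from HDFS.
    sample = ["big data on Hadoop", "Hadoop runs MapReduce", "big clusters"]
    print(word_count(sample))
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the calls can run in parallel on different machines, which is the property that lets this style of job scale out.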

Fundamentals of Database Systems 7th Edition by Ramez Elmasri, Shamkant B. Navathe.