INTRODUCTORY

Two main questions:
first, how to store massive data - HDFS;
second, how to analyze massive data - MapReduce.

Hadoop = the Hadoop projects, including Common, Avro, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Sqoop, and Oozie.

HDFS is suited to very large files with streaming data access (written once, read many times), running on commodity hardware.
But HDFS is not suited to low-latency data access, multiple writers, arbitrary modification, or large numbers of small files.

HDFS Framework
- Block: 64 MB by default (large, because the system targets massive data).
- NameNode: holds the file system's directory tree, file metadata, and the corresponding block information (crucial).
- DataNode: stores the blocks themselves.
- HA strategy: 1.x allows only one NameNode; from 2.x onward the NameNode can run in an active-standby pair.
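To make the NameNode/DataNode division concrete, here is a minimal sketch using Hadoop's Java FileSystem API: it writes a small file, then asks the NameNode which DataNodes hold each block. The NameNode address hdfs://localhost:9000 and the path /demo/numbers.txt are illustrative assumptions, not something fixed by these notes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: a single-node HDFS cluster listening on localhost:9000.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        // Write a small file; the client streams the bytes to DataNodes,
        // while the NameNode records only the metadata.
        Path file = new Path("/demo/numbers.txt");
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeBytes("3 7 2\n9 1 4\n5 8 6\n");
        }

        // Ask the NameNode for the block layout: which blocks make up
        // the file and which DataNodes host each one.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
    }
}
```

A file this small fits in a single block; only files larger than the configured block size (64 MB here) are split across several blocks and, typically, several DataNodes.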
MapReduce

MapReduce is a programming model for parallel computation over massive data. For example, to find the maximum of nine numbers stored in HDFS: first, call the map function to get the maximum of each group of three numbers; second, use the reduce function to pick the maximum of those three local results. A sketch follows.
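Here is a minimal sketch of that job against the standard Hadoop MapReduce Java API. The class names (MaxValue, MaxMapper, MaxReducer) and the input layout (three numbers per line, so each map call handles one group of three) are illustrative assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxValue {
    // Map step: each call sees one line (one group of numbers) and
    // emits only the local maximum, all under a single shared key.
    public static class MaxMapper
            extends Mapper<LongWritable, Text, NullWritable, IntWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            int localMax = Integer.MIN_VALUE;
            boolean seen = false;
            for (String token : line.toString().trim().split("\\s+")) {
                if (token.isEmpty()) continue;
                localMax = Math.max(localMax, Integer.parseInt(token));
                seen = true;
            }
            if (seen) {
                context.write(NullWritable.get(), new IntWritable(localMax));
            }
        }
    }

    // Reduce step: all local maxima share one key, so a single reduce
    // call picks the global maximum out of them.
    public static class MaxReducer
            extends Reducer<NullWritable, IntWritable, NullWritable, IntWritable> {
        @Override
        protected void reduce(NullWritable key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int globalMax = Integer.MIN_VALUE;
            for (IntWritable v : values) {
                globalMax = Math.max(globalMax, v.get());
            }
            context.write(key, new IntWritable(globalMax));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "max value");
        job.setJarByClass(MaxValue.class);
        job.setMapperClass(MaxMapper.class);
        // The reducer doubles as a combiner: a max of maxima is still the max.
        job.setCombinerClass(MaxReducer.class);
        job.setReducerClass(MaxReducer.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With nine numbers split three per line, each map call emits one local maximum and the single reduce call compares three values, exactly the two steps described above.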
In conclusion, HDFS provides a way to store massive data across many hosts, along with the relevant strategies, while MapReduce analyzes that data with a divide-and-conquer approach.