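It includes Common, Avro, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Sqoop, and Oozie.
The Hadoop file system suits massive data sets with streaming access (write once, read many times), running on ordinary commodity hosts.
However, the Hadoop file system is not suited to low-latency access, multiple writers, or small files that are modified arbitrarily.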
The two main questions:
First, how to store massive data: HDFS.
Second, how to analyze massive data: MapReduce.

Hadoop = the Hadoop projects, including Common, Avro, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Sqoop, and Oozie.
The Hadoop file system (HDFS) is suitable for very large files with streaming data access (write once, read many times), running on commodity hardware.
However, HDFS is not a good fit for large numbers of small files, low-latency data access, multiple writers, or arbitrary file modification.

HDFS Framework
Block: 64 MB by default in Hadoop 1.x (128 MB from 2.x). Blocks are large because HDFS targets massive data.
NameNode: holds the directory tree of the file system, the file metadata, and the corresponding block information (critical).
DataNode: stores the blocks themselves.
HA strategy: 1.x has only one NameNode; from 2.x on, NameNodes can run in an active-standby pattern.

MapReduce
MapReduce is a programming model for parallel computation over massive data.
For example, to get the maximum of nine numbers:
First, the map function gets the maximum of each group of three numbers (the data itself is stored in HDFS).
Second, the reduce function takes those partial results and returns the overall maximum.

In conclusion, HDFS provides a way to store massive data across many hosts, along with supporting strategies such as NameNode HA. MapReduce then analyzes the data by divide and conquer.
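
The max-of-nine-numbers example can be sketched in a few lines of plain Python. This is only an illustration of the map and reduce steps, not Hadoop's actual API: the function names (split_into_chunks, map_max, reduce_max) and the sample numbers are hypothetical, and everything runs in one process instead of across DataNodes.

```python
from functools import reduce


def split_into_chunks(values, chunk_size):
    """Stand-in for HDFS splitting the input; each chunk goes to one map task."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def map_max(chunk):
    """Map step: each task finds the maximum of its own chunk."""
    return max(chunk)


def reduce_max(partial_maxima):
    """Reduce step: combine the per-chunk maxima into one overall maximum."""
    return reduce(lambda a, b: a if a >= b else b, partial_maxima)


# Nine sample numbers (made up for illustration), split into three groups of three.
numbers = [3, 8, 1, 7, 2, 9, 4, 6, 5]
chunks = split_into_chunks(numbers, 3)
partial = [map_max(chunk) for chunk in chunks]  # would run in parallel on a cluster
overall = reduce_max(partial)

print(chunks)   # the three groups handed to the map tasks
print(partial)  # one maximum per group
print(overall)  # the overall maximum
```

The point of the split is that each map call only looks at its own group, so the groups can be processed on different hosts, and the reduce step only has to handle three values instead of nine.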