The Hadoop Distributed File System

Introduction
HDFS, the Hadoop Distributed File System, is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes) and to provide high-throughput access to that data. Files are stored redundantly across multiple machines to ensure their durability in the face of failures and their availability to highly parallel applications. This module introduces the design of this distributed file system and explains how to operate it.
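
As a first taste of operating HDFS, the file system can be driven from the shell. The following is a minimal sketch, assuming a running HDFS instance and the hadoop launcher on your PATH; the paths and file names are purely illustrative:

    hadoop fs -mkdir /user/example                # create a directory in HDFS
    hadoop fs -put report.txt /user/example/      # copy a local file into HDFS
    hadoop fs -ls /user/example                   # list the directory's contents
    hadoop fs -cat /user/example/report.txt       # print the file to the terminal

Each command is deliberately modeled on its familiar Unix counterpart; the command-line interface is covered in detail later in this module.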
Goals for this Module:
Understand the basic design of HDFS and how it relates to basic distributed file system concepts
Learn how to set up and use HDFS from the command line
Learn how to use HDFS in your applications (a brief sketch of the Java API follows this list)
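
Application-level access to HDFS goes through Hadoop's Java API, centered on the FileSystem and Path classes. Below is a minimal sketch of writing and reading a file through that API, assuming the Hadoop client libraries are on the classpath and a configuration that points at your cluster; the class name and file path are hypothetical:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsHello {  // hypothetical example class
        public static void main(String[] args) throws Exception {
            // new Configuration() loads the site settings (e.g., the default
            // file system address) that tell the client where the cluster is.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Write a small file into HDFS; the path is illustrative.
            Path file = new Path("/user/example/hello.txt");
            OutputStream out = fs.create(file);
            out.write("Hello, HDFS!\n".getBytes("UTF-8"));
            out.close();

            // Read the same file back and print its first line.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(file), "UTF-8"));
            System.out.println(in.readLine());
            in.close();
        }
    }

The same FileSystem abstraction also fronts the local file system and other storage backends, which is why application code written against it stays portable across deployments.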
