The browser version you are using is not recommended for this site.Please consider upgrading to the latest version of your browser by clicking one of the following links.
We are sorry, This PDF is available in download format only
This reference architecture is for companies who are looking to build their own cloud computing infrastructure, including both enterprise IT organizations and cloud service providers or cloud hosting providers. The decision to use a cloud for the delivery of IT services is best done by starting with the knowledge and experience gained from previous work. This reference architecture gathers into one place the essentials of a Apache* Hadoop* cluster build out complete with benchmarking using TeraSort workload. This paper defines easy to use steps to replicate the deployment at your data center lab environment. The installation is based on Intel®-powered servers and creates a multi node, optimized Hadoop environment. The reference architecture contains details on the Hadoop topology, hardware and software deployed, installation and configuration steps, and tests for real-world use cases that should significantly reduce the learning curve for building and operating your first Hadoop infrastructure.
Apache Hive* overview
Apache MapReduce overview.
Apache Pig* overview.
Apache HDFS* overview.
Linda Feldt highlights big data research—video