Hadoop MapReduce Cookbook

Last updated: 2021-08-05 18:10:46

Cover Page

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Chapter 1. Getting Hadoop Up and Running in a Cluster

Introduction

Setting up Hadoop on your machine

Writing a WordCount MapReduce sample, bundling it, and running it using standalone Hadoop

Adding the combiner step to the WordCount MapReduce program

Setting up HDFS

Using HDFS monitoring UI

HDFS basic command-line file operations

Setting Hadoop in a distributed cluster environment

Running the WordCount program in a distributed cluster environment

Using MapReduce monitoring UI

Chapter 2. Advanced HDFS

Introduction

Benchmarking HDFS

Adding a new DataNode

Decommissioning DataNodes

Using multiple disks/volumes and limiting HDFS disk usage

Setting HDFS block size

Setting the file replication factor

Using HDFS Java API

Using HDFS C API (libhdfs)

Mounting HDFS (Fuse-DFS)

Merging files in HDFS

Chapter 3. Advanced Hadoop MapReduce Administration

Introduction

Tuning Hadoop configurations for cluster deployments

Running benchmarks to verify the Hadoop installation

Reusing Java VMs to improve the performance

Fault tolerance and speculative execution

Debug scripts – analyzing task failures

Setting failure percentages and skipping bad records

Shared-user Hadoop clusters – using fair and other schedulers

Hadoop security – integrating with Kerberos

Using the Hadoop Tool interface

Chapter 4. Developing Complex Hadoop MapReduce Applications

Introduction

Choosing appropriate Hadoop data types

Implementing a custom Hadoop Writable data type

Implementing a custom Hadoop key type

Emitting data of different value types from a mapper

Choosing a suitable Hadoop InputFormat for your input data format

Adding support for new input data formats – implementing a custom InputFormat

Formatting the results of MapReduce computations – using Hadoop OutputFormats

Hadoop intermediate (map to reduce) data partitioning

Broadcasting and distributing shared resources to tasks in a MapReduce job – Hadoop DistributedCache

Using Hadoop with legacy applications – Hadoop Streaming

Adding dependencies between MapReduce jobs

Hadoop counters for reporting custom metrics

Chapter 5. Hadoop Ecosystem

Introduction

Installing HBase

Data random access using Java client APIs

Running MapReduce jobs on HBase (table input/output)

Installing Pig

Running your first Pig command

Set operations (join, union) and sorting with Pig

Installing Hive

Running a SQL-style query with Hive

Performing a join with Hive

Installing Mahout

Running K-means with Mahout

Visualizing K-means results

Chapter 6. Analytics

Introduction

Simple analytics using MapReduce

Performing Group-By using MapReduce

Calculating frequency distributions and sorting using MapReduce

Plotting the Hadoop results using GNU Plot

Calculating histograms using MapReduce

Calculating scatter plots using MapReduce

Parsing a complex dataset with Hadoop

Joining two datasets using MapReduce

Chapter 7. Searching and Indexing

Introduction

Generating an inverted index using Hadoop MapReduce

Intra-domain web crawling using Apache Nutch

Indexing and searching web documents using Apache Solr

Configuring Apache HBase as the backend data store for Apache Nutch

Deploying Apache HBase on a Hadoop cluster

Whole web crawling with Apache Nutch using a Hadoop/HBase cluster

ElasticSearch for indexing and searching

Generating the in-links graph for crawled web pages

Chapter 8. Classifications, Recommendations, and Finding Relationships

Introduction

Content-based recommendations

Hierarchical clustering
