Hadoop MapReduce (WordCount) Java Programming - Alibaba Cloud Developer Community


2017-11-16



Write a WordCount program. The input data is as follows:

hello beijing

hello shanghai

hello chongqing

hello tianjin

hello guangzhou

hello shenzhen

...

1. WCMapper:

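package com.hadoop.testHadoop;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Of the 4 generic parameters, the first two specify the mapper's input types: KEYIN is the type of the input key, VALUEIN the type of the input value.
// The last two (KEYOUT, VALUEOUT) specify the mapper's output types.
// Both map and reduce read and write their data as key-value pairs.
// By default, the framework passes the mapper the starting byte offset of a line in the input text as the key, and the content of that line as the value.
// LongWritable and Text are data types Hadoop defines for serialization (the Writable counterparts of long and String).
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // Output objects reused for every word: context.write() serializes the pair right away, so reuse is safe.
    private final Text outKey = new Text();
    private final LongWritable one = new LongWritable(1);

    // The MapReduce framework calls this method once for every line it reads
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        // Split on runs of whitespace so repeated spaces or tabs do not produce empty "words"
        String[] words = line.trim().split("\\s+");
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // a blank line yields a single empty token; skip it
            }
            outKey.set(word);
            context.write(outKey, one); // emit <word, 1>
        }
    }
}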
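To see what the framework hands to map() and what comes back, here is a small Hadoop-free sketch that imitates the default TextInputFormat on an in-memory string: each line is paired with the byte offset at which it starts, and the per-line word splitting of WCMapper (here: split on runs of whitespace, skip empty tokens) turns it into (word, 1) pairs. The class and method names (MapPhaseSketch, lineOffsets, map) are hypothetical, and the sketch assumes UTF-8 text with '\n' line endings; it is not Hadoop's own record reader.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free sketch of WordCount's map phase; not Hadoop's own code.
class MapPhaseSketch {

    // Imitates line-oriented input: record i gets the byte offset at which line i starts
    // as its key. Assumes '\n' line endings; a trailing '\n' does not add an extra record.
    static List<Long> lineOffsets(String text) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            offsets.add(offset);
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for the '\n'
        }
        return offsets;
    }

    // Same idea as WCMapper.map(): the offset key is ignored, every word yields (word, 1).
    static List<Map.Entry<String, Long>> map(long offset, String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1L));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "hello beijing\nhello shanghai\n";
        String[] lines = text.split("\n");
        List<Long> offsets = lineOffsets(text);
        for (int i = 0; i < lines.length; i++) {
            System.out.println("map input  <" + offsets.get(i) + ", " + lines[i] + ">");
            for (Map.Entry<String, Long> pair : map(offsets.get(i), lines[i])) {
                System.out.println("map output <" + pair.getKey() + ", " + pair.getValue() + ">");
            }
        }
    }
}
```

The second record's key is 14 because "hello beijing" is 13 bytes plus one byte for the newline; with multi-byte characters (such as Chinese text in UTF-8) the offset counts bytes, not characters.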

2. WCReducer:

package com.hadoop.testHadoop;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // After the map phase finishes, the framework collects all key-value pairs, sorts and groups them by key,
    // and then calls reduce once for each group <key, values{}>, e.g.
    // <hello, {1, 1, 1, 1, 1, 1, ...}>
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        long count = 0;
        // Add up all the counts emitted for this word
        for (LongWritable value : values) {
            count += value.get();
        }
        context.write(key, new LongWritable(count)); // emit <word, total>
    }
}
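The comment in WCReducer describes the grouping step (shuffle and sort) that sits between map and reduce. A minimal Hadoop-free sketch of that step, assuming all map output fits in memory: a TreeMap stands in for the framework's sort by key, each key collects its list of 1s, and the summing logic from WCReducer adds them up. ShuffleReduceSketch and its methods are hypothetical names; the real framework partitions, sorts and merges spilled data across machines instead of using one in-memory map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of shuffle (group + sort by key) followed by reduce.
class ShuffleReduceSketch {

    // Group map output by key. TreeMap keeps keys sorted; for ASCII words this matches
    // Hadoop's byte-wise ordering of Text keys.
    static TreeMap<String, List<Long>> shuffle(List<Map.Entry<String, Long>> mapOutput) {
        TreeMap<String, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> pair : mapOutput) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Same logic as WCReducer.reduce(): sum every value in one group.
    static long reduce(String key, Iterable<Long> values) {
        long count = 0;
        for (long value : values) {
            count += value;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> mapOutput = List.of(
            Map.entry("hello", 1L), Map.entry("beijing", 1L),
            Map.entry("hello", 1L), Map.entry("shanghai", 1L));
        for (Map.Entry<String, List<Long>> group : shuffle(mapOutput).entrySet()) {
            // one reduce call per group, e.g. <hello, [1, 1]> -> hello 2
            System.out.println("<" + group.getKey() + ", " + group.getValue() + "> -> "
                + group.getKey() + "\t" + reduce(group.getKey(), group.getValue()));
        }
    }
}
```

Running main prints the groups in key order (beijing, hello, shanghai), with hello reduced to 2. The sketch needs Java 9 or later for List.of and Map.entry.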

3. WCRunner:

package com.hadoop.testHadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Tell Hadoop which jar contains the classes used by this job (located via this class)
        job.setJarByClass(WCRunner.class);
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);

        // Key/value types of the map output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Key/value types of the reduce (final) output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Path of the input data (resolved against the default file system, typically HDFS)
        FileInputFormat.setInputPaths(job, new Path("/wordcount/input"));
        // Path of the output data; it must not exist yet, otherwise the job fails at submission
        FileOutputFormat.setOutputPath(job, new Path("/wordcount/outputmy"));

        // Submit the job to the cluster, wait for it to finish (printing progress), and exit with its status
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
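Before submitting to a cluster, it helps to know what to expect in the output directory. With the default TextOutputFormat and the default single reducer, the job writes one file, part-r-00000, holding one "word<TAB>count" line per word, sorted by key. The following hypothetical sketch (LocalWordCountSketch is not part of Hadoop) simulates the whole job in memory and produces lines in that format; it folds map, shuffle and reduce into a single TreeMap pass, so it is only a way to check expectations on small inputs, not how Hadoop executes the job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of the whole WordCount job; not Hadoop's execution model.
class LocalWordCountSketch {

    // Returns the lines a single reducer would write, in "word\tcount" form.
    static List<String> run(String input) {
        // TreeMap keeps words sorted, like the sorted keys a reducer receives
        TreeMap<String, Long> counts = new TreeMap<>();
        for (String line : input.split("\n")) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // map emits (word, 1); merge adds that 1 to the running total (the reduce step)
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        // one "key<TAB>value" line per word, the layout TextOutputFormat uses by default
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "hello beijing\nhello shanghai\nhello chongqing\n"
            + "hello tianjin\nhello guangzhou\nhello shenzhen\n";
        run(input).forEach(System.out::println);
    }
}
```

For the six sample lines shown at the top (ignoring the "..."), run() returns seven lines from "beijing\t1" to "tianjin\t1", with "hello\t6" among them; the real job's part-r-00000 for that input should look the same.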

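This article was reposted from the 51CTO blog of lzf05303774. Original link: http://blog.51cto.com/8757576/1839294. Please contact the original author for permission to republish.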
