Collecting logs with Flume and writing them to HDFS

First, install Flume:

It is best to install Flume under the same user account that Hadoop runs as.

In this walkthrough I install Flume as the hadoop user.

http://douya.blog.51cto.com/6173221/1860390

Configuration:

1. Write the configuration file:

vim  flume_hdfs.conf

# Define a memory channel called ch2 on agent1

agent1.channels.ch2.type = memory

agent1.channels.ch2.capacity = 10000

agent1.channels.ch2.transactionCapacity = 100

#agent1.channels.ch2.keep-alive = 30

# Define an exec source that tails a log file

agent1.sources.avro-source1.type = exec

agent1.sources.avro-source1.shell = /bin/bash -c

agent1.sources.avro-source1.command = tail -n +0 -F /opt/logs/appcenter.log

agent1.sources.avro-source1.channels = ch2

agent1.sources.avro-source1.threads = 5

agent1.sources.avro-source1.interceptors = i1

agent1.sources.avro-source1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

# Define an HDFS sink that writes all events it receives to HDFS

# and connect it to the other end of the same channel.

agent1.sinks.log-sink1.channel = ch2

agent1.sinks.log-sink1.type = hdfs

agent1.sinks.log-sink1.hdfs.path = hdfs://ns1/flume/%y%m%d

agent1.sinks.log-sink1.hdfs.writeFormat = Text

agent1.sinks.log-sink1.hdfs.fileType = DataStream

agent1.sinks.log-sink1.hdfs.rollInterval = 60

agent1.sinks.log-sink1.hdfs.rollSize = 134217728

agent1.sinks.log-sink1.hdfs.rollCount = 0

#agent1.sinks.log-sink1.hdfs.batchSize = 100000

#agent1.sinks.log-sink1.hdfs.txnEventMax = 100000

#agent1.sinks.log-sink1.hdfs.callTimeout = 60000

#agent1.sinks.log-sink1.hdfs.appendTimeout = 60000

# Finally, now that we've defined all of our components, tell

# agent1 which ones we want to activate.

agent1.channels = ch2

agent1.sources = avro-source1

agent1.sinks = log-sink1
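
Two values in this configuration are easier to read with a small calculation. The `%y%m%d` escape in `hdfs.path` is expanded from the `timestamp` header of each event, which is why the TimestampInterceptor is attached to the source, and `hdfs.rollSize = 134217728` is 128 MiB expressed in bytes. The sketch below only mimics that arithmetic with GNU `date`; the function names `bucket_path` and `mib_to_bytes` and the `BUCKET_TZ` variable are hypothetical helpers for illustration, not part of Flume. The timestamp used is taken from the file name `FlumeData.1481002505934` in the successful run further down.

```shell
# Hypothetical helper: build the dated directory that a path like
# hdfs://ns1/flume/%y%m%d resolves to for one event.
# $1 = base path, $2 = event timestamp in milliseconds (the "timestamp" header).
bucket_path() {
  # GNU date: -d @N reads epoch seconds; TZ pins the zone used for the bucket.
  printf '%s/%s\n' "$1" "$(TZ="${BUCKET_TZ:-UTC}" date -d "@$(( $2 / 1000 ))" +%y%m%d)"
}

# Hypothetical helper: convert MiB to the byte count that hdfs.rollSize expects.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

bucket_path hdfs://ns1/flume 1481002505934   # prints hdfs://ns1/flume/161206
mib_to_bytes 128                             # prints 134217728
```

With `rollInterval = 60`, `rollSize = 134217728` and `rollCount = 0`, a file is rolled after 60 seconds or 128 MiB, whichever comes first, and never by event count.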

2. Prerequisite: the Hadoop cluster is running normally and is usable.

3. Start Flume and begin collecting logs

Start:

flume-ng agent -c /usr/local/ELK/apache-flume/conf/  -f /usr/local/ELK/apache-flume/conf/flume_hdfs.conf  -Dflume.root.logger=INFO,console -n agent1

Error 1:

2016-12-06 11:24:49,036 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:234)] Creating hdfs://ns1/flume//FlumeData.1480994688831.tmp

2016-12-06 11:24:49,190 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed

java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName

at org.apache.hadoop.security.UserGroupInformation.getOSLoginModuleName(UserGroupInformation.java:366)

at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:411)

at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2828)

at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)

at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)

at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)

at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)

at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)

at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)

at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)

at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)

at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName

at java.net.URLClassLoader.findClass(URLClassLoader.java:381)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

... 16 more

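Solution:

1. cd /usr/local/hadoop/hadoop/share/hadoop/common/

  cp hadoop-common-2.7.3.jar /usr/local/ELK/apache-flume/lib/

2. cd /usr/local/hadoop/hadoop/share/hadoop/common/lib

  cp hadoop-auth-2.7.3.jar /usr/local/ELK/apache-flume/lib/

   That is, copy these jars from the Hadoop directories into the lib directory of the Flume client.

Doing this one jar at a time leads to many further class-not-found errors, so the lazy approach is to copy every jar under that lib directory into Flume's lib directory.

Reason: Flume writes the data to HDFS, so it depends on some of the Hadoop/HDFS jars.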
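
The copy steps above can be sketched as a small script. This is only an illustration under assumptions: `copy_jars` is a hypothetical helper, and the `HADOOP_SHARE` and `FLUME_LIB` defaults are the paths used in this article, which may differ on other machines. The script does nothing when those directories are absent.

```shell
# Hypothetical helper: copy named jars from one directory into Flume's lib directory.
# $1 = source dir, $2 = Flume lib dir, remaining args = jar file names.
copy_jars() {
  src="$1"; dst="$2"; shift 2
  for jar in "$@"; do
    if [ -f "$src/$jar" ]; then
      cp "$src/$jar" "$dst/" && echo "copied $jar"
    else
      echo "missing $src/$jar" >&2
      return 1
    fi
  done
}

# Assumed locations, taken from the paths in this article.
HADOOP_SHARE="${HADOOP_SHARE:-/usr/local/hadoop/hadoop/share/hadoop}"
FLUME_LIB="${FLUME_LIB:-/usr/local/ELK/apache-flume/lib}"

# Run the copy only when both directories exist on this machine.
if [ -d "$HADOOP_SHARE/common" ] && [ -d "$FLUME_LIB" ]; then
  copy_jars "$HADOOP_SHARE/common" "$FLUME_LIB" hadoop-common-2.7.3.jar
  copy_jars "$HADOOP_SHARE/common/lib" "$FLUME_LIB" hadoop-auth-2.7.3.jar
fi
```

Copying jars by name keeps Flume's lib directory smaller than the copy-everything approach, at the cost of having to add each further dependency as it is reported missing.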

Start again:

flume-ng agent -c /usr/local/ELK/apache-flume/conf/  -f /usr/local/ELK/apache-flume/conf/flume_hdfs.conf  -Dflume.root.logger=INFO,console -n agent1

Error 2:

2016-12-06 11:36:52,791 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:234)] Creating hdfs://ns1/flume//FlumeData.1480995177465.tmp

2016-12-06 11:36:52,793 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:455)] HDFS IO error

java.io.IOException: No FileSystem for scheme: hdfs

at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)

at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)

at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)

at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)

at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)

at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)

at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)

at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)

at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)

at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)

at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)

at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

2016-12-06 11:36:57,798 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:234)] Creating hdfs://ns1/flume//FlumeData.1480995177466.tmp

2016-12-06 11:36:57,799 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:455)] HDFS IO error

java.io.IOException: No FileSystem for scheme: hdfs

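Analysis: at this point the Flume side has no FileSystem implementation registered for the hdfs:// scheme, and it also has no way to resolve the Hadoop nameservice ns1.

Solution:

cd   /usr/local/hadoop/hadoop/etc/hadoop

Copy core-site.xml and hdfs-site.xml into the conf directory on the Flume side.

Map the hostnames of the Hadoop machines: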

vim /etc/hosts

  172.16.9.250 hadoop1     

  172.16.9.252 hadoop2

  172.16.9.253 hadoop3

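One jar is still missing: hadoop-hdfs-2.7.3.jar, which provides the implementation of the hdfs:// scheme.

Download it from:

https://www.versioneye.com/java/org.apache.hadoop:hadoop-hdfs/2.7.3

Upload it to the Flume lib directory.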
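
Before restarting, it can help to confirm that everything listed above is in place. The sketch below is a hypothetical check, not a Flume tool: `check_hdfs_client` is an invented name, and the `FLUME_CONF` and `FLUME_LIB` defaults are the paths used in this article. In a Hadoop 2.x install the hadoop-hdfs jar is typically also found under share/hadoop/hdfs, which is an assumption about the local layout worth verifying before downloading anything.

```shell
# Hypothetical check: report which of the pieces listed above are still missing.
# $1 = Flume conf dir, $2 = Flume lib dir. Returns non-zero if anything is missing.
check_hdfs_client() {
  missing=0
  for f in core-site.xml hdfs-site.xml; do
    [ -f "$1/$f" ] || { echo "missing config: $1/$f"; missing=1; }
  done
  # Any hadoop-hdfs-*.jar is accepted here; this article uses version 2.7.3.
  ls "$2"/hadoop-hdfs-*.jar >/dev/null 2>&1 || { echo "missing jar: $2/hadoop-hdfs-*.jar"; missing=1; }
  return "$missing"
}

# Assumed locations, taken from the paths in this article.
check_hdfs_client "${FLUME_CONF:-/usr/local/ELK/apache-flume/conf}" \
                  "${FLUME_LIB:-/usr/local/ELK/apache-flume/lib}" \
  || echo "Flume is not ready to write to HDFS yet"
```

The /etc/hosts entries are not checked here; resolving hadoop1, hadoop2 and hadoop3 still has to be verified separately.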

Start once more; this time the agent starts successfully:

     flume-ng agent -c /usr/local/ELK/apache-flume/conf/  -f /usr/local/ELK/apache-flume/conf/flume_hdfs.conf  -Dflume.root.logger=INFO,console -n agent1

Info: Sourcing environment configuration script /usr/local/ELK/apache-flume/conf/flume-env.sh

Info: Including Hive libraries found via () for Hive access

+ exec /soft/jdk1.8.0_101/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/usr/local/ELK/apache-flume/conf:/usr/local/ELK/apache-flume/lib/*:/usr/local/ELK/apache-flume:/lib/*' -Djava.library.path= org.apache.flume.node.Application -f /usr/local/ELK/apache-flume/conf/flume_hdfs.conf -n agent1

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/ELK/apache-flume/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/ELK/apache-flume/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

2016-12-06 13:35:05,379 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting

2016-12-06 13:35:05,384 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:/usr/local/ELK/apache-flume/conf/flume_hdfs.conf

2016-12-06 13:35:05,391 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:log-sink1

2016-12-06 13:35:05,391 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:log-sink1

2016-12-06 13:35:05,392 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:log-sink1

2016-12-06 13:35:05,392 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:log-sink1

2016-12-06 13:35:05,392 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:931)] Added sinks: log-sink1 Agent: agent1

2016-12-06 13:35:05,392 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:log-sink1

2016-12-06 13:35:05,392 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:log-sink1

2016-12-06 13:35:05,392 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:log-sink1

2016-12-06 13:35:05,392 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:log-sink1

2016-12-06 13:35:05,403 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:141)] Post-validation flume configuration contains configuration for agents: [agent1]

2016-12-06 13:35:05,403 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:145)] Creating channels

2016-12-06 13:35:05,410 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel ch2 type memory

2016-12-06 13:35:05,417 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:200)] Created channel ch2

2016-12-06 13:35:05,418 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source avro-source1, type exec

2016-12-06 13:35:05,456 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: log-sink1, type: hdfs

2016-12-06 13:35:05,465 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:114)] Channel ch2 connected to [avro-source1, log-sink1]

2016-12-06 13:35:05,472 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{avro-source1=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:avro-source1,state:IDLE} }} sinkRunners:{log-sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@17517996 counterGroup:{ name:null counters:{} } }} channels:{ch2=org.apache.flume.channel.MemoryChannel{name: ch2}} }

2016-12-06 13:35:05,472 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] Starting Channel ch2

2016-12-06 13:35:05,535 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:120)] Monitored counter group for type: CHANNEL, name: ch2: Successfully registered new MBean.

2016-12-06 13:35:05,535 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: CHANNEL, name: ch2 started

2016-12-06 13:35:05,536 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink log-sink1

2016-12-06 13:35:05,540 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:120)] Monitored counter group for type: SINK, name: log-sink1: Successfully registered new MBean.

2016-12-06 13:35:05,540 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: SINK, name: log-sink1 started

2016-12-06 13:35:05,540 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source avro-source1

2016-12-06 13:35:05,541 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.ExecSource.start(ExecSource.java:169)] Exec source starting with command:tail -n +0 -F /opt/logs/appcenter.log

2016-12-06 13:35:05,543 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:120)] Monitored counter group for type: SOURCE, name: avro-source1: Successfully registered new MBean.

2016-12-06 13:35:05,543 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: SOURCE, name: avro-source1 started

2016-12-06 13:35:05,934 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:58)] Serializer = TEXT, UseRawLocalFileSystem = false

2016-12-06 13:35:06,279 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:234)] Creating hdfs://ns1/flume/161206/FlumeData.1481002505934.tmp

2016-12-06 13:35:06,586 (hdfs-log-sink1-call-runner-0) [WARN - org.apache.hadoop.util.NativeCodeLoader.<clinit>(NativeCodeLoader.java:62)] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

2016-12-06 13:36:07,615 (hdfs-log-sink1-roll-timer-0) [INFO - org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:363)] Closing hdfs://ns1/flume/161206/FlumeData.1481002505934.tmp

2016-12-06 13:36:07,710 (hdfs-log-sink1-call-runner-7) [INFO - org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:629)] Renaming hdfs://ns1/flume/161206/FlumeData.1481002505934.tmp to hdfs://ns1/flume/161206/FlumeData.1481002505934

2016-12-06 13:36:07,760 (hdfs-log-sink1-roll-timer-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:394)] Writer callback called.

