Hive log filling up tmp space
/tmp is usually the neglected folder in any Unix environment, but that is where Hive is going to place all its log files. If you didn't allocate enough space for it, your scheduled Sqoop or Hive queries are going to fail for a silly reason: no temp space. Not worth it.
There are two ways to deal with this:
1. Create a cron job to remove the /tmp/<username>/*.txt files frequently.
2. Change the Hive log directory to a different location which is monitored by ops.
I prefer the second option, since this change will not remove any files in the process.
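For completeness, the first option could look roughly like the sketch below. It assumes a `find` that supports `-maxdepth` (GNU and BSD both do); the function name `clean_old_hive_logs`, the 7-day retention and the crontab entry are illustrative choices, not something Hive prescribes.

```shell
#!/bin/sh
# Delete *.txt files older than DAYS days from a single log directory.
# Usage: clean_old_hive_logs DIR DAYS
clean_old_hive_logs() {
    dir=$1
    days=$2
    # Refuse to run against a missing directory
    [ -d "$dir" ] || return 1
    # -maxdepth 1 limits the cleanup to the directory itself;
    # only regular files ending in .txt are touched
    find "$dir" -maxdepth 1 -type f -name '*.txt' -mtime +"$days" -exec rm -f {} +
}

# Example call (hypothetical retention of 7 days):
#   clean_old_hive_logs "/tmp/$USER" 7
#
# Hypothetical crontab entry running a script that makes the call above
# every night at 02:00 (script path is an assumption):
#   0 2 * * * /usr/local/bin/clean_hive_logs.sh
```

Files that are not `*.txt`, and anything in subdirectories, are left alone.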
To make the configuration change, go to /etc/hive/conf/hive-site.xml and add the following property:
<property>
<name>hive.querylog.location</name>
<value>/var/<username>/tmp</value>
<description>Directory where structured hive query logs are created</description>
</property>
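Whichever directory you point the property at, it only helps if someone watches its free space. A small sketch of such a check, assuming a POSIX `df`; the function names `free_kb` and `has_free_space` and the threshold in the example are made up for illustration.

```shell
# Print the free space, in kilobytes, of the filesystem holding DIR.
free_kb() {
    # -P forces the portable one-line-per-filesystem format, -k reports KB;
    # column 4 of the second line is the available space
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed when DIR has at least MIN_KB kilobytes available.
# Usage: has_free_space DIR MIN_KB
has_free_space() {
    avail=$(free_kb "$1") || return 1
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# Example (hypothetical 1 GB threshold on the new log location):
#   has_free_space /var/<username>/tmp 1048576 || echo "Hive log dir is low on space"
```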
Freeing up unused HDFS memory
There are two directories we can concentrate on to free up a lot of HDFS space:
1. /tmp/hive-<username>/
2. /local/hadoop/mapred/staging/user/.staging/job*
If you have Hue, you can also clean up some long-unused saved reports:
1. /tmp/hive-beeswax-<username>/
If you can limit the deletion to files dated before the current month, you are safe (since this is a delete operation).
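One way to apply that date limit is to filter the listing before deleting anything. The sketch below assumes the usual 8-column `hadoop fs -ls` output (permissions, replication, owner, group, size, date as yyyy-MM-dd, time, path) and paths without spaces; `paths_older_than` and `month_start` are names invented here.

```shell
# Read `hadoop fs -ls` output on stdin and print the paths whose
# modification date (column 6) is earlier than CUTOFF (yyyy-MM-dd).
# The "Found N items" header has fewer than 8 fields and is skipped.
paths_older_than() {
    # Appending "" forces a string comparison, which orders ISO dates correctly
    awk -v cutoff="$1" 'NF >= 8 && ($6 "") < (cutoff "") { print $8 }'
}

# Print the first day of the current month, e.g. 2012-11-01.
month_start() {
    date +%Y-%m-01
}

# Review the list before deleting anything:
#   hadoop fs -ls /tmp/hive-<username>/ | paths_older_than "$(month_start)"
```

The output is only a list of candidates, so nothing is removed until you pass it on to a delete command yourself.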
Hadoop, Hive and Hue may have a daemon or code to clean up these folders (I'm not sure), but usually all these files are there because of orphaned MapReduce jobs, like when you press CTRL+C instead of formally running "hadoop job -kill <job_id>".
And of course, scanning the HDFS directory and Hive directory for junk files will also help.
The following commands use the default configuration directories:
for beeswax cleanup -- hadoop fs -rmr /tmp/hive-beeswax-*/hive*
for hive tmp -- hadoop fs -rmr /tmp/hive-bhchandr/hive*
older job files -- hadoop fs -rmr /local/hadoop/mapred/staging/<application_user>/.staging/job_2012*
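Since these are delete operations, it can be worth printing the commands before running them. A minimal dry-run sketch, assuming the Hive and Beeswax paths shown above; `run`, `cleanup_hdfs_tmp` and the `DRY_RUN` variable are hypothetical helpers, not part of Hadoop.

```shell
# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Remove the Beeswax and Hive scratch directories of one user.
# The globs are quoted so that HDFS, not the local shell, expands them.
# Usage: cleanup_hdfs_tmp USERNAME
cleanup_hdfs_tmp() {
    user=$1
    run hadoop fs -rmr "/tmp/hive-beeswax-$user/hive*"
    run hadoop fs -rmr "/tmp/hive-$user/hive*"
}

# Inspect first, then really delete:
#   cleanup_hdfs_tmp <username>
#   DRY_RUN=0 cleanup_hdfs_tmp <username>
```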
Free Sqoop temp space
Remove the compile folders for Sqoop jobs:
/tmp/sqoop-<username>/compile/
Remove log files older than 1/2 day on the JobTracker and DataNodes:
/var/log/hadoop-*-*/userlogs/*
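For a threshold shorter than a day, `find -mmin` is handy. It is not in POSIX, but GNU and BSD `find` both support it. The sketch below only lists candidates; `old_files` is a made-up name, and 720 minutes is just one reading of "half a day".

```shell
# Print regular files under DIR last modified more than MINUTES minutes ago.
# Usage: old_files DIR MINUTES
old_files() {
    find "$1" -type f -mmin +"$2"
}

# Example: list task logs older than 12 hours in every userlogs directory.
#   for d in /var/log/hadoop-*-*/userlogs; do old_files "$d" 720; done
```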
Recover from safemode
Hadoop can go into safemode when the local directory mapped to HDFS is full. This usually happens when your HDFS files have used up all the space. The trick is that you cannot remove files from HDFS until you recover from safemode. So first remove some local files from:
/opt/local/hadoop/mapred/local/taskTracker
/opt/local/hadoop/mapred/local/taskTracker/distcache
/opt/local/hadoop/mapred/local/taskTracker/<username>
Then run:
hadoop dfsadmin -safemode leave
Once Hadoop is out of safemode, you can clean up some HDFS space using the steps in the earlier sections.
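Before forcing the NameNode out of safemode, it helps to confirm that it is actually in it. A rough sketch, assuming `hadoop dfsadmin -safemode get` prints "Safe mode is ON" or "Safe mode is OFF"; the helper name `safemode_is_on` is invented here.

```shell
# Succeed when the text on stdin reports that safemode is on.
# Intended to be fed the output of `hadoop dfsadmin -safemode get`.
safemode_is_on() {
    grep -q 'Safe mode is ON'
}

# Outline of the recovery, to be run by hand:
#   if hadoop dfsadmin -safemode get | safemode_is_on; then
#       # free local disk space first (see the directories above), then:
#       hadoop dfsadmin -safemode leave
#   fi
```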
/var/log/hadoop-0.20/history/done
One more directory that can get filled up is /tmp/ on the client. Hive is hardcoded to use this for some temporary data (https://cwiki.apache.org/Hive/adminmanual-configuration.html). This directory is cleaned up only if a Hive job exits cleanly. A Hive job does not exit cleanly if the EXIT statement is not called, at least when running jobs using Hive JDBC.
Thanks for the comment, Dilip. Properly exiting seems to be the key here, for a MapReduce job or a Hive query.