Wednesday, June 20, 2012

Hive Maintenance

Hive logs filling up tmp space

/tmp is usually a neglected folder in any Unix environment, but that is where Hive places all of its log files by default. If you don't have enough space allocated for it, your scheduled Sqoop or Hive queries are going to fail for a silly reason: no temp space. Not worth it.


1. Create a cron job to remove the /tmp/<username>/*.txt files frequently (see the cron sketch after the configuration snippet below)
2. Change the Hive log directory to a different location that is monitored by ops

I prefer the 2nd option, since it does not remove any files in the process.
To make the configuration change, go to /etc/hive/conf/hive-site.xml and add the following property:

<property>
  <name>hive.querylog.location</name>
  <value>/var/<username>/tmp</value>
  <description>Directory where structured hive query logs are created</description>
</property>
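If you go with option 1 instead, a crontab entry along these lines would do it. This is a minimal sketch: the hourly schedule and the one-day age cutoff are assumptions, so tune them to your log volume.

# run hourly; delete Hive query logs under /tmp older than one day
# (the <username> placeholder and the +1 day cutoff are assumptions)
0 * * * * find /tmp/<username>/ -name '*.txt' -type f -mtime +1 -delete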



Freeing up unused HDFS space

Two directories we can concentrate on to free up a lot of HDFS space:

1. /tmp/hive-<username>/
2. /local/hadoop/mapred/staging/<username>/.staging/job*

If you have Hue, you can also clean up some long-unused saved reports under:
   1. /tmp/hive-beeswax-<username>/
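Before deleting anything, it is worth checking how much space each candidate actually holds. A quick sketch using the 0.20-era summary command (quoting the globs so HDFS, not the local shell, expands them):

hadoop fs -dus '/tmp/hive-*'
hadoop fs -dus '/tmp/hive-beeswax-*'
hadoop fs -dus '/local/hadoop/mapred/staging/*/.staging'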

If you limit the deletes to files dated before the current month, you are reasonably safe (remember this is a destructive delete operation).
Hadoop, Hive, and Hue may have a daemon or code to clean up these folders (I'm not sure), but usually these files are left behind by orphaned MapReduce jobs, like when you CTRL+C a job instead of formally killing it with "hadoop job -kill <job_id>".
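To hunt down those orphans properly, the job commands from this Hadoop generation look roughly like this (the job id below is a made-up example):

hadoop job -list                          # jobs still registered with the jobtracker
hadoop job -kill job_201206201200_0042    # kill a specific stuck job by its id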

And of course, scanning the HDFS and Hive directories for junk files will also help.

The following commands clean up the default configuration directories:

for beeswax cleanup --   hadoop fs -rmr /tmp/hive-beeswax-*/hive*
for hive tmp --          hadoop fs -rmr /tmp/hive-<username>/hive*
for older job files --   hadoop fs -rmr /local/hadoop/mapred/staging/<application_user>/.staging/job_2012*
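Putting the delete-by-date advice together, a small loop can restrict the staging cleanup to months before the current one. This is a minimal sketch, assuming the default staging path above and a January-to-May range; dry-run with hadoop fs -ls before switching to -rmr.

#!/bin/sh
# remove staging dirs for Jan-May 2012 only, one month at a time
# (paths and month range are assumptions - verify with -ls first)
for m in 01 02 03 04 05
do
  hadoop fs -rmr "/local/hadoop/mapred/staging/*/.staging/job_2012${m}*"
done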

Freeing up Sqoop temp space

Remove the compile folders left behind by Sqoop jobs:
/tmp/sqoop-<username>/compile/
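These compile folders are on the local disk, not HDFS, so plain find/rm works. A sketch, assuming anything older than a day is safe to drop:

# remove sqoop's generated-code folders older than one day (cutoff is an assumption)
find /tmp/sqoop-*/compile/ -mindepth 1 -maxdepth 1 -type d -mtime +1 -exec rm -rf {} +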

Remove log files older than half a day on the JobTracker and DataNodes:

/var/log/hadoop-*-*/userlogs/*
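A find one-liner can enforce the half-day cutoff (720 minutes) against that default path:

# delete task userlogs older than 12 hours
find /var/log/hadoop-*-*/userlogs/ -type f -mmin +720 -delete
# then clear the now-empty per-attempt directories
find /var/log/hadoop-*-*/userlogs/ -mindepth 1 -type d -empty -delete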

Recovering from safemode

Hadoop can go into safemode when the local directories backing HDFS are full. This usually happens when your HDFS files have used up all the space. The trick is that you cannot remove HDFS files until the cluster is out of safemode.
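Before touching anything, confirm the state you are in and which disks are actually full. The dfsadmin flags below exist in this Hadoop generation; the local mount point is an assumption taken from the paths that follow:

hadoop dfsadmin -safemode get    # is the namenode actually in safemode?
hadoop dfsadmin -report          # per-datanode DFS used / remaining
df -h /opt/local/hadoop          # local partition backing mapred (assumed mount)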

So first, remove some local files from:
/opt/local/hadoop/mapred/local/taskTracker
/opt/local/hadoop/mapred/local/taskTracker/distcache
/opt/local/hadoop/mapred/local/taskTracker/<username>

then
hadoop dfsadmin -safemode leave

Since Hadoop is now out of safemode, you can clean up some HDFS space using the steps in the earlier sections.


Another cleanup candidate is the completed-job history directory:

/var/log/hadoop-0.20/history/done