We usually use information_schema or database metadata tables to query the tables, columns, indexes ..in the case of traditional databases. How about the same in Hive?
In the case of Hive there is a metastore which acts as a metadata for the databases, Hive uses this database to store the tables, partitions, databases, serde in this database. Say you want to know the tables in a database or physical(HDFS) location of the tables or similar column names across the tables..
I have no idea about the default derby DB.. so letme talk about metastore in a traditional database like Mysql..( If you dont have your metastore in an external DB you can follow this link to do so.. https://ccp.cloudera.com/display/CDHDOC/Hive+Installation#HiveInstallation-ConfiguringtheHiveMetastore)
Hive metastore has minimal tables when compared to the metadata layer of a traditional database but Im sure the metadata schema will get bigger and complex in future.
Lets go through the most desirable tables ...
COLUMNS
DBS -- the list of schemas
PARTITIONS
TBLS - tables
Querying these tables will give a basic idea and the datamodel of Hive metastore ( I couldnt find any over the internet), understanding of these tables are essential if you are seriously into Hive and have some production data.
for eg.. In a weird case we found some partitions of one database is being loaded into HDS location of another database (may be a code issue ) but using the following query Im able to narrow down to the affected tables..
select distinct a.tbl_id,a.tbl_name
from TBLS a
join PARTITIONS c on (a.TBL_ID=c.TBL_ID)
join SDS b on (c.SD_ID=b.SD_ID)
where b.location like '%sbx.db%' and a.DB_ID=1;
In the case of Hive there is a metastore which acts as a metadata for the databases, Hive uses this database to store the tables, partitions, databases, serde in this database. Say you want to know the tables in a database or physical(HDFS) location of the tables or similar column names across the tables..
I have no idea about the default derby DB.. so letme talk about metastore in a traditional database like Mysql..( If you dont have your metastore in an external DB you can follow this link to do so.. https://ccp.cloudera.com/display/CDHDOC/Hive+Installation#HiveInstallation-ConfiguringtheHiveMetastore)
Hive metastore has minimal tables when compared to the metadata layer of a traditional database but Im sure the metadata schema will get bigger and complex in future.
Lets go through the most desirable tables ...
COLUMNS
DBS -- the list of schemas
PARTITIONS
TBLS - tables
Querying these tables will give a basic idea and the datamodel of Hive metastore ( I couldnt find any over the internet), understanding of these tables are essential if you are seriously into Hive and have some production data.
for eg.. In a weird case we found some partitions of one database is being loaded into HDS location of another database (may be a code issue ) but using the following query Im able to narrow down to the affected tables..
select distinct a.tbl_id,a.tbl_name
from TBLS a
join PARTITIONS c on (a.TBL_ID=c.TBL_ID)
join SDS b on (c.SD_ID=b.SD_ID)
where b.location like '%sbx.db%' and a.DB_ID=1;
No comments:
Post a Comment