Hive安裝配置及應(yīng)用
Hive是一個(gè)構(gòu)建在Hadoop上的數(shù)據(jù)倉(cāng)庫(kù)框架,需要的支持組件包括:HDFS、YARN、MapReduce。它是應(yīng)Facebook每天產(chǎn)生的海量新興社交網(wǎng)絡(luò)數(shù)據(jù)進(jìn)行管理和機(jī)器學(xué)習(xí)的需求而產(chǎn)生和發(fā)展的。
Hive提供一套類(lèi)SQL語(yǔ)言——HiveQL語(yǔ)言,可將HiveQL編譯成MapReduce程序并在YARN中執(zhí)行,支持HDFS文件數(shù)據(jù)和HBase數(shù)據(jù)記錄查詢(xún)。
一、Hive安裝
1、去www.apache.org下載Hive (apache-hive-3.1.2-bin.tar.gz)
http://www.apache.org/dyn/closer.cgi/hive/
2、解壓安裝包創(chuàng)建hive目錄和臨時(shí)目錄
mkdir /home/hive
mkdir /home/hive/tmp
tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /home/hive
mv /home/hive/apache-hive-3.1.2-bin /home/hive/apache-hive-3.1.2 //將解壓后的文件名修改為apache-hive-3.1.2
解壓完后,注意最后沒(méi)有報(bào)錯(cuò)信息
tar: Exiting with failure status due to previous errors
如果有,則嘗試尋找原因重新解壓
3、修改環(huán)境變量
vi /etc/profile
添加以下行
export HIVE_HOME=/home/hive/apache-hive-3.1.2
export PATH=$PATH:$HIVE_HOME/bin
加載資源并執(zhí)行命令查看版本
source /etc/profile
/home/hive/apache-hive-3.1.2/bin/hive --version
如果提示Missing Hive Execution Jar,則可能是下載或解壓不完全導(dǎo)致
解壓不全可能的原因:
(1)下載文件錯(cuò)誤,在windows中解壓試試
(2)傳輸錯(cuò)誤,重新將文件傳給linux試試
(3)虛擬硬盤(pán)空間已滿(mǎn),查看虛擬硬盤(pán)剩余空間:df -h
4、進(jìn)入hive安裝目錄下的conf目錄
cd /home/hive/apache-hive-3.1.2/conf
從模板復(fù)制一個(gè)配置文件并打開(kāi)
cp hive-env.sh.template hive-env.sh
vi hive-env.sh
在文件末尾增加(以實(shí)際的hadoop和hive版本為主)
export HADOOP_HOME=/home/hadoop/hadoop-3.1.2
export HIVE_CONF_DIR=/home/hive/apache-hive-3.1.2/conf
不配置數(shù)據(jù)庫(kù)Hive即缺省采用Derby數(shù)據(jù)庫(kù)
5、啟動(dòng)Hadoop HDFS和YARN,確保HDFS和YARN能正常工作
cd /home/hadoop/hadoop-3.1.2
sbin/start-all.sh
6、初次執(zhí)行時(shí),先初始化derby元數(shù)據(jù)庫(kù)
cd進(jìn)入hive安裝目錄:
cd /home/hive/apache-hive-3.1.2
初始化derby元數(shù)據(jù)庫(kù)
bin/schematool -initSchema -dbType derby
Derby會(huì)在第一次啟動(dòng)初始化命令時(shí),在當(dāng)前所在目錄創(chuàng)建元數(shù)據(jù)庫(kù)文件(metastore_db)
注意:執(zhí)行hive會(huì)在當(dāng)前所在目錄尋找元數(shù)據(jù)文件(metastore_db),必須先cd進(jìn)入第6步的目錄,否則會(huì)在原地創(chuàng)建一個(gè)空metastore_db并報(bào)找不到元數(shù)據(jù)表的錯(cuò)誤
執(zhí)行命令 bin/hive?進(jìn)入Hive命令行界面
二、Hive命令行應(yīng)用(注意單條命令以 ; 結(jié)尾!?。?/strong>
內(nèi)部表應(yīng)用
(1) 創(chuàng)建內(nèi)部表Student,要求的字段和類(lèi)型: id int, name string, sex string, age int,執(zhí)行insert語(yǔ)句錄入以下數(shù)據(jù):

create table Student(id int,name string,sex string,age int);
insert into Student values(1,"Tom","M",18),(2,"Trill","M",19),(3,"Bell","M",20),(4,"Lisa","F",19),(5,"Willy","F",18);
(2) 執(zhí)行查詢(xún)
查詢(xún)所有男生
select * from Student where sex="M";
查詢(xún)大于18歲的學(xué)生,并按年齡升序排列
select * from Student where sex="M" and age>18 order by age asc;
2.外部表應(yīng)用
(1) 將本實(shí)驗(yàn)數(shù)據(jù)上傳至HDFS,數(shù)據(jù)為stock-daily30d2.txt,此數(shù)據(jù)修改了實(shí)驗(yàn)4中stock-daily數(shù)據(jù)中的日期格式以支持Hive中的Date數(shù)據(jù)類(lèi)型:
https://download.csdn.net/download/m0_69488210/87961142
stock-daily數(shù)據(jù)說(shuō)明:
[‘code’,‘isst’,open’, ‘close’, ‘high’, ‘low’, ‘volume’, ‘money’, ‘factor’, ‘high_limit’, ‘low_limit’, ‘a(chǎn)vg’, ‘paused’, ‘date’]
code 股票代碼
isst 是否ST
open 開(kāi)盤(pán)價(jià)
close 收盤(pán)價(jià)
high 當(dāng)日最高價(jià)
low 當(dāng)日最低價(jià)
volume 交易量(手)
money 交易量(萬(wàn)元)
factor 除權(quán)比例
high_limit 漲停價(jià)
low_limit 跌停價(jià)
avg 每日均價(jià)
paused 是否停牌
date 日期
在hdfs中新建/hivedata/stock-daily目錄,并把stock-daily30d.txt上傳到/hivedata/stock-daily目錄中
hadoop fs -mkdir /hivedata
hadoop fs -mkdir /hivedata/stock-daily
hadoop fs -put stock-daily30d.txt /hivedata/stock-daily
(2) 創(chuàng)建外部表Stock,具體的字段名稱(chēng)和類(lèi)型為:code string, isst boolean, open float, close float, high float, low float, volume float, money float, factor float, high_limit float, low_limit float, avg float, paused boolean, trade_date date;注意日期字段不要用date或time,可使用trade_date以避免語(yǔ)法錯(cuò)誤。
create external table Stock(code string, isst boolean, open float, close float, high float, low float, volume float, money float, factor float, high_limit float, low_limit float, avg float, paused boolean, trade_date date) row format delimited fields terminated by '\t' location '/hivedata/stock-daily';

(3) 執(zhí)行查詢(xún)
查詢(xún)上證所的股票數(shù)量,代碼以.SH結(jié)尾的股票
select count(*) from Stock where code like "%.SH";
參考SQL語(yǔ)句group語(yǔ)句,查詢(xún)每日市場(chǎng)總交易量(萬(wàn)元)
結(jié)果須包含日期和對(duì)應(yīng)的中交易量數(shù)據(jù)
結(jié)果按時(shí)間做降序排列
市場(chǎng)總交易量為當(dāng)天所有股票交易量(萬(wàn)元)之和
select trade_date,sum(money) from Stock group by trade_date order by trade_date desc;
(4) 學(xué)習(xí)與思考:計(jì)算實(shí)驗(yàn)4中,股票5日滾動(dòng)收益率
_1_?搜索所有交易日,并對(duì)交易日編號(hào)(保存中間數(shù)據(jù)到tradedate表)
CREATE TABLE TradeDate as (select row_number() over(order by 1) as date_id, s.trade_date from (select distinct(trade_date) trade_date from Stock where trade_date is not NULL order by trade_date) s);
_2_?利用連接查詢(xún),搜索提取計(jì)算所需股票代碼和收盤(pán)價(jià),并將交易日標(biāo)記上對(duì)應(yīng)編號(hào)(保存中間數(shù)據(jù)到closewithid表)
CREATE TABLE CloseWithDateId as (select s.code, s.close, s.trade_date, t.date_id from Stock as s join TradeDate as t on s.trade_date=t.trade_date);
_3_?利用自連接,讓股票當(dāng)日收益與改股票前第五日收益連接在同一行中,并計(jì)算五日收益率,此計(jì)算對(duì)每個(gè)股票和交易日都生效(保存中間數(shù)據(jù)到rollreturn表)
CREATE TABLE RollEarns as (select c2.code, c1.close prev_close, c2.close curr_close, round(((c2.close-c1.close)/c1.close)*100,2) earn_rate, c2.trade_date from CloseWithDateId as c1 join CloseWithDateId as c2 on (c1.date_id == c2.date_id + 4) and c1.code == c2.code order by c2.trade_date);
_4_?查詢(xún)某只股票的五日滾動(dòng)收益率
select * from RollEarns where code=='688800.SH' order by trade_date desc;