文章目录
  1. Hive-3_DML使用
  2. 1. Loading files into tables
  3. 2. Inserting data into Hive Tables from queries
  4. 3. Writing data into the filesystem from queries
  5. 4. Inserting values into tables from SQL
  6. 5. Update
  7. 6. Delete

Hive-3_DML使用

[TOC]

原文参考

1. Loading files into tables

load 数据到hive表中是不会对数据进行转化的, 其实load的操作就是数据文件直接move/copy到hive表中。

语法

1
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

说明

  • filepath 可以是
    • 一个相对路径, 如 project/data1
    • 绝对路径, 比如 /user/hive/project/data1
    • 一个完整的uri路径, 比如 hdfs://namenode:9000/user/hive/project/data1
  • 加载数据的目标可以是一个表或分区。如果表是分区表, 那么必须指定需要导入的分区。比如:
1
load data local inpath '/Users/icemimosa/orders.txt' into table orders partition (pt="2016-01-02");
  • filepath 可以是一个文件, 也可以是一个目录(会加载目录下所有的文件)。
  • 如果指定 LOCAL 关键字
    • 如果是相对路径, 那么是相对于当前hive的工作目录。也可以是一个绝对路径。也可以是一个完整uri的路径,如 file:///user/hive/project/data1
  • 如果没有指定 LOCAL, hive会去使用完整的uri
    • 如果uri协议未指定, 那么会去找hadoop的配置 fs.default.name (fs.defaultFS) 的值
    • 如果路径不是绝对的,那么hive就会将路径相对于 /user/<username>
  • 如果指定了 OVERWRITE 关键字,那么目标table(partition)将会先清空数据,然后在加载数据
    • 如果目标已经含有了一个 文件名称filepath 中相同的文件,那么已存在的文件将会被新文件替换。

注意

  • filepath 不能包含子目录
  • 如果没有指定 LOCALfilepath必须和hive的table(partition)在同一个文件系统
  • 如果数据是被压缩的, 参照 CompressedStorage

2. Inserting data into Hive Tables from queries

查询的结果可以通过 insert 插入到指定表中

语法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
## 标准的语法
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] (z,y) select_statement1 FROM from_statement;

## Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;

## Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

说明

  • INSERT OVERWRITE 将会覆盖所有已存在的数据(除非指定 IF NOT EXISTS)
  • 如果目标是分区表,那么在insert的时候必须指定partition参数
  • As of Hive 1.1.0 the TABLE keyword is optional.
  • As of Hive 1.2.0 each INSERT INTO T can take a column list like INSERT INTO T (z, x, c1)

Dynamic Partition Inserts 参考

3. Writing data into the filesystem from queries

查询的结果可以写入到文件系统中

语法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
## Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
SELECT ... FROM ...

## Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...


row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)

说明

  • 目录可以是一个完整的uri,如果没有指定uri协议, 那么hive会去找hadoop的配置 fs.default.name (fs.defaultFS) 的值
  • 如果使用了 LOCAL,那么数据会写入到本地文件系统

4. Inserting values into tables from SQL

使用 INSERT...VALUES 将数据插入到表中

语法

1
2
3
4
5
6
## Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]

## Where values_row is:
( value [, value ...] )
## where a value is either null or any valid SQL literal

5. Update

1
UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
  • partition的column不能被更新
  • Bucketing columns 也不能被更新

6. Delete

1
DELETE FROM tablename [WHERE expression]
文章目录
  1. Hive-3_DML使用
  2. 1. Loading files into tables
  3. 2. Inserting data into Hive Tables from queries
  4. 3. Writing data into the filesystem from queries
  5. 4. Inserting values into tables from SQL
  6. 5. Update
  7. 6. Delete