大数据开发工程师-第十二周 电商数据仓库之所有表概况


第十二周 综合项目:电商数据仓库之所有表概况

用户行为数仓所有表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
{
"uid":1001, //用户ID
"xaid":"ab25617-c38910-m2991", //手机设备ID
"platform":2, //设备类型, 1:Android-APP, 2:IOS-APP, 3:PC
"ver":"3.5.10", //大版本号
"vercode":"35100083", //子版本号
"net":1, //网络类型, 0:未知, 1:WIFI, 2:2G , 3:3G, 4:4G, 5:5G
"brand":"iPhone", //手机品牌
"model":"iPhone8", //机型
"display":"1334x750", //分辨率
"osver":"ios13.5", //操作系统版本号
"data":[ //用户行为数据
{"act":1,"acttime":1592486549819,"ad_status":1,"loading_time":100},
{"act":2,"acttime":1592486549819,"goods_id":"2881992"}
]
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
act代表具体的用户行为,在这列出来几种
act=1:打开APP
属性 含义
act 用户行为类型
acttime 数据产生时间(时间戳)
ad_status 开屏广告展示状态, 1:成功 2:失败
loading_time 开屏广告加载耗时(单位毫秒)

act=2:点击商品
属性 含义
act 用户行为类型
acttime 数据产生时间(时间戳)
goods_id 商品ID
location 商品展示顺序:在列表页中排第几位,从0开始

act=3:商品详情页
属性 含义
act 用户行为类型
acttime 数据产生时间(时间戳)
goods_id 商品ID
stay_time 页面停留时长(单位毫秒)
loading_time 页面加载耗时(单位毫秒)

act=4:商品列表页
属性 含义
act 用户行为类型
acttime 数据产生时间(时间戳)
loading_time 页面加载耗时(单位毫秒)
loading_type 加载类型:1:读缓存 2:请求接口
goods_num 列表页加载商品数量

act=5:app崩溃数据
属性 含义
act 用户行为类型
acttime 数据产生时间(时间戳)

image-20230405222432071

dwd层

dwd_user_active

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
create external table if not exists dwd_mall.dwd_user_active(
user_id bigint,
xaid string,
platform tinyint,
ver string,
vercode string,
net bigint,
brand string,
model string,
display string,
osver string,
acttime bigint,
ad_status tinyint,
loading_time bigint
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/user_active/';

dwd_click_good

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
create external table if not exists dwd_mall.dwd_click_good(
user_id bigint,
xaid string,
platform tinyint,
ver string,
vercode string,
net bigint,
brand string,
model string,
display string,
osver string,
acttime bigint,
goods_id bigint,
location tinyint
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/click_good/';

dwd_good_item

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
create external table if not exists dwd_mall.dwd_good_item(
user_id bigint,
xaid string,
platform tinyint,
ver string,
vercode string,
net bigint,
brand string,
model string,
display string,
osver string,
acttime bigint,
goods_id bigint,
stay_time bigint,
loading_time bigint
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/good_item/';

dwd_good_list

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
create external table if not exists dwd_mall.dwd_good_list(
user_id bigint,
xaid string,
platform tinyint,
ver string,
vercode string,
net bigint,
brand string,
model string,
display string,
osver string,
acttime bigint,
loading_time bigint,
loading_type tinyint,
goods_num tinyint
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/good_list/';

dwd_app_close

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
create external table if not exists dwd_mall.dwd_app_close(
user_id bigint,
xaid string,
platform tinyint,
ver string,
vercode string,
net bigint,
brand string,
model string,
display string,
osver string,
acttime bigint
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/app_close/';

dws层

dws_user_active_history

1
2
3
4
5
6
7
create external table if not exists dws_mall.dws_user_active_history(
xaid string,
times int
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dws/user_active_history';

dws_user_new_item

1
2
3
4
5
6
create external table if not exists dws_mall.dws_user_new_item(
xaid string
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dws/user_new_item';
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
insert overwrite table dws_mall.dws_user_active_${dt}_tmp select
xaid,
count(*) as times
from dwd_mall.dwd_user_active
where dt = '${dt}'
group by xaid;

--注意:考虑到脚本重跑的情况,所以在这开面每次执行的时候都会先删除dws_user_active_history表中指定分区的数据
--因为在计算每日新增用户的时候需要和dws_user_active_history进行关联查询
alter table dws_mall.dws_user_active_history drop partition(dt='${dt}');


insert overwrite table dws_mall.dws_user_new_item partition(dt='${dt}') select
duat.xaid
from dws_mall.dws_user_active_${dt}_tmp duat
left join (select xaid from dws_mall.dws_user_active_history group by xaid) duah
on duat.xaid = duah.xaid
where duah.xaid is null;

insert overwrite table dws_mall.dws_user_active_history partition(dt='${dt}') select
xaid,
times
from dws_mall.dws_user_active_${dt}_tmp;

dws_user_lost_item

1
2
3
4
5
6
create external table if not exists dws_mall.dws_user_lost_item(
xaid string
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dws/user_lost_item';
1
2
3
4
5
6
insert overwrite table dws_mall.dws_user_lost_item partition(dt='${dt}') select
xaid
from dws_mall.dws_user_active_history
where dt >= regexp_replace(date_add('${dt_new}',-7),'-','')
group by xaid
having max(dt) = regexp_replace(date_add('${dt_new}',-7),'-','');

dws_user_platform_distrib

1
2
3
4
5
6
7
create external table if not exists dws_mall.dws_user_platform_distrib(
ty string,
num int
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dws/user_platform_distrib';

dws_user_android_osver_distrib

1
2
3
4
5
6
7
create external table if not exists dws_mall.dws_user_android_osver_distrib(
ty string,
num int
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dws/user_android_osver_distrib';

dws_user_ios_osver_distrib

1
2
3
4
5
6
7
create external table if not exists dws_mall.dws_user_ios_osver_distrib(
ty string,
num int
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dws/user_ios_osver_distrib';

dws_user_brand_distrib

1
2
3
4
5
6
7
create external table if not exists dws_mall.dws_user_brand_distrib(
ty string,
num int
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dws/user_brand_distrib';

dws_user_model_distrib

1
2
3
4
5
6
7
create external table if not exists dws_mall.dws_user_model_distrib(
ty string,
num int
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dws/user_model_distrib';

dws_user_net_distrib

1
2
3
4
5
6
7
create external table if not exists dws_mall.dws_user_net_distrib(
ty string,
num int
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dws/user_net_distrib';
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
insert overwrite table dws_mall.dws_user_platform_distrib partition(dt='${dt}') select
case platform
when 1 then 'android'
when 2 then 'ios'
end ty,
count(*) as num
from dwd_mall.dwd_user_active
where dt = '${dt}' and platform in (1,2)
group by platform;



insert overwrite table dws_mall.dws_user_android_osver_distrib partition(dt='${dt}') select
osver as ty,
count(*) as num
from dwd_mall.dwd_user_active
where dt = '${dt}' and platform = 1
group by osver;


insert overwrite table dws_mall.dws_user_ios_osver_distrib partition(dt='${dt}') select
osver as ty,
count(*) as num
from dwd_mall.dwd_user_active
where dt = '${dt}' and platform = 2
group by osver;


insert overwrite table dws_mall.dws_user_brand_distrib partition(dt='${dt}') select
brand as ty,
count(*) as num
from dwd_mall.dwd_user_active
where dt = '${dt}'
group by brand;


insert overwrite table dws_mall.dws_user_model_distrib partition(dt='${dt}') select
model as ty,
count(*) as num
from dwd_mall.dwd_user_active
where dt = '${dt}'
group by model;



insert overwrite table dws_mall.dws_user_net_distrib partition(dt='${dt}') select
case net
when 0 then '未知'
when 1 then 'WIFI'
when 2 then '2G'
when 3 then '3G'
when 4 then '4G'
when 5 then '5G'
end ty,
count(*) as num
from dwd_mall.dwd_user_active
where dt = '${dt}'
group by net;

app层

app_user_new_count

1
2
3
4
5
6
create external table if not exists app_mall.app_user_new_count(
num int
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/app/user_new_count';

app_user_new_count_ratio

1
2
3
4
5
6
7
8
create external table if not exists app_mall.app_user_new_count_ratio(
num int,
day_ratio double,
week_ratio double
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/app/user_new_count_ratio';

app_user_active_count

1
2
3
4
5
6
create external table if not exists app_mall.app_user_active_count(
num int
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/app/user_active_count';

app_user_active_count_ratio

1
2
3
4
5
6
7
8
create external table if not exists app_mall.app_user_active_count_ratio(
num int,
day_ratio double,
week_ratio double
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/app/user_active_count_ratio';
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
insert overwrite table app_mall.app_user_active_count partition(dt='${dt}') select
count(*) as num
from dws_mall.dws_user_active_history
where dt = '${dt}';

insert overwrite table app_mall.app_user_active_count_ratio partition(dt='${dt}') select
num,
(num-num_1)/num_1 as day_ratio,
(num-num_7)/num_7 as week_ratio
from(
select
dt,
num,
lead(num,1) over(order by dt desc) as num_1,
lead(num,7) over(order by dt desc) as num_7
from app_mall.app_user_active_count
where dt >=regexp_replace(date_add('${dt_new}',-7),'-','')
) as t
where dt = '${dt}';

app_user_lost_count

1
2
3
4
5
6
7
8
9
10
11
create external table if not exists app_mall.app_user_lost_count(
num int
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/app/user_lost_count';

insert overwrite table app_mall.app_user_lost_count partition(dt='20260201') select
count(*) as num
from dws_mall.dws_user_lost_item
where dt = '20260201';

app_user_open_app_count

1
2
3
4
5
6
7
create external table if not exists app_mall.app_user_open_app_count(
pv int,
uv int
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/app/user_open_app_count';

app_user_open_app_distrib

1
2
3
4
5
6
7
8
create external table if not exists app_mall.app_user_open_app_distrib(
ts_1 int,
ts_2 int,
ts_3_m int
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/app/user_open_app_distrib';
1
2
3
4
5
6
7
8
9
10
11
12
insert overwrite table app_mall.app_user_open_app_count partition(dt='${dt}') select
sum(times) as pv,
count(*) as uv
from dws_mall.dws_user_active_history
where dt = '${dt}';

insert overwrite table app_mall.app_user_open_app_distrib partition(dt='${dt}') select
sum( case when times = 1 then 1 else 0 end) ts_1,
sum( case when times = 2 then 1 else 0 end) ts_2,
sum( case when times >= 3 then 1 else 0 end) ts_3_m
from dws_mall.dws_user_active_history
where dt = '${dt}';

商品订单数仓所有表

1
2
3
4
5
6
7
8
9
10
11
12
13
接下来需要生成商品订单相关数据,这些数据都是存储在mysql中的
注意:MySQL在这里我使用的版本是8.x

相关表名为:
订单表:user_order
商品信息表:goods_info
订单商品表:order_item
商品类目码表:category_code
订单收货表:order_delivery
支付流水表:payment_flow
用户收货地址表:user_addr
用户信息表:user
用户扩展表:user_extend

image-20230330235704680

image-20230331012207048

dwd层

dwd_user

1
2
3
4
5
6
7
8
9
10
11
12
13
create external table if not exists dwd_mall.dwd_user(
user_id bigint,
user_name string,
user_gender tinyint,
user_birthday string,
e_mail string,
mobile string,
register_time string,
is_blacklist tinyint
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/user/';

dwd_user_extend

1
2
3
4
5
6
7
8
9
10
11
12
13
14
create external table if not exists dwd_mall.dwd_user_extend(
user_id bigint,
is_pregnant_woman tinyint,
is_have_children tinyint,
is_have_car tinyint,
phone_brand string,
phone_cnt int,
change_phone_cnt int,
weight int,
height int
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/user_extend/';

dwd_user_addr

1
2
3
4
5
6
7
8
9
10
11
create external table if not exists dwd_mall.dwd_user_addr(
addr_id bigint,
user_id bigint,
addr_name string,
order_flag tinyint,
user_name string,
mobile string
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/user_addr/';

dwd_goods_info

1
2
3
4
5
6
7
8
9
10
11
12
create external table if not exists dwd_mall.dwd_goods_info(
goods_id bigint,
goods_no string,
goods_name string,
curr_price double,
third_category_id int,
goods_desc string,
create_time string
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/goods_info/';

dwd_category_code

1
2
3
4
5
6
7
8
9
10
11
create external table if not exists dwd_mall.dwd_category_code(
first_category_id int,
first_category_name string,
second_category_id int,
second_catery_name string,
third_category_id int,
third_category_name string
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/category_code/';

dwd_user_order

1
2
3
4
5
6
7
8
9
10
11
12
13
create external table if not exists dwd_mall.dwd_user_order(
order_id bigint,
order_date string,
user_id bigint,
order_money double,
order_type int,
order_status int,
pay_id bigint,
update_time string
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/user_order/';

dwd_order_item

1
2
3
4
5
6
7
8
9
10
create external table if not exists dwd_mall.dwd_order_item(
order_id bigint,
goods_id bigint,
goods_amount int,
curr_price double,
create_time string
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/order_item/';

dwd_order_delivery

1
2
3
4
5
6
7
8
9
10
 create external table if not exists dwd_mall.dwd_order_delivery(
order_id bigint,
addr_id bigint,
user_id bigint,
carriage_money double,
create_time string
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/order_delivery/';

dwd_payment_flow

1
2
3
4
5
6
7
8
9
10
11
create external table if not exists dwd_mall.dwd_payment_flow(
pay_id bigint,
order_id bigint,
trade_no bigint,
pay_money double,
pay_type int,
pay_time string
)partitioned by(dt string)
row format delimited
fields terminated by '\t'
location 'hdfs://bigdata01:9000/data/dwd/payment_flow/';

本文标题:大数据开发工程师-第十二周 电商数据仓库之所有表概况

文章作者:TTYONG

发布时间:2023年03月30日 - 20:03

最后更新:2023年06月19日 - 00:06

原始链接:http://tianyong.fun/%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BC%80%E5%8F%91%E5%B7%A5%E7%A8%8B%E5%B8%88-%E7%AC%AC%E5%8D%81%E4%BA%8C%E5%91%A8-%E7%BB%BC%E5%90%88%E9%A1%B9%E7%9B%AE-%E7%94%B5%E5%95%86%E6%95%B0%E6%8D%AE%E4%BB%93%E5%BA%93%E4%B9%8B%E6%89%80%E6%9C%89%E8%A1%A8%E6%A6%82%E5%86%B5.html

许可协议: 转载请保留原文链接及作者。

多少都是爱
0%