merge方法
横向连接,包括内连接和外连接,结果都是列增加了,行根据要求可能减少,不变或者增加)
import pandas as pd
data=pd.DataFrame([{"id":0,"name":'lxh',"age":20,"cp":'lm'},
{"id":1,"name":'xiao',"age":40,"cp":'ly'},
{"id":2,"name":'hua',"age":4,"cp":'yry'},
{"id":3,"name":'be',"age":70,"cp":'old'}])
data1=pd.DataFrame([{"id":100,"name":'lxh','cs':10},
{"id":101,"name":'xiao','cs':40},
{"id":102,"name":'hua2','cs':50}])
data2=pd.DataFrame([{"id":0,"name":'lxh','cs':10},
{"id":101,"name":'xiao','cs':40},
{"id":102,"name":'hua2','cs':50}])
0 |
0 |
lxh |
20 |
lm |
1 |
1 |
xiao |
40 |
ly |
2 |
2 |
hua |
4 |
yry |
3 |
3 |
be |
70 |
old |
0 |
100 |
lxh |
10 |
1 |
101 |
xiao |
40 |
2 |
102 |
hua2 |
50 |
内连接
单列作为连接键。
不设置how的值,则how默认值为”inner”。
结果:求交集。
pd.merge(data,data1,
on="name",
suffixes=('_left','_right')
)
0 |
0 |
lxh |
20 |
lm |
100 |
10 |
1 |
1 |
xiao |
40 |
ly |
101 |
40 |
多列作为连接键。
不设置how的值,则how默认值为”inner”。
结果:要求两列值同时符合,求交集。
0 |
0 |
lxh |
20 |
lm |
1 |
1 |
xiao |
40 |
ly |
2 |
2 |
hua |
4 |
yry |
3 |
3 |
be |
70 |
old |
0 |
0 |
lxh |
10 |
1 |
101 |
xiao |
40 |
2 |
102 |
hua2 |
50 |
pd.merge(data,data2,
on=("name","id")
)
指定不同名连接键
当左右两个DataFrame的列名不同,当又想做为连接键时可以使用left_on与right_on来指定连接键
data3=pd.DataFrame([
{"mid":0,"mname":'lxh','cs':10},
{"mid":101,"mname":'xiao','cs':40},
{"mid":102,"mname":'hua2','cs':50}]
)
0 |
0 |
lxh |
20 |
lm |
1 |
1 |
xiao |
40 |
ly |
2 |
2 |
hua |
4 |
yry |
3 |
3 |
be |
70 |
old |
0 |
0 |
lxh |
10 |
1 |
101 |
xiao |
40 |
2 |
102 |
hua2 |
50 |
pd.merge(data,data3,
left_on=["name"],
right_on=["mname"])
0 |
0 |
lxh |
20 |
lm |
0 |
lxh |
10 |
1 |
1 |
xiao |
40 |
ly |
101 |
xiao |
40 |
pd.merge(data,data3,
left_on=["name","id"],
right_on=["mname","mid"]
)
pd.merge(data,data3,
left_on=["name"],
right_on=["mname"],
how = 'left')
0 |
0 |
lxh |
20 |
lm |
0.0 |
lxh |
10.0 |
1 |
1 |
xiao |
40 |
ly |
101.0 |
xiao |
40.0 |
2 |
2 |
hua |
4 |
yry |
NaN |
NaN |
NaN |
3 |
3 |
be |
70 |
old |
NaN |
NaN |
NaN |
不指定键
不指定on则以两个DataFrame的列名交集做为连接键,如下面使用了id与name
设置行索引
这里将name设为索引
indexed_data1=data1.set_index("name")
0 |
0 |
lxh |
20 |
lm |
1 |
1 |
xiao |
40 |
ly |
2 |
2 |
hua |
4 |
yry |
3 |
3 |
be |
70 |
old |
name |
|
|
lxh |
100 |
10 |
xiao |
101 |
40 |
hua2 |
102 |
50 |
使用右边的DataFrame的行索引做为连接键,通过right_index=True设置
pd.merge(data, indexed_data1, left_on='name', right_index=True)
0 |
0 |
lxh |
20 |
lm |
100 |
10 |
1 |
1 |
xiao |
40 |
ly |
101 |
40 |
外连接
左外连接
以左侧表的name列为连接键,向左外连接
0 |
0 |
lxh |
20 |
lm |
1 |
1 |
xiao |
40 |
ly |
2 |
2 |
hua |
4 |
yry |
3 |
3 |
be |
70 |
old |
0 |
100 |
lxh |
10 |
1 |
101 |
xiao |
40 |
2 |
102 |
hua2 |
50 |
pd.merge(data, data1,
on="name",
how="left",
suffixes=('_left','_right'))
0 |
0 |
lxh |
20 |
lm |
100.0 |
10.0 |
1 |
1 |
xiao |
40 |
ly |
101.0 |
40.0 |
2 |
2 |
hua |
4 |
yry |
NaN |
NaN |
3 |
3 |
be |
70 |
old |
NaN |
NaN |
# 交换表的位置
pd.merge(data1,data,on="name",how="left")
0 |
100 |
lxh |
10 |
0.0 |
20.0 |
lm |
1 |
101 |
xiao |
40 |
1.0 |
40.0 |
ly |
2 |
102 |
hua2 |
50 |
NaN |
NaN |
NaN |
右外连接
以右侧表为基准进行外连接,得到的整体数据内容与上面左外连接是一样的。
pd.merge(data,data1,
on="name",
how="right")
0 |
0.0 |
lxh |
20.0 |
lm |
100 |
10 |
1 |
1.0 |
xiao |
40.0 |
ly |
101 |
40 |
2 |
NaN |
hua2 |
NaN |
NaN |
102 |
50 |
全外连接
保留两个表的所有行,无匹配则用NAN
填充
pd.merge(data,data1,
on="name",
how="outer")
0 |
0.0 |
lxh |
20.0 |
lm |
100.0 |
10.0 |
1 |
1.0 |
xiao |
40.0 |
ly |
101.0 |
40.0 |
2 |
2.0 |
hua |
4.0 |
yry |
NaN |
NaN |
3 |
3.0 |
be |
70.0 |
old |
NaN |
NaN |
4 |
NaN |
hua2 |
NaN |
NaN |
102.0 |
50.0 |