python - Combining two dataframes based on two columns values in pandas - TagMerge
5Combining two dataframes based on two columns values in pandasCombining two dataframes based on two columns values in pandas

Combining two dataframes based on two columns values in pandas

Asked 1 years ago
1
5 answers

Use pandas.merge The on argument defines what column you want to merge the dataframes on and the how keyword defines what type of merge you want. Please look at the documentation to confirm what type of merge you want. But I think in this case you want the outer merge.

print(pd.merge(df1, df2, on='col2',how='outer'))

Source: link

0

If you want to base your conditions for both the values of 'col1' and 'col2', this minor adjustment would certainly help.

data = pd.merge(df1, df2, on=['col1',col2'],how='outer')

Source: link

0

In [1]: df1 = pd.DataFrame(
   ...:     {
   ...:         "A": ["A0", "A1", "A2", "A3"],
   ...:         "B": ["B0", "B1", "B2", "B3"],
   ...:         "C": ["C0", "C1", "C2", "C3"],
   ...:         "D": ["D0", "D1", "D2", "D3"],
   ...:     },
   ...:     index=[0, 1, 2, 3],
   ...: )
   ...: 

In [2]: df2 = pd.DataFrame(
   ...:     {
   ...:         "A": ["A4", "A5", "A6", "A7"],
   ...:         "B": ["B4", "B5", "B6", "B7"],
   ...:         "C": ["C4", "C5", "C6", "C7"],
   ...:         "D": ["D4", "D5", "D6", "D7"],
   ...:     },
   ...:     index=[4, 5, 6, 7],
   ...: )
   ...: 

In [3]: df3 = pd.DataFrame(
   ...:     {
   ...:         "A": ["A8", "A9", "A10", "A11"],
   ...:         "B": ["B8", "B9", "B10", "B11"],
   ...:         "C": ["C8", "C9", "C10", "C11"],
   ...:         "D": ["D8", "D9", "D10", "D11"],
   ...:     },
   ...:     index=[8, 9, 10, 11],
   ...: )
   ...: 

In [4]: frames = [df1, df2, df3]

In [5]: result = pd.concat(frames)
pd.concat(
    objs,
    axis=0,
    join="outer",
    ignore_index=False,
    keys=None,
    levels=None,
    names=None,
    verify_integrity=False,
    copy=True,
)
In [6]: result = pd.concat(frames, keys=["x", "y", "z"])

Source: link

0

DF1
date    hours        var1            var2 
0   2013-07-10  00:00:00    150.322617  52.225920   
1   2013-07-10  01:00:00    155.250917  53.365296   
2   2013-07-10  02:00:00    124.918667  51.158249   
3   2013-07-10  03:00:00    143.839217  53.138251
 .....  
9   2013-09-10  09:00:00    148.135818  86.676341
10  2013-09-10  10:00:00    147.833517  53.658016   
11  2013-09-10  12:00:00    149.580233  69.745368   
12  2013-09-10  13:00:00    163.715317  14.524894   
13  2013-09-10  14:00:00    168.856650  10.762779
DF2
date      hours      myvar1        myvar2 
0   2013-07-10  09:00:00    1.617         98.56 
1   2013-07-10  10:00:00    2.917         23.60 
2   2013-07-10  12:00:00    19.667        36.15 
3   2013-07-10  13:00:00    14.217        45.16
 .....  
20 2013-09-10   20:00:00    1.517         53.56 
21 2013-09-10   21:00:00    5.233         69.47
22 2013-09-10   22:00:00    13.717        14.25
23 2013-09-10   23:00:00    18.850        10.69
As you can see in both DataFrames, DF2 starts with 09:00:00 and I want to join with DF1 09:00:00, which is basically the matchind dates and times. So far, I tried many different combination using previous threads and the documentation mentioned above. An example,
merged_df = DF2.merge(DF1, how = 'left', on = ['date', 'hours'])
This was introduces NAN values for right right DataFrame. I know, I do not have to use both date and hours columns, however, still getting the same result. I tried R quick like this, which works perfectly fine.
merged_df  <- left_join(DF1, DF2, by = 'date')
Use how='inner' in pd.merge:
merged_df = DF2.merge(DF1, how = 'inner', on = ['date', 'hours'])

Source: link

0

I am trying to join two pandas data frames using two columns:
new_df = pd.merge(A_df, B_df,  how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')
but got the following error:
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()

KeyError: '[B_1, c2]'
Try this
new_df = pd.merge(A_df, B_df,  how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])
the problem here is that by using the apostrophes you are setting the value being passed to be a string, when in fact, as @Shijo stated from the documentation, the function is expecting a label or list, but not a string! If the list contains each of the name of the columns beings passed for both the left and right dataframe, then each column-name must individually be within apostrophes. With what has been stated, we can understand why this is inccorect:
new_df = pd.merge(A_df, B_df,  how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')
And this is the correct way of using the function:
new_df = pd.merge(A_df, B_df,  how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])

Source: link

Recent Questions on python

    Programming Languages