Perform dataframe join before another one

Perform dataframe join before another one

Problem Description:

What would be nice and fast way to perform a dataframe join between 3 tables?
Also I need to have duplicate columns in my dataframe.

   SELECT
      tbl_1.id,
      tbl_2.id,
      tbl_3.id 
    FROM tbl_1
    JOIN tbl_2
      ON tbl_1.id = tbl_2.id 
    JOIN tbl_3
      ON tbl_3.id = tbl_2.id ;

Solution – 1

First thing: if you want to have all duplicated columns in the output DataFrame, you should prefix them with something, so that they can be differentiated somehow. Example:

df = spark.createDataFrame([(1, 2), (1, 1)], "id: int, val: int")
df.show()
+---+---+
| id|val|
+---+---+
|  1|  2|
|  1|  1|
+---+---+

from pyspark.sql.functions import col

df.select([col(c).alias("df1_" + c) for c in df.columns]).show()
+------+-------+
|df1_id|df1_val|
+------+-------+
|     1|      2|
|     1|      1|
+------+-------+

Second thing: if the SQL syntax is not available for any reason, then you will have to join the DataFrames one by one – the join method takes only one DataFrame as right side of join, so you can chain them for example. Snippet:

df1 = spark.createDataFrame([(1, 1), (1, 4)], "df1_id: int, df1_val: int")
df2 = spark.createDataFrame([(1, 2), (1, 5)], "df2_id: int, df2_val: int")
df3 = spark.createDataFrame([(1, 3), (1, 6)], "df3_id: int, df3_val: int")

df1.join(df2, df1.df1_id == df2.df2_id, "inner").join(df3, df1.df1_id == df3.df3_id, "inner").show()
+------+-------+------+-------+------+-------+
|df1_id|df1_val|df2_id|df2_val|df3_id|df3_val|
+------+-------+------+-------+------+-------+
|     1|      4|     1|      5|     1|      3|
|     1|      4|     1|      2|     1|      3|
|     1|      1|     1|      5|     1|      3|
|     1|      1|     1|      2|     1|      3|
|     1|      4|     1|      5|     1|      6|
|     1|      4|     1|      2|     1|      6|
|     1|      1|     1|      5|     1|      6|
|     1|      1|     1|      2|     1|      6|
+------+-------+------+-------+------+-------+

Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject