When does class attribute initialization code run in Python?
Problem Description:
There is a class attribute spark in our AnalyticsWriter class:
class AnalyticsWriter:
    spark = SparkSession.getActiveSession()  # this is not getting executed
I noticed that this code is not executed before a certain class method is run. Note: I have verified that there is already an active SparkSession available in the process, so the initialization code is simply not being executed.
    @classmethod
    def measure_upsert(cls) -> DeltaTable:
        assert AnalyticsWriter.spark, "AnalyticsWriter requires an active SparkSession"
I come from JVM-land (Java/Scala), where class-level initialization code runs before any method invocation. What is the equivalent in Python?
Solution – 1
Class attributes are initialized the moment the interpreter reaches them while executing the class body, so the line containing the getActiveSession() call is run before the class is even fully defined.
class AnalyticsWriter:
    spark = SparkSession.getActiveSession()
    # The code has been run here
    # ... other definitions that occur after spark exists ...
# class is complete here
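To see the ordering without Spark in the picture, here is a minimal sketch. The get_active_session function and the events list are hypothetical stand-ins introduced only for illustration; they are not part of PySpark.

```python
events = []


def get_active_session():
    # Hypothetical stand-in for SparkSession.getActiveSession().
    events.append("initializer ran")
    return "session"


events.append("before class statement")


class AnalyticsWriter:
    # The class body is ordinary code, executed top to bottom right now.
    spark = get_active_session()
    events.append("still inside class body")

    @classmethod
    def measure_upsert(cls):
        events.append("method ran")
        return cls.spark


events.append("class statement finished")
AnalyticsWriter.measure_upsert()

for event in events:
    print(event)
```

The initializer is recorded before the class statement finishes, and well before the class method is called.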
I suspect the code is doing something, just not what you expect. You can confirm that it is in fact run with a cheesy hack like:
class AnalyticsWriter:
    spark = (SparkSession.getActiveSession(), print("getActiveSession called", flush=True))[0]
which just builds a tuple of the result of your call and the result of an eager print, then keeps element [0] and discards the meaningless None from the print; you should see the output from the print immediately, before you can get around to calling class methods.
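One possible explanation, which the question does not confirm, is that the attribute was evaluated while no session was active yet (for example at import time) and simply captured None. The sketch below illustrates that scenario with a hypothetical _active variable and get_active_session function standing in for Spark's active-session lookup; it also shows deferring the lookup to call time as one way around it.

```python
_active = None  # hypothetical stand-in for "the currently active session"


def get_active_session():
    # Hypothetical stand-in for SparkSession.getActiveSession().
    return _active


class EagerWriter:
    # Evaluated once, while the class body executes; captures None here.
    spark = get_active_session()


class LazyWriter:
    @classmethod
    def session(cls):
        # Evaluated on every call, so it sees the session created later.
        return get_active_session()


# The session only comes into existence after both classes were defined.
_active = "session"

print(EagerWriter.spark)
print(LazyWriter.session())
```

If this is what is happening, the class attribute did run; it just ran too early to see the session, whereas the lazy lookup picks it up at call time.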