When does class attribute initialization code run in Python?

Problem Description:

There is a class attribute spark in our AnalyticsWriter class:

class AnalyticsWriter:

    spark = SparkSession.getActiveSession()  # this is not getting executed

I noticed that this code is not executed before a certain class method runs. Note: I have verified that an active SparkSession already exists in the process, so the initialization code is simply not being executed.

    @classmethod
    def measure_upsert(
        cls
    ) -> DeltaTable:

        assert AnalyticsWriter.spark, "AnalyticsWriter requires an active SparkSession"

I come from JVM-land (Java/Scala), where class-level initialization code runs before any method invocation. What is the equivalent in Python?

Solution – 1

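Class attributes are initialized as soon as execution reaches them while the class body runs. The class body is executed once, when the class statement itself is executed (typically when the module is imported), so the line containing the getActiveSession() call runs before the class is even fully defined.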

class AnalyticsWriter:
    spark = SparkSession.getActiveSession()
    # The code has been run here
    
    # ... other definitions that occur after spark exists ...
# class is complete here
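To make the ordering concrete, here is a minimal sketch that records when each step happens. It does not use Spark at all: get_active_session is a hypothetical stand-in for SparkSession.getActiveSession(), and the events list exists only for illustration.

```python
events = []  # illustration only: records the order in which things run


def get_active_session():
    # Hypothetical stand-in for SparkSession.getActiveSession()
    events.append("initializer ran")
    return "session"


events.append("before class statement")


class AnalyticsWriter:
    # Runs while the class body executes, i.e. during the class statement
    spark = get_active_session()

    @classmethod
    def measure_upsert(cls):
        events.append("method ran")
        return cls.spark


events.append("after class statement")
result = AnalyticsWriter.measure_upsert()
print(events)
```

The initializer entry lands between the "before" and "after" markers, and well ahead of the method call.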

I suspect the code is doing something, just not what you expect. You can confirm that it is in fact run with a cheesy hack like:

class AnalyticsWriter:
    spark = (SparkSession.getActiveSession(), print("getActiveSession called", flush=True))[0]

This just builds a tuple of your call's result and an eager print, then keeps only the first element, discarding the meaningless None returned by print. You should see the output immediately, as soon as the class is defined and before you can get around to calling any class methods.
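One possible explanation, offered as a guess rather than a diagnosis: if the module defining the class is imported before the session is created, the class attribute captures whatever the lookup returned at that moment (None when no session is active) and is never refreshed. The sketch below illustrates this with a hypothetical stand-in instead of Spark; _active, get_active_session, EagerWriter, LazyWriter and get_spark are all made-up names for the example.

```python
_active = None  # hypothetical stand-in for Spark's "active session" state


def get_active_session():
    # Hypothetical stand-in for SparkSession.getActiveSession()
    return _active


class EagerWriter:
    # Evaluated once, when the class is defined; no session exists yet
    spark = get_active_session()


class LazyWriter:
    @classmethod
    def get_spark(cls):
        # Evaluated on every call, so it sees a session created later
        session = get_active_session()
        assert session is not None, "LazyWriter requires an active session"
        return session


_active = "session"  # the session is created after the classes are defined

print(EagerWriter.spark)
print(LazyWriter.get_spark())
```

If this matches your situation, looking the session up inside the method, rather than storing it as a class attribute, avoids the stale value.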
