Data Engineering

New

BUG: spark.sql() function breaks successive comment line formatting in notebook

Niels van de Coevering on 06 Jul 2024 04:59:43

The code below is copied from a PySpark notebook cell. When a call to spark.sql() is immediately followed by a comment line, that comment is not formatted green, as you can see with the first comment. The second comment line is not formatted either; formatting only recovers after you add some other line of code. The problem repeats whenever spark.sql() is used again (fourth comment). When the spark.sql() call has a chained method attached (fifth comment) or is followed by a code line (sixth comment), the formatting is correct.


df = spark.sql('SELECT * FROM table_name')

# first comment

# second comment

print('text')

# third comment


df = spark.sql('SELECT * FROM table_name')

# fourth comment


df = spark.sql('SELECT * FROM table_name').where('')

# fifth comment


df = spark.sql('SELECT * FROM table_name')

print('text')

# sixth comment

Comments (1)

Niels van de Coevering on 06 Jul 2024 05:13:07

RE: BUG: spark.sql() function breaks successive comment line formatting in notebook

Here's another code snippet that the spark.sql() function seems to break, although the code is able to run. The code below gives the red squiggly underline errors "no viable alternative at input '\n'" and "extraneous input '.' expecting {, 'assert', 'async'...", yet it still executes without problems. The errors can be suppressed by ending each line with a backslash for line continuation, but that makes the code messy. When the spark.sql() line is commented out or removed, the errors disappear.

df = spark.sql(query)

df1 = (df
        .withColumn("col1", lit(1))
        .select(col('*'))
    )
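One way to check that the squiggles are an editor false positive rather than a real syntax problem is to ask Python's own parser. The sketch below is only an illustration: it uses the standard-library ast module, the helper name same_python is made up for this example, and the backslash version is my guess at what the workaround looks like. The snippets are only parsed, never executed, so spark, query, lit and col do not need to exist.

```python
import ast

# The snippet as posted: method chain wrapped in parentheses.
parenthesized = '''
df = spark.sql(query)
df1 = (df
        .withColumn("col1", lit(1))
        .select(col('*'))
    )
'''

# Assumed form of the workaround: explicit backslash line continuations.
backslashed = '''
df = spark.sql(query)
df1 = df \\
    .withColumn("col1", lit(1)) \\
    .select(col('*'))
'''


def same_python(a: str, b: str) -> bool:
    """Hypothetical helper: True if both sources parse to the same syntax tree."""
    # ast.parse raises SyntaxError on invalid code; ast.dump omits line and
    # column numbers by default, so only the program structure is compared.
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


forms_match = same_python(parenthesized, backslashed)
print(forms_match)
```

If both versions parse and the trees match, the parenthesized form is valid Python and equivalent to the backslash form, which would point at the notebook editor's highlighter rather than at the code.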