Data Engineering
BUG: spark.sql() function breaks successive comment line formatting in notebook
Niels van de Coevering on 06 Jul 2024 04:59:43
The code below is copied from a PySpark notebook cell. When a line that calls the spark.sql() function is immediately followed by a comment line, the comment line is not formatted green, as you can see with the first comment. The second comment line is not formatted either; formatting only recovers once you add some other line of code. The problem repeats itself whenever the spark.sql() function is used again (fourth comment). When the spark.sql() call is followed by a chained method (fifth comment) or by a line of code (sixth comment), the formatting is correct.
df = spark.sql('SELECT * FROM table_name')
# first comment
# second comment
print('text')
# third comment
df = spark.sql('SELECT * FROM table_name')
# fourth comment
df = spark.sql('SELECT * FROM table_name').where('')
# fifth comment
df = spark.sql('SELECT * FROM table_name')
print('text')
# sixth comment
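To help narrow down where the problem sits, here is a minimal sketch that runs the same cell text through Python's own tokenizer from the standard library. It only shows what the language itself considers a comment; it says nothing about how the notebook editor's highlighter works internally, and the names CELL and comment_tokens are made up for this illustration.

```python
import io
import tokenize

# The cell from the report, copied as text so that no Spark session is needed.
CELL = """\
df = spark.sql('SELECT * FROM table_name')
# first comment
# second comment
print('text')
# third comment
df = spark.sql('SELECT * FROM table_name')
# fourth comment
df = spark.sql('SELECT * FROM table_name').where('')
# fifth comment
df = spark.sql('SELECT * FROM table_name')
print('text')
# sixth comment
"""


def comment_tokens(source):
    """Return (line_number, text) for every COMMENT token in the source text."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tok.start[0], tok.string) for tok in tokens
            if tok.type == tokenize.COMMENT]


comments = comment_tokens(CELL)
for line_no, text in comments:
    print(line_no, text)
```

If all six comment lines are listed, Python itself treats them as ordinary comments, which would suggest the issue is limited to the editor's syntax highlighting.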
- Comments (1)
RE: BUG: spark.sql() function breaks successive comment line formatting in notebook
Here's another code snippet that the spark.sql() function seems to break, although the code is able to run. The code below gives the red squiggly underline errors "no viable alternative at input '\n'" and "extraneous input '.' expecting {, 'assert', 'async'...", but it still runs. The errors can be avoided by ending each line with a backslash for line continuation, but that makes the code messy. When the spark.sql() line is commented out or removed, the errors disappear.
df = spark.sql(query)
df1 = (df
    .withColumn("col1", lit(1))
    .select(col('*'))
)
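As a rough check that the squiggles are false positives, the sketch below parses the parenthesized form and the backslash form with Python's standard ast module and compares the resulting syntax trees. The line breaks in both variants are an assumption about how the snippet was laid out in the notebook, and the names PARENTHESIZED, BACKSLASHED and same_program exist only for this illustration. Nothing is executed, so spark, query, lit and col do not need to be defined.

```python
import ast

# Parenthesized form, as in the comment above (line breaks assumed).
PARENTHESIZED = """\
df = spark.sql(query)
df1 = (df
    .withColumn("col1", lit(1))
    .select(col('*'))
)
"""

# Workaround form with explicit backslash line continuation.
BACKSLASHED = """\
df = spark.sql(query)
df1 = df \\
    .withColumn("col1", lit(1)) \\
    .select(col('*'))
"""


def same_program(a, b):
    """True if both sources parse to the same AST (positions are ignored)."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


print(same_program(PARENTHESIZED, BACKSLASHED))
```

Both variants parse without a SyntaxError and produce the same tree, which matches the observation that the code runs despite the underlined errors.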