MySQL data import: tinyint incorrectly interpreted as bit. Bug?

Nico Timmerman on 10 Oct 2024 00:56:16

Hello,


Today I ran into an issue that looks like a bug to me.

I am using a notebook to read from an external MySQL database. One of the tables that I import into the bronze layer has a tinyint column in the source table. However, when I read this table, the column is incorrectly interpreted as a bit, which results in incorrect data in my bronze layer: the source table contains ones and twos, but in the bronze layer every value is converted to 1.

Any clue on how to fix this?


Here's a snippet of the code that builds the JDBC connection:

    url = f"jdbc:mysql://{datasource}?enabledTLSProtocols=TLSv1.2&serverTimezone=UTC"
    query = getQuery(tableInfo, table_name)
    return spark.read.format("jdbc").options(
        url=url,
        driver="com.mysql.cj.jdbc.Driver",
        query=query,


The problem is that the queries are generated dynamically, so I can't easily force the schema to a certain data type.


Thanks
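
A possible explanation, offered as an assumption and not a confirmed diagnosis of this table: MySQL Connector/J has a connection property `tinyInt1isBit`, which defaults to true and makes the driver report TINYINT(1) columns as BIT. Spark then reads such a column as a boolean, so any non-zero value (1 or 2) would end up as 1. If the source column is declared as TINYINT(1), setting `tinyInt1isBit=false` in the JDBC URL may avoid the conversion without touching the dynamically generated queries. The sketch below only builds the URL; the helper name `build_mysql_jdbc_url` and the example host are hypothetical and not part of the original code.

```python
def build_mysql_jdbc_url(datasource: str, tiny_int1_is_bit: bool = False) -> str:
    """Build the JDBC URL from the post, plus Connector/J's tinyInt1isBit flag.

    Assumption: the affected column is TINYINT(1), which Connector/J reports
    as BIT unless tinyInt1isBit is set to false.
    """
    params = {
        "enabledTLSProtocols": "TLSv1.2",   # taken from the original URL
        "serverTimezone": "UTC",            # taken from the original URL
        "tinyInt1isBit": "true" if tiny_int1_is_bit else "false",
    }
    # Join the properties in insertion order into a query string.
    query_string = "&".join(f"{key}={value}" for key, value in params.items())
    return f"jdbc:mysql://{datasource}?{query_string}"


# Hypothetical datasource value, for illustration only.
url = build_mysql_jdbc_url("myhost:3306/mydb")
print(url)
```

The resulting `url` could be passed to `spark.read.format("jdbc").options(url=url, ...)` in place of the hand-written f-string. It is worth checking the schema of the resulting DataFrame afterwards to confirm that the column now arrives as an integer type.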