Chris Chiosa on 06 Dec 2024 16:39:55
Issue:
Currently, the Copy Data activity inside a Fabric data pipeline writes null data to the destination when the source data connection fails.
Issue Example:
I have a Copy Data activity which copies data from my on-prem SQL Server to a delta table in my Fabric lakehouse.
The Copy Data activity uses the overwrite option on the sink side.
If the on-prem SQL Server cannot be reached while the copy runs, the sink delta table is truncated ('overwritten' with null data).
Solution:
If the Copy Data activity fails to connect to the source, the activity is aborted and any connection to the sink is closed without performing any writes.
Solution Example:
I have a Copy Data activity which copies data from my on-prem SQL Server to a delta table in my Fabric lakehouse.
The Copy Data activity uses the overwrite option on the sink side.
If the on-prem SQL Server cannot be reached while the copy runs, the sink delta table is left unchanged; no data is written or updated.
Why Should This Be Done:
I do not see a use case where you would ever want to default to writing or overwriting data in a sink when the source connection has failed.
If the connection fails and you want to perform additional actions, you should use the success/failure pipeline routes on the Copy Data activity that failed.
The current state means I need to be proactive to protect my data: I have to validate that the source is reachable in a separate pipeline activity before the Copy Data activity, which adds complexity.
The proposed state means I do not need to be proactive to protect my data (if a connection fails, the state of my data is preserved). If I want to take action on connection failures, I can use the existing activity-failure status to drive a pipeline flow.
The current state is unintuitive, a poor user experience, and defies the industry best practice of data preservation.
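To illustrate the proposed behavior, here is a minimal sketch in Python. The function and variable names (safe_overwrite_copy, read_source, write_sink) are hypothetical stand-ins, not actual Fabric or pipeline APIs; the point is simply that the sink is only overwritten after the source read has succeeded, so a source connection failure leaves existing data intact.

```python
def safe_overwrite_copy(read_source, write_sink):
    """Overwrite the sink only after the source read succeeds.

    If read_source raises (e.g. the source is unreachable), the
    exception propagates and write_sink is never called, so the
    sink's existing data is preserved.
    """
    rows = read_source()   # may raise on connection failure
    write_sink(rows)       # overwrite happens only on success
    return len(rows)


# Demo: the sink stands in for the delta table's current contents.
sink = ["existing-row"]

def failing_source():
    raise ConnectionError("on-prem SQL Server unreachable")

def overwrite_sink(rows):
    sink.clear()
    sink.extend(rows)

try:
    safe_overwrite_copy(failing_source, overwrite_sink)
except ConnectionError:
    pass  # failure is surfaced to the pipeline; sink untouched

print(sink)  # → ['existing-row']  (sink data preserved)
```

Under the current behavior, by contrast, the overwrite effectively happens regardless of whether the source read succeeded.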
- Comments (1)
RE: Copy Data Activity Should Not Write Data On Source Connection Error
I believe the current functionality also goes against MS best practices, but there is no way for a user to opt out of it. Best practices: https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices