
Data Factory

New

Copy Data Activity Should Not Write Data On Source Connection Error


Chris Chiosa on 06 Dec 2024 16:39:55

Issue:

Currently, the Copy Data activity in a Fabric data pipeline writes null data to the destination when the source data connection fails.



Issue Example:

I have a Copy Data activity that copies data from my on-prem SQL Server to a Delta table in my Fabric lakehouse.

The copy data activity uses the overwrite option on the sink side.


If the on-prem SQL Server cannot be reached while the Copy Data activity runs, the sink Delta table is truncated ('overwritten' with null data).



Solution:

If the Copy Data activity fails to connect to the source, the activity should abort and close any connection to the sink without performing any writes.



Solution Example:

I have a Copy Data activity that copies data from my on-prem SQL Server to a Delta table in my Fabric lakehouse.

The copy data activity uses the overwrite option on the sink side.


If the on-prem SQL Server cannot be reached while the Copy Data activity runs, the sink Delta table is not updated. No data is changed.
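To make the requested behavior concrete, here is a minimal Python sketch of "read first, write only on success". It is not how the Copy Data activity is implemented; `copy_with_overwrite`, `unreachable_source`, and the in-memory list standing in for the sink table are all hypothetical names used only for illustration.

```python
from typing import Callable, Iterable, List


def copy_with_overwrite(
    read_source: Callable[[], Iterable[dict]], sink: List[dict]
) -> None:
    """Overwrite `sink` with the source rows, but only after the read succeeds."""
    # Read everything from the source first. If the source is unreachable,
    # this raises and the sink is never touched.
    rows = list(read_source())
    # Only now replace the sink contents (the "overwrite" step).
    sink[:] = rows


def unreachable_source() -> Iterable[dict]:
    # Hypothetical stand-in for a source whose connection fails.
    raise ConnectionError("source not reachable")


sink_table = [{"id": 1, "name": "existing row"}]
try:
    copy_with_overwrite(unreachable_source, sink_table)
except ConnectionError:
    # The failure surfaces to the caller, like an activity failure in a pipeline.
    pass
```

After the failed run, `sink_table` still holds its original row. A later run with a reachable source replaces the contents as usual.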




Why Should This Be Done:

I do not see a use case where you would ever want to default to writing or overwriting data in a sink when your source connection has failed.

If the connection fails and you want to perform additional actions - you should use the success/failure pipeline routes on the copy data activity that failed.


The current state means I need to be proactive to protect my data - I need to validate the source is reachable in a pipeline activity prior to the copy data activity. This adds complexity.
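As a rough illustration of the kind of pre-check this forces on users, the sketch below tests whether a TCP connection to the source can be opened before any copy is started. It assumes the check runs somewhere that can reach the server directly; `source_is_reachable` and the host name in the usage note are hypothetical, and a successful TCP connection does not prove that a login or query would succeed.

```python
import socket


def source_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        # create_connection raises OSError (including timeouts and refused
        # connections) when the endpoint cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A guard such as `if source_is_reachable("my-sql-host", 1433): ...` (1433 being the default SQL Server port) would then decide whether the copy step runs at all. This is exactly the extra moving part the idea is trying to make unnecessary.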


The solution state means I do not need to be proactive to protect my data (if a connection fails the state of my data is maintained). If I want to take actions on connection failures, I can utilize existing functionality to create a pipeline flow from the activity failure status.


The current state is unintuitive, makes for a bad user experience, and defies the industry best practice of data preservation.

Comments (1)

Chris C on 06 Dec 2024 16:47:45

RE: Copy Data Activity Should Not Write Data On Source Connection Error

I believe the current functionality also goes against Microsoft's best practices, but there is no way for a user to opt out of it. Best practices: https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices