Ben Gribaudo on 18 Nov 2022 17:32:12
A challenge with the current Power Query PostgreSQL data connector is that it does not understand how to work with PostgreSQL’s complex typed columns.
Background
Scenario: PostgreSQL allows a single column to be defined as containing a specific complex type.
Example: Column phone_number could be defined to as being of type telephone_number (which has fields “areacode”, “number” and “extension”).
M Equivalent: Column contains a record.
Scenario: A single column can also be defined to contain an array of values of a specific scalar type.
Example: Column visit_dates could be defined as an array of date values.
M Equivalent: Column contains a list.
Scenario: A column can be configured to contain an array of a given complex type.
Example: Column order_lines could be defined as an array of order_line values.
M Equivalent: Column contains a nested table.
Challenge
Power Query does not understand any of these more advanced column set ups. Depending on the exact scenario, when PQ encounters one of the above, it either renders out the raw textual equivalent of the entire column (like “{}
”) or an error (such as a complaint that “We don’t support CLR type ‘System.Dynamic.ExpandoObject’.”—for a picture showing both of these, see https://bengribaudo.com/blog/2022/11/18/7073/power-query-postgresql-data-connector-feature-request).
The net effect is that, for a table/view where these more complex structures are used, hand-written SQL (e.g. Value.NativeQuery
) may be necessary as Power Query may be unable to make “sense” out of the relevant columns on its own. This makes the level of effort involved with ingesting this data into PQ much higher.
Feature Request
It would be really nice if PQ had improved support in this area. As noted above, each of the three scenarios mentioned maps directly to data types built into Power Query (record of given type, list of given type or a nested table). It would be really nice if the PQ connector could represent these that way, and then have its query folding logic generate the necessary SQL* based on whether the “complex” columns’ values are used/expanded out.
(*PostgreSQL contains SQL syntax for outputting the various child components of a complex type. For example, it has syntax for converting an array of values into separate rows, etc.)