Holger Schimanski on 21 Feb 2024 20:52:47
Statistical functions like mean and percentile calculations are very helpful to get an overview about distributions of a numeric attribute and to identify outliers.
Most databases support mean and percentile as SQL aggregations or window functions like SQL Server, HANA, Oracle, and also Spark.
It looks like these functions are currently running in Power BI in a way that require loading of data to Power BI as even in Direct Query mode you get error message that the number of data is too large to be loaded.
But especially in case of larger data volume these functions are useful and therefor should be implement with true Direct Query support.
Most important for me would be true Direct Query support for functions to create information as you would typically see in a box plot.
Percentile cont
Median
With Percentile cont you can calculate 0.25 and 0.75 percentile for Q1 and Q3, calculate the IQR and then calculate thresholds for extreme outlier Q3 + (Q3 - Q1) * 1.5 resp. Q1 - (Q3 - Q1) * 1.5.
Having percentile and Median available would also enable a Box Plot visualization with true Direct Query, which currently is not possible and not available.
Having this as part of the SQL statement of Direct Query is possible, but does not work in all the dynamic elements with Power BI when filtering or changing the group by.