Skip to main content

Power BI

New

PowerQuery Fuzzy Matching vs. Excel Fuzzy Lookup add-in: different comparison algorithms

Vote (2) Share
K Viccaji's profile image

K Viccaji on 13 Jan 2023 06:38:26

Background.

I have recently started learning and working on PowerQuery. A few weeks ago (before I started working with PowerQuery), I worked on a project where I needed to do dozens of fuzzy lookups on multiple columns of data (with partial matches) in more than 20 different files.


I used the Excel Fuzzy Lookup Add-in for that purpose.

The add-in worked perfectly for me, except that it involves a lot of manual work while setting up the various parameters (source data tables, columns to match, similarity threshold, etc.).



PowerQuery could be the solution for me!

I was quite excited when I saw that PowerQuery has a Fuzzy Matching feature while merging tables.

It had the potential to save many hours of manual work for me!


After reading and practicing the various features explained in a Skillwave Training e-book and various online tutorials, I tried doing the fuzzy matching using PowerQuery's MERGE query feature.

I performed the merge in PowerQuery with the same settings as I had used in the add-in. 

(1) Similarity threshold >= 0.75 

(2) Max number of matches = 1.



But...

To my surprise (and disappointment), PowerQuery's fuzzy matching option did not work as 'efficiently' as the Excel add-in!

The results of fuzzy matching in the Excel add-in were excellent. But the results of the PowerQuery fuzzy matching fell short by a fair bit!

I found quite a few false positives and false negatives when I compared the fuzzy matching results between the two programs.


I am very keen to use PowerQuery in such cases because the potential savings in time and efforts are huge!

But the apparently lower / lesser accuracy makes it a deal-breaker for me... :-(



Reaching out for help.

I wrote to Ken Puls at Skillwave, and he responded quite helpfully with a few thoughts on why this could be happening.

He also got in touch with someone at Microsoft, who responded by suggesting that I write about this issue on this forum, "so we can gauge the demand for it and prioritize it accordingly."



I would love to find out whether there are others who are facing similar issues, and hopefully we can ask Microsoft to resolve this issue!



~KV

India

Comments (1)
K Viccaji's profile image Profile Picture

Chris B on 24 Feb 2023 23:41:27

RE: PowerQuery Fuzzy Matching vs. Excel Fuzzy Lookup add-in: different comparison algorithms

I should have searched, but I also think my ask is a bit more fleshed out. https://ideas.powerbi.com/ideas/idea/?ideaid=a37dac1d-9cb4-ed11-9ac4-281878deecacI'll repeat it here, since this one has more votes and thus probably more visibility at microsoft:At a minimum I think it would be obvious to provide a version of the matching functions that take a parameter `matchFunction(str,str)=>(num between [0,1])`. Then if our friendly neighborhood Microsoft devs are feeling generous, the library could provide also provide functions Match.LevenshteinEditD (what I really want), Match.JaccardIndex (the current option), and whatever other similarity functions, all as first class library functions.This change could even allow for extending fuzzy matching to additional kinds of data by providing matchers that work on them in the future (I'm picturing bitmap data, or audio/signal clips).None of this would break current code, but it would expand the utility of fuzzy matching in Power Query significantly. I am aware that one can rig something together with the comparer version of Table.Group and/or other functions--along with liberal application of custom M code--but this would enable non-M savvy users to do so as well.