Gregory J. Deckler on 21 Apr 2016 00:52:48
Provide the option of not removing duplicates automatically when creating R visualizations, or provide the ability to create R datasets using the same syntax shown in the comments when creating an R visualization.
- Comments (16)
RE: "R" Don't remove duplicates
I have no idea why this feature isn't standard behaviour. It's trivial in R to remove duplicates from a dataset if that behaviour is desired.
RE: "R" Don't remove duplicates
I want to use R to create a histogram. I add one column, and then it removes all the duplicates, which produces a completely inaccurate histogram. This is pretty silly, and potentially problematic if someone uses this without noticing.
Yes, I can add extra columns, or create an ID column, but I don't want to. I don't want the program to remove duplicates, just because it sees fit. There are times when it isn't appropriate - and as the analyst I want that choice.
Also, I want to write the simplest code, and that should involve only one column for a histogram.
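To illustrate the histogram problem described above, here is a minimal sketch in base R. The values are made up and `dataset` is built by hand to stand in for what the visual would pass to the script, so treat it as an illustration rather than what the product actually does internally.

```r
# Hypothetical single-column data with repeated values
dataset <- data.frame(Column = c(1, 1, 1, 2, 2, 3))

# Frequencies a histogram should reflect: one count per occurrence
full_counts <- table(dataset$Column)

# Simulate the automatic de-duplication: repeated rows are dropped
deduped <- unique(dataset)
dedup_counts <- table(deduped$Column)

# Every value now appears exactly once, so the shape of the distribution is lost
print(full_counts)
print(dedup_counts)
```

With the duplicates gone, every bar ends up the same height, which is why a one-column histogram comes out wrong.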
RE: "R" Don't remove duplicates
When linking to an SQL Server Analysis Services database cube, I don't directly have access to the right keys that make records unique to block duplicates from being removed. Therefore, when using cubes it may not be possible to get accurate data in R in some cases. Many statistics/visualizations are worthless when duplicates have been removed. Can someone explain why removing duplicates was ever a good idea?
RE: "R" Don't remove duplicates
I shouldn't have to add an ID or key field to get all the data. If I want to remove duplicates in R, it's trivial with the "unique" statement.
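For anyone wondering how trivial it is, a minimal sketch using base R only. The data frame here is hypothetical and constructed by hand so the snippet runs on its own.

```r
# Hypothetical data with one repeated row
dataset <- data.frame(Column = c("a", "a", "b"))

# Option 1: unique() keeps the first occurrence of each distinct row
deduped <- unique(dataset)

# Option 2: duplicated() flags repeats, which can then be filtered out
# (drop = FALSE keeps the result a data frame even with a single column)
deduped_alt <- dataset[!duplicated(dataset), , drop = FALSE]
```

Either line can be added to a script when de-duplication is actually wanted, which is the argument for leaving the choice to the analyst.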
RE: "R" Don't remove duplicates
The workaround here is to add an ID column to the data and simply not use it in the R script.
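A rough sketch of what that workaround looks like in the script, assuming the extra field is called `ID` and the field of interest is called `Column` (both names are hypothetical). The `dataset` data frame is built by hand here as a stand-in for the one the R visual supplies.

```r
# Stand-in for the data frame supplied to the script; the unique ID
# makes every row distinct, so no rows are collapsed as duplicates
dataset <- data.frame(ID = 1:6, Column = c(1, 1, 1, 2, 2, 3))

# Ignore the ID column and work only with the column of interest
values <- dataset$Column

# All six observations are still present, repeats included
hist(values, main = "Histogram of Column", xlab = "Column")
```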
RE: "R" Don't remove duplicates
The comments of an R visualization show:
#dataset <- data.frame(Column)
However, I cannot use the same syntax to create my own data frame.