Clean messy data by providing examples in Excel

While Excel isn’t usually my tool of choice for manipulating or analyzing data (I prefer to use it as a data source for R), it has just learned a new trick that’s likely to prove useful from time to time. Extracting the useful information from complicated or inconsistent formats can often be a pain, but with Excel’s new Transform Data By Example add-in, you can start with a column of messy data, manually provide a few examples of transformations by hand (3 or so usually does the trick — just as many as you need to cover all the variations), and Excel will automatically figure out how to generate similar examples for the rest of the rows.
For example, given just three examples of “Lastname. Firstname” format, the add-in can extract formatted names from a mixture of name formats while discarding titles and suffixes:

 

Excel has had a system like this (called Flash Fill) for a while, which used an automated rule-writing algorithm described in this paper. This new add-in extends the concept by searching not just the space of text transformation, but also the outputs from library of web-based services from Github, .NET and StackOverflow. You can even add your own transformations by publishing them as an Azure Function. The video below shows an example from Python, but I imagine that R-based examples are equally possible.

For more, check out the link below.
Microsoft: Transform Data By Example

from Revolutions http://blog.revolutionanalytics.com/2017/05/clean-messy-data-by-providing-examples-in-excel.html

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s