The Data Janitor’s Guide to Excel

October 24, 2013

In most business settings, Microsoft Excel is the format of choice when data changes hands. Here at OpenView, we send and receive Excel files every day. The spreadsheets may contain financial data from prospective investments, customer or user data from our portfolio companies, or market research requests from our offshore research teams.

How to Be an All-Star Data Janitor

The first thing we do after we receive an Excel file, and the last thing we do before we send one, is to make sure the data is formatted correctly, properly filtered and de-duplicated, and doesn’t have any glaring omissions. This almost always requires a bit of janitorial work.

Nobody loves being a janitor. But if you have to be one, and almost all of us do at some point or another, you might as well use an industrial mop instead of a toothbrush. In Excel, that means knowing not just the basic functions, but creative combinations to extend the limited cleaning functionality native to Excel.

This guide covers the 10 basic Excel functions that any data janitor should know, and an additional 6 useful combinations you can use to catapult you into the world-class, celebrity-rockstar tier of data janitors.

BASIC FUNCTIONS:

1)      IF(,,)

“If” statements are the most basic concept in most programming languages, and Excel is no exception, so we’ll start here. We use this most commonly, to ‘bucket’ a field. If the logical test is true, it returns the first value, if not it returns the second.

2)      FIND(,)

FIND is another building-block formula that we use most commonly in conjunction with other formulas. It returns the location of a character or string within a cell or larger string. If it doesn’t find what you’re looking for, it returns an error.

3)      LEN()

On its own, knowing the length in characters of a string isn’t all that useful. But if you plan on deconstructing or manipulating the string, you will almost certainly need to use the LEN function.

4)      LEFT(,)

LEFT, simply enough, returns some number of characters from the left (or beginning) of a string. Here’s where we start putting FIND to productive use. Instead of merely putting in a number where the function asks for a , we can use a FIND function to find the first instance of some character, say a dash (-). This allows us to only return the part of the string before the dash. In our example, that’s where the “Industry” ends and the “Sub-industry” begins, so we’re able to effectively split one field into two.

5)      RIGHT (,)

Right returns X number of characters from the right (or end) of a string. Unlike with LEFT, however, we can’t use it directly with FIND to get the “Sub Industry”, since that function counts from the left of the string. We want to count from the right of the string.

Therefore, we need to use RIGHT in combination with LEN and FIND to grab the proper number of characters off of the end of the string.

6)      ISNUMBER()

ISNUMBER simply returns “TRUE” if the string is a number and “FALSE” if it isn’t. This allows us to turn the FIND function into a binary test rather than a number or error. To do this, use the function within a function “ISNUMBER(FIND(,)”.

7)      SUBSTITUTE(,,)

Like the “find and replace” function in all MS Office products, SUBSTITUTE takes a string, searches for a phrase or character, and substitutes another phrase or character. To remove a phrase or character altogether, you can use a blank string (“”) as your input.

8)      LOWER()

LOWER simply takes text and converts it into all lower case. Also works with UPPER and PROPER, which converts text to proper case. Since many functions in Excel are case-sensitive (i.e. the IF function), you can eliminate the case-sensitivity by making both sides of the equation lower.

9)      VLOOKUP(,

,,)

VLOOKUP is one of the more useful functions in excel, but is the most difficult one to grasp in this list. I won’t walk you through how to do one here, as there are many other resources that explain it in detail. However, I will explain that its used most commonly to merge data from two data sets, or to deduplicate a ‘new’ list against an ‘old’ one.

10)   N()

This is a simple one: it converts TRUE or FALSE into 1 or 0. Numbers are more useful than TRUE and FALSE because you can take averages or sums to quickly summarize the data.

 USEFUL COMBINATIONS:

11)   N(ISNUMBER(FIND(,LOWER()))

Description: This formula tells you if the is all ­or part of a cell, and returns a 1 if it is. The “Lower” part makes it case in-sensitive, as long as your find text is all lower.

Use: If you have a text field and want to find all of the entries that contain a specific phrase, you can populate an adjacent field with this formula and sort largest to smallest.

12)   LEN()-LEN(SUBSTITUTE(,,””))

Description: This formula tells you the number of characters in a given cell.

Use: Most commonly with structured strings like URLs, where I expect there to be two “/”s in the “http://” portion, but want to exclude “/” thereafter.

13)   LEFT(,FIND(,))

Description: Returns the text to the left of the delimiter.

Use: Any time you need to carve off the beginning of a longer string: extracting a first name, getting a street number from a full address, etc.

14)   RIGHT(,LEN()- FIND(,))

Description: Returns the text to the right of the delimiter.

Use: Usually when I use the “Left” function above to extract the beginning of the string, I’ll also want to extract the end in a separate cell.

15)   IF(RIGHT(,1)=“ ”,LEFT(,LEN()-1),)

Description: Removes the last character, but only if it’s a blank space.

Use: Often, messy data will have trailing spaces that are invisible to the naked eye, which cause problems for de-duplication. This clears the problem. You may have to run this multiple times as there might be multiple trailing spaces.

16)   N(PROPER()=)

Description: Returns a 1 if the is in proper case.

Use: Often, data comes back in all caps or all lower case. Because converting everything proper case isn’t always the best solution, this function will flag all cells that are already in proper case so that you don’t have to review them.

Did we miss any Excel trade secrets? Let us know in the comments section below!

Behavioral Data Analyst

Nick is a Behavioral Data Analyst at <a href="https://www.betterment.com/">Betterment</a>. Previously he analyzed OpenView portfolio companies and their target markets to help them focus on opportunities for profitable growth.