If you’ve been watching the Power BI space for some time, I’m sure you’ve seen many offers to go to a seminar or watch a webinar to show you how easy it is to build a “Dashboard in a Day” or even a “Dashboard in an Hour” with Power BI. I’ve seen many of these presentations, and they are fantastic! However, you don’t always get the opportunity to play with the data while you’re watching the presentation. And invariably, it will be a few months before you need to build another dashboard, and you will have forgotten where exactly each of the settings are located in order to find what you need.
So today, I’ve written a “Dashboard in a Blog – or Two,” (since the blog got to be pretty long for a single blog post). Why? So that you can take your time working through the example, and then use it as a reference in the future when you need it.
Power BI is Constantly Changing
One note before we begin: Power BI is constantly changing. The screenshots and instructions here are up to date as of this writing (June 2017), but things may change as updates are released to the platform. The general instructions should remain similar.
Getting Started
First, you need Power BI. This can be downloaded and installed from http://www.powerbi.com. You will need to sign up, but the free options should be sufficient.
Next, we need data. A dashboard is nothing without data. There is a ton of free data available at http://www.data.gov provided by various government agencies. Since we finished tax season not long ago, I thought a sample based on IRS tax return data would be interesting. The data is available here (or you can jump here to get directly to the download page). For the purposes of this blog, I used 2014 data (it was the most recent available as of writing this blog). I downloaded the zip file, and within the zip was a large number of files, including Excel files per state, and a couple of CSV files that either include or exclude Adjusted Gross Income (AGI) data. There’s also a Word document that, among other things, explains what all of the columns mean. For this demonstration and because the data is easier to work with, I will be using the “No AGI” version:
Now that we have some critical pieces in place, let’s begin. Launch the Power BI application on your computer, if you did not already, and you’ll see this launch screen:
On the left you will see a “Get Data” option. This is where we begin. Once you click “Get Data,” you will see a litany of ways you can pull data into Power BI. For this demonstration, select “Text/CSV” and click “Connect” at the bottom.
In the file picker, go find the file you downloaded and unzipped earlier. As previously noted, we will be using the file named “14zpallnoagi.csv” from the zip file. Once you click “Open,” you will be brought to a window to start the process of shaping your data.
Shaping Your Data
“Shaping your data” refers to the process by which you ensure your data is cleaned up and organized in a way that can be used by Power BI. In this case, we used a pretty clean dataset, but there are a few things we can and should do.
If you select “Load,” you are saying that all the data looks exactly the way you expect it to, and we should place it in the model just like this. Fear not. If you accidentally select “Load” before you meant to, you still have the opportunity to change things later. However, if you select “Edit,” you will have the opportunity to change things before they are loaded. Power BI keeps track of what you change and will apply them each time you reload the file, for example, if we received new 2014 data and wanted to update our dashboard.
Query Editor
Let’s click “Edit” and take a look at the Query Editor.
If you look closely at the columns, you can see icons representing the datatypes. For example, STATE has an “ABC” next to it, Zip Code has a “123” next to it, and N1 has a “1.2” next to it. That means State is text, Zip Code is a whole number, and N1 is a fixed decimal number. If any of these are incorrect, this is the time to fix them.
I typically end up addressing the Zip Code. Almost every automated tool represents Zip Code as a whole number. It is made up of numbers, however things get tricky when dealing with Zip Codes that start with a zero. In those cases, a numeric column would drop that leading 0, and you’d have a short Zip Code. Without a proper Zip Code, things will not work correctly if we try to tie this to a different dataset, or utilize mapping functions. A numeric Zip Code column also becomes a challenge when working internationally—if the sample data only has U.S. information, for example, but there is also Canadian data, you will definitely need a text Zip Code column. So let’s convert this to text.
Simply click the column header, then click the “Data Type” dropdown and select “Text”.
Another window pops up this time:
In this case, the tool already changed the data type once. It brought it in as text (from the CSV file), then converted it to a whole number. If we use the “Add new step” option, it would add another conversion—now that it’s a number, it would revert back to text and we would lose the leading zeros. However, if we “Replace current,” it will change to how it converted before. Let’s click “Replace current”.
Now is a good time to talk about the “Applied Steps” window on the right. This is what keeps track of all the changes that have been done since the document was brought in.
As you click on each line, you can see how the data looked after that step completed. For example, if I click “Source”:
Now you see none of the columns have names, and the data types are all text.
If you click the gear on the right of one of the lines, you get options that allow you to change what that step did. If you click on one of the steps, and then do something else (like rename a column or change a data type), it will add that step into the process. You have to be careful here, it could break downstream steps (if you change a column name, for example, that old name won’t exist to change a data type later). As you hover over each step, the “X” appears if you’d like to delete that step. Go ahead and click on the last step (“Changed Type”) before we continue.
If you remember, that zip file we downloaded also included a Word document. That document has explanations of each column, and what kind of data we should expect. Do you know what kind of data is in the STATEFIPS column? Neither do I. Let’s open up that Word document and see what we have.
To me, it looks like that column is going to have a number that represents each state. Will we need that to connect to another system? Possibly. Can we just use the State column? Probably. Let’s eliminate that STATEFIPS column.
Simply click on the column, then select “Remove Columns” at the top:
Once you do that, the column is gone, and you’ll notice another step has been added to the “Applied Steps” list. What else can we do? How about the AGI_STUB column. If you remember, we are using the version that does not include AGI (adjusted gross income), and all of the example lines show a zero. From the documentation, there’s no value for zero. I think we can reasonably assume that it will be zeros throughout the dataset, so we can remove that column too.
So, what’s the N1 column? From the documentation, it says it’s “Number of returns.” Perhaps we should give it a more friendly name. For this one, you can select the column, then right-click and select “Rename” (or you can go to the “Transform” tab and select “Rename”).
One key of Power BI is to use “friendly” names. If you’re a developer, you may be tempted to call this “NoOfReturns” or “ReturnCount.” This is intended to be easily decipherable by end users. Spaces are just fine here. Let’s call it “Number of Returns”.
Now, if you keep going down the line, you will see a LOT of columns, and most of them have non-descriptive names (what exactly is MARS1 anyway?). Go through the documentation, and give these columns some logical, friendly names. You can do all of them if you like, but I’m going to do these:
Original Name | New Name |
MARS1 | Number of Single Returns |
MARS2 | Number of Joint Returns |
MARS4 | Number of Head of Household Returns |
PREP | Number of Paid Preparer Returns |
N2 | Number of Exemptions |
NUMDEP | Number of Dependents |
TOTAL_VITA | Number of Volunteer Prepared Returns |
A00100 | Adjusted Gross Income |
A02650 | Total Income |
A06500 | Income Tax (close to the end) |
A11902 | Overpayments Refunded (last column) |
Whatever columns you don’t rename, let’s go ahead and delete. You can select multiple columns by selecting the first column header you want, holding the “Shift” key, then selecting the last column header you want. You can then remove columns in larger chunks.
Let’s rename the first couple of columns (STATE and ZIPCODE) so they look more like the rest of them (State and Zip Code).
We can rename the dataset too. On the right, you’ll see a “Name” spot above the “Applied Steps.” Let’s call this “2014 Tax Data”.
Now we’ve finally got the data cleaned up and shaped. There are a lot of other things you can do with the data to clean it up, but these are the basic ones that you will see very often.
Let’s hit “Close and Apply” at the top. This will tell Power BI that you’re ready to bring in all of the data, exactly the way you want it.
Now, Power BI will load your data. You’ll be brought back to a blank screen—this is your dashboard. Before you do anything else, you should really save! We’ve done a lot of work, and it would be a shame to lose it if the application crashed. Just click the little “Save” disk at the top of the window and save it in a logical place that you’ll be able to find again.
Now that we have our model created and saved, let’s take a break! You’ve accomplished a lot. In part two of this blog series, I’ll address creating your first visualization!