I recently read a post on the Power BI blog about a new enhancement to the scheduled cache refresh process with DirectQuery. The post does a great job of explaining how to implement it, but I wanted to write about why you may want to consider this for your environment.
As you have probably noticed, there are two major methods of getting data from your source system into Power BI: Import and DirectQuery. In the “import” mode, you load data into a Power BI model, and relate the data together in the model, all from within Power BI Desktop. You then schedule when you want the data to refresh (if you are using the Pro version of Power BI), and the visualizations will all use this model when the user requests data (by changing the value of a slicer, loading the dashboard, etc.). This method quickly refreshes the visualization because all the data is stored in the model, and is in the same place as the dashboard (in Power BI). When you choose “DirectQuery” mode, Power BI does not store the data internally—rather, when the user requests data, Power BI will issue a query directly to the underlying system, wherever it may be. This is especially useful when you are working with large datasets where you exceed the limits of import mode. This is also useful in cases where the data changes frequently, and you want those changes reflected on the dashboard as quickly as possible.
In DirectQuery mode, Power BI (like almost every data tool today) uses caching to try and shorten the time the user spends waiting for the data to return from the source system. Over time, the cache knows what data is used most frequently, such as the items that appear on the default load of the dashboard. Periodically, Power BI will go back to the source system to pull new data, so it will be ready with current data whenever a user comes looking for it.
With the new enhancement to Power BI, you can now define how often you want Power BI to perform this refresh. For example, if your ETL process runs every 2-4 hours to load data, you probably don’t need to run a cache refresh every hour. Alternatively, if you are reporting off of a transactional system that is updated every second, waiting an hour for refresh may not work for your needs. Ultimately this is going to be a decision made specifically for your environment, and Microsoft has given you a way to set it appropriately for your needs.
Do you have questions about how to use this in your organization, or anything else related to Power BI? Learn more about our Power BI mentoring, training and consulting services. We also invite you to join the conversation in the Power BI Professional Idea Network! You can connect with your peers and get help from people who work with Power BI every day. Join the LinkedIn group at http://bit.ly/powerbi-pin.