When running a query, many times there are instances where there is non-existent data. The following blog addresses various scenarios and solutions for overcoming this issue.
A hospital report needs to show Intentsive Care Unit (ICU) bed occupancy by date. The WITH statement below simulates a small table of source data so you can easily replicate the various solution demonstrations without needing to create or populate tables.
Summary reports like the one described above are quite common, but the typical GROUP BY query formulation below doesn’t meet requirements unless all GROUP BY column values exist in the set of rows being reported upon. As our sample data shows, for week one, only two of seven days are listed and only five days are listed in week two while we want all seven days listed even if no data for a day exists.
Often the desired output data format lends itself to using the pivot operator. Although pivot returns a (typically un-useful) null as the aggregate result for missing data, this is easily (if not tediously) converted to blank or zero with an ISNULL or COALESCE function. Note that ISNULL is implemented as a native function but is not ANSI-SQL compliant while COALESCE is ANSI compliant and more flexible, but is implemented as a CASE WHEN or a group of nested CASE WHEN statements.
If a serial rather than parallel format is required, some people find that simply un-pivoting the above result is sufficient. Although this works, for significant amounts of data, the extra overhead might not be tolerable.
Solution 2 not only incurs unnecessary steps that degrade performance, it also doesn’t work if your requirements preclude use of the PIVOT operator. For example, if you want to report upon a non-static list of values like a range of dates. In this example, I report on the first 14 days of the year but any distinct list of values can be reported upon using this method.
The trick is to start with a logical table that contains all the values you want to summarize over, then use a left join to access the data you want summarized. Since COUNT(), SUM() and most other aggregate functions ignore null values, any group of rows containing only null in the aggregated column will return null which you can easily translate to zero. This leaves us with the most common problem—how to conveniently and efficiently generate a list of all the grouping values to left join against. Using a CTE helps with this by making the code easier to understand.
There are lots of ways to retrieve or generate a range of integer values for use in generating dates, times or in general looking for absent data. When generating 2048 or fewer values master.dbo.spt_values is a good number source. It is important, however to select only rows “WHERE [type] = ‘P’” (as demonstrated below). If you can’t read master or you need more than 2048 values see my older post: http://rickbielawski.wordpress.com/2011/09/07/numbers-function/ for other number generation methods—or simply implement a numbers table in your database.
Since the ReportingDate version has different requirements from the DayOfWeek version (14 days starting on the first vs. two calendar weeks) the range of dates returned here is different from the prior queries, but an “apples-to-apples” query can also be formulated. It’s slightly more complex because the DayOfWeek version needs to group over more than one column (WeekOfYear and DayAbbr) and we also want to sort on a column other than the one displayed (DOW vs. DayAbbr). To satisfy the first requirement, use 2 CTEs and cross join their values to generate a logical table with all days of all applicable weeks. For the second requirement, simply return both the number and the name in the WeekDays CTE so both are available in the final select.
Although no solution is terribly easy, all are fairly straightforward to implement. A rather strict formula, when applied to a query, satisfies all instances of the problem: start with a query (logical table) that returns a distinct list of all possible values to be grouped upon and then left join those rows to the actual data you want to summarize. Don’t forget to convert the nulls to blanks or zero as requirements dictate.