calculate_outliers_summary
provides an annual overview of data quality by
summarizing extreme outliers for health indicators. Outliers are identified
based on robust statistical metrics (Median Absolute Deviation, MAD) and
flagged when they deviate significantly (beyond five MADs from the median).
Usage
calculate_outliers_summary(
.data,
admin_level = c("national", "adminlevel_1", "district")
)
Arguments
- .data
A data frame with district-level health indicators. It must include a district and a year column, along with the indicator columns used for calculating outliers. Outlier flags should be computed beforehand and named with the suffix _outlier5std (e.g., anc1_outlier5std, where 1 indicates an outlier and 0 indicates a non-outlier).
- admin_level
Character. The administrative level at which to summarize outliers. Must be one of "national", "adminlevel_1", or "district".
Value
A cd_outliers_summary object (tibble) with:
- Each indicator's non-outlier percentage (_outlier5std columns).
- Overall non-outlier summaries across all indicators, vaccination indicators, and tracers.
Details
Outlier Detection: Outliers are identified with Hampel’s robust X84 method, which is based on the Median Absolute Deviation (MAD). A value is flagged when it lies more than five MADs from the median. Because the median and the MAD are themselves insensitive to extreme values, the threshold is not distorted by the outliers it is meant to detect.
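The flagging rule can be illustrated with a minimal base R sketch. It is not the package's own implementation: the helper name flag_mad_outlier, its arguments, and the sample values are hypothetical, and the use of the scaled MAD (the stats::mad() default constant of 1.4826) is an assumption, since the page does not say whether the MAD is scaled.

```r
# Hypothetical helper (not part of the package): flag values lying more than
# `k` MADs from the median. Returns 1 for an outlier and 0 otherwise, matching
# the _outlier5std flag convention described under Arguments.
flag_mad_outlier <- function(x, k = 5, constant = 1.4826) {
  med <- median(x, na.rm = TRUE)
  # stats::mad() multiplies the raw MAD by `constant`; the default 1.4826 makes
  # it comparable to a standard deviation for normally distributed data.
  # Assumption: the scaled MAD is used; set constant = 1 for the raw MAD.
  spread <- mad(x, center = med, constant = constant, na.rm = TRUE)
  as.integer(abs(x - med) > k * spread)
}

# Made-up monthly counts for one district; the last value is a gross error.
anc1 <- c(100, 102, 98, 101, 99, 103, 97, 500)
anc1_outlier5std <- flag_mad_outlier(anc1)
anc1_outlier5std
```

With these values the median is 100.5 and the scaled MAD is about 2.97, so only the final observation exceeds the five-MAD threshold and is flagged.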
Annual Non-Outlier Rate: For each indicator and each year, the function calculates the percentage of non-outliers. Additionally, the function aggregates the non-outlier rates across all indicators, as well as vaccination-only and tracer-only indicators, providing an overall data quality summary.
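The yearly non-outlier percentage can be sketched in base R as follows. The data are made up, the column overall_pct is a hypothetical name, and averaging the per-indicator percentages is only one possible way to form the overall summary; the function's actual aggregation may differ.

```r
# Made-up flags: 1 = outlier, 0 = non-outlier. Column names follow the
# _outlier5std suffix convention; the values are illustrative only.
flags <- data.frame(
  year = c(2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022),
  anc1_outlier5std   = c(0, 0, 1, 0, 0, 0, 0, 0),
  penta3_outlier5std = c(0, 1, 1, 0, 0, 0, 1, 0)
)

flag_cols <- grep("_outlier5std$", names(flags), value = TRUE)

# Percentage of non-outliers for one indicator: share of flags equal to 0.
non_outlier_pct <- function(flag) mean(flag == 0, na.rm = TRUE) * 100

# One row per year, one percentage per indicator.
yearly <- aggregate(flags[flag_cols], by = list(year = flags$year),
                    FUN = non_outlier_pct)

# Assumption: overall summary taken as the mean of the indicator percentages.
# `overall_pct` is a hypothetical column name.
yearly$overall_pct <- rowMeans(yearly[flag_cols])
yearly
```

A vaccination-only or tracer-only summary would follow the same pattern, restricting flag_cols to the relevant subset of indicators.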