Exploratory data analysis (EDA) is important for identifying data quality issues and refining hypotheses. While EDA can be “messy,” at emLab we treat it as a formal part of the research record. Using Quarto for EDA allows us to document not just what we found, but what we looked for and why certain paths were abandoned.
Use .qmd files as an interactive laboratory notebook. Unlike a final report, an EDA document should include:
EDA often requires long blocks of code to generate diagnostic plots. To keep the document readable while remaining transparent, use code-fold at the chunk level.
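As an illustration, a folded chunk holding a quick missingness check might look like the sketch below. The #| lines are Quarto chunk options and go at the top of an R chunk in the .qmd file. Outside Quarto, R treats them as ordinary comments. The built-in airquality dataset and the per-column counts are illustrative choices, not a prescribed emLab workflow.

```r
#| code-fold: true
#| code-summary: "Show code for missingness analysis"

# Count missing values in each column of the built-in airquality data
missing_counts <- colSums(is.na(airquality))

# Share of rows with no missing values in any column
complete_share <- mean(complete.cases(airquality))

# Bar chart of missing values per column; in the rendered HTML the
# plot is shown while the code above stays collapsed
barplot(
  missing_counts,
  las = 2,
  ylab = "Number of missing values",
  main = "Missing values per column (airquality)"
)
```

In the rendered HTML, readers see the plot plus a collapsed "Show code for missingness analysis" toggle. A descriptive code-summary makes it easier to find the right chunk when revisiting the notebook.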
Static tables are often insufficient for exploring large datasets. We recommend using interactive widgets within your Quarto EDA documents (rendered to HTML) to allow for deeper inspection.
Use the DT package to create searchable, sortable tables. This is invaluable for spotting specific outliers or checking metadata.
```r
library(DT)

# Searchable, sortable table showing 10 rows per page
mtcars |>
  datatable(options = list(pageLength = 10, autoWidth = TRUE))
```

skimr

Instead of standard summary(), use skimr::skim() to get a high-level overview of distributions and missingness directly in your Quarto output.
```r
mtcars |>
  skimr::skim()
```

| Name | mtcars |
|---|---|
| Number of rows | 32 |
| Number of columns | 11 |
| Column type frequency: | |
| numeric | 11 |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| mpg | 0 | 1 | 20.09 | 6.03 | 10.40 | 15.43 | 19.20 | 22.80 | 33.90 | ▃▇▅▁▂ |
| cyl | 0 | 1 | 6.19 | 1.79 | 4.00 | 4.00 | 6.00 | 8.00 | 8.00 | ▆▁▃▁▇ |
| disp | 0 | 1 | 230.72 | 123.94 | 71.10 | 120.83 | 196.30 | 326.00 | 472.00 | ▇▃▃▃▂ |
| hp | 0 | 1 | 146.69 | 68.56 | 52.00 | 96.50 | 123.00 | 180.00 | 335.00 | ▇▇▆▃▁ |
| drat | 0 | 1 | 3.60 | 0.53 | 2.76 | 3.08 | 3.70 | 3.92 | 4.93 | ▇▃▇▅▁ |
| wt | 0 | 1 | 3.22 | 0.98 | 1.51 | 2.58 | 3.33 | 3.61 | 5.42 | ▃▃▇▁▂ |
| qsec | 0 | 1 | 17.85 | 1.79 | 14.50 | 16.89 | 17.71 | 18.90 | 22.90 | ▃▇▇▂▁ |
| vs | 0 | 1 | 0.44 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
| am | 0 | 1 | 0.41 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
| gear | 0 | 1 | 3.69 | 0.74 | 3.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▇▁▆▁▂ |
| carb | 0 | 1 | 2.81 | 1.62 | 1.00 | 2.00 | 2.00 | 4.00 | 8.00 | ▇▂▅▁▁ |
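Because skim() returns a data frame (a tibble with class skim_df), its results can also be filtered in code, for example to list only the variables that have missing values. The sketch below assumes skimr is installed and uses the built-in airquality data instead of mtcars, since mtcars has no missing values. Converting to a plain data frame before subsetting is a cautious choice, not a skimr requirement.

```r
library(skimr)

# skim() returns a skim_df, which is also a tibble / data frame
air_skim <- skim(airquality)

# Convert to a plain data frame, then keep only variables with missing values
air_skim_df <- as.data.frame(air_skim)
with_missing <- air_skim_df[
  air_skim_df$n_missing > 0,
  c("skim_variable", "n_missing", "complete_rate")
]

print(with_missing)
```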
If you use Positron, click on a data object in the Session pane to open it in the Data Explorer. The Data Explorer shows a visual summary of variable distributions and missingness alongside a sortable, filterable table, right in your IDE. You can also open it by running View() on the object in the console.
For large projects, do not crowd a single file with all exploratory work. Instead: