Engagement in Integrative and Nonpharmacologic Pain Management Modalities Among Adults with Chronic Pain

Analysis of the 2019 National Health Interview Survey

Author

Samuel N. Rodgers-Melnick, MPH, MT-BC

Published

2024-01-22

You can download the .qmd file, NHISstudy2024-01-07GitHub.qmd, from my GitHub page here.

You can read the full research article in Journal of Pain Research here

Rodgers-Melnick SN, Trager RJ, Love TE, Dusek JA. Engagement in Integrative and Nonpharmacologic Pain Management Modalities Among Adults with Chronic Pain: Analysis of the 2019 National Health Interview Survey. J Pain Res. Jan 16 2024;17:253-264. doi:10.2147/jpr.S439682

R Packages and Setup

I loaded the following packages to ensure I had the necessary data cleaning, analysis, and visualization tools needed for this project. Given the number of packages, I used conflict_prefer() to ensure functions would not interfere with each other. I also loaded uhred for use in plots.

Code

knitr::opts_chunk$set(comment = NA) # do not remove this

library(janitor)
library(gt)
library(naniar)
library(rms)
library(patchwork)
library(Epi)
library(car)
library(caret)
library(GGally)
library(pscl)
library(VGAM)
library(MASS)
library(flextable)
library(mosaic)
library(pROC)
library(ROCR)
library(gtsummary)
library(simputation)
library(ggrepel)
library(nnet)
library(kableExtra)
library(conflicted)
library(survey)
library(tidymodels)
library(tidyverse)

conflict_prefer("select", "dplyr")
conflict_prefer("summarize", "dplyr")
conflict_prefer("filter", "dplyr")
conflict_prefer("mean", "base")
conflict_prefer("sum", "base")
conflict_prefer("vif", "car")
conflict_prefer("count", "dplyr")

uhred <- "#B70022" 

theme_set(theme_bw())

1 Data Source

1.1 National Health Interview Survey

The data for this project comes from the 2019 National Health Interview Survey (NHIS) Website (National Center for Health Statistics, 2021). To obtain the data, I downloaded the CSV data file for the Sample Adult Interview to my local hard drive and unzipped the adult19.csv file.

According to the National Health Interview Survey website:

Purpose: “The main objective of the NHIS is to monitor the health of the United States population through the collection and analysis of data on a broad range of health topics. NHIS data are used widely throughout the Department of Health and Human Services (HHS) to monitor trends in illness and disability and to track progress toward achieving national health objectives. The data are also used by the public health research community for epidemiological and policy analysis of such timely issues as characterizing those with various health problems, determining barriers to accessing and using appropriate health care, and evaluating Federal health programs.”

Design: The NHIS is a cross-sectional household interview survey.

Target population: “The target population for the NHIS is the civilian non-institutionalized population residing within the 50 states and the District of Columbia at the time of the interview. Persons excluded from the universe are those with no fixed household address (e.g., homeless and/or transient persons not residing in shelters), active duty military personnel, and civilians living on military bases, persons in long-term care institutions (e.g., nursing homes for the elderly, hospitals for the chronically ill or physically or intellectually disabled, and wards for abused or neglected children), persons in correctional facilities (e.g., prisons or jails, juvenile detention centers, and halfway houses), and U.S. nationals living in foreign countries. While active-duty Armed Forces personnel cannot be sampled for inclusion in the survey, any civilians residing with Armed Forces personnel in non-military housing are eligible to be sampled.”

Sampling strategy: “To keep survey operations manageable, cost-effective, and timely, the NHIS uses geographically clustered sampling techniques to select the sample of dwelling units for the NHIS. The sample is designed in such a way that each month’s sample is nationally representative. Data collection on the NHIS is continuous, i.e., from January to December each year.”

“One ‘sample adult’ aged 18 years or older and one ‘sample child’ aged 17 years or younger (if any children live in the household) are randomly selected from each household following a brief initial interview that identifies everyone who usually lives or stays in the household. Information about the sample adult is collected from the sample adult herself or himself unless she or he is physically or mentally unable to do so, in which case a knowledgeable proxy can answer for the sample adult. Information about the sample child is collected from a parent or adult who is knowledgeable about and responsible for the health care of the sample child. This respondent may or may not also be the sample adult.”

Data collection procedures: “The U.S. Census Bureau, under a contractual agreement, is the data collection agent for the NHIS. NHIS data are collected continuously throughout the year by Census interviewers. Nationally, about 750 interviewers (also called ‘Field Representatives’) are trained and directed by health survey supervisors in the U.S. Census Bureau Regional Offices to conduct interviews for NHIS.

The NHIS is conducted using computer-assisted personal interviewing. Face-to-face interviews are conducted in respondents’ homes, but follow-ups to complete interviews may be conducted over the telephone. A telephone interview may also be conducted when the respondent requests a telephone interview or when road conditions or travel distances would make it difficult to schedule a personal visit before the required completion date. In 2019, 34.3% of the Sample Adult interviews and 31.7% of the Sample Child interviews were conducted at least partially by telephone.”

1.2 The Subjects

Subjects included in this project are adults (18 years and older) with chronic pain as defined by responding “most days” or “every day” to the survey question, “In the past 3 months, how often did you have pain?” (Zelaya et al., 2020).

2 Research Questions

Which demographic, mental health, and pain-related variables are associated with engagement in: (1) the number of IHM modalities used for pain and (2) exclusively nonpharmacologic modalities rather than opioid utilization among U.S. adults with chronic pain.

3 Data Ingest and Management

3.1 Loading the Raw Data

I ingested the adult19.csv raw data that I downloaded and unzipped from the 2019 National Health Interview Survey (NHIS) Website as projectb_raw using the read_csv() function.

Code

projectb_raw <- read_csv("data/adult19.csv", show_col_types = FALSE)

3.2 Cleaning the Data

In the following code, I created a new tibble, projectb_clean, limited to respondents who stated that they had pain on most days (PAIFRQ3M_A = 3) or everyday (PAIFRQ3M_A = 4) in response to the question PAIFRQ3M_A, “In the past three months, how often did you have pain?” as this matched the chronic pain definition reported by the National Center for Health Statistics (Zelaya et al., 2020). I also set the filter at PAIFRQ3M_A < 7 to exclude responses that were missing, refused, non ascertained, or “don’t know” in response to the chronic pain question.

After filtering to the chronic pain cohort, I selected the variables from the NHIS that I needed for this analysis.

Code

projectb_clean <- projectb_raw |>
  filter(PAIFRQ3M_A >= 3 & PAIFRQ3M_A < 7) |> #Limited to pain on most days or everyday per Zelaya et al. (2020)
  select(subject_id = HHX, age = AGEP_A, sex = SEX_A,
         wt_final = WTFA_A,
         education = EDUC_A, urban_rural = URBRRL,
         painlimit_lifework = PAIWKLM3M_A,
         anymeds12mo = RX12M_A, opioids12mo = OPD12M_A, opioids3mo = OPD3M_A, 
         opioids_chonpain3mo = OPDCHRONIC_A, opioid_freq3mo = OPDFREQ_A,
         GAD71_A, GAD72_A, GAD73_A, GAD74_A, 
         GAD75_A, GAD76_A, GAD77_A, anx_severity = GADCAT_A,
         back_pain = PAIBACK3M_A, upper_ext_pain = PAIULMB3M_A,
         lower_ext_pain = PAILLMB3M_A, headache_migraine = PAIHDFC3M_A,
         abd_pelvic_pain = PAIAPG3M_A, tooth_jaw_pain = PAITOOTH3M_A,
         chiro = PAICHIRO_A, yoga_taichi = PAIYOGA_A, 
         massage = PAIMASSAGE_A, meditation = PAIMEDITAT_A,
         race_ethnicity = HISPALLP_A, fam_income = FAMINCTC_A,
         pt_rehab_ot = PAIPHYSTPY_A, talk_therapy = PAITALKTPY_A,
         chron_pain_selfmgmt = PAIPROGRAM_A, peer_support_grp = PAIGROUP_A,
         PHQ81_A, PHQ82_A, PHQ83_A, PHQ84_A, PHQ85_A, 
         PHQ86_A, PHQ87_A, PHQ88_A, dep_severity = PHQCAT_A)

3.2.1 Recode Refused, Not Ascertained, and Don’t Know as Missing

Using the 2019 NHIS Codebook, I recoded all responses that were Refused, Not Ascertained, or Don’t Know as NA.

Code

projectb_clean$age[projectb_clean$age >= 97] <- NA
projectb_clean$sex[projectb_clean$sex >= 7] <- NA
projectb_clean$education[projectb_clean$education >= 97] <- NA
projectb_clean$painlimit_lifework[projectb_clean$painlimit_lifework >= 7] <- NA
projectb_clean$anymeds12mo[projectb_clean$anymeds12mo >= 7] <- NA
projectb_clean$opioids12mo[projectb_clean$opioids12mo >= 7] <- NA
projectb_clean$opioids3mo[projectb_clean$opioids3mo >= 7] <- NA
projectb_clean$opioids_chonpain3mo[projectb_clean$opioids_chonpain3mo >= 7] <- NA
projectb_clean$opioid_freq3mo[projectb_clean$opioid_freq3mo >= 7] <- NA
projectb_clean$GAD71_A[projectb_clean$GAD71_A >= 7] <- NA
projectb_clean$GAD72_A[projectb_clean$GAD72_A >= 7] <- NA
projectb_clean$GAD73_A[projectb_clean$GAD73_A >= 7] <- NA
projectb_clean$GAD74_A[projectb_clean$GAD74_A >= 7] <- NA
projectb_clean$GAD75_A[projectb_clean$GAD75_A >= 7] <- NA
projectb_clean$GAD76_A[projectb_clean$GAD76_A >= 7] <- NA
projectb_clean$GAD77_A[projectb_clean$GAD77_A >= 7] <- NA
projectb_clean$anx_severity[projectb_clean$anx_severity >= 8] <- NA


projectb_clean$back_pain[projectb_clean$back_pain >= 7] <- NA
projectb_clean$upper_ext_pain[projectb_clean$upper_ext_pain >= 7] <- NA
projectb_clean$lower_ext_pain[projectb_clean$lower_ext_pain >= 7] <- NA
projectb_clean$headache_migraine[projectb_clean$headache_migraine >= 7] <- NA
projectb_clean$abd_pelvic_pain[projectb_clean$abd_pelvic_pain >= 7] <- NA
projectb_clean$tooth_jaw_pain[projectb_clean$tooth_jaw_pain >= 7] <- NA

projectb_clean$chiro[projectb_clean$chiro >= 7] <- NA
projectb_clean$yoga_taichi[projectb_clean$yoga_taichi >= 7] <- NA
projectb_clean$massage[projectb_clean$massage >= 7] <- NA
projectb_clean$meditation[projectb_clean$meditation >= 7] <- NA

projectb_clean$race_ethnicity[projectb_clean$race_ethnicity >= 97] <- NA

projectb_clean$pt_rehab_ot[projectb_clean$pt_rehab_ot >= 7] <- NA
projectb_clean$talk_therapy[projectb_clean$talk_therapy >= 7] <- NA
projectb_clean$chron_pain_selfmgmt[projectb_clean$chron_pain_selfmgmt >= 7] <- NA
projectb_clean$peer_support_grp[projectb_clean$peer_support_grp >= 7] <- NA

projectb_clean$PHQ81_A[projectb_clean$PHQ81_A >= 7] <- NA
projectb_clean$PHQ82_A[projectb_clean$PHQ82_A >= 7] <- NA
projectb_clean$PHQ83_A[projectb_clean$PHQ83_A >= 7] <- NA
projectb_clean$PHQ84_A[projectb_clean$PHQ84_A >= 7] <- NA
projectb_clean$PHQ85_A[projectb_clean$PHQ85_A >= 7] <- NA
projectb_clean$PHQ86_A[projectb_clean$PHQ86_A >= 7] <- NA
projectb_clean$PHQ87_A[projectb_clean$PHQ87_A >= 7] <- NA
projectb_clean$PHQ88_A[projectb_clean$PHQ88_A >= 7] <- NA
projectb_clean$dep_severity[projectb_clean$dep_severity >= 8] <- NA

3.2.2 Recode Binary Variables From (1 = Yes, 2 = No) to (1 = Yes, 0 = No)

In the following code, I recoded binary variables with initial responses in the format, 1 = Yes, 2 = No, to binary variables coded as 1 = Yes, 0 = No using the mutate() function, subtracting 2 from the variables chiro, yoga_taichi, massage, meditation, anymeds12mo, opioids12mo, opioids3mo, opioids_chonpain3mo, pt_rehab_ot, talk_therapy, chron_pain_selfmgmt, and peer_support_grp.

Code

projectb_clean <- projectb_clean |>
  mutate(chiro = as.numeric(2 - chiro),
         yoga_taichi = as.numeric(2 - yoga_taichi),
         massage = as.numeric(2 - massage),
         meditation = as.numeric(2 - meditation),
         anymeds12mo = as.numeric(2 - anymeds12mo),
         opioids12mo = as.numeric(2 - opioids12mo),
         opioids3mo = as.numeric(2 - opioids3mo),
         opioids_chonpain3mo = as.numeric(2 - opioids_chonpain3mo),
         pt_rehab_ot = as.numeric(2 - pt_rehab_ot),
         talk_therapy = as.numeric(2 - talk_therapy),
         chron_pain_selfmgmt = as.numeric(2 - chron_pain_selfmgmt),
         peer_support_grp = as.numeric(2 - peer_support_grp))

3.2.3 Recode Categorical Variables to Meaningful Factors

Using the 2019 NHIS Codebook, I recoded the following variables to meaningful, ascending, and ordered factors using the mutate() and factor() functions. To make race_ethnicity 6 categories, I combined “NH Amer Indian Alaska Natv only” and “NH Amer Indian Alaska Natv and any other group” into “NH Amer Indian Alaska Natv.”

Code

projectb_clean <- projectb_clean |>
  mutate(sex = factor(case_when(
    sex == 1 ~ "Male",
    sex == 2 ~ "Female"))) |>
  mutate(race_ethnicity = case_when(
    race_ethnicity == 1 ~ "Hispanic",
    race_ethnicity == 2 ~ "NH White only",
    race_ethnicity == 3 ~ "NH Black/African American only",
    race_ethnicity == 4 ~ "NH Asian only",
    race_ethnicity == 5 ~ "NH Amer Indian Alaska Natv",
    race_ethnicity == 6 ~ "NH Amer Indian Alaska Natv",
    race_ethnicity == 7 ~ "Other single and multiple races")) |>
  mutate(race_ethnicity = 
           factor(race_ethnicity, 
                  levels = c("NH White only", 
                             "Hispanic", 
                             "NH Black/African American only", 
                             "NH Asian only", 
                             "NH Amer Indian Alaska Natv",
                             "Other single and multiple races"))) |>
  mutate(painlimit_lifework = case_when(
    painlimit_lifework == 1 ~ "Never",
    painlimit_lifework == 2 ~ "Some days",
    painlimit_lifework == 3 ~ "Most days",
    painlimit_lifework == 4 ~ "Every day")) |>
  mutate(painlimit_lifework = factor(painlimit_lifework, 
                                     levels = c("Never", 
                                                "Some days", 
                                                "Most days", 
                                                "Every day"))) |>
  mutate(anx_severity = case_when(
    anx_severity == 1 ~ "None/Minimal",
    anx_severity == 2 ~ "Mild",
    anx_severity == 3 ~ "Moderate",
    anx_severity == 4 ~ "Severe")) |>
  mutate(anx_severity = factor(anx_severity, 
                               levels = c("None/Minimal", 
                                          "Mild", 
                                          "Moderate", 
                                          "Severe"))) |>
  mutate(dep_severity = case_when(
    dep_severity == 1 ~ "None/Minimal",
    dep_severity == 2 ~ "Mild",
    dep_severity == 3 ~ "Moderate",
    dep_severity == 4 ~ "Severe")) |>
  mutate(dep_severity = factor(dep_severity, 
                               levels = c("None/Minimal", 
                                          "Mild", 
                                          "Moderate", 
                                          "Severe"))) |>
  mutate(opioid_freq3mo = case_when(
    opioid_freq3mo == 1 ~ "Some days",
    opioid_freq3mo == 2 ~ "Most days",
    opioid_freq3mo == 3 ~ "Every day")) |>
  mutate(opioid_freq3mo = factor(opioid_freq3mo, levels = c("Some days", 
                                                        "Most days", 
                                                        "Every day"))) |>
  mutate(education = case_when(
    education == 0 ~ "Less than high school",
    education == 1 ~ "Less than high school",
    education == 2 ~ "Less than high school",
    education == 3 ~ "High school graduate/GED",
    education == 4 ~ "High school graduate/GED",
    education == 5 ~ "Some college, no degree",
    education == 6 ~ "Associate's or bachelor's degree",
    education == 7 ~ "Associate's or bachelor's degree",
    education == 8 ~ "Associate's or bachelor's degree",
    education == 9 ~ "Master's, doctoral, or professional degree",
    education == 10 ~ "Master's, doctoral, or professional degree",
    education == 11 ~ "Master's, doctoral, or professional degree")) |>
  mutate(education = factor(education, levels = c("Less than high school", 
                                                  "High school graduate/GED",
                                                  "Some college, no degree",
                                                  "Associate's or bachelor's degree",
                                                  "Master's, doctoral, or professional degree"))) |>
  mutate(urban_rural = case_when(
    urban_rural == 1 ~ "Large central metro",
    urban_rural == 2 ~ "Large fringe metro",
    urban_rural == 3 ~ "Medium and small metro",
    urban_rural == 4 ~ "Nonmetropolitan")) |>
  mutate(urban_rural = factor(urban_rural, levels = c("Nonmetropolitan", 
                                                      "Medium and small metro", 
                                                      "Large fringe metro",
                                                      "Large central metro")))

3.2.4 Calculate GAD-7 total

In the following code, I recoded the original NHIS responses to conform to the actual scoring method for the GAD-7 (Löwe et al., 2008). I then calculated gad7total by adding each of the individual GAD-7 responses. Finally, to verify that gad7total was scored correctly, I summarized gad7total by anx_severity using favstats() to ensure that the values were in range. Interestingly, in the 2019 NHIS, it appears that they scored subjects even if they had missing individual items on the GAD-7.

Code

projectb_clean <- projectb_clean |>
  mutate(GAD71_A = case_when(
    GAD71_A == 1 ~ 0,
    GAD71_A == 2 ~ 1,
    GAD71_A == 3 ~ 2,
    GAD71_A == 4 ~ 3)) |>
  mutate(GAD72_A = case_when(
    GAD72_A == 1 ~ 0,
    GAD72_A == 2 ~ 1,
    GAD72_A == 3 ~ 2,
    GAD72_A == 4 ~ 3)) |>
  mutate(GAD73_A = case_when(
    GAD73_A == 1 ~ 0,
    GAD73_A == 2 ~ 1,
    GAD73_A == 3 ~ 2,
    GAD73_A == 4 ~ 3)) |>
  mutate(GAD74_A = case_when(
    GAD74_A == 1 ~ 0,
    GAD74_A == 2 ~ 1,
    GAD74_A == 3 ~ 2,
    GAD74_A == 4 ~ 3)) |>
  mutate(GAD75_A = case_when(
    GAD75_A == 1 ~ 0,
    GAD75_A == 2 ~ 1,
    GAD75_A == 3 ~ 2,
    GAD75_A == 4 ~ 3)) |>
  mutate(GAD76_A = case_when(
    GAD76_A == 1 ~ 0,
    GAD76_A == 2 ~ 1,
    GAD76_A == 3 ~ 2,
    GAD76_A == 4 ~ 3)) |>
  mutate(GAD77_A = case_when(
    GAD77_A == 1 ~ 0,
    GAD77_A == 2 ~ 1,
    GAD77_A == 3 ~ 2,
    GAD77_A == 4 ~ 3)) |>
  mutate(gad7total = GAD71_A + GAD72_A + GAD73_A + 
           GAD74_A + GAD75_A + GAD76_A + GAD77_A)

favstats(gad7total ~ anx_severity, data = projectb_clean) |> 
  flextable() |>
  colformat_double(digits = 2) |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

anx_severity	min	Q1	median	Q3	max	mean	sd	n	missing
None/Minimal	0.00	0.00	0.00	2.00	4.00	1.00	1.31	4,870	27
Mild	5.00	5.00	6.00	8.00	9.00	6.60	1.31	1,141	12
Moderate	10.00	11.00	12.00	13.00	14.00	11.83	1.42	574	4
Severe	15.00	16.00	18.00	19.00	21.00	17.82	2.13	508	5

3.2.5 Calculate PHQ-8 Total

In the following code, I recoded the original NHIS responses to conform to the actual scoring method for the PHQ-8 (Kroenke et al., 2001). I then calculated phq8total by adding each of the individual PHQ-8 responses.

Finally, to verify that phq8total was scored correctly, I summarized phq8total by dep_severity using favstats() to ensure that the values were in range. Interestingly, in the 2019 NHIS, it appears that they dropped “moderate-to-severe depression” as a value given that they did not use the full PHQ-9, and they scored subjects even if they had missing individual items on the PHQ-8.

Code

projectb_clean <- projectb_clean |>
  mutate(PHQ81_A = case_when(
    PHQ81_A == 1 ~ 0,
    PHQ81_A == 2 ~ 1,
    PHQ81_A == 3 ~ 2,
    PHQ81_A == 4 ~ 3)) |>
  mutate(PHQ82_A = case_when(
    PHQ82_A == 1 ~ 0,
    PHQ82_A == 2 ~ 1,
    PHQ82_A == 3 ~ 2,
    PHQ82_A == 4 ~ 3)) |>
  mutate(PHQ83_A = case_when(
    PHQ83_A == 1 ~ 0,
    PHQ83_A == 2 ~ 1,
    PHQ83_A == 3 ~ 2,
    PHQ83_A == 4 ~ 3)) |>
  mutate(PHQ84_A = case_when(
    PHQ84_A == 1 ~ 0,
    PHQ84_A == 2 ~ 1,
    PHQ84_A == 3 ~ 2,
    PHQ84_A == 4 ~ 3)) |>
  mutate(PHQ85_A = case_when(
    PHQ85_A == 1 ~ 0,
    PHQ85_A == 2 ~ 1,
    PHQ85_A == 3 ~ 2,
    PHQ85_A == 4 ~ 3)) |>
  mutate(PHQ86_A = case_when(
    PHQ86_A == 1 ~ 0,
    PHQ86_A == 2 ~ 1,
    PHQ86_A == 3 ~ 2,
    PHQ86_A == 4 ~ 3)) |>
  mutate(PHQ87_A = case_when(
    PHQ87_A == 1 ~ 0,
    PHQ87_A == 2 ~ 1,
    PHQ87_A == 3 ~ 2,
    PHQ87_A == 4 ~ 3)) |>
  mutate(PHQ88_A = case_when(
    PHQ88_A == 1 ~ 0,
    PHQ88_A == 2 ~ 1,
    PHQ88_A == 3 ~ 2,
    PHQ88_A == 4 ~ 3)) |>
  mutate(phq8total = PHQ81_A + PHQ82_A + PHQ83_A + PHQ84_A + 
           PHQ85_A + PHQ86_A + PHQ87_A + PHQ88_A)

favstats(phq8total ~ dep_severity, data = projectb_clean) |> 
  flextable() |>
  colformat_double(digits = 2) |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

dep_severity	min	Q1	median	Q3	max	mean	sd	n	missing
None/Minimal	0.00	0.00	1.00	3.00	4.00	1.46	1.42	4,163	34
Mild	5.00	5.00	6.00	8.00	9.00	6.66	1.40	1,585	17
Moderate	10.00	10.00	11.00	13.00	14.00	11.68	1.39	749	13
Severe	15.00	16.00	18.00	20.00	24.00	18.32	2.80	566	9

3.2.6 Integrative Health and Medicine (IHM) Variables

In the following code, I performed the following steps to create the 5-level count outcome, ihm4pain_num, and the binary variable, ihm4pain_bin.

First, I used the mutate() function to create ihm4pain_num as the sum of engaging in chiro, yoga_taichi, massage, or meditation to manage pain in the past 3 months.
Then, I created the binary outcome, ihm4pain_bin, which was equal to 1 if ihm4pain_num >= 1 and equal to 0 if ihm4pain_num == 0.

Code

projectb_clean <- projectb_clean |>
  mutate(ihm4pain_num = chiro + yoga_taichi + massage + meditation,
         ihm4pain_bin = case_when(ihm4pain_num >= 1 ~ 1,
                                  ihm4pain_num == 0 ~ 0))

3.2.7 Opioid frequency

To create a single variable describing frequency of opioid use in the past 3 months to manage chronic pain, I performed the following steps:

First, since 1) the opioid frequency question was only asked to subjects who were taking prescription medications and opioid medications and 2) subjects responding “No” to the prescription medication and opioid use questions would be assumed to have never taken opioids in the past 3 months, I coded opioid_freq3mo_final as equal to “Never” if the subject answered “No” to anymeds12mo, opioids12mo, opioids3mo, or opioids_chonpain3mo.
Next, if subjects responded “Some days”, “Most days”, “Every day” to the opioid_freq3mo question, I coded the opioid_freq3mo_final variable as equal to that response.
Finally, I coded opioid_freq3mo_final as a meaningfully ordered factor.

Code

projectb_clean <- projectb_clean |>
  mutate(opioid_freq3mo_final = case_when(
    anymeds12mo == 0|
      opioids12mo == 0|
      opioids3mo == 0|
      opioids_chonpain3mo == 0 ~ "Never",
    opioid_freq3mo %in% c("Some days", "Most days", "Every day") ~ 
      opioid_freq3mo),
  opioid_freq3mo_final = factor(opioid_freq3mo_final, 
                                levels = c("Never",
                                           "Some days",
                                           "Most days",
                                           "Every day")))

3.2.8 Number of Pain Locations

To create a single variable quantifying the total number of locations in which subjects reported being bothered at least “a little” by pain in the past three months, I performed the following steps:

I converted the 6 individual pain variables to binary variables such that 1 indicated being bothered at least “a little” (>= 2), and 0 meant “Not at all.”
I then calculated num_pain_locations as the sum of those binary variables, ignoring missingness using the rowSums() function.
Finally, I used mutate() and if_else() to mark num_pain_locations as missing if all pain locations were missing.

Code

projectb_clean <- projectb_clean |>
  mutate(back_pain = if_else(back_pain >= 2, 1, 0),
         upper_ext_pain = if_else(upper_ext_pain >= 2, 1, 0),
         lower_ext_pain = if_else(lower_ext_pain >= 2, 1, 0),
         headache_migraine = if_else(headache_migraine >= 2, 1, 0),
         abd_pelvic_pain = if_else(abd_pelvic_pain >= 2, 1, 0),
         tooth_jaw_pain = if_else(tooth_jaw_pain >= 2, 1, 0))

projectb_clean$num_pain_locations <- rowSums(projectb_clean[,c("back_pain",
                                             "upper_ext_pain",
                                             "lower_ext_pain",
                                             "headache_migraine",
                                             "abd_pelvic_pain",
                                             "tooth_jaw_pain")], na.rm=TRUE)

projectb_clean <- projectb_clean |>
  mutate(num_pain_locations = if_else(
    is.na(back_pain) & is.na(upper_ext_pain) &
      is.na(lower_ext_pain) & is.na(headache_migraine) &
      is.na(abd_pelvic_pain) & is.na(tooth_jaw_pain), NA, num_pain_locations))

3.2.9 Chronic Pain Treatment Category - SUPPORT

In the following code, I performed the following steps to create the 4-category outcome, pain_tx_type.

First, I used the mutate() function to create support4pain_num as the sum of engaging in pt_rehab_ot, talk_therapy, chron_pain_selfmgmt, or peer_support_grp to manage pain in the past 3 months. In this case, SUPPORT stands for SUpportive, Physical, Psychological, Occupational, or Rehabilitative Therapies.
Next, I created the binary outcome, support4pain_bin, which was equal to 1 if support4pain_num >= 1 and equal to 0 if support4pain_num == 0.
Then, I created the outcome variable pain_tx_type as equal to None, IHM only, SUPPORT only, or SUPPORT and IHM based on the the support4pain_num and ihm4pain_bin variables.
Finally, I ensured pain_tx_type was coded as a factor using the factor function, with “None” as the reference category.

Code

projectb_clean <- projectb_clean |>
  mutate(support4pain_num = pt_rehab_ot + talk_therapy + 
           chron_pain_selfmgmt + peer_support_grp,
         support4pain_bin = case_when(support4pain_num >= 1 ~ 1,
                                  support4pain_num == 0 ~ 0),
         pain_tx_type = case_when(
           ihm4pain_bin == 0 & support4pain_bin == 0 ~ "None",
           ihm4pain_bin == 1 & support4pain_bin == 0 ~ "IHM only",
           ihm4pain_bin == 0 & support4pain_bin == 1 ~ "SUPPORT only",
           ihm4pain_bin == 1 & support4pain_bin == 1 ~ "SUPPORT and IHM"),
         pain_tx_type = factor(pain_tx_type, levels = c("None", 
                                                        "IHM only", 
                                                        "SUPPORT only", 
                                                        "SUPPORT and IHM")))

3.2.10 Chronic Pain Treatment Category - Opioids

Code

projectb_clean <- projectb_clean |>
  mutate(nonpharm_opioid = case_when(
    opioid_freq3mo_final == "Never" &
      ihm4pain_bin == 0 & support4pain_bin == 0 ~ "None",
    opioid_freq3mo_final %in% c("Some days",
                                "Most days",
                                "Every day") &
      ihm4pain_bin == 0 & support4pain_bin == 0 ~ "Opioids only",
    opioid_freq3mo_final == "Never" &
      (ihm4pain_bin == 1|support4pain_bin == 1) ~ "Nonpharm only",
    opioid_freq3mo_final %in% c("Some days",
                                "Most days",
                                "Every day") &
      (ihm4pain_bin == 1|support4pain_bin == 1) ~ "Opioids + Nonpharm"),
    nonpharm_opioid = factor(nonpharm_opioid, levels = c("None", 
                                                     "Opioids only", 
                                                     "Nonpharm only", 
                                                     "Opioids + Nonpharm")))


projectb_clean |> tabyl(nonpharm_opioid) |>
  adorn_pct_formatting() |>
  flextable() |>
  colformat_double(digits = 0) |>
  autofit() |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

nonpharm_opioid	n	percent	valid_percent
None	3,201	44.6%	45.0%
Opioids only	510	7.1%	7.2%
Nonpharm only	2,791	38.9%	39.2%
Opioids + Nonpharm	615	8.6%	8.6%
	67	0.9%	-

3.2.11 IHM Category

Code

projectb_clean <- projectb_clean |>
  mutate(manualtx4pain = case_when(
    chiro + massage >= 1 ~ 1,
    chiro + massage == 0 ~ 0),
    mindbodytx4pain = case_when(
      yoga_taichi + meditation >= 1 ~ 1,
      yoga_taichi + meditation  == 0 ~ 0),
    ihm4pain_cat = case_when(
      manualtx4pain + mindbodytx4pain == 0 ~ "None",
      manualtx4pain == 1 & mindbodytx4pain == 0 ~ "Manual only",
      manualtx4pain == 0 & mindbodytx4pain == 1 ~ "Mind-body only",
      manualtx4pain == 1 & mindbodytx4pain == 1 ~ "Manual and mind-body"),
    ihm4pain_cat = factor(ihm4pain_cat, levels = c("None", 
                                                   "Manual only", 
                                                   "Mind-body only", 
                                                   "Manual and mind-body")))
projectb_clean |> tabyl(ihm4pain_cat) |>
  adorn_pct_formatting() |>
  flextable() |>
  colformat_double(digits = 0) |>
  autofit() |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

ihm4pain_cat	n	percent	valid_percent
None	4,587	63.9%	64.0%
Manual only	1,125	15.7%	15.7%
Mind-body only	818	11.4%	11.4%
Manual and mind-body	633	8.8%	8.8%
	21	0.3%	-

3.2.12 Selecting Variables I Used and Arranging the Tibble

With all variables recoded, I created the projectb_data tibble as a subset of projectb_clean, limited to the variables needed for analysis and subjects with complete data on the outcomes, ihm4pain_num and nonpharm_opioid. I also renamed opioid_freq3mo_final as opioid_freq3mo.

Code

projectb_data <- projectb_clean |>
  select(subject_id, age, sex, education, urban_rural,
         painlimit_lifework, opioid_freq3mo = opioid_freq3mo_final,
         gad7total, phq8total, num_pain_locations,
         ihm4pain_num, race_ethnicity, fam_income, 
         nonpharm_opioid, ihm4pain_bin, ihm4pain_cat, wt_final) |>
  filter(!is.na(ihm4pain_num), !is.na(nonpharm_opioid))

3.2.13 Working with Categorical Predictors

To examine distribution of categorical predictors, I used the tbl_summary() function to summarize the factor variables in projectb_data.

Code

projectb_data |> select(where(~ is.factor(.x))) |>
  tbl_summary() |>
  modify_header(label ~ "**Variable**") |>
  bold_labels()

Variable	N = 7,114¹
sex
Female	4,050 (57%)
Male	3,063 (43%)
Unknown	1
education
Less than high school	870 (12%)
High school graduate/GED	2,057 (29%)
Some college, no degree	1,325 (19%)
Associate's or bachelor's degree	2,193 (31%)
Master's, doctoral, or professional degree	628 (8.9%)
Unknown	41
urban_rural
Nonmetropolitan	1,491 (21%)
Medium and small metro	2,500 (35%)
Large fringe metro	1,456 (20%)
Large central metro	1,667 (23%)
painlimit_lifework
Never	1,999 (28%)
Some days	2,480 (35%)
Most days	1,088 (15%)
Every day	1,543 (22%)
Unknown	4
opioid_freq3mo
Never	5,991 (84%)
Some days	342 (4.8%)
Most days	133 (1.9%)
Every day	648 (9.1%)
race_ethnicity
NH White only	5,421 (76%)
Hispanic	605 (8.5%)
NH Black/African American only	743 (10%)
NH Asian only	113 (1.6%)
NH Amer Indian Alaska Natv	168 (2.4%)
Other single and multiple races	64 (0.9%)
nonpharm_opioid
None	3,201 (45%)
Opioids only	510 (7.2%)
Nonpharm only	2,790 (39%)
Opioids + Nonpharm	613 (8.6%)
ihm4pain_cat
None	4,554 (64%)
Manual only	1,114 (16%)
Mind-body only	815 (11%)
Manual and mind-body	631 (8.9%)
¹ n (%)

In examining the table above, I decided that the variable race_ethnicity required some collapsing of categories to ensure all categories contained at least 25 subjects in the testing and training samples. Therefore, I combined the race_ethnicity categories “NH Asian only”, “NH Amer Indian Alaska Natv”, and “Other single and multiple races” into “Other single and multiple races.”

Code

projectb_data <- projectb_data |>
  mutate(race_ethnicity = case_when(
    race_ethnicity == "NH White only" ~ "NH White only",
    race_ethnicity == "Hispanic" ~ "Hispanic",
    race_ethnicity == "NH Black/African American only" ~ 
      "NH Black/African American only",
    race_ethnicity == "NH Asian only" ~ "Other single and multiple races",
    race_ethnicity == "NH Amer Indian Alaska Natv" ~ 
      "Other single and multiple races",
    race_ethnicity == "Other single and multiple races" ~ 
      "Other single and multiple races")) |>
  mutate(race_ethnicity = factor(race_ethnicity, 
                                 levels = c("NH White only", 
                                            "Hispanic", 
                                            "NH Black/African American only", 
                                            "Other single and multiple races")))

3.3 Print and Save The Final Tibble

Next, I verified that projectb_data was a tibble by printing it. In this tibble, I can confirm that:

It is a tibble and thus calling it prints only the first 10 rows.
The initial row tells me that this is a tibble and specifies its dimensions.
I have a complete set of 7,114 NHIS subjects.
I’ve included only 17 variables

Code

projectb_data

# A tibble: 7,114 × 17
   subject_id   age sex    education              urban_rural painlimit_lifework
   <chr>      <dbl> <fct>  <fct>                  <fct>       <fct>             
 1 H007122       60 Male   Some college, no degr… Large cent… Some days         
 2 H007736       78 Male   Some college, no degr… Large cent… Most days         
 3 H022161       61 Male   High school graduate/… Large cent… Some days         
 4 H045961       71 Female Associate's or bachel… Large cent… Never             
 5 H025759       80 Female Some college, no degr… Large cent… Some days         
 6 H003675       52 Female Master's, doctoral, o… Large cent… Most days         
 7 H044229       25 Female Some college, no degr… Nonmetropo… Never             
 8 H035660       37 Female Some college, no degr… Medium and… Some days         
 9 H062584       65 Female Master's, doctoral, o… Medium and… Some days         
10 H008849       26 Male   Less than high school  Nonmetropo… Some days         
# ℹ 7,104 more rows
# ℹ 11 more variables: opioid_freq3mo <fct>, gad7total <dbl>, phq8total <dbl>,
#   num_pain_locations <dbl>, ihm4pain_num <dbl>, race_ethnicity <fct>,
#   fam_income <dbl>, nonpharm_opioid <fct>, ihm4pain_bin <dbl>,
#   ihm4pain_cat <fct>, wt_final <dbl>

3.3.1 Save The Tibble

Finally, I saved the final tibble, projectb_data, as an Rds file using the saveRDS() function.

Code

saveRDS(projectb_data, file = "data/projectb_data2024-01-07.Rds")

4 Code Book and Description

4.1 Defining the Variables

7,114 adults ages 18-85 with pain on most days or every day in the past 3 months and participating in the 2019 NHIS with data on the variables listed in the table below.

For each of my 17 selected variables, I have included the original variable name from the raw NHIS file.

Variable	Role	Type	Description (units/levels)	Original NHIS Variable Name
`subject_id`	identifier	-	Randomly assigned household number unique to a household (H000074-H064180)	HHX
`age`	input	quant	Age of subject	AGEP_A
`sex`	input	2-cat	Sex (Male or Female)	SEX_A
`education`	input	5-cat	Education level (Less than high school; High school graduate/GED; Some college, no degree; Associate’s or bachelor’s degree; Master’s, doctoral, or professional degree)	EDUC_A
`urban_rural`	input	4-cat	Urban-rural classification scheme for counties (Large central metro, Large fringe metro, Medium and small metro, Non-metropolitan)	URBRRL
`painlimit_lifework`	input	4-cat	How often pain limited the subject’s life or work activities in the past 3 months (Never, Some days, Most days, or Every day)	PAIWKLM3M_A
`opioid_freq3mo`	input	4-cat	How often the subject took a prescription opioid for chronic pain in the past three months (Never, Some days, Most days, or Every day)	RX12M_A, OPD12M_A, OPD3M_A, OPDCHRONIC_A, OPDFREQ_A
`gad7total`	input	quant	Subject’s total score on the GAD-7 anxiety scale (scale units 0 - 21)	SUM(GAD71_A, GAD72_A, GAD73_A, GAD74_A, GAD75_A, GAD76_A, GAD77_A)
`phq8total`	input	quant	Subject’s total score on the PHQ-8 depression scale (scale units 0 - 24)	SUM(PHQ81_A, PHQ82_A, PHQ83_A, PHQ84_A, PHQ85_A, PHQ86_A, PHQ87_A, PHQ88_A)
`num_pain_locations`	input	quant	Total number of locations in which subject reported being bothered at least “a little” by pain in the past three months (0 - 6)	PAIBACK3M_A, PAIULMB3M_A, PAILLMB3M_A, PAIHDFC3M_A, PAIAPG3M_A, PAITOOTH3M_A
`ihm4pain_num`	outcome	quant	How many integrative health and medicine (IHM) modalities (i.e., chiropractic care, yoga/Tai Chi, meditation/guided imagery, or massage) subject used to manage pain in the past three months (0, 1, 2, 3, or 4)	PAICHIRO_A, PAIYOGA_A, PAIMASSAGE_A, PAIMEDITAT_A
`race_ethnicity`	input	4-cat	Race and Hispanic origin (NH White only, Hispanic, NH Black/African American only, or Other single and multiple races)	HISPALLP_A
`fam_income`	input	quant	Adult annual family income	FAMINCTC_A
`nonpharm_opioid`	outcome	4-cat	Engagement in one of the following categories of care for pain in the past three months (None, Opioids only, Nonpharm only, Opioids + Nonpharm).	PAICHIRO_A, PAIYOGA_A, PAIMASSAGE_A, PAIMEDITAT_A, PAIPHYSTPY_A, PAITALKTPY_A, PAIPROGRAM_A, PAIGROUP_A, RX12M_A, OPD12M_A, OPD3M_A, OPDCHRONIC_A, OPDFREQ_A
`ihm4pain_bin`	outcome	quant	Whether or not subject engaged in integrative health and medicine (IHM) modalities (i.e., chiropractic care, yoga/Tai Chi, meditation/guided imagery, or massage) to manage pain in the past three months (0, 1, 2, 3, or 4)	PAICHIRO_A, PAIYOGA_A, PAIMASSAGE_A, PAIMEDITAT_A
`ihm4pain_cat`	outcome	quant	Which type of (IHM) modalities (i.e., chiropractic care, yoga/Tai Chi, meditation/guided imagery, or massage) subject used to manage pain in the past three months (None, Manual only, Mind-body only, Manual and mind-body)	PAICHIRO_A, PAIYOGA_A, PAIMASSAGE_A, PAIMEDITAT_A
`wt_final`	weight	quant	Weight - Final Annual	WTFA_A

4.2 Numerical Description

The following code provides a numerical summary of all variables included in the projectb_data dataset using the describe() function from the Hmisc() library.

In examining the data, I confirm that:

There are no impossible values for age (i.e., > 85 or < 18), fam_income, or phq8total (i.e., > 24 or < 0).
Each factor variable is in an appropriate ascending order.
All categorical and binary variables have at least 30 observations in each category.

Code

describe(projectb_data)

projectb_data 

 17  Variables      7114  Observations
--------------------------------------------------------------------------------
subject_id 
       n  missing distinct 
    7114        0     7114 

lowest : H000003 H000005 H000010 H000023 H000040
highest: H064153 H064154 H064160 H064165 H064180
--------------------------------------------------------------------------------
age 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    7107        7       68        1    58.68    18.56       29       35 
     .25      .50      .75      .90      .95 
      48       60       71       80       85 

lowest : 18 19 20 21 22, highest: 81 82 83 84 85
--------------------------------------------------------------------------------
sex 
       n  missing distinct 
    7113        1        2 
                        
Value      Female   Male
Frequency    4050   3063
Proportion  0.569  0.431
--------------------------------------------------------------------------------
education 
       n  missing distinct 
    7073       41        5 

lowest : Less than high school                      High school graduate/GED                   Some college, no degree                    Associate's or bachelor's degree           Master's, doctoral, or professional degree
highest: Less than high school                      High school graduate/GED                   Some college, no degree                    Associate's or bachelor's degree           Master's, doctoral, or professional degree
--------------------------------------------------------------------------------
urban_rural 
       n  missing distinct 
    7114        0        4 
                                                                               
Value             Nonmetropolitan Medium and small metro     Large fringe metro
Frequency                    1491                   2500                   1456
Proportion                  0.210                  0.351                  0.205
                                 
Value         Large central metro
Frequency                    1667
Proportion                  0.234
--------------------------------------------------------------------------------
painlimit_lifework 
       n  missing distinct 
    7110        4        4 
                                                  
Value          Never Some days Most days Every day
Frequency       1999      2480      1088      1543
Proportion     0.281     0.349     0.153     0.217
--------------------------------------------------------------------------------
opioid_freq3mo 
       n  missing distinct 
    7114        0        4 
                                                  
Value          Never Some days Most days Every day
Frequency       5991       342       133       648
Proportion     0.842     0.048     0.019     0.091
--------------------------------------------------------------------------------
gad7total 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    7033       81       22    0.946    3.977    5.245        0        0 
     .25      .50      .75      .90      .95 
       0        2        6       12       16 

lowest :  0  1  2  3  4, highest: 17 18 19 20 21
--------------------------------------------------------------------------------
phq8total 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    7007      107       25    0.985    5.058    5.674        0        0 
     .25      .50      .75      .90      .95 
       1        3        8       13       17 

lowest :  0  1  2  3  4, highest: 20 21 22 23 24
--------------------------------------------------------------------------------
num_pain_locations 
       n  missing distinct     Info     Mean      Gmd 
    7106        8        7    0.954    3.071    1.514 
                                                    
Value          0     1     2     3     4     5     6
Frequency     82   850  1554  2034  1469   819   298
Proportion 0.012 0.120 0.219 0.286 0.207 0.115 0.042

For the frequency table, variable is rounded to the nearest 0
--------------------------------------------------------------------------------
ihm4pain_num 
       n  missing distinct     Info     Mean      Gmd 
    7114        0        5    0.725    0.545   0.7913 
                                        
Value          0     1     2     3     4
Frequency   4554  1602   667   223    68
Proportion 0.640 0.225 0.094 0.031 0.010

For the frequency table, variable is rounded to the nearest 0
--------------------------------------------------------------------------------
race_ethnicity 
       n  missing distinct 
    7114        0        4 
                                                                          
Value                        NH White only                        Hispanic
Frequency                             5421                             605
Proportion                           0.762                           0.085
                                                                          
Value       NH Black/African American only Other single and multiple races
Frequency                              743                             345
Proportion                           0.104                           0.048
--------------------------------------------------------------------------------
fam_income 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    7114        0     1225        1    57217    51257     8741    11000 
     .25      .50      .75      .90      .95 
   20110    40434    79000   125000   160000 

lowest :      0      1      4      5     10, highest: 209500 210000 215341 218000 220000
--------------------------------------------------------------------------------
nonpharm_opioid 
       n  missing distinct 
    7114        0        4 
                                                                   
Value                    None       Opioids only      Nonpharm only
Frequency                3201                510               2790
Proportion              0.450              0.072              0.392
                             
Value      Opioids + Nonpharm
Frequency                 613
Proportion              0.086
--------------------------------------------------------------------------------
ihm4pain_bin 
       n  missing distinct     Info      Sum     Mean      Gmd 
    7114        0        2    0.691     2560   0.3599   0.4608 

--------------------------------------------------------------------------------
ihm4pain_cat 
       n  missing distinct 
    7114        0        4 
                                                                         
Value                      None          Manual only       Mind-body only
Frequency                  4554                 1114                  815
Proportion                0.640                0.157                0.115
                               
Value      Manual and mind-body
Frequency                   631
Proportion                0.089
--------------------------------------------------------------------------------
wt_final 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    7114        0     7072        1     6986     4729     2341     2761 
     .25      .50      .75      .90      .95 
    3692     5837     8509    12623    16424 

lowest : 733.485 738.112 762.185 763.376 822.349
highest: 45380.6 48013.1 49073.1 51662.9 78079.9
--------------------------------------------------------------------------------

5 Single Imputation

5.1 Missingness

For this analysis, I assumed that the data were Missing at Random (MAR). Specifically, I assumed that the probability of a predictor missing data depended upon available information from the other predictors. For example, missing phq8total data may be related to other variables such as how much pain limited the subject’s life or work activities (painlimit_lifework) or race_ethnicity rather than their level of depression. In this situation, certain subjects may have been less likely to rate their anxiety based on their other characteristics. Thus, single imputation was warranted and required as missingness could be accounted for by studying variables with complete information. Missing NOT at Random (MNAR) was not an appropriate assumption here as there was no indication in the data or the 2019 NHIS Codebook that data were missing due to the variables themselves.

5.1.1 Missing Values

To explore missingness within the projectb_data, I first used the n_miss() to quantify total missing values.

Code

projectb_data |> n_miss()

[1] 249

5.1.2 Missing Values Within Variables

I then used the miss_var_summary() to determine which variables contained those 249 missing values. These variables included phq8total, gad7total, education, num_pain_locations, age, painlimit_lifework, and sex. Given the relative infrequency of missing data (highest = 1.50%), I expected that performing imputation would not alter subsequent models in any meaningful way.

Code

miss_var_summary(projectb_data) |>   flextable() |>
  colformat_double(digits = 2) |>
  autofit() |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

variable	n_miss	pct_miss
phq8total	107	1.50
gad7total	81	1.14
education	41	0.58
num_pain_locations	8	0.11
age	7	0.10
painlimit_lifework	4	0.06
sex	1	0.01
subject_id	0	0.00
urban_rural	0	0.00
opioid_freq3mo	0	0.00
ihm4pain_num	0	0.00
race_ethnicity	0	0.00
fam_income	0	0.00
nonpharm_opioid	0	0.00
ihm4pain_bin	0	0.00
ihm4pain_cat	0	0.00
wt_final	0	0.00

5.1.3 Missing Values within Cases

Finally, to understand missing values within cases, I used the miss_case_table() function, which revealed that 159 cases were missing 1 value, 42 cases were missing 2 values, and 2 cases were missing 3 values

Code

miss_case_table(projectb_data) |> flextable() |>
  colformat_double(digits = 0) |>
  autofit() |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

n_miss_in_case	n_cases	pct_cases
0	6,911	97
1	159	2
2	42	1
3	2	0

5.2 Generating the Single Imputation Dataset

To perform single imputation and generate the projectb_si dataset, I performed the following steps:

Set a seed of 19900317 using the set.seed() function
Created projectb_si as a data.frame() derived from projectb_data
Used robust linear models to impute the quantitative variable gad7total based on the predictors with no missing values: urban_rural, race_ethnicity, fam_income and opioid_freq3mo,
Used robust linear models to impute the quantitative variable phq8total based on gad7total and the predictors with no missing values: urban_rural, race_ethnicity, fam_income, and opioid_freq3mo.
Used classification and regression trees (impute_cart()) to impute the categorical variable education based on the now complete variable, phq8total, and the other complete predictors.
Used predictive mean matching (impute_pmm()) to impute the quantitative integer variable num_pain_locations based on the now complete variable, education, and the other complete predictors as I wanted the imputation to be an integer.
Used robust linear models to impute the quantitative variable age based on the now complete variable, num_pain_locations, and the other complete predictors.
Used classification and regression trees to impute the categorical variable painlimit_lifework based on the now complete variable, age, and the other complete predictors.
Used classification and regression trees to impute the categorical variable sex based on the now complete variable, painlimit_lifework, and the other complete predictors.
Used the as_tibble() function to ensure projectb_si was stored as a tibble.
I performed mutations to use the variable one in the survey design and transform covariates so that coefficients would represent meaningful increases in (1) age10 [10 unit increment increase in age]; (2) gad7total5unit [5-unit increases in GAD-7]; (3) phq8total5unit [5-unit increases in PHQ-8]; (4) fam_income10k [10,000 dollar increase in family income]; (5) fam_income10k [1,000 dollar increase in family income].
I created a meaningful factor for the ihm4pain_bin_fact variable (i.e., IHM vs. No IHM)
Used the n_miss() function to verify that projectb_si contained no missing values.

Code

set.seed(19900317)
projectb_si <- projectb_data |> data.frame() |>
  impute_rlm(gad7total ~ urban_rural + race_ethnicity +
               fam_income + opioid_freq3mo) |>
  impute_rlm(phq8total ~ gad7total + urban_rural + race_ethnicity +
               fam_income + opioid_freq3mo) |>
  impute_cart(education ~ phq8total + gad7total + urban_rural + race_ethnicity +
               fam_income + opioid_freq3mo) |>
  impute_pmm(num_pain_locations ~ phq8total + opioid_freq3mo + education + gad7total + 
               urban_rural + race_ethnicity + fam_income) |>
  impute_rlm(age ~ phq8total + num_pain_locations + opioid_freq3mo + education + 
               gad7total + urban_rural + race_ethnicity + 
               fam_income) |>
  impute_cart(painlimit_lifework ~ phq8total + age + num_pain_locations + 
                opioid_freq3mo + education + gad7total + 
                urban_rural + race_ethnicity + fam_income) |>
  impute_cart(sex ~ phq8total + painlimit_lifework + age + num_pain_locations + 
                opioid_freq3mo + education + gad7total + 
                urban_rural + race_ethnicity + fam_income) |>
  as_tibble() |>
  mutate(one = 1,
         age10 = age/10,
         gad7total5unit = gad7total/5,
         phq8total5unit = phq8total/5,
         fam_income10k = round(fam_income/10000, digits = 2),
         fam_income1k = round(fam_income/1000, digits = 2),
         ihm4pain_bin_fact = factor(case_when(ihm4pain_bin == 0 ~ "No IHM",
                                              ihm4pain_bin == 1 ~ "IHM")))
n_miss(projectb_si)

[1] 0

6 Model 1: Engagement in IHM

For Model 1, I sought to answer the question:

Which demographic, mental health, and pain-related variables are associated with engagement in the number of IHM modalities used for pain in the past 3 months among adults with chronic pain?

In this analysis, I chose to predict ihm4pain_num based on 10 variables: age10, sex, education, urban_rural, painlimit_lifework, opioid_freq3mo, phq8total5unit, num_pain_locations, race_ethnicity, and fam_income10k.

6.1 Table 1: Bivariate Analysis

The following code details the steps taken to generate Table 1.

6.1.1 Create Survey Design

Code

nhis_des <- svydesign(data = projectb_si, id = ~ subject_id,
                        weights = ~ wt_final, nest = TRUE)

6.1.2 Total population represented

Code

total_n <- svytotal(~ one, nhis_des, na.rm = FALSE) |> as_tibble() |>
  mutate(Variable = "Total, n",
         total = floor(total),
         total = formatC(total, format = "d", big.mark = ","),
         p.value = "") |>
  select(Variable, All = total, p.value)

Code

ihm_n <- svyby(~ one, ~ ihm4pain_bin, nhis_des, svytotal) |> as_tibble() |> filter(ihm4pain_bin == 1) |> select(IHM = one) |>
  mutate(IHM = round(IHM),
         IHM = formatC(IHM, format = "d", big.mark = ","))
no_ihm_n <- svyby(~ one, ~ ihm4pain_bin, nhis_des, svytotal) |> as_tibble() |> filter(ihm4pain_bin == 0) |> select(NoIHM = one) |>
  mutate(NoIHM = round(NoIHM),
         NoIHM = formatC(NoIHM, format = "d", big.mark = ","))
total_n <- bind_cols(total_n, ihm_n, no_ihm_n) |>
  mutate(All = as.character(All),
         IHM = as.character(IHM),
         NoIHM = as.character(NoIHM))

Code

total_n <- total_n |> select(Variable, All, IHM, NoIHM, p.value) 

total_n |>
  flextable() |>
  autofit() |>
  bold(bold = TRUE, part = "header") |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

Variable	All	IHM	NoIHM	p.value
Total, n	49,695,212	17,610,400	32,084,812

6.1.3 Crosstabulation

Code

style_percent_1digit <- purrr::partial(gtsummary::style_percent, digits = 1)
style_number_1digit <- purrr::partial(gtsummary::style_number, digits = 1)

table1_cross <- nhis_des |>
  tbl_svysummary(
    include = c(age, sex, race_ethnicity, education, 
                fam_income1k, urban_rural, num_pain_locations,
                painlimit_lifework, opioid_freq3mo, phq8total),
    type = c(num_pain_locations = "continuous",
             fam_income = "continuous"),
    statistic = list(all_continuous()  ~ "{mean}",
                     all_categorical() ~ "{p}"),
    label = list(age ~ "Age in years, mean (95% CI)",
                 sex ~ "Sex, % (95% CI)",
                 race_ethnicity ~ "Race/Ethnicity, % (95% CI)",
                 education ~ "Education, % (95% CI)",
                 fam_income1k ~ "Family income in $1000s, mean (95% CI)",
                 urban_rural ~ "Proximity to metro area, % (95% CI)",
                 num_pain_locations ~ "Number of pain locations, mean (95% CI)",
                 painlimit_lifework ~ "Pain limiting life/work, % (95% CI)",
                 opioid_freq3mo ~ "Opioid use in past 3 months, % (95% CI)",
                 phq8total ~ "PHQ-8 total, mean (95% CI)"),
    digits = list(all_continuous() ~ 1,
                  all_categorical() ~ 1),
    by = ihm4pain_bin_fact) |>
  add_overall() |>
  add_ci(pattern = "{stat} ({ci})", 
         style_fun = list(all_categorical() ~ style_percent_1digit, 
                          all_continuous() ~ style_number_1digit),
         statistic = list(all_categorical() ~ "{conf.low}-{conf.high}", 
                          all_continuous() ~ "{conf.low}-{conf.high}")) |>
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3)) |>
  modify_header(label = "Variable",
                stat_0 ~ "All",
                stat_1 ~ "IHM",
                stat_2 ~ "NoIHM",
                p.value ~ "p.value")

table1_cross

Variable	All^1,2	IHM^1,2	NoIHM^1,2	p.value³
Age in years, mean (95% CI)	55.4 (54.9-55.9)	51.8 (51.0-52.6)	57.3 (56.7-57.9)	<0.001
Sex, % (95% CI)				<0.001
Female	54.9 (53.5-56.3)	60.3 (57.9-62.6)	52.0 (50.2-53.7)
Male	45.1 (43.7-46.5)	39.7 (37.4-42.1)	48.0 (46.3-49.8)
Race/Ethnicity, % (95% CI)				0.005
NH White only	73.2 (71.9-74.5)	74.8 (72.7-76.8)	72.3 (70.6-73.9)
Hispanic	10.5 (9.59-11.5)	10.1 (8.79-11.6)	10.7 (9.52-12.0)
NH Black/African American only	11.0 (10.1-11.9)	9.0 (7.73-10.5)	12.1 (11.0-13.3)
Other single and multiple races	5.3 (4.66-6.03)	6.1 (5.01-7.33)	4.9 (4.11-5.81)
Education, % (95% CI)				<0.001
Less than high school	15.5 (14.4-16.7)	8.8 (7.45-10.4)	19.2 (17.7-20.7)
High school graduate/GED	30.4 (29.1-31.7)	25.2 (23.1-27.5)	33.2 (31.6-34.9)
Some college, no degree	19.1 (18.0-20.2)	21.3 (19.4-23.4)	17.8 (16.5-19.2)
Associate's or bachelor's degree	28.2 (27.0-29.4)	34.3 (32.2-36.6)	24.8 (23.4-26.3)
Master's, doctoral, or professional degree	6.9 (6.31-7.53)	10.3 (9.11-11.6)	5.0 (4.42-5.71)
Family income in $1000s, mean (95% CI)	63.1 (61.6-64.6)	71.7 (69.0-74.3)	58.4 (56.7-60.2)	<0.001
Proximity to metro area, % (95% CI)				<0.001
Nonmetropolitan	19.9 (18.8-21.0)	16.1 (14.5-17.9)	22.0 (20.6-23.5)
Medium and small metro	33.9 (32.6-35.3)	33.1 (30.9-35.3)	34.4 (32.8-36.1)
Large fringe metro	21.7 (20.5-22.9)	23.6 (21.6-25.8)	20.6 (19.2-22.1)
Large central metro	24.4 (23.2-25.7)	27.2 (25.1-29.4)	22.9 (21.4-24.5)
Number of pain locations, mean (95% CI)	3.1 (3.1-3.1)	3.4 (3.3-3.4)	3.0 (2.9-3.0)	<0.001
Pain limiting life/work, % (95% CI)				<0.001
Never	28.2 (27.0-29.5)	22.9 (21.1-24.9)	31.1 (29.5-32.8)
Some days	35.6 (34.2-36.9)	38.6 (36.2-40.9)	33.9 (32.2-35.6)
Most days	15.4 (14.4-16.5)	17.1 (15.2-19.0)	14.5 (13.3-15.8)
Every day	20.8 (19.7-22.0)	21.5 (19.6-23.5)	20.5 (19.1-21.9)
Opioid use in past 3 months, % (95% CI)				0.026
Never	84.8 (83.7-85.8)	85.5 (83.7-87.2)	84.4 (83.1-85.7)
Some days	4.7 (4.12-5.38)	5.1 (3.97-6.45)	4.5 (3.86-5.27)
Most days	1.8 (1.47-2.19)	2.2 (1.63-2.91)	1.6 (1.20-2.09)
Every day	8.7 (7.91-9.53)	7.2 (6.12-8.49)	9.5 (8.47-10.6)
PHQ-8 total, mean (95% CI)	5.3 (5.1-5.4)	5.7 (5.4-6.0)	5.0 (4.8-5.2)	<0.001
¹ Mean; %
² CI = Confidence Interval
³ Wilcoxon rank-sum test for complex survey samples; chi-squared test with Rao & Scott’s second-order correction

6.1.4 Assemble Table 1 Tibble

Code

table1_cross <- table1_cross |> as.data.frame() |> as_tibble()

table1_cross[is.na(table1_cross)] <- ""

table1_cross$p.value <- sub("0\\.",".", table1_cross$p.value)
table1_cross$p.value <- sub(" ","", table1_cross$p.value)

table1 <- bind_rows(total_n, table1_cross) |>
  rename("No IHM" = NoIHM,
         "All with chronic pain" = All,
         "Weighted variable" = Variable,
         "P value" = p.value)

6.1.5 Print Table 1

I used flextable to format Table 1.

Code

table1 |> flextable() |>
  add_header_lines(values = "Table 1: Socio-Demographic, Pain, and Mental Health Characteristics by IHM Engagement") |>
  bold(bold = TRUE, part = "header") |>
  bold(i = c(1, 2, 3, 6, 11, 17, 18, 23:24, 29, 34), j = 1,  bold = TRUE) |>
  padding(i = c(4:5, 7:10, 12:16, 19:22, 25:28, 30:33), j = 1, padding.left = 20) |>
  add_footer_lines(values = "Abbreviations: CI, confidence interval; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; IHM, integrative health and medicine; metro, metropolitan; NH, Non-Hispanic",
                   top = FALSE) |>
  flextable::footnote(i = c(2, 17, 23, 34), j = 5, 
                      ref_symbols = "a",
                      value = as_paragraph("Wilcoxon rank-sum test for complex survey samples")) |>
  flextable::footnote(i = c(3, 6, 11, 18, 24, 29), 
                      j = 5, 
                      ref_symbols = "b",
                      value = as_paragraph("Chi-squared test with Rao & Scott’s second-order correction")) |>
  autofit(part = c("body", "header"))

Table 1: Socio-Demographic, Pain, and Mental Health Characteristics by IHM Engagement
Weighted variable	All with chronic pain	IHM	No IHM	P value
Total, n	49,695,212	17,610,400	32,084,812
Age in years, mean (95% CI)	55.4 (54.9-55.9)	51.8 (51.0-52.6)	57.3 (56.7-57.9)	<.001a
Sex, % (95% CI)				<.001b
Female	54.9 (53.5-56.3)	60.3 (57.9-62.6)	52.0 (50.2-53.7)
Male	45.1 (43.7-46.5)	39.7 (37.4-42.1)	48.0 (46.3-49.8)
Race/Ethnicity, % (95% CI)				.005b
NH White only	73.2 (71.9-74.5)	74.8 (72.7-76.8)	72.3 (70.6-73.9)
Hispanic	10.5 (9.59-11.5)	10.1 (8.79-11.6)	10.7 (9.52-12.0)
NH Black/African American only	11.0 (10.1-11.9)	9.0 (7.73-10.5)	12.1 (11.0-13.3)
Other single and multiple races	5.3 (4.66-6.03)	6.1 (5.01-7.33)	4.9 (4.11-5.81)
Education, % (95% CI)				<.001b
Less than high school	15.5 (14.4-16.7)	8.8 (7.45-10.4)	19.2 (17.7-20.7)
High school graduate/GED	30.4 (29.1-31.7)	25.2 (23.1-27.5)	33.2 (31.6-34.9)
Some college, no degree	19.1 (18.0-20.2)	21.3 (19.4-23.4)	17.8 (16.5-19.2)
Associate's or bachelor's degree	28.2 (27.0-29.4)	34.3 (32.2-36.6)	24.8 (23.4-26.3)
Master's, doctoral, or professional degree	6.9 (6.31-7.53)	10.3 (9.11-11.6)	5.0 (4.42-5.71)
Family income in $1000s, mean (95% CI)	63.1 (61.6-64.6)	71.7 (69.0-74.3)	58.4 (56.7-60.2)	<.001a
Proximity to metro area, % (95% CI)				<.001b
Nonmetropolitan	19.9 (18.8-21.0)	16.1 (14.5-17.9)	22.0 (20.6-23.5)
Medium and small metro	33.9 (32.6-35.3)	33.1 (30.9-35.3)	34.4 (32.8-36.1)
Large fringe metro	21.7 (20.5-22.9)	23.6 (21.6-25.8)	20.6 (19.2-22.1)
Large central metro	24.4 (23.2-25.7)	27.2 (25.1-29.4)	22.9 (21.4-24.5)
Number of pain locations, mean (95% CI)	3.1 (3.1-3.1)	3.4 (3.3-3.4)	3.0 (2.9-3.0)	<.001a
Pain limiting life/work, % (95% CI)				<.001b
Never	28.2 (27.0-29.5)	22.9 (21.1-24.9)	31.1 (29.5-32.8)
Some days	35.6 (34.2-36.9)	38.6 (36.2-40.9)	33.9 (32.2-35.6)
Most days	15.4 (14.4-16.5)	17.1 (15.2-19.0)	14.5 (13.3-15.8)
Every day	20.8 (19.7-22.0)	21.5 (19.6-23.5)	20.5 (19.1-21.9)
Opioid use in past 3 months, % (95% CI)				.026b
Never	84.8 (83.7-85.8)	85.5 (83.7-87.2)	84.4 (83.1-85.7)
Some days	4.7 (4.12-5.38)	5.1 (3.97-6.45)	4.5 (3.86-5.27)
Most days	1.8 (1.47-2.19)	2.2 (1.63-2.91)	1.6 (1.20-2.09)
Every day	8.7 (7.91-9.53)	7.2 (6.12-8.49)	9.5 (8.47-10.6)
PHQ-8 total, mean (95% CI)	5.3 (5.1-5.4)	5.7 (5.4-6.0)	5.0 (4.8-5.2)	<.001a
Abbreviations: CI, confidence interval; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; IHM, integrative health and medicine; metro, metropolitan; NH, Non-Hispanic
aWilcoxon rank-sum test for complex survey samples
bChi-squared test with Rao & Scott’s second-order correction

6.2 Splitting into Training and Testing Samples

I then used the initial_split(), training(), and testing() functions to generate projectb_si_train (training data) and projectb_si_test (testing data), each with n = 3557 subjects.

Code

set.seed(19900317)

projectb_split <- initial_split(projectb_si, prop = 0.5)

projectb_si_train <- training(projectb_split)
projectb_si_test <- testing(projectb_split)

nrow(projectb_si_train)

[1] 3557

Code

nrow(projectb_si_test)

[1] 3557

6.3 Scatterplot Matrix and Collinearity

6.3.1 Scatterplot Matrix of Quantitative Predictors with Outcome

To investigate issues of collinearity prior to fitting models, I generated a series of scatterplot matrices using ggpairs(). First I assessed the relationship between the quantitative outcomes and ihm4pain_num.

Code

temp1quant <- projectb_si_train |> 
  select(age10, phq8total5unit, num_pain_locations, fam_income10k, ihm4pain_num)

ggpairs(temp1quant, 
        title = "Scatterplot Matrix of Quantitative Variables with Outcome",
        lower = list(combo = wrap("facethist", bins = 20)))

Code

rm(temp1quant)

In examining the plot above, there do not appear to be any strong (i.e., > 0.6) correlations between quantitative predictors and the outcome.

6.3.2 Scatterplot Matrix of Categorical Variables with Outcome

Then, I looked at the relationships between three of the categorical predictors and ihm4pain_num using ggpairs().

Code

temp2cat1 <- projectb_si_train |> 
  select(sex, education, urban_rural, ihm4pain_num)

ggpairs(temp2cat1, 
        title = "Scatterplot Matrix of Categorical Variables with Outcome",
        lower = list(combo = wrap("facethist", bins = 20)))

Code

rm(temp2cat1)

6.3.3 Scatterplot Matrix of Remaining Categorical Variables with Outcome

Finally, I looked at the relationships between the two remaining categorical predictors and ihm4pain_num using ggpairs().

Code

temp2cat2 <- projectb_si_train |> 
  select(painlimit_lifework, opioid_freq3mo, race_ethnicity, ihm4pain_num)

ggpairs(temp2cat2, 
        title = "Scatterplot Matrix of Categorical Variables with Outcome",
        lower = list(combo = wrap("facethist", bins = 20)))

Code

rm(temp2cat2)

To investigate this further, I calculated variance inflation factors (VIF) for each variable using the vif() function. Given the recommendation to look closely at any VIF value that is greater than 5, there does not seem to be an issue with collinearity in this full model as all VIF values are less than 1.4.

Code

vif(lm(ihm4pain_num ~ age10 + sex + education +
                     urban_rural + painlimit_lifework + opioid_freq3mo +
                     phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity, 
                   data = projectb_si_train))

                       GVIF Df GVIF^(1/(2*Df))
age10              1.141259  1        1.068297
sex                1.044327  1        1.021923
education          1.282077  4        1.031548
urban_rural        1.129965  3        1.020573
painlimit_lifework 1.294856  3        1.044007
opioid_freq3mo     1.093742  3        1.015046
phq8total5unit     1.396814  1        1.181869
num_pain_locations 1.242368  1        1.114616
fam_income10k      1.305455  1        1.142565
race_ethnicity     1.179024  3        1.027828

6.4 Considering Non-Linear Terms

To determine how best to potentially spend degrees of freedom on non-linear terms, I generated a Spearman $\rho^2$ plot using the plot() and spearman2()functions.

Code

plot(spearman2(ihm4pain_num ~ age10 + sex + education +
                 urban_rural + painlimit_lifework + opioid_freq3mo +
                 phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity, 
               data = projectb_si_train))

The plot suggests that I consider, for example, an interaction between education and age10 as a non-linear term. However, given that 1) this model already contains 21 degrees of freedom, I opted not to include any non-linear terms.

6.5 Fitting Models for Comparison

I chose to compare the following 6 modeling approaches for m1 before choosing a final version.

Poisson
Negative binomial regression
Zero-inflated Poisson
Zero-inflated negative binomial
Hurdle with Poisson
Hurdle with negative binomial

In the following series of code chunks, I performed the following

Fit each model using the appropriate function
Generated a summary of fit statistics including $R^2$, RMSE, MAE, AIC, and BIC using various functions including augment(), glance(), metric_set(), and predict().
Used functions including mutate(), group_by(), and summarize() to pivot the fit statistics from each model into a tidy tibble for combination later.

6.5.1 Poisson (`m1_pois`)

Code

m1_pois <- glm(ihm4pain_num ~ age10 + sex + education +
                 urban_rural + painlimit_lifework + opioid_freq3mo +
                 phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity, 
                 data = projectb_si_train, family = "poisson")

m1_pois_aug <- augment(m1_pois, projectb_si_train, type.predict = "response")
m1_pois_glance <- glance(m1_pois)

mets <- metric_set(rsq, rmse, mae)

m1_pois_summary <-
  mets(m1_pois_aug, truth = ihm4pain_num, estimate = .fitted) |>
  mutate(Model = "m1_pois",
         R2 = if_else(.metric == "rsq", .estimate, NA),
         RMSE = if_else(.metric == "rmse", .estimate, NA),
         MAE = if_else(.metric == "mae", .estimate, NA)) |> relocate(Model)

m1_pois_summary <- m1_pois_summary |>
  group_by(Model) |>
  summarize(R2 = mean(R2, na.rm = TRUE),
                   RMSE = mean(RMSE, na.rm = TRUE),
                   MAE = mean(MAE, na.rm = TRUE))

m1_pois_summary <- bind_cols(m1_pois_summary, m1_pois_glance) |>
  select(-c(logLik))
summary(m1_pois)


Call:
glm(formula = ihm4pain_num ~ age10 + sex + education + urban_rural + 
    painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations + 
    fam_income10k + race_ethnicity, family = "poisson", data = projectb_si_train)

Coefficients:
                                                     Estimate Std. Error
(Intercept)                                         -1.183452   0.162138
age10                                               -0.131602   0.014550
sexMale                                             -0.275074   0.047688
educationHigh school graduate/GED                    0.273482   0.105837
educationSome college, no degree                     0.656055   0.105739
educationAssociate's or bachelor's degree            0.812995   0.101206
educationMaster's, doctoral, or professional degree  0.960610   0.117198
urban_ruralMedium and small metro                    0.023538   0.068379
urban_ruralLarge fringe metro                        0.064347   0.075702
urban_ruralLarge central metro                       0.251440   0.071819
painlimit_lifeworkSome days                          0.205155   0.060383
painlimit_lifeworkMost days                          0.256226   0.074951
painlimit_lifeworkEvery day                          0.260852   0.075321
opioid_freq3moSome days                              0.036669   0.099685
opioid_freq3moMost days                              0.072801   0.166465
opioid_freq3moEvery day                             -0.182815   0.091255
phq8total5unit                                      -0.029353   0.024111
num_pain_locations                                   0.147973   0.018430
fam_income10k                                        0.027594   0.004528
race_ethnicityHispanic                              -0.008236   0.082486
race_ethnicityNH Black/African American only        -0.203382   0.084115
race_ethnicityOther single and multiple races       -0.180839   0.107077
                                                    z value Pr(>|z|)    
(Intercept)                                          -7.299 2.90e-13 ***
age10                                                -9.045  < 2e-16 ***
sexMale                                              -5.768 8.01e-09 ***
educationHigh school graduate/GED                     2.584 0.009766 ** 
educationSome college, no degree                      6.204 5.49e-10 ***
educationAssociate's or bachelor's degree             8.033 9.51e-16 ***
educationMaster's, doctoral, or professional degree   8.196 2.48e-16 ***
urban_ruralMedium and small metro                     0.344 0.730672    
urban_ruralLarge fringe metro                         0.850 0.395324    
urban_ruralLarge central metro                        3.501 0.000464 ***
painlimit_lifeworkSome days                           3.398 0.000680 ***
painlimit_lifeworkMost days                           3.419 0.000629 ***
painlimit_lifeworkEvery day                           3.463 0.000534 ***
opioid_freq3moSome days                               0.368 0.712985    
opioid_freq3moMost days                               0.437 0.661868    
opioid_freq3moEvery day                              -2.003 0.045140 *  
phq8total5unit                                       -1.217 0.223444    
num_pain_locations                                    8.029 9.82e-16 ***
fam_income10k                                         6.094 1.10e-09 ***
race_ethnicityHispanic                               -0.100 0.920464    
race_ethnicityNH Black/African American only         -2.418 0.015610 *  
race_ethnicityOther single and multiple races        -1.689 0.091246 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 4380.5  on 3556  degrees of freedom
Residual deviance: 3834.2  on 3535  degrees of freedom
AIC: 6841.7

Number of Fisher Scoring iterations: 6

6.5.2 Negative Binomial Regression (`m1_nb`)

Code

m1_nb <- glm.nb(ihm4pain_num ~ age10 + sex + education +
                 urban_rural + painlimit_lifework + opioid_freq3mo +
                 phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity, 
               data = projectb_si_train)

m1_nb_aug <- augment(m1_nb, projectb_si_train, type.predict = "response")
m1_nb_glance <- glance(m1_nb)

m1_nb_summary <-
  mets(m1_nb_aug, truth = ihm4pain_num, estimate = .fitted) |>
  mutate(Model = "m1_nb",
         R2 = if_else(.metric == "rsq", .estimate, NA),
         RMSE = if_else(.metric == "rmse", .estimate, NA),
         MAE = if_else(.metric == "mae", .estimate, NA)) |> relocate(Model)

m1_nb_summary <- m1_nb_summary |>
  group_by(Model) |>
  summarize(R2 = mean(R2, na.rm = TRUE),
                   RMSE = mean(RMSE, na.rm = TRUE),
                   MAE = mean(MAE, na.rm = TRUE))

m1_nb_summary <- bind_cols(m1_nb_summary, m1_nb_glance) |>
  select(-c(logLik))
summary(m1_nb)


Call:
glm.nb(formula = ihm4pain_num ~ age10 + sex + education + urban_rural + 
    painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations + 
    fam_income10k + race_ethnicity, data = projectb_si_train, 
    init.theta = 3.17461915, link = log)

Coefficients:
                                                      Estimate Std. Error
(Intercept)                                         -1.1668329  0.1757511
age10                                               -0.1340332  0.0160376
sexMale                                             -0.2773424  0.0523129
educationHigh school graduate/GED                    0.2712388  0.1116211
educationSome college, no degree                     0.6551017  0.1123832
educationAssociate's or bachelor's degree            0.8073798  0.1074271
educationMaster's, doctoral, or professional degree  0.9724037  0.1264575
urban_ruralMedium and small metro                    0.0128675  0.0743288
urban_ruralLarge fringe metro                        0.0597904  0.0826633
urban_ruralLarge central metro                       0.2508966  0.0786869
painlimit_lifeworkSome days                          0.2030962  0.0661423
painlimit_lifeworkMost days                          0.2544446  0.0825732
painlimit_lifeworkEvery day                          0.2508763  0.0825099
opioid_freq3moSome days                              0.0368888  0.1107997
opioid_freq3moMost days                              0.0825150  0.1823585
opioid_freq3moEvery day                             -0.1805531  0.0989803
phq8total5unit                                      -0.0297361  0.0266371
num_pain_locations                                   0.1490524  0.0203585
fam_income10k                                        0.0280486  0.0050948
race_ethnicityHispanic                               0.0009119  0.0910714
race_ethnicityNH Black/African American only        -0.2054487  0.0914048
race_ethnicityOther single and multiple races       -0.1778063  0.1179438
                                                    z value Pr(>|z|)    
(Intercept)                                          -6.639 3.16e-11 ***
age10                                                -8.357  < 2e-16 ***
sexMale                                              -5.302 1.15e-07 ***
educationHigh school graduate/GED                     2.430  0.01510 *  
educationSome college, no degree                      5.829 5.57e-09 ***
educationAssociate's or bachelor's degree             7.516 5.66e-14 ***
educationMaster's, doctoral, or professional degree   7.690 1.48e-14 ***
urban_ruralMedium and small metro                     0.173  0.86256    
urban_ruralLarge fringe metro                         0.723  0.46950    
urban_ruralLarge central metro                        3.189  0.00143 ** 
painlimit_lifeworkSome days                           3.071  0.00214 ** 
painlimit_lifeworkMost days                           3.081  0.00206 ** 
painlimit_lifeworkEvery day                           3.041  0.00236 ** 
opioid_freq3moSome days                               0.333  0.73919    
opioid_freq3moMost days                               0.452  0.65092    
opioid_freq3moEvery day                              -1.824  0.06813 .  
phq8total5unit                                       -1.116  0.26428    
num_pain_locations                                    7.321 2.45e-13 ***
fam_income10k                                         5.505 3.68e-08 ***
race_ethnicityHispanic                                0.010  0.99201    
race_ethnicityNH Black/African American only         -2.248  0.02460 *  
race_ethnicityOther single and multiple races        -1.508  0.13167    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(3.1746) family taken to be 1)

    Null deviance: 3741.6  on 3556  degrees of freedom
Residual deviance: 3278.2  on 3535  degrees of freedom
AIC: 6799.9

Number of Fisher Scoring iterations: 1

              Theta:  3.175 
          Std. Err.:  0.576 

 2 x log-likelihood:  -6753.854

6.5.3 Zero-inflated Poisson (`m1_zip`)

Code

m1_zip <- zeroinfl(ihm4pain_num ~ age10 + sex + education +
                 urban_rural + painlimit_lifework + opioid_freq3mo +
                 phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity, 
                 data = projectb_si_train)

m1_zip_aug <- projectb_si_train |>
    mutate(".fitted" = predict(m1_zip, type = "response"),
           ".resid" = resid(m1_zip, type = "response"))

m1_zip_summary <-
  mets(m1_zip_aug, truth = ihm4pain_num, estimate = .fitted) |>
  mutate(Model = "m1_zip",
         R2 = if_else(.metric == "rsq", .estimate, NA),
         RMSE = if_else(.metric == "rmse", .estimate, NA),
         MAE = if_else(.metric == "mae", .estimate, NA)) |> relocate(Model)

m1_zip_summary <- m1_zip_summary |>
  group_by(Model) |>
  summarize(R2 = mean(R2, na.rm = TRUE),
                   RMSE = mean(RMSE, na.rm = TRUE),
                   MAE = mean(MAE, na.rm = TRUE))

m1_zip_summary <- m1_zip_summary |>
  mutate(AIC = AIC(m1_zip),
         BIC = BIC(m1_zip))
summary(m1_zip)


Call:
zeroinfl(formula = ihm4pain_num ~ age10 + sex + education + urban_rural + 
    painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations + 
    fam_income10k + race_ethnicity, data = projectb_si_train)

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-1.3529 -0.6426 -0.4657  0.5057  7.1811 

Count model coefficients (poisson with log link):
                                                     Estimate Std. Error
(Intercept)                                         -0.415196   0.261037
age10                                               -0.135764   0.024135
sexMale                                             -0.210056   0.077075
educationHigh school graduate/GED                    0.282405   0.167720
educationSome college, no degree                     0.635911   0.165162
educationAssociate's or bachelor's degree            0.619413   0.160655
educationMaster's, doctoral, or professional degree  0.761770   0.176987
urban_ruralMedium and small metro                    0.012747   0.111366
urban_ruralLarge fringe metro                        0.063042   0.119672
urban_ruralLarge central metro                       0.186333   0.115089
painlimit_lifeworkSome days                          0.204220   0.098190
painlimit_lifeworkMost days                          0.319729   0.112174
painlimit_lifeworkEvery day                          0.201525   0.119923
opioid_freq3moSome days                              0.040076   0.141439
opioid_freq3moMost days                              0.221426   0.248309
opioid_freq3moEvery day                             -0.075506   0.156667
phq8total5unit                                      -0.056037   0.038500
num_pain_locations                                   0.082879   0.026625
fam_income10k                                        0.017314   0.006237
race_ethnicityHispanic                              -0.359617   0.095201
race_ethnicityNH Black/African American only        -0.110367   0.145961
race_ethnicityOther single and multiple races       -0.217057   0.206899
                                                    z value Pr(>|z|)    
(Intercept)                                          -1.591 0.111707    
age10                                                -5.625 1.85e-08 ***
sexMale                                              -2.725 0.006423 ** 
educationHigh school graduate/GED                     1.684 0.092223 .  
educationSome college, no degree                      3.850 0.000118 ***
educationAssociate's or bachelor's degree             3.856 0.000115 ***
educationMaster's, doctoral, or professional degree   4.304 1.68e-05 ***
urban_ruralMedium and small metro                     0.114 0.908874    
urban_ruralLarge fringe metro                         0.527 0.598340    
urban_ruralLarge central metro                        1.619 0.105441    
painlimit_lifeworkSome days                           2.080 0.037541 *  
painlimit_lifeworkMost days                           2.850 0.004368 ** 
painlimit_lifeworkEvery day                           1.680 0.092869 .  
opioid_freq3moSome days                               0.283 0.776915    
opioid_freq3moMost days                               0.892 0.372536    
opioid_freq3moEvery day                              -0.482 0.629839    
phq8total5unit                                       -1.455 0.145531    
num_pain_locations                                    3.113 0.001853 ** 
fam_income10k                                         2.776 0.005501 ** 
race_ethnicityHispanic                               -3.777 0.000158 ***
race_ethnicityNH Black/African American only         -0.756 0.449562    
race_ethnicityOther single and multiple races        -1.049 0.294133    

Zero-inflation model coefficients (binomial with logit link):
                                                      Estimate Std. Error
(Intercept)                                          6.712e-01  7.979e-01
age10                                               -4.693e-03  7.769e-02
sexMale                                              2.826e-01  2.410e-01
educationHigh school graduate/GED                   -2.112e-02  4.783e-01
educationSome college, no degree                    -9.915e-02  4.802e-01
educationAssociate's or bachelor's degree           -7.535e-01  5.073e-01
educationMaster's, doctoral, or professional degree -8.290e-01  5.714e-01
urban_ruralMedium and small metro                   -1.014e-02  3.239e-01
urban_ruralLarge fringe metro                       -7.261e-04  3.504e-01
urban_ruralLarge central metro                      -2.918e-01  3.640e-01
painlimit_lifeworkSome days                          8.064e-03  3.156e-01
painlimit_lifeworkMost days                          2.778e-01  3.413e-01
painlimit_lifeworkEvery day                         -1.917e-01  3.966e-01
opioid_freq3moSome days                              1.341e-02  4.731e-01
opioid_freq3moMost days                              3.755e-01  6.382e-01
opioid_freq3moEvery day                              3.516e-01  4.513e-01
phq8total5unit                                      -1.027e-01  1.431e-01
num_pain_locations                                  -2.714e-01  8.580e-02
fam_income10k                                       -4.744e-02  2.318e-02
race_ethnicityHispanic                              -1.585e+01  1.817e+03
race_ethnicityNH Black/African American only         2.591e-01  3.857e-01
race_ethnicityOther single and multiple races       -1.622e-01  8.016e-01
                                                    z value Pr(>|z|)   
(Intercept)                                           0.841  0.40024   
age10                                                -0.060  0.95184   
sexMale                                               1.173  0.24089   
educationHigh school graduate/GED                    -0.044  0.96478   
educationSome college, no degree                     -0.206  0.83644   
educationAssociate's or bachelor's degree            -1.485  0.13749   
educationMaster's, doctoral, or professional degree  -1.451  0.14687   
urban_ruralMedium and small metro                    -0.031  0.97502   
urban_ruralLarge fringe metro                        -0.002  0.99835   
urban_ruralLarge central metro                       -0.802  0.42270   
painlimit_lifeworkSome days                           0.026  0.97961   
painlimit_lifeworkMost days                           0.814  0.41575   
painlimit_lifeworkEvery day                          -0.483  0.62889   
opioid_freq3moSome days                               0.028  0.97738   
opioid_freq3moMost days                               0.588  0.55624   
opioid_freq3moEvery day                               0.779  0.43597   
phq8total5unit                                       -0.718  0.47280   
num_pain_locations                                   -3.163  0.00156 **
fam_income10k                                        -2.046  0.04073 * 
race_ethnicityHispanic                               -0.009  0.99304   
race_ethnicityNH Black/African American only          0.672  0.50171   
race_ethnicityOther single and multiple races        -0.202  0.83960   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 86 
Log-likelihood: -3347 on 44 Df

6.5.4 Zero-inflated Negative Binomial (`m1_zinb`)

Code

m1_zinb <- zeroinfl(ihm4pain_num ~ age10 + sex + education +
                          urban_rural + painlimit_lifework + opioid_freq3mo +
                          phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity,
                        dist = "negbin", data = projectb_si_train)

m1_zinb_aug <- projectb_si_train |>
    mutate(".fitted" = predict(m1_zinb, type = "response"),
           ".resid" = resid(m1_zinb, type = "response"))

m1_zinb_summary <-
  mets(m1_zinb_aug, truth = ihm4pain_num, estimate = .fitted) |>
  mutate(Model = "m1_zinb",
         R2 = if_else(.metric == "rsq", .estimate, NA),
         RMSE = if_else(.metric == "rmse", .estimate, NA),
         MAE = if_else(.metric == "mae", .estimate, NA)) |> relocate(Model)

m1_zinb_summary <- m1_zinb_summary |>
  group_by(Model) |>
  summarize(R2 = mean(R2, na.rm = TRUE),
                   RMSE = mean(RMSE, na.rm = TRUE),
                   MAE = mean(MAE, na.rm = TRUE))

m1_zinb_summary <- m1_zinb_summary |>
  mutate(AIC = AIC(m1_zinb),
         BIC = BIC(m1_zinb))
summary(m1_zinb)


Call:
zeroinfl(formula = ihm4pain_num ~ age10 + sex + education + urban_rural + 
    painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations + 
    fam_income10k + race_ethnicity, data = projectb_si_train, dist = "negbin")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-1.3529 -0.6426 -0.4657  0.5057  7.1811 

Count model coefficients (negbin with log link):
                                                     Estimate Std. Error
(Intercept)                                         -0.415181   0.261036
age10                                               -0.135766   0.024135
sexMale                                             -0.210059   0.077075
educationHigh school graduate/GED                    0.282398   0.167720
educationSome college, no degree                     0.635914   0.165161
educationAssociate's or bachelor's degree            0.619416   0.160655
educationMaster's, doctoral, or professional degree  0.761771   0.176987
urban_ruralMedium and small metro                    0.012739   0.111366
urban_ruralLarge fringe metro                        0.063033   0.119671
urban_ruralLarge central metro                       0.186324   0.115088
painlimit_lifeworkSome days                          0.204214   0.098190
painlimit_lifeworkMost days                          0.319721   0.112174
painlimit_lifeworkEvery day                          0.201516   0.119922
opioid_freq3moSome days                              0.040082   0.141439
opioid_freq3moMost days                              0.221426   0.248310
opioid_freq3moEvery day                             -0.075485   0.156665
phq8total5unit                                      -0.056035   0.038500
num_pain_locations                                   0.082879   0.026625
fam_income10k                                        0.017315   0.006237
race_ethnicityHispanic                              -0.359612   0.095200
race_ethnicityNH Black/African American only        -0.110358   0.145960
race_ethnicityOther single and multiple races       -0.217029   0.206893
Log(theta)                                          15.442114  25.722300
                                                    z value Pr(>|z|)    
(Intercept)                                          -1.591 0.111719    
age10                                                -5.625 1.85e-08 ***
sexMale                                              -2.725 0.006422 ** 
educationHigh school graduate/GED                     1.684 0.092230 .  
educationSome college, no degree                      3.850 0.000118 ***
educationAssociate's or bachelor's degree             3.856 0.000115 ***
educationMaster's, doctoral, or professional degree   4.304 1.68e-05 ***
urban_ruralMedium and small metro                     0.114 0.908932    
urban_ruralLarge fringe metro                         0.527 0.598391    
urban_ruralLarge central metro                        1.619 0.105455    
painlimit_lifeworkSome days                           2.080 0.037545 *  
painlimit_lifeworkMost days                           2.850 0.004369 ** 
painlimit_lifeworkEvery day                           1.680 0.092881 .  
opioid_freq3moSome days                               0.283 0.776883    
opioid_freq3moMost days                               0.892 0.372537    
opioid_freq3moEvery day                              -0.482 0.629929    
phq8total5unit                                       -1.455 0.145539    
num_pain_locations                                    3.113 0.001853 ** 
fam_income10k                                         2.776 0.005497 ** 
race_ethnicityHispanic                               -3.777 0.000158 ***
race_ethnicityNH Black/African American only         -0.756 0.449599    
race_ethnicityOther single and multiple races        -1.049 0.294182    
Log(theta)                                            0.600 0.548280    

Zero-inflation model coefficients (binomial with logit link):
                                                      Estimate Std. Error
(Intercept)                                          6.712e-01  7.979e-01
age10                                               -4.698e-03  7.769e-02
sexMale                                              2.826e-01  2.410e-01
educationHigh school graduate/GED                   -2.113e-02  4.783e-01
educationSome college, no degree                    -9.913e-02  4.802e-01
educationAssociate's or bachelor's degree           -7.535e-01  5.073e-01
educationMaster's, doctoral, or professional degree -8.290e-01  5.714e-01
urban_ruralMedium and small metro                   -1.017e-02  3.239e-01
urban_ruralLarge fringe metro                       -7.511e-04  3.504e-01
urban_ruralLarge central metro                      -2.918e-01  3.640e-01
painlimit_lifeworkSome days                          8.048e-03  3.156e-01
painlimit_lifeworkMost days                          2.778e-01  3.413e-01
painlimit_lifeworkEvery day                         -1.917e-01  3.966e-01
opioid_freq3moSome days                              1.343e-02  4.731e-01
opioid_freq3moMost days                              3.755e-01  6.382e-01
opioid_freq3moEvery day                              3.516e-01  4.513e-01
phq8total5unit                                      -1.027e-01  1.431e-01
num_pain_locations                                  -2.714e-01  8.580e-02
fam_income10k                                       -4.744e-02  2.318e-02
race_ethnicityHispanic                              -1.685e+01  2.986e+03
race_ethnicityNH Black/African American only         2.591e-01  3.857e-01
race_ethnicityOther single and multiple races       -1.622e-01  8.015e-01
                                                    z value Pr(>|z|)   
(Intercept)                                           0.841  0.40021   
age10                                                -0.060  0.95178   
sexMale                                               1.173  0.24091   
educationHigh school graduate/GED                    -0.044  0.96477   
educationSome college, no degree                     -0.206  0.83645   
educationAssociate's or bachelor's degree            -1.485  0.13750   
educationMaster's, doctoral, or professional degree  -1.451  0.14687   
urban_ruralMedium and small metro                    -0.031  0.97496   
urban_ruralLarge fringe metro                        -0.002  0.99829   
urban_ruralLarge central metro                       -0.802  0.42265   
painlimit_lifeworkSome days                           0.026  0.97965   
painlimit_lifeworkMost days                           0.814  0.41578   
painlimit_lifeworkEvery day                          -0.483  0.62884   
opioid_freq3moSome days                               0.028  0.97735   
opioid_freq3moMost days                               0.588  0.55624   
opioid_freq3moEvery day                               0.779  0.43589   
phq8total5unit                                       -0.718  0.47282   
num_pain_locations                                   -3.163  0.00156 **
fam_income10k                                        -2.046  0.04073 * 
race_ethnicityHispanic                               -0.006  0.99550   
race_ethnicityNH Black/African American only          0.672  0.50167   
race_ethnicityOther single and multiple races        -0.202  0.83968   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 5086566.3791 
Number of iterations in BFGS optimization: 86 
Log-likelihood: -3347 on 45 Df

6.5.5 Hurdle with Poisson (`m1_hurdle_pois`)

Code

m1_hurdle_pois <- hurdle(ihm4pain_num ~ age10 + sex + race_ethnicity + education +
                           fam_income10k + urban_rural + num_pain_locations +
                           painlimit_lifework + opioid_freq3mo + phq8total5unit,
                        dist = "poisson", zero.dist = "binomial", 
                        data = projectb_si_train)

m1_hurdle_pois_aug <- projectb_si_train |>
    mutate(".fitted" = predict(m1_hurdle_pois, type = "response"),
           ".resid" = resid(m1_hurdle_pois, type = "response"))

m1_hurdle_pois_summary <-
  mets(m1_hurdle_pois_aug, truth = ihm4pain_num, estimate = .fitted) |>
  mutate(Model = "m1_hurdle_pois",
         R2 = if_else(.metric == "rsq", .estimate, NA),
         RMSE = if_else(.metric == "rmse", .estimate, NA),
         MAE = if_else(.metric == "mae", .estimate, NA)) |> relocate(Model)

m1_hurdle_pois_summary <- m1_hurdle_pois_summary |>
  group_by(Model) |>
  summarize(R2 = mean(R2, na.rm = TRUE),
                   RMSE = mean(RMSE, na.rm = TRUE),
                   MAE = mean(MAE, na.rm = TRUE))

m1_hurdle_pois_summary <- m1_hurdle_pois_summary |>
  mutate(AIC = AIC(m1_hurdle_pois),
         BIC = BIC(m1_hurdle_pois))
summary(m1_hurdle_pois)


Call:
hurdle(formula = ihm4pain_num ~ age10 + sex + race_ethnicity + education + 
    fam_income10k + urban_rural + num_pain_locations + painlimit_lifework + 
    opioid_freq3mo + phq8total5unit, data = projectb_si_train, dist = "poisson", 
    zero.dist = "binomial")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-1.3352 -0.6384 -0.4592  0.5140  7.1628 

Count model coefficients (truncated poisson with log link):
                                                     Estimate Std. Error
(Intercept)                                         -0.045206   0.281619
age10                                               -0.143934   0.024059
sexMale                                             -0.178949   0.078531
race_ethnicityHispanic                              -0.220647   0.140983
race_ethnicityNH Black/African American only        -0.101348   0.141085
race_ethnicityOther single and multiple races       -0.098327   0.170501
educationHigh school graduate/GED                    0.054218   0.206801
educationSome college, no degree                     0.414146   0.200328
educationAssociate's or bachelor's degree            0.402233   0.194865
educationMaster's, doctoral, or professional degree  0.462490   0.213386
fam_income10k                                        0.014675   0.006929
urban_ruralMedium and small metro                    0.005048   0.114664
urban_ruralLarge fringe metro                       -0.009438   0.125995
urban_ruralLarge central metro                       0.148443   0.117913
num_pain_locations                                   0.079246   0.029779
painlimit_lifeworkSome days                          0.179310   0.099253
painlimit_lifeworkMost days                          0.248306   0.121008
painlimit_lifeworkEvery day                          0.148881   0.127017
opioid_freq3moSome days                              0.059283   0.158583
opioid_freq3moMost days                              0.113070   0.279708
opioid_freq3moEvery day                             -0.160421   0.164993
phq8total5unit                                      -0.085068   0.040396
                                                    z value Pr(>|z|)    
(Intercept)                                          -0.161  0.87247    
age10                                                -5.982  2.2e-09 ***
sexMale                                              -2.279  0.02268 *  
race_ethnicityHispanic                               -1.565  0.11757    
race_ethnicityNH Black/African American only         -0.718  0.47254    
race_ethnicityOther single and multiple races        -0.577  0.56415    
educationHigh school graduate/GED                     0.262  0.79319    
educationSome college, no degree                      2.067  0.03870 *  
educationAssociate's or bachelor's degree             2.064  0.03900 *  
educationMaster's, doctoral, or professional degree   2.167  0.03021 *  
fam_income10k                                         2.118  0.03418 *  
urban_ruralMedium and small metro                     0.044  0.96488    
urban_ruralLarge fringe metro                        -0.075  0.94029    
urban_ruralLarge central metro                        1.259  0.20806    
num_pain_locations                                    2.661  0.00779 ** 
painlimit_lifeworkSome days                           1.807  0.07083 .  
painlimit_lifeworkMost days                           2.052  0.04017 *  
painlimit_lifeworkEvery day                           1.172  0.24115    
opioid_freq3moSome days                               0.374  0.70853    
opioid_freq3moMost days                               0.404  0.68603    
opioid_freq3moEvery day                              -0.972  0.33091    
phq8total5unit                                       -2.106  0.03522 *  
Zero hurdle model coefficients (binomial with logit link):
                                                     Estimate Std. Error
(Intercept)                                         -1.479732   0.249322
age10                                               -0.142153   0.023765
sexMale                                             -0.375766   0.076124
race_ethnicityHispanic                               0.139327   0.135109
race_ethnicityNH Black/African American only        -0.272954   0.128870
race_ethnicityOther single and multiple races       -0.244884   0.174931
educationHigh school graduate/GED                    0.355758   0.142993
educationSome college, no degree                     0.759842   0.149059
educationAssociate's or bachelor's degree            1.039827   0.141786
educationMaster's, doctoral, or professional degree  1.328910   0.180035
fam_income10k                                        0.042015   0.008126
urban_ruralMedium and small metro                    0.027743   0.105044
urban_ruralLarge fringe metro                        0.114360   0.118742
urban_ruralLarge central metro                       0.346088   0.114916
num_pain_locations                                   0.212404   0.030285
painlimit_lifeworkSome days                          0.239884   0.096082
painlimit_lifeworkMost days                          0.275086   0.122223
painlimit_lifeworkEvery day                          0.325687   0.118433
opioid_freq3moSome days                              0.043253   0.167722
opioid_freq3moMost days                              0.107665   0.265744
opioid_freq3moEvery day                             -0.194347   0.137609
phq8total5unit                                      -0.001161   0.039121
                                                    z value Pr(>|z|)    
(Intercept)                                          -5.935 2.94e-09 ***
age10                                                -5.982 2.21e-09 ***
sexMale                                              -4.936 7.97e-07 ***
race_ethnicityHispanic                                1.031  0.30244    
race_ethnicityNH Black/African American only         -2.118  0.03417 *  
race_ethnicityOther single and multiple races        -1.400  0.16155    
educationHigh school graduate/GED                     2.488  0.01285 *  
educationSome college, no degree                      5.098 3.44e-07 ***
educationAssociate's or bachelor's degree             7.334 2.24e-13 ***
educationMaster's, doctoral, or professional degree   7.381 1.57e-13 ***
fam_income10k                                         5.170 2.34e-07 ***
urban_ruralMedium and small metro                     0.264  0.79170    
urban_ruralLarge fringe metro                         0.963  0.33550    
urban_ruralLarge central metro                        3.012  0.00260 ** 
num_pain_locations                                    7.013 2.33e-12 ***
painlimit_lifeworkSome days                           2.497  0.01254 *  
painlimit_lifeworkMost days                           2.251  0.02440 *  
painlimit_lifeworkEvery day                           2.750  0.00596 ** 
opioid_freq3moSome days                               0.258  0.79650    
opioid_freq3moMost days                               0.405  0.68537    
opioid_freq3moEvery day                              -1.412  0.15786    
phq8total5unit                                       -0.030  0.97632    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 30 
Log-likelihood: -3346 on 44 Df

6.5.6 Hurdle with Negative Binomial (`m1_hurdle_nb`)

Code

m1_hurdle_nb <- hurdle(ihm4pain_num ~ age10 + sex + education +
                          urban_rural + painlimit_lifework + opioid_freq3mo +
                          phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity,
                       dist = "negbin", zero.dist = "binomial", 
                       data = projectb_si_train)

m1_hurdle_nb_aug <- projectb_si_train |>
    mutate(".fitted" = predict(m1_hurdle_nb, type = "response"),
           ".resid" = resid(m1_hurdle_nb, type = "response"))

m1_hurdle_nb_summary <-
  mets(m1_hurdle_nb_aug, truth = ihm4pain_num, estimate = .fitted) |>
  mutate(Model = "m1_hurdle_nb",
         R2 = if_else(.metric == "rsq", .estimate, NA),
         RMSE = if_else(.metric == "rmse", .estimate, NA),
         MAE = if_else(.metric == "mae", .estimate, NA)) |> relocate(Model)

m1_hurdle_nb_summary <- m1_hurdle_nb_summary |>
  group_by(Model) |>
  summarize(R2 = mean(R2, na.rm = TRUE),
                   RMSE = mean(RMSE, na.rm = TRUE),
                   MAE = mean(MAE, na.rm = TRUE))

m1_hurdle_nb_summary <- m1_hurdle_nb_summary |>
  mutate(AIC = AIC(m1_hurdle_nb),
         BIC = BIC(m1_hurdle_nb))
summary(m1_hurdle_nb)


Call:
hurdle(formula = ihm4pain_num ~ age10 + sex + education + urban_rural + 
    painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations + 
    fam_income10k + race_ethnicity, data = projectb_si_train, dist = "negbin", 
    zero.dist = "binomial")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-1.3352 -0.6384 -0.4592  0.5140  7.1628 

Count model coefficients (truncated negbin with log link):
                                                     Estimate Std. Error
(Intercept)                                         -0.045178   0.281617
age10                                               -0.143937   0.024059
sexMale                                             -0.178955   0.078531
educationHigh school graduate/GED                    0.054212   0.206800
educationSome college, no degree                     0.414166   0.200326
educationAssociate's or bachelor's degree            0.402225   0.194864
educationMaster's, doctoral, or professional degree  0.462490   0.213385
urban_ruralMedium and small metro                    0.005048   0.114663
urban_ruralLarge fringe metro                       -0.009442   0.125995
urban_ruralLarge central metro                       0.148444   0.117912
painlimit_lifeworkSome days                          0.179297   0.099253
painlimit_lifeworkMost days                          0.248299   0.121008
painlimit_lifeworkEvery day                          0.148878   0.127017
opioid_freq3moSome days                              0.059280   0.158583
opioid_freq3moMost days                              0.113059   0.279709
opioid_freq3moEvery day                             -0.160405   0.164991
phq8total5unit                                      -0.085068   0.040396
num_pain_locations                                   0.079248   0.029779
fam_income10k                                        0.014675   0.006929
race_ethnicityHispanic                              -0.220641   0.140982
race_ethnicityNH Black/African American only        -0.101337   0.141084
race_ethnicityOther single and multiple races       -0.098315   0.170499
Log(theta)                                          19.343411        NaN
                                                    z value Pr(>|z|)    
(Intercept)                                          -0.160  0.87255    
age10                                                -5.983  2.2e-09 ***
sexMale                                              -2.279  0.02268 *  
educationHigh school graduate/GED                     0.262  0.79321    
educationSome college, no degree                      2.067  0.03869 *  
educationAssociate's or bachelor's degree             2.064  0.03900 *  
educationMaster's, doctoral, or professional degree   2.167  0.03020 *  
urban_ruralMedium and small metro                     0.044  0.96488    
urban_ruralLarge fringe metro                        -0.075  0.94026    
urban_ruralLarge central metro                        1.259  0.20805    
painlimit_lifeworkSome days                           1.806  0.07085 .  
painlimit_lifeworkMost days                           2.052  0.04018 *  
painlimit_lifeworkEvery day                           1.172  0.24115    
opioid_freq3moSome days                               0.374  0.70854    
opioid_freq3moMost days                               0.404  0.68606    
opioid_freq3moEvery day                              -0.972  0.33095    
phq8total5unit                                       -2.106  0.03522 *  
num_pain_locations                                    2.661  0.00779 ** 
fam_income10k                                         2.118  0.03419 *  
race_ethnicityHispanic                               -1.565  0.11758    
race_ethnicityNH Black/African American only         -0.718  0.47259    
race_ethnicityOther single and multiple races        -0.577  0.56419    
Log(theta)                                              NaN      NaN    
Zero hurdle model coefficients (binomial with logit link):
                                                     Estimate Std. Error
(Intercept)                                         -1.479732   0.249322
age10                                               -0.142153   0.023765
sexMale                                             -0.375766   0.076124
educationHigh school graduate/GED                    0.355758   0.142993
educationSome college, no degree                     0.759842   0.149059
educationAssociate's or bachelor's degree            1.039827   0.141786
educationMaster's, doctoral, or professional degree  1.328910   0.180035
urban_ruralMedium and small metro                    0.027743   0.105044
urban_ruralLarge fringe metro                        0.114360   0.118742
urban_ruralLarge central metro                       0.346088   0.114916
painlimit_lifeworkSome days                          0.239884   0.096082
painlimit_lifeworkMost days                          0.275086   0.122223
painlimit_lifeworkEvery day                          0.325687   0.118433
opioid_freq3moSome days                              0.043253   0.167722
opioid_freq3moMost days                              0.107665   0.265744
opioid_freq3moEvery day                             -0.194347   0.137609
phq8total5unit                                      -0.001161   0.039121
num_pain_locations                                   0.212404   0.030285
fam_income10k                                        0.042015   0.008126
race_ethnicityHispanic                               0.139327   0.135109
race_ethnicityNH Black/African American only        -0.272954   0.128870
race_ethnicityOther single and multiple races       -0.244884   0.174931
                                                    z value Pr(>|z|)    
(Intercept)                                          -5.935 2.94e-09 ***
age10                                                -5.982 2.21e-09 ***
sexMale                                              -4.936 7.97e-07 ***
educationHigh school graduate/GED                     2.488  0.01285 *  
educationSome college, no degree                      5.098 3.44e-07 ***
educationAssociate's or bachelor's degree             7.334 2.24e-13 ***
educationMaster's, doctoral, or professional degree   7.381 1.57e-13 ***
urban_ruralMedium and small metro                     0.264  0.79170    
urban_ruralLarge fringe metro                         0.963  0.33550    
urban_ruralLarge central metro                        3.012  0.00260 ** 
painlimit_lifeworkSome days                           2.497  0.01254 *  
painlimit_lifeworkMost days                           2.251  0.02440 *  
painlimit_lifeworkEvery day                           2.750  0.00596 ** 
opioid_freq3moSome days                               0.258  0.79650    
opioid_freq3moMost days                               0.405  0.68537    
opioid_freq3moEvery day                              -1.412  0.15786    
phq8total5unit                                       -0.030  0.97632    
num_pain_locations                                    7.013 2.33e-12 ***
fam_income10k                                         5.170 2.34e-07 ***
race_ethnicityHispanic                                1.031  0.30244    
race_ethnicityNH Black/African American only         -2.118  0.03417 *  
race_ethnicityOther single and multiple races        -1.400  0.16155    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta: count = 251615122.8885
Number of iterations in BFGS optimization: 49 
Log-likelihood: -3346 on 45 Df

6.6 Comparing `m1` Versions

6.6.1 Comparing with Rootograms

To begin a comparison of the 4 variations of Model 1, I generated four rootogram plots using the countreg::rootogram() function.

Code

par(mfrow = c(3,2))
countreg::rootogram(m1_pois, main = "Rootogram for m1_pois")
countreg::rootogram(m1_nb, main = "Rootogram for m1_nb")
countreg::rootogram(m1_zip, main = "Rootogram for m1_zip")
countreg::rootogram(m1_zinb, main = "Rootogram for m1_zinb")
countreg::rootogram(m1_hurdle_pois, main = "Rootogram for m1_hurdle_pois")
countreg::rootogram(m1_hurdle_nb, main = "Rootogram for m1_hurdle_nb")

Code

par(mfrow = c(1,1))

Looking at the plots above, there is a clear improvement observed between 1) the Poisson (m1_pois) and negative binomial (m1_nb) models; and 2) the zero-inflated (m1_zip and m1_zinb) and hurdle (m1_hurdle_pois and mt_hurdle_nb) models as the latter appear to fit 0 counts perfectly and lack the over/under-fitting of counts 1-3. However, there do not appear to be any noticeable differences in predicting counts between the bottom four models. Furthermore, it appears each model predicts counts of ihm4pain_num > 4, but not to a substantial degree. Certainly, some subjects may have engaged in more than four IHM modalities for pain, but these were not measured in the 2019 NHIS.

6.6.2 Comparing with Fit Statistics

To further examine differences, I generated m1_summary as a combination of the 6 previous model summaries using the bind_rows() function. I then selected the Model, $R^2$, RMSE, MAE, AIC, and BIC values for output.

Code

m1_summary <- bind_rows(m1_pois_summary, m1_nb_summary, 
                        m1_zip_summary, m1_zinb_summary, 
                        m1_hurdle_pois_summary, m1_hurdle_nb_summary)

m1_summary |> select(Model, R2, RMSE, MAE, AIC, BIC) |>
  flextable() |>
  colformat_double(j = 2:4, digits = 3) |>
  colformat_double(j = 5:6, digits = 0) |>
  bold(i = 5, j = 1:6, bold = TRUE) |>
  autofit() |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

Model	R2	RMSE	MAE	AIC	BIC
m1_pois	0.120	0.805	0.624	6,842	6,978
m1_nb	0.119	0.806	0.624	6,800	6,942
m1_zip	0.120	0.806	0.622	6,783	7,055
m1_zinb	0.120	0.806	0.622	6,785	7,063
m1_hurdle_pois	0.121	0.805	0.622	6,780	7,052
m1_hurdle_nb	0.121	0.805	0.622	6,782	7,060

Similar to the rootograms, there is a clear improvement in $R^2$ between 1) the Poisson (m1_pois) and negative binomial (m1_nb) models; and 2) the zero-inflated (m1_zip and m1_zinb) and hurdle (m1_hurdle_pois and mt_hurdle_nb) models. m1_hurdle_pois appears to be the best fit based on it’s $R^2$ (0.121), RMSE (0.805), and AIC (6780 [lowest overall]). Importantly, m1_hurdle_pois has the lowest BIC among the four models that had better rootograms in the plot above.

6.6.3 Comparing with Vuong tests

Finally, given the inclination to use m1_hurdle_pois, I used the vuong() function to compare it to the other models.

m1_hurdle_pois v. m1_pois

Code

vuong(m1_hurdle_pois, m1_pois)

Vuong Non-Nested Hypothesis Test-Statistic: 
(test-statistic is asymptotically distributed N(0,1) under the
 null that the models are indistinguishible)
-------------------------------------------------------------
              Vuong z-statistic             H_A    p-value
Raw                    4.991730 model1 > model2  2.992e-07
AIC-corrected          2.913802 model1 > model2 0.00178528
BIC-corrected         -3.503540 model2 > model1 0.00022956

m1_hurdle_pois is a better fit than m1_pois based on small p-values from the the Raw, and AIC-corrected Vuong tests.

m1_hurdle_pois v. m1_nb

Code

vuong(m1_hurdle_pois, m1_nb)

Vuong Non-Nested Hypothesis Test-Statistic: 
(test-statistic is asymptotically distributed N(0,1) under the
 null that the models are indistinguishible)
-------------------------------------------------------------
              Vuong z-statistic             H_A    p-value
Raw                    4.754520 model1 > model2 9.9459e-07
AIC-corrected          1.369766 model1 > model2    0.08538
BIC-corrected         -9.083491 model2 > model1 < 2.22e-16

m1_hurdle_pois is a better fit than m1_nb based on small p-values from the the Raw Vuong test.

m1_hurdle_pois v. m1_zip

Code

vuong(m1_hurdle_pois, m1_zip)

Vuong Non-Nested Hypothesis Test-Statistic: 
(test-statistic is asymptotically distributed N(0,1) under the
 null that the models are indistinguishible)
-------------------------------------------------------------
              Vuong z-statistic             H_A p-value
Raw                   0.3795831 model1 > model2 0.35213
AIC-corrected         0.3795831 model1 > model2 0.35213
BIC-corrected         0.3795831 model1 > model2 0.35213

m1_hurdle_pois is not meaningfully different than m1_zip (p = .352).

m1_hurdle_pois v. m1_zinb

Code

vuong(m1_hurdle_pois, m1_zinb)

Vuong Non-Nested Hypothesis Test-Statistic: 
(test-statistic is asymptotically distributed N(0,1) under the
 null that the models are indistinguishible)
-------------------------------------------------------------
              Vuong z-statistic             H_A p-value
Raw                   0.3795879 model1 > model2 0.35213
AIC-corrected         0.3795879 model1 > model2 0.35213
BIC-corrected         0.3795879 model1 > model2 0.35213

m1_hurdle_pois is not meaningfully different than m1_zinb (p = .352).

m1_hurdle_pois v. m1_hurdle_nb

Code

vuong(m1_hurdle_pois, m1_hurdle_nb)

Vuong Non-Nested Hypothesis Test-Statistic: 
(test-statistic is asymptotically distributed N(0,1) under the
 null that the models are indistinguishible)
-------------------------------------------------------------
              Vuong z-statistic             H_A p-value
Raw                -0.007316386 model2 > model1 0.49708
AIC-corrected      -0.007316386 model2 > model1 0.49708
BIC-corrected      -0.007316386 model2 > model1 0.49708

m1_hurdle_pois is not meaningfully different than m1_hurdle_nb (p = .497).

6.7 Final Model Choice

Based on a combination of the rootograms, the fit statistics, and the Vuong tests, I chose m1_hurdle_pois as the model for the count outcome.

The hurdle model also makes sense given its description (Ford 2016).

The hurdle model is a two-part model that specifies one process for zero counts and another process for positive counts. The idea is that positive counts occur once a threshold is crossed, or put another way, a hurdle is cleared. If the hurdle is not cleared, then we have a count of 0.

Conceptually, moving from no engagement in IHM modalities for pain to engaging in any number of IHM modalities for pain seems like a hurdle to model independently of the count data.

6.8 Fitting Weighted Hurdle Model

In the following code, I added weights to m1_hurdle_pois model using the code, weights = wt_final/mean(wt_final). I then printed a summary of the model.

Code

m1_hurdle_pois <- hurdle(ihm4pain_num ~ age10 + sex + race_ethnicity + education +
                           fam_income10k + urban_rural + num_pain_locations +
                           painlimit_lifework + opioid_freq3mo + phq8total5unit,
                        dist = "poisson", zero.dist = "binomial", 
                        weights = wt_final/mean(wt_final),
                        data = projectb_si_train)

mets <- metric_set(rsq, rmse, mae)

m1_hurdle_pois_aug <- projectb_si_train |>
    mutate(".fitted" = predict(m1_hurdle_pois, type = "response"),
           ".resid" = resid(m1_hurdle_pois, type = "response"))

m1_hurdle_pois_summary <-
  mets(m1_hurdle_pois_aug, truth = ihm4pain_num, estimate = .fitted) |>
  mutate(Model = "m1_hurdle_pois",
         R2 = if_else(.metric == "rsq", .estimate, NA),
         RMSE = if_else(.metric == "rmse", .estimate, NA),
         MAE = if_else(.metric == "mae", .estimate, NA)) |> relocate(Model)

m1_hurdle_pois_summary <- m1_hurdle_pois_summary |>
  group_by(Model) |>
  summarize(R2 = mean(R2, na.rm = TRUE),
                   RMSE = mean(RMSE, na.rm = TRUE),
                   MAE = mean(MAE, na.rm = TRUE))

m1_hurdle_pois_summary <- m1_hurdle_pois_summary |>
  mutate(AIC = AIC(m1_hurdle_pois),
         BIC = BIC(m1_hurdle_pois))

summary(m1_hurdle_pois)


Call:
hurdle(formula = ihm4pain_num ~ age10 + sex + race_ethnicity + education + 
    fam_income10k + urban_rural + num_pain_locations + painlimit_lifework + 
    opioid_freq3mo + phq8total5unit, data = projectb_si_train, weights = wt_final/mean(wt_final), 
    dist = "poisson", zero.dist = "binomial")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-1.8226 -0.5770 -0.3611  0.4732  8.1198 

Count model coefficients (truncated poisson with log link):
                                                     Estimate Std. Error
(Intercept)                                         -0.288706   0.286847
age10                                               -0.143633   0.025058
sexMale                                             -0.254302   0.080070
race_ethnicityHispanic                              -0.187190   0.136711
race_ethnicityNH Black/African American only        -0.096093   0.139174
race_ethnicityOther single and multiple races       -0.306117   0.184700
educationHigh school graduate/GED                    0.239399   0.208846
educationSome college, no degree                     0.638642   0.202086
educationAssociate's or bachelor's degree            0.572008   0.199347
educationMaster's, doctoral, or professional degree  0.648524   0.222696
fam_income10k                                        0.010163   0.006892
urban_ruralMedium and small metro                    0.066435   0.124326
urban_ruralLarge fringe metro                        0.056516   0.132087
urban_ruralLarge central metro                       0.216033   0.125488
num_pain_locations                                   0.097209   0.029301
painlimit_lifeworkSome days                          0.190048   0.101775
painlimit_lifeworkMost days                          0.273623   0.122793
painlimit_lifeworkEvery day                          0.165813   0.129089
opioid_freq3moSome days                             -0.118804   0.180672
opioid_freq3moMost days                              0.235398   0.269783
opioid_freq3moEvery day                             -0.265695   0.189130
phq8total5unit                                      -0.104129   0.039421
                                                    z value Pr(>|z|)    
(Intercept)                                          -1.006 0.314184    
age10                                                -5.732 9.92e-09 ***
sexMale                                              -3.176 0.001493 ** 
race_ethnicityHispanic                               -1.369 0.170925    
race_ethnicityNH Black/African American only         -0.690 0.489910    
race_ethnicityOther single and multiple races        -1.657 0.097443 .  
educationHigh school graduate/GED                     1.146 0.251674    
educationSome college, no degree                      3.160 0.001576 ** 
educationAssociate's or bachelor's degree             2.869 0.004112 ** 
educationMaster's, doctoral, or professional degree   2.912 0.003589 ** 
fam_income10k                                         1.475 0.140336    
urban_ruralMedium and small metro                     0.534 0.593090    
urban_ruralLarge fringe metro                         0.428 0.668743    
urban_ruralLarge central metro                        1.722 0.085154 .  
num_pain_locations                                    3.318 0.000908 ***
painlimit_lifeworkSome days                           1.867 0.061854 .  
painlimit_lifeworkMost days                           2.228 0.025859 *  
painlimit_lifeworkEvery day                           1.284 0.198972    
opioid_freq3moSome days                              -0.658 0.510815    
opioid_freq3moMost days                               0.873 0.382911    
opioid_freq3moEvery day                              -1.405 0.160072    
phq8total5unit                                       -2.641 0.008255 ** 
Zero hurdle model coefficients (binomial with logit link):
                                                     Estimate Std. Error
(Intercept)                                         -1.678505   0.240666
age10                                               -0.152575   0.023318
sexMale                                             -0.218202   0.076494
race_ethnicityHispanic                              -0.064007   0.129418
race_ethnicityNH Black/African American only        -0.380766   0.128323
race_ethnicityOther single and multiple races       -0.353984   0.170466
educationHigh school graduate/GED                    0.415864   0.135167
educationSome college, no degree                     0.785447   0.141858
educationAssociate's or bachelor's degree            1.053611   0.135944
educationMaster's, doctoral, or professional degree  1.396878   0.184676
fam_income10k                                        0.036441   0.007846
urban_ruralMedium and small metro                    0.064509   0.110355
urban_ruralLarge fringe metro                        0.262959   0.120586
urban_ruralLarge central metro                       0.463635   0.118048
num_pain_locations                                   0.180661   0.029999
painlimit_lifeworkSome days                          0.327998   0.097323
painlimit_lifeworkMost days                          0.410528   0.122285
painlimit_lifeworkEvery day                          0.472318   0.120034
opioid_freq3moSome days                              0.139740   0.168191
opioid_freq3moMost days                              0.161674   0.270131
opioid_freq3moEvery day                             -0.344714   0.145339
phq8total5unit                                       0.071419   0.037659
                                                    z value Pr(>|z|)    
(Intercept)                                          -6.974 3.07e-12 ***
age10                                                -6.543 6.01e-11 ***
sexMale                                              -2.853 0.004337 ** 
race_ethnicityHispanic                               -0.495 0.620900    
race_ethnicityNH Black/African American only         -2.967 0.003005 ** 
race_ethnicityOther single and multiple races        -2.077 0.037841 *  
educationHigh school graduate/GED                     3.077 0.002093 ** 
educationSome college, no degree                      5.537 3.08e-08 ***
educationAssociate's or bachelor's degree             7.750 9.16e-15 ***
educationMaster's, doctoral, or professional degree   7.564 3.91e-14 ***
fam_income10k                                         4.644 3.41e-06 ***
urban_ruralMedium and small metro                     0.585 0.558844    
urban_ruralLarge fringe metro                         2.181 0.029207 *  
urban_ruralLarge central metro                        3.928 8.58e-05 ***
num_pain_locations                                    6.022 1.72e-09 ***
painlimit_lifeworkSome days                           3.370 0.000751 ***
painlimit_lifeworkMost days                           3.357 0.000788 ***
painlimit_lifeworkEvery day                           3.935 8.32e-05 ***
opioid_freq3moSome days                               0.831 0.406064    
opioid_freq3moMost days                               0.599 0.549505    
opioid_freq3moEvery day                              -2.372 0.017702 *  
phq8total5unit                                        1.896 0.057901 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 29 
Log-likelihood: -3275 on 44 Df

6.9 Model Coefficients

To generate a singular summary of coefficients from m1_hurdle_pois, I performed the following

Generated an initial m1_hurdle_pois_coef tibble containing the variables Type (i.e., count or non-zero), Group (i.e., each variable group), and Comparison (e.g., Age +10yrs, Male v. Female) to match the output from m1_hurdle_pois
Generated m1_hurdle_pois_pe as a tibble of the exponentiated coefficients from m1_hurdle_pois
Generated m1_hurdle_pois_ci as a tibble of the exponentiated 95% CI from m1_hurdle_pois
Bound all of the above into the tibble, m1_hurdle_pois_coef.

Code

m1_hurdle_pois_coef <- tibble(Type = c("count", "count", "count", 
                                       "count", "count", "count",
                                       "count", "count", "count",
                                       "count", "count", "count",
                                       "count", "count", "count", 
                                       "count", "count", "count",
                                       "count", "count", "count", "count",
                                       "non-zero", "non-zero", "non-zero", 
                                       "non-zero", "non-zero", "non-zero",
                                       "non-zero", "non-zero", "non-zero", 
                                       "non-zero", "non-zero", "non-zero",
                                       "non-zero", "non-zero", "non-zero", 
                                       "non-zero", "non-zero", "non-zero",
                                       "non-zero", "non-zero", "non-zero", "non-zero"),
                              Group = c("(Intercept)", "Age", "Sex", 
                                        "Race/Ethnicity", "Race/Ethnicity", "Race/Ethnicity",
                                        "Education", "Education", 
                                        "Education", "Education", 
                                        "Family income", 
                                        "Proximity to metro area", "Proximity to metro area",
                                        "Proximity to metro area", 
                                        "Number of pain locations", 
                                        "Pain limiting life or work", 
                                        "Pain limiting life or work",
                                        "Pain limiting life or work", 
                                        "Opioid use past 3 mo", 
                                        "Opioid use past 3 mo",
                                        "Opioid use past 3 mo", 
                                        "PHQ-8", 
                                        "(Intercept)", "Age", "Sex", 
                                        "Race/Ethnicity", "Race/Ethnicity", "Race/Ethnicity",
                                        "Education", "Education", 
                                        "Education", "Education", 
                                        "Family income", 
                                        "Proximity to metro area", "Proximity to metro area",
                                        "Proximity to metro area", 
                                        "Number of pain locations", 
                                        "Pain limiting life or work", 
                                        "Pain limiting life or work",
                                        "Pain limiting life or work", 
                                        "Opioid use past 3 mo", 
                                        "Opioid use past 3 mo",
                                        "Opioid use past 3 mo", 
                                        "PHQ-8"),
                              Comparison = c("(Intercept)", "Age +10yrs", 
                                             "Male v. Female", 
                                             "Hispanic v. NH White",
                                             "NH Black v. NH White",
                                             "Other v. NH White",
                                             "HS/GED v. < HS", 
                                             "Some College v. < HS", 
                                             "Assoc/Bach v. < HS",
                                             "Masters/PhD v. < HS", 
                                             "Fam income +$10,000",
                                             "M/S metro v. Non-metro", 
                                             "L fringe metro v. Non-metro",
                                             "L central metro v. Non-metro", 
                                             "Pain locations +1", 
                                             "Pain limit some days v. Never", 
                                             "Pain limit most days v. Never",
                                             "Pain limit every day v. Never", 
                                             "Opioids some days v. Never", 
                                             "Opioids most days v. Never",
                                             "Opioids every day v. Never", 
                                             "PHQ-8 +5 units", 
                                             "(Intercept)", "Age +10yrs", 
                                             "Male v. Female", 
                                             "Hispanic v. NH White",
                                             "NH Black v. NH White",
                                             "Other v. NH White",
                                             "HS/GED v. < HS", 
                                             "Some College v. < HS", 
                                             "Assoc/Bach v. < HS",
                                             "Masters/PhD v. < HS", 
                                             "Fam income +$10,000",
                                             "M/S metro v. Non-metro", 
                                             "L fringe metro v. Non-metro",
                                             "L central metro v. Non-metro", 
                                             "Pain locations +1", 
                                             "Pain limit some days v. Never", 
                                             "Pain limit most days v. Never",
                                             "Pain limit every day v. Never", 
                                             "Opioids some days v. Never", 
                                             "Opioids most days v. Never",
                                             "Opioids every day v. Never", 
                                             "PHQ-8 +5 units"))

m1_hurdle_pois_pe <- exp(coef(m1_hurdle_pois)) |> as_tibble()
m1_hurdle_pois_ci <- exp(confint(m1_hurdle_pois)) |> as_tibble() |>
  clean_names() |>
  rename(Conf.low = "x2_5_percent",
         Conf.high = "x97_5_percent")

m1_hurdle_pois_coef <- bind_cols(m1_hurdle_pois_coef, 
                                 m1_hurdle_pois_pe, 
                                 m1_hurdle_pois_ci) |>
  rename(Estimate = value)

6.9.1 Numeric Summary of Odds of `ihm4pain_num` != 0

In the following code, I generated a numeric summary of the exponentiated coefficients from m1_hurdle_pois describing the odds of ihm4pain_num != 0.

Code

m1_hurdle_pois_coef |> 
  filter(Type == "non-zero") |>
  select(Group, Comparison, Estimate, Conf.low, Conf.high) |> 
  mutate(Estimate = round(Estimate, digits = 3),
         Conf.low = round(Conf.low, digits = 3),
         Conf.high = round(Conf.high, digits = 3)) |>
  rename(aOR = Estimate,
         "95% CI Low" = Conf.low,
         "95% CI High" = Conf.high) |>
  flextable() |>
  bold(bold = TRUE, part = "header") |>
  add_header_lines(values = "Supplemental Table 1: aOR of Engagement in IHM Modalities for Pain") |>
  autofit() |>
  add_footer_lines(values = "Abbreviations: Assoc, associate’s; aOR, adjusted odds ratio; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; mo, months; NH, Non-Hispanic; S, small; v., versus; yrs, years", top = FALSE) |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

Supplemental Table 1: aOR of Engagement in IHM Modalities for Pain
Group	Comparison	aOR	95% CI Low	95% CI High
(Intercept)	(Intercept)	0.187	0.116	0.299
Age	Age +10yrs	0.858	0.820	0.899
Sex	Male v. Female	0.804	0.692	0.934
Race/Ethnicity	Hispanic v. NH White	0.938	0.728	1.209
Race/Ethnicity	NH Black v. NH White	0.683	0.531	0.879
Race/Ethnicity	Other v. NH White	0.702	0.503	0.980
Education	HS/GED v. < HS	1.516	1.163	1.975
Education	Some College v. < HS	2.193	1.661	2.896
Education	Assoc/Bach v. < HS	2.868	2.197	3.744
Education	Masters/PhD v. < HS	4.043	2.815	5.806
Family income	Fam income +$10,000	1.037	1.021	1.053
Proximity to metro area	M/S metro v. Non-metro	1.067	0.859	1.324
Proximity to metro area	L fringe metro v. Non-metro	1.301	1.027	1.648
Proximity to metro area	L central metro v. Non-metro	1.590	1.261	2.004
Number of pain locations	Pain locations +1	1.198	1.130	1.271
Pain limiting life or work	Pain limit some days v. Never	1.388	1.147	1.680
Pain limiting life or work	Pain limit most days v. Never	1.508	1.186	1.916
Pain limiting life or work	Pain limit every day v. Never	1.604	1.268	2.029
Opioid use past 3 mo	Opioids some days v. Never	1.150	0.827	1.599
Opioid use past 3 mo	Opioids most days v. Never	1.175	0.692	1.996
Opioid use past 3 mo	Opioids every day v. Never	0.708	0.533	0.942
PHQ-8	PHQ-8 +5 units	1.074	0.998	1.156
Abbreviations: Assoc, associate’s; aOR, adjusted odds ratio; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; mo, months; NH, Non-Hispanic; S, small; v., versus; yrs, years

In examining the effects table above, I will summarize the effect of Masters/PhD v. < HS using two fictional cases: Luke Skywalker and Din Djarin

Suppose Luke Skywalker and Din Djarin 1) have the same demographic profile (i.e., same age, male); 2) reside in rural areas; 3) rated similar challenges related to pain interfering with life or work in the past 3 months; 4) used the same amount of opioids for chronic pain within the past 3 months; 5) reported the same score on the PHQ-8; 6) reported the same number of locations where they were bothered at least “a little” by pain in the past three months; 7) have the same family income; and 8) are of the same race and ethnicity. However, Luke Skywalker has a PhD in being a Jedi while Din Djarin did not graduate “This is the Way” High School. In this scenario, m1_hurdle_pois predicts that Luke Skywalker’s odds of ihm4pain_num != 0 will be 4.043 times that of Din Djarin.

The 95% confidence interval for this estimated odds ratio for the effect of Masters/PhD v. < HS on ihm4pain_num != 0 is (2.815, 5.806). This confidence interval does not include one. Thus, holding all other variables constant, having a Masters, PhD, or professional degree is associated with higher odds of having engaged in an IHM modality to manage pain in the past three months.

6.9.2 Numeric Summary of Relative Change in Count of `ihm4pain_num`

In the following code, I generated a numeric summary of the exponentiated coefficients from m1_hurdle_pois describing the predicted relative change in the count of ihm4pain_num.

Code

m1_hurdle_pois_coef |> 
  filter(Type == "count") |>
  select(Group, Comparison, Estimate, Conf.low, Conf.high) |> 
  mutate(Estimate = round(Estimate, digits = 3),
         Conf.low = round(Conf.low, digits = 3),
         Conf.high = round(Conf.high, digits = 3)) |>
  rename("Relative change" = Estimate,
         "95% CI Low" = Conf.low,
         "95% CI High" = Conf.high) |>
  flextable() |>
  bold(bold = TRUE, part = "header") |>
  add_header_lines(values = "Supplemental Table 2: Relative Change in Count of IHM Modalities Used for Pain") |>
  autofit() |>
  add_footer_lines(values = "Abbreviations: Assoc, associate’s; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; mo, months; NH, Non-Hispanic; S, small; v., versus; yrs, years", top = FALSE) |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

Supplemental Table 2: Relative Change in Count of IHM Modalities Used for Pain
Group	Comparison	Relative change	95% CI Low	95% CI High
(Intercept)	(Intercept)	0.749	0.427	1.315
Age	Age +10yrs	0.866	0.825	0.910
Sex	Male v. Female	0.775	0.663	0.907
Race/Ethnicity	Hispanic v. NH White	0.829	0.634	1.084
Race/Ethnicity	NH Black v. NH White	0.908	0.692	1.193
Race/Ethnicity	Other v. NH White	0.736	0.513	1.057
Education	HS/GED v. < HS	1.270	0.844	1.913
Education	Some College v. < HS	1.894	1.275	2.814
Education	Assoc/Bach v. < HS	1.772	1.199	2.619
Education	Masters/PhD v. < HS	1.913	1.236	2.959
Family income	Fam income +$10,000	1.010	0.997	1.024
Proximity to metro area	M/S metro v. Non-metro	1.069	0.838	1.364
Proximity to metro area	L fringe metro v. Non-metro	1.058	0.817	1.371
Proximity to metro area	L central metro v. Non-metro	1.241	0.971	1.587
Number of pain locations	Pain locations +1	1.102	1.041	1.167
Pain limiting life or work	Pain limit some days v. Never	1.209	0.991	1.476
Pain limiting life or work	Pain limit most days v. Never	1.315	1.034	1.672
Pain limiting life or work	Pain limit every day v. Never	1.180	0.916	1.520
Opioid use past 3 mo	Opioids some days v. Never	0.888	0.623	1.265
Opioid use past 3 mo	Opioids most days v. Never	1.265	0.746	2.147
Opioid use past 3 mo	Opioids every day v. Never	0.767	0.529	1.111
PHQ-8	PHQ-8 +5 units	0.901	0.834	0.973
Abbreviations: Assoc, associate’s; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; mo, months; NH, Non-Hispanic; S, small; v., versus; yrs, years

In examining the effects table above, I will summarize the effect of Age using two fictional cases: Han Solo and Luke Skywalker

Per the guidance by Yoshida, given the response is positive (among those who have positive counts for engaging in IHM modalities), the average count is 0.701 (Intercept).

Suppose Han Solo and Luke Skywalker 1) are both male and of the same race and ethnicity; 2) have the same level of education; 3) reside in rural areas; 4) rated similar challenges related to pain interfering with life or work in the past 3 months; 5) used the same amount of opioids for chronic pain within the past 3 months; 6) reported the same score on the PHQ-8; 7) reported the same number of locations where they were bothered at least “a little” by pain in the past three months; and 8) have the same family income. However, Han Solo is 10 years older than Luke Skywalker. In this scenario, m1_hurdle_pois predicts that the number of IHM modalities that Han Solo engaged in for pain in the past 3 months will be .866 times that of Luke’s (13.4% lower than Luke’s).

The 95% confidence interval for this estimated effect of age10 on ihm4pain_num is (0.825, 0.910). This confidence interval does not include one. Thus, holding all other variables constant, being 10 years older is associated with lower counts of IHM modalities used to manage pain in the past three months.

6.9.3 Visual Summary of Odds of `ihm4pain_num` != 0

To visualize the effects of the covariates on odds of ihm4pain_num != 0, I first generated the m1_coef_non0 tibble, limited to the “non-zero” coefficients. I then used ggplot() to plot the effects. Education covariates are placed on a separate axis given their size to not obscure the other covariates.

Code

m1_coef_non0 <- m1_hurdle_pois_coef |> 
  filter(Type == "non-zero", Comparison != "(Intercept)", Group != "Education") |>
  mutate(Comparison = fct_inorder(Comparison, ordered = NA),
         Comparison = fct_rev(Comparison))

m1_coef_non0educ <- m1_hurdle_pois_coef |> 
  filter(Type == "non-zero", Comparison != "(Intercept)", Group == "Education") |>
  mutate(Comparison = fct_inorder(Comparison, ordered = NA),
         Comparison = fct_rev(Comparison))

p1 <- ggplot(m1_coef_non0, aes(x = Estimate, y = Comparison)) +
  geom_vline(aes(xintercept = 1), linewidth = .25, linetype = 'dashed') +
  geom_errorbarh(aes(xmax = Conf.high, xmin = Conf.low), linewidth = .5, 
                 height = .2, color = 'gray50') +
  geom_point(size = 2.5, color = uhred) +
  labs(x = "aOR",
       y = "",
       title = "aOR of Engaging in ≥ 1 IHM Modality for Pain") +
  scale_x_continuous(breaks = seq(0,4,0.25))

p2 <- ggplot(m1_coef_non0educ, aes(x = Estimate, y = Comparison)) +
  geom_vline(aes(xintercept = 1), linewidth = .25, linetype = 'dashed') +
  geom_errorbarh(aes(xmax = Conf.high, xmin = Conf.low), linewidth = .5,
                 height = .2, color = 'gray50') +
  geom_point(size = 2.5, color = uhred) +
  labs(x = "aOR",
       y = "") +
  scale_x_continuous(breaks = seq(0,8,0.5))

p1/p2 + plot_layout(heights = c(5,1))

6.9.4 Visual Summary of Estimated Relative Change in Count of `ihm4pain_num`

To visualize the effects of the covariates on the estimated relative change in count of ihm4pain_num, I first generated the m1_coef_count tibble, limited to the “count” coefficients. I then used ggplot() to plot the effects. Education covariates are placed on a separate axis given their size to not obscure the other covariates.

Code

m1_coef_count <- m1_hurdle_pois_coef |> 
  filter(Type == "count", Comparison != "(Intercept)", Group != "Education") |>
  mutate(Comparison = fct_inorder(Comparison, ordered = NA),
         Comparison = fct_rev(Comparison))

m1_coef_count_educ <- m1_hurdle_pois_coef |> 
  filter(Type == "count", Comparison != "(Intercept)",
         Group == "Education") |>
  mutate(Comparison = fct_inorder(Comparison, ordered = NA),
         Comparison = fct_rev(Comparison))

p1 <- ggplot(m1_coef_count, aes(x = Estimate, y = Comparison)) +
  geom_vline(aes(xintercept = 1), linewidth = .25, linetype = 'dashed') +
  geom_errorbarh(aes(xmax = Conf.high, xmin = Conf.low), linewidth = .5, 
                 height = .2, color = 'gray50') +
  geom_point(size = 2.5, color = uhred) +
  labs(x = "Relative Change in Count",
       y = "",
       title = "Relative Change in Count of IHM Modalities Used for Pain",
       subtitle = "Assuming Engagement in > 0 IHM Modalities") +
  scale_x_continuous(breaks = seq(0,7,0.25))

p2 <- ggplot(m1_coef_count_educ, aes(x = Estimate, y = Comparison)) +
  geom_vline(aes(xintercept = 1), linewidth = .25, linetype = 'dashed') +
  geom_errorbarh(aes(xmax = Conf.high, xmin = Conf.low), linewidth = .5, 
                 height = .2, color = 'gray50') +
  geom_point(size = 2.5, color = uhred) +
  labs(x = "Relative Change in Count",
       y = "") +
  scale_x_continuous(breaks = seq(0,7,0.5))

p1/p2 + plot_layout(heights = c(5,1))

6.10 Check Model Assumptions

To generate a residuals vs. fitted plot from m1_hurdle_pois, I used ggplot() and geom_point().

Code

ggplot(m1_hurdle_pois_aug, aes(x = .fitted, y = .resid)) +
    geom_point() +
    labs(title = "Residuals vs. Fitted `ihm4pain_num`",
         subtitle = "Hurdle model with Poisson counts")

In examining the plot above, I see no major issues of concern.

6.11 Validating `m1_hurdle_pois`

6.11.1 Fit Summary in Test Data

To validate m1_hurdle_pois in the projectb_si_test data, I performed the following to generate m1test_hurdle_pois_summary, a summary of fit statistics.

I created m1test_hurdle_pois_aug as an augmentation of m1_hurdle_pois applied to the projectb_si_test data. This time I used mutate() to obtain .fitted and .resid values.
I generated, $R^2$, RMSE, and MAE values using the mets() function.
I used group_by() and summarize() to pivot m1test_hurdle_pois_summary such that there would be one column for each fit statistic.

Code

m1test_hurdle_pois_aug <- projectb_si_test |>
    mutate(".fitted" = predict(m1_hurdle_pois, type = "response"),
           ".resid" = resid(m1_hurdle_pois, type = "response"))

m1test_hurdle_pois_summary <-
  mets(m1test_hurdle_pois_aug, truth = ihm4pain_num, estimate = .fitted) |>
  mutate(Model = "m1_hurdle_pois",
         R2 = if_else(.metric == "rsq", .estimate, NA),
         RMSE = if_else(.metric == "rmse", .estimate, NA),
         MAE = if_else(.metric == "mae", .estimate, NA)) |> relocate(Model)

m1test_hurdle_pois_summary <- m1test_hurdle_pois_summary |>
  group_by(Model) |>
  summarize(R2 = mean(R2, na.rm = TRUE),
                   RMSE = mean(RMSE, na.rm = TRUE),
                   MAE = mean(MAE, na.rm = TRUE))

6.11.2 Comparing Fits in Datasets

I then bound together m1_hurdle_pois_summary and m1test_hurdle_pois_summary into m1_summaryfinal to present a table comparing the $R^2$, RMSE, and MAE values from the training and testing data.

Code

m1_summaryfinal <- bind_rows(m1_hurdle_pois_summary, 
                             m1test_hurdle_pois_summary) |>
  mutate(Sample = c("Training", "Testing"))

m1_summaryfinal |> select(Sample, R2, RMSE, MAE) |>
  flextable() |>
  colformat_double(digits = 4) |>
  autofit() |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

Sample	R2	RMSE	MAE
Training	0.1182	0.8069	0.6137
Testing	0.0001	0.9067	0.6929

In examining the table above, it appears the model does not perform as well in the testing data given the lower $R^2$ (0.0001 vs. 0.1182). m1_hurdle_pois also has higher RMSE and MAE in the testing data.

6.11.3 Table Comparing Observed and Predicted Counts

To further explore the accuracy of m1_hurdle_pois, I created a table (m1_countsum) summarizing the observed and expected counts from the testing and training datasets as well as their respective m1_hurdle_pois applications.

Code

m1_countsum <- tibble(Source = c("Training data observations", 
                                 "m1_hurdle_pois in training", 
                                 "Test data observations", 
                                 "m1_hurdle_pois in test"),
                      ZeroIHM = c(sum(projectb_si_train$ihm4pain_num == 0),
                                  round(sum(predict(m1_hurdle_pois, 
                                                    type = "prob")[,1]),0),
                                  sum(projectb_si_test$ihm4pain_num == 0),
                                  round(sum(predict(m1_hurdle_pois, 
                                                    newdata = projectb_si_test, 
                                                    type = "prob")[,1]),0)),
                      OneIHM = c(sum(projectb_si_train$ihm4pain_num == 1),
                                 round(sum(predict(m1_hurdle_pois, 
                                                   type = "prob")[,2]),0),
                                 sum(projectb_si_test$ihm4pain_num == 1),
                                 round(sum(predict(m1_hurdle_pois, 
                                                   newdata = projectb_si_test, 
                                                   type = "prob")[,2]),0)),
                      TwoIHM = c(sum(projectb_si_train$ihm4pain_num == 2),
                                 round(sum(predict(m1_hurdle_pois, 
                                                   type = "prob")[,3]),0),
                                 sum(projectb_si_test$ihm4pain_num == 2),
                                 round(sum(predict(m1_hurdle_pois, 
                                                   newdata = projectb_si_test, 
                                                   type = "prob")[,3]),0)),
                      ThreeIHM = c(sum(projectb_si_train$ihm4pain_num == 3),
                                   round(sum(predict(m1_hurdle_pois, 
                                                     type = "prob")[,4]),0),
                                   sum(projectb_si_test$ihm4pain_num == 3),
                                   round(sum(predict(m1_hurdle_pois, 
                                                     newdata = projectb_si_test, 
                                                     type = "prob")[,4]),0)),
                      FourIHM = c(sum(projectb_si_train$ihm4pain_num == 4),
                                  round(sum(predict(m1_hurdle_pois, 
                                                    type = "prob")[,5]),0),
                                  sum(projectb_si_test$ihm4pain_num == 4),
                                  round(sum(predict(m1_hurdle_pois, 
                                                    newdata = projectb_si_test, 
                                                    type = "prob")[,5]),0))) |>
  mutate(MoreThan4IHM = nrow(projectb_si_train) - (ZeroIHM + OneIHM + TwoIHM + ThreeIHM + FourIHM),
         Total = (ZeroIHM + OneIHM + TwoIHM + ThreeIHM + 
                    FourIHM + MoreThan4IHM)) |>
  rename("0" = ZeroIHM,
         "1" = OneIHM,
         "2" = TwoIHM,
         "3" = ThreeIHM,
         "4" = FourIHM,
         ">4" = MoreThan4IHM)

6.11.4 Plot Comparing Observed and Predicted Counts

To facilitate a visual comparison, I generated m1_pivot and a pair of plots comparing observed vs. predicted values in the training and test samples.

Code

m1_pivot <- m1_countsum |>
  select(-c(Total)) |>
  pivot_longer(!Source, names_to = "Count", values_to = "Value") |>
  mutate(Count = factor(Count, levels = c("0", "1", "2", "3", "4", ">4")))

m1_pivot_train <- m1_pivot |>
  filter(Source %in% c("Training data observations", 
                       "m1_hurdle_pois in training")) |>
  mutate(Source = if_else(Source == "Training data observations", 
                          "Observed", "Predicted"),
         Source = factor(Source, levels = c("Observed", "Predicted")))

m1_pivot_test <- m1_pivot |>
  filter(Source %in% c("Test data observations", 
                       "m1_hurdle_pois in test")) |>
  mutate(Source = if_else(Source == "Test data observations", 
                          "Observed", "Predicted"),
         Source = factor(Source, levels = c("Observed", "Predicted")))

p1 <- ggplot(data = m1_pivot_train, aes(x = Count, y = Value, fill = Source)) +
  geom_bar(stat = "identity", position = position_dodge(), alpha = 0.75) +
  ylim(0, 2500) +
  labs(x = "Count of IHM Modalities",
       y = "Number of Subjects",
       title = "Training Data (n = 3557)") + 
  theme(legend.position="bottom",
        legend.title = element_blank())

p2 <- ggplot(data = m1_pivot_test, aes(x = Count, y = Value, fill = Source)) +
  geom_bar(stat = "identity", position = position_dodge(), alpha = 0.75) +
  ylim(0, 2500) +
  labs(x = "Count of IHM Modalities",
       y = "Number of Subjects",
       title = "Test Data (n = 3557)") +
  guides(fill="none")

p1 + p2 +
  plot_annotation("Comparing Observed v. Predicted Values in Weighted Hurdle Poisson Model")

Source	0	1	2	3	4	>4	Total
Training data observations	2,257	811	343	116	30	0	3,557
m1_hurdle_pois in training	2,329	787	312	96	25	8	3,557
Test data observations	2,297	791	324	107	38	0	3,557
m1_hurdle_pois in test	2,354	778	303	92	24	6	3,557

In examining the plots above, while there is slightly greater deviation from the observed counts in the test data, it appears the model performs quite well in predicting each level of the count of ihm4pain_num in both the training and test datasets.

7 Model 2: Nonpharmacologic Modalities Only vs. Opioids Only

For Model 2, I sought to answer the question:

Which demographic, mental health, and pain-related variables are associated with engagement in exclusively nonpharmacologic modalities rather than opioid utilization in the past 3 months among U.S. adults with chronic pain?

7.1 Create Dataset

For this model, the outcome was defined as engagement in nonpharmacologic modalities only (ie, engaging in IHM, a chronic pain self-management program, support groups, or physical, rehabilitative, occupational, or talk therapy while never using opioids) rather than opioids only (ie, using opioids on at least some days while not using any nonpharmacologic modalities). The already single-imputed data, projectb_si, were filtered to include the 3300 subjects engaging in either nonpharmacologic only or opioids only. With this new m2_dataset_si dataset, the outcome nonpharm_over_opioid was defined as 1 if “Nonpharm only” and 0 if “Opioids only.”

Code

m2_dataset_si <- projectb_si |>
  filter(nonpharm_opioid %in% 
           c("Opioids only", "Nonpharm only")) |>
  mutate(nonpharm_opioid = factor(nonpharm_opioid, 
                                  levels = c("Nonpharm only", 
                                             "Opioids only")),
         nonpharm_over_opioid = 
           if_else(nonpharm_opioid == "Nonpharm only", 1, 0))

m2_dataset_si |> tabyl(nonpharm_opioid, nonpharm_over_opioid)

 nonpharm_opioid   0    1
   Nonpharm only   0 2790
    Opioids only 510    0

7.2 Create Survey Design

I then used the svydesign() function to generate the survey design.

Code

nhis_des2 <- svydesign(data = m2_dataset_si, id = ~ subject_id,
                        weights = ~ wt_final, nest = TRUE)

7.3 Total population represented

The following code details the steps taken to generate a combined Table 2.

Code

total_n2 <- svytotal(~ one, nhis_des2, na.rm = FALSE) |> as_tibble() |>
  mutate(Variable = "Total, n",
         total = floor(total),
         total = formatC(total, format = "d", big.mark = ","),
         p.value = "") |>
  select(Variable, All = total, p.value)

Code

nonpharm_n <- svyby(~ one, ~ nonpharm_opioid, nhis_des2, svytotal) |> as_tibble() |> filter(nonpharm_opioid == "Nonpharm only") |> select(Nonpharm = one) |>
  mutate(Nonpharm = round(Nonpharm),
         Nonpharm = formatC(Nonpharm, format = "d", big.mark = ","))
opioids_n <- svyby(~ one, ~ nonpharm_opioid, nhis_des2, svytotal) |> as_tibble() |> filter(nonpharm_opioid == "Opioids only") |> select(Opioids = one) |>
  mutate(Opioids = round(Opioids),
         Opioids = formatC(Opioids, format = "d", big.mark = ","))
total_n2 <- bind_cols(total_n2, nonpharm_n, opioids_n) |>
  mutate(All = as.character(All),
         Nonpharm = as.character(Nonpharm),
         Opioids = as.character(Opioids))

Code

total_n2 <- total_n2 |> select(Variable, All, Nonpharm, Opioids, p.value) 

total_n2 |>
  flextable() |>
  autofit() |>
  bold(bold = TRUE, part = "header") |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

Variable	All	Nonpharm	Opioids	p.value
Total, n	22,707,275	19,099,216	3,608,059

7.4 Crosstabulation

Code

table2_cross <- nhis_des2 |>
  tbl_svysummary(
    include = c(age, sex, race_ethnicity, education, 
                fam_income1k, urban_rural, num_pain_locations,
                painlimit_lifework, phq8total),
    type = c(num_pain_locations = "continuous",
             fam_income = "continuous"),
    statistic = list(all_continuous()  ~ "{mean}",
                     all_categorical() ~ "{p}"),
    label = list(age ~ "Age in years, mean (95% CI)",
                 sex ~ "Sex, % (95% CI)",
                 race_ethnicity ~ "Race/Ethnicity, % (95% CI)",
                 education ~ "Education, % (95% CI)",
                 fam_income1k ~ "Family income in $1000s, mean (95% CI)",
                 urban_rural ~ "Proximity to metro area, % (95% CI)",
                 num_pain_locations ~ "Number of pain locations, mean (95% CI)",
                 painlimit_lifework ~ "Pain limiting life/work, % (95% CI)",
                 phq8total ~ "PHQ-8 total, mean (95% CI)"),
    digits = list(all_continuous() ~ 1,
                  all_categorical() ~ 1),
    by = nonpharm_opioid) |>
  add_overall() |>
  add_ci(pattern = "{stat} ({ci})", 
         style_fun = list(all_categorical() ~ style_percent_1digit, 
                          all_continuous() ~ style_number_1digit),
         statistic = list(all_categorical() ~ "{conf.low}-{conf.high}", 
                          all_continuous() ~ "{conf.low}-{conf.high}")) |>
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3)) |>
  modify_header(label = "Variable",
                stat_0 ~ "All",
                stat_1 ~ "Nonpharm",
                stat_2 ~ "Opioids",
                p.value ~ "p.value")

table2_cross

Variable	All^1,2	Nonpharm^1,2	Opioids^1,2	p.value³
Age in years, mean (95% CI)	54.0 (53.2-54.7)	53.0 (52.2-53.8)	59.2 (57.6-60.8)	<0.001
Sex, % (95% CI)				0.797
Female	59.2 (57.1-61.3)	59.4 (57.1-61.6)	58.6 (53.0-63.9)
Male	40.8 (38.7-42.9)	40.6 (38.4-42.9)	41.4 (36.1-47.0)
Race/Ethnicity, % (95% CI)				0.221
NH White only	74.3 (72.4-76.1)	74.6 (72.6-76.6)	72.6 (67.2-77.3)
Hispanic	10.1 (8.91-11.5)	10.0 (8.73-11.5)	10.7 (7.59-14.8)
NH Black/African American only	9.8 (8.60-11.1)	9.2 (7.96-10.6)	12.8 (9.77-16.5)
Other single and multiple races	5.8 (4.82-6.92)	6.1 (5.11-7.31)	4.0 (1.84-8.57)
Education, % (95% CI)				<0.001
Less than high school	12.1 (10.7-13.7)	9.6 (8.25-11.2)	25.0 (20.3-30.2)
High school graduate/GED	26.7 (24.8-28.6)	25.4 (23.5-27.5)	33.4 (28.4-38.8)
Some college, no degree	20.4 (18.8-22.2)	20.6 (18.8-22.5)	19.4 (15.4-24.2)
Associate's or bachelor's degree	31.2 (29.4-33.1)	33.7 (31.6-35.8)	18.3 (14.8-22.4)
Master's, doctoral, or professional degree	9.6 (8.54-10.7)	10.6 (9.47-11.9)	3.9 (2.47-6.07)
Family income in $1000s, mean (95% CI)	68.5 (66.2-70.8)	72.8 (70.3-75.4)	45.6 (41.6-49.6)	<0.001
Proximity to metro area, % (95% CI)				<0.001
Nonmetropolitan	18.0 (16.5-19.6)	16.5 (15.0-18.2)	25.9 (21.7-30.7)
Medium and small metro	33.0 (31.1-35.0)	32.8 (30.8-34.9)	33.9 (28.9-39.3)
Large fringe metro	22.1 (20.4-23.9)	23.3 (21.5-25.3)	15.7 (12.4-19.7)
Large central metro	26.8 (25.0-28.8)	27.3 (25.3-29.4)	24.5 (19.8-29.8)
Number of pain locations, mean (95% CI)	3.2 (3.2-3.3)	3.2 (3.1-3.3)	3.3 (3.2-3.5)	0.086
Pain limiting life/work, % (95% CI)				<0.001
Never	22.1 (20.5-23.8)	24.2 (22.4-26.1)	10.9 (8.13-14.5)
Some days	38.5 (36.4-40.6)	39.3 (37.1-41.5)	34.3 (29.1-39.9)
Most days	17.2 (15.8-18.8)	16.9 (15.3-18.7)	18.9 (15.3-23.1)
Every day	22.2 (20.5-24.0)	19.6 (17.9-21.5)	36.0 (30.9-41.3)
PHQ-8 total, mean (95% CI)	5.6 (5.3-5.8)	5.4 (5.1-5.6)	6.5 (5.9-7.2)	<0.001
¹ Mean; %
² CI = Confidence Interval
³ Wilcoxon rank-sum test for complex survey samples; chi-squared test with Rao & Scott’s second-order correction

7.5 Assemble Table 2 Tibble

Code

table2_cross <- table2_cross |> as.data.frame() |> as_tibble()

table2_cross[is.na(table2_cross)] <- ""

table2_cross$p.value <- sub("0\\.",".", table2_cross$p.value)
table2_cross$p.value <- sub(" ","", table2_cross$p.value)

table2 <- bind_rows(total_n2, table2_cross) |>
  rename("Weighted variable" = Variable,
         "Combined groups" = All,
         "Nonpharm only" = Nonpharm,
         "Opioids only" = Opioids,
         "P value" = p.value)

7.6 Print Table 2

I printed Table 2 using the flextable() function.

Code

table2 |> flextable() |>
  add_header_lines(values = "Table 2: Socio-Demographic, Pain, and Mental Health Characteristics By Nonpharmacologic Modality Engagement") |>
  bold(bold = TRUE, part = "header") |>
  add_footer_lines(values = "Abbreviations: CI, confidence interval; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; metro, metropolitan; NH, Non-Hispanic; Nonpharm, nonpharmacologic", 
                   top = FALSE) |>
  padding(i = c(4:5, 7:10, 12:16, 19:22, 25:28), j = 1, 
          padding.left = 20) |>
  bold(i = c(1, 2, 3, 6, 11, 17:18, 23:24, 29), j = 1, bold = TRUE) |>
  flextable::footnote(i = c(2, 17, 23, 29), j = 5, 
                      ref_symbols = "a",
                      value = as_paragraph("Wilcoxon rank-sum test for complex survey samples")) |>
  flextable::footnote(i = c(3, 6, 11, 18, 24), 
                      j = 5,
                      ref_symbols = "b",
                      value = as_paragraph("Chi-squared test with Rao & Scott’s second-order correction")) |>
  autofit(part = c("body", "header"))

Table 2: Socio-Demographic, Pain, and Mental Health Characteristics By Nonpharmacologic Modality Engagement
Weighted variable	Combined groups	Nonpharm only	Opioids only	P value
Total, n	22,707,275	19,099,216	3,608,059
Age in years, mean (95% CI)	54.0 (53.2-54.7)	53.0 (52.2-53.8)	59.2 (57.6-60.8)	<.001a
Sex, % (95% CI)				.797b
Female	59.2 (57.1-61.3)	59.4 (57.1-61.6)	58.6 (53.0-63.9)
Male	40.8 (38.7-42.9)	40.6 (38.4-42.9)	41.4 (36.1-47.0)
Race/Ethnicity, % (95% CI)				.221b
NH White only	74.3 (72.4-76.1)	74.6 (72.6-76.6)	72.6 (67.2-77.3)
Hispanic	10.1 (8.91-11.5)	10.0 (8.73-11.5)	10.7 (7.59-14.8)
NH Black/African American only	9.8 (8.60-11.1)	9.2 (7.96-10.6)	12.8 (9.77-16.5)
Other single and multiple races	5.8 (4.82-6.92)	6.1 (5.11-7.31)	4.0 (1.84-8.57)
Education, % (95% CI)				<.001b
Less than high school	12.1 (10.7-13.7)	9.6 (8.25-11.2)	25.0 (20.3-30.2)
High school graduate/GED	26.7 (24.8-28.6)	25.4 (23.5-27.5)	33.4 (28.4-38.8)
Some college, no degree	20.4 (18.8-22.2)	20.6 (18.8-22.5)	19.4 (15.4-24.2)
Associate's or bachelor's degree	31.2 (29.4-33.1)	33.7 (31.6-35.8)	18.3 (14.8-22.4)
Master's, doctoral, or professional degree	9.6 (8.54-10.7)	10.6 (9.47-11.9)	3.9 (2.47-6.07)
Family income in $1000s, mean (95% CI)	68.5 (66.2-70.8)	72.8 (70.3-75.4)	45.6 (41.6-49.6)	<.001a
Proximity to metro area, % (95% CI)				<.001b
Nonmetropolitan	18.0 (16.5-19.6)	16.5 (15.0-18.2)	25.9 (21.7-30.7)
Medium and small metro	33.0 (31.1-35.0)	32.8 (30.8-34.9)	33.9 (28.9-39.3)
Large fringe metro	22.1 (20.4-23.9)	23.3 (21.5-25.3)	15.7 (12.4-19.7)
Large central metro	26.8 (25.0-28.8)	27.3 (25.3-29.4)	24.5 (19.8-29.8)
Number of pain locations, mean (95% CI)	3.2 (3.2-3.3)	3.2 (3.1-3.3)	3.3 (3.2-3.5)	.086a
Pain limiting life/work, % (95% CI)				<.001b
Never	22.1 (20.5-23.8)	24.2 (22.4-26.1)	10.9 (8.13-14.5)
Some days	38.5 (36.4-40.6)	39.3 (37.1-41.5)	34.3 (29.1-39.9)
Most days	17.2 (15.8-18.8)	16.9 (15.3-18.7)	18.9 (15.3-23.1)
Every day	22.2 (20.5-24.0)	19.6 (17.9-21.5)	36.0 (30.9-41.3)
PHQ-8 total, mean (95% CI)	5.6 (5.3-5.8)	5.4 (5.1-5.6)	6.5 (5.9-7.2)	<.001a
Abbreviations: CI, confidence interval; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; metro, metropolitan; NH, Non-Hispanic; Nonpharm, nonpharmacologic
aWilcoxon rank-sum test for complex survey samples
bChi-squared test with Rao & Scott’s second-order correction

7.7 Fit Weighted Logistic Model with Survey Weights

I fit the model m2_lrm_wt with the lrm() function, defining weights with weights = wt_final.

In this analysis, I chose to predict the outcome nonpharm_over_opioid based on 9 variables: age, sex, race_ethnicity, education, fam_income1k urban_rural, num_pain_locations, painlimit_lifework, and phq8total.

Code

set.seed(19900317)
dd <- datadist(m2_dataset_si, adjto.cat = "first")
options(datadist = "dd")

m2_lrm_wt <- lrm(nonpharm_over_opioid ~ 
                age + sex + race_ethnicity + education +
                               fam_income1k + urban_rural + num_pain_locations +
                               painlimit_lifework + phq8total,
                weights = wt_final, normwt = TRUE,
              data = m2_dataset_si, x = TRUE, y = TRUE)

m2_lrm_wt

Logistic Regression Model

lrm(formula = nonpharm_over_opioid ~ age + sex + race_ethnicity + 
    education + fam_income1k + urban_rural + num_pain_locations + 
    painlimit_lifework + phq8total, data = m2_dataset_si, x = TRUE, 
    y = TRUE, weights = wt_final, normwt = TRUE)


Sum of Weights by Response Category

        0         1 
 524.3515 2775.6485 

                        Model Likelihood        Discrimination    Rank Discrim.    
                              Ratio Test               Indexes          Indexes    
 Obs          3300    LR chi2     326.37        R2       0.161    C       0.737    
  0            510    d.f.            18      R2(18,3300)0.089    Dxy     0.475    
  1           2790    Pr(> chi2) <0.0001    R2(18,1323.1)0.208    gamma   0.475    
Sum of weights3300                              Brier    0.121    tau-a   0.124    
 max |deriv| 1e-07                                                                 

                                                     Coef    S.E.   Wald Z
Intercept                                             2.1393 0.3310  6.46 
age                                                  -0.0212 0.0033 -6.50 
sex=Male                                             -0.1856 0.1038 -1.79 
race_ethnicity=Hispanic                               0.2904 0.1785  1.63 
race_ethnicity=NH Black/African American only        -0.1415 0.1636 -0.87 
race_ethnicity=Other single and multiple races        0.4662 0.2511  1.86 
education=High school graduate/GED                    0.4940 0.1454  3.40 
education=Some college, no degree                     0.7391 0.1627  4.54 
education=Associate's or bachelor's degree            1.1555 0.1644  7.03 
education=Master's, doctoral, or professional degree  1.3719 0.2696  5.09 
fam_income1k                                          0.0074 0.0013  5.54 
urban_rural=Medium and small metro                    0.2832 0.1362  2.08 
urban_rural=Large fringe metro                        0.6227 0.1623  3.84 
urban_rural=Large central metro                       0.3709 0.1525  2.43 
num_pain_locations                                    0.0004 0.0411  0.01 
painlimit_lifework=Some days                         -0.6143 0.1669 -3.68 
painlimit_lifework=Most days                         -0.7064 0.1879 -3.76 
painlimit_lifework=Every day                         -1.0145 0.1773 -5.72 
phq8total                                            -0.0070 0.0097 -0.72 
                                                     Pr(>|Z|)
Intercept                                            <0.0001 
age                                                  <0.0001 
sex=Male                                             0.0739  
race_ethnicity=Hispanic                              0.1038  
race_ethnicity=NH Black/African American only        0.3869  
race_ethnicity=Other single and multiple races       0.0634  
education=High school graduate/GED                   0.0007  
education=Some college, no degree                    <0.0001 
education=Associate's or bachelor's degree           <0.0001 
education=Master's, doctoral, or professional degree <0.0001 
fam_income1k                                         <0.0001 
urban_rural=Medium and small metro                   0.0376  
urban_rural=Large fringe metro                       0.0001  
urban_rural=Large central metro                      0.0150  
num_pain_locations                                   0.9913  
painlimit_lifework=Some days                         0.0002  
painlimit_lifework=Most days                         0.0002  
painlimit_lifework=Every day                         <0.0001 
phq8total                                            0.4712

7.8 Numeric Summary of Coefficients (`m2_coef_wt`)

The following code generates a numeric summary of coefficients from the model and adds meaningful labels for comparison.

Code

m2_coef_wt <- summary(m2_lrm_wt, conf.int = 0.95) |> as_tibble() |>
  filter(Type == 2) |>
  mutate(Group = c("Age", "Family income",
                   "Number of pain locations",
                   "PHQ-8", 
                   "Sex", 
                   "Race/Ethnicity", "Race/Ethnicity", "Race/Ethnicity",
                   "Education", "Education", 
                   "Education", "Education", 
                   "Proximity to metro area", "Proximity to metro area",
                   "Proximity to metro area", 
                   "Pain limiting life or work", 
                   "Pain limiting life or work",
                   "Pain limiting life or work"),
         Comparison = c("Age 69 v. 46", 
                        "Family income $85k v. $22k",
                        "4 pain locations v. 2",
                        "PHQ-8 score 8 v. 1",
                        "Male v. Female", 
                        "Hispanic v. NH White",
                        "NH Black v. NH White",
                        "Other v. NH White",
                        "HS/GED v. < HS", 
                        "Some College v. < HS", 
                        "Assoc/Bach v. < HS",
                        "Masters/PhD v. < HS", 
                        "M/S metro v. Non-metro", 
                        "L fringe metro v. Non-metro",
                        "L central metro v. Non-metro", 
                        "Pain limit some days v. Never", 
                        "Pain limit most days v. Never",
                        "Pain limit every day v. Never"),
         Effect = as.numeric(Effect)) |>
  rename(Lower0.95 = "Lower 0.95", Upper0.95 = "Upper 0.95") |>
  mutate(Lower0.95 = as.numeric(Lower0.95), Upper0.95 = as.numeric(Upper0.95))

m2_coef_wt <- m2_coef_wt |>
  mutate(Comparison = factor(Comparison, levels = c("Age 69 v. 46",
                                                    "Male v. Female",
                                                    "Hispanic v. NH White",
                                                    "NH Black v. NH White",
                                                    "Other v. NH White",
                                                    "HS/GED v. < HS", 
                                                    "Some College v. < HS", 
                                                    "Assoc/Bach v. < HS",
                                                    "Masters/PhD v. < HS", 
                                                    "Family income $85k v. $22k",
                                                    "M/S metro v. Non-metro", 
                                                    "L fringe metro v. Non-metro",
                                                    "L central metro v. Non-metro", 
                                                    "4 pain locations v. 2",
                                                    "Pain limit some days v. Never", 
                                                    "Pain limit most days v. Never",
                                                    "Pain limit every day v. Never",
                                                    "PHQ-8 score 8 v. 1"))) |>
  arrange(Comparison)

The following code generates Supplemental Table 3, providing a numeric summmary of the coefficients from m2_lrm_wt.

Code

m2_coef_wt |> 
  select(Group, Comparison, Effect, Lower0.95, Upper0.95) |> 
  mutate(Effect = round(Effect, digits = 3),
         Lower0.95 = round(Lower0.95, digits = 3),
         Upper0.95 = round(Upper0.95, digits = 3)) |>
  rename(aOR = Effect,
         "95% CI Low" = Lower0.95,
         "95% CI High" = Upper0.95) |>
  flextable() |> 
  bold(bold = TRUE, part = "header") |>
  add_header_lines(values = "Supplemental Table 3: aOR of Engagement in Nonpharmacologic Only vs. Opioids Only") |>
  autofit() |>
  add_footer_lines(values = "Abbreviations: Assoc, associate’s; aOR, adjusted odds ratio; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; NH, Non-Hispanic; S, small; v., versus; yrs, years", top = FALSE) |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

Supplemental Table 3: aOR of Engagement in Nonpharmacologic Only vs. Opioids Only
Group	Comparison	aOR	95% CI Low	95% CI High
Age	Age 69 v. 46	0.614	0.530	0.711
Sex	Male v. Female	0.831	0.678	1.018
Race/Ethnicity	Hispanic v. NH White	1.337	0.942	1.897
Race/Ethnicity	NH Black v. NH White	0.868	0.630	1.196
Race/Ethnicity	Other v. NH White	1.594	0.974	2.607
Education	HS/GED v. < HS	1.639	1.232	2.179
Education	Some College v. < HS	2.094	1.522	2.881
Education	Assoc/Bach v. < HS	3.176	2.301	4.383
Education	Masters/PhD v. < HS	3.943	2.325	6.687
Family income	Family income $85k v. $22k	1.596	1.352	1.883
Proximity to metro area	M/S metro v. Non-metro	1.327	1.016	1.734
Proximity to metro area	L fringe metro v. Non-metro	1.864	1.356	2.562
Proximity to metro area	L central metro v. Non-metro	1.449	1.075	1.954
Number of pain locations	4 pain locations v. 2	1.001	0.852	1.176
Pain limiting life or work	Pain limit some days v. Never	0.541	0.390	0.750
Pain limiting life or work	Pain limit most days v. Never	0.493	0.341	0.713
Pain limiting life or work	Pain limit every day v. Never	0.363	0.256	0.513
PHQ-8	PHQ-8 score 8 v. 1	0.952	0.834	1.088
Abbreviations: Assoc, associate’s; aOR, adjusted odds ratio; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; NH, Non-Hispanic; S, small; v., versus; yrs, years

7.9 Effects Plot (`m2_coef_wt`)

To visualize the effects of the covariates on odds of nonpharm_over_opioid = 1, I used ggplot() to plot the effects. Education covariates are placed on a separate axis given their size to not obscure the other covariates.

Code

m2_coef_plot_wt <- m2_coef_wt |> 
  filter(Comparison != "(Intercept)", Group != "Education") |>
  mutate(Comparison = fct_inorder(Comparison, ordered = NA),
         Comparison = fct_rev(Comparison))

m2_coef_plot_educ_wt <- m2_coef_wt |> 
  filter(Comparison != "(Intercept)", Group == "Education") |>
  mutate(Comparison = fct_inorder(Comparison, ordered = NA),
         Comparison = fct_rev(Comparison))

p1 <- ggplot(m2_coef_plot_wt, aes(x = Effect, y = Comparison)) +
  geom_vline(aes(xintercept = 1), linewidth = .25, linetype = 'dashed') +
  geom_errorbarh(aes(xmax = Upper0.95, xmin = Lower0.95), linewidth = .5,
                 height = .2, color = 'gray50') +
  geom_point(size = 2.5, color = uhred) +
  labs(x = "aOR",
       y = "",
       title = "aOR of Using Nonpharmacologic Only vs. Opioids Only") +
  scale_x_continuous(breaks = seq(0,8,0.5))

p2 <- ggplot(m2_coef_plot_educ_wt, aes(x = Effect, y = Comparison)) +
  geom_vline(aes(xintercept = 1), linewidth = .25, linetype = 'dashed') +
  geom_errorbarh(aes(xmax = Upper0.95, xmin = Lower0.95), linewidth = .5,
                 height = .2, color = 'gray50') +
  geom_point(size = 2.5, color = uhred) +
  labs(x = "aOR",
       y = "") +
  scale_x_continuous(breaks = seq(0,8,0.5))

p1/p2 + plot_layout(heights = c(5,1))

7.10 Summarizing Fit (`m2_coef_wt`)

To summarize the fit of m2_lrm_wt, I simply entered m2_lrm_wt within the code. From the output, the Nagelkerke $R^2$ is 0.161 and the unvalidated area under the ROC curve (aka c-statistic) is 0.737.

Code

m2_lrm_wt

Logistic Regression Model

lrm(formula = nonpharm_over_opioid ~ age + sex + race_ethnicity + 
    education + fam_income1k + urban_rural + num_pain_locations + 
    painlimit_lifework + phq8total, data = m2_dataset_si, x = TRUE, 
    y = TRUE, weights = wt_final, normwt = TRUE)


Sum of Weights by Response Category

        0         1 
 524.3515 2775.6485 

                        Model Likelihood        Discrimination    Rank Discrim.    
                              Ratio Test               Indexes          Indexes    
 Obs          3300    LR chi2     326.37        R2       0.161    C       0.737    
  0            510    d.f.            18      R2(18,3300)0.089    Dxy     0.475    
  1           2790    Pr(> chi2) <0.0001    R2(18,1323.1)0.208    gamma   0.475    
Sum of weights3300                              Brier    0.121    tau-a   0.124    
 max |deriv| 1e-07                                                                 

                                                     Coef    S.E.   Wald Z
Intercept                                             2.1393 0.3310  6.46 
age                                                  -0.0212 0.0033 -6.50 
sex=Male                                             -0.1856 0.1038 -1.79 
race_ethnicity=Hispanic                               0.2904 0.1785  1.63 
race_ethnicity=NH Black/African American only        -0.1415 0.1636 -0.87 
race_ethnicity=Other single and multiple races        0.4662 0.2511  1.86 
education=High school graduate/GED                    0.4940 0.1454  3.40 
education=Some college, no degree                     0.7391 0.1627  4.54 
education=Associate's or bachelor's degree            1.1555 0.1644  7.03 
education=Master's, doctoral, or professional degree  1.3719 0.2696  5.09 
fam_income1k                                          0.0074 0.0013  5.54 
urban_rural=Medium and small metro                    0.2832 0.1362  2.08 
urban_rural=Large fringe metro                        0.6227 0.1623  3.84 
urban_rural=Large central metro                       0.3709 0.1525  2.43 
num_pain_locations                                    0.0004 0.0411  0.01 
painlimit_lifework=Some days                         -0.6143 0.1669 -3.68 
painlimit_lifework=Most days                         -0.7064 0.1879 -3.76 
painlimit_lifework=Every day                         -1.0145 0.1773 -5.72 
phq8total                                            -0.0070 0.0097 -0.72 
                                                     Pr(>|Z|)
Intercept                                            <0.0001 
age                                                  <0.0001 
sex=Male                                             0.0739  
race_ethnicity=Hispanic                              0.1038  
race_ethnicity=NH Black/African American only        0.3869  
race_ethnicity=Other single and multiple races       0.0634  
education=High school graduate/GED                   0.0007  
education=Some college, no degree                    <0.0001 
education=Associate's or bachelor's degree           <0.0001 
education=Master's, doctoral, or professional degree <0.0001 
fam_income1k                                         <0.0001 
urban_rural=Medium and small metro                   0.0376  
urban_rural=Large fringe metro                       0.0001  
urban_rural=Large central metro                      0.0150  
num_pain_locations                                   0.9913  
painlimit_lifework=Some days                         0.0002  
painlimit_lifework=Most days                         0.0002  
painlimit_lifework=Every day                         <0.0001 
phq8total                                            0.4712

7.11 Validating `m2_coef_wt`

To validate m2_lrm_wt, I first set a seed of 19900317 using the set.seed() function. I then used the validate() function with 40 bootstrap repetitions to generate validated (index.corrected) $R^2$ and Dxy values. I stored this validated summary as val_m2 so that I could obtain specific values for a summary table below.

Code

set.seed(19900317)
val_m2 <- validate(m2_lrm_wt, method = "boot", B = 40)

Code

fit_report <- tibble(Model = "m2") |>
  mutate("Validated R2" = case_when(
    Model == "m2" ~ val_m2[2,5]),
    "Validated C stat" = case_when(
      Model == "m2" ~  0.5 + (val_m2[1,5]/2)),
    AIC = AIC(m2_lrm_wt),
    BIC = BIC(m2_lrm_wt))
    
fit_report |> 
  select(Model, "Validated R2", "Validated C stat", AIC, BIC) |>
  flextable() |>
  colformat_double(j = 2:3, digits = 3) |>
  colformat_double(j = 4:5, digits = 0) |>
  autofit() |>
  theme_zebra(odd_header = "white") |>
  hline_top(part="body")

Model	Validated R2	Validated C stat	AIC	BIC
m2	0.148	0.728	2,601	2,717

7.12 References

Ford, C. (2016). Getting Started with Hurdle Models. University of Virginia Library. https://data.library.virginia.edu/getting-started-with-hurdle-models/

Kroenke, K., Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9: validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606-613. https://doi.org/10.1046/j.1525-1497.2001.016009606.x

Löwe, B., Decker, O., Müller, S., Brähler, E., Schellberg, D., Herzog, W., & Herzberg, P. Y. (2008). Validation and standardization of the Generalized Anxiety Disorder Screener (GAD-7) in the general population. Medical Care, 46(3), 266-274. https://doi.org/10.1097/MLR.0b013e318160d093

National Center for Health Statistics. (2021, 2021-04-05). 2019 National Health Interview Survey. National Center for Health Statistics. https://www.cdc.gov/nchs/nhis/2019nhis.htm

Yoshida, K. (2014). Models for excess zeros using pscl package (Hurdle and zero-inflated regression models) and their interpretations. Retrieved May 1 from https://rpubs.com/kaz_yos/pscl-2

Zelaya, C. E., Dahlhamer, J. M., Lucas, J. W., & Connor, E. M. (2020). Chronic Pain and High-impact Chronic Pain Among U.S. Adults, 2019.

7.13 Acknowledgments

Robert J. Trager, DC (methodology, interpretation, literature review, first draft of introduction)
Jeffery A. Dusek, PhD (methodology, interpretation)
The National Center for Health Statistics (data collection)
Thomas E. Love, PhD (methodology instruction)

8 Session Information

Code

xfun::session_info()

R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6.3

Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8

Package version:
  abind_1.4-5              askpass_1.2.0            backports_1.4.1         
  base64enc_0.1-3          bigD_0.2.0               bit_4.0.5               
  bit64_4.0.5              bitops_1.0.7             blob_1.2.4              
  boot_1.3.28.1            brio_1.1.4               broom_1.0.5             
  broom.helpers_1.14.0     bslib_0.6.1              cachem_1.0.8            
  callr_3.7.3              car_3.1-2                carData_3.0-5           
  caret_6.0-94             caTools_1.18.2           cellranger_1.1.0        
  checkmate_2.3.1          class_7.3-22             cli_3.6.2               
  clipr_0.8.0              clock_0.7.0              cluster_2.1.6           
  cmprsk_2.2-11            codetools_0.2-19         colorspace_2.1-0        
  commonmark_1.9.0         compiler_4.3.2           conflicted_1.2.0        
  countreg_0.2-1           cpp11_0.4.7              crayon_1.5.2            
  crul_1.4.0               curl_5.2.0               data.table_1.14.10      
  datasets_4.3.2           DBI_1.2.0                dbplyr_2.4.0            
  DEoptimR_1.1.3           desc_1.4.3               diagram_1.6.5           
  dials_1.2.0              DiceDesign_1.10          diffobj_0.3.5           
  digest_0.6.34            doRNG_1.8.6              dplyr_1.1.4             
  dtplyr_1.3.1             e1071_1.7.14             ellipsis_0.3.2          
  Epi_2.47.1               etm_1.1.1                evaluate_0.23           
  fansi_1.0.6              farver_2.1.1             fastmap_1.1.1           
  flextable_0.9.4          fontawesome_0.5.2        fontBitstreamVera_0.1.1 
  fontLiberation_0.1.0     fontquiver_0.2.1         forcats_1.0.0           
  foreach_1.5.2            foreign_0.8-86           Formula_1.2-5           
  fs_1.6.3                 furrr_0.3.1              future_1.33.1           
  future.apply_1.11.1      gargle_1.5.2             gdtools_0.3.5           
  generics_0.1.3           gfonts_0.2.0             GGally_2.2.0            
  ggformula_0.12.0         ggplot2_3.4.4            ggrepel_0.9.5           
  ggridges_0.5.5           ggstats_0.5.1            glmnet_4.1.8            
  globals_0.16.2           glue_1.7.0               googledrive_2.1.1       
  googlesheets4_1.1.1      gower_1.0.1              GPfit_1.0-8             
  gplots_3.1.3             graphics_4.3.2           grDevices_4.3.2         
  grid_4.3.2               gridExtra_2.3            gt_0.10.0               
  gtable_0.3.4             gtools_3.9.5             gtsummary_1.7.2         
  hardhat_1.3.0            haven_2.5.4              highr_0.10              
  Hmisc_5.1-1              hms_1.1.3                htmlTable_2.4.2         
  htmltools_0.5.7          htmlwidgets_1.6.4        httpcode_0.3.0          
  httpuv_1.6.13            httr_1.4.7               ids_1.0.1               
  infer_1.0.5              ipred_0.9-14             isoband_0.2.7           
  iterators_1.0.14         itertools_0.1.3          janitor_2.2.0           
  jquerylib_0.1.4          jsonlite_1.8.8           juicyjuice_0.1.0        
  kableExtra_1.3.4         KernSmooth_2.23.22       knitr_1.45              
  labeling_0.4.3           labelled_2.12.0          laeken_0.5.2            
  later_1.3.2              lattice_0.22-5           lava_1.7.3              
  lhs_1.1.6                lifecycle_1.0.4          listenv_0.9.0           
  lme4_1.1.35.1            lmtest_0.9.40            lubridate_1.9.3         
  magrittr_2.0.3           markdown_1.12            MASS_7.3-60             
  Matrix_1.6-5             MatrixModels_0.5-3       memoise_2.0.1           
  methods_4.3.2            mgcv_1.9-1               mime_0.12               
  minqa_1.2.6              missForest_1.5           mitools_2.4             
  modeldata_1.2.0          modelenv_0.1.1           ModelMetrics_1.2.2.2    
  modelr_0.1.11            mosaic_1.9.0             mosaicCore_0.9.4.0      
  mosaicData_0.20.4        multcomp_1.4-25          munsell_0.5.0           
  mvtnorm_1.2-4            naniar_1.0.0             nlme_3.1-164            
  nloptr_2.0.3             nnet_7.3-19              norm_1.0.11.1           
  numDeriv_2016.8-1.1      officer_0.6.3            openssl_2.1.1           
  parallel_4.3.2           parallelly_1.36.0        parsnip_1.1.1           
  patchwork_1.2.0          pbkrtest_0.5.2           pillar_1.9.0            
  pkgbuild_1.4.3           pkgconfig_2.0.3          pkgload_1.3.3           
  plyr_1.8.9               polspline_1.1.24         praise_1.0.0            
  prettyunits_1.2.0        pROC_1.18.5              processx_3.8.3          
  prodlim_2023.08.28       progress_1.2.3           progressr_0.14.0        
  promises_1.2.1           proxy_0.4.27             ps_1.7.5                
  pscl_1.5.5.1             purrr_1.0.2              quantreg_5.97           
  R6_2.5.1                 ragg_1.2.7               randomForest_4.7.1.1    
  ranger_0.16.0            rappdirs_0.3.3           RColorBrewer_1.1-3      
  Rcpp_1.0.12              RcppArmadillo_0.12.6.6.1 RcppEigen_0.3.3.9.4     
  reactable_0.4.4          reactR_0.5.0             readr_2.1.5             
  readxl_1.4.3             recipes_1.0.9            rematch_2.0.0           
  rematch2_2.1.2           reprex_2.1.0             reshape2_1.4.4          
  rlang_1.1.3              rmarkdown_2.25           rms_6.7-1               
  rngtools_1.5.2           robustbase_0.99.1        ROCR_1.0-11             
  rpart_4.1.23             rprojroot_2.0.4          rsample_1.2.0           
  rstudioapi_0.15.0        rvest_1.0.3              sandwich_3.1-0          
  sass_0.4.8               scales_1.3.0             selectr_0.4.2           
  shape_1.4.6              shiny_1.8.0              simputation_0.2.8       
  slider_0.3.1             snakecase_0.11.1         sourcetools_0.1.7.1     
  sp_2.1.2                 SparseM_1.81             splines_4.3.2           
  SQUAREM_2021.1           stats_4.3.2              stats4_4.3.2            
  stringi_1.8.3            stringr_1.5.1            survey_4.2-1            
  survival_3.5-7           svglite_2.1.3            sys_3.4.2               
  systemfonts_1.0.5        testthat_3.2.1           textshaping_0.3.7       
  TH.data_1.1-2            tibble_3.2.1             tidymodels_1.1.1        
  tidyr_1.3.0              tidyselect_1.2.0         tidyverse_2.0.0         
  timechange_0.2.0         timeDate_4032.109        tinytex_0.49            
  tools_4.3.2              triebeard_0.4.1          tune_1.1.2              
  tzdb_0.4.0               UpSetR_1.4.0             urltools_1.7.3          
  utf8_1.2.4               utils_4.3.2              uuid_1.1-1              
  V8_4.4.1                 vcd_1.4.12               vctrs_0.6.5             
  VGAM_1.1-9               VIM_6.2.2                viridis_0.6.4           
  viridisLite_0.4.2        visdat_0.6.0             vroom_1.6.5             
  waldo_0.5.2              warp_0.2.1               webshot_0.5.5           
  withr_2.5.2              workflows_1.1.3          workflowsets_1.0.1      
  xfun_0.41                xml2_1.3.6               xtable_1.8-4            
  yaml_2.3.8               yardstick_1.2.0          zip_2.3.0               
  zoo_1.8-12

R Packages and Setup

1 Data Source

1.1 National Health Interview Survey

1.2 The Subjects

2 Research Questions

3 Data Ingest and Management

3.1 Loading the Raw Data

3.2 Cleaning the Data

3.2.1 Recode Refused, Not Ascertained, and Don’t Know as Missing

3.2.2 Recode Binary Variables From (1 = Yes, 2 = No) to (1 = Yes, 0 = No)

3.2.3 Recode Categorical Variables to Meaningful Factors

3.2.4 Calculate GAD-7 total

3.2.5 Calculate PHQ-8 Total

3.2.6 Integrative Health and Medicine (IHM) Variables

3.2.7 Opioid frequency

3.2.8 Number of Pain Locations

3.2.9 Chronic Pain Treatment Category - SUPPORT

3.2.10 Chronic Pain Treatment Category - Opioids

3.2.11 IHM Category

3.2.12 Selecting Variables I Used and Arranging the Tibble

3.2.13 Working with Categorical Predictors

3.3 Print and Save The Final Tibble

3.3.1 Save The Tibble

4 Code Book and Description

4.1 Defining the Variables

4.2 Numerical Description

5 Single Imputation

5.1 Missingness

5.1.1 Missing Values

5.1.2 Missing Values Within Variables

5.1.3 Missing Values within Cases

5.2 Generating the Single Imputation Dataset

6 Model 1: Engagement in IHM

6.1 Table 1: Bivariate Analysis

6.1.1 Create Survey Design

6.1.2 Total population represented

6.1.3 Crosstabulation

6.1.4 Assemble Table 1 Tibble

6.1.5 Print Table 1

6.2 Splitting into Training and Testing Samples

6.3 Scatterplot Matrix and Collinearity

6.3.1 Scatterplot Matrix of Quantitative Predictors with Outcome

6.3.2 Scatterplot Matrix of Categorical Variables with Outcome

6.3.3 Scatterplot Matrix of Remaining Categorical Variables with Outcome

6.4 Considering Non-Linear Terms

6.5 Fitting Models for Comparison

6.5.1 Poisson (m1_pois)

6.5.2 Negative Binomial Regression (m1_nb)

6.5.3 Zero-inflated Poisson (m1_zip)

6.5.4 Zero-inflated Negative Binomial (m1_zinb)

6.5.5 Hurdle with Poisson (m1_hurdle_pois)

6.5.6 Hurdle with Negative Binomial (m1_hurdle_nb)

6.6 Comparing m1 Versions

6.6.1 Comparing with Rootograms

6.6.2 Comparing with Fit Statistics

6.6.3 Comparing with Vuong tests

6.7 Final Model Choice

6.8 Fitting Weighted Hurdle Model

6.9 Model Coefficients

6.9.1 Numeric Summary of Odds of ihm4pain_num != 0

6.9.2 Numeric Summary of Relative Change in Count of ihm4pain_num

6.9.3 Visual Summary of Odds of ihm4pain_num != 0

6.9.4 Visual Summary of Estimated Relative Change in Count of ihm4pain_num

6.10 Check Model Assumptions

6.11 Validating m1_hurdle_pois

6.11.1 Fit Summary in Test Data

6.11.2 Comparing Fits in Datasets

6.11.3 Table Comparing Observed and Predicted Counts

6.11.4 Plot Comparing Observed and Predicted Counts

7 Model 2: Nonpharmacologic Modalities Only vs. Opioids Only

7.1 Create Dataset

7.2 Create Survey Design

7.3 Total population represented

7.4 Crosstabulation

7.5 Assemble Table 2 Tibble

7.6 Print Table 2

7.7 Fit Weighted Logistic Model with Survey Weights

7.8 Numeric Summary of Coefficients (m2_coef_wt)

7.9 Effects Plot (m2_coef_wt)

7.10 Summarizing Fit (m2_coef_wt)

6.5.1 Poisson (`m1_pois`)

6.5.2 Negative Binomial Regression (`m1_nb`)

6.5.3 Zero-inflated Poisson (`m1_zip`)

6.5.4 Zero-inflated Negative Binomial (`m1_zinb`)

6.5.5 Hurdle with Poisson (`m1_hurdle_pois`)

6.5.6 Hurdle with Negative Binomial (`m1_hurdle_nb`)

6.6 Comparing `m1` Versions

6.9.1 Numeric Summary of Odds of `ihm4pain_num` != 0

6.9.2 Numeric Summary of Relative Change in Count of `ihm4pain_num`

6.9.3 Visual Summary of Odds of `ihm4pain_num` != 0

6.9.4 Visual Summary of Estimated Relative Change in Count of `ihm4pain_num`

6.11 Validating `m1_hurdle_pois`

7.8 Numeric Summary of Coefficients (`m2_coef_wt`)

7.9 Effects Plot (`m2_coef_wt`)

7.10 Summarizing Fit (`m2_coef_wt`)

7.11 Validating `m2_coef_wt`