Analysis of the 2019 National Health Interview Survey
Author
Samuel N. Rodgers-Melnick, MPH, MT-BC
Published
2024-01-22
You can download the .qmd file, NHISstudy2024-01-07GitHub.qmd, from my GitHub page here.
You can read the full research article in Journal of Pain Researchhere
Rodgers-Melnick SN, Trager RJ, Love TE, Dusek JA. Engagement in Integrative and Nonpharmacologic Pain Management Modalities Among Adults with Chronic Pain: Analysis of the 2019 National Health Interview Survey. J Pain Res. Jan 16 2024;17:253-264. doi:10.2147/jpr.S439682
R Packages and Setup
I loaded the following packages to ensure I had the necessary data cleaning, analysis, and visualization tools needed for this project. Given the number of packages, I used conflict_prefer() to ensure functions would not interfere with each other. I also loaded uhred for use in plots.
Code
knitr::opts_chunk$set(comment =NA) # do not remove thislibrary(janitor)library(gt)library(naniar)library(rms)library(patchwork)library(Epi)library(car)library(caret)library(GGally)library(pscl)library(VGAM)library(MASS)library(flextable)library(mosaic)library(pROC)library(ROCR)library(gtsummary)library(simputation)library(ggrepel)library(nnet)library(kableExtra)library(conflicted)library(survey)library(tidymodels)library(tidyverse)conflict_prefer("select", "dplyr")conflict_prefer("summarize", "dplyr")conflict_prefer("filter", "dplyr")conflict_prefer("mean", "base")conflict_prefer("sum", "base")conflict_prefer("vif", "car")conflict_prefer("count", "dplyr")uhred <-"#B70022"theme_set(theme_bw())
Purpose: “The main objective of the NHIS is to monitor the health of the United States population through the collection and analysis of data on a broad range of health topics. NHIS data are used widely throughout the Department of Health and Human Services (HHS) to monitor trends in illness and disability and to track progress toward achieving national health objectives. The data are also used by the public health research community for epidemiological and policy analysis of such timely issues as characterizing those with various health problems, determining barriers to accessing and using appropriate health care, and evaluating Federal health programs.”
Design: The NHIS is a cross-sectional household interview survey.
Target population: “The target population for the NHIS is the civilian non-institutionalized population residing within the 50 states and the District of Columbia at the time of the interview. Persons excluded from the universe are those with no fixed household address (e.g., homeless and/or transient persons not residing in shelters), active duty military personnel, and civilians living on military bases, persons in long-term care institutions (e.g., nursing homes for the elderly, hospitals for the chronically ill or physically or intellectually disabled, and wards for abused or neglected children), persons in correctional facilities (e.g., prisons or jails, juvenile detention centers, and halfway houses), and U.S. nationals living in foreign countries. While active-duty Armed Forces personnel cannot be sampled for inclusion in the survey, any civilians residing with Armed Forces personnel in non-military housing are eligible to be sampled.”
Sampling strategy: “To keep survey operations manageable, cost-effective, and timely, the NHIS uses geographically clustered sampling techniques to select the sample of dwelling units for the NHIS. The sample is designed in such a way that each month’s sample is nationally representative. Data collection on the NHIS is continuous, i.e., from January to December each year.”
“One ‘sample adult’ aged 18 years or older and one ‘sample child’ aged 17 years or younger (if any children live in the household) are randomly selected from each household following a brief initial interview that identifies everyone who usually lives or stays in the household. Information about the sample adult is collected from the sample adult herself or himself unless she or he is physically or mentally unable to do so, in which case a knowledgeable proxy can answer for the sample adult. Information about the sample child is collected from a parent or adult who is knowledgeable about and responsible for the health care of the sample child. This respondent may or may not also be the sample adult.”
Data collection procedures: “The U.S. Census Bureau, under a contractual agreement, is the data collection agent for the NHIS. NHIS data are collected continuously throughout the year by Census interviewers. Nationally, about 750 interviewers (also called ‘Field Representatives’) are trained and directed by health survey supervisors in the U.S. Census Bureau Regional Offices to conduct interviews for NHIS.
The NHIS is conducted using computer-assisted personal interviewing. Face-to-face interviews are conducted in respondents’ homes, but follow-ups to complete interviews may be conducted over the telephone. A telephone interview may also be conducted when the respondent requests a telephone interview or when road conditions or travel distances would make it difficult to schedule a personal visit before the required completion date. In 2019, 34.3% of the Sample Adult interviews and 31.7% of the Sample Child interviews were conducted at least partially by telephone.”
1.2 The Subjects
Subjects included in this project are adults (18 years and older) with chronic pain as defined by responding “most days” or “every day” to the survey question, “In the past 3 months, how often did you have pain?” (Zelaya et al., 2020).
2 Research Questions
Which demographic, mental health, and pain-related variables are associated with engagement in: (1) the number of IHM modalities used for pain and (2) exclusively nonpharmacologic modalities rather than opioid utilization among U.S. adults with chronic pain.
In the following code, I created a new tibble, projectb_clean, limited to respondents who stated that they had pain on most days (PAIFRQ3M_A = 3) or everyday (PAIFRQ3M_A = 4) in response to the question PAIFRQ3M_A, “In the past three months, how often did you have pain?” as this matched the chronic pain definition reported by the National Center for Health Statistics (Zelaya et al., 2020). I also set the filter at PAIFRQ3M_A < 7 to exclude responses that were missing, refused, non ascertained, or “don’t know” in response to the chronic pain question.
After filtering to the chronic pain cohort, I selected the variables from the NHIS that I needed for this analysis.
3.2.2 Recode Binary Variables From (1 = Yes, 2 = No) to (1 = Yes, 0 = No)
In the following code, I recoded binary variables with initial responses in the format, 1 = Yes, 2 = No, to binary variables coded as 1 = Yes, 0 = No using the mutate() function, subtracting 2 from the variables chiro, yoga_taichi, massage, meditation, anymeds12mo, opioids12mo, opioids3mo, opioids_chonpain3mo, pt_rehab_ot, talk_therapy, chron_pain_selfmgmt, and peer_support_grp.
3.2.3 Recode Categorical Variables to Meaningful Factors
Using the 2019 NHIS Codebook, I recoded the following variables to meaningful, ascending, and ordered factors using the mutate() and factor() functions. To make race_ethnicity 6 categories, I combined “NH Amer Indian Alaska Natv only” and “NH Amer Indian Alaska Natv and any other group” into “NH Amer Indian Alaska Natv.”
Code
projectb_clean <- projectb_clean |>mutate(sex =factor(case_when( sex ==1~"Male", sex ==2~"Female"))) |>mutate(race_ethnicity =case_when( race_ethnicity ==1~"Hispanic", race_ethnicity ==2~"NH White only", race_ethnicity ==3~"NH Black/African American only", race_ethnicity ==4~"NH Asian only", race_ethnicity ==5~"NH Amer Indian Alaska Natv", race_ethnicity ==6~"NH Amer Indian Alaska Natv", race_ethnicity ==7~"Other single and multiple races")) |>mutate(race_ethnicity =factor(race_ethnicity, levels =c("NH White only", "Hispanic", "NH Black/African American only", "NH Asian only", "NH Amer Indian Alaska Natv","Other single and multiple races"))) |>mutate(painlimit_lifework =case_when( painlimit_lifework ==1~"Never", painlimit_lifework ==2~"Some days", painlimit_lifework ==3~"Most days", painlimit_lifework ==4~"Every day")) |>mutate(painlimit_lifework =factor(painlimit_lifework, levels =c("Never", "Some days", "Most days", "Every day"))) |>mutate(anx_severity =case_when( anx_severity ==1~"None/Minimal", anx_severity ==2~"Mild", anx_severity ==3~"Moderate", anx_severity ==4~"Severe")) |>mutate(anx_severity =factor(anx_severity, levels =c("None/Minimal", "Mild", "Moderate", "Severe"))) |>mutate(dep_severity =case_when( dep_severity ==1~"None/Minimal", dep_severity ==2~"Mild", dep_severity ==3~"Moderate", dep_severity ==4~"Severe")) |>mutate(dep_severity =factor(dep_severity, levels =c("None/Minimal", "Mild", "Moderate", "Severe"))) |>mutate(opioid_freq3mo =case_when( opioid_freq3mo ==1~"Some days", opioid_freq3mo ==2~"Most days", opioid_freq3mo ==3~"Every day")) |>mutate(opioid_freq3mo =factor(opioid_freq3mo, levels =c("Some days", "Most days", "Every day"))) |>mutate(education =case_when( education ==0~"Less than high school", education ==1~"Less than high school", education ==2~"Less than high school", education ==3~"High school graduate/GED", education ==4~"High school graduate/GED", education ==5~"Some college, no degree", education ==6~"Associate's or bachelor's degree", education ==7~"Associate's or bachelor's degree", education ==8~"Associate's or bachelor's degree", education ==9~"Master's, doctoral, or professional degree", education ==10~"Master's, doctoral, or professional degree", education ==11~"Master's, doctoral, or professional degree")) |>mutate(education =factor(education, levels =c("Less than high school", "High school graduate/GED","Some college, no degree","Associate's or bachelor's degree","Master's, doctoral, or professional degree"))) |>mutate(urban_rural =case_when( urban_rural ==1~"Large central metro", urban_rural ==2~"Large fringe metro", urban_rural ==3~"Medium and small metro", urban_rural ==4~"Nonmetropolitan")) |>mutate(urban_rural =factor(urban_rural, levels =c("Nonmetropolitan", "Medium and small metro", "Large fringe metro","Large central metro")))
3.2.4 Calculate GAD-7 total
In the following code, I recoded the original NHIS responses to conform to the actual scoring method for the GAD-7 (Löwe et al., 2008). I then calculated gad7total by adding each of the individual GAD-7 responses. Finally, to verify that gad7total was scored correctly, I summarized gad7total by anx_severity using favstats() to ensure that the values were in range. Interestingly, in the 2019 NHIS, it appears that they scored subjects even if they had missing individual items on the GAD-7.
In the following code, I recoded the original NHIS responses to conform to the actual scoring method for the PHQ-8 (Kroenke et al., 2001). I then calculated phq8total by adding each of the individual PHQ-8 responses.
Finally, to verify that phq8total was scored correctly, I summarized phq8total by dep_severity using favstats() to ensure that the values were in range. Interestingly, in the 2019 NHIS, it appears that they dropped “moderate-to-severe depression” as a value given that they did not use the full PHQ-9, and they scored subjects even if they had missing individual items on the PHQ-8.
3.2.6 Integrative Health and Medicine (IHM) Variables
In the following code, I performed the following steps to create the 5-level count outcome, ihm4pain_num, and the binary variable, ihm4pain_bin.
First, I used the mutate() function to create ihm4pain_num as the sum of engaging in chiro, yoga_taichi, massage, or meditation to manage pain in the past 3 months.
Then, I created the binary outcome, ihm4pain_bin, which was equal to 1 if ihm4pain_num >= 1 and equal to 0 if ihm4pain_num == 0.
To create a single variable describing frequency of opioid use in the past 3 months to manage chronic pain, I performed the following steps:
First, since 1) the opioid frequency question was only asked to subjects who were taking prescription medications and opioid medications and 2) subjects responding “No” to the prescription medication and opioid use questions would be assumed to have never taken opioids in the past 3 months, I coded opioid_freq3mo_final as equal to “Never” if the subject answered “No” to anymeds12mo, opioids12mo, opioids3mo, or opioids_chonpain3mo.
Next, if subjects responded “Some days”, “Most days”, “Every day” to the opioid_freq3mo question, I coded the opioid_freq3mo_final variable as equal to that response.
Finally, I coded opioid_freq3mo_final as a meaningfully ordered factor.
To create a single variable quantifying the total number of locations in which subjects reported being bothered at least “a little” by pain in the past three months, I performed the following steps:
I converted the 6 individual pain variables to binary variables such that 1 indicated being bothered at least “a little” (>= 2), and 0 meant “Not at all.”
I then calculated num_pain_locations as the sum of those binary variables, ignoring missingness using the rowSums() function.
Finally, I used mutate() and if_else() to mark num_pain_locations as missing if all pain locations were missing.
In the following code, I performed the following steps to create the 4-category outcome, pain_tx_type.
First, I used the mutate() function to create support4pain_num as the sum of engaging in pt_rehab_ot, talk_therapy, chron_pain_selfmgmt, or peer_support_grp to manage pain in the past 3 months. In this case, SUPPORT stands for SUpportive, Physical, Psychological, Occupational, or Rehabilitative Therapies.
Next, I created the binary outcome, support4pain_bin, which was equal to 1 if support4pain_num >= 1 and equal to 0 if support4pain_num == 0.
Then, I created the outcome variable pain_tx_type as equal to None, IHM only, SUPPORT only, or SUPPORT and IHM based on the the support4pain_num and ihm4pain_bin variables.
Finally, I ensured pain_tx_type was coded as a factor using the factor function, with “None” as the reference category.
3.2.12 Selecting Variables I Used and Arranging the Tibble
With all variables recoded, I created the projectb_data tibble as a subset of projectb_clean, limited to the variables needed for analysis and subjects with complete data on the outcomes, ihm4pain_num and nonpharm_opioid. I also renamed opioid_freq3mo_final as opioid_freq3mo.
In examining the table above, I decided that the variable race_ethnicity required some collapsing of categories to ensure all categories contained at least 25 subjects in the testing and training samples. Therefore, I combined the race_ethnicity categories “NH Asian only”, “NH Amer Indian Alaska Natv”, and “Other single and multiple races” into “Other single and multiple races.”
Code
projectb_data <- projectb_data |>mutate(race_ethnicity =case_when( race_ethnicity =="NH White only"~"NH White only", race_ethnicity =="Hispanic"~"Hispanic", race_ethnicity =="NH Black/African American only"~"NH Black/African American only", race_ethnicity =="NH Asian only"~"Other single and multiple races", race_ethnicity =="NH Amer Indian Alaska Natv"~"Other single and multiple races", race_ethnicity =="Other single and multiple races"~"Other single and multiple races")) |>mutate(race_ethnicity =factor(race_ethnicity, levels =c("NH White only", "Hispanic", "NH Black/African American only", "Other single and multiple races")))
3.3 Print and Save The Final Tibble
Next, I verified that projectb_data was a tibble by printing it. In this tibble, I can confirm that:
It is a tibble and thus calling it prints only the first 10 rows.
The initial row tells me that this is a tibble and specifies its dimensions.
I have a complete set of 7,114 NHIS subjects.
I’ve included only 17 variables
Code
projectb_data
# A tibble: 7,114 × 17
subject_id age sex education urban_rural painlimit_lifework
<chr> <dbl> <fct> <fct> <fct> <fct>
1 H007122 60 Male Some college, no degr… Large cent… Some days
2 H007736 78 Male Some college, no degr… Large cent… Most days
3 H022161 61 Male High school graduate/… Large cent… Some days
4 H045961 71 Female Associate's or bachel… Large cent… Never
5 H025759 80 Female Some college, no degr… Large cent… Some days
6 H003675 52 Female Master's, doctoral, o… Large cent… Most days
7 H044229 25 Female Some college, no degr… Nonmetropo… Never
8 H035660 37 Female Some college, no degr… Medium and… Some days
9 H062584 65 Female Master's, doctoral, o… Medium and… Some days
10 H008849 26 Male Less than high school Nonmetropo… Some days
# ℹ 7,104 more rows
# ℹ 11 more variables: opioid_freq3mo <fct>, gad7total <dbl>, phq8total <dbl>,
# num_pain_locations <dbl>, ihm4pain_num <dbl>, race_ethnicity <fct>,
# fam_income <dbl>, nonpharm_opioid <fct>, ihm4pain_bin <dbl>,
# ihm4pain_cat <fct>, wt_final <dbl>
3.3.1 Save The Tibble
Finally, I saved the final tibble, projectb_data, as an Rds file using the saveRDS() function.
7,114 adults ages 18-85 with pain on most days or every day in the past 3 months and participating in the 2019 NHIS with data on the variables listed in the table below.
For each of my 17 selected variables, I have included the original variable name from the raw NHIS file.
Variable
Role
Type
Description (units/levels)
Original NHIS Variable Name
subject_id
identifier
-
Randomly assigned household number unique to a household (H000074-H064180)
HHX
age
input
quant
Age of subject
AGEP_A
sex
input
2-cat
Sex (Male or Female)
SEX_A
education
input
5-cat
Education level (Less than high school; High school graduate/GED; Some college, no degree; Associate’s or bachelor’s degree; Master’s, doctoral, or professional degree)
EDUC_A
urban_rural
input
4-cat
Urban-rural classification scheme for counties (Large central metro, Large fringe metro, Medium and small metro, Non-metropolitan)
URBRRL
painlimit_lifework
input
4-cat
How often pain limited the subject’s life or work activities in the past 3 months (Never, Some days, Most days, or Every day)
PAIWKLM3M_A
opioid_freq3mo
input
4-cat
How often the subject took a prescription opioid for chronic pain in the past three months (Never, Some days, Most days, or Every day)
How many integrative health and medicine (IHM) modalities (i.e., chiropractic care, yoga/Tai Chi, meditation/guided imagery, or massage) subject used to manage pain in the past three months (0, 1, 2, 3, or 4)
PAICHIRO_A, PAIYOGA_A, PAIMASSAGE_A, PAIMEDITAT_A
race_ethnicity
input
4-cat
Race and Hispanic origin (NH White only, Hispanic, NH Black/African American only, or Other single and multiple races)
HISPALLP_A
fam_income
input
quant
Adult annual family income
FAMINCTC_A
nonpharm_opioid
outcome
4-cat
Engagement in one of the following categories of care for pain in the past three months (None, Opioids only, Nonpharm only, Opioids + Nonpharm).
Whether or not subject engaged in integrative health and medicine (IHM) modalities (i.e., chiropractic care, yoga/Tai Chi, meditation/guided imagery, or massage) to manage pain in the past three months (0, 1, 2, 3, or 4)
PAICHIRO_A, PAIYOGA_A, PAIMASSAGE_A, PAIMEDITAT_A
ihm4pain_cat
outcome
quant
Which type of (IHM) modalities (i.e., chiropractic care, yoga/Tai Chi, meditation/guided imagery, or massage) subject used to manage pain in the past three months (None, Manual only, Mind-body only, Manual and mind-body)
PAICHIRO_A, PAIYOGA_A, PAIMASSAGE_A, PAIMEDITAT_A
wt_final
weight
quant
Weight - Final Annual
WTFA_A
4.2 Numerical Description
The following code provides a numerical summary of all variables included in the projectb_data dataset using the describe() function from the Hmisc() library.
In examining the data, I confirm that:
There are no impossible values for age (i.e., > 85 or < 18), fam_income, or phq8total (i.e., > 24 or < 0).
Each factor variable is in an appropriate ascending order.
All categorical and binary variables have at least 30 observations in each category.
Code
describe(projectb_data)
projectb_data
17 Variables 7114 Observations
--------------------------------------------------------------------------------
subject_id
n missing distinct
7114 0 7114
lowest : H000003 H000005 H000010 H000023 H000040
highest: H064153 H064154 H064160 H064165 H064180
--------------------------------------------------------------------------------
age
n missing distinct Info Mean Gmd .05 .10
7107 7 68 1 58.68 18.56 29 35
.25 .50 .75 .90 .95
48 60 71 80 85
lowest : 18 19 20 21 22, highest: 81 82 83 84 85
--------------------------------------------------------------------------------
sex
n missing distinct
7113 1 2
Value Female Male
Frequency 4050 3063
Proportion 0.569 0.431
--------------------------------------------------------------------------------
education
n missing distinct
7073 41 5
lowest : Less than high school High school graduate/GED Some college, no degree Associate's or bachelor's degree Master's, doctoral, or professional degree
highest: Less than high school High school graduate/GED Some college, no degree Associate's or bachelor's degree Master's, doctoral, or professional degree
--------------------------------------------------------------------------------
urban_rural
n missing distinct
7114 0 4
Value Nonmetropolitan Medium and small metro Large fringe metro
Frequency 1491 2500 1456
Proportion 0.210 0.351 0.205
Value Large central metro
Frequency 1667
Proportion 0.234
--------------------------------------------------------------------------------
painlimit_lifework
n missing distinct
7110 4 4
Value Never Some days Most days Every day
Frequency 1999 2480 1088 1543
Proportion 0.281 0.349 0.153 0.217
--------------------------------------------------------------------------------
opioid_freq3mo
n missing distinct
7114 0 4
Value Never Some days Most days Every day
Frequency 5991 342 133 648
Proportion 0.842 0.048 0.019 0.091
--------------------------------------------------------------------------------
gad7total
n missing distinct Info Mean Gmd .05 .10
7033 81 22 0.946 3.977 5.245 0 0
.25 .50 .75 .90 .95
0 2 6 12 16
lowest : 0 1 2 3 4, highest: 17 18 19 20 21
--------------------------------------------------------------------------------
phq8total
n missing distinct Info Mean Gmd .05 .10
7007 107 25 0.985 5.058 5.674 0 0
.25 .50 .75 .90 .95
1 3 8 13 17
lowest : 0 1 2 3 4, highest: 20 21 22 23 24
--------------------------------------------------------------------------------
num_pain_locations
n missing distinct Info Mean Gmd
7106 8 7 0.954 3.071 1.514
Value 0 1 2 3 4 5 6
Frequency 82 850 1554 2034 1469 819 298
Proportion 0.012 0.120 0.219 0.286 0.207 0.115 0.042
For the frequency table, variable is rounded to the nearest 0
--------------------------------------------------------------------------------
ihm4pain_num
n missing distinct Info Mean Gmd
7114 0 5 0.725 0.545 0.7913
Value 0 1 2 3 4
Frequency 4554 1602 667 223 68
Proportion 0.640 0.225 0.094 0.031 0.010
For the frequency table, variable is rounded to the nearest 0
--------------------------------------------------------------------------------
race_ethnicity
n missing distinct
7114 0 4
Value NH White only Hispanic
Frequency 5421 605
Proportion 0.762 0.085
Value NH Black/African American only Other single and multiple races
Frequency 743 345
Proportion 0.104 0.048
--------------------------------------------------------------------------------
fam_income
n missing distinct Info Mean Gmd .05 .10
7114 0 1225 1 57217 51257 8741 11000
.25 .50 .75 .90 .95
20110 40434 79000 125000 160000
lowest : 0 1 4 5 10, highest: 209500 210000 215341 218000 220000
--------------------------------------------------------------------------------
nonpharm_opioid
n missing distinct
7114 0 4
Value None Opioids only Nonpharm only
Frequency 3201 510 2790
Proportion 0.450 0.072 0.392
Value Opioids + Nonpharm
Frequency 613
Proportion 0.086
--------------------------------------------------------------------------------
ihm4pain_bin
n missing distinct Info Sum Mean Gmd
7114 0 2 0.691 2560 0.3599 0.4608
--------------------------------------------------------------------------------
ihm4pain_cat
n missing distinct
7114 0 4
Value None Manual only Mind-body only
Frequency 4554 1114 815
Proportion 0.640 0.157 0.115
Value Manual and mind-body
Frequency 631
Proportion 0.089
--------------------------------------------------------------------------------
wt_final
n missing distinct Info Mean Gmd .05 .10
7114 0 7072 1 6986 4729 2341 2761
.25 .50 .75 .90 .95
3692 5837 8509 12623 16424
lowest : 733.485 738.112 762.185 763.376 822.349
highest: 45380.6 48013.1 49073.1 51662.9 78079.9
--------------------------------------------------------------------------------
5 Single Imputation
5.1 Missingness
For this analysis, I assumed that the data were Missing at Random (MAR). Specifically, I assumed that the probability of a predictor missing data depended upon available information from the other predictors. For example, missing phq8total data may be related to other variables such as how much pain limited the subject’s life or work activities (painlimit_lifework) or race_ethnicity rather than their level of depression. In this situation, certain subjects may have been less likely to rate their anxiety based on their other characteristics. Thus, single imputation was warranted and required as missingness could be accounted for by studying variables with complete information. Missing NOT at Random (MNAR) was not an appropriate assumption here as there was no indication in the data or the 2019 NHIS Codebook that data were missing due to the variables themselves.
5.1.1 Missing Values
To explore missingness within the projectb_data, I first used the n_miss() to quantify total missing values.
Code
projectb_data |>n_miss()
[1] 249
5.1.2 Missing Values Within Variables
I then used the miss_var_summary() to determine which variables contained those 249 missing values. These variables included phq8total, gad7total, education, num_pain_locations, age, painlimit_lifework, and sex. Given the relative infrequency of missing data (highest = 1.50%), I expected that performing imputation would not alter subsequent models in any meaningful way.
Finally, to understand missing values within cases, I used the miss_case_table() function, which revealed that 159 cases were missing 1 value, 42 cases were missing 2 values, and 2 cases were missing 3 values
To perform single imputation and generate the projectb_si dataset, I performed the following steps:
Set a seed of 19900317 using the set.seed() function
Created projectb_si as a data.frame() derived from projectb_data
Used robust linear models to impute the quantitative variable gad7total based on the predictors with no missing values: urban_rural, race_ethnicity, fam_income and opioid_freq3mo,
Used robust linear models to impute the quantitative variable phq8total based on gad7total and the predictors with no missing values: urban_rural, race_ethnicity, fam_income, and opioid_freq3mo.
Used classification and regression trees (impute_cart()) to impute the categorical variable education based on the now complete variable, phq8total, and the other complete predictors.
Used predictive mean matching (impute_pmm()) to impute the quantitative integer variable num_pain_locations based on the now complete variable, education, and the other complete predictors as I wanted the imputation to be an integer.
Used robust linear models to impute the quantitative variable age based on the now complete variable, num_pain_locations, and the other complete predictors.
Used classification and regression trees to impute the categorical variable painlimit_lifework based on the now complete variable, age, and the other complete predictors.
Used classification and regression trees to impute the categorical variable sex based on the now complete variable, painlimit_lifework, and the other complete predictors.
Used the as_tibble() function to ensure projectb_si was stored as a tibble.
I performed mutations to use the variable one in the survey design and transform covariates so that coefficients would represent meaningful increases in (1) age10 [10 unit increment increase in age]; (2) gad7total5unit [5-unit increases in GAD-7]; (3) phq8total5unit [5-unit increases in PHQ-8]; (4) fam_income10k [10,000 dollar increase in family income]; (5) fam_income10k [1,000 dollar increase in family income].
I created a meaningful factor for the ihm4pain_bin_fact variable (i.e., IHM vs. No IHM)
Used the n_miss() function to verify that projectb_si contained no missing values.
Which demographic, mental health, and pain-related variables are associated with engagement in the number of IHM modalities used for pain in the past 3 months among adults with chronic pain?
In this analysis, I chose to predict ihm4pain_num based on 10 variables: age10, sex, education, urban_rural, painlimit_lifework, opioid_freq3mo, phq8total5unit, num_pain_locations, race_ethnicity, and fam_income10k.
6.1 Table 1: Bivariate Analysis
The following code details the steps taken to generate Table 1.
table1 |>flextable() |>add_header_lines(values ="Table 1: Socio-Demographic, Pain, and Mental Health Characteristics by IHM Engagement") |>bold(bold =TRUE, part ="header") |>bold(i =c(1, 2, 3, 6, 11, 17, 18, 23:24, 29, 34), j =1, bold =TRUE) |>padding(i =c(4:5, 7:10, 12:16, 19:22, 25:28, 30:33), j =1, padding.left =20) |>add_footer_lines(values ="Abbreviations: CI, confidence interval; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; IHM, integrative health and medicine; metro, metropolitan; NH, Non-Hispanic",top =FALSE) |> flextable::footnote(i =c(2, 17, 23, 34), j =5, ref_symbols ="a",value =as_paragraph("Wilcoxon rank-sum test for complex survey samples")) |> flextable::footnote(i =c(3, 6, 11, 18, 24, 29), j =5, ref_symbols ="b",value =as_paragraph("Chi-squared test with Rao & Scott’s second-order correction")) |>autofit(part =c("body", "header"))
Table 1: Socio-Demographic, Pain, and Mental Health Characteristics by IHM Engagement
Weighted variable
All with chronic pain
IHM
No IHM
P value
Total, n
49,695,212
17,610,400
32,084,812
Age in years, mean (95% CI)
55.4 (54.9-55.9)
51.8 (51.0-52.6)
57.3 (56.7-57.9)
<.001a
Sex, % (95% CI)
<.001b
Female
54.9 (53.5-56.3)
60.3 (57.9-62.6)
52.0 (50.2-53.7)
Male
45.1 (43.7-46.5)
39.7 (37.4-42.1)
48.0 (46.3-49.8)
Race/Ethnicity, % (95% CI)
.005b
NH White only
73.2 (71.9-74.5)
74.8 (72.7-76.8)
72.3 (70.6-73.9)
Hispanic
10.5 (9.59-11.5)
10.1 (8.79-11.6)
10.7 (9.52-12.0)
NH Black/African American only
11.0 (10.1-11.9)
9.0 (7.73-10.5)
12.1 (11.0-13.3)
Other single and multiple races
5.3 (4.66-6.03)
6.1 (5.01-7.33)
4.9 (4.11-5.81)
Education, % (95% CI)
<.001b
Less than high school
15.5 (14.4-16.7)
8.8 (7.45-10.4)
19.2 (17.7-20.7)
High school graduate/GED
30.4 (29.1-31.7)
25.2 (23.1-27.5)
33.2 (31.6-34.9)
Some college, no degree
19.1 (18.0-20.2)
21.3 (19.4-23.4)
17.8 (16.5-19.2)
Associate's or bachelor's degree
28.2 (27.0-29.4)
34.3 (32.2-36.6)
24.8 (23.4-26.3)
Master's, doctoral, or professional degree
6.9 (6.31-7.53)
10.3 (9.11-11.6)
5.0 (4.42-5.71)
Family income in $1000s, mean (95% CI)
63.1 (61.6-64.6)
71.7 (69.0-74.3)
58.4 (56.7-60.2)
<.001a
Proximity to metro area, % (95% CI)
<.001b
Nonmetropolitan
19.9 (18.8-21.0)
16.1 (14.5-17.9)
22.0 (20.6-23.5)
Medium and small metro
33.9 (32.6-35.3)
33.1 (30.9-35.3)
34.4 (32.8-36.1)
Large fringe metro
21.7 (20.5-22.9)
23.6 (21.6-25.8)
20.6 (19.2-22.1)
Large central metro
24.4 (23.2-25.7)
27.2 (25.1-29.4)
22.9 (21.4-24.5)
Number of pain locations, mean (95% CI)
3.1 (3.1-3.1)
3.4 (3.3-3.4)
3.0 (2.9-3.0)
<.001a
Pain limiting life/work, % (95% CI)
<.001b
Never
28.2 (27.0-29.5)
22.9 (21.1-24.9)
31.1 (29.5-32.8)
Some days
35.6 (34.2-36.9)
38.6 (36.2-40.9)
33.9 (32.2-35.6)
Most days
15.4 (14.4-16.5)
17.1 (15.2-19.0)
14.5 (13.3-15.8)
Every day
20.8 (19.7-22.0)
21.5 (19.6-23.5)
20.5 (19.1-21.9)
Opioid use in past 3 months, % (95% CI)
.026b
Never
84.8 (83.7-85.8)
85.5 (83.7-87.2)
84.4 (83.1-85.7)
Some days
4.7 (4.12-5.38)
5.1 (3.97-6.45)
4.5 (3.86-5.27)
Most days
1.8 (1.47-2.19)
2.2 (1.63-2.91)
1.6 (1.20-2.09)
Every day
8.7 (7.91-9.53)
7.2 (6.12-8.49)
9.5 (8.47-10.6)
PHQ-8 total, mean (95% CI)
5.3 (5.1-5.4)
5.7 (5.4-6.0)
5.0 (4.8-5.2)
<.001a
Abbreviations: CI, confidence interval; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; IHM, integrative health and medicine; metro, metropolitan; NH, Non-Hispanic
aWilcoxon rank-sum test for complex survey samples
bChi-squared test with Rao & Scott’s second-order correction
6.2 Splitting into Training and Testing Samples
I then used the initial_split(), training(), and testing() functions to generate projectb_si_train (training data) and projectb_si_test (testing data), each with n = 3557 subjects.
6.3.1 Scatterplot Matrix of Quantitative Predictors with Outcome
To investigate issues of collinearity prior to fitting models, I generated a series of scatterplot matrices using ggpairs(). First I assessed the relationship between the quantitative outcomes and ihm4pain_num.
Code
temp1quant <- projectb_si_train |>select(age10, phq8total5unit, num_pain_locations, fam_income10k, ihm4pain_num)ggpairs(temp1quant, title ="Scatterplot Matrix of Quantitative Variables with Outcome",lower =list(combo =wrap("facethist", bins =20)))
Code
rm(temp1quant)
In examining the plot above, there do not appear to be any strong (i.e., > 0.6) correlations between quantitative predictors and the outcome.
6.3.2 Scatterplot Matrix of Categorical Variables with Outcome
Then, I looked at the relationships between three of the categorical predictors and ihm4pain_num using ggpairs().
Code
temp2cat1 <- projectb_si_train |>select(sex, education, urban_rural, ihm4pain_num)ggpairs(temp2cat1, title ="Scatterplot Matrix of Categorical Variables with Outcome",lower =list(combo =wrap("facethist", bins =20)))
Code
rm(temp2cat1)
6.3.3 Scatterplot Matrix of Remaining Categorical Variables with Outcome
Finally, I looked at the relationships between the two remaining categorical predictors and ihm4pain_num using ggpairs().
Code
temp2cat2 <- projectb_si_train |>select(painlimit_lifework, opioid_freq3mo, race_ethnicity, ihm4pain_num)ggpairs(temp2cat2, title ="Scatterplot Matrix of Categorical Variables with Outcome",lower =list(combo =wrap("facethist", bins =20)))
Code
rm(temp2cat2)
To investigate this further, I calculated variance inflation factors (VIF) for each variable using the vif() function. Given the recommendation to look closely at any VIF value that is greater than 5, there does not seem to be an issue with collinearity in this full model as all VIF values are less than 1.4.
To determine how best to potentially spend degrees of freedom on non-linear terms, I generated a Spearman \(\rho^2\) plot using the plot() and spearman2()functions.
The plot suggests that I consider, for example, an interaction between education and age10 as a non-linear term. However, given that 1) this model already contains 21 degrees of freedom, I opted not to include any non-linear terms.
6.5 Fitting Models for Comparison
I chose to compare the following 6 modeling approaches for m1 before choosing a final version.
Poisson
Negative binomial regression
Zero-inflated Poisson
Zero-inflated negative binomial
Hurdle with Poisson
Hurdle with negative binomial
In the following series of code chunks, I performed the following
Fit each model using the appropriate function
Generated a summary of fit statistics including \(R^2\), RMSE, MAE, AIC, and BIC using various functions including augment(), glance(), metric_set(), and predict().
Used functions including mutate(), group_by(), and summarize() to pivot the fit statistics from each model into a tidy tibble for combination later.
Call:
zeroinfl(formula = ihm4pain_num ~ age10 + sex + education + urban_rural +
painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations +
fam_income10k + race_ethnicity, data = projectb_si_train)
Pearson residuals:
Min 1Q Median 3Q Max
-1.3529 -0.6426 -0.4657 0.5057 7.1811
Count model coefficients (poisson with log link):
Estimate Std. Error
(Intercept) -0.415196 0.261037
age10 -0.135764 0.024135
sexMale -0.210056 0.077075
educationHigh school graduate/GED 0.282405 0.167720
educationSome college, no degree 0.635911 0.165162
educationAssociate's or bachelor's degree 0.619413 0.160655
educationMaster's, doctoral, or professional degree 0.761770 0.176987
urban_ruralMedium and small metro 0.012747 0.111366
urban_ruralLarge fringe metro 0.063042 0.119672
urban_ruralLarge central metro 0.186333 0.115089
painlimit_lifeworkSome days 0.204220 0.098190
painlimit_lifeworkMost days 0.319729 0.112174
painlimit_lifeworkEvery day 0.201525 0.119923
opioid_freq3moSome days 0.040076 0.141439
opioid_freq3moMost days 0.221426 0.248309
opioid_freq3moEvery day -0.075506 0.156667
phq8total5unit -0.056037 0.038500
num_pain_locations 0.082879 0.026625
fam_income10k 0.017314 0.006237
race_ethnicityHispanic -0.359617 0.095201
race_ethnicityNH Black/African American only -0.110367 0.145961
race_ethnicityOther single and multiple races -0.217057 0.206899
z value Pr(>|z|)
(Intercept) -1.591 0.111707
age10 -5.625 1.85e-08 ***
sexMale -2.725 0.006423 **
educationHigh school graduate/GED 1.684 0.092223 .
educationSome college, no degree 3.850 0.000118 ***
educationAssociate's or bachelor's degree 3.856 0.000115 ***
educationMaster's, doctoral, or professional degree 4.304 1.68e-05 ***
urban_ruralMedium and small metro 0.114 0.908874
urban_ruralLarge fringe metro 0.527 0.598340
urban_ruralLarge central metro 1.619 0.105441
painlimit_lifeworkSome days 2.080 0.037541 *
painlimit_lifeworkMost days 2.850 0.004368 **
painlimit_lifeworkEvery day 1.680 0.092869 .
opioid_freq3moSome days 0.283 0.776915
opioid_freq3moMost days 0.892 0.372536
opioid_freq3moEvery day -0.482 0.629839
phq8total5unit -1.455 0.145531
num_pain_locations 3.113 0.001853 **
fam_income10k 2.776 0.005501 **
race_ethnicityHispanic -3.777 0.000158 ***
race_ethnicityNH Black/African American only -0.756 0.449562
race_ethnicityOther single and multiple races -1.049 0.294133
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error
(Intercept) 6.712e-01 7.979e-01
age10 -4.693e-03 7.769e-02
sexMale 2.826e-01 2.410e-01
educationHigh school graduate/GED -2.112e-02 4.783e-01
educationSome college, no degree -9.915e-02 4.802e-01
educationAssociate's or bachelor's degree -7.535e-01 5.073e-01
educationMaster's, doctoral, or professional degree -8.290e-01 5.714e-01
urban_ruralMedium and small metro -1.014e-02 3.239e-01
urban_ruralLarge fringe metro -7.261e-04 3.504e-01
urban_ruralLarge central metro -2.918e-01 3.640e-01
painlimit_lifeworkSome days 8.064e-03 3.156e-01
painlimit_lifeworkMost days 2.778e-01 3.413e-01
painlimit_lifeworkEvery day -1.917e-01 3.966e-01
opioid_freq3moSome days 1.341e-02 4.731e-01
opioid_freq3moMost days 3.755e-01 6.382e-01
opioid_freq3moEvery day 3.516e-01 4.513e-01
phq8total5unit -1.027e-01 1.431e-01
num_pain_locations -2.714e-01 8.580e-02
fam_income10k -4.744e-02 2.318e-02
race_ethnicityHispanic -1.585e+01 1.817e+03
race_ethnicityNH Black/African American only 2.591e-01 3.857e-01
race_ethnicityOther single and multiple races -1.622e-01 8.016e-01
z value Pr(>|z|)
(Intercept) 0.841 0.40024
age10 -0.060 0.95184
sexMale 1.173 0.24089
educationHigh school graduate/GED -0.044 0.96478
educationSome college, no degree -0.206 0.83644
educationAssociate's or bachelor's degree -1.485 0.13749
educationMaster's, doctoral, or professional degree -1.451 0.14687
urban_ruralMedium and small metro -0.031 0.97502
urban_ruralLarge fringe metro -0.002 0.99835
urban_ruralLarge central metro -0.802 0.42270
painlimit_lifeworkSome days 0.026 0.97961
painlimit_lifeworkMost days 0.814 0.41575
painlimit_lifeworkEvery day -0.483 0.62889
opioid_freq3moSome days 0.028 0.97738
opioid_freq3moMost days 0.588 0.55624
opioid_freq3moEvery day 0.779 0.43597
phq8total5unit -0.718 0.47280
num_pain_locations -3.163 0.00156 **
fam_income10k -2.046 0.04073 *
race_ethnicityHispanic -0.009 0.99304
race_ethnicityNH Black/African American only 0.672 0.50171
race_ethnicityOther single and multiple races -0.202 0.83960
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Number of iterations in BFGS optimization: 86
Log-likelihood: -3347 on 44 Df
Call:
hurdle(formula = ihm4pain_num ~ age10 + sex + education + urban_rural +
painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations +
fam_income10k + race_ethnicity, data = projectb_si_train, dist = "negbin",
zero.dist = "binomial")
Pearson residuals:
Min 1Q Median 3Q Max
-1.3352 -0.6384 -0.4592 0.5140 7.1628
Count model coefficients (truncated negbin with log link):
Estimate Std. Error
(Intercept) -0.045178 0.281617
age10 -0.143937 0.024059
sexMale -0.178955 0.078531
educationHigh school graduate/GED 0.054212 0.206800
educationSome college, no degree 0.414166 0.200326
educationAssociate's or bachelor's degree 0.402225 0.194864
educationMaster's, doctoral, or professional degree 0.462490 0.213385
urban_ruralMedium and small metro 0.005048 0.114663
urban_ruralLarge fringe metro -0.009442 0.125995
urban_ruralLarge central metro 0.148444 0.117912
painlimit_lifeworkSome days 0.179297 0.099253
painlimit_lifeworkMost days 0.248299 0.121008
painlimit_lifeworkEvery day 0.148878 0.127017
opioid_freq3moSome days 0.059280 0.158583
opioid_freq3moMost days 0.113059 0.279709
opioid_freq3moEvery day -0.160405 0.164991
phq8total5unit -0.085068 0.040396
num_pain_locations 0.079248 0.029779
fam_income10k 0.014675 0.006929
race_ethnicityHispanic -0.220641 0.140982
race_ethnicityNH Black/African American only -0.101337 0.141084
race_ethnicityOther single and multiple races -0.098315 0.170499
Log(theta) 19.343411 NaN
z value Pr(>|z|)
(Intercept) -0.160 0.87255
age10 -5.983 2.2e-09 ***
sexMale -2.279 0.02268 *
educationHigh school graduate/GED 0.262 0.79321
educationSome college, no degree 2.067 0.03869 *
educationAssociate's or bachelor's degree 2.064 0.03900 *
educationMaster's, doctoral, or professional degree 2.167 0.03020 *
urban_ruralMedium and small metro 0.044 0.96488
urban_ruralLarge fringe metro -0.075 0.94026
urban_ruralLarge central metro 1.259 0.20805
painlimit_lifeworkSome days 1.806 0.07085 .
painlimit_lifeworkMost days 2.052 0.04018 *
painlimit_lifeworkEvery day 1.172 0.24115
opioid_freq3moSome days 0.374 0.70854
opioid_freq3moMost days 0.404 0.68606
opioid_freq3moEvery day -0.972 0.33095
phq8total5unit -2.106 0.03522 *
num_pain_locations 2.661 0.00779 **
fam_income10k 2.118 0.03419 *
race_ethnicityHispanic -1.565 0.11758
race_ethnicityNH Black/African American only -0.718 0.47259
race_ethnicityOther single and multiple races -0.577 0.56419
Log(theta) NaN NaN
Zero hurdle model coefficients (binomial with logit link):
Estimate Std. Error
(Intercept) -1.479732 0.249322
age10 -0.142153 0.023765
sexMale -0.375766 0.076124
educationHigh school graduate/GED 0.355758 0.142993
educationSome college, no degree 0.759842 0.149059
educationAssociate's or bachelor's degree 1.039827 0.141786
educationMaster's, doctoral, or professional degree 1.328910 0.180035
urban_ruralMedium and small metro 0.027743 0.105044
urban_ruralLarge fringe metro 0.114360 0.118742
urban_ruralLarge central metro 0.346088 0.114916
painlimit_lifeworkSome days 0.239884 0.096082
painlimit_lifeworkMost days 0.275086 0.122223
painlimit_lifeworkEvery day 0.325687 0.118433
opioid_freq3moSome days 0.043253 0.167722
opioid_freq3moMost days 0.107665 0.265744
opioid_freq3moEvery day -0.194347 0.137609
phq8total5unit -0.001161 0.039121
num_pain_locations 0.212404 0.030285
fam_income10k 0.042015 0.008126
race_ethnicityHispanic 0.139327 0.135109
race_ethnicityNH Black/African American only -0.272954 0.128870
race_ethnicityOther single and multiple races -0.244884 0.174931
z value Pr(>|z|)
(Intercept) -5.935 2.94e-09 ***
age10 -5.982 2.21e-09 ***
sexMale -4.936 7.97e-07 ***
educationHigh school graduate/GED 2.488 0.01285 *
educationSome college, no degree 5.098 3.44e-07 ***
educationAssociate's or bachelor's degree 7.334 2.24e-13 ***
educationMaster's, doctoral, or professional degree 7.381 1.57e-13 ***
urban_ruralMedium and small metro 0.264 0.79170
urban_ruralLarge fringe metro 0.963 0.33550
urban_ruralLarge central metro 3.012 0.00260 **
painlimit_lifeworkSome days 2.497 0.01254 *
painlimit_lifeworkMost days 2.251 0.02440 *
painlimit_lifeworkEvery day 2.750 0.00596 **
opioid_freq3moSome days 0.258 0.79650
opioid_freq3moMost days 0.405 0.68537
opioid_freq3moEvery day -1.412 0.15786
phq8total5unit -0.030 0.97632
num_pain_locations 7.013 2.33e-12 ***
fam_income10k 5.170 2.34e-07 ***
race_ethnicityHispanic 1.031 0.30244
race_ethnicityNH Black/African American only -2.118 0.03417 *
race_ethnicityOther single and multiple races -1.400 0.16155
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta: count = 251615122.8885
Number of iterations in BFGS optimization: 49
Log-likelihood: -3346 on 45 Df
6.6 Comparing m1 Versions
6.6.1 Comparing with Rootograms
To begin a comparison of the 4 variations of Model 1, I generated four rootogram plots using the countreg::rootogram() function.
Code
par(mfrow =c(3,2))countreg::rootogram(m1_pois, main ="Rootogram for m1_pois")countreg::rootogram(m1_nb, main ="Rootogram for m1_nb")countreg::rootogram(m1_zip, main ="Rootogram for m1_zip")countreg::rootogram(m1_zinb, main ="Rootogram for m1_zinb")countreg::rootogram(m1_hurdle_pois, main ="Rootogram for m1_hurdle_pois")countreg::rootogram(m1_hurdle_nb, main ="Rootogram for m1_hurdle_nb")
Code
par(mfrow =c(1,1))
Looking at the plots above, there is a clear improvement observed between 1) the Poisson (m1_pois) and negative binomial (m1_nb) models; and 2) the zero-inflated (m1_zip and m1_zinb) and hurdle (m1_hurdle_pois and mt_hurdle_nb) models as the latter appear to fit 0 counts perfectly and lack the over/under-fitting of counts 1-3. However, there do not appear to be any noticeable differences in predicting counts between the bottom four models. Furthermore, it appears each model predicts counts of ihm4pain_num > 4, but not to a substantial degree. Certainly, some subjects may have engaged in more than four IHM modalities for pain, but these were not measured in the 2019 NHIS.
6.6.2 Comparing with Fit Statistics
To further examine differences, I generated m1_summary as a combination of the 6 previous model summaries using the bind_rows() function. I then selected the Model, \(R^2\), RMSE, MAE, AIC, and BIC values for output.
Similar to the rootograms, there is a clear improvement in \(R^2\) between 1) the Poisson (m1_pois) and negative binomial (m1_nb) models; and 2) the zero-inflated (m1_zip and m1_zinb) and hurdle (m1_hurdle_pois and mt_hurdle_nb) models. m1_hurdle_pois appears to be the best fit based on it’s \(R^2\) (0.121), RMSE (0.805), and AIC (6780 [lowest overall]). Importantly, m1_hurdle_pois has the lowest BIC among the four models that had better rootograms in the plot above.
6.6.3 Comparing with Vuong tests
Finally, given the inclination to use m1_hurdle_pois, I used the vuong() function to compare it to the other models.
m1_hurdle_pois v. m1_pois
Code
vuong(m1_hurdle_pois, m1_pois)
Vuong Non-Nested Hypothesis Test-Statistic:
(test-statistic is asymptotically distributed N(0,1) under the
null that the models are indistinguishible)
-------------------------------------------------------------
Vuong z-statistic H_A p-value
Raw 4.991730 model1 > model2 2.992e-07
AIC-corrected 2.913802 model1 > model2 0.00178528
BIC-corrected -3.503540 model2 > model1 0.00022956
m1_hurdle_pois is a better fit than m1_pois based on small p-values from the the Raw, and AIC-corrected Vuong tests.
m1_hurdle_pois v. m1_nb
Code
vuong(m1_hurdle_pois, m1_nb)
Vuong Non-Nested Hypothesis Test-Statistic:
(test-statistic is asymptotically distributed N(0,1) under the
null that the models are indistinguishible)
-------------------------------------------------------------
Vuong z-statistic H_A p-value
Raw 4.754520 model1 > model2 9.9459e-07
AIC-corrected 1.369766 model1 > model2 0.08538
BIC-corrected -9.083491 model2 > model1 < 2.22e-16
m1_hurdle_pois is a better fit than m1_nb based on small p-values from the the Raw Vuong test.
m1_hurdle_pois v. m1_zip
Code
vuong(m1_hurdle_pois, m1_zip)
Vuong Non-Nested Hypothesis Test-Statistic:
(test-statistic is asymptotically distributed N(0,1) under the
null that the models are indistinguishible)
-------------------------------------------------------------
Vuong z-statistic H_A p-value
Raw 0.3795831 model1 > model2 0.35213
AIC-corrected 0.3795831 model1 > model2 0.35213
BIC-corrected 0.3795831 model1 > model2 0.35213
m1_hurdle_pois is not meaningfully different than m1_zip (p = .352).
m1_hurdle_pois v. m1_zinb
Code
vuong(m1_hurdle_pois, m1_zinb)
Vuong Non-Nested Hypothesis Test-Statistic:
(test-statistic is asymptotically distributed N(0,1) under the
null that the models are indistinguishible)
-------------------------------------------------------------
Vuong z-statistic H_A p-value
Raw 0.3795879 model1 > model2 0.35213
AIC-corrected 0.3795879 model1 > model2 0.35213
BIC-corrected 0.3795879 model1 > model2 0.35213
m1_hurdle_pois is not meaningfully different than m1_zinb (p = .352).
m1_hurdle_pois v. m1_hurdle_nb
Code
vuong(m1_hurdle_pois, m1_hurdle_nb)
Vuong Non-Nested Hypothesis Test-Statistic:
(test-statistic is asymptotically distributed N(0,1) under the
null that the models are indistinguishible)
-------------------------------------------------------------
Vuong z-statistic H_A p-value
Raw -0.007316386 model2 > model1 0.49708
AIC-corrected -0.007316386 model2 > model1 0.49708
BIC-corrected -0.007316386 model2 > model1 0.49708
m1_hurdle_pois is not meaningfully different than m1_hurdle_nb (p = .497).
6.7 Final Model Choice
Based on a combination of the rootograms, the fit statistics, and the Vuong tests, I chose m1_hurdle_pois as the model for the count outcome.
The hurdle model also makes sense given its description (Ford 2016).
The hurdle model is a two-part model that specifies one process for zero counts and another process for positive counts. The idea is that positive counts occur once a threshold is crossed, or put another way, a hurdle is cleared. If the hurdle is not cleared, then we have a count of 0.
Conceptually, moving from no engagement in IHM modalities for pain to engaging in any number of IHM modalities for pain seems like a hurdle to model independently of the count data.
6.8 Fitting Weighted Hurdle Model
In the following code, I added weights to m1_hurdle_pois model using the code, weights = wt_final/mean(wt_final). I then printed a summary of the model.
Call:
hurdle(formula = ihm4pain_num ~ age10 + sex + race_ethnicity + education +
fam_income10k + urban_rural + num_pain_locations + painlimit_lifework +
opioid_freq3mo + phq8total5unit, data = projectb_si_train, weights = wt_final/mean(wt_final),
dist = "poisson", zero.dist = "binomial")
Pearson residuals:
Min 1Q Median 3Q Max
-1.8226 -0.5770 -0.3611 0.4732 8.1198
Count model coefficients (truncated poisson with log link):
Estimate Std. Error
(Intercept) -0.288706 0.286847
age10 -0.143633 0.025058
sexMale -0.254302 0.080070
race_ethnicityHispanic -0.187190 0.136711
race_ethnicityNH Black/African American only -0.096093 0.139174
race_ethnicityOther single and multiple races -0.306117 0.184700
educationHigh school graduate/GED 0.239399 0.208846
educationSome college, no degree 0.638642 0.202086
educationAssociate's or bachelor's degree 0.572008 0.199347
educationMaster's, doctoral, or professional degree 0.648524 0.222696
fam_income10k 0.010163 0.006892
urban_ruralMedium and small metro 0.066435 0.124326
urban_ruralLarge fringe metro 0.056516 0.132087
urban_ruralLarge central metro 0.216033 0.125488
num_pain_locations 0.097209 0.029301
painlimit_lifeworkSome days 0.190048 0.101775
painlimit_lifeworkMost days 0.273623 0.122793
painlimit_lifeworkEvery day 0.165813 0.129089
opioid_freq3moSome days -0.118804 0.180672
opioid_freq3moMost days 0.235398 0.269783
opioid_freq3moEvery day -0.265695 0.189130
phq8total5unit -0.104129 0.039421
z value Pr(>|z|)
(Intercept) -1.006 0.314184
age10 -5.732 9.92e-09 ***
sexMale -3.176 0.001493 **
race_ethnicityHispanic -1.369 0.170925
race_ethnicityNH Black/African American only -0.690 0.489910
race_ethnicityOther single and multiple races -1.657 0.097443 .
educationHigh school graduate/GED 1.146 0.251674
educationSome college, no degree 3.160 0.001576 **
educationAssociate's or bachelor's degree 2.869 0.004112 **
educationMaster's, doctoral, or professional degree 2.912 0.003589 **
fam_income10k 1.475 0.140336
urban_ruralMedium and small metro 0.534 0.593090
urban_ruralLarge fringe metro 0.428 0.668743
urban_ruralLarge central metro 1.722 0.085154 .
num_pain_locations 3.318 0.000908 ***
painlimit_lifeworkSome days 1.867 0.061854 .
painlimit_lifeworkMost days 2.228 0.025859 *
painlimit_lifeworkEvery day 1.284 0.198972
opioid_freq3moSome days -0.658 0.510815
opioid_freq3moMost days 0.873 0.382911
opioid_freq3moEvery day -1.405 0.160072
phq8total5unit -2.641 0.008255 **
Zero hurdle model coefficients (binomial with logit link):
Estimate Std. Error
(Intercept) -1.678505 0.240666
age10 -0.152575 0.023318
sexMale -0.218202 0.076494
race_ethnicityHispanic -0.064007 0.129418
race_ethnicityNH Black/African American only -0.380766 0.128323
race_ethnicityOther single and multiple races -0.353984 0.170466
educationHigh school graduate/GED 0.415864 0.135167
educationSome college, no degree 0.785447 0.141858
educationAssociate's or bachelor's degree 1.053611 0.135944
educationMaster's, doctoral, or professional degree 1.396878 0.184676
fam_income10k 0.036441 0.007846
urban_ruralMedium and small metro 0.064509 0.110355
urban_ruralLarge fringe metro 0.262959 0.120586
urban_ruralLarge central metro 0.463635 0.118048
num_pain_locations 0.180661 0.029999
painlimit_lifeworkSome days 0.327998 0.097323
painlimit_lifeworkMost days 0.410528 0.122285
painlimit_lifeworkEvery day 0.472318 0.120034
opioid_freq3moSome days 0.139740 0.168191
opioid_freq3moMost days 0.161674 0.270131
opioid_freq3moEvery day -0.344714 0.145339
phq8total5unit 0.071419 0.037659
z value Pr(>|z|)
(Intercept) -6.974 3.07e-12 ***
age10 -6.543 6.01e-11 ***
sexMale -2.853 0.004337 **
race_ethnicityHispanic -0.495 0.620900
race_ethnicityNH Black/African American only -2.967 0.003005 **
race_ethnicityOther single and multiple races -2.077 0.037841 *
educationHigh school graduate/GED 3.077 0.002093 **
educationSome college, no degree 5.537 3.08e-08 ***
educationAssociate's or bachelor's degree 7.750 9.16e-15 ***
educationMaster's, doctoral, or professional degree 7.564 3.91e-14 ***
fam_income10k 4.644 3.41e-06 ***
urban_ruralMedium and small metro 0.585 0.558844
urban_ruralLarge fringe metro 2.181 0.029207 *
urban_ruralLarge central metro 3.928 8.58e-05 ***
num_pain_locations 6.022 1.72e-09 ***
painlimit_lifeworkSome days 3.370 0.000751 ***
painlimit_lifeworkMost days 3.357 0.000788 ***
painlimit_lifeworkEvery day 3.935 8.32e-05 ***
opioid_freq3moSome days 0.831 0.406064
opioid_freq3moMost days 0.599 0.549505
opioid_freq3moEvery day -2.372 0.017702 *
phq8total5unit 1.896 0.057901 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Number of iterations in BFGS optimization: 29
Log-likelihood: -3275 on 44 Df
6.9 Model Coefficients
To generate a singular summary of coefficients from m1_hurdle_pois, I performed the following
Generated an initial m1_hurdle_pois_coef tibble containing the variables Type (i.e., count or non-zero), Group (i.e., each variable group), and Comparison (e.g., Age +10yrs, Male v. Female) to match the output from m1_hurdle_pois
Generated m1_hurdle_pois_pe as a tibble of the exponentiated coefficients from m1_hurdle_pois
Generated m1_hurdle_pois_ci as a tibble of the exponentiated 95% CI from m1_hurdle_pois
Bound all of the above into the tibble, m1_hurdle_pois_coef.
Code
m1_hurdle_pois_coef <-tibble(Type =c("count", "count", "count", "count", "count", "count","count", "count", "count","count", "count", "count","count", "count", "count", "count", "count", "count","count", "count", "count", "count","non-zero", "non-zero", "non-zero", "non-zero", "non-zero", "non-zero","non-zero", "non-zero", "non-zero", "non-zero", "non-zero", "non-zero","non-zero", "non-zero", "non-zero", "non-zero", "non-zero", "non-zero","non-zero", "non-zero", "non-zero", "non-zero"),Group =c("(Intercept)", "Age", "Sex", "Race/Ethnicity", "Race/Ethnicity", "Race/Ethnicity","Education", "Education", "Education", "Education", "Family income", "Proximity to metro area", "Proximity to metro area","Proximity to metro area", "Number of pain locations", "Pain limiting life or work", "Pain limiting life or work","Pain limiting life or work", "Opioid use past 3 mo", "Opioid use past 3 mo","Opioid use past 3 mo", "PHQ-8", "(Intercept)", "Age", "Sex", "Race/Ethnicity", "Race/Ethnicity", "Race/Ethnicity","Education", "Education", "Education", "Education", "Family income", "Proximity to metro area", "Proximity to metro area","Proximity to metro area", "Number of pain locations", "Pain limiting life or work", "Pain limiting life or work","Pain limiting life or work", "Opioid use past 3 mo", "Opioid use past 3 mo","Opioid use past 3 mo", "PHQ-8"),Comparison =c("(Intercept)", "Age +10yrs", "Male v. Female", "Hispanic v. NH White","NH Black v. NH White","Other v. NH White","HS/GED v. < HS", "Some College v. < HS", "Assoc/Bach v. < HS","Masters/PhD v. < HS", "Fam income +$10,000","M/S metro v. Non-metro", "L fringe metro v. Non-metro","L central metro v. Non-metro", "Pain locations +1", "Pain limit some days v. Never", "Pain limit most days v. Never","Pain limit every day v. Never", "Opioids some days v. Never", "Opioids most days v. Never","Opioids every day v. Never", "PHQ-8 +5 units", "(Intercept)", "Age +10yrs", "Male v. Female", "Hispanic v. NH White","NH Black v. NH White","Other v. NH White","HS/GED v. < HS", "Some College v. < HS", "Assoc/Bach v. < HS","Masters/PhD v. < HS", "Fam income +$10,000","M/S metro v. Non-metro", "L fringe metro v. Non-metro","L central metro v. Non-metro", "Pain locations +1", "Pain limit some days v. Never", "Pain limit most days v. Never","Pain limit every day v. Never", "Opioids some days v. Never", "Opioids most days v. Never","Opioids every day v. Never", "PHQ-8 +5 units"))m1_hurdle_pois_pe <-exp(coef(m1_hurdle_pois)) |>as_tibble()m1_hurdle_pois_ci <-exp(confint(m1_hurdle_pois)) |>as_tibble() |>clean_names() |>rename(Conf.low ="x2_5_percent",Conf.high ="x97_5_percent")m1_hurdle_pois_coef <-bind_cols(m1_hurdle_pois_coef, m1_hurdle_pois_pe, m1_hurdle_pois_ci) |>rename(Estimate = value)
6.9.1 Numeric Summary of Odds of ihm4pain_num != 0
In the following code, I generated a numeric summary of the exponentiated coefficients from m1_hurdle_pois describing the odds of ihm4pain_num != 0.
Code
m1_hurdle_pois_coef |>filter(Type =="non-zero") |>select(Group, Comparison, Estimate, Conf.low, Conf.high) |>mutate(Estimate =round(Estimate, digits =3),Conf.low =round(Conf.low, digits =3),Conf.high =round(Conf.high, digits =3)) |>rename(aOR = Estimate,"95% CI Low"= Conf.low,"95% CI High"= Conf.high) |>flextable() |>bold(bold =TRUE, part ="header") |>add_header_lines(values ="Supplemental Table 1: aOR of Engagement in IHM Modalities for Pain") |>autofit() |>add_footer_lines(values ="Abbreviations: Assoc, associate’s; aOR, adjusted odds ratio; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; mo, months; NH, Non-Hispanic; S, small; v., versus; yrs, years", top =FALSE) |>theme_zebra(odd_header ="white") |>hline_top(part="body")
Supplemental Table 1: aOR of Engagement in IHM Modalities for Pain
Group
Comparison
aOR
95% CI Low
95% CI High
(Intercept)
(Intercept)
0.187
0.116
0.299
Age
Age +10yrs
0.858
0.820
0.899
Sex
Male v. Female
0.804
0.692
0.934
Race/Ethnicity
Hispanic v. NH White
0.938
0.728
1.209
Race/Ethnicity
NH Black v. NH White
0.683
0.531
0.879
Race/Ethnicity
Other v. NH White
0.702
0.503
0.980
Education
HS/GED v. < HS
1.516
1.163
1.975
Education
Some College v. < HS
2.193
1.661
2.896
Education
Assoc/Bach v. < HS
2.868
2.197
3.744
Education
Masters/PhD v. < HS
4.043
2.815
5.806
Family income
Fam income +$10,000
1.037
1.021
1.053
Proximity to metro area
M/S metro v. Non-metro
1.067
0.859
1.324
Proximity to metro area
L fringe metro v. Non-metro
1.301
1.027
1.648
Proximity to metro area
L central metro v. Non-metro
1.590
1.261
2.004
Number of pain locations
Pain locations +1
1.198
1.130
1.271
Pain limiting life or work
Pain limit some days v. Never
1.388
1.147
1.680
Pain limiting life or work
Pain limit most days v. Never
1.508
1.186
1.916
Pain limiting life or work
Pain limit every day v. Never
1.604
1.268
2.029
Opioid use past 3 mo
Opioids some days v. Never
1.150
0.827
1.599
Opioid use past 3 mo
Opioids most days v. Never
1.175
0.692
1.996
Opioid use past 3 mo
Opioids every day v. Never
0.708
0.533
0.942
PHQ-8
PHQ-8 +5 units
1.074
0.998
1.156
Abbreviations: Assoc, associate’s; aOR, adjusted odds ratio; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; mo, months; NH, Non-Hispanic; S, small; v., versus; yrs, years
In examining the effects table above, I will summarize the effect of Masters/PhD v. < HS using two fictional cases: Luke Skywalker and Din Djarin
Suppose Luke Skywalker and Din Djarin 1) have the same demographic profile (i.e., same age, male); 2) reside in rural areas; 3) rated similar challenges related to pain interfering with life or work in the past 3 months; 4) used the same amount of opioids for chronic pain within the past 3 months; 5) reported the same score on the PHQ-8; 6) reported the same number of locations where they were bothered at least “a little” by pain in the past three months; 7) have the same family income; and 8) are of the same race and ethnicity. However, Luke Skywalker has a PhD in being a Jedi while Din Djarin did not graduate “This is the Way” High School. In this scenario, m1_hurdle_pois predicts that Luke Skywalker’s odds of ihm4pain_num != 0 will be 4.043 times that of Din Djarin.
The 95% confidence interval for this estimated odds ratio for the effect of Masters/PhD v. < HS on ihm4pain_num != 0 is (2.815, 5.806). This confidence interval does not include one. Thus, holding all other variables constant, having a Masters, PhD, or professional degree is associated with higher odds of having engaged in an IHM modality to manage pain in the past three months.
6.9.2 Numeric Summary of Relative Change in Count of ihm4pain_num
In the following code, I generated a numeric summary of the exponentiated coefficients from m1_hurdle_pois describing the predicted relative change in the count of ihm4pain_num.
Code
m1_hurdle_pois_coef |>filter(Type =="count") |>select(Group, Comparison, Estimate, Conf.low, Conf.high) |>mutate(Estimate =round(Estimate, digits =3),Conf.low =round(Conf.low, digits =3),Conf.high =round(Conf.high, digits =3)) |>rename("Relative change"= Estimate,"95% CI Low"= Conf.low,"95% CI High"= Conf.high) |>flextable() |>bold(bold =TRUE, part ="header") |>add_header_lines(values ="Supplemental Table 2: Relative Change in Count of IHM Modalities Used for Pain") |>autofit() |>add_footer_lines(values ="Abbreviations: Assoc, associate’s; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; mo, months; NH, Non-Hispanic; S, small; v., versus; yrs, years", top =FALSE) |>theme_zebra(odd_header ="white") |>hline_top(part="body")
Supplemental Table 2: Relative Change in Count of IHM Modalities Used for Pain
Group
Comparison
Relative change
95% CI Low
95% CI High
(Intercept)
(Intercept)
0.749
0.427
1.315
Age
Age +10yrs
0.866
0.825
0.910
Sex
Male v. Female
0.775
0.663
0.907
Race/Ethnicity
Hispanic v. NH White
0.829
0.634
1.084
Race/Ethnicity
NH Black v. NH White
0.908
0.692
1.193
Race/Ethnicity
Other v. NH White
0.736
0.513
1.057
Education
HS/GED v. < HS
1.270
0.844
1.913
Education
Some College v. < HS
1.894
1.275
2.814
Education
Assoc/Bach v. < HS
1.772
1.199
2.619
Education
Masters/PhD v. < HS
1.913
1.236
2.959
Family income
Fam income +$10,000
1.010
0.997
1.024
Proximity to metro area
M/S metro v. Non-metro
1.069
0.838
1.364
Proximity to metro area
L fringe metro v. Non-metro
1.058
0.817
1.371
Proximity to metro area
L central metro v. Non-metro
1.241
0.971
1.587
Number of pain locations
Pain locations +1
1.102
1.041
1.167
Pain limiting life or work
Pain limit some days v. Never
1.209
0.991
1.476
Pain limiting life or work
Pain limit most days v. Never
1.315
1.034
1.672
Pain limiting life or work
Pain limit every day v. Never
1.180
0.916
1.520
Opioid use past 3 mo
Opioids some days v. Never
0.888
0.623
1.265
Opioid use past 3 mo
Opioids most days v. Never
1.265
0.746
2.147
Opioid use past 3 mo
Opioids every day v. Never
0.767
0.529
1.111
PHQ-8
PHQ-8 +5 units
0.901
0.834
0.973
Abbreviations: Assoc, associate’s; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; mo, months; NH, Non-Hispanic; S, small; v., versus; yrs, years
In examining the effects table above, I will summarize the effect of Age using two fictional cases: Han Solo and Luke Skywalker
Per the guidance by Yoshida, given the response is positive (among those who have positive counts for engaging in IHM modalities), the average count is 0.701 (Intercept).
Suppose Han Solo and Luke Skywalker 1) are both male and of the same race and ethnicity; 2) have the same level of education; 3) reside in rural areas; 4) rated similar challenges related to pain interfering with life or work in the past 3 months; 5) used the same amount of opioids for chronic pain within the past 3 months; 6) reported the same score on the PHQ-8; 7) reported the same number of locations where they were bothered at least “a little” by pain in the past three months; and 8) have the same family income. However, Han Solo is 10 years older than Luke Skywalker. In this scenario, m1_hurdle_pois predicts that the number of IHM modalities that Han Solo engaged in for pain in the past 3 months will be .866 times that of Luke’s (13.4% lower than Luke’s).
The 95% confidence interval for this estimated effect of age10 on ihm4pain_num is (0.825, 0.910). This confidence interval does not include one. Thus, holding all other variables constant, being 10 years older is associated with lower counts of IHM modalities used to manage pain in the past three months.
6.9.3 Visual Summary of Odds of ihm4pain_num != 0
To visualize the effects of the covariates on odds of ihm4pain_num != 0, I first generated the m1_coef_non0 tibble, limited to the “non-zero” coefficients. I then used ggplot() to plot the effects. Education covariates are placed on a separate axis given their size to not obscure the other covariates.
Code
m1_coef_non0 <- m1_hurdle_pois_coef |>filter(Type =="non-zero", Comparison !="(Intercept)", Group !="Education") |>mutate(Comparison =fct_inorder(Comparison, ordered =NA),Comparison =fct_rev(Comparison))m1_coef_non0educ <- m1_hurdle_pois_coef |>filter(Type =="non-zero", Comparison !="(Intercept)", Group =="Education") |>mutate(Comparison =fct_inorder(Comparison, ordered =NA),Comparison =fct_rev(Comparison))p1 <-ggplot(m1_coef_non0, aes(x = Estimate, y = Comparison)) +geom_vline(aes(xintercept =1), linewidth = .25, linetype ='dashed') +geom_errorbarh(aes(xmax = Conf.high, xmin = Conf.low), linewidth = .5, height = .2, color ='gray50') +geom_point(size =2.5, color = uhred) +labs(x ="aOR",y ="",title ="aOR of Engaging in ≥ 1 IHM Modality for Pain") +scale_x_continuous(breaks =seq(0,4,0.25))p2 <-ggplot(m1_coef_non0educ, aes(x = Estimate, y = Comparison)) +geom_vline(aes(xintercept =1), linewidth = .25, linetype ='dashed') +geom_errorbarh(aes(xmax = Conf.high, xmin = Conf.low), linewidth = .5,height = .2, color ='gray50') +geom_point(size =2.5, color = uhred) +labs(x ="aOR",y ="") +scale_x_continuous(breaks =seq(0,8,0.5))p1/p2 +plot_layout(heights =c(5,1))
6.9.4 Visual Summary of Estimated Relative Change in Count of ihm4pain_num
To visualize the effects of the covariates on the estimated relative change in count of ihm4pain_num, I first generated the m1_coef_count tibble, limited to the “count” coefficients. I then used ggplot() to plot the effects. Education covariates are placed on a separate axis given their size to not obscure the other covariates.
Code
m1_coef_count <- m1_hurdle_pois_coef |>filter(Type =="count", Comparison !="(Intercept)", Group !="Education") |>mutate(Comparison =fct_inorder(Comparison, ordered =NA),Comparison =fct_rev(Comparison))m1_coef_count_educ <- m1_hurdle_pois_coef |>filter(Type =="count", Comparison !="(Intercept)", Group =="Education") |>mutate(Comparison =fct_inorder(Comparison, ordered =NA),Comparison =fct_rev(Comparison))p1 <-ggplot(m1_coef_count, aes(x = Estimate, y = Comparison)) +geom_vline(aes(xintercept =1), linewidth = .25, linetype ='dashed') +geom_errorbarh(aes(xmax = Conf.high, xmin = Conf.low), linewidth = .5, height = .2, color ='gray50') +geom_point(size =2.5, color = uhred) +labs(x ="Relative Change in Count",y ="",title ="Relative Change in Count of IHM Modalities Used for Pain",subtitle ="Assuming Engagement in > 0 IHM Modalities") +scale_x_continuous(breaks =seq(0,7,0.25))p2 <-ggplot(m1_coef_count_educ, aes(x = Estimate, y = Comparison)) +geom_vline(aes(xintercept =1), linewidth = .25, linetype ='dashed') +geom_errorbarh(aes(xmax = Conf.high, xmin = Conf.low), linewidth = .5, height = .2, color ='gray50') +geom_point(size =2.5, color = uhred) +labs(x ="Relative Change in Count",y ="") +scale_x_continuous(breaks =seq(0,7,0.5))p1/p2 +plot_layout(heights =c(5,1))
6.10 Check Model Assumptions
To generate a residuals vs. fitted plot from m1_hurdle_pois, I used ggplot() and geom_point().
Code
ggplot(m1_hurdle_pois_aug, aes(x = .fitted, y = .resid)) +geom_point() +labs(title ="Residuals vs. Fitted `ihm4pain_num`",subtitle ="Hurdle model with Poisson counts")
In examining the plot above, I see no major issues of concern.
6.11 Validating m1_hurdle_pois
6.11.1 Fit Summary in Test Data
To validate m1_hurdle_pois in the projectb_si_test data, I performed the following to generate m1test_hurdle_pois_summary, a summary of fit statistics.
I created m1test_hurdle_pois_aug as an augmentation of m1_hurdle_pois applied to the projectb_si_test data. This time I used mutate() to obtain .fitted and .resid values.
I generated, \(R^2\), RMSE, and MAE values using the mets() function.
I used group_by() and summarize() to pivot m1test_hurdle_pois_summary such that there would be one column for each fit statistic.
I then bound together m1_hurdle_pois_summary and m1test_hurdle_pois_summary into m1_summaryfinal to present a table comparing the \(R^2\), RMSE, and MAE values from the training and testing data.
In examining the table above, it appears the model does not perform as well in the testing data given the lower \(R^2\) (0.0001 vs. 0.1182). m1_hurdle_pois also has higher RMSE and MAE in the testing data.
6.11.3 Table Comparing Observed and Predicted Counts
To further explore the accuracy of m1_hurdle_pois, I created a table (m1_countsum) summarizing the observed and expected counts from the testing and training datasets as well as their respective m1_hurdle_pois applications.
Code
m1_countsum <-tibble(Source =c("Training data observations", "m1_hurdle_pois in training", "Test data observations", "m1_hurdle_pois in test"),ZeroIHM =c(sum(projectb_si_train$ihm4pain_num ==0),round(sum(predict(m1_hurdle_pois, type ="prob")[,1]),0),sum(projectb_si_test$ihm4pain_num ==0),round(sum(predict(m1_hurdle_pois, newdata = projectb_si_test, type ="prob")[,1]),0)),OneIHM =c(sum(projectb_si_train$ihm4pain_num ==1),round(sum(predict(m1_hurdle_pois, type ="prob")[,2]),0),sum(projectb_si_test$ihm4pain_num ==1),round(sum(predict(m1_hurdle_pois, newdata = projectb_si_test, type ="prob")[,2]),0)),TwoIHM =c(sum(projectb_si_train$ihm4pain_num ==2),round(sum(predict(m1_hurdle_pois, type ="prob")[,3]),0),sum(projectb_si_test$ihm4pain_num ==2),round(sum(predict(m1_hurdle_pois, newdata = projectb_si_test, type ="prob")[,3]),0)),ThreeIHM =c(sum(projectb_si_train$ihm4pain_num ==3),round(sum(predict(m1_hurdle_pois, type ="prob")[,4]),0),sum(projectb_si_test$ihm4pain_num ==3),round(sum(predict(m1_hurdle_pois, newdata = projectb_si_test, type ="prob")[,4]),0)),FourIHM =c(sum(projectb_si_train$ihm4pain_num ==4),round(sum(predict(m1_hurdle_pois, type ="prob")[,5]),0),sum(projectb_si_test$ihm4pain_num ==4),round(sum(predict(m1_hurdle_pois, newdata = projectb_si_test, type ="prob")[,5]),0))) |>mutate(MoreThan4IHM =nrow(projectb_si_train) - (ZeroIHM + OneIHM + TwoIHM + ThreeIHM + FourIHM),Total = (ZeroIHM + OneIHM + TwoIHM + ThreeIHM + FourIHM + MoreThan4IHM)) |>rename("0"= ZeroIHM,"1"= OneIHM,"2"= TwoIHM,"3"= ThreeIHM,"4"= FourIHM,">4"= MoreThan4IHM)
6.11.4 Plot Comparing Observed and Predicted Counts
To facilitate a visual comparison, I generated m1_pivot and a pair of plots comparing observed vs. predicted values in the training and test samples.
Code
m1_pivot <- m1_countsum |>select(-c(Total)) |>pivot_longer(!Source, names_to ="Count", values_to ="Value") |>mutate(Count =factor(Count, levels =c("0", "1", "2", "3", "4", ">4")))m1_pivot_train <- m1_pivot |>filter(Source %in%c("Training data observations", "m1_hurdle_pois in training")) |>mutate(Source =if_else(Source =="Training data observations", "Observed", "Predicted"),Source =factor(Source, levels =c("Observed", "Predicted")))m1_pivot_test <- m1_pivot |>filter(Source %in%c("Test data observations", "m1_hurdle_pois in test")) |>mutate(Source =if_else(Source =="Test data observations", "Observed", "Predicted"),Source =factor(Source, levels =c("Observed", "Predicted")))p1 <-ggplot(data = m1_pivot_train, aes(x = Count, y = Value, fill = Source)) +geom_bar(stat ="identity", position =position_dodge(), alpha =0.75) +ylim(0, 2500) +labs(x ="Count of IHM Modalities",y ="Number of Subjects",title ="Training Data (n = 3557)") +theme(legend.position="bottom",legend.title =element_blank())p2 <-ggplot(data = m1_pivot_test, aes(x = Count, y = Value, fill = Source)) +geom_bar(stat ="identity", position =position_dodge(), alpha =0.75) +ylim(0, 2500) +labs(x ="Count of IHM Modalities",y ="Number of Subjects",title ="Test Data (n = 3557)") +guides(fill="none")p1 + p2 +plot_annotation("Comparing Observed v. Predicted Values in Weighted Hurdle Poisson Model")
Source
0
1
2
3
4
>4
Total
Training data observations
2,257
811
343
116
30
0
3,557
m1_hurdle_pois in training
2,329
787
312
96
25
8
3,557
Test data observations
2,297
791
324
107
38
0
3,557
m1_hurdle_pois in test
2,354
778
303
92
24
6
3,557
In examining the plots above, while there is slightly greater deviation from the observed counts in the test data, it appears the model performs quite well in predicting each level of the count of ihm4pain_num in both the training and test datasets.
7 Model 2: Nonpharmacologic Modalities Only vs. Opioids Only
For Model 2, I sought to answer the question:
Which demographic, mental health, and pain-related variables are associated with engagement in exclusively nonpharmacologic modalities rather than opioid utilization in the past 3 months among U.S. adults with chronic pain?
7.1 Create Dataset
For this model, the outcome was defined as engagement in nonpharmacologic modalities only (ie, engaging in IHM, a chronic pain self-management program, support groups, or physical, rehabilitative, occupational, or talk therapy while never using opioids) rather than opioids only (ie, using opioids on at least some days while not using any nonpharmacologic modalities). The already single-imputed data, projectb_si, were filtered to include the 3300 subjects engaging in either nonpharmacologic only or opioids only. With this new m2_dataset_si dataset, the outcome nonpharm_over_opioid was defined as 1 if “Nonpharm only” and 0 if “Opioids only.”
table2 |>flextable() |>add_header_lines(values ="Table 2: Socio-Demographic, Pain, and Mental Health Characteristics By Nonpharmacologic Modality Engagement") |>bold(bold =TRUE, part ="header") |>add_footer_lines(values ="Abbreviations: CI, confidence interval; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; metro, metropolitan; NH, Non-Hispanic; Nonpharm, nonpharmacologic", top =FALSE) |>padding(i =c(4:5, 7:10, 12:16, 19:22, 25:28), j =1, padding.left =20) |>bold(i =c(1, 2, 3, 6, 11, 17:18, 23:24, 29), j =1, bold =TRUE) |> flextable::footnote(i =c(2, 17, 23, 29), j =5, ref_symbols ="a",value =as_paragraph("Wilcoxon rank-sum test for complex survey samples")) |> flextable::footnote(i =c(3, 6, 11, 18, 24), j =5,ref_symbols ="b",value =as_paragraph("Chi-squared test with Rao & Scott’s second-order correction")) |>autofit(part =c("body", "header"))
Table 2: Socio-Demographic, Pain, and Mental Health Characteristics By Nonpharmacologic Modality Engagement
Weighted variable
Combined groups
Nonpharm only
Opioids only
P value
Total, n
22,707,275
19,099,216
3,608,059
Age in years, mean (95% CI)
54.0 (53.2-54.7)
53.0 (52.2-53.8)
59.2 (57.6-60.8)
<.001a
Sex, % (95% CI)
.797b
Female
59.2 (57.1-61.3)
59.4 (57.1-61.6)
58.6 (53.0-63.9)
Male
40.8 (38.7-42.9)
40.6 (38.4-42.9)
41.4 (36.1-47.0)
Race/Ethnicity, % (95% CI)
.221b
NH White only
74.3 (72.4-76.1)
74.6 (72.6-76.6)
72.6 (67.2-77.3)
Hispanic
10.1 (8.91-11.5)
10.0 (8.73-11.5)
10.7 (7.59-14.8)
NH Black/African American only
9.8 (8.60-11.1)
9.2 (7.96-10.6)
12.8 (9.77-16.5)
Other single and multiple races
5.8 (4.82-6.92)
6.1 (5.11-7.31)
4.0 (1.84-8.57)
Education, % (95% CI)
<.001b
Less than high school
12.1 (10.7-13.7)
9.6 (8.25-11.2)
25.0 (20.3-30.2)
High school graduate/GED
26.7 (24.8-28.6)
25.4 (23.5-27.5)
33.4 (28.4-38.8)
Some college, no degree
20.4 (18.8-22.2)
20.6 (18.8-22.5)
19.4 (15.4-24.2)
Associate's or bachelor's degree
31.2 (29.4-33.1)
33.7 (31.6-35.8)
18.3 (14.8-22.4)
Master's, doctoral, or professional degree
9.6 (8.54-10.7)
10.6 (9.47-11.9)
3.9 (2.47-6.07)
Family income in $1000s, mean (95% CI)
68.5 (66.2-70.8)
72.8 (70.3-75.4)
45.6 (41.6-49.6)
<.001a
Proximity to metro area, % (95% CI)
<.001b
Nonmetropolitan
18.0 (16.5-19.6)
16.5 (15.0-18.2)
25.9 (21.7-30.7)
Medium and small metro
33.0 (31.1-35.0)
32.8 (30.8-34.9)
33.9 (28.9-39.3)
Large fringe metro
22.1 (20.4-23.9)
23.3 (21.5-25.3)
15.7 (12.4-19.7)
Large central metro
26.8 (25.0-28.8)
27.3 (25.3-29.4)
24.5 (19.8-29.8)
Number of pain locations, mean (95% CI)
3.2 (3.2-3.3)
3.2 (3.1-3.3)
3.3 (3.2-3.5)
.086a
Pain limiting life/work, % (95% CI)
<.001b
Never
22.1 (20.5-23.8)
24.2 (22.4-26.1)
10.9 (8.13-14.5)
Some days
38.5 (36.4-40.6)
39.3 (37.1-41.5)
34.3 (29.1-39.9)
Most days
17.2 (15.8-18.8)
16.9 (15.3-18.7)
18.9 (15.3-23.1)
Every day
22.2 (20.5-24.0)
19.6 (17.9-21.5)
36.0 (30.9-41.3)
PHQ-8 total, mean (95% CI)
5.6 (5.3-5.8)
5.4 (5.1-5.6)
6.5 (5.9-7.2)
<.001a
Abbreviations: CI, confidence interval; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; metro, metropolitan; NH, Non-Hispanic; Nonpharm, nonpharmacologic
aWilcoxon rank-sum test for complex survey samples
bChi-squared test with Rao & Scott’s second-order correction
7.7 Fit Weighted Logistic Model with Survey Weights
I fit the model m2_lrm_wt with the lrm() function, defining weights with weights = wt_final.
In this analysis, I chose to predict the outcome nonpharm_over_opioid based on 9 variables: age, sex, race_ethnicity, education, fam_income1kurban_rural, num_pain_locations, painlimit_lifework, and phq8total.
Code
set.seed(19900317)dd <-datadist(m2_dataset_si, adjto.cat ="first")options(datadist ="dd")m2_lrm_wt <-lrm(nonpharm_over_opioid ~ age + sex + race_ethnicity + education + fam_income1k + urban_rural + num_pain_locations + painlimit_lifework + phq8total,weights = wt_final, normwt =TRUE,data = m2_dataset_si, x =TRUE, y =TRUE)m2_lrm_wt
Logistic Regression Model
lrm(formula = nonpharm_over_opioid ~ age + sex + race_ethnicity +
education + fam_income1k + urban_rural + num_pain_locations +
painlimit_lifework + phq8total, data = m2_dataset_si, x = TRUE,
y = TRUE, weights = wt_final, normwt = TRUE)
Sum of Weights by Response Category
0 1
524.3515 2775.6485
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 3300 LR chi2 326.37 R2 0.161 C 0.737
0 510 d.f. 18 R2(18,3300)0.089 Dxy 0.475
1 2790 Pr(> chi2) <0.0001 R2(18,1323.1)0.208 gamma 0.475
Sum of weights3300 Brier 0.121 tau-a 0.124
max |deriv| 1e-07
Coef S.E. Wald Z
Intercept 2.1393 0.3310 6.46
age -0.0212 0.0033 -6.50
sex=Male -0.1856 0.1038 -1.79
race_ethnicity=Hispanic 0.2904 0.1785 1.63
race_ethnicity=NH Black/African American only -0.1415 0.1636 -0.87
race_ethnicity=Other single and multiple races 0.4662 0.2511 1.86
education=High school graduate/GED 0.4940 0.1454 3.40
education=Some college, no degree 0.7391 0.1627 4.54
education=Associate's or bachelor's degree 1.1555 0.1644 7.03
education=Master's, doctoral, or professional degree 1.3719 0.2696 5.09
fam_income1k 0.0074 0.0013 5.54
urban_rural=Medium and small metro 0.2832 0.1362 2.08
urban_rural=Large fringe metro 0.6227 0.1623 3.84
urban_rural=Large central metro 0.3709 0.1525 2.43
num_pain_locations 0.0004 0.0411 0.01
painlimit_lifework=Some days -0.6143 0.1669 -3.68
painlimit_lifework=Most days -0.7064 0.1879 -3.76
painlimit_lifework=Every day -1.0145 0.1773 -5.72
phq8total -0.0070 0.0097 -0.72
Pr(>|Z|)
Intercept <0.0001
age <0.0001
sex=Male 0.0739
race_ethnicity=Hispanic 0.1038
race_ethnicity=NH Black/African American only 0.3869
race_ethnicity=Other single and multiple races 0.0634
education=High school graduate/GED 0.0007
education=Some college, no degree <0.0001
education=Associate's or bachelor's degree <0.0001
education=Master's, doctoral, or professional degree <0.0001
fam_income1k <0.0001
urban_rural=Medium and small metro 0.0376
urban_rural=Large fringe metro 0.0001
urban_rural=Large central metro 0.0150
num_pain_locations 0.9913
painlimit_lifework=Some days 0.0002
painlimit_lifework=Most days 0.0002
painlimit_lifework=Every day <0.0001
phq8total 0.4712
7.8 Numeric Summary of Coefficients (m2_coef_wt)
The following code generates a numeric summary of coefficients from the model and adds meaningful labels for comparison.
Code
m2_coef_wt <-summary(m2_lrm_wt, conf.int =0.95) |>as_tibble() |>filter(Type ==2) |>mutate(Group =c("Age", "Family income","Number of pain locations","PHQ-8", "Sex", "Race/Ethnicity", "Race/Ethnicity", "Race/Ethnicity","Education", "Education", "Education", "Education", "Proximity to metro area", "Proximity to metro area","Proximity to metro area", "Pain limiting life or work", "Pain limiting life or work","Pain limiting life or work"),Comparison =c("Age 69 v. 46", "Family income $85k v. $22k","4 pain locations v. 2","PHQ-8 score 8 v. 1","Male v. Female", "Hispanic v. NH White","NH Black v. NH White","Other v. NH White","HS/GED v. < HS", "Some College v. < HS", "Assoc/Bach v. < HS","Masters/PhD v. < HS", "M/S metro v. Non-metro", "L fringe metro v. Non-metro","L central metro v. Non-metro", "Pain limit some days v. Never", "Pain limit most days v. Never","Pain limit every day v. Never"),Effect =as.numeric(Effect)) |>rename(Lower0.95 ="Lower 0.95", Upper0.95 ="Upper 0.95") |>mutate(Lower0.95 =as.numeric(Lower0.95), Upper0.95 =as.numeric(Upper0.95))m2_coef_wt <- m2_coef_wt |>mutate(Comparison =factor(Comparison, levels =c("Age 69 v. 46","Male v. Female","Hispanic v. NH White","NH Black v. NH White","Other v. NH White","HS/GED v. < HS", "Some College v. < HS", "Assoc/Bach v. < HS","Masters/PhD v. < HS", "Family income $85k v. $22k","M/S metro v. Non-metro", "L fringe metro v. Non-metro","L central metro v. Non-metro", "4 pain locations v. 2","Pain limit some days v. Never", "Pain limit most days v. Never","Pain limit every day v. Never","PHQ-8 score 8 v. 1"))) |>arrange(Comparison)
The following code generates Supplemental Table 3, providing a numeric summmary of the coefficients from m2_lrm_wt.
Code
m2_coef_wt |>select(Group, Comparison, Effect, Lower0.95, Upper0.95) |>mutate(Effect =round(Effect, digits =3),Lower0.95 =round(Lower0.95, digits =3),Upper0.95 =round(Upper0.95, digits =3)) |>rename(aOR = Effect,"95% CI Low"= Lower0.95,"95% CI High"= Upper0.95) |>flextable() |>bold(bold =TRUE, part ="header") |>add_header_lines(values ="Supplemental Table 3: aOR of Engagement in Nonpharmacologic Only vs. Opioids Only") |>autofit() |>add_footer_lines(values ="Abbreviations: Assoc, associate’s; aOR, adjusted odds ratio; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; NH, Non-Hispanic; S, small; v., versus; yrs, years", top =FALSE) |>theme_zebra(odd_header ="white") |>hline_top(part="body")
Supplemental Table 3: aOR of Engagement in Nonpharmacologic Only vs. Opioids Only
Group
Comparison
aOR
95% CI Low
95% CI High
Age
Age 69 v. 46
0.614
0.530
0.711
Sex
Male v. Female
0.831
0.678
1.018
Race/Ethnicity
Hispanic v. NH White
1.337
0.942
1.897
Race/Ethnicity
NH Black v. NH White
0.868
0.630
1.196
Race/Ethnicity
Other v. NH White
1.594
0.974
2.607
Education
HS/GED v. < HS
1.639
1.232
2.179
Education
Some College v. < HS
2.094
1.522
2.881
Education
Assoc/Bach v. < HS
3.176
2.301
4.383
Education
Masters/PhD v. < HS
3.943
2.325
6.687
Family income
Family income $85k v. $22k
1.596
1.352
1.883
Proximity to metro area
M/S metro v. Non-metro
1.327
1.016
1.734
Proximity to metro area
L fringe metro v. Non-metro
1.864
1.356
2.562
Proximity to metro area
L central metro v. Non-metro
1.449
1.075
1.954
Number of pain locations
4 pain locations v. 2
1.001
0.852
1.176
Pain limiting life or work
Pain limit some days v. Never
0.541
0.390
0.750
Pain limiting life or work
Pain limit most days v. Never
0.493
0.341
0.713
Pain limiting life or work
Pain limit every day v. Never
0.363
0.256
0.513
PHQ-8
PHQ-8 score 8 v. 1
0.952
0.834
1.088
Abbreviations: Assoc, associate’s; aOR, adjusted odds ratio; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; NH, Non-Hispanic; S, small; v., versus; yrs, years
7.9 Effects Plot (m2_coef_wt)
To visualize the effects of the covariates on odds of nonpharm_over_opioid = 1, I used ggplot() to plot the effects. Education covariates are placed on a separate axis given their size to not obscure the other covariates.
Code
m2_coef_plot_wt <- m2_coef_wt |>filter(Comparison !="(Intercept)", Group !="Education") |>mutate(Comparison =fct_inorder(Comparison, ordered =NA),Comparison =fct_rev(Comparison))m2_coef_plot_educ_wt <- m2_coef_wt |>filter(Comparison !="(Intercept)", Group =="Education") |>mutate(Comparison =fct_inorder(Comparison, ordered =NA),Comparison =fct_rev(Comparison))p1 <-ggplot(m2_coef_plot_wt, aes(x = Effect, y = Comparison)) +geom_vline(aes(xintercept =1), linewidth = .25, linetype ='dashed') +geom_errorbarh(aes(xmax = Upper0.95, xmin = Lower0.95), linewidth = .5,height = .2, color ='gray50') +geom_point(size =2.5, color = uhred) +labs(x ="aOR",y ="",title ="aOR of Using Nonpharmacologic Only vs. Opioids Only") +scale_x_continuous(breaks =seq(0,8,0.5))p2 <-ggplot(m2_coef_plot_educ_wt, aes(x = Effect, y = Comparison)) +geom_vline(aes(xintercept =1), linewidth = .25, linetype ='dashed') +geom_errorbarh(aes(xmax = Upper0.95, xmin = Lower0.95), linewidth = .5,height = .2, color ='gray50') +geom_point(size =2.5, color = uhred) +labs(x ="aOR",y ="") +scale_x_continuous(breaks =seq(0,8,0.5))p1/p2 +plot_layout(heights =c(5,1))
7.10 Summarizing Fit (m2_coef_wt)
To summarize the fit of m2_lrm_wt, I simply entered m2_lrm_wt within the code. From the output, the Nagelkerke \(R^2\) is 0.161 and the unvalidated area under the ROC curve (aka c-statistic) is 0.737.
Code
m2_lrm_wt
Logistic Regression Model
lrm(formula = nonpharm_over_opioid ~ age + sex + race_ethnicity +
education + fam_income1k + urban_rural + num_pain_locations +
painlimit_lifework + phq8total, data = m2_dataset_si, x = TRUE,
y = TRUE, weights = wt_final, normwt = TRUE)
Sum of Weights by Response Category
0 1
524.3515 2775.6485
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 3300 LR chi2 326.37 R2 0.161 C 0.737
0 510 d.f. 18 R2(18,3300)0.089 Dxy 0.475
1 2790 Pr(> chi2) <0.0001 R2(18,1323.1)0.208 gamma 0.475
Sum of weights3300 Brier 0.121 tau-a 0.124
max |deriv| 1e-07
Coef S.E. Wald Z
Intercept 2.1393 0.3310 6.46
age -0.0212 0.0033 -6.50
sex=Male -0.1856 0.1038 -1.79
race_ethnicity=Hispanic 0.2904 0.1785 1.63
race_ethnicity=NH Black/African American only -0.1415 0.1636 -0.87
race_ethnicity=Other single and multiple races 0.4662 0.2511 1.86
education=High school graduate/GED 0.4940 0.1454 3.40
education=Some college, no degree 0.7391 0.1627 4.54
education=Associate's or bachelor's degree 1.1555 0.1644 7.03
education=Master's, doctoral, or professional degree 1.3719 0.2696 5.09
fam_income1k 0.0074 0.0013 5.54
urban_rural=Medium and small metro 0.2832 0.1362 2.08
urban_rural=Large fringe metro 0.6227 0.1623 3.84
urban_rural=Large central metro 0.3709 0.1525 2.43
num_pain_locations 0.0004 0.0411 0.01
painlimit_lifework=Some days -0.6143 0.1669 -3.68
painlimit_lifework=Most days -0.7064 0.1879 -3.76
painlimit_lifework=Every day -1.0145 0.1773 -5.72
phq8total -0.0070 0.0097 -0.72
Pr(>|Z|)
Intercept <0.0001
age <0.0001
sex=Male 0.0739
race_ethnicity=Hispanic 0.1038
race_ethnicity=NH Black/African American only 0.3869
race_ethnicity=Other single and multiple races 0.0634
education=High school graduate/GED 0.0007
education=Some college, no degree <0.0001
education=Associate's or bachelor's degree <0.0001
education=Master's, doctoral, or professional degree <0.0001
fam_income1k <0.0001
urban_rural=Medium and small metro 0.0376
urban_rural=Large fringe metro 0.0001
urban_rural=Large central metro 0.0150
num_pain_locations 0.9913
painlimit_lifework=Some days 0.0002
painlimit_lifework=Most days 0.0002
painlimit_lifework=Every day <0.0001
phq8total 0.4712
7.11 Validating m2_coef_wt
To validate m2_lrm_wt, I first set a seed of 19900317 using the set.seed() function. I then used the validate() function with 40 bootstrap repetitions to generate validated (index.corrected) \(R^2\) and Dxy values. I stored this validated summary as val_m2 so that I could obtain specific values for a summary table below.
Code
set.seed(19900317)val_m2 <-validate(m2_lrm_wt, method ="boot", B =40)
Code
fit_report <-tibble(Model ="m2") |>mutate("Validated R2"=case_when( Model =="m2"~ val_m2[2,5]),"Validated C stat"=case_when( Model =="m2"~0.5+ (val_m2[1,5]/2)),AIC =AIC(m2_lrm_wt),BIC =BIC(m2_lrm_wt))fit_report |>select(Model, "Validated R2", "Validated C stat", AIC, BIC) |>flextable() |>colformat_double(j =2:3, digits =3) |>colformat_double(j =4:5, digits =0) |>autofit() |>theme_zebra(odd_header ="white") |>hline_top(part="body")
Model
Validated R2
Validated C stat
AIC
BIC
m2
0.148
0.728
2,601
2,717
7.12 References
Ford, C. (2016). Getting Started with Hurdle Models. University of Virginia Library. https://data.library.virginia.edu/getting-started-with-hurdle-models/
Kroenke, K., Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9: validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606-613. https://doi.org/10.1046/j.1525-1497.2001.016009606.x
Löwe, B., Decker, O., Müller, S., Brähler, E., Schellberg, D., Herzog, W., & Herzberg, P. Y. (2008). Validation and standardization of the Generalized Anxiety Disorder Screener (GAD-7) in the general population. Medical Care, 46(3), 266-274. https://doi.org/10.1097/MLR.0b013e318160d093
National Center for Health Statistics. (2021, 2021-04-05). 2019 National Health Interview Survey. National Center for Health Statistics. https://www.cdc.gov/nchs/nhis/2019nhis.htm
Yoshida, K. (2014). Models for excess zeros using pscl package (Hurdle and zero-inflated regression models) and their interpretations. Retrieved May 1 from https://rpubs.com/kaz_yos/pscl-2
Zelaya, C. E., Dahlhamer, J. M., Lucas, J. W., & Connor, E. M. (2020). Chronic Pain and High-impact Chronic Pain Among U.S. Adults, 2019.
7.13 Acknowledgments
Robert J. Trager, DC (methodology, interpretation, literature review, first draft of introduction)
Jeffery A. Dusek, PhD (methodology, interpretation)
The National Center for Health Statistics (data collection)
---title: "Engagement in Integrative and Nonpharmacologic Pain Management Modalities Among Adults with Chronic Pain"subtitle: "Analysis of the 2019 National Health Interview Survey"author: "Samuel N. Rodgers-Melnick, MPH, MT-BC"date: last-modifiedformat: html: toc: true number-sections: true code-fold: show code-tools: true code-overflow: wrap embed-resources: true date-format: iso theme: spacelab---You can download the .qmd file, `NHISstudy2024-01-07GitHub.qmd`, from my GitHub page [here](https://github.com/samrodgersmelnick/NHIS2019ChronicPain).You can read the full research article in *Journal of Pain Research* [here](https://www.dovepress.com/engagement-in-integrative-and-nonpharmacologic-pain-management-modalit-peer-reviewed-fulltext-article-JPR)Rodgers-Melnick SN, Trager RJ, Love TE, Dusek JA. Engagement in Integrative and Nonpharmacologic Pain Management Modalities Among Adults with Chronic Pain: Analysis of the 2019 National Health Interview Survey. J Pain Res. Jan 16 2024;17:253-264. doi:10.2147/jpr.S439682## R Packages and Setup {.unnumbered}I loaded the following packages to ensure I had the necessary data cleaning, analysis, and visualization tools needed for this project. Given the number of packages, I used `conflict_prefer()` to ensure functions would not interfere with each other. I also loaded `uhred` for use in plots.```{r}#| message: false#| warning: falseknitr::opts_chunk$set(comment =NA) # do not remove thislibrary(janitor)library(gt)library(naniar)library(rms)library(patchwork)library(Epi)library(car)library(caret)library(GGally)library(pscl)library(VGAM)library(MASS)library(flextable)library(mosaic)library(pROC)library(ROCR)library(gtsummary)library(simputation)library(ggrepel)library(nnet)library(kableExtra)library(conflicted)library(survey)library(tidymodels)library(tidyverse)conflict_prefer("select", "dplyr")conflict_prefer("summarize", "dplyr")conflict_prefer("filter", "dplyr")conflict_prefer("mean", "base")conflict_prefer("sum", "base")conflict_prefer("vif", "car")conflict_prefer("count", "dplyr")uhred <-"#B70022"theme_set(theme_bw())```# Data Source## National Health Interview SurveyThe data for this project comes from the [2019 National Health Interview Survey (NHIS) Website](https://www.cdc.gov/nchs/nhis/2019nhis.htm) (National Center for Health Statistics, 2021). To obtain the data, I downloaded the [CSV data file for the Sample Adult Interview](https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2019/adult19csv.zip) to my local hard drive and unzipped the `adult19.csv` file.According to the [National Health Interview Survey website](https://www.cdc.gov/nchs/nhis/about_nhis.htm):**Purpose:** "The main objective of the NHIS is to monitor the health of the United States population through the collection and analysis of data on a broad range of health topics. NHIS data are used widely throughout the Department of Health and Human Services (HHS) to monitor trends in illness and disability and to track progress toward achieving national health objectives. The data are also used by the public health research community for epidemiological and policy analysis of such timely issues as characterizing those with various health problems, determining barriers to accessing and using appropriate health care, and evaluating Federal health programs."**Design:** The NHIS is a cross-sectional household interview survey.**Target population:** "The target population for the NHIS is the civilian non-institutionalized population residing within the 50 states and the District of Columbia at the time of the interview. Persons excluded from the universe are those with no fixed household address (e.g., homeless and/or transient persons not residing in shelters), active duty military personnel, and civilians living on military bases, persons in long-term care institutions (e.g., nursing homes for the elderly, hospitals for the chronically ill or physically or intellectually disabled, and wards for abused or neglected children), persons in correctional facilities (e.g., prisons or jails, juvenile detention centers, and halfway houses), and U.S. nationals living in foreign countries. While active-duty Armed Forces personnel cannot be sampled for inclusion in the survey, any civilians residing with Armed Forces personnel in non-military housing are eligible to be sampled."**Sampling strategy:** "To keep survey operations manageable, cost-effective, and timely, the NHIS uses geographically clustered sampling techniques to select the sample of dwelling units for the NHIS. The sample is designed in such a way that each month's sample is nationally representative. Data collection on the NHIS is continuous, i.e., from January to December each year.""One 'sample adult' aged 18 years or older and one 'sample child' aged 17 years or younger (if any children live in the household) are randomly selected from each household following a brief initial interview that identifies everyone who usually lives or stays in the household. Information about the sample adult is collected from the sample adult herself or himself unless she or he is physically or mentally unable to do so, in which case a knowledgeable proxy can answer for the sample adult. Information about the sample child is collected from a parent or adult who is knowledgeable about and responsible for the health care of the sample child. This respondent may or may not also be the sample adult."**Data collection procedures:** "The U.S. Census Bureau, under a contractual agreement, is the data collection agent for the NHIS. NHIS data are collected continuously throughout the year by Census interviewers. Nationally, about 750 interviewers (also called 'Field Representatives') are trained and directed by health survey supervisors in the U.S. Census Bureau Regional Offices to conduct interviews for NHIS.The NHIS is conducted using computer-assisted personal interviewing. Face-to-face interviews are conducted in respondents' homes, but follow-ups to complete interviews may be conducted over the telephone. A telephone interview may also be conducted when the respondent requests a telephone interview or when road conditions or travel distances would make it difficult to schedule a personal visit before the required completion date. In 2019, 34.3% of the Sample Adult interviews and 31.7% of the Sample Child interviews were conducted at least partially by telephone."## The SubjectsSubjects included in this project are adults (18 years and older) with chronic pain as defined by responding "most days" or "every day" to the survey question, "In the past 3 months, how often did you have pain?" (Zelaya et al., 2020).# Research QuestionsWhich demographic, mental health, and pain-related variables are associated with engagement in: (1) the number of IHM modalities used for pain and (2) exclusively nonpharmacologic modalities rather than opioid utilization among U.S. adults with chronic pain.# Data Ingest and Management## Loading the Raw DataI ingested the `adult19.csv` raw data that I downloaded and unzipped from the [2019 National Health Interview Survey (NHIS) Website](https://www.cdc.gov/nchs/nhis/2019nhis.htm) as `projectb_raw` using the `read_csv()` function.```{r, warning = FALSE}projectb_raw <-read_csv("data/adult19.csv", show_col_types =FALSE)```## Cleaning the DataIn the following code, I created a new tibble, `projectb_clean`, limited to respondents who stated that they had pain on most days (PAIFRQ3M_A = 3) or everyday (PAIFRQ3M_A = 4) in response to the question PAIFRQ3M_A, "In the past three months, how often did you have pain?" as this matched the chronic pain definition reported by the National Center for Health Statistics (Zelaya et al., 2020). I also set the filter at PAIFRQ3M_A \< 7 to exclude responses that were missing, refused, non ascertained, or "don't know" in response to the chronic pain question.After filtering to the chronic pain cohort, I selected the variables from the NHIS that I needed for this analysis.```{r}projectb_clean <- projectb_raw |>filter(PAIFRQ3M_A >=3& PAIFRQ3M_A <7) |>#Limited to pain on most days or everyday per Zelaya et al. (2020)select(subject_id = HHX, age = AGEP_A, sex = SEX_A,wt_final = WTFA_A,education = EDUC_A, urban_rural = URBRRL,painlimit_lifework = PAIWKLM3M_A,anymeds12mo = RX12M_A, opioids12mo = OPD12M_A, opioids3mo = OPD3M_A, opioids_chonpain3mo = OPDCHRONIC_A, opioid_freq3mo = OPDFREQ_A, GAD71_A, GAD72_A, GAD73_A, GAD74_A, GAD75_A, GAD76_A, GAD77_A, anx_severity = GADCAT_A,back_pain = PAIBACK3M_A, upper_ext_pain = PAIULMB3M_A,lower_ext_pain = PAILLMB3M_A, headache_migraine = PAIHDFC3M_A,abd_pelvic_pain = PAIAPG3M_A, tooth_jaw_pain = PAITOOTH3M_A,chiro = PAICHIRO_A, yoga_taichi = PAIYOGA_A, massage = PAIMASSAGE_A, meditation = PAIMEDITAT_A,race_ethnicity = HISPALLP_A, fam_income = FAMINCTC_A,pt_rehab_ot = PAIPHYSTPY_A, talk_therapy = PAITALKTPY_A,chron_pain_selfmgmt = PAIPROGRAM_A, peer_support_grp = PAIGROUP_A, PHQ81_A, PHQ82_A, PHQ83_A, PHQ84_A, PHQ85_A, PHQ86_A, PHQ87_A, PHQ88_A, dep_severity = PHQCAT_A)```### Recode Refused, Not Ascertained, and Don't Know as MissingUsing the [2019 NHIS Codebook](https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2019/adult-codebook.pdf), I recoded all responses that were Refused, Not Ascertained, or Don't Know as `NA`.```{r}projectb_clean$age[projectb_clean$age >=97] <-NAprojectb_clean$sex[projectb_clean$sex >=7] <-NAprojectb_clean$education[projectb_clean$education >=97] <-NAprojectb_clean$painlimit_lifework[projectb_clean$painlimit_lifework >=7] <-NAprojectb_clean$anymeds12mo[projectb_clean$anymeds12mo >=7] <-NAprojectb_clean$opioids12mo[projectb_clean$opioids12mo >=7] <-NAprojectb_clean$opioids3mo[projectb_clean$opioids3mo >=7] <-NAprojectb_clean$opioids_chonpain3mo[projectb_clean$opioids_chonpain3mo >=7] <-NAprojectb_clean$opioid_freq3mo[projectb_clean$opioid_freq3mo >=7] <-NAprojectb_clean$GAD71_A[projectb_clean$GAD71_A >=7] <-NAprojectb_clean$GAD72_A[projectb_clean$GAD72_A >=7] <-NAprojectb_clean$GAD73_A[projectb_clean$GAD73_A >=7] <-NAprojectb_clean$GAD74_A[projectb_clean$GAD74_A >=7] <-NAprojectb_clean$GAD75_A[projectb_clean$GAD75_A >=7] <-NAprojectb_clean$GAD76_A[projectb_clean$GAD76_A >=7] <-NAprojectb_clean$GAD77_A[projectb_clean$GAD77_A >=7] <-NAprojectb_clean$anx_severity[projectb_clean$anx_severity >=8] <-NAprojectb_clean$back_pain[projectb_clean$back_pain >=7] <-NAprojectb_clean$upper_ext_pain[projectb_clean$upper_ext_pain >=7] <-NAprojectb_clean$lower_ext_pain[projectb_clean$lower_ext_pain >=7] <-NAprojectb_clean$headache_migraine[projectb_clean$headache_migraine >=7] <-NAprojectb_clean$abd_pelvic_pain[projectb_clean$abd_pelvic_pain >=7] <-NAprojectb_clean$tooth_jaw_pain[projectb_clean$tooth_jaw_pain >=7] <-NAprojectb_clean$chiro[projectb_clean$chiro >=7] <-NAprojectb_clean$yoga_taichi[projectb_clean$yoga_taichi >=7] <-NAprojectb_clean$massage[projectb_clean$massage >=7] <-NAprojectb_clean$meditation[projectb_clean$meditation >=7] <-NAprojectb_clean$race_ethnicity[projectb_clean$race_ethnicity >=97] <-NAprojectb_clean$pt_rehab_ot[projectb_clean$pt_rehab_ot >=7] <-NAprojectb_clean$talk_therapy[projectb_clean$talk_therapy >=7] <-NAprojectb_clean$chron_pain_selfmgmt[projectb_clean$chron_pain_selfmgmt >=7] <-NAprojectb_clean$peer_support_grp[projectb_clean$peer_support_grp >=7] <-NAprojectb_clean$PHQ81_A[projectb_clean$PHQ81_A >=7] <-NAprojectb_clean$PHQ82_A[projectb_clean$PHQ82_A >=7] <-NAprojectb_clean$PHQ83_A[projectb_clean$PHQ83_A >=7] <-NAprojectb_clean$PHQ84_A[projectb_clean$PHQ84_A >=7] <-NAprojectb_clean$PHQ85_A[projectb_clean$PHQ85_A >=7] <-NAprojectb_clean$PHQ86_A[projectb_clean$PHQ86_A >=7] <-NAprojectb_clean$PHQ87_A[projectb_clean$PHQ87_A >=7] <-NAprojectb_clean$PHQ88_A[projectb_clean$PHQ88_A >=7] <-NAprojectb_clean$dep_severity[projectb_clean$dep_severity >=8] <-NA```### Recode Binary Variables From (1 = Yes, 2 = No) to (1 = Yes, 0 = No)In the following code, I recoded binary variables with initial responses in the format, 1 = Yes, 2 = No, to binary variables coded as 1 = Yes, 0 = No using the `mutate()` function, subtracting 2 from the variables `chiro`, `yoga_taichi`, `massage`, `meditation`, `anymeds12mo`, `opioids12mo`, `opioids3mo`, `opioids_chonpain3mo`, `pt_rehab_ot`, `talk_therapy`, `chron_pain_selfmgmt`, and `peer_support_grp`.```{r}projectb_clean <- projectb_clean |>mutate(chiro =as.numeric(2- chiro),yoga_taichi =as.numeric(2- yoga_taichi),massage =as.numeric(2- massage),meditation =as.numeric(2- meditation),anymeds12mo =as.numeric(2- anymeds12mo),opioids12mo =as.numeric(2- opioids12mo),opioids3mo =as.numeric(2- opioids3mo),opioids_chonpain3mo =as.numeric(2- opioids_chonpain3mo),pt_rehab_ot =as.numeric(2- pt_rehab_ot),talk_therapy =as.numeric(2- talk_therapy),chron_pain_selfmgmt =as.numeric(2- chron_pain_selfmgmt),peer_support_grp =as.numeric(2- peer_support_grp))```### Recode Categorical Variables to Meaningful FactorsUsing the [2019 NHIS Codebook](https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2019/adult-codebook.pdf), I recoded the following variables to meaningful, ascending, and ordered factors using the `mutate()` and `factor()` functions. To make `race_ethnicity` 6 categories, I combined "NH Amer Indian Alaska Natv only" and "NH Amer Indian Alaska Natv and any other group" into "NH Amer Indian Alaska Natv."```{r}projectb_clean <- projectb_clean |>mutate(sex =factor(case_when( sex ==1~"Male", sex ==2~"Female"))) |>mutate(race_ethnicity =case_when( race_ethnicity ==1~"Hispanic", race_ethnicity ==2~"NH White only", race_ethnicity ==3~"NH Black/African American only", race_ethnicity ==4~"NH Asian only", race_ethnicity ==5~"NH Amer Indian Alaska Natv", race_ethnicity ==6~"NH Amer Indian Alaska Natv", race_ethnicity ==7~"Other single and multiple races")) |>mutate(race_ethnicity =factor(race_ethnicity, levels =c("NH White only", "Hispanic", "NH Black/African American only", "NH Asian only", "NH Amer Indian Alaska Natv","Other single and multiple races"))) |>mutate(painlimit_lifework =case_when( painlimit_lifework ==1~"Never", painlimit_lifework ==2~"Some days", painlimit_lifework ==3~"Most days", painlimit_lifework ==4~"Every day")) |>mutate(painlimit_lifework =factor(painlimit_lifework, levels =c("Never", "Some days", "Most days", "Every day"))) |>mutate(anx_severity =case_when( anx_severity ==1~"None/Minimal", anx_severity ==2~"Mild", anx_severity ==3~"Moderate", anx_severity ==4~"Severe")) |>mutate(anx_severity =factor(anx_severity, levels =c("None/Minimal", "Mild", "Moderate", "Severe"))) |>mutate(dep_severity =case_when( dep_severity ==1~"None/Minimal", dep_severity ==2~"Mild", dep_severity ==3~"Moderate", dep_severity ==4~"Severe")) |>mutate(dep_severity =factor(dep_severity, levels =c("None/Minimal", "Mild", "Moderate", "Severe"))) |>mutate(opioid_freq3mo =case_when( opioid_freq3mo ==1~"Some days", opioid_freq3mo ==2~"Most days", opioid_freq3mo ==3~"Every day")) |>mutate(opioid_freq3mo =factor(opioid_freq3mo, levels =c("Some days", "Most days", "Every day"))) |>mutate(education =case_when( education ==0~"Less than high school", education ==1~"Less than high school", education ==2~"Less than high school", education ==3~"High school graduate/GED", education ==4~"High school graduate/GED", education ==5~"Some college, no degree", education ==6~"Associate's or bachelor's degree", education ==7~"Associate's or bachelor's degree", education ==8~"Associate's or bachelor's degree", education ==9~"Master's, doctoral, or professional degree", education ==10~"Master's, doctoral, or professional degree", education ==11~"Master's, doctoral, or professional degree")) |>mutate(education =factor(education, levels =c("Less than high school", "High school graduate/GED","Some college, no degree","Associate's or bachelor's degree","Master's, doctoral, or professional degree"))) |>mutate(urban_rural =case_when( urban_rural ==1~"Large central metro", urban_rural ==2~"Large fringe metro", urban_rural ==3~"Medium and small metro", urban_rural ==4~"Nonmetropolitan")) |>mutate(urban_rural =factor(urban_rural, levels =c("Nonmetropolitan", "Medium and small metro", "Large fringe metro","Large central metro")))```### Calculate GAD-7 totalIn the following code, I recoded the original NHIS responses to conform to the actual scoring method for the GAD-7 (Löwe et al., 2008). I then calculated `gad7total` by adding each of the individual GAD-7 responses. Finally, to verify that `gad7total` was scored correctly, I summarized `gad7total` by `anx_severity` using `favstats()` to ensure that the values were in range. Interestingly, in the 2019 NHIS, it appears that they scored subjects even if they had missing individual items on the GAD-7.```{r}projectb_clean <- projectb_clean |>mutate(GAD71_A =case_when( GAD71_A ==1~0, GAD71_A ==2~1, GAD71_A ==3~2, GAD71_A ==4~3)) |>mutate(GAD72_A =case_when( GAD72_A ==1~0, GAD72_A ==2~1, GAD72_A ==3~2, GAD72_A ==4~3)) |>mutate(GAD73_A =case_when( GAD73_A ==1~0, GAD73_A ==2~1, GAD73_A ==3~2, GAD73_A ==4~3)) |>mutate(GAD74_A =case_when( GAD74_A ==1~0, GAD74_A ==2~1, GAD74_A ==3~2, GAD74_A ==4~3)) |>mutate(GAD75_A =case_when( GAD75_A ==1~0, GAD75_A ==2~1, GAD75_A ==3~2, GAD75_A ==4~3)) |>mutate(GAD76_A =case_when( GAD76_A ==1~0, GAD76_A ==2~1, GAD76_A ==3~2, GAD76_A ==4~3)) |>mutate(GAD77_A =case_when( GAD77_A ==1~0, GAD77_A ==2~1, GAD77_A ==3~2, GAD77_A ==4~3)) |>mutate(gad7total = GAD71_A + GAD72_A + GAD73_A + GAD74_A + GAD75_A + GAD76_A + GAD77_A)favstats(gad7total ~ anx_severity, data = projectb_clean) |>flextable() |>colformat_double(digits =2) |>theme_zebra(odd_header ="white") |>hline_top(part="body")```### Calculate PHQ-8 TotalIn the following code, I recoded the original NHIS responses to conform to the actual scoring method for the PHQ-8 (Kroenke et al., 2001). I then calculated `phq8total` by adding each of the individual PHQ-8 responses.Finally, to verify that `phq8total` was scored correctly, I summarized `phq8total` by `dep_severity` using `favstats()` to ensure that the values were in range. Interestingly, in the 2019 NHIS, it appears that they dropped "moderate-to-severe depression" as a value given that they did not use the full PHQ-9, and they scored subjects even if they had missing individual items on the PHQ-8.```{r}projectb_clean <- projectb_clean |>mutate(PHQ81_A =case_when( PHQ81_A ==1~0, PHQ81_A ==2~1, PHQ81_A ==3~2, PHQ81_A ==4~3)) |>mutate(PHQ82_A =case_when( PHQ82_A ==1~0, PHQ82_A ==2~1, PHQ82_A ==3~2, PHQ82_A ==4~3)) |>mutate(PHQ83_A =case_when( PHQ83_A ==1~0, PHQ83_A ==2~1, PHQ83_A ==3~2, PHQ83_A ==4~3)) |>mutate(PHQ84_A =case_when( PHQ84_A ==1~0, PHQ84_A ==2~1, PHQ84_A ==3~2, PHQ84_A ==4~3)) |>mutate(PHQ85_A =case_when( PHQ85_A ==1~0, PHQ85_A ==2~1, PHQ85_A ==3~2, PHQ85_A ==4~3)) |>mutate(PHQ86_A =case_when( PHQ86_A ==1~0, PHQ86_A ==2~1, PHQ86_A ==3~2, PHQ86_A ==4~3)) |>mutate(PHQ87_A =case_when( PHQ87_A ==1~0, PHQ87_A ==2~1, PHQ87_A ==3~2, PHQ87_A ==4~3)) |>mutate(PHQ88_A =case_when( PHQ88_A ==1~0, PHQ88_A ==2~1, PHQ88_A ==3~2, PHQ88_A ==4~3)) |>mutate(phq8total = PHQ81_A + PHQ82_A + PHQ83_A + PHQ84_A + PHQ85_A + PHQ86_A + PHQ87_A + PHQ88_A)favstats(phq8total ~ dep_severity, data = projectb_clean) |>flextable() |>colformat_double(digits =2) |>theme_zebra(odd_header ="white") |>hline_top(part="body")```### Integrative Health and Medicine (IHM) VariablesIn the following code, I performed the following steps to create the 5-level count outcome, `ihm4pain_num`, and the binary variable, `ihm4pain_bin`.- First, I used the `mutate()` function to create `ihm4pain_num` as the sum of engaging in `chiro`, `yoga_taichi`, `massage`, or `meditation` to manage pain in the past 3 months.- Then, I created the binary outcome, `ihm4pain_bin`, which was equal to 1 if `ihm4pain_num`\>= 1 and equal to 0 if `ihm4pain_num` == 0.```{r}projectb_clean <- projectb_clean |>mutate(ihm4pain_num = chiro + yoga_taichi + massage + meditation,ihm4pain_bin =case_when(ihm4pain_num >=1~1, ihm4pain_num ==0~0))```### Opioid frequencyTo create a single variable describing frequency of opioid use in the past 3 months to manage chronic pain, I performed the following steps:- First, since 1) the opioid frequency question was only asked to subjects who were taking prescription medications and opioid medications and 2) subjects responding "No" to the prescription medication and opioid use questions would be assumed to have never taken opioids in the past 3 months, I coded `opioid_freq3mo_final` as equal to "Never" if the subject answered "No" to `anymeds12mo`, `opioids12mo`, `opioids3mo`, or `opioids_chonpain3mo`.- Next, if subjects responded "Some days", "Most days", "Every day" to the `opioid_freq3mo` question, I coded the `opioid_freq3mo_final` variable as equal to that response.- Finally, I coded `opioid_freq3mo_final` as a meaningfully ordered factor.```{r}projectb_clean <- projectb_clean |>mutate(opioid_freq3mo_final =case_when( anymeds12mo ==0| opioids12mo ==0| opioids3mo ==0| opioids_chonpain3mo ==0~"Never", opioid_freq3mo %in%c("Some days", "Most days", "Every day") ~ opioid_freq3mo),opioid_freq3mo_final =factor(opioid_freq3mo_final, levels =c("Never","Some days","Most days","Every day")))```### Number of Pain LocationsTo create a single variable quantifying the total number of locations in which subjects reported being bothered at least "a little" by pain in the past three months, I performed the following steps:- I converted the 6 individual pain variables to binary variables such that 1 indicated being bothered at least "a little" (\>= 2), and 0 meant "Not at all."- I then calculated `num_pain_locations` as the sum of those binary variables, ignoring missingness using the `rowSums()` function.- Finally, I used `mutate()` and `if_else()` to mark `num_pain_locations` as missing if all pain locations were missing.```{r}projectb_clean <- projectb_clean |>mutate(back_pain =if_else(back_pain >=2, 1, 0),upper_ext_pain =if_else(upper_ext_pain >=2, 1, 0),lower_ext_pain =if_else(lower_ext_pain >=2, 1, 0),headache_migraine =if_else(headache_migraine >=2, 1, 0),abd_pelvic_pain =if_else(abd_pelvic_pain >=2, 1, 0),tooth_jaw_pain =if_else(tooth_jaw_pain >=2, 1, 0))projectb_clean$num_pain_locations <-rowSums(projectb_clean[,c("back_pain","upper_ext_pain","lower_ext_pain","headache_migraine","abd_pelvic_pain","tooth_jaw_pain")], na.rm=TRUE)projectb_clean <- projectb_clean |>mutate(num_pain_locations =if_else(is.na(back_pain) &is.na(upper_ext_pain) &is.na(lower_ext_pain) &is.na(headache_migraine) &is.na(abd_pelvic_pain) &is.na(tooth_jaw_pain), NA, num_pain_locations))```### Chronic Pain Treatment Category - SUPPORTIn the following code, I performed the following steps to create the 4-category outcome, `pain_tx_type`.- First, I used the `mutate()` function to create `support4pain_num` as the sum of engaging in `pt_rehab_ot`, `talk_therapy`, `chron_pain_selfmgmt`, or `peer_support_grp` to manage pain in the past 3 months. In this case, **SUPPORT** stands for **SU**pportive, **P**hysical, **P**sychological, **O**ccupational, or **R**ehabilitative **T**herapies.- Next, I created the binary outcome, `support4pain_bin`, which was equal to 1 if `support4pain_num`\>= 1 and equal to 0 if `support4pain_num` == 0.- Then, I created the outcome variable `pain_tx_type` as equal to None, IHM only, SUPPORT only, or SUPPORT and IHM based on the the `support4pain_num` and `ihm4pain_bin` variables.- Finally, I ensured `pain_tx_type` was coded as a factor using the `factor` function, with "None" as the reference category.```{r}projectb_clean <- projectb_clean |>mutate(support4pain_num = pt_rehab_ot + talk_therapy + chron_pain_selfmgmt + peer_support_grp,support4pain_bin =case_when(support4pain_num >=1~1, support4pain_num ==0~0),pain_tx_type =case_when( ihm4pain_bin ==0& support4pain_bin ==0~"None", ihm4pain_bin ==1& support4pain_bin ==0~"IHM only", ihm4pain_bin ==0& support4pain_bin ==1~"SUPPORT only", ihm4pain_bin ==1& support4pain_bin ==1~"SUPPORT and IHM"),pain_tx_type =factor(pain_tx_type, levels =c("None", "IHM only", "SUPPORT only", "SUPPORT and IHM")))```### Chronic Pain Treatment Category - Opioids```{r}projectb_clean <- projectb_clean |>mutate(nonpharm_opioid =case_when( opioid_freq3mo_final =="Never"& ihm4pain_bin ==0& support4pain_bin ==0~"None", opioid_freq3mo_final %in%c("Some days","Most days","Every day") & ihm4pain_bin ==0& support4pain_bin ==0~"Opioids only", opioid_freq3mo_final =="Never"& (ihm4pain_bin ==1|support4pain_bin ==1) ~"Nonpharm only", opioid_freq3mo_final %in%c("Some days","Most days","Every day") & (ihm4pain_bin ==1|support4pain_bin ==1) ~"Opioids + Nonpharm"),nonpharm_opioid =factor(nonpharm_opioid, levels =c("None", "Opioids only", "Nonpharm only", "Opioids + Nonpharm")))projectb_clean |>tabyl(nonpharm_opioid) |>adorn_pct_formatting() |>flextable() |>colformat_double(digits =0) |>autofit() |>theme_zebra(odd_header ="white") |>hline_top(part="body")```### IHM Category```{r}projectb_clean <- projectb_clean |>mutate(manualtx4pain =case_when( chiro + massage >=1~1, chiro + massage ==0~0),mindbodytx4pain =case_when( yoga_taichi + meditation >=1~1, yoga_taichi + meditation ==0~0),ihm4pain_cat =case_when( manualtx4pain + mindbodytx4pain ==0~"None", manualtx4pain ==1& mindbodytx4pain ==0~"Manual only", manualtx4pain ==0& mindbodytx4pain ==1~"Mind-body only", manualtx4pain ==1& mindbodytx4pain ==1~"Manual and mind-body"),ihm4pain_cat =factor(ihm4pain_cat, levels =c("None", "Manual only", "Mind-body only", "Manual and mind-body")))projectb_clean |>tabyl(ihm4pain_cat) |>adorn_pct_formatting() |>flextable() |>colformat_double(digits =0) |>autofit() |>theme_zebra(odd_header ="white") |>hline_top(part="body")```### Selecting Variables I Used and Arranging the TibbleWith all variables recoded, I created the `projectb_data` tibble as a subset of `projectb_clean`, limited to the variables needed for analysis and subjects with complete data on the outcomes, `ihm4pain_num` and `nonpharm_opioid`. I also renamed `opioid_freq3mo_final` as `opioid_freq3mo`.```{r}projectb_data <- projectb_clean |>select(subject_id, age, sex, education, urban_rural, painlimit_lifework, opioid_freq3mo = opioid_freq3mo_final, gad7total, phq8total, num_pain_locations, ihm4pain_num, race_ethnicity, fam_income, nonpharm_opioid, ihm4pain_bin, ihm4pain_cat, wt_final) |>filter(!is.na(ihm4pain_num), !is.na(nonpharm_opioid))```### Working with Categorical PredictorsTo examine distribution of categorical predictors, I used the `tbl_summary()` function to summarize the factor variables in `projectb_data`.```{r}projectb_data |>select(where(~is.factor(.x))) |>tbl_summary() |>modify_header(label ~"**Variable**") |>bold_labels()```In examining the table above, I decided that the variable `race_ethnicity` required some collapsing of categories to ensure all categories contained at least 25 subjects in the testing and training samples. Therefore, I combined the `race_ethnicity` categories "NH Asian only", "NH Amer Indian Alaska Natv", and "Other single and multiple races" into "Other single and multiple races."```{r}projectb_data <- projectb_data |>mutate(race_ethnicity =case_when( race_ethnicity =="NH White only"~"NH White only", race_ethnicity =="Hispanic"~"Hispanic", race_ethnicity =="NH Black/African American only"~"NH Black/African American only", race_ethnicity =="NH Asian only"~"Other single and multiple races", race_ethnicity =="NH Amer Indian Alaska Natv"~"Other single and multiple races", race_ethnicity =="Other single and multiple races"~"Other single and multiple races")) |>mutate(race_ethnicity =factor(race_ethnicity, levels =c("NH White only", "Hispanic", "NH Black/African American only", "Other single and multiple races")))```## Print and Save The Final TibbleNext, I verified that `projectb_data` was a tibble by printing it. In this tibble, I can confirm that:- It is a tibble and thus calling it prints only the first 10 rows.- The initial row tells me that this is a tibble and specifies its dimensions.- I have a complete set of 7,114 NHIS subjects.- I've included only 17 variables```{r}projectb_data```### Save The TibbleFinally, I saved the final tibble, `projectb_data`, as an Rds file using the `saveRDS()` function.```{r}saveRDS(projectb_data, file ="data/projectb_data2024-01-07.Rds")```# Code Book and Description## Defining the Variables7,114 adults ages 18-85 with pain on most days or every day in the past 3 months and participating in the 2019 NHIS with data on the variables listed in the table below.For each of my 17 selected variables, I have included the original variable name from the raw NHIS file.Variable | Role | Type | Description (units/levels) | Original NHIS Variable Name--------- | ---- | ---- | :------------------------------: | ---------- |`subject_id` | identifier | - | Randomly assigned household number unique to a household (H000074-H064180) | HHX`age` | input | quant | Age of subject | AGEP_A`sex` | input | 2-cat | Sex (Male or Female) | SEX_A`education` | input | 5-cat | Education level (Less than high school; High school graduate/GED; Some college, no degree; Associate's or bachelor's degree; Master's, doctoral, or professional degree) | EDUC_A`urban_rural` | input | 4-cat | Urban-rural classification scheme for counties (Large central metro, Large fringe metro, Medium and small metro, Non-metropolitan) | URBRRL`painlimit_lifework` | input | 4-cat | How often pain limited the subject's life or work activities in the past 3 months (Never, Some days, Most days, or Every day) | PAIWKLM3M_A`opioid_freq3mo` | input | 4-cat | How often the subject took a prescription opioid for chronic pain in the past three months (Never, Some days, Most days, or Every day) | RX12M_A, OPD12M_A, OPD3M_A, OPDCHRONIC_A, OPDFREQ_A`gad7total` | input | quant | Subject's total score on the GAD-7 anxiety scale (scale units 0 - 21) | **SUM(**GAD71_A, GAD72_A, GAD73_A, GAD74_A, GAD75_A, GAD76_A, GAD77_A**)**`phq8total` | input | quant | Subject's total score on the PHQ-8 depression scale (scale units 0 - 24) | **SUM(**PHQ81_A, PHQ82_A, PHQ83_A, PHQ84_A, PHQ85_A, PHQ86_A, PHQ87_A, PHQ88_A**)**`num_pain_locations` | input | quant | Total number of locations in which subject reported being bothered at least "a little" by pain in the past three months (0 - 6) | PAIBACK3M_A, PAIULMB3M_A, PAILLMB3M_A, PAIHDFC3M_A, PAIAPG3M_A, PAITOOTH3M_A`ihm4pain_num` | outcome | quant | How many integrative health and medicine (IHM) modalities (i.e., chiropractic care, yoga/Tai Chi, meditation/guided imagery, or massage) subject used to manage pain in the past three months (0, 1, 2, 3, or 4) | PAICHIRO_A, PAIYOGA_A, PAIMASSAGE_A, PAIMEDITAT_A`race_ethnicity` | input | 4-cat | Race and Hispanic origin (NH White only, Hispanic, NH Black/African American only, or Other single and multiple races) | HISPALLP_A`fam_income` | input | quant | Adult annual family income | FAMINCTC_A`nonpharm_opioid` | outcome | 4-cat | Engagement in one of the following categories of care for pain in the past three months (None, Opioids only, Nonpharm only, Opioids + Nonpharm).| PAICHIRO_A, PAIYOGA_A, PAIMASSAGE_A, PAIMEDITAT_A, PAIPHYSTPY_A, PAITALKTPY_A, PAIPROGRAM_A, PAIGROUP_A, RX12M_A, OPD12M_A, OPD3M_A, OPDCHRONIC_A, OPDFREQ_A`ihm4pain_bin` | outcome | quant | Whether or not subject engaged in integrative health and medicine (IHM) modalities (i.e., chiropractic care, yoga/Tai Chi, meditation/guided imagery, or massage) to manage pain in the past three months (0, 1, 2, 3, or 4) | PAICHIRO_A, PAIYOGA_A, PAIMASSAGE_A, PAIMEDITAT_A`ihm4pain_cat` | outcome | quant | Which type of (IHM) modalities (i.e., chiropractic care, yoga/Tai Chi, meditation/guided imagery, or massage) subject used to manage pain in the past three months (None, Manual only, Mind-body only, Manual and mind-body) | PAICHIRO_A, PAIYOGA_A, PAIMASSAGE_A, PAIMEDITAT_A`wt_final` | weight | quant | Weight - Final Annual | WTFA_A## Numerical DescriptionThe following code provides a numerical summary of all variables included in the `projectb_data` dataset using the `describe()` function from the `Hmisc()` library.In examining the data, I confirm that:- There are no impossible values for `age` (i.e., \> 85 or \< 18), `fam_income`, or `phq8total` (i.e., \> 24 or \< 0).- Each factor variable is in an appropriate ascending order.- All categorical and binary variables have at least 30 observations in each category.```{r}describe(projectb_data)```# Single Imputation## MissingnessFor this analysis, I assumed that the data were **Missing at Random (MAR)**. Specifically, I assumed that the probability of a predictor missing data depended upon available information from the other predictors. For example, missing `phq8total` data may be related to other variables such as how much pain limited the subject's life or work activities (`painlimit_lifework`) or `race_ethnicity` rather than their level of depression. In this situation, certain subjects may have been less likely to rate their anxiety based on their other characteristics. Thus, single imputation was warranted and required as missingness could be accounted for by studying variables with complete information. Missing NOT at Random (MNAR) was not an appropriate assumption here as there was no indication in the data or the [2019 NHIS Codebook](https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2019/adult-codebook.pdf) that data were missing due to the variables themselves.### Missing ValuesTo explore missingness within the `projectb_data`, I first used the `n_miss()` to quantify total missing values.```{r}projectb_data |>n_miss()```### Missing Values Within VariablesI then used the `miss_var_summary()` to determine which variables contained those 249 missing values. These variables included `phq8total`, `gad7total`, `education`, `num_pain_locations`, `age`, `painlimit_lifework`, and `sex`. Given the relative infrequency of missing data (highest = 1.50%), I expected that performing imputation would not alter subsequent models in any meaningful way.```{r}miss_var_summary(projectb_data) |>flextable() |>colformat_double(digits =2) |>autofit() |>theme_zebra(odd_header ="white") |>hline_top(part="body")```### Missing Values within CasesFinally, to understand missing values within cases, I used the `miss_case_table()` function, which revealed that 159 cases were missing 1 value, 42 cases were missing 2 values, and 2 cases were missing 3 values```{r}miss_case_table(projectb_data) |>flextable() |>colformat_double(digits =0) |>autofit() |>theme_zebra(odd_header ="white") |>hline_top(part="body")```## Generating the Single Imputation DatasetTo perform single imputation and generate the `projectb_si` dataset, I performed the following steps:- Set a seed of 19900317 using the `set.seed()` function- Created `projectb_si` as a `data.frame()` derived from `projectb_data`- Used robust linear models to impute the quantitative variable `gad7total` based on the predictors with no missing values: `urban_rural`, `race_ethnicity`, `fam_income` and `opioid_freq3mo`, - Used robust linear models to impute the quantitative variable `phq8total` based on `gad7total` and the predictors with no missing values: `urban_rural`, `race_ethnicity`, `fam_income`, and `opioid_freq3mo`.- Used classification and regression trees (`impute_cart()`) to impute the categorical variable `education` based on the now complete variable, `phq8total`, and the other complete predictors.- Used predictive mean matching (`impute_pmm()`) to impute the quantitative integer variable `num_pain_locations` based on the now complete variable, `education`, and the other complete predictors as I wanted the imputation to be an integer.- Used robust linear models to impute the quantitative variable `age` based on the now complete variable, `num_pain_locations`, and the other complete predictors.- Used classification and regression trees to impute the categorical variable `painlimit_lifework` based on the now complete variable, `age`, and the other complete predictors.- Used classification and regression trees to impute the categorical variable `sex` based on the now complete variable, `painlimit_lifework`, and the other complete predictors.- Used the `as_tibble()` function to ensure `projectb_si` was stored as a tibble.- I performed mutations to use the variable one in the survey design and transform covariates so that coefficients would represent meaningful increases in (1) `age10`[10 unit increment increase in age]; (2) `gad7total5unit`[5-unit increases in GAD-7]; (3) `phq8total5unit`[5-unit increases in PHQ-8]; (4) `fam_income10k`[10,000 dollar increase in family income]; (5) `fam_income10k`[1,000 dollar increase in family income].- I created a meaningful factor for the `ihm4pain_bin_fact` variable (i.e., IHM vs. No IHM)- Used the `n_miss()` function to verify that `projectb_si` contained no missing values.```{r}set.seed(19900317)projectb_si <- projectb_data |>data.frame() |>impute_rlm(gad7total ~ urban_rural + race_ethnicity + fam_income + opioid_freq3mo) |>impute_rlm(phq8total ~ gad7total + urban_rural + race_ethnicity + fam_income + opioid_freq3mo) |>impute_cart(education ~ phq8total + gad7total + urban_rural + race_ethnicity + fam_income + opioid_freq3mo) |>impute_pmm(num_pain_locations ~ phq8total + opioid_freq3mo + education + gad7total + urban_rural + race_ethnicity + fam_income) |>impute_rlm(age ~ phq8total + num_pain_locations + opioid_freq3mo + education + gad7total + urban_rural + race_ethnicity + fam_income) |>impute_cart(painlimit_lifework ~ phq8total + age + num_pain_locations + opioid_freq3mo + education + gad7total + urban_rural + race_ethnicity + fam_income) |>impute_cart(sex ~ phq8total + painlimit_lifework + age + num_pain_locations + opioid_freq3mo + education + gad7total + urban_rural + race_ethnicity + fam_income) |>as_tibble() |>mutate(one =1,age10 = age/10,gad7total5unit = gad7total/5,phq8total5unit = phq8total/5,fam_income10k =round(fam_income/10000, digits =2),fam_income1k =round(fam_income/1000, digits =2),ihm4pain_bin_fact =factor(case_when(ihm4pain_bin ==0~"No IHM", ihm4pain_bin ==1~"IHM")))n_miss(projectb_si)```# Model 1: Engagement in IHM**For Model 1, I sought to answer the question:**> Which demographic, mental health, and pain-related variables are associated with engagement in the number of IHM modalities used for pain in the past 3 months among adults with chronic pain?In this analysis, I chose to predict `ihm4pain_num` based on 10 variables: `age10`, `sex`, `education`, `urban_rural`, `painlimit_lifework`, `opioid_freq3mo`, `phq8total5unit`, `num_pain_locations`, `race_ethnicity`, and `fam_income10k`.## Table 1: Bivariate AnalysisThe following code details the steps taken to generate Table 1.### Create Survey Design```{r}nhis_des <-svydesign(data = projectb_si, id =~ subject_id,weights =~ wt_final, nest =TRUE)```### Total population represented```{r}total_n <-svytotal(~ one, nhis_des, na.rm =FALSE) |>as_tibble() |>mutate(Variable ="Total, n",total =floor(total),total =formatC(total, format ="d", big.mark =","),p.value ="") |>select(Variable, All = total, p.value)``````{r}ihm_n <-svyby(~ one, ~ ihm4pain_bin, nhis_des, svytotal) |>as_tibble() |>filter(ihm4pain_bin ==1) |>select(IHM = one) |>mutate(IHM =round(IHM),IHM =formatC(IHM, format ="d", big.mark =","))no_ihm_n <-svyby(~ one, ~ ihm4pain_bin, nhis_des, svytotal) |>as_tibble() |>filter(ihm4pain_bin ==0) |>select(NoIHM = one) |>mutate(NoIHM =round(NoIHM),NoIHM =formatC(NoIHM, format ="d", big.mark =","))total_n <-bind_cols(total_n, ihm_n, no_ihm_n) |>mutate(All =as.character(All),IHM =as.character(IHM),NoIHM =as.character(NoIHM))``````{r}total_n <- total_n |>select(Variable, All, IHM, NoIHM, p.value) total_n |>flextable() |>autofit() |>bold(bold =TRUE, part ="header") |>theme_zebra(odd_header ="white") |>hline_top(part="body")```### Crosstabulation```{r}style_percent_1digit <- purrr::partial(gtsummary::style_percent, digits =1)style_number_1digit <- purrr::partial(gtsummary::style_number, digits =1)table1_cross <- nhis_des |>tbl_svysummary(include =c(age, sex, race_ethnicity, education, fam_income1k, urban_rural, num_pain_locations, painlimit_lifework, opioid_freq3mo, phq8total),type =c(num_pain_locations ="continuous",fam_income ="continuous"),statistic =list(all_continuous() ~"{mean}",all_categorical() ~"{p}"),label =list(age ~"Age in years, mean (95% CI)", sex ~"Sex, % (95% CI)", race_ethnicity ~"Race/Ethnicity, % (95% CI)", education ~"Education, % (95% CI)", fam_income1k ~"Family income in $1000s, mean (95% CI)", urban_rural ~"Proximity to metro area, % (95% CI)", num_pain_locations ~"Number of pain locations, mean (95% CI)", painlimit_lifework ~"Pain limiting life/work, % (95% CI)", opioid_freq3mo ~"Opioid use in past 3 months, % (95% CI)", phq8total ~"PHQ-8 total, mean (95% CI)"),digits =list(all_continuous() ~1,all_categorical() ~1),by = ihm4pain_bin_fact) |>add_overall() |>add_ci(pattern ="{stat} ({ci})", style_fun =list(all_categorical() ~ style_percent_1digit, all_continuous() ~ style_number_1digit),statistic =list(all_categorical() ~"{conf.low}-{conf.high}", all_continuous() ~"{conf.low}-{conf.high}")) |>add_p(pvalue_fun =function(x) style_pvalue(x, digits =3)) |>modify_header(label ="Variable", stat_0 ~"All", stat_1 ~"IHM", stat_2 ~"NoIHM", p.value ~"p.value")table1_cross```### Assemble Table 1 Tibble```{r}table1_cross <- table1_cross |>as.data.frame() |>as_tibble()table1_cross[is.na(table1_cross)] <-""table1_cross$p.value <-sub("0\\.",".", table1_cross$p.value)table1_cross$p.value <-sub(" ","", table1_cross$p.value)table1 <-bind_rows(total_n, table1_cross) |>rename("No IHM"= NoIHM,"All with chronic pain"= All,"Weighted variable"= Variable,"P value"= p.value)```### Print Table 1I used flextable to format Table 1.```{r}table1 |>flextable() |>add_header_lines(values ="Table 1: Socio-Demographic, Pain, and Mental Health Characteristics by IHM Engagement") |>bold(bold =TRUE, part ="header") |>bold(i =c(1, 2, 3, 6, 11, 17, 18, 23:24, 29, 34), j =1, bold =TRUE) |>padding(i =c(4:5, 7:10, 12:16, 19:22, 25:28, 30:33), j =1, padding.left =20) |>add_footer_lines(values ="Abbreviations: CI, confidence interval; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; IHM, integrative health and medicine; metro, metropolitan; NH, Non-Hispanic",top =FALSE) |> flextable::footnote(i =c(2, 17, 23, 34), j =5, ref_symbols ="a",value =as_paragraph("Wilcoxon rank-sum test for complex survey samples")) |> flextable::footnote(i =c(3, 6, 11, 18, 24, 29), j =5, ref_symbols ="b",value =as_paragraph("Chi-squared test with Rao & Scott’s second-order correction")) |>autofit(part =c("body", "header"))```## Splitting into Training and Testing SamplesI then used the `initial_split()`, `training()`, and `testing()` functions to generate `projectb_si_train` (training data) and `projectb_si_test` (testing data), each with *n* = 3557 subjects.```{r}set.seed(19900317)projectb_split <-initial_split(projectb_si, prop =0.5)projectb_si_train <-training(projectb_split)projectb_si_test <-testing(projectb_split)nrow(projectb_si_train)nrow(projectb_si_test)```## Scatterplot Matrix and Collinearity### Scatterplot Matrix of Quantitative Predictors with OutcomeTo investigate issues of collinearity prior to fitting models, I generated a series of scatterplot matrices using `ggpairs()`. First I assessed the relationship between the quantitative outcomes and `ihm4pain_num`.```{r, message = FALSE}temp1quant <- projectb_si_train |>select(age10, phq8total5unit, num_pain_locations, fam_income10k, ihm4pain_num)ggpairs(temp1quant, title ="Scatterplot Matrix of Quantitative Variables with Outcome",lower =list(combo =wrap("facethist", bins =20)))rm(temp1quant)```In examining the plot above, there do not appear to be any strong (i.e., \> 0.6) correlations between quantitative predictors and the outcome.### Scatterplot Matrix of Categorical Variables with OutcomeThen, I looked at the relationships between three of the categorical predictors and `ihm4pain_num` using `ggpairs()`.```{r, message = FALSE}temp2cat1 <- projectb_si_train |>select(sex, education, urban_rural, ihm4pain_num)ggpairs(temp2cat1, title ="Scatterplot Matrix of Categorical Variables with Outcome",lower =list(combo =wrap("facethist", bins =20)))rm(temp2cat1)```### Scatterplot Matrix of Remaining Categorical Variables with OutcomeFinally, I looked at the relationships between the two remaining categorical predictors and `ihm4pain_num` using `ggpairs()`.```{r, message = FALSE}temp2cat2 <- projectb_si_train |>select(painlimit_lifework, opioid_freq3mo, race_ethnicity, ihm4pain_num)ggpairs(temp2cat2, title ="Scatterplot Matrix of Categorical Variables with Outcome",lower =list(combo =wrap("facethist", bins =20)))rm(temp2cat2)```To investigate this further, I calculated variance inflation factors (VIF) for each variable using the `vif()` function. **Given the recommendation to look closely at any VIF value that is greater than 5, there does not seem to be an issue with collinearity in this full model as all VIF values are less than 1.4.**```{r}vif(lm(ihm4pain_num ~ age10 + sex + education + urban_rural + painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity, data = projectb_si_train))```## Considering Non-Linear TermsTo determine how best to potentially spend degrees of freedom on non-linear terms, I generated a Spearman $\rho^2$ plot using the `plot()` and `spearman2()`functions.```{r}plot(spearman2(ihm4pain_num ~ age10 + sex + education + urban_rural + painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity, data = projectb_si_train))```The plot suggests that I consider, for example, an interaction between `education` and `age10` as a non-linear term. However, given that 1) this model already contains 21 degrees of freedom, I opted not to include any non-linear terms.## Fitting Models for ComparisonI chose to compare the following 6 modeling approaches for `m1` before choosing a final version.1. Poisson2. Negative binomial regression3. Zero-inflated Poisson4. Zero-inflated negative binomial5. Hurdle with Poisson6. Hurdle with negative binomialIn the following series of code chunks, I performed the following- Fit each model using the appropriate function- Generated a summary of fit statistics including $R^2$, RMSE, MAE, AIC, and BIC using various functions including `augment()`, `glance()`, `metric_set()`, and `predict()`.- Used functions including `mutate()`, `group_by()`, and `summarize()` to pivot the fit statistics from each model into a tidy tibble for combination later.### Poisson (`m1_pois`)```{r}m1_pois <-glm(ihm4pain_num ~ age10 + sex + education + urban_rural + painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity, data = projectb_si_train, family ="poisson")m1_pois_aug <-augment(m1_pois, projectb_si_train, type.predict ="response")m1_pois_glance <-glance(m1_pois)mets <-metric_set(rsq, rmse, mae)m1_pois_summary <-mets(m1_pois_aug, truth = ihm4pain_num, estimate = .fitted) |>mutate(Model ="m1_pois",R2 =if_else(.metric =="rsq", .estimate, NA),RMSE =if_else(.metric =="rmse", .estimate, NA),MAE =if_else(.metric =="mae", .estimate, NA)) |>relocate(Model)m1_pois_summary <- m1_pois_summary |>group_by(Model) |>summarize(R2 =mean(R2, na.rm =TRUE),RMSE =mean(RMSE, na.rm =TRUE),MAE =mean(MAE, na.rm =TRUE))m1_pois_summary <-bind_cols(m1_pois_summary, m1_pois_glance) |>select(-c(logLik))summary(m1_pois)```### Negative Binomial Regression (`m1_nb`)```{r, warning = FALSE}m1_nb <-glm.nb(ihm4pain_num ~ age10 + sex + education + urban_rural + painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity, data = projectb_si_train)m1_nb_aug <-augment(m1_nb, projectb_si_train, type.predict ="response")m1_nb_glance <-glance(m1_nb)m1_nb_summary <-mets(m1_nb_aug, truth = ihm4pain_num, estimate = .fitted) |>mutate(Model ="m1_nb",R2 =if_else(.metric =="rsq", .estimate, NA),RMSE =if_else(.metric =="rmse", .estimate, NA),MAE =if_else(.metric =="mae", .estimate, NA)) |>relocate(Model)m1_nb_summary <- m1_nb_summary |>group_by(Model) |>summarize(R2 =mean(R2, na.rm =TRUE),RMSE =mean(RMSE, na.rm =TRUE),MAE =mean(MAE, na.rm =TRUE))m1_nb_summary <-bind_cols(m1_nb_summary, m1_nb_glance) |>select(-c(logLik))summary(m1_nb)```### Zero-inflated Poisson (`m1_zip`)```{r, warning = FALSE}m1_zip <-zeroinfl(ihm4pain_num ~ age10 + sex + education + urban_rural + painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity, data = projectb_si_train)m1_zip_aug <- projectb_si_train |>mutate(".fitted"=predict(m1_zip, type ="response"),".resid"=resid(m1_zip, type ="response"))m1_zip_summary <-mets(m1_zip_aug, truth = ihm4pain_num, estimate = .fitted) |>mutate(Model ="m1_zip",R2 =if_else(.metric =="rsq", .estimate, NA),RMSE =if_else(.metric =="rmse", .estimate, NA),MAE =if_else(.metric =="mae", .estimate, NA)) |>relocate(Model)m1_zip_summary <- m1_zip_summary |>group_by(Model) |>summarize(R2 =mean(R2, na.rm =TRUE),RMSE =mean(RMSE, na.rm =TRUE),MAE =mean(MAE, na.rm =TRUE))m1_zip_summary <- m1_zip_summary |>mutate(AIC =AIC(m1_zip),BIC =BIC(m1_zip))summary(m1_zip)```### Zero-inflated Negative Binomial (`m1_zinb`)```{r, warning = FALSE}m1_zinb <-zeroinfl(ihm4pain_num ~ age10 + sex + education + urban_rural + painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity,dist ="negbin", data = projectb_si_train)m1_zinb_aug <- projectb_si_train |>mutate(".fitted"=predict(m1_zinb, type ="response"),".resid"=resid(m1_zinb, type ="response"))m1_zinb_summary <-mets(m1_zinb_aug, truth = ihm4pain_num, estimate = .fitted) |>mutate(Model ="m1_zinb",R2 =if_else(.metric =="rsq", .estimate, NA),RMSE =if_else(.metric =="rmse", .estimate, NA),MAE =if_else(.metric =="mae", .estimate, NA)) |>relocate(Model)m1_zinb_summary <- m1_zinb_summary |>group_by(Model) |>summarize(R2 =mean(R2, na.rm =TRUE),RMSE =mean(RMSE, na.rm =TRUE),MAE =mean(MAE, na.rm =TRUE))m1_zinb_summary <- m1_zinb_summary |>mutate(AIC =AIC(m1_zinb),BIC =BIC(m1_zinb))summary(m1_zinb)```### Hurdle with Poisson (`m1_hurdle_pois`)```{r, warning = FALSE}m1_hurdle_pois <-hurdle(ihm4pain_num ~ age10 + sex + race_ethnicity + education + fam_income10k + urban_rural + num_pain_locations + painlimit_lifework + opioid_freq3mo + phq8total5unit,dist ="poisson", zero.dist ="binomial", data = projectb_si_train)m1_hurdle_pois_aug <- projectb_si_train |>mutate(".fitted"=predict(m1_hurdle_pois, type ="response"),".resid"=resid(m1_hurdle_pois, type ="response"))m1_hurdle_pois_summary <-mets(m1_hurdle_pois_aug, truth = ihm4pain_num, estimate = .fitted) |>mutate(Model ="m1_hurdle_pois",R2 =if_else(.metric =="rsq", .estimate, NA),RMSE =if_else(.metric =="rmse", .estimate, NA),MAE =if_else(.metric =="mae", .estimate, NA)) |>relocate(Model)m1_hurdle_pois_summary <- m1_hurdle_pois_summary |>group_by(Model) |>summarize(R2 =mean(R2, na.rm =TRUE),RMSE =mean(RMSE, na.rm =TRUE),MAE =mean(MAE, na.rm =TRUE))m1_hurdle_pois_summary <- m1_hurdle_pois_summary |>mutate(AIC =AIC(m1_hurdle_pois),BIC =BIC(m1_hurdle_pois))summary(m1_hurdle_pois)```### Hurdle with Negative Binomial (`m1_hurdle_nb`)```{r, warning = FALSE}m1_hurdle_nb <-hurdle(ihm4pain_num ~ age10 + sex + education + urban_rural + painlimit_lifework + opioid_freq3mo + phq8total5unit + num_pain_locations + fam_income10k + race_ethnicity,dist ="negbin", zero.dist ="binomial", data = projectb_si_train)m1_hurdle_nb_aug <- projectb_si_train |>mutate(".fitted"=predict(m1_hurdle_nb, type ="response"),".resid"=resid(m1_hurdle_nb, type ="response"))m1_hurdle_nb_summary <-mets(m1_hurdle_nb_aug, truth = ihm4pain_num, estimate = .fitted) |>mutate(Model ="m1_hurdle_nb",R2 =if_else(.metric =="rsq", .estimate, NA),RMSE =if_else(.metric =="rmse", .estimate, NA),MAE =if_else(.metric =="mae", .estimate, NA)) |>relocate(Model)m1_hurdle_nb_summary <- m1_hurdle_nb_summary |>group_by(Model) |>summarize(R2 =mean(R2, na.rm =TRUE),RMSE =mean(RMSE, na.rm =TRUE),MAE =mean(MAE, na.rm =TRUE))m1_hurdle_nb_summary <- m1_hurdle_nb_summary |>mutate(AIC =AIC(m1_hurdle_nb),BIC =BIC(m1_hurdle_nb))summary(m1_hurdle_nb)```## Comparing `m1` Versions### Comparing with RootogramsTo begin a comparison of the 4 variations of Model 1, I generated four rootogram plots using the `countreg::rootogram()` function.```{r, message = FALSE}#| fig-height: 8par(mfrow =c(3,2))countreg::rootogram(m1_pois, main ="Rootogram for m1_pois")countreg::rootogram(m1_nb, main ="Rootogram for m1_nb")countreg::rootogram(m1_zip, main ="Rootogram for m1_zip")countreg::rootogram(m1_zinb, main ="Rootogram for m1_zinb")countreg::rootogram(m1_hurdle_pois, main ="Rootogram for m1_hurdle_pois")countreg::rootogram(m1_hurdle_nb, main ="Rootogram for m1_hurdle_nb")par(mfrow =c(1,1))```Looking at the plots above, there is a clear improvement observed between 1) the Poisson (`m1_pois`) and negative binomial (`m1_nb`) models; and 2) the zero-inflated (`m1_zip` and `m1_zinb`) and hurdle (`m1_hurdle_pois` and `mt_hurdle_nb`) models as the latter appear to fit 0 counts perfectly and lack the over/under-fitting of counts 1-3. However, there do not appear to be any noticeable differences in predicting counts between the bottom four models. Furthermore, it appears each model predicts counts of `ihm4pain_num`\> 4, but not to a substantial degree. Certainly, some subjects may have engaged in more than four IHM modalities for pain, but these were not measured in the 2019 NHIS.### Comparing with Fit StatisticsTo further examine differences, I generated `m1_summary` as a combination of the 6 previous model summaries using the `bind_rows()` function. I then selected the Model, $R^2$, RMSE, MAE, AIC, and BIC values for output.```{r}m1_summary <-bind_rows(m1_pois_summary, m1_nb_summary, m1_zip_summary, m1_zinb_summary, m1_hurdle_pois_summary, m1_hurdle_nb_summary)m1_summary |>select(Model, R2, RMSE, MAE, AIC, BIC) |>flextable() |>colformat_double(j =2:4, digits =3) |>colformat_double(j =5:6, digits =0) |>bold(i =5, j =1:6, bold =TRUE) |>autofit() |>theme_zebra(odd_header ="white") |>hline_top(part="body")```Similar to the rootograms, there is a clear improvement in $R^2$ between 1) the Poisson (`m1_pois`) and negative binomial (`m1_nb`) models; and 2) the zero-inflated (`m1_zip` and `m1_zinb`) and hurdle (`m1_hurdle_pois` and `mt_hurdle_nb`) models. **m1_hurdle_pois** appears to be the best fit based on it's $R^2$ (0.121), RMSE (0.805), and AIC (6780 \[lowest overall\]). Importantly, `m1_hurdle_pois` has the lowest BIC among the four models that had better rootograms in the plot above.### Comparing with Vuong testsFinally, given the inclination to use `m1_hurdle_pois`, I used the `vuong()` function to compare it to the other models.**`m1_hurdle_pois` v. `m1_pois`**```{r}vuong(m1_hurdle_pois, m1_pois)````m1_hurdle_pois` is a better fit than `m1_pois` based on small *p*-values from the the Raw, and AIC-corrected Vuong tests.**`m1_hurdle_pois` v. `m1_nb`**```{r}vuong(m1_hurdle_pois, m1_nb)````m1_hurdle_pois` is a better fit than `m1_nb` based on small *p*-values from the the Raw Vuong test.**`m1_hurdle_pois` v. `m1_zip`**```{r}vuong(m1_hurdle_pois, m1_zip)````m1_hurdle_pois` is not meaningfully different than `m1_zip` (*p* = .352).**`m1_hurdle_pois` v. `m1_zinb`**```{r}vuong(m1_hurdle_pois, m1_zinb)````m1_hurdle_pois` is not meaningfully different than `m1_zinb` (*p* = .352).**`m1_hurdle_pois` v. `m1_hurdle_nb`**```{r}vuong(m1_hurdle_pois, m1_hurdle_nb)````m1_hurdle_pois` is not meaningfully different than `m1_hurdle_nb` (*p* = .497).## Final Model ChoiceBased on a combination of the rootograms, the fit statistics, and the Vuong tests, I chose **`m1_hurdle_pois`** as the model for the count outcome.The hurdle model also makes sense given its description (Ford 2016).> The hurdle model is a two-part model that specifies one process for zero counts and another process for positive counts. The idea is that positive counts occur once a threshold is crossed, or put another way, a hurdle is cleared. If the hurdle is not cleared, then we have a count of 0.Conceptually, moving from no engagement in IHM modalities for pain to engaging in any number of IHM modalities for pain seems like a hurdle to model independently of the count data.## Fitting Weighted Hurdle ModelIn the following code, I added weights to `m1_hurdle_pois` model using the code, `weights = wt_final/mean(wt_final)`. I then printed a summary of the model.```{r, warning = FALSE}m1_hurdle_pois <-hurdle(ihm4pain_num ~ age10 + sex + race_ethnicity + education + fam_income10k + urban_rural + num_pain_locations + painlimit_lifework + opioid_freq3mo + phq8total5unit,dist ="poisson", zero.dist ="binomial", weights = wt_final/mean(wt_final),data = projectb_si_train)mets <-metric_set(rsq, rmse, mae)m1_hurdle_pois_aug <- projectb_si_train |>mutate(".fitted"=predict(m1_hurdle_pois, type ="response"),".resid"=resid(m1_hurdle_pois, type ="response"))m1_hurdle_pois_summary <-mets(m1_hurdle_pois_aug, truth = ihm4pain_num, estimate = .fitted) |>mutate(Model ="m1_hurdle_pois",R2 =if_else(.metric =="rsq", .estimate, NA),RMSE =if_else(.metric =="rmse", .estimate, NA),MAE =if_else(.metric =="mae", .estimate, NA)) |>relocate(Model)m1_hurdle_pois_summary <- m1_hurdle_pois_summary |>group_by(Model) |>summarize(R2 =mean(R2, na.rm =TRUE),RMSE =mean(RMSE, na.rm =TRUE),MAE =mean(MAE, na.rm =TRUE))m1_hurdle_pois_summary <- m1_hurdle_pois_summary |>mutate(AIC =AIC(m1_hurdle_pois),BIC =BIC(m1_hurdle_pois))summary(m1_hurdle_pois)```## Model CoefficientsTo generate a singular summary of coefficients from `m1_hurdle_pois`, I performed the following- Generated an initial `m1_hurdle_pois_coef` tibble containing the variables `Type` (i.e., count or non-zero), `Group` (i.e., each variable group), and `Comparison` (e.g., Age +10yrs, Male v. Female) to match the output from `m1_hurdle_pois`- Generated `m1_hurdle_pois_pe` as a tibble of the exponentiated coefficients from `m1_hurdle_pois`- Generated `m1_hurdle_pois_ci` as a tibble of the exponentiated 95% CI from `m1_hurdle_pois`- Bound all of the above into the tibble, `m1_hurdle_pois_coef`.```{r}m1_hurdle_pois_coef <-tibble(Type =c("count", "count", "count", "count", "count", "count","count", "count", "count","count", "count", "count","count", "count", "count", "count", "count", "count","count", "count", "count", "count","non-zero", "non-zero", "non-zero", "non-zero", "non-zero", "non-zero","non-zero", "non-zero", "non-zero", "non-zero", "non-zero", "non-zero","non-zero", "non-zero", "non-zero", "non-zero", "non-zero", "non-zero","non-zero", "non-zero", "non-zero", "non-zero"),Group =c("(Intercept)", "Age", "Sex", "Race/Ethnicity", "Race/Ethnicity", "Race/Ethnicity","Education", "Education", "Education", "Education", "Family income", "Proximity to metro area", "Proximity to metro area","Proximity to metro area", "Number of pain locations", "Pain limiting life or work", "Pain limiting life or work","Pain limiting life or work", "Opioid use past 3 mo", "Opioid use past 3 mo","Opioid use past 3 mo", "PHQ-8", "(Intercept)", "Age", "Sex", "Race/Ethnicity", "Race/Ethnicity", "Race/Ethnicity","Education", "Education", "Education", "Education", "Family income", "Proximity to metro area", "Proximity to metro area","Proximity to metro area", "Number of pain locations", "Pain limiting life or work", "Pain limiting life or work","Pain limiting life or work", "Opioid use past 3 mo", "Opioid use past 3 mo","Opioid use past 3 mo", "PHQ-8"),Comparison =c("(Intercept)", "Age +10yrs", "Male v. Female", "Hispanic v. NH White","NH Black v. NH White","Other v. NH White","HS/GED v. < HS", "Some College v. < HS", "Assoc/Bach v. < HS","Masters/PhD v. < HS", "Fam income +$10,000","M/S metro v. Non-metro", "L fringe metro v. Non-metro","L central metro v. Non-metro", "Pain locations +1", "Pain limit some days v. Never", "Pain limit most days v. Never","Pain limit every day v. Never", "Opioids some days v. Never", "Opioids most days v. Never","Opioids every day v. Never", "PHQ-8 +5 units", "(Intercept)", "Age +10yrs", "Male v. Female", "Hispanic v. NH White","NH Black v. NH White","Other v. NH White","HS/GED v. < HS", "Some College v. < HS", "Assoc/Bach v. < HS","Masters/PhD v. < HS", "Fam income +$10,000","M/S metro v. Non-metro", "L fringe metro v. Non-metro","L central metro v. Non-metro", "Pain locations +1", "Pain limit some days v. Never", "Pain limit most days v. Never","Pain limit every day v. Never", "Opioids some days v. Never", "Opioids most days v. Never","Opioids every day v. Never", "PHQ-8 +5 units"))m1_hurdle_pois_pe <-exp(coef(m1_hurdle_pois)) |>as_tibble()m1_hurdle_pois_ci <-exp(confint(m1_hurdle_pois)) |>as_tibble() |>clean_names() |>rename(Conf.low ="x2_5_percent",Conf.high ="x97_5_percent")m1_hurdle_pois_coef <-bind_cols(m1_hurdle_pois_coef, m1_hurdle_pois_pe, m1_hurdle_pois_ci) |>rename(Estimate = value)```### Numeric Summary of Odds of `ihm4pain_num` != 0In the following code, I generated a numeric summary of the exponentiated coefficients from `m1_hurdle_pois` describing the odds of `ihm4pain_num` != 0.```{r}m1_hurdle_pois_coef |>filter(Type =="non-zero") |>select(Group, Comparison, Estimate, Conf.low, Conf.high) |>mutate(Estimate =round(Estimate, digits =3),Conf.low =round(Conf.low, digits =3),Conf.high =round(Conf.high, digits =3)) |>rename(aOR = Estimate,"95% CI Low"= Conf.low,"95% CI High"= Conf.high) |>flextable() |>bold(bold =TRUE, part ="header") |>add_header_lines(values ="Supplemental Table 1: aOR of Engagement in IHM Modalities for Pain") |>autofit() |>add_footer_lines(values ="Abbreviations: Assoc, associate’s; aOR, adjusted odds ratio; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; mo, months; NH, Non-Hispanic; S, small; v., versus; yrs, years", top =FALSE) |>theme_zebra(odd_header ="white") |>hline_top(part="body")```In examining the effects table above, I will summarize the effect of `Masters/PhD v. < HS` using two fictional cases: Luke Skywalker and Din DjarinSuppose Luke Skywalker and Din Djarin 1) have the same demographic profile (i.e., same age, male); 2) reside in rural areas; 3) rated similar challenges related to pain interfering with life or work in the past 3 months; 4) used the same amount of opioids for chronic pain within the past 3 months; 5) reported the same score on the PHQ-8; 6) reported the same number of locations where they were bothered at least "a little" by pain in the past three months; 7) have the same family income; and 8) are of the same race and ethnicity. However, Luke Skywalker has a PhD in being a Jedi while Din Djarin did not graduate "This is the Way" High School. In this scenario, `m1_hurdle_pois` predicts that Luke Skywalker's odds of `ihm4pain_num` != 0 will be 4.043 times that of Din Djarin.The 95% confidence interval for this estimated odds ratio for the effect of `Masters/PhD v. < HS` on `ihm4pain_num` != 0 is (2.815, 5.806). This confidence interval does not include one. Thus, holding all other variables constant, having a Masters, PhD, or professional degree is associated with higher odds of having engaged in an IHM modality to manage pain in the past three months.### Numeric Summary of Relative Change in Count of `ihm4pain_num`In the following code, I generated a numeric summary of the exponentiated coefficients from `m1_hurdle_pois` describing the predicted relative change in the count of `ihm4pain_num`.```{r}m1_hurdle_pois_coef |>filter(Type =="count") |>select(Group, Comparison, Estimate, Conf.low, Conf.high) |>mutate(Estimate =round(Estimate, digits =3),Conf.low =round(Conf.low, digits =3),Conf.high =round(Conf.high, digits =3)) |>rename("Relative change"= Estimate,"95% CI Low"= Conf.low,"95% CI High"= Conf.high) |>flextable() |>bold(bold =TRUE, part ="header") |>add_header_lines(values ="Supplemental Table 2: Relative Change in Count of IHM Modalities Used for Pain") |>autofit() |>add_footer_lines(values ="Abbreviations: Assoc, associate’s; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; mo, months; NH, Non-Hispanic; S, small; v., versus; yrs, years", top =FALSE) |>theme_zebra(odd_header ="white") |>hline_top(part="body")```In examining the effects table above, I will summarize the effect of `Age` using two fictional cases: Han Solo and Luke SkywalkerPer the guidance by [Yoshida](https://rpubs.com/kaz_yos/pscl-2), given the response is positive (among those who have positive counts for engaging in IHM modalities), the average count is 0.701 (Intercept).Suppose Han Solo and Luke Skywalker 1) are both male and of the same race and ethnicity; 2) have the same level of education; 3) reside in rural areas; 4) rated similar challenges related to pain interfering with life or work in the past 3 months; 5) used the same amount of opioids for chronic pain within the past 3 months; 6) reported the same score on the PHQ-8; 7) reported the same number of locations where they were bothered at least "a little" by pain in the past three months; and 8) have the same family income. However, Han Solo is 10 years older than Luke Skywalker. In this scenario, `m1_hurdle_pois` predicts that the number of IHM modalities that Han Solo engaged in for pain in the past 3 months will be .866 times that of Luke's (13.4% lower than Luke's).The 95% confidence interval for this estimated effect of `age10` on `ihm4pain_num` is (0.825, 0.910). This confidence interval does not include one. Thus, holding all other variables constant, being 10 years older is associated with lower counts of IHM modalities used to manage pain in the past three months.### Visual Summary of Odds of `ihm4pain_num` != 0To visualize the effects of the covariates on odds of `ihm4pain_num` != 0, I first generated the `m1_coef_non0` tibble, limited to the "non-zero" coefficients. I then used `ggplot()` to plot the effects. Education covariates are placed on a separate axis given their size to not obscure the other covariates.```{r}#| fig-height: 6m1_coef_non0 <- m1_hurdle_pois_coef |>filter(Type =="non-zero", Comparison !="(Intercept)", Group !="Education") |>mutate(Comparison =fct_inorder(Comparison, ordered =NA),Comparison =fct_rev(Comparison))m1_coef_non0educ <- m1_hurdle_pois_coef |>filter(Type =="non-zero", Comparison !="(Intercept)", Group =="Education") |>mutate(Comparison =fct_inorder(Comparison, ordered =NA),Comparison =fct_rev(Comparison))p1 <-ggplot(m1_coef_non0, aes(x = Estimate, y = Comparison)) +geom_vline(aes(xintercept =1), linewidth = .25, linetype ='dashed') +geom_errorbarh(aes(xmax = Conf.high, xmin = Conf.low), linewidth = .5, height = .2, color ='gray50') +geom_point(size =2.5, color = uhred) +labs(x ="aOR",y ="",title ="aOR of Engaging in ≥ 1 IHM Modality for Pain") +scale_x_continuous(breaks =seq(0,4,0.25))p2 <-ggplot(m1_coef_non0educ, aes(x = Estimate, y = Comparison)) +geom_vline(aes(xintercept =1), linewidth = .25, linetype ='dashed') +geom_errorbarh(aes(xmax = Conf.high, xmin = Conf.low), linewidth = .5,height = .2, color ='gray50') +geom_point(size =2.5, color = uhred) +labs(x ="aOR",y ="") +scale_x_continuous(breaks =seq(0,8,0.5))p1/p2 +plot_layout(heights =c(5,1))```### Visual Summary of Estimated Relative Change in Count of `ihm4pain_num`To visualize the effects of the covariates on the estimated relative change in count of `ihm4pain_num`, I first generated the `m1_coef_count` tibble, limited to the "count" coefficients. I then used `ggplot()` to plot the effects. Education covariates are placed on a separate axis given their size to not obscure the other covariates.```{r}#| fig-height: 6m1_coef_count <- m1_hurdle_pois_coef |>filter(Type =="count", Comparison !="(Intercept)", Group !="Education") |>mutate(Comparison =fct_inorder(Comparison, ordered =NA),Comparison =fct_rev(Comparison))m1_coef_count_educ <- m1_hurdle_pois_coef |>filter(Type =="count", Comparison !="(Intercept)", Group =="Education") |>mutate(Comparison =fct_inorder(Comparison, ordered =NA),Comparison =fct_rev(Comparison))p1 <-ggplot(m1_coef_count, aes(x = Estimate, y = Comparison)) +geom_vline(aes(xintercept =1), linewidth = .25, linetype ='dashed') +geom_errorbarh(aes(xmax = Conf.high, xmin = Conf.low), linewidth = .5, height = .2, color ='gray50') +geom_point(size =2.5, color = uhred) +labs(x ="Relative Change in Count",y ="",title ="Relative Change in Count of IHM Modalities Used for Pain",subtitle ="Assuming Engagement in > 0 IHM Modalities") +scale_x_continuous(breaks =seq(0,7,0.25))p2 <-ggplot(m1_coef_count_educ, aes(x = Estimate, y = Comparison)) +geom_vline(aes(xintercept =1), linewidth = .25, linetype ='dashed') +geom_errorbarh(aes(xmax = Conf.high, xmin = Conf.low), linewidth = .5, height = .2, color ='gray50') +geom_point(size =2.5, color = uhred) +labs(x ="Relative Change in Count",y ="") +scale_x_continuous(breaks =seq(0,7,0.5))p1/p2 +plot_layout(heights =c(5,1))```## Check Model AssumptionsTo generate a residuals vs. fitted plot from `m1_hurdle_pois`, I used `ggplot()` and `geom_point()`.```{r}ggplot(m1_hurdle_pois_aug, aes(x = .fitted, y = .resid)) +geom_point() +labs(title ="Residuals vs. Fitted `ihm4pain_num`",subtitle ="Hurdle model with Poisson counts")```In examining the plot above, I see no major issues of concern.## Validating `m1_hurdle_pois`### Fit Summary in Test DataTo validate `m1_hurdle_pois` in the `projectb_si_test` data, I performed the following to generate `m1test_hurdle_pois_summary`, a summary of fit statistics.- I created `m1test_hurdle_pois_aug` as an augmentation of `m1_hurdle_pois` applied to the `projectb_si_test` data. This time I used `mutate()` to obtain .fitted and .resid values.- I generated, $R^2$, RMSE, and MAE values using the `mets()` function.- I used `group_by()` and `summarize()` to pivot `m1test_hurdle_pois_summary` such that there would be one column for each fit statistic.```{r, warning = FALSE}m1test_hurdle_pois_aug <- projectb_si_test |>mutate(".fitted"=predict(m1_hurdle_pois, type ="response"),".resid"=resid(m1_hurdle_pois, type ="response"))m1test_hurdle_pois_summary <-mets(m1test_hurdle_pois_aug, truth = ihm4pain_num, estimate = .fitted) |>mutate(Model ="m1_hurdle_pois",R2 =if_else(.metric =="rsq", .estimate, NA),RMSE =if_else(.metric =="rmse", .estimate, NA),MAE =if_else(.metric =="mae", .estimate, NA)) |>relocate(Model)m1test_hurdle_pois_summary <- m1test_hurdle_pois_summary |>group_by(Model) |>summarize(R2 =mean(R2, na.rm =TRUE),RMSE =mean(RMSE, na.rm =TRUE),MAE =mean(MAE, na.rm =TRUE))```### Comparing Fits in DatasetsI then bound together `m1_hurdle_pois_summary` and `m1test_hurdle_pois_summary` into `m1_summaryfinal` to present a table comparing the $R^2$, RMSE, and MAE values from the training and testing data.```{r}m1_summaryfinal <-bind_rows(m1_hurdle_pois_summary, m1test_hurdle_pois_summary) |>mutate(Sample =c("Training", "Testing"))m1_summaryfinal |>select(Sample, R2, RMSE, MAE) |>flextable() |>colformat_double(digits =4) |>autofit() |>theme_zebra(odd_header ="white") |>hline_top(part="body")```In examining the table above, it appears the model does not perform as well in the testing data given the lower $R^2$ (0.0001 vs. 0.1182). `m1_hurdle_pois` also has higher RMSE and MAE in the testing data.### Table Comparing Observed and Predicted CountsTo further explore the accuracy of `m1_hurdle_pois`, I created a table (`m1_countsum`) summarizing the observed and expected counts from the testing and training datasets as well as their respective `m1_hurdle_pois` applications.```{r}m1_countsum <-tibble(Source =c("Training data observations", "m1_hurdle_pois in training", "Test data observations", "m1_hurdle_pois in test"),ZeroIHM =c(sum(projectb_si_train$ihm4pain_num ==0),round(sum(predict(m1_hurdle_pois, type ="prob")[,1]),0),sum(projectb_si_test$ihm4pain_num ==0),round(sum(predict(m1_hurdle_pois, newdata = projectb_si_test, type ="prob")[,1]),0)),OneIHM =c(sum(projectb_si_train$ihm4pain_num ==1),round(sum(predict(m1_hurdle_pois, type ="prob")[,2]),0),sum(projectb_si_test$ihm4pain_num ==1),round(sum(predict(m1_hurdle_pois, newdata = projectb_si_test, type ="prob")[,2]),0)),TwoIHM =c(sum(projectb_si_train$ihm4pain_num ==2),round(sum(predict(m1_hurdle_pois, type ="prob")[,3]),0),sum(projectb_si_test$ihm4pain_num ==2),round(sum(predict(m1_hurdle_pois, newdata = projectb_si_test, type ="prob")[,3]),0)),ThreeIHM =c(sum(projectb_si_train$ihm4pain_num ==3),round(sum(predict(m1_hurdle_pois, type ="prob")[,4]),0),sum(projectb_si_test$ihm4pain_num ==3),round(sum(predict(m1_hurdle_pois, newdata = projectb_si_test, type ="prob")[,4]),0)),FourIHM =c(sum(projectb_si_train$ihm4pain_num ==4),round(sum(predict(m1_hurdle_pois, type ="prob")[,5]),0),sum(projectb_si_test$ihm4pain_num ==4),round(sum(predict(m1_hurdle_pois, newdata = projectb_si_test, type ="prob")[,5]),0))) |>mutate(MoreThan4IHM =nrow(projectb_si_train) - (ZeroIHM + OneIHM + TwoIHM + ThreeIHM + FourIHM),Total = (ZeroIHM + OneIHM + TwoIHM + ThreeIHM + FourIHM + MoreThan4IHM)) |>rename("0"= ZeroIHM,"1"= OneIHM,"2"= TwoIHM,"3"= ThreeIHM,"4"= FourIHM,">4"= MoreThan4IHM)```### Plot Comparing Observed and Predicted CountsTo facilitate a visual comparison, I generated `m1_pivot` and a pair of plots comparing observed vs. predicted values in the training and test samples.```{r}#| fig-width: 8m1_pivot <- m1_countsum |>select(-c(Total)) |>pivot_longer(!Source, names_to ="Count", values_to ="Value") |>mutate(Count =factor(Count, levels =c("0", "1", "2", "3", "4", ">4")))m1_pivot_train <- m1_pivot |>filter(Source %in%c("Training data observations", "m1_hurdle_pois in training")) |>mutate(Source =if_else(Source =="Training data observations", "Observed", "Predicted"),Source =factor(Source, levels =c("Observed", "Predicted")))m1_pivot_test <- m1_pivot |>filter(Source %in%c("Test data observations", "m1_hurdle_pois in test")) |>mutate(Source =if_else(Source =="Test data observations", "Observed", "Predicted"),Source =factor(Source, levels =c("Observed", "Predicted")))p1 <-ggplot(data = m1_pivot_train, aes(x = Count, y = Value, fill = Source)) +geom_bar(stat ="identity", position =position_dodge(), alpha =0.75) +ylim(0, 2500) +labs(x ="Count of IHM Modalities",y ="Number of Subjects",title ="Training Data (n = 3557)") +theme(legend.position="bottom",legend.title =element_blank())p2 <-ggplot(data = m1_pivot_test, aes(x = Count, y = Value, fill = Source)) +geom_bar(stat ="identity", position =position_dodge(), alpha =0.75) +ylim(0, 2500) +labs(x ="Count of IHM Modalities",y ="Number of Subjects",title ="Test Data (n = 3557)") +guides(fill="none")p1 + p2 +plot_annotation("Comparing Observed v. Predicted Values in Weighted Hurdle Poisson Model")``````{r, echo = FALSE}m1_countsum |>flextable() |>autofit() |>theme_zebra(odd_header ="white") |>hline_top(part="body")```In examining the plots above, while there is slightly greater deviation from the observed counts in the test data, it appears the model performs quite well in predicting each level of the count of `ihm4pain_num` in both the training and test datasets.# Model 2: Nonpharmacologic Modalities Only vs. Opioids Only**For Model 2, I sought to answer the question:**> Which demographic, mental health, and pain-related variables are associated with engagement in exclusively nonpharmacologic modalities rather than opioid utilization in the past 3 months among U.S. adults with chronic pain?## Create DatasetFor this model, the outcome was defined as engagement in nonpharmacologic modalities only (ie, engaging in IHM, a chronic pain self-management program, support groups, or physical, rehabilitative, occupational, or talk therapy while never using opioids) rather than opioids only (ie, using opioids on at least some days while not using any nonpharmacologic modalities). The already single-imputed data, `projectb_si`, were filtered to include the 3300 subjects engaging in either nonpharmacologic only or opioids only. With this new `m2_dataset_si` dataset, the outcome `nonpharm_over_opioid` was defined as 1 if "Nonpharm only" and 0 if "Opioids only."```{r}m2_dataset_si <- projectb_si |>filter(nonpharm_opioid %in%c("Opioids only", "Nonpharm only")) |>mutate(nonpharm_opioid =factor(nonpharm_opioid, levels =c("Nonpharm only", "Opioids only")),nonpharm_over_opioid =if_else(nonpharm_opioid =="Nonpharm only", 1, 0))m2_dataset_si |>tabyl(nonpharm_opioid, nonpharm_over_opioid)```## Create Survey DesignI then used the `svydesign()` function to generate the survey design.```{r}nhis_des2 <-svydesign(data = m2_dataset_si, id =~ subject_id,weights =~ wt_final, nest =TRUE)```## Total population representedThe following code details the steps taken to generate a combined Table 2.```{r}total_n2 <-svytotal(~ one, nhis_des2, na.rm =FALSE) |>as_tibble() |>mutate(Variable ="Total, n",total =floor(total),total =formatC(total, format ="d", big.mark =","),p.value ="") |>select(Variable, All = total, p.value)``````{r}nonpharm_n <-svyby(~ one, ~ nonpharm_opioid, nhis_des2, svytotal) |>as_tibble() |>filter(nonpharm_opioid =="Nonpharm only") |>select(Nonpharm = one) |>mutate(Nonpharm =round(Nonpharm),Nonpharm =formatC(Nonpharm, format ="d", big.mark =","))opioids_n <-svyby(~ one, ~ nonpharm_opioid, nhis_des2, svytotal) |>as_tibble() |>filter(nonpharm_opioid =="Opioids only") |>select(Opioids = one) |>mutate(Opioids =round(Opioids),Opioids =formatC(Opioids, format ="d", big.mark =","))total_n2 <-bind_cols(total_n2, nonpharm_n, opioids_n) |>mutate(All =as.character(All),Nonpharm =as.character(Nonpharm),Opioids =as.character(Opioids))``````{r}total_n2 <- total_n2 |>select(Variable, All, Nonpharm, Opioids, p.value) total_n2 |>flextable() |>autofit() |>bold(bold =TRUE, part ="header") |>theme_zebra(odd_header ="white") |>hline_top(part="body")```## Crosstabulation```{r}table2_cross <- nhis_des2 |>tbl_svysummary(include =c(age, sex, race_ethnicity, education, fam_income1k, urban_rural, num_pain_locations, painlimit_lifework, phq8total),type =c(num_pain_locations ="continuous",fam_income ="continuous"),statistic =list(all_continuous() ~"{mean}",all_categorical() ~"{p}"),label =list(age ~"Age in years, mean (95% CI)", sex ~"Sex, % (95% CI)", race_ethnicity ~"Race/Ethnicity, % (95% CI)", education ~"Education, % (95% CI)", fam_income1k ~"Family income in $1000s, mean (95% CI)", urban_rural ~"Proximity to metro area, % (95% CI)", num_pain_locations ~"Number of pain locations, mean (95% CI)", painlimit_lifework ~"Pain limiting life/work, % (95% CI)", phq8total ~"PHQ-8 total, mean (95% CI)"),digits =list(all_continuous() ~1,all_categorical() ~1),by = nonpharm_opioid) |>add_overall() |>add_ci(pattern ="{stat} ({ci})", style_fun =list(all_categorical() ~ style_percent_1digit, all_continuous() ~ style_number_1digit),statistic =list(all_categorical() ~"{conf.low}-{conf.high}", all_continuous() ~"{conf.low}-{conf.high}")) |>add_p(pvalue_fun =function(x) style_pvalue(x, digits =3)) |>modify_header(label ="Variable", stat_0 ~"All", stat_1 ~"Nonpharm", stat_2 ~"Opioids", p.value ~"p.value")table2_cross```## Assemble Table 2 Tibble```{r}table2_cross <- table2_cross |>as.data.frame() |>as_tibble()table2_cross[is.na(table2_cross)] <-""table2_cross$p.value <-sub("0\\.",".", table2_cross$p.value)table2_cross$p.value <-sub(" ","", table2_cross$p.value)table2 <-bind_rows(total_n2, table2_cross) |>rename("Weighted variable"= Variable,"Combined groups"= All,"Nonpharm only"= Nonpharm,"Opioids only"= Opioids,"P value"= p.value)```## Print Table 2I printed Table 2 using the `flextable()` function.```{r}table2 |>flextable() |>add_header_lines(values ="Table 2: Socio-Demographic, Pain, and Mental Health Characteristics By Nonpharmacologic Modality Engagement") |>bold(bold =TRUE, part ="header") |>add_footer_lines(values ="Abbreviations: CI, confidence interval; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; metro, metropolitan; NH, Non-Hispanic; Nonpharm, nonpharmacologic", top =FALSE) |>padding(i =c(4:5, 7:10, 12:16, 19:22, 25:28), j =1, padding.left =20) |>bold(i =c(1, 2, 3, 6, 11, 17:18, 23:24, 29), j =1, bold =TRUE) |> flextable::footnote(i =c(2, 17, 23, 29), j =5, ref_symbols ="a",value =as_paragraph("Wilcoxon rank-sum test for complex survey samples")) |> flextable::footnote(i =c(3, 6, 11, 18, 24), j =5,ref_symbols ="b",value =as_paragraph("Chi-squared test with Rao & Scott’s second-order correction")) |>autofit(part =c("body", "header"))```## Fit Weighted Logistic Model with Survey WeightsI fit the model `m2_lrm_wt` with the `lrm()` function, defining weights with `weights = wt_final`.In this analysis, I chose to predict the outcome `nonpharm_over_opioid` based on 9 variables: `age`, `sex`, `race_ethnicity`, `education`, `fam_income1k``urban_rural`, `num_pain_locations`, `painlimit_lifework`, and `phq8total`.```{r, warning = FALSE}set.seed(19900317)dd <-datadist(m2_dataset_si, adjto.cat ="first")options(datadist ="dd")m2_lrm_wt <-lrm(nonpharm_over_opioid ~ age + sex + race_ethnicity + education + fam_income1k + urban_rural + num_pain_locations + painlimit_lifework + phq8total,weights = wt_final, normwt =TRUE,data = m2_dataset_si, x =TRUE, y =TRUE)m2_lrm_wt```## Numeric Summary of Coefficients (`m2_coef_wt`)The following code generates a numeric summary of coefficients from the model and adds meaningful labels for comparison.```{r}m2_coef_wt <-summary(m2_lrm_wt, conf.int =0.95) |>as_tibble() |>filter(Type ==2) |>mutate(Group =c("Age", "Family income","Number of pain locations","PHQ-8", "Sex", "Race/Ethnicity", "Race/Ethnicity", "Race/Ethnicity","Education", "Education", "Education", "Education", "Proximity to metro area", "Proximity to metro area","Proximity to metro area", "Pain limiting life or work", "Pain limiting life or work","Pain limiting life or work"),Comparison =c("Age 69 v. 46", "Family income $85k v. $22k","4 pain locations v. 2","PHQ-8 score 8 v. 1","Male v. Female", "Hispanic v. NH White","NH Black v. NH White","Other v. NH White","HS/GED v. < HS", "Some College v. < HS", "Assoc/Bach v. < HS","Masters/PhD v. < HS", "M/S metro v. Non-metro", "L fringe metro v. Non-metro","L central metro v. Non-metro", "Pain limit some days v. Never", "Pain limit most days v. Never","Pain limit every day v. Never"),Effect =as.numeric(Effect)) |>rename(Lower0.95 ="Lower 0.95", Upper0.95 ="Upper 0.95") |>mutate(Lower0.95 =as.numeric(Lower0.95), Upper0.95 =as.numeric(Upper0.95))m2_coef_wt <- m2_coef_wt |>mutate(Comparison =factor(Comparison, levels =c("Age 69 v. 46","Male v. Female","Hispanic v. NH White","NH Black v. NH White","Other v. NH White","HS/GED v. < HS", "Some College v. < HS", "Assoc/Bach v. < HS","Masters/PhD v. < HS", "Family income $85k v. $22k","M/S metro v. Non-metro", "L fringe metro v. Non-metro","L central metro v. Non-metro", "4 pain locations v. 2","Pain limit some days v. Never", "Pain limit most days v. Never","Pain limit every day v. Never","PHQ-8 score 8 v. 1"))) |>arrange(Comparison)```The following code generates Supplemental Table 3, providing a numeric summmary of the coefficients from `m2_lrm_wt`.```{r}m2_coef_wt |>select(Group, Comparison, Effect, Lower0.95, Upper0.95) |>mutate(Effect =round(Effect, digits =3),Lower0.95 =round(Lower0.95, digits =3),Upper0.95 =round(Upper0.95, digits =3)) |>rename(aOR = Effect,"95% CI Low"= Lower0.95,"95% CI High"= Upper0.95) |>flextable() |>bold(bold =TRUE, part ="header") |>add_header_lines(values ="Supplemental Table 3: aOR of Engagement in Nonpharmacologic Only vs. Opioids Only") |>autofit() |>add_footer_lines(values ="Abbreviations: Assoc, associate’s; aOR, adjusted odds ratio; Bach, bachelor’s; CI, confidence interval; fam, family; PHQ-8, Patient Health Questionnaire-8; GED, general equivalency diploma; HS, high school; L, large; M, medium; metro, metropolitan; NH, Non-Hispanic; S, small; v., versus; yrs, years", top =FALSE) |>theme_zebra(odd_header ="white") |>hline_top(part="body")```## Effects Plot (`m2_coef_wt`)To visualize the effects of the covariates on odds of nonpharm_over_opioid = 1, I used `ggplot()` to plot the effects. Education covariates are placed on a separate axis given their size to not obscure the other covariates.```{r}#| fig-height: 6m2_coef_plot_wt <- m2_coef_wt |>filter(Comparison !="(Intercept)", Group !="Education") |>mutate(Comparison =fct_inorder(Comparison, ordered =NA),Comparison =fct_rev(Comparison))m2_coef_plot_educ_wt <- m2_coef_wt |>filter(Comparison !="(Intercept)", Group =="Education") |>mutate(Comparison =fct_inorder(Comparison, ordered =NA),Comparison =fct_rev(Comparison))p1 <-ggplot(m2_coef_plot_wt, aes(x = Effect, y = Comparison)) +geom_vline(aes(xintercept =1), linewidth = .25, linetype ='dashed') +geom_errorbarh(aes(xmax = Upper0.95, xmin = Lower0.95), linewidth = .5,height = .2, color ='gray50') +geom_point(size =2.5, color = uhred) +labs(x ="aOR",y ="",title ="aOR of Using Nonpharmacologic Only vs. Opioids Only") +scale_x_continuous(breaks =seq(0,8,0.5))p2 <-ggplot(m2_coef_plot_educ_wt, aes(x = Effect, y = Comparison)) +geom_vline(aes(xintercept =1), linewidth = .25, linetype ='dashed') +geom_errorbarh(aes(xmax = Upper0.95, xmin = Lower0.95), linewidth = .5,height = .2, color ='gray50') +geom_point(size =2.5, color = uhred) +labs(x ="aOR",y ="") +scale_x_continuous(breaks =seq(0,8,0.5))p1/p2 +plot_layout(heights =c(5,1))```## Summarizing Fit (`m2_coef_wt`)To summarize the fit of `m2_lrm_wt`, I simply entered `m2_lrm_wt` within the code. From the output, the Nagelkerke $R^2$ is 0.161 and the unvalidated area under the ROC curve (aka c-statistic) is 0.737.```{r}m2_lrm_wt```## Validating `m2_coef_wt`To validate `m2_lrm_wt`, I first set a seed of 19900317 using the `set.seed()` function. I then used the `validate()` function with 40 bootstrap repetitions to generate validated (index.corrected) $R^2$ and Dxy values. I stored this validated summary as `val_m2` so that I could obtain specific values for a summary table below.```{r}set.seed(19900317)val_m2 <-validate(m2_lrm_wt, method ="boot", B =40)``````{r}fit_report <-tibble(Model ="m2") |>mutate("Validated R2"=case_when( Model =="m2"~ val_m2[2,5]),"Validated C stat"=case_when( Model =="m2"~0.5+ (val_m2[1,5]/2)),AIC =AIC(m2_lrm_wt),BIC =BIC(m2_lrm_wt))fit_report |>select(Model, "Validated R2", "Validated C stat", AIC, BIC) |>flextable() |>colformat_double(j =2:3, digits =3) |>colformat_double(j =4:5, digits =0) |>autofit() |>theme_zebra(odd_header ="white") |>hline_top(part="body")```## ReferencesFord, C. (2016). Getting Started with Hurdle Models. University of Virginia Library. https://data.library.virginia.edu/getting-started-with-hurdle-models/Kroenke, K., Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9: validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606-613. https://doi.org/10.1046/j.1525-1497.2001.016009606.xLöwe, B., Decker, O., Müller, S., Brähler, E., Schellberg, D., Herzog, W., & Herzberg, P. Y. (2008). Validation and standardization of the Generalized Anxiety Disorder Screener (GAD-7) in the general population. Medical Care, 46(3), 266-274. https://doi.org/10.1097/MLR.0b013e318160d093National Center for Health Statistics. (2021, 2021-04-05). 2019 National Health Interview Survey. National Center for Health Statistics. https://www.cdc.gov/nchs/nhis/2019nhis.htmYoshida, K. (2014). Models for excess zeros using pscl package (Hurdle and zero-inflated regression models) and their interpretations. Retrieved May 1 from https://rpubs.com/kaz_yos/pscl-2Zelaya, C. E., Dahlhamer, J. M., Lucas, J. W., & Connor, E. M. (2020). Chronic Pain and High-impact Chronic Pain Among U.S. Adults, 2019.## Acknowledgments- Robert J. Trager, DC (methodology, interpretation, literature review, first draft of introduction)- Jeffery A. Dusek, PhD (methodology, interpretation)- The National Center for Health Statistics (data collection)- Thomas E. Love, PhD (methodology instruction)# Session Information```{r}xfun::session_info()```