This project applies Variational Bayes (VB) methods to fit a probabilistic model to the presence and onset times of long-term conditions (LTCs) within a study population. The data are stored in an .rds file containing a named list whose components each describe one aspect of the dataset.
To run the code in this project, the following R libraries are required:
extraDistr: For working with additional distributions not available in base R.
The .rds file contains a named list with the following components:
d: An N x M matrix containing data about the presence of conditions for each individual. Each element represents whether a specific condition is present (1 for presence, 0 for absence).
t: An N x M matrix containing data about the onset ages of conditions for each individual. Each element represents the age at which the condition was observed or diagnosed.
rho: A vector of length N containing the study start age for each individual.
tau: A vector of length N containing the study end age for each individual.
iota: A vector of length N indicating the status of each individual at age tau. Each element is a binary indicator, where 1 means the individual is alive and 0 means the individual is deceased.
N: An integer representing the number of individuals in the dataset.
K: An integer representing the number of clusters in the dataset.
M: An integer representing the number of conditions in the dataset.
sex: A vector representing the sex of the individuals in the dataset. This is set to NULL if the information is not available.
birth_conds: A vector containing the column indices in d that correspond to conditions which only occur at birth. This is set to NULL if there are no such conditions or the information is not available.
male_conds: A vector containing the column indices in d that correspond to conditions which only occur in males. This is set to NULL if there are no such conditions or the information is not available.
female_conds: A vector containing the column indices in d that correspond to conditions which only occur in females. This is set to NULL if there are no such conditions or the information is not available.
cond_list: A string vector containing the names of the conditions included in the dataset. Each element corresponds to a condition (column) in the matrix d.
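As a rough illustration of this layout, the sketch below builds a small synthetic list with the components described above, writes it to a temporary .rds file, and reads it back. All values are made up, the file path is a temporary one, and the use of NA for the onset age of an absent condition is an assumption rather than something this project specifies.

```r
set.seed(1)
N <- 5L  # individuals
M <- 3L  # conditions
K <- 2L  # clusters

# Presence indicators: 1 = condition present, 0 = absent
d_mat <- matrix(rbinom(N * M, size = 1, prob = 0.4), nrow = N, ncol = M)

# Study start and end ages per individual, with rho <= tau
rho <- runif(N, min = 40, max = 60)
tau <- rho + runif(N, min = 5, max = 20)

# Onset ages no later than tau; tau is recycled down the columns, so row i uses tau[i].
# NA marks absent conditions (assumed convention).
t_mat <- matrix(runif(N * M), nrow = N, ncol = M) * tau
t_mat[d_mat == 0] <- NA

toy <- list(
  d = d_mat, t = t_mat, rho = rho, tau = tau,
  iota = rbinom(N, size = 1, prob = 0.8),  # 1 = alive at tau, 0 = deceased
  N = N, K = K, M = M,
  sex = NULL,            # information not available
  birth_conds = NULL,
  male_conds = NULL,
  female_conds = NULL,
  cond_list = c("cond_A", "cond_B", "cond_C")  # illustrative names
)

path <- tempfile(fileext = ".rds")
saveRDS(toy, path)
dat <- readRDS(path)

# Basic consistency checks on the loaded list
stopifnot(
  identical(dim(dat$d), c(dat$N, dat$M)),
  identical(dim(dat$t), c(dat$N, dat$M)),
  length(dat$rho) == dat$N,
  length(dat$tau) == dat$N,
  length(dat$cond_list) == dat$M
)
```

Checks like these are cheap to run before fitting and catch a transposed matrix or a mismatched condition list early.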
Purpose: Perform Variational Bayes (VB) updates for the Gaussian latent class model with fixed K and censored data.
Inputs:
d: An N x M matrix containing data about the presence of conditions.
t: An N x M matrix containing data about the onset ages of conditions.
rho: A vector of length N containing study start ages.
tau: A vector of length N containing study end ages.
iota: A vector of length N indicating if individuals are alive/deceased at age tau.
hyperparameters: A list of hyperparameters of the prior.
initial_Cstar: An N x K matrix containing an initial value to initialize the latent variable z.
initial_Dstar: An N x M matrix containing an initial value to initialize the latent variable d.
initial_pstar: An N x M matrix containing an initial value to initialize the latent variable d.
initial_qstar: An N x M matrix containing an initial value to initialize the latent variable t.
initial_rstar: An N x M matrix containing an initial value to initialize the latent variable t.
N: The number of individuals in the data.
K: The number of clusters.
M: The number of conditions in the data.
epsilon: A number used to determine the stopping condition.
sex: The sex of individuals in the data.
birth_conds: Column indices for d indicating conditions that only occur at birth.
male_conds: Column indices for d indicating conditions which only occur in males.
female_conds: Column indices for d indicating conditions which only occur in females.
cond_list: A string vector containing the names of the conditions.
Outputs:
theta_star: A vector of parameters for the VB posterior of gamma.
a_star: An M x K matrix of parameters for the VB posterior of pi.
b_star: An M x K matrix of parameters for the VB posterior of pi.
u_star, v_star, alpha_star, beta_star: M x K matrices of parameters for the VB posterior of mu and sigma^2.
C_star: An N x K matrix of parameters for the VB posterior of z.
p_star, q_star, r_star, D_star: N x M matrices of parameters for the VB posterior of d and t.
n_steps: The number of iterations required to achieve the stopping condition.
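One common use of the fitted output is to read cluster assignments off C_star. The sketch below assumes, which the description above does not state explicitly, that each row of C_star holds one individual's approximate posterior cluster membership probabilities. The matrix is a made-up stand-in, not real output from the fitting function.

```r
# Hypothetical stand-in for the C_star output (4 individuals, 3 clusters)
C_star <- matrix(c(0.70, 0.20, 0.10,
                   0.10, 0.80, 0.10,
                   0.30, 0.30, 0.40,
                   0.05, 0.05, 0.90),
                 nrow = 4, byrow = TRUE)

# Most probable cluster for each individual (row-wise arg max)
hard_cluster <- apply(C_star, 1, which.max)

# Number of individuals assigned to each of the K clusters
cluster_sizes <- tabulate(hard_cluster, nbins = ncol(C_star))
```

Hard assignments discard the uncertainty in each row, so it can be worth inspecting rows where the largest probability is well below 1 (the third row here).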
Purpose: Calculate the expected value of a left-truncated location-scale t-distribution.
Inputs:
df: Degrees of freedom of the t-distribution.
mu: Location parameter of the t-distribution.
sigma: Scale parameter of the t-distribution.
tau: The left truncation point.
Outputs:
u: The expected value.
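For reference, the mean of a location-scale t-distribution truncated to values above tau has a closed form when df > 1. The sketch below is an illustrative implementation under that assumption, not necessarily the one used in this project. The function name trunc_t_mean is hypothetical; the argument names follow the inputs listed above. A numerical check with integrate() follows the definition.

```r
# Hypothetical helper: E[X | X > tau] for X = mu + sigma * T, with T ~ t(df)
trunc_t_mean <- function(df, mu, sigma, tau) {
  stopifnot(df > 1, sigma > 0)                 # the mean is only finite for df > 1
  a <- (tau - mu) / sigma                      # standardised truncation point
  upper_tail <- pt(a, df, lower.tail = FALSE)  # P(T > a)
  # For a standard t: E[T | T > a] = (df + a^2) / (df - 1) * f(a) / P(T > a)
  mu + sigma * (df + a^2) / (df - 1) * dt(a, df) / upper_tail
}

u <- trunc_t_mean(df = 5, mu = 60, sigma = 10, tau = 65)

# Numerical check: ratio of integrals over (tau, Inf)
dens <- function(x) dt((x - 60) / 10, df = 5) / 10
num <- integrate(function(x) x * dens(x), lower = 65, upper = Inf)$value
den <- integrate(dens, lower = 65, upper = Inf)$value
u_check <- num / den
```

The closed form avoids numerical integration inside an iterative fit, while the integrate() comparison is a convenient way to test it on a few parameter settings.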
Purpose: Estimate the parameters of the posterior predictive distribution for a new individual.
Inputs:
hyperparameters: A list of hyperparameters of the posterior.
M_obs: Indices of the individual's fully observed conditions.
M_part: Indices of the individual's partially observed conditions.
M_unobs: Indices of the individual's unobserved conditions.
d_obs: A vector containing the absence/presence of the fully observed conditions as 0s and 1s.
t_obs: A vector containing the onset ages of the fully observed conditions.
d_part: A vector containing the absence/presence of the partially observed conditions as 0s and 1s.
rho: A vector of length N containing observation start ages.
tau: A vector of length N containing observation end ages.
M: The number of conditions in the data.
Outputs:
phi: A vector of cluster probabilities.
eta: A matrix of condition probabilities given cluster.
varpi: A vector of useful intermediate condition probabilities conditional on clusters.
Purpose: Estimate the expected time of occurrence of conditions in an individual after tau.
Inputs:
parameters: A list of posterior predictive parameters (the output of VB_gaussian_predictive_density).
hyperparameters: A list of hyperparameters of the posterior.
tau: The individual's age at end of observation.
M: The number of conditions.
Outputs:
Et: Expected onset times.
Purpose: Estimate the probability of conditions occurring in an individual by time T.
Inputs:
parameters: A list of posterior predictive parameters (the output of VB_gaussian_predictive_density).
hyperparameters: A list of hyperparameters of the posterior.
T: The age by which the conditions should occur.
tau: The individual's age at end of observation.
M: The number of conditions.
Outputs:
prob: A vector of probabilities.