Add calculate_cng_indices by sjawhar · Pull Request #97 · EducationalTestingService/factor_analyzer

sjawhar · 2022-03-15T03:38:27Z

I implemented this function while working on a paper and decided it belongs in this excellent project. I will add tests (I have been comparing against output from R), but there is one major potential issue in the code, which I will point out in inline comments.

sjawhar · 2022-03-15T03:43:55Z

+    if model == "factors":
+        data -= np.linalg.pinv(np.diag(np.diag(np.linalg.pinv(data))))
+        # TODO: Should this line be here?
+        data = covariance_to_correlation(data)


This line comes from the eigenComputes function. When the data passed in are correlations and cor == TRUE, it calls eigen(cov2cor(corFA(x))). However, it doesn't do this when the variables/observations themselves are passed; in that case it calls eigen(corFA(cov(x))). The two produce very different results! I think this step is only needed when covariances are passed in rather than correlations, but since this implementation computes the correlations itself, I think this line should be removed. Agreed?

jbiggsets · 2022-03-17T20:39:25Z

So, are we assuming the input value here will always be eigenForm == "data"? If so, I think you're right that this should be removed, but I may be misunderstanding.

Yeah, I think you're right. I was getting confused by the case where dataType == correlation && cor == FALSE, but I think in that case the data passed in should actually be a covariance matrix, not a correlation matrix.
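To make the "very different results" concrete, here is a small NumPy sketch on toy random data. It applies the factor-model adjustment from the hunk above and then the rescaling line in question. covariance_to_correlation here is a hypothetical local stand-in written for this example, not the project's helper, and the SMC interpretation of the adjusted diagonal is worked out for a correlation-matrix input only.

```python
import numpy as np

def covariance_to_correlation(m):
    # Hypothetical stand-in for the helper of the same name:
    # divide each entry by sqrt(m_ii * m_jj) so the diagonal becomes 1.
    d = np.sqrt(np.diag(m))
    return m / np.outer(d, d)

# Toy data: 200 observations of 4 variables, with two correlated pairs.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
x[:, 1] += x[:, 0]
x[:, 3] += 0.5 * x[:, 2]
r = np.corrcoef(x, rowvar=False)

# The factor-model adjustment from the diff. Off-diagonal entries are
# unchanged; each diagonal entry becomes 1 - 1/(R^-1)_ii, the squared
# multiple correlation (SMC) of that variable with the others.
reduced = r - np.linalg.pinv(np.diag(np.diag(np.linalg.pinv(r))))
smc = 1 - 1 / np.diag(np.linalg.pinv(r))

# The line under discussion: rescaling back to a correlation matrix resets
# the diagonal to 1 and inflates the off-diagonals, so the spectrum changes.
rescaled = covariance_to_correlation(reduced)

eig_reduced = np.linalg.eigvalsh(reduced)[::-1]    # descending order
eig_rescaled = np.linalg.eigvalsh(rescaled)[::-1]  # descending order
print(eig_reduced.sum(), eig_rescaled.sum())  # traces: sum of SMCs vs. 4
```

Since eigenvalues sum to the trace, the adjusted matrix's eigenvalues sum to the SMCs while the rescaled matrix's always sum to the number of variables, which is why keeping or dropping the rescaling changes the CNG input so much.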

jbiggsets · 2022-03-17T20:31:34Z

+
+    values = np.sort(np.linalg.eigvals(data))[::-1]
+
+    num_variables = len(data)


Won't len(data) give you the number of observations rather than the number of variables, assuming data is a NumPy array or pandas DataFrame?

Unfortunately, this doesn't translate exactly from R:

> df <- data.frame(A = c(1, 2, 3), B = c(1, 2, 3))
> length(df)
[1] 2

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})
>>> len(df)
3
>>> len(df.values)
3

Also, should we move the line below to the beginning?

At this point in the code, data refers to the correlation matrix, which is square (data.shape[0] == data.shape[1]), so len(data) is the number of variables.
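Both sides of this exchange can be checked quickly: len() on a DataFrame counts rows, but once data has been replaced by its correlation matrix, the matrix is variables by variables, so its first dimension is the variable count. The sketch below uses toy data. Swapping np.linalg.eigvals for np.linalg.eigvalsh is a suggestion of this example, not what the PR does: eigvalsh assumes a symmetric matrix and always returns real eigenvalues in ascending order.

```python
import numpy as np
import pandas as pd

# Toy data: 5 observations (rows) of 3 variables (columns).
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 1.0, 4.0, 3.0, 5.0],
    "C": [5.0, 3.0, 4.0, 1.0, 2.0],
})

print(len(df))      # 5: len() counts rows (observations)
print(df.shape[1])  # 3: number of columns (variables)

# After reducing to a correlation matrix, the matrix is square
# (variables x variables), so its first dimension is the variable count.
corr = np.corrcoef(df.to_numpy(), rowvar=False)
num_variables = corr.shape[0]

# eigvalsh returns real eigenvalues in ascending order for a symmetric
# matrix; reverse them to get the descending order the diff sorts into.
values = np.linalg.eigvalsh(corr)[::-1]
```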

jbiggsets · 2022-03-17T20:32:54Z

+        The eigenvalues and CNG indices of the dataset
+    """
+    data = corr(data)
+    if model == "factors":


If we're just going to have "factors", do we need the model argument? Does it make sense to exclude this for now?

If model == "components", the adjustment below is not needed; taking the correlation of the data is sufficient. We might want to add elif model != "components": raise ValueError.
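One way the suggested branching could look, as a minimal sketch: reduce_for_model is a hypothetical helper name, not part of the PR or of factor_analyzer, and it reuses the PR's adjustment expression for the "factors" case.

```python
import numpy as np

def reduce_for_model(corr, model="components"):
    """Hypothetical helper: the matrix whose eigenvalues feed the CNG indices."""
    corr = np.asarray(corr, dtype=float)
    if model == "components":
        # Components: the correlation matrix is used unchanged.
        return corr
    elif model == "factors":
        # Factors: replace the diagonal with squared multiple correlations.
        return corr - np.linalg.pinv(np.diag(np.diag(np.linalg.pinv(corr))))
    else:
        raise ValueError(f"model must be 'components' or 'factors', got {model!r}")

r = np.array([[1.0, 0.5], [0.5, 1.0]])
print(reduce_for_model(r, "factors"))  # diagonal becomes 0.5 ** 2 = 0.25
```

Returning a new array instead of using data -= leaves the caller's matrix untouched, and any model string other than the two supported values fails loudly rather than silently falling through to the components path.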


jbiggsets · 2022-03-17T20:40:57Z

Thanks so much for submitting this PR! I had a few comments/questions.

Let me know if any of them don't make sense, or if you just want me to submit a PR to merge into this branch.

sjawhar commented Mar 15, 2022


Add calculate_cng_indices

9143d0e

sjawhar force-pushed the feature/cng-indices branch from 62823a7 to 9143d0e on March 15, 2022 03:44

jbiggsets requested changes Mar 17, 2022


sjawhar added 2 commits March 18, 2022 20:40

Address PR comments

c1e57d2

Merge branch 'main' into feature/cng-indices

5d9afbd

sjawhar wants to merge 3 commits into EducationalTestingService:main from sjawhar:feature/cng-indices


2 participants

