
CiteSource 0.2.0 (CRAN Submission Ready)#256

Open
TNRiley wants to merge 58 commits into main from dev

TNRiley (Collaborator) commented May 15, 2026

No description provided.

TNRiley and others added 30 commits November 19, 2025 21:25
- Add expand_metadata_columns() for expanding multiple columns efficiently
- Add expand_single_metadata_column() for single column operations
- Standardizes separator pattern to ',\\s*' for consistent handling
- Reduces code duplication and improves maintainability
- Replace base R strsplit/table approach with tidyr-based helper
- Standardizes separator pattern to ',\\s*' for consistent handling
- More readable and maintainable code
- Output format remains compatible with existing code
- Replace three separate separate_rows() calls with single expand_metadata_columns() call
- More efficient single-pass expansion of all metadata columns
- Standardizes separator pattern to ',\\s*'
- Maintains same functionality and output format
- Replace three separate if blocks with a loop using column mapping
- Use expand_single_metadata_column() instead of separate_rows()
- Standardizes separator pattern to ',\\s*'
- More maintainable and efficient code
- Maintains all existing functionality including label warnings
- Update calculate_initial_records() to use expand_single_metadata_column()
- Update calculate_detailed_records() to use expand_single_metadata_column()
- Update calculate_phase_records() to use expand_single_metadata_column() and expand_metadata_columns()
- Standardizes separator pattern to ',\\s*' throughout
- More efficient and maintainable code
- Replace two separate_rows() calls with single expand_metadata_columns() call
- Standardizes separator pattern to ',\\s*'
- More efficient single-pass expansion
- Removes redundant str_trim() calls (handled by helper function)
- Replace separate_rows() with expand_single_metadata_column()
- Use proper column reference instead of positional index
- Standardizes separator pattern to ',\\s*'
- More maintainable and consistent code
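
The commits above all route comma-separated metadata through one helper with the separator pattern ',\\s*'. The helper's source is not shown in this listing, so the following is only a minimal sketch of the idea, assuming a base-R stand-in for the tidyr-based helper; `expand_column_sketch` and the sample data are hypothetical names used for illustration.

```r
# Hypothetical stand-in for the internal helper. The package's version is
# described as tidyr-based; this sketch uses base R only.
expand_column_sketch <- function(df, column, sep = ",\\s*") {
  # Split each cell on a comma followed by optional whitespace
  parts <- strsplit(as.character(df[[column]]), sep)
  # Repeat each row once per piece, then overwrite the column with the pieces
  out <- df[rep(seq_len(nrow(df)), lengths(parts)), , drop = FALSE]
  out[[column]] <- trimws(unlist(parts))
  rownames(out) <- NULL
  out
}

# Illustrative data: one record found in two sources, one found in a single source
records <- data.frame(
  record_id = c("r1", "r2"),
  cite_source = c("pubmed, scopus", "wos"),
  stringsAsFactors = FALSE
)

expanded <- expand_column_sketch(records, "cite_source")
print(expanded)
```

Because the pattern absorbs whitespace after each comma, values such as "pubmed, scopus" and "pubmed,scopus" expand to the same pieces, which is the consistency the commit messages refer to.
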
- Fix incorrect calculation: take nrow() from the deduplicated citations held in rv, not from the expanded metadata held in rv
  - the expanded metadata (count_unique output) inflates the count
  - the deduplicated data holds the actual unique citations
- Add calculation of duplicates removed (n_citations - n_unique_records)
- Improve message formatting with:
  - Clear line breaks for readability
  - Number formatting with commas (e.g., 9,175 instead of 9175)
  - More informative structure showing totals, unique, and duplicates removed
- Apply fixes to both app.R and app2.R
- Resolves issue where message showed more unique citations than total citations
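
The message fix above comes down to one subtraction and comma formatting. A minimal sketch, assuming illustrative counts (only the 9,175 figure appears in the commit message; the unique count and the message wording are made up for this example):

```r
# Illustrative counts; n_unique_records is an assumed value
n_citations <- 9175L
n_unique_records <- 6230L

# Duplicates removed, as described in the commit message
n_duplicates_removed <- n_citations - n_unique_records

# Format integers with thousands separators (9175 becomes "9,175")
fmt <- function(x) format(x, big.mark = ",", trim = TRUE)

# Hypothetical message layout with one figure per line
msg <- paste0(
  "Total citations: ", fmt(n_citations), "\n",
  "Unique citations: ", fmt(n_unique_records), "\n",
  "Duplicates removed: ", fmt(n_duplicates_removed)
)
cat(msg, "\n")
```

Counting rows of the deduplicated data, not the expanded metadata, keeps the unique figure at or below the total, so the subtraction cannot go negative.
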
- Replace separate_rows() calls in unique_separated_phase() with expand_metadata_columns()
- Replace separate_rows() + mutate(trimws) + filter in detailed_table_data() with expand_single_metadata_column()
- Standardizes separator pattern to ',\\s*' throughout
- More efficient and consistent with R package functions
- Replace separate_rows() calls in unique_separated_phase() with expand_metadata_columns()
- Replace separate_rows() + mutate(trimws) + filter in detailed_table_data() with expand_single_metadata_column()
- Replace separate_rows() in completeness_data() with expand_single_metadata_column()
- Standardizes separator pattern to ',\\s*' throughout
- More efficient and consistent with R package functions
- Update all calls to expand_metadata_columns() and expand_single_metadata_column() to use CiteSource::: prefix
- This allows Shiny apps to access internal (non-exported) helper functions
- Fixes error: 'could not find function expand_single_metadata_column'
- Helper functions remain internal (not exported) as per best practices
- Apply changes to both app.R and app2.R
- Exclude 'unknown' cite_source values after expanding metadata columns
- Records with 'screened' or 'final' labels intentionally have empty cite_source
- These get converted to 'unknown' during deduplication with show_unknown_tags=TRUE
- Including 'unknown' in detailed record table causes:
  * Misleading source row that isn't actually a search source
  * Incorrect counts (Records Imported, Distinct Records)
  * Skewed percentage calculations (Source Contribution %, etc.)
  * Affected Total row calculations
- Filtering 'unknown' ensures table only analyzes actual search sources
- Consistent with calculate_phase_records() which already filters 'unknown'
- Does not affect other tables/visuals (precision/sensitivity table uses different function)
- Apply fix to both app.R and app2.R
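
To see why the 'unknown' rows distort the detailed record table, here is a minimal sketch, assuming an already expanded data frame with hypothetical column values; it is not the app's code.

```r
# Illustrative expanded metadata: r3 carries a 'screened' label with no
# search source, so its cite_source became 'unknown' during deduplication
expanded <- data.frame(
  record_id = c("r1", "r1", "r2", "r3"),
  cite_source = c("pubmed", "scopus", "pubmed", "unknown"),
  stringsAsFactors = FALSE
)

# Keep only rows that belong to an actual search source
sources_only <- expanded[expanded$cite_source != "unknown", , drop = FALSE]

# Per-source counts before and after the filter
counts_before <- table(expanded$cite_source)
counts_after <- table(sources_only$cite_source)
print(counts_before)
print(counts_after)
```

Without the filter, 'unknown' appears as a third source and enlarges every denominator used for percentage columns.
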
- Added interactive "Use This" buttons to Card View for granular selection of Title, Author, Abstract, and Journal.
- Implemented custom JavaScript to handle cross-column button toggling and provide immediate visual feedback.
- Added server-side listener `input$field_preference_click` to store user preferences.
- Implemented `apply_field_preferences` logic to overwrite surviving records with user selections in the final merged dataset.
- Updated "Default" badge logic to correctly handle cases where one value is missing.
- Fixed `record_id` vs `duplicate_id` column resolution during merge.
- Resolved "unknown column" warnings by properly initializing `field_preferences` before JSON serialization.
- Add parameters use_custom, blocking_rounds, validation_criteria to dedup_citations().
- When use_custom=TRUE, call internal dedup_citations_custom() with configurable
  blocking rounds and validation criteria; fall back to default ASySD on error or
  if custom implementation is not loaded.
- New R/dedup_custom.R: custom ASySD wrapper with default blocking rounds and
  validation criteria matching ASySD behaviour, plus optional stats on which
  criteria identified duplicate pairs.

Shiny app (inst/shiny-app/CiteSource/app2.R):
- Wire in custom deduplication option so users can run configurable dedup from
  the app (e.g. UI for use_custom and related options).

Documentation:
- Add AUTHOR_HANDLING.md: end-to-end documentation of author name handling,
  including expected format (e.g. "Last, First and Last, First"), cleaning rules,
  and behaviour in import, cleaning, deduplication, citation generation, export,
  and data conversion.
…nette revisions

- Shiny app.R: full UI overhaul with bslib cards, workflow stepper, export hub,
  bidirectional filters, smart empty states, card view dedup, global.R bootstrap
- Vendor ASySD deduplication engine into R/asys_dedup.R; remove ASySD GitHub dependency
- Drop app2.R (superseded by refactored app.R)
- Migrate all pipes to native |>; vectorize APA citation generation
- CRAN compliance: specific @importFrom declarations, globalVariables, remove plogr
- Add renv and renv.lock for reproducible dependency management
- Revise vignettes: benchmark testing, screening phases, db-validation, topic coverage
- Add search string comparison vignette
- Fix deployment workflow: restore rsconnect + deploy steps, add RENV_CONFIG_SNAPSHOT_VALIDATE
- Update CITATION.cff to v0.2.0 with release date
- NEWS.md: full v0.2.0 changelog
- Add custom_dedup_notes.md to .gitignore (local reference doc, not for commit)
- Remove citesource_working_example and citesource_benchmark_testing (deleted)
- Add citesource_search_string_comparison (new)
- Change url from http to https to match DESCRIPTION
- Remove deprecated record_counts_table(), record_summary_table(),
  precision_sensitivity_table() from R/tables.R
- Add CITESOURCE2_ANALYSIS.md to .Rbuildignore
- Fix pkgdown workflow: install from local source (devtools::install)
  instead of remotes::install_github() which installed from main branch
split(.$type) and split(.$facet) use magrittr's dot placeholder, which the
native pipe does not support; replace them with anonymous function wrappers
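
A minimal sketch of that replacement, assuming R 4.1 or later and a small illustrative data frame (the column name `type` comes from the commit message; the data are made up):

```r
df <- data.frame(
  type = c("a", "b", "a"),
  value = 1:3,
  stringsAsFactors = FALSE
)

# magrittr form (for comparison only, not run here):
#   df %>% split(.$type)
# The native pipe has no dot placeholder, so the right-hand side is wrapped
# in an anonymous function that is called immediately.
by_type <- df |> (\(d) split(d, d$type))()

print(names(by_type))
```

The wrapper gives the piped value a name (`d`), so it can be used both as the first argument and inside `d$type`.
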
calculate_initial_records(), calculate_detailed_records(), and
calculate_phase_records() moved to count.R; create_initial_record_table(),
create_detailed_record_table(), and create_precision_sensitivity_table()
moved to tables.R. Delete new_count_and_table.R.
- Add CITATION.cff, REQUIREMENTS.*, ASySD info to .Rbuildignore
- Document show_labels and log_scale params in plot_source_overlap_heatmap
- Fix devtools::install(upgrade = 'never') to upgrade = FALSE (older devtools compat)
TNRiley and others added 25 commits May 13, 2026 17:54
Wrap inline if/else in parentheses so R parsers on older server versions
can unambiguously determine the expression boundary before the trailing comma.
Fix ambiguous if/else parse in app.R tibble construction
Remove extra closing paren introduced during merge (line 838), and
wrap inline if/else in parentheses to resolve ambiguous parse on
older R server versions.
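
The parenthesized form described above looks roughly like this. It is a minimal sketch using a plain list with hypothetical field names, not the tibble from app.R:

```r
has_label <- FALSE

# Each inline if/else is wrapped in parentheses so the expression clearly
# ends before the comma that separates it from the next argument
row <- list(
  source = "pubmed",
  label = (if (has_label) "screened" else NA_character_),
  n = 10L
)

str(row)
```

The parentheses do not change the value; they only make the expression boundary explicit for the parser and for readers.
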
Fix two app.R parse errors preventing Shiny startup
rsconnect uses Remotes: field to know CiteSource comes from GitHub.
Without this, CiteSource was missing from the deployment manifest
and shinyapps.io never installed it.
Add DESCRIPTION to Shiny app dir so rsconnect includes CiteSource
roxygen2 8.0.0 rejects @importFrom tags that span multiple lines via
continuation indentation. Split each into separate single-line entries.
Fix multi-line @importFrom tags for roxygen2 8.0.0 compatibility
remotes::install_github installs into a separate library that
renv/pak (used by rsconnect for dependency detection) cannot see.
Installing via extra-packages puts CiteSource in the same library
so rsconnect includes it in the deployment manifest.
Install CiteSource via pak in setup step so rsconnect detects it
…urce

pak (used by extra-packages) marks transitive deps as 'deps' source type,
which shinyapps.io rejects. renv::install installs into renv's own library
with proper GitHub source metadata that rsconnect and shinyapps.io both
understand.
Fix workflow conditional syntax for GitHub Pages deploy step
- Fix Title to use title case per CRAN policy
- Rewrite Description to not start with package name
- Add CRAN eval guard to all vignettes so chunks are skipped when
  vignette data is absent (R CMD check), preventing file-not-found errors
- Exclude vignette data directories, shinytest fixtures, and .claude
  worktree from build via .Rbuildignore, reducing tarball from 31 MB
  to 2.6 MB
- Update .Rbuildignore to prevent worktree from entering tarball
- Quote 'RIS' and 'CSV' in Description to avoid spell-check flags
- Replace pre-built with prebuilt in Description to avoid hyphen-split flag
- Fix maintainer email to tnriley@gmail.com
- Replace \dontrun{} with \donttest{} in reimport_csv example
- Update cran-comments.md: correct note count, add reverse dependency statement
2 participants