docs: add development estimator doctest examples #799

Open
EKtheSage wants to merge 5 commits into casact:main from EKtheSage:docs/704-development-estimator-examples

Conversation

@EKtheSage
Contributor

@EKtheSage EKtheSage commented May 16, 2026

Summary: Add Sphinx doctest examples for Barnett-Zehnwirth, GLM, IncrementalAdditive, DevelopmentML, Outstanding, and MackChainladder. Split from the larger #786/#792 work and intentionally excludes .github/workflows/sync-main-to-docs.yml. Refs #704


Note

Low Risk
Low risk: changes are almost entirely docstring/documentation updates (examples, wording, and minor typos), with no substantive algorithm or API behavior changes expected.

Overview
Adds doctest-style usage examples and more detailed narrative docs to key estimators: BarnettZehnwirth, TweedieGLM, DevelopmentML, IncrementalAdditive, CaseOutstanding, and MackChainladder.

Also tightens wording/parameter descriptions (e.g., response typos, clarifying case_n_periods) and makes small doc formatting fixes (reST directives, newline at EOF) to support Sphinx doctest execution.

Reviewed by Cursor Bugbot for commit 2f5ee76. Bugbot is set up for automated code reviews on this repo.

@codecov

codecov Bot commented May 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.23%. Comparing base (72b270c) to head (2f5ee76).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #799   +/-   ##
=======================================
  Coverage   86.23%   86.23%           
=======================================
  Files          86       86           
  Lines        4947     4947           
  Branches      643      643           
=======================================
  Hits         4266     4266           
  Misses        484      484           
  Partials      197      197           
Flag Coverage Δ
unittests 86.23% <100.00%> (ø)


Comment thread chainladder/development/barnzehn.py Outdated

Examples
--------
A patsy ``formula`` and the built-in ``alpha`` / ``gamma`` / ``iota`` PTF
Collaborator

thanks for putting some examples together. a new user probably still needs more explanation of what each of the examples is actually accomplishing

Comment thread chainladder/development/incremental.py Outdated

Examples
--------
Basic fit on ``ia_sample`` with exposure on the latest diagonal.
Collaborator

what are we hoping for a new user to take away from the first example?

Comment thread chainladder/development/learning.py Outdated

Examples
--------
``fit_incrementals`` toggles whether the pipeline fits on incrementals
Collaborator

what does the user take away here? it reads like "look, changing this parameter is actually doing something!". there is an incredible amount of nuance here. depending on the usage, incremental could mean the age to age factor, or could mean the actual incremental dollar amounts.
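
The two readings of "incremental" raised in this comment can be made concrete with a toy example. The sketch below uses a made-up cumulative row for a single accident year (the values are hypothetical, and nothing here reflects how `DevelopmentML` works internally) and derives both the incremental dollar amounts and the age-to-age factors from it.

```python
# Hypothetical cumulative paid losses for one accident year at ages 12, 24, 36, 48.
cumulative = [1000.0, 1500.0, 1800.0, 1890.0]

# Reading 1: incremental dollar amounts (difference between successive ages).
incremental_dollars = [cumulative[0]] + [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]

# Reading 2: age-to-age factors (quotient of successive ages).
age_to_age = [later / earlier for earlier, later in zip(cumulative, cumulative[1:])]

print(incremental_dollars)                # [1000.0, 500.0, 300.0, 90.0]
print([round(f, 3) for f in age_to_age])  # [1.5, 1.2, 1.05]
```

The two series live on different scales (dollars versus ratios), so an example that says "fit on incrementals" likely needs to state which one it means.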

@EKtheSage EKtheSage force-pushed the docs/704-development-estimator-examples branch from eee0725 to c547ad3 May 17, 2026 04:20
Comment thread chainladder/development/learning.py Outdated
--------
Choose the response scale before fitting a machine-learning development
model. In this class, ``fit_incrementals=True`` means the scikit-learn model
is trained on actual incremental dollar amounts. Setting it to ``False``
Collaborator

this is an inaccurate statement. when the link function is identity, then it's fitting on actual dollar amounts.

@henrydingliu
Collaborator

the general approach in this batch of PRs is straying from the original intent.

  • narratively, these examples are not meant to be a slight rewording of the existing parameter descriptions. an example should generally follow the structure of "sometimes you run into this realistic situation" -> "you use this parameter this way to deal with that situation".
  • the actual tests need to be more extensive, rather than just showing and reconciling to the first ldf

Address @henrydingliu feedback: reframe all examples as problem-solution
narratives, fix inaccurate fit_incrementals description in DevelopmentML,
add autoregressive example, expand test outputs beyond single LDF values,
and explain weighted_step Pipeline routing mechanism.
Comment thread chainladder/development/barnzehn.py Outdated
breakpoints for a piecewise linear development-age trend, and ``iota``
defines breakpoints for a piecewise linear calendar-year (diagonal) trend.

The ``abc`` triangle has 11 accident years (1977-1987) and 11 development
Collaborator

good use case. we can probably move the first two paragraphs to the beginning of the docstring.

the rest of the explanation, however, isn't technically robust. the two sets of coefficients from the two patsy formulas aren't apples-to-apples (i.e. first coefficient always applies to the first accident year, second coefficient always applies to the second accident year, etc.)

Comment thread chainladder/development/incremental.py Outdated
Use ``IncrementalAdditive`` when incremental losses per unit of exposure
are expected to be more stable across accident years than age-to-age
factors. This is common in lines where claim payments are driven by
exposure volume: for example, a workers' compensation book where
Collaborator

I don't think this is true. Can we source this back to the paper that defined this method?

Comment thread chainladder/development/glm.py Outdated

Examples
--------
``TweedieGLM`` with ``power=1`` and log link implements the
Collaborator

example not linked to a realistic use case.

Comment thread chainladder/development/learning.py Outdated

Examples
--------
``DevelopmentML`` bridges scikit-learn and the chainladder workflow:
Collaborator

not sure what this means

Comment thread chainladder/development/learning.py Outdated
``ldf_`` patterns usable with tail selection, ``Chainladder``, and other
methods.

``fit_incrementals`` controls what the estimator is trained to predict.
Collaborator

example not based on a realistic use case.

Comment thread chainladder/development/outstanding.py Outdated
tells the estimator which triangle columns represent the paid and incurred
series.

When case reserving practices have changed recently (e.g., a company
Collaborator

can we show this in the example?

Expand doctest Examples for TweedieGLM, CaseOutstanding,
IncrementalAdditive, DevelopmentML, and Barnett-Zehnwirth to match
guide narratives, reviewer feedback, and stable outputs.

Refs casact#704

Co-authored-by: Cursor <cursoragent@cursor.com>
@EKtheSage
Contributor Author

@henrydingliu , I think I made an update to the PR, not sure if you've had the chance to review?


Examples
--------
When many accident years are available but you want a smaller number of
Collaborator

thanks for crisping up the example narrative. this reads great


.. testoutput::

[2.2854 2.2854 2.2854 2.2854]
Collaborator

can we perform the test on the full ldf_?
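
As a rough illustration of testing the whole factor vector rather than a single entry, the sketch below uses Python's standard `doctest` module on a hypothetical helper, `volume_weighted_ldfs`, applied to a made-up three-origin triangle. It is not chainladder's `ldf_` computation and it does not use Sphinx; it only shows a docstring example whose expected output is the full list, so a drift in any development age would fail the check.

```python
import doctest


def volume_weighted_ldfs(triangle):
    """Return volume-weighted age-to-age factors for every development age.

    `triangle` is a list of cumulative rows, oldest origin first, where each
    later origin has one fewer observed age (hypothetical toy layout).

    >>> toy = [[100.0, 150.0, 165.0], [110.0, 176.0], [120.0]]
    >>> [round(f, 4) for f in volume_weighted_ldfs(toy)]
    [1.5524, 1.1]
    """
    n_ages = len(triangle[0])
    factors = []
    for k in range(n_ages - 1):
        # Only origins with both age k and age k + 1 observed contribute.
        rows = [row for row in triangle if len(row) > k + 1]
        numerator = sum(row[k + 1] for row in rows)
        denominator = sum(row[k] for row in rows)
        factors.append(numerator / denominator)
    return factors


# Runs the docstring example; it prints a report only if the output differs.
doctest.run_docstring_examples(
    volume_weighted_ldfs,
    {"volume_weighted_ldfs": volume_weighted_ldfs},
    name="volume_weighted_ldfs",
)
```

The Sphinx `.. testoutput::` blocks in this PR compare printed output in a similar spirit, so printing the rounded full array there would give the same kind of coverage.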


.. testoutput::

3.491
Collaborator

can we perform the test on the full ldf_?

3.4906
[3.491 3.491 3.491 3.491]

Patsy R-style formulas set ``design_matrix``; continuous ``development``
Collaborator

for continuity, we can describe that using continuous terms involves removing the C()


tri = cl.load_sample("genins")
glm = cl.TweedieGLM(design_matrix="development + origin").fit(tri)
print(len(glm.coef_))
Collaborator

an explicit comparison of the ldf_ between continuous terms and categorical terms would be more illustrative
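
To make the categorical-versus-continuous contrast concrete without depending on patsy or chainladder, the sketch below hand-builds the design columns that each treatment of `development` would produce. The helper names are hypothetical, and the categorical coding assumes treatment (dummy) coding with an intercept, which is what `C()` is understood to yield by default. It does not compare fitted `ldf_` values, which would require running the real estimator.

```python
development_ages = [12, 24, 36, 48]


def continuous_columns(age):
    # Intercept plus one numeric slope column: two coefficients in total.
    return [1.0, float(age)]


def categorical_columns(age, levels):
    # Intercept plus one 0/1 indicator per level except the first (reference) level.
    return [1.0] + [1.0 if age == level else 0.0 for level in levels[1:]]


continuous = [continuous_columns(a) for a in development_ages]
categorical = [categorical_columns(a, development_ages) for a in development_ages]

print(len(continuous[0]))   # 2
print(len(categorical[0]))  # 4
print(categorical[2])       # [1.0, 0.0, 1.0, 0.0]
```

A continuous term forces the development effect to be linear in age on the model scale, while categorical terms give each age its own level at the cost of more coefficients.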

class IncrementalAdditive(DevelopmentBase):
""" The Incremental Additive Method.

This estimator implements the additive method of Schmidt (2006), Section 4.7:
Collaborator

Very helpful!

``i`` and ``gamma_k`` is an incremental loss ratio at development age ``k``
that is common to all accident years. The fitted ``zeta_`` estimates those
common ``gamma_k``; unobserved incrementals are completed as
``zeta_ * sample_weight``. Dollar ``incremental_`` differ by origin because
Collaborator

I expect that we will be doing some refactoring of this method in the near future in order to replicate friedland's frequency/severity methods. This is also the only transformer we have that also serves as a predictor. So let's get these docstrings into the repo now and revise as needed once we settle on the necessary refactor.
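
The docstring's description of the additive method (a common incremental loss ratio per development age, with unobserved incrementals completed as `zeta_ * sample_weight`) can be sketched in plain Python. The exposures and losses below are made up, and the code is a minimal illustration under those assumptions, not the `IncrementalAdditive` implementation.

```python
# Hypothetical exposure (e.g. earned premium) per accident year, oldest first.
exposure = [1000.0, 1200.0, 1500.0]

# Hypothetical observed incremental losses; None marks cells not yet observed.
incremental = [
    [300.0, 200.0, 100.0],
    [360.0, 264.0, None],
    [480.0, None, None],
]

n_ages = len(incremental[0])
zeta = []
for k in range(n_ages):
    observed = [(row[k], v) for row, v in zip(incremental, exposure) if row[k] is not None]
    # Additive estimator: total observed incrementals over total exposure at age k.
    zeta.append(sum(z for z, _ in observed) / sum(v for _, v in observed))

# Complete each unobserved cell as (incremental loss ratio) x (origin exposure).
completed = [
    [z if z is not None else zeta[k] * v for k, z in enumerate(row)]
    for row, v in zip(incremental, exposure)
]

print([round(z, 4) for z in zeta])  # [0.3081, 0.2109, 0.1]
print([[round(x, 1) for x in row] for row in completed])
```

The completed dollar amounts differ by origin only through the exposure, which matches the docstring's remark that `incremental_` varies by origin while the loss ratios are shared.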

.. testoutput::

10
[ 0.002 0.003 -0.01 0.003 0.011 0.008 0.005 -0. -0.002]
Collaborator

love the full arrays


import numpy as np

tri = cl.load_sample("genins")
Collaborator

after reading the 3rd example, I think it may be cleaner to use clrd['comauto'] instead of genins

clrd = clrd[clrd["LOB"].isin(["ppauto", "comauto"])]
dev = cl.TweedieGLM(
design_matrix=(
"LOB+LOB:C(np.minimum(development, 36))"
Collaborator

this complication isn't explained
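
One plausible reading of `C(np.minimum(development, 36))` (an assumption, not confirmed in this thread) is that it caps development age at 36 before treating it as categorical, so early ages get their own level and all later ages share one. The sketch below shows the effect of the cap on a hypothetical set of ages, using the built-in `min` in place of `np.minimum`.

```python
development_ages = [12, 24, 36, 48, 60, 72]

# Cap development at 36 months, element by element.
capped = [min(age, 36) for age in development_ages]

# Distinct categorical levels that remain after capping.
levels = sorted(set(capped))

print(capped)  # [12, 24, 36, 36, 36, 36]
print(levels)  # [12, 24, 36]
```

Under this reading, ages beyond 36 would be distinguished only by the continuous `LOB:development` term in the same formula.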

"LOB+LOB:C(np.minimum(development, 36))"
"+LOB:development+LOB:origin"
),
max_iter=1000,
Collaborator

first time we are passing in a value for max_iter. do we need to explain?


Examples
--------
Features from any triangle axis can enter an sklearn
Collaborator

is this essentially trying to accomplish the same as the 3rd example from glm? if so, let's use similar language to set up the business problem, prior to these descriptions around how the various pipelines accomplish that same problem

.. testoutput::

(6, 1, 10, 9)
[1.7448 0.9854 0.8117 0.6495]
Collaborator

these values seem pretty terrible. I'm not sure if they need to be exponentiated? is this because the model is linear regression? ultimately I think we should keep these examples in the realm of GAAP. I don't want any student to read this and think a linear regressor is a good idea.


.. testoutput::

3.508
Collaborator

@henrydingliu henrydingliu May 21, 2026

in general, please build the doctest around the full parameter array. for this example in particular, wouldn't we at least want to look beyond the 1st ldf_ to demonstrate the difference between cumulative and incremental?

3.508
3.515

Autoregressive features use prior development predictions as covariates.
Collaborator

i don't know if autoregressive is behaving correctly. it's also hard to evaluate based on this test whether the resulting estimator is reasonable. (per prior comment, I'd like to stay within GAAP) we may want to leave it to a future round of enhancements


class CaseOutstanding(DevelopmentBase):
- """A determinisic method based on outstanding case reserves.
+ """ Deterministic development from prior-lag case reserves.
Collaborator

similar to incrementaladditive, I expect we'll be doing some refactoring on this estimator for friedland. We can just get these docstrings in now and revise later.

- cl.MackChainladder().fit(tr)
+ model = cl.MackChainladder().fit(tr)
print(model.ibnr_.to_frame(origin_as_datetime=False).round(1))
print(np.round(model.mack_std_err_.values[0, 0, :, -1], 1))
Collaborator

any chance we can get these two vectors into a single dataframe?

read the Mack standard error off the resulting Triangle.
``predict`` re-applies the fitted age-to-age factors and sigma
estimates to a new triangle without refitting. A common use is
sensitivity testing: scale the reported losses by an adverse factor
Collaborator

seems like a linear relationship by definition. is there a reference to this in the mack paper?
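
The linearity point can be checked for the chainladder point estimate with a few lines of arithmetic. The sketch below uses a made-up latest diagonal and made-up cumulative development factors, and a hypothetical helper `chainladder_ibnr`; it covers only the point-estimate IBNR with the factors held fixed and says nothing about how the Mack standard error responds.

```python
# Hypothetical latest diagonal and fitted cumulative development factors (CDFs).
latest = [5000.0, 4000.0, 2500.0]
cdf = [1.0, 1.1, 1.5]


def chainladder_ibnr(latest_diagonal, cdfs):
    # Point-estimate IBNR: ultimate (latest x CDF) minus latest.
    return [value * factor - value for value, factor in zip(latest_diagonal, cdfs)]


base = chainladder_ibnr(latest, cdf)
stressed = chainladder_ibnr([value * 1.1 for value in latest], cdf)

# With the factors held fixed, IBNR scales by the same 1.1 as the losses.
ratios = [s / b for s, b in zip(stressed, base) if b != 0]
print([round(r, 6) for r in ratios])  # [1.1, 1.1]
```

Because the ratio is the same for every origin with non-zero IBNR, a sensitivity example built only on scaled losses would show the scaling factor and little else.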


2 participants