docs: add development estimator doctest examples #799
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main     #799   +/-   ##
=======================================
  Coverage   86.23%   86.23%
=======================================
  Files          86       86
  Lines        4947     4947
  Branches      643      643
=======================================
  Hits         4266     4266
  Misses        484      484
  Partials      197      197
|
|
|
||
| Examples | ||
| -------- | ||
| A patsy ``formula`` and the built-in ``alpha`` / ``gamma`` / ``iota`` PTF |
thanks for putting some examples together. a new user probably still needs more explanation of what each of the examples is actually accomplishing
|
|
||
| Examples | ||
| -------- | ||
| Basic fit on ``ia_sample`` with exposure on the latest diagonal. |
what are we hoping for a new user to take away from the first example?
|
|
||
| Examples | ||
| -------- | ||
| ``fit_incrementals`` toggles whether the pipeline fits on incrementals |
what does the user take away here? it reads like "look, changing this parameter is actually doing something!". there is an incredible amount of nuance here. depending on the usage, incremental could mean the age to age factor, or could mean the actual incremental dollar amounts.
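To make the two readings concrete, here is a minimal sketch in plain Python. It makes no chainladder calls, and the cumulative figures are invented for illustration. It derives both meanings of "incremental" from a single cumulative origin row:

```python
# Hypothetical cumulative paid losses for one accident year at ages 12, 24, 36, 48.
cumulative = [1000.0, 1800.0, 2160.0, 2268.0]

# Reading 1: incremental dollar amounts (first difference of the cumulative row).
incremental_dollars = [cumulative[0]] + [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]

# Reading 2: age-to-age factors (ratio of successive cumulative values).
age_to_age = [later / earlier for earlier, later in zip(cumulative, cumulative[1:])]

print(incremental_dollars)                # [1000.0, 800.0, 360.0, 108.0]
print([round(f, 2) for f in age_to_age])  # [1.8, 1.2, 1.05]
```

The two series live on different scales (dollars versus ratios), so a docstring that says "fits on incrementals" needs to say which one it means.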
eee0725 to c547ad3
| -------- | ||
| Choose the response scale before fitting a machine-learning development | ||
| model. In this class, ``fit_incrementals=True`` means the scikit-learn model | ||
| is trained on actual incremental dollar amounts. Setting it to ``False`` |
this is an inaccurate statement. when the link function is identity, then it's fitting on actual dollar amounts.
|
the general approach in this batch of PRs is straying from the original intent.
|
Address @henrydingliu feedback: reframe all examples as problem-solution narratives, fix inaccurate fit_incrementals description in DevelopmentML, add autoregressive example, expand test outputs beyond single LDF values, and explain weighted_step Pipeline routing mechanism.
| breakpoints for a piecewise linear development-age trend, and ``iota`` | ||
| defines breakpoints for a piecewise linear calendar-year (diagonal) trend. | ||
|
|
||
| The ``abc`` triangle has 11 accident years (1977-1987) and 11 development |
good use case. we can probably move the first two paragraphs to the beginning of the docstring.
the rest of the explanation, however, isn't technically robust. the two sets of coefficients from the two patsy formulas aren't apples-to-apples (i.e. first coefficient always applies to the first accident year, second coefficient always applies to the second accident year, etc.)
| Use ``IncrementalAdditive`` when incremental losses per unit of exposure | ||
| are expected to be more stable across accident years than age-to-age | ||
| factors. This is common in lines where claim payments are driven by | ||
| exposure volume: for example, a workers' compensation book where |
I don't think this is true. Can we source this back to the paper that defined this method?
|
|
||
| Examples | ||
| -------- | ||
| ``TweedieGLM`` with ``power=1`` and log link implements the |
example not linked to a realistic use case.
|
|
||
| Examples | ||
| -------- | ||
| ``DevelopmentML`` bridges scikit-learn and the chainladder workflow: |
not sure what this means
| ``ldf_`` patterns usable with tail selection, ``Chainladder``, and other | ||
| methods. | ||
|
|
||
| ``fit_incrementals`` controls what the estimator is trained to predict. |
example not based on a realistic use case.
| tells the estimator which triangle columns represent the paid and incurred | ||
| series. | ||
|
|
||
| When case reserving practices have changed recently (e.g., a company |
can we show this in the example?
Expand doctest Examples for TweedieGLM, CaseOutstanding, IncrementalAdditive, DevelopmentML, and Barnett-Zehnwirth to match guide narratives, reviewer feedback, and stable outputs.
Refs casact#704
Co-authored-by: Cursor <cursoragent@cursor.com>
|
@henrydingliu, I think I made an update to the PR, not sure if you've had the chance to review?
|
|
||
| Examples | ||
| -------- | ||
| When many accident years are available but you want a smaller number of |
thanks for crisping up the example narrative. this reads great
|
|
||
| .. testoutput:: | ||
|
|
||
| [2.2854 2.2854 2.2854 2.2854] |
can we perform the test on the full ldf_?
|
|
||
| .. testoutput:: | ||
|
|
||
| 3.491 |
can we perform the test on the full ldf_?
| 3.4906 | ||
| [3.491 3.491 3.491 3.491] | ||
|
|
||
| Patsy R-style formulas set ``design_matrix``; continuous ``development`` |
for continuity, we can describe that continuous terms involve removing the C()
|
|
||
| tri = cl.load_sample("genins") | ||
| glm = cl.TweedieGLM(design_matrix="development + origin").fit(tri) | ||
| print(len(glm.coef_)) |
an explicit comparison of the ldf_ between continuous terms and categorical terms would be more illustrative
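As a partial step toward that comparison, the sketch below counts design-matrix columns by hand for the two codings. It is plain Python that calls neither patsy nor chainladder. It assumes a 10 x 10 annual triangle, an intercept, and treatment (baseline-dropping) coding for `C()` terms, and the labels are illustrative. It does not reproduce any `ldf_` values.

```python
# Assumed 10 x 10 annual triangle; ages and origins are illustrative labels.
development = [12 * k for k in range(1, 11)]  # 12, 24, ..., 120
origin = list(range(2001, 2011))


def dummy_columns(name, levels):
    """Treatment coding: one indicator per level except the first (the baseline)."""
    return [f"C({name})[{level}]" for level in sorted(set(levels))[1:]]


# "C(development) + C(origin)": intercept plus one indicator per non-baseline level.
categorical = (
    ["Intercept"]
    + dummy_columns("development", development)
    + dummy_columns("origin", origin)
)

# "development + origin": intercept plus a single slope per variable.
continuous = ["Intercept", "development", "origin"]

print(len(categorical), len(continuous))  # 19 3
```

Dropping `C()` therefore replaces a free level per age (or origin) with one slope, which is why the fitted patterns from the two formulas can differ so much in shape.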
| class IncrementalAdditive(DevelopmentBase): | ||
| """ The Incremental Additive Method. | ||
|
|
||
| This estimator implements the additive method of Schmidt (2006), Section 4.7: |
| ``i`` and ``gamma_k`` is an incremental loss ratio at development age ``k`` | ||
| that is common to all accident years. The fitted ``zeta_`` estimates those | ||
| common ``gamma_k``; unobserved incrementals are completed as | ||
| ``zeta_ * sample_weight``. Dollar ``incremental_`` differ by origin because |
I expect that we will be doing some refactoring of this method in the near future in order to replicate friedland's frequency/severity methods. This is also the only transformer we have that also serves as a predictor. So let's get these docstrings into the repo now and revise as needed once we settle on the necessary refactor.
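For readers following the docstring above, here is a minimal sketch of the additive completion it describes, in plain Python with invented numbers. It assumes the exposure-weighted estimator (sum of observed incrementals over sum of exposure at each age). The library's fitted `zeta_` may differ depending on estimator options, so treat this as an illustration of the idea rather than a reproduction of `IncrementalAdditive`.

```python
# Hypothetical incremental losses by accident year (rows) and development age (columns).
# None marks cells below the latest diagonal that are not yet observed.
incrementals = [
    [400.0, 200.0, 100.0],
    [450.0, 225.0, None],
    [500.0, None, None],
]
exposure = [1000.0, 1100.0, 1250.0]  # hypothetical exposure per accident year

# One common incremental loss ratio per development age, shared by all accident years.
zeta = []
for k in range(len(incrementals[0])):
    observed = [(row[k], v) for row, v in zip(incrementals, exposure) if row[k] is not None]
    zeta.append(sum(s for s, _ in observed) / sum(v for _, v in observed))

# Unobserved incrementals are completed as zeta_k times that year's exposure.
completed = [
    [cell if cell is not None else zeta[k] * v for k, cell in enumerate(row)]
    for row, v in zip(incrementals, exposure)
]

print([round(z, 4) for z in zeta])
print([[round(c, 1) for c in row] for row in completed])
```

The completed dollar amounts differ by accident year only because the exposures differ; the ratios in `zeta` are the same for every row.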
| .. testoutput:: | ||
|
|
||
| 10 | ||
| [ 0.002 0.003 -0.01 0.003 0.011 0.008 0.005 -0. -0.002] |
love the full arrays
|
|
||
| import numpy as np | ||
|
|
||
| tri = cl.load_sample("genins") |
after reading the 3rd example, I think it may be cleaner to use clrd['comauto'] instead of genins
| clrd = clrd[clrd["LOB"].isin(["ppauto", "comauto"])] | ||
| dev = cl.TweedieGLM( | ||
| design_matrix=( | ||
| "LOB+LOB:C(np.minimum(development, 36))" |
this complication isn't explained
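One way to explain the term: `np.minimum(development, 36)` caps the development age at 36 before `C()` makes it categorical, so every age from 36 onward falls into a single level. The sketch below shows that effect in plain Python, assuming development ages of 12 through 120 months (an assumption about the sample data, not something shown in the diff):

```python
# Assumed development ages in months.
development_ages = [12, 24, 36, 48, 60, 72, 84, 96, 108, 120]

# Equivalent of np.minimum(development, 36), applied element by element.
capped = [min(age, 36) for age in development_ages]

# The distinct levels that C(...) would see after capping.
levels = sorted(set(capped))

print(capped)  # [12, 24, 36, 36, 36, 36, 36, 36, 36, 36]
print(levels)  # [12, 24, 36]
```

Read this way, the formula gives the early ages their own categorical level per LOB and leaves the later ages to the continuous `LOB:development` term, which is the kind of sentence the docstring could state explicitly.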
| "LOB+LOB:C(np.minimum(development, 36))" | ||
| "+LOB:development+LOB:origin" | ||
| ), | ||
| max_iter=1000, |
first time we are passing in a value for max_iter. do we need to explain?
|
|
||
| Examples | ||
| -------- | ||
| Features from any triangle axis can enter an sklearn |
is this essentially trying to accomplish the same as the 3rd example from glm? if so, let's use similar language to set up the business problem, prior to these descriptions around how the various pipelines accomplish that same problem
| .. testoutput:: | ||
|
|
||
| (6, 1, 10, 9) | ||
| [1.7448 0.9854 0.8117 0.6495] |
these values seem pretty terrible. I'm not sure if they need to be exponentiated? is this because the model is linear regression? ultimately I think we should keep these examples in the realm of GAAP. I don't want any student to read this and think a linear regressor is a good idea.
|
|
||
| .. testoutput:: | ||
|
|
||
| 3.508 |
in general, please build the doctest around the full parameter array. for this example in particular, wouldn't we at least want to look beyond the 1st ldf_ to demonstrate the difference between cumulative and incremental?
| 3.508 | ||
| 3.515 | ||
|
|
||
| Autoregressive features use prior development predictions as covariates. |
i don't know if autoregressive is behaving correctly. it's also hard to evaluate based on this test whether the resulting estimator is reasonable. (per prior comment, I'd like to stay within GAAP) we may want to leave it to a future round of enhancements
|
|
||
| class CaseOutstanding(DevelopmentBase): | ||
| """A determinisic method based on outstanding case reserves. | ||
| """ Deterministic development from prior-lag case reserves. |
similar to incrementaladditive, I expect we'll be doing some refactoring on this estimator for friedland. We can just get these docstrings in now and revise later.
| cl.MackChainladder().fit(tr) | ||
| model = cl.MackChainladder().fit(tr) | ||
| print(model.ibnr_.to_frame(origin_as_datetime=False).round(1)) | ||
| print(np.round(model.mack_std_err_.values[0, 0, :, -1], 1)) |
any chance we can get these two vectors into a single dataframe?
| read the Mack standard error off the resulting Triangle. | ||
| ``predict`` re-applies the fitted age-to-age factors and sigma | ||
| estimates to a new triangle without refitting. A common use is | ||
| sensitivity testing: scale the reported losses by an adverse factor |
seems like a linear relationship by definition. is there a reference to this in the mack paper?
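The linearity can be checked with a small sketch in plain Python. It assumes, as the docstring says, that the age-to-age factors are held fixed when applied to the new triangle. The factors and diagonal are invented, and the sketch covers only the IBNR point estimate, not the Mack standard error.

```python
from math import prod

ldf = [1.8, 1.2, 1.05]                    # hypothetical fitted age-to-age factors, held fixed
latest = [2268.0, 2400.0, 1500.0, 900.0]  # hypothetical latest diagonal, oldest origin first


def ibnr(latest_diagonal, factors):
    """IBNR per origin with fixed factors: latest * (CDF - 1)."""
    result = []
    for i, value in enumerate(latest_diagonal):
        # Origin i still has its last i development steps to go.
        cdf = prod(factors[len(factors) - i:])
        result.append(value * (cdf - 1.0))
    return result


base = ibnr(latest, ldf)
stressed = ibnr([1.1 * value for value in latest], ldf)

# With fixed factors, scaling reported losses by 1.1 scales every IBNR by 1.1.
for b, s in zip(base, stressed):
    assert abs(s - 1.1 * b) < 1e-9
```

Because each IBNR is the latest value times a constant, the sensitivity test in the docstring adds information only if it also reports something non-linear in the inputs.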
Summary: Add Sphinx doctest examples for Barnett-Zehnwirth, GLM, IncrementalAdditive, DevelopmentML, Outstanding, and MackChainladder. Split from the larger #786/#792 work and intentionally excludes .github/workflows/sync-main-to-docs.yml. Refs #704
Note
Low Risk
Low risk: changes are almost entirely docstring/documentation updates (examples, wording, and minor typos), with no substantive algorithm or API behavior changes expected.
Overview
Adds doctest-style usage examples and more detailed narrative docs to key estimators: BarnettZehnwirth, TweedieGLM, DevelopmentML, IncrementalAdditive, CaseOutstanding, and MackChainladder.
Also tightens wording/parameter descriptions (e.g., response typos, clarifying case_n_periods) and makes small doc formatting fixes (reST directives, newline at EOF) to support Sphinx doctest execution.
Reviewed by Cursor Bugbot for commit 2f5ee76. Bugbot is set up for automated code reviews on this repo.