Estimation Enhancements #917

dhensle · 2024-12-17T01:21:45Z

Estimation work as part of ActivitySim's Phase 9B development effort.

* Implement multiprocessing for estimation mode
* Implement destination choice sampling in estimation mode
* Change the formatting and data written to EDBs
* Update Larch integration to accept new file formats
* Add functionality to quickly test different specifications
* Improve Larch reporting on estimated models
* Add "predict" functionality for estimated models in Larch
* Add unit tests for the above features
* Update documentation

Estimation enhancements pt1

* pydantic for estimation settings
* allow df as type in config
* fix table_info
* auto ownership
* repair for pydantic
* update for ruff
* updated for simple models
* repair for Pydantic
* simple simulate and location choice
* df is attribute
* scheduling
* stop freq
* test locations
* cdap
* nonmand_and_joint_tour_dest_choice
* nonmand_tour_freq
* fix ci to stop using mamba
* test updates
* use larch6 from pip
* use numba for stop freq
* fix for pandas 1.5
* fix stop freq test for numba
* Sharrow Cache Dir Setting (ActivitySim#893)
* setting necessary filesystem changes from settings file
* set for multiprocessing
* repair github actions
* github action updates (ActivitySim#903)
* script to make data
* unified script for making data
* remove older
* bug
* doc note
* load from parquet if available
* add original alt ids to EDB output when using compact
* fix MP race
* script arg to skip to EDB
* clean up CDAP and blacken
* refactor model_estimation_table_types change to estimation_table_types, to avoid pydantic namespace clash
* repair drop_dupes
* blacken
* location choice with compact
* choice_def for compact
* spec changes for simple-simulate
* re-estimation demo for auto ownership
* clean up status messages
* change name to stop pydantic warnings
* edit configs
* default estimation sample size is same as regular sample size
* allow location alts not in cv format
* dummy zones for location choice
* update scheduling model estimation
* various cleanup
* stop freq
* tidy build script
* update 02 school location for larger example
* update notebook 04
* editable model re-estimation for location choice
* fix test names
* update notebooks
* cdap print filenames as loading
* notebook 07
* tests thru 07
* notebooks 08 09
* build the data first
* runnable script
* change larch version dependency
* keep pandas<2
* notebooks 10 11
* notebook 12
* remove odd print
* add matplotlib
* notebook 13 14
* test all the notebooks
* add xlsxwriter to tests
* notebook 15
* CDAP revise model spec demo
* notebook 16
* notebook 17
* longer timeout
* notebook 18
* notebook 19
* notebook 20
* smaller notebook 15
* configurable est mode setup
* notebook 21
* notebook 22
* config sample size in GA
* notebook 23
* updates for larch and graphviz
* change default to compact
* compare model 03
* test updates
* rename test targets
* repair_av_zq
* move doctor up
* add another repair
* oops

---------

Co-authored-by: David Hensle <51132108+dhensle@users.noreply.github.com>

asiripanich · 2025-05-10T11:50:59Z

Hi @dhensle, do you have an ETA for when this PR will be completed? Has it been decided which version will include this enhancement? I have been testing it with our Victoria ActivitySim implementation, so I have a vested interest in your PR. Thank you! :)

dhensle · 2025-05-14T00:13:49Z

Hi @asiripanich, thank you for your interest and testing! We are just putting the finishing touches on this and hope to have it pulled in by the end of the month.

Any feedback from you on the new features? Now's the time to get any changes you might want put in!

* handle dev versions of Larch
* test stability
* pin multimethod < 2.0
* add availability_expression
* starting est docs
* Resolve package version conflicts (ActivitySim#923)
* limit multimethod version to 2.0 and earlier
* add multimethod version to other settings
* [makedocs] update installer download link
* [makedocs] update branch docs
* GitHub Actions updates (ActivitySim#926)
* use libmamba solver
* add permissions [makedocs]
* add write permission for dev docs [makedocs]
* conda-solver: classic
* trace proto tables if available, otherwise synthetic population (ActivitySim#901)
  Co-authored-by: Jeffrey Newman <jeff@driftless.xyz>
* release instructions (ActivitySim#927)
* use libmamba solver
* add permissions [makedocs]
* add write permission for dev docs [makedocs]
* conda-solver: classic
* include workflow dispatch option for tests
* update release instructions
* add installer build to instructions
* Pin mamba for now, per conda-incubator/setup-miniconda#392
* conda-remove-defaults
* when no unavailability parameters are included
* some general estimation docs
* Use pandas 2 for docbuild environment (ActivitySim#928)
* fix link
* allow failure to import larch
* workflow
* blacken
* try some pins
* speed up docbuild
* use pandas 2 for docs
* oops wrong file
* restore foundation
* Update HOW_TO_RELEASE.md
* refactor(shadow_pricing.py): remove a duplicated `default_segment_to_name_dict` (ActivitySim#930)
* fix typo
* fixing disaggregate accessibility bug in zone sampler
* Revert "fixing disaggregate accessibility bug in zone sampler"
  This reverts commit be5d093.
* notes on size terms
* clean up docbuild
* fix version check
* add some doc
* tidy
* estimation docs
* more on alternative avail
* model evaluation
* add doc on component_model
* documentation enhancements
* larch6 is now larch>6
* branch docs on workflow_dispatch
* missing doc section on model respec

---------

Co-authored-by: Yue Shuai <48269801+yueshuaing@users.noreply.github.com>
Co-authored-by: David Hensle <51132108+dhensle@users.noreply.github.com>
Co-authored-by: amarin <17020181+asiripanich@users.noreply.github.com>
Co-authored-by: Ali Etezady <58451076+aletzdy@users.noreply.github.com>
Co-authored-by: Sijia Wang <wangsijia0628@gmail.com>

asiripanich · 2025-05-21T00:58:11Z

> Hi @asiripanich, thank you for your interest and testing! We are just putting the finishing touches on this and hope to have it pulled in by the end of the month.
>
> Any feedback from you on the new features? Now's the time to get any changes you might want put in!

I have been able to get this enhanced estimation mode to work with our travel survey. The Parquet EDBs are definitely a huge improvement over the CSVs. Thanks for the work!

A few comments:

* I remember the CDAP estimation function wasn't working because my ActivitySim outputs were in Parquet format. I had to add a line to reset the index when reading in Parquet inputs; this is not an issue if you are reading in CSV files.
* I feel like the specification files could be more compact if the coefficient values weren't separated into another file just to specify their constraint values. I understand this would require some work and planning, but how about adding a symbol (e.g., an asterisk *) in front of the coefficient values in the specification file if you want to fix their value? Creating a new label column and coefficient names feels unnecessary and just increases the number of input files that one has to manage, which means a higher chance of making errors. If the label field is required in the specification file, I think it could be automatically generated from the description field by converting the description to underscore case.

Example:

Description	Expression	M	N
Full-time worker alternative-specific constants	ptype == 1	*0.885080091	0.531583624
Part-time worker alternative-specific constants	ptype == 2	*-0.920808727	1.117988879
University student alternative-specific constants	ptype == 3	1.898468936	-0.380144113
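
To make the suggestion concrete, here is a minimal sketch of how such a file might be read. It is only an illustration of the proposal, not existing ActivitySim or Larch behavior: the helper names (`to_label`, `parse_coefficient`, `parse_spec`) are hypothetical, and the output keys `coefficient_name`, `value`, and `constrain` are assumptions chosen to resemble a separate coefficients table.

```python
import csv
import io
import re

# The example rows above, tab-separated. A leading "*" marks a fixed value.
SPEC_TEXT = (
    "Description\tExpression\tM\tN\n"
    "Full-time worker alternative-specific constants\tptype == 1\t*0.885080091\t0.531583624\n"
    "Part-time worker alternative-specific constants\tptype == 2\t*-0.920808727\t1.117988879\n"
    "University student alternative-specific constants\tptype == 3\t1.898468936\t-0.380144113\n"
)


def to_label(description):
    """Convert a description to underscore case (hypothetical labeling rule)."""
    return re.sub(r"[^0-9a-z]+", "_", description.lower()).strip("_")


def parse_coefficient(cell):
    """Return (value, is_fixed) for one spec cell; "*" means the value is fixed."""
    cell = cell.strip()
    is_fixed = cell.startswith("*")
    return float(cell.lstrip("*")), is_fixed


def parse_spec(text):
    """Expand an inline spec into one coefficient record per row and alternative."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    alternatives = reader.fieldnames[2:]  # columns after Description and Expression
    records = []
    for row in reader:
        label = to_label(row["Description"])
        for alt in alternatives:
            value, is_fixed = parse_coefficient(row[alt])
            records.append(
                {
                    "coefficient_name": f"{label}_{alt}",  # assumed naming scheme
                    "value": value,
                    "constrain": "T" if is_fixed else "F",  # assumed flag values
                }
            )
    return records


if __name__ == "__main__":
    for record in parse_spec(SPEC_TEXT):
        print(record)
```

With this reading, the single spec file carries everything the separate coefficients file would otherwise hold, at the cost of a slightly less obvious cell format.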

dhensle and others added 30 commits August 16, 2024 17:33

multiprocess initial commit

44e3c21

blacken

9b29350

parquet format for EDBs

3434c95

adding pkl, fixing edb concat and write

914b9ca

fixing double naming of coefficient files

d2e181f

blacken

c138f0f

fixing missing cdap coefficients file, write pickle function

6d35f9f

compact edb writing, index duplication, parquet datatypes

27c4ce4

sorting dest choice bundles

cd3d07e

adding coalesce edbs as its own step

8a1fa3c

CI testing initial commit

e8c03e6

Merge pull request #1 from dhensle/estimation_enhancements

fe625e2

Estimation enhancements pt1

infer.py CI testing

8d80e2e

estimation sampling for non-mandatory and joint tours

1459e48

adding survey choice to choices_df in interaction_sample

3fd7851

adding option to delete the mp edb subdirs

23ba662

changes supporting sandag abm3 estimation mode

0a1bd5c

running test sandag example through trip dest sample

8a4b281

Estimation Pydantic (#2)

6a50abb

* pydantic for estimation settings * allow df as type in config * fix table_info * repair for Pydantic * df is attribute

Estimation settings pydantic update

45ee4e8

new compact formatting

4af3fa9

handling multiple columns for parquet write

36dfb45

dropping duplicate columns

e4eb045

actually removing duplicate columns

b2972cc

dfs with correct indexes and correct mp sorting

8d4dd37

ignore index on sort for mp coalesce edbs

1fb41a8

updating estimation checks to allow for non-zero household_sample_size

87b414f

Removing estimation.yaml settings that are no longer needed

aa874f6

Merge remote-tracking branch 'upstream/main' into estimation_enhancements

a5e137b

dhensle and others added 6 commits December 13, 2024 16:42

fixing unit tests, setting parquet edb default

af7e67e

one more missed estimation.yaml

99822ca

using df.items for pandas 2 compatibility

1777637

tidy doc

420ed8e

updating edb file name for NMTF

44bf037

updating numba and pandas in the conda env files

8bccf2f

jpn-- and others added 2 commits May 14, 2025 17:29

Merge remote-tracking branch 'upstream/main' into estimation_enhancements

04c43a3

dhensle mentioned this pull request May 15, 2025

Location and modechoice logsum written out in estimation mode #898

Closed

dhensle marked this pull request as ready for review May 15, 2025 18:31

jpn-- requested a review from i-am-sijia May 15, 2025 22:42

dhensle and others added 3 commits May 15, 2025 16:24

handling missing data or availability conditions

ed3ee7f

add docs on locking size terms

4ed400e

Merge branch 'main' into estimation_enhancements

64cd4b7
