I’m the co-volunteer coordinator for NYC FIRST. Every year we are faced with a problem: we want to export the volunteer data including preferences for offseason events. The system provides an export feature but does not include a few fields we want. A few years ago, my friend Norm said “if only we could export those fields.” I’m a programmer; of course we can!
So I wrote him a program to do just this. It’s export-vol-data on GitHub. And fittingly, he “paid” me with free candy from the NYC FIRST office. Once a year we meet, Norm gives his credentials to the program and we wait. And wait. And wait. This year NYC FIRST had more events than ever before, so it took a really long time. I wanted to tune it.
Getting test data
The problems with tuning have been:
- I have no control over when people volunteer for the event. It’s hard to performance test when the data set keeps changing.
- The time period when I have access to the event is not the time period when I have the most free time.
Norm solved these problems by creating a test event for me. I started over the summer, but then got accepted to speak at JavaOne and was really busy getting ready for that. When I went back to it, someone had deleted my test event. Norm solved that problem by creating a new event called “TEST EVENT FOR SOFTWARE DEVELOPMENT – DO NOT ENROLL OR DELETE, please. – FLL”. One person did volunteer for it anyway. But only one, so the data set stayed stable enough to help.
Performance tuning
I tried the following performance improvements based on our experience exporting in April 2017.
- SUCCESS: Run the program on the largest events first. (It’s feasible to manually export the data for small events. Plus, most of their volunteers also volunteered at a larger event.) This allows us to run for the events with the most business value first. It also allows us to abort the program at any time and still have the data that matters most.
- SUCCESS: Skip events and roles with zero volunteers. For some reason, it takes a lot longer to load a page with no volunteers. So skipping these makes the program MUCH faster.
- SKIP: Add parallelization. I wound up not doing this because the program is so fast now.
- FAILED: Switch from Firefox driver to PhantomJS. I knew the site didn’t function with HtmlUnitDriver. I thought maybe it would work with PhantomJS, an in-memory driver with better JavaScript support. Alas, it didn’t.
- FAILED: Try to go directly to URLs with data. FIRST prevents this from working. You can’t simply simulate the REST calls externally.
- SUCCESS: Switch from Firefox driver to Chrome driver. This made a huge difference in both performance and stability. The program would crash periodically in Firefox. I was never able to figure out why. I have retry/resume logic, but having to manually click “continue” makes it slower.
- UNKNOWN: I added support for Headless Chrome in the program. It doesn’t seem noticeably faster though. And it is fun for Norm and me to watch the program “click” through the site. So I left it as an option, but not the default.
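The first two wins in the list boil down to a filter and a sort. Here is a minimal sketch of that idea in Java. It is not the actual code from export-vol-data; the ExportOrder and Event classes, their fields, and the planExport method are made-up names for illustration, and it assumes the volunteer count for each event is already known before the slow pages are loaded.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ExportOrder {

    // Hypothetical holder for an event name and its volunteer count.
    static class Event {
        final String name;
        final int volunteerCount;

        Event(String name, int volunteerCount) {
            this.name = name;
            this.volunteerCount = volunteerCount;
        }
    }

    // Drops events with zero volunteers, then orders the rest largest first,
    // so an aborted run has still covered the most valuable events.
    static List<Event> planExport(List<Event> events) {
        List<Event> plan = new ArrayList<>();
        for (Event event : events) {
            if (event.volunteerCount > 0) {
                plan.add(event);
            }
        }
        plan.sort(Comparator.comparingInt((Event e) -> e.volunteerCount).reversed());
        return plan;
    }
}
```

Calling planExport with events of 3, 0 and 120 volunteers would return the 120-volunteer event first, then the 3-volunteer one, and leave the empty event out entirely. The same filter could be applied per role within an event.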
Results
Like any good programming exercise, some things worked and some didn’t. The program is an order of magnitude faster now than at the start though, so I declare this a success!