Friday, May 05, 2017

Good ideas

One of the talks concentrated on how we should organize analyses instead of endlessly re-inventing the wheel. The speaker showed lists of related analyses, urged us to make sure that all code lived in the shared repository rather than in private sandboxes, and argued that students should learn to have their code reviewed.

All noble goals. (She has led a working group, is one of the young big names in the experiment, and is a pleasure to work with.)

However, several people who've trained more students than she has raised a few objections. Part of the apprenticeship is doing some of the exercises yourself: there's no better way to learn how to minimize a log-likelihood function on a complicated data set than to do it on your own. And then, when you want to try a full-blown analysis, you'll tend to use what you developed and understand best.
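The exercise mentioned above, minimizing a log-likelihood by hand, can be sketched in a few lines. This is an illustrative toy fit (Gaussian model, synthetic data, made-up names), not anything from the talk or the experiment:

```python
# Toy exercise: fit a Gaussian to data by minimizing the negative
# log-likelihood yourself, rather than using a canned fitting package.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
data = rng.normal(loc=2.0, scale=0.5, size=1000)  # synthetic "measurements"

def neg_log_likelihood(params, x):
    mu, sigma = params
    if sigma <= 0:
        return np.inf  # keep the minimizer out of unphysical territory
    # Gaussian NLL, summed over the sample (constant terms dropped)
    return 0.5 * np.sum(((x - mu) / sigma) ** 2) + len(x) * np.log(sigma)

result = minimize(neg_log_likelihood, x0=[0.0, 1.0], args=(data,),
                  method="Nelder-Mead")
mu_hat, sigma_hat = result.x
```

Working through something like this once, including choosing the starting point and noticing how the fit misbehaves when sigma wanders negative, teaches more than calling someone else's fitter ever will.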

Another problem is validation: are you sure you don't have some subtle bugs in your code or your procedures? Collaborations do a lot of cross-checking, but often there's no substitute for a fully independent analysis. Sam Ting reportedly makes sure that his two independent analysis groups don't communicate with each other; allegedly he is the only one who sees both results.

On the other hand, my own analyses over the years would have been improved by better coding practices and review. I kept a record of all my coding errors for a year and found that a plurality were cut-and-paste errors. She's quite right about that.

And private code from somebody's sandbox is hard to maintain, or re-use when the student has graduated and somebody else wants to process a couple more years' data.
