Wednesday, February 09, 2005

Day 5 of the evening Shifts

We inherited a store. Unfortunately we are plagued with DAQ errors, and we just got the bird-chirp that signaled a CMX trip: probably the same wedges that have been giving trouble all this week. The expert hinted that it would probably be OK to turn those off and run without them. Did I mention that our trigger inhibit turns on whenever one of our detectors trips? It makes an utter scramble of the analysis to have different detectors on and off during the same run.

One of the more startling things about standing shifts of this type, as opposed to the offline shifts I had been doing, is that you meet more people that you know--and discover the amazing amount of grey hair that seems to have magically appeared. Bob is our Operations Manager (seated in the far curved table): I don't remember that ring of grey.

We keep getting that ugly "Chung!" with the synthetic voice interrupting the run. The same process also pops up an alarm window on the Ace's monitor: the same garish yellow color that you find on illuminated signs in the cruddier sections of town.

Chasing silicon problems: the Monitoring Ace's instructions say that when one of the silicon monitoring displays turns pink, he should ask the CO to correlate this with a status map. Nothing shows, none of the histograms look all that different from their neighbors. I'm puzzled, and a silicon expert has been paged. We're having crate readout problems, so we're not taking data.

Think of it like a hospital. Doctors arrive, poke the patient here and there, confer gravely in a corner for a while, poke the patient some more, then go off for mysterious activity. And maybe their diagnosis fixes things, and maybe not. The hardware event builder has been coughing up a lung all evening.

An event builder is hardware and software that coordinates the collection of data from all the different readout systems and organizes it into the single block of bits we call an event. Everything had better come from the same beam crossing: it makes no sense to have muons from one interaction and calorimetry from another.
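To make the idea concrete, here is a minimal sketch in Python of the bookkeeping involved. It is not our actual event builder (which is a mix of hardware and software); the function name `build_events`, the fragment layout of (crossing number, source name, payload), and the source names are all made up for illustration. The point is only that fragments are grouped by beam crossing, and that an event missing any of its banks gets flagged rather than quietly passed along.

```python
from collections import defaultdict


def build_events(fragments, expected_sources):
    """Group readout fragments by beam-crossing number.

    fragments: iterable of (crossing, source, payload) tuples (hypothetical layout).
    expected_sources: names of every readout system that must contribute.
    Returns (complete, incomplete): two dicts keyed by crossing number.
    """
    expected = set(expected_sources)
    by_crossing = defaultdict(dict)

    # Collect every fragment under the crossing it claims to come from.
    for crossing, source, payload in fragments:
        by_crossing[crossing][source] = payload

    complete, incomplete = {}, {}
    for crossing, banks in by_crossing.items():
        # An event is usable only if every expected system reported.
        if set(banks) == expected:
            complete[crossing] = banks
        else:
            incomplete[crossing] = banks
    return complete, incomplete


# Illustrative data: crossing 2 never got its calorimeter bank.
fragments = [
    (1, "muon", b"\x01"),
    (2, "muon", b"\x02"),
    (1, "calorimeter", b"\x03"),
]
complete, incomplete = build_events(fragments, ["muon", "calorimeter"])
```

With this toy input, crossing 1 ends up in `complete` with both banks and crossing 2 ends up in `incomplete`, which is the software version of an event turning up without all its data.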

Switched the consumers to a different partition, trying to save the SVMon info. No, I don't know why we had to switch; that's Ace magic as they try to get the readout to work.

People watching: the Monitoring Ace was talking to a young Italian lady. When she was turned sideways to him, she kept her eyes on his face. When she turned to face him, she looked down, up, at him, away, down ...

Snow coming! About an inch, they say, starting at about 11. There's a big red blotch on the radar map that's been heading north-east, so they may be right.

I'm trying to clean up the CO's environment a little bit, trying to make it easier to find the things we're supposed to do. I can't get at the instructions, though--and the instructions need a little work.

Somehow DAQErrMon lost connection with the world, and our main control system got worried. Restarted everything. Again. This is kind of annoying, since I'm supposed to be doing system checks and it takes time to accumulate enough statistics to spot subtle problems.

More little problems: a silicon channel that is going out of tolerance, the plug electron shower maximum detector has more hits above than below (??), and some events turned up without all their data banks!

The silicon detector is made of giant chips of silicon wafer--the same sort of thing computer chips are made of--subdivided into thin strips. When a charged particle goes through a strip it leaves a trail of ions behind, and before they recombine a current flows from the positive to the negative side, and we can detect the result, reading it out rather like a region of memory in a computer. The detector is delicate, needs careful alignment, and needs a great deal of cooling! If it is hit with too many charged particles, the minor damage caused by a single one accumulates to the point where the boundary between the two doped regions becomes blurred, just as it would if you got the thing too hot. And if the particles hit when the voltage is on, the current flow causes it to heat up and blur the boundary even more. The difference between the two regions (you did look at the URL, didn't you?) is that each region contains small amounts of a contaminant (the dopant) which gives the regions different properties. But atoms in a solid can move around, and defects appear, and the boundary between the two regions loses its character--and that region of the chip will fail.

Ack. A man from the Main Control Room came by to give us the bad news in person. One of the quadrupole power supplies for the Main Injector blew up. That's the second such power supply in a month; it isn't an off-the-shelf item, and they need every one. The one that blew up last time wasn't on a quadrupole (and they could run without it), but it blew its door 90 feet away. I remember the old hazard training films. Radiation areas were one of the hazardous zones, but so was the region around the ring power supplies. Two explosions in a month is probably going to bring on a safety review, which may add still more time before we get beam again. I gather the best case is that we might get beam in another week.

We'll see what develops. If there's no beam, there may not be any need for a CO, and the powers-that-be sometimes let the CO go home.

Little things are constantly drifting in and out of tolerance. It isn't that things are built carelessly, but that they were built for use near the bleeding edge, and some of the most critical pieces of apparatus are sitting in a radiation environment with (at the moment) high humidity. And everything is being run 24/7. A lot of "wonderful" "cheap" hardware--commodity PCs come to mind--just isn't made to run 24 hours a day with dust and vibration and the other hazards of life.

Of course some things have been made--let's not say sloppily, but rather with an eye to the bench. Years ago we bought DEC Alpha computer chips and tried to use them in specialized processor boards as part of our readout system. One of the little gotchas was that the timing and level shapes had to be just so for the chip to work reliably. Apparently we didn't throw enough engineering talent into designing every millimeter of the boards, and it took quite a while to debug them. Signals supplied by our bleeding edge electronics didn't always match the specs demanded by the chip.

More and more hardware is being made to high standards. Experiments are lasting longer, tolerances are tighter, and beam time is more expensive. Some of the stuff I've seen pictures of for CMS looks like works of art, and was made robust and easy to pluck in and out of position. It is a bit embarrassing to remember some of our old scintillator boxes--if we didn't get the higher ones tightened just so, we had to use a sledgehammer to force in the bottom ones. (No, we didn't damage the scintillator, or the electronics.) And plywood played an important role in the construction of some of the old drift chambers.

During a lull, our Monitoring Ace was showing the Ace how to use online ticket sites to get lower airfares back to Korea. Something wasn't working quite right there either. (Maybe dumbness in the site that demanded that you use Internet Explorer--I've seen that a lot.)

Now the weather report says 1-3 inches. Not much headed for us on the weather map right now, though.

Done. Now post this and go to sleep.
