I recently looked into some trouble on two of the newer 1&1 "rootserver XXL (cache)" boxen (3.06GHz P4 with Apollo Pro266 chipset), both of which seemed to have major disk I/O problems.
One of them, running a 2.4 series kernel, kept throwing messages like:
hda: dma_timer_expiry: dma status == 0x24
hda: DMA interrupt recovery
hda: lost interrupt
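To get a feel for how often these show up, something like the following could tally them from kernel log lines. This is only a rough sketch: the patterns are lifted from the three messages quoted above, generalizing the device name to any hdX is my assumption, and count_ide_errors is a made-up helper name.

```python
import re
from collections import Counter

# Patterns based on the three messages quoted above.
# Matching any "hdX" device (not just hda) is an assumption.
IDE_PATTERNS = {
    "dma_timer_expiry": re.compile(
        r"\b(hd[a-z]): dma_timer_expiry: dma status == 0x[0-9a-fA-F]+"
    ),
    "dma_interrupt_recovery": re.compile(r"\b(hd[a-z]): DMA interrupt recovery"),
    "lost_interrupt": re.compile(r"\b(hd[a-z]): lost interrupt"),
}


def count_ide_errors(lines):
    """Count IDE DMA/interrupt messages per (device, message type)."""
    counts = Counter()
    for line in lines:
        for name, pattern in IDE_PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), name)] += 1
    return counts
```

Feeding it an open log file (wherever your syslog setup writes kernel messages) or the lines of saved dmesg output should work, since it only needs an iterable of strings.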
The other, with a recent 2.6 kernel, showed no such warnings, but its load went through the roof under heavy disk I/O and the system grew extremely sluggish. Sooner or later the OOM killer would start shooting down processes, although memory usage was negligible, until the system died with a kernel panic. (Perhaps there is a problem with the OOM killer when a swap device is not available? Need to do some debugging here.)
Several attempts at optimizing disk performance didn't produce any improvement, and no amount of system tuning solved the problem.
Finally, on a whim, I tried booting the system with the "noapic" kernel option set, and suddenly all problems disappeared on both systems.
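For anyone wanting to check whether a box is actually running with that option, here is a small sketch. It assumes the running kernel's command line is readable from /proc/cmdline and that the IDE channels appear in /proc/interrupts under names like ide0 and ide1; both function names are mine, purely for illustration.

```python
import re


def has_noapic(cmdline: str) -> bool:
    """True if 'noapic' appears as a separate kernel command-line option."""
    return "noapic" in cmdline.split()


def ide_interrupt_lines(interrupts_text: str):
    """Return the lines of /proc/interrupts-style text that mention an IDE channel.

    Assumes the channels are labelled ide0, ide1, ... in the last column.
    """
    return [
        line.strip()
        for line in interrupts_text.splitlines()
        if re.search(r"\bide\d+\b", line)
    ]


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("noapic set:", has_noapic(f.read()))
    with open("/proc/interrupts") as f:
        for line in ide_interrupt_lines(f.read()):
            print(line)
```

The interrupt controller column on the printed lines should show whether the IDE interrupts are being routed through the IO-APIC or through the legacy PIC.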
Seems something is wrong with the VIA Apollo IDE controller and Linux's interrupt routing via IO-APIC. I don't understand why I haven't seen anyone else report similar trouble; I assume there must be more Linux setups on these 1&1 systems...