OProfile and strace pointed me to the culprit: thousands of time() syscalls per second, combined with a "slow" kernel high-resolution timer (the ACPI PM timer). Under these conditions every time() syscall costs 5 to 6 microseconds, and system CPU time is as high as user CPU time!
Fortunately, most of these time() syscalls do not seem to be needed (I hope so).
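
For anyone who wants to check the per-call cost on their own machine, here is a rough micro-benchmark sketch. It is not part of my patch; the function name avg_time_call_ns is made up for this example, and it assumes a POSIX system with clock_gettime() and CLOCK_MONOTONIC. It calls time() in a tight loop and returns the average cost per call in nanoseconds.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() under strict C modes */
#endif
#include <stddef.h>
#include <time.h>

/* Hypothetical helper (not from the patch): returns the average cost of one
 * time() call in nanoseconds, or -1.0 if iterations <= 0 or the monotonic
 * clock cannot be read. */
double avg_time_call_ns(long iterations)
{
    struct timespec start, end;
    volatile time_t sink = 0;   /* volatile store keeps the loop from being optimized away */

    if (iterations <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    for (long i = 0; i < iterations; i++)
        sink = time(NULL);

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;
    (void)sink;

    /* Elapsed wall time in nanoseconds, divided by the number of calls. */
    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Calling avg_time_call_ns(1000000) and printing the result gives a rough per-call figure. On some systems time() may be answered without entering the kernel at all, so the number can come out far lower than the 5 to 6 microseconds I saw; on older glibc versions you may also need to link with -lrt.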

My simple patch: http://pastie.org/704420