Hacker News
How to sleep a million years (2013) (idea.popcount.org)
208 points by bemmu a year ago | 21 comments

An example usage from the article:

    $ time sleep 12
    real    0m12.003s

    $ time ./fluxcapacitor -- sleep 12
    real    0m0.057s

In the author's words:

    Fluxcapacitor is really good at speeding
    up client/server or protocol tests.

    Actually fluxcapacitor was originally
    created to speed up the sockjs-protocol
    test suite. It wasn't possible to mock
    up a time library - we needed to run
    the tests against any sockjs http server,
    whether it's in erlang, node.js or
    python. It turns out the only way to run
    timeout-related tests in a reasonable
    time is to use fluxcapacitor.

    You might ask how fluxcapacitor works.
    In short - it uses ptrace to catch
    syscalls like clock_gettime or
    gettimeofday and overwrite the kernel
    response with a fake time.
    Additionally it short-circuits
    syscalls that can block for a timeout
    like select or poll. For technical
    details see the README.
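
The "short-circuit blocking syscalls" idea can be pictured as a discrete-event loop: when every process is blocked in a sleep, nothing can happen until the earliest timeout, so the clock may jump straight there. The sketch below is a toy single-process model of that scheduling idea, not fluxcapacitor's code (which intercepts real syscalls with ptrace). The names `run`, `sleeper` and `Process` are hypothetical, and "processes" are modeled as Python generators that yield the number of seconds they want to sleep.

```python
import heapq
import itertools
import time
from typing import Callable, Generator, List, Tuple

# Hypothetical model: a "process" is a generator that yields the number of
# (virtual) seconds it wants to sleep. It reads time via the clock callable
# it is given, standing in for a faked clock_gettime() answer.
Process = Callable[[Callable[[], float]], Generator[float, None, None]]


def run(processes: List[Process]) -> float:
    """Run every process to completion; return the final virtual time.

    Whenever all processes are blocked in a sleep, the clock jumps straight
    to the earliest wake-up instead of waiting for it in real time.
    """
    now = 0.0
    order = itertools.count()  # tie-breaker so the heap never compares generators
    pending: List[Tuple[float, int, Generator[float, None, None]]] = []

    def clock() -> float:
        return now  # always reports the current virtual time

    for make in processes:
        heapq.heappush(pending, (0.0, next(order), make(clock)))

    while pending:
        wake_at, _, gen = heapq.heappop(pending)
        now = max(now, wake_at)  # short-circuit the sleep: jump the clock
        try:
            delay = next(gen)  # run the process until its next sleep request
        except StopIteration:
            continue  # process finished
        heapq.heappush(pending, (now + delay, next(order), gen))
    return now


def sleeper(seconds: float, log: List[float]) -> Process:
    """Hypothetical process: sleep once, then record the virtual time elapsed."""
    def proc(clock: Callable[[], float]) -> Generator[float, None, None]:
        start = clock()
        yield seconds  # the equivalent of sleep(seconds)
        log.append(clock() - start)
    return proc


if __name__ == "__main__":
    log: List[float] = []
    t0 = time.perf_counter()
    final = run([sleeper(12, log), sleeper(3, log)])
    print(final, log, f"{time.perf_counter() - t0:.4f}s of real time")
```

Both sleepers believe 3 and 12 seconds passed, while the loop itself finishes almost instantly, which is the same effect as the `time ./fluxcapacitor -- sleep 12` example.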

This tool seems useful to anyone who needs to test communication protocols, since it helps tickle the timing-related edge cases.

Yeah, this is amazing. It allows for a whole different aspect of troubleshooting (or maybe even fuzzing?). What happens if two different pieces of software have to communicate but can't agree on the time? Is there an annoying caching bug that normally only gets triggered every few days?

(author here) That's the intention! Protocol development, protocol testing. Think leap seconds. We don't support a different timer for each process yet, but that is easy to imagine.

Also, this is somewhat similar in spirit to the Go race detector: it looks for explicit synchronization.

The general idea is that you should be able to run multiple processes within the thing, and that they should be able to just talk to each other freely.

There are a number of "protocol simulators" out there, but I always found them academic. I wanted to build a tool that would let me run, test and measure code coverage of my code without waiting forever for timeouts and timers to fire just to exercise rare branches.

Try using it for consensus debugging, e.g. wrapping a few ZooKeeper or Consul processes with FC. Time is a fun one to debug; clock slips are not fun to debug in prod, even though everyone assumes NTP is always reliable.

Doesn't 0.05s still seem like quite a long runtime for something like this? I wonder what it is doing under the hood that takes so long.

For a start, it seems like they hijack a bunch of clock calls with LD_PRELOAD and then redirect the vDSO ones to real syscalls just so they can interpose with ptrace. Why not do the manipulation in the preloaded library itself?

I guess using the debugger API makes it easier to coordinate a bunch of processes, but you could probably do the same thing with some sort of IPC mechanism between the preloaded targets. In the common case (a single program running under FC), no IPC is needed.

It's unclear to me why they force any sleep duration at all, e.g., 10ms: https://github.com/majek/fluxcapacitor/blob/master/src/main....

It seems like there is room to optimize here, if one were really inspired :-).

> it uses ptrace to catch syscalls like clock_gettime or gettimeofday and overwrite the kernel response with a fake time

I would love to hear more about how this works, or get pointers to where one can learn about doing that kind of thing.

I'd start with this tutorial on how to write a debugger using ptrace; it's not too difficult to go from there to spoofing syscalls: https://eli.thegreenplace.net/2011/01/23/how-debuggers-work-...

This is cool, but it should be used with caution. The time your thread spends sleeping may be reduced to nothing, but the time it spends working won't be. This could lead to all kinds of scenarios (timeouts, deadlocks, etc.) that are only theoretically possible in real time. It may or may not be useful to encounter these cases.

Edit: it's also worth mentioning that, when you have full control of the code, a good way to do this kind of testing is to route all time operations through an interface and inject mock/fake time objects.
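
A minimal sketch of that interface-plus-fake approach, assuming the code under test only ever reads time and sleeps through an injected object. `Clock`, `RealClock`, `FakeClock` and `retry_until` are hypothetical names for illustration, not from any library.

```python
import time
from typing import Callable, Protocol


class Clock(Protocol):
    """The only way the code under test is allowed to touch time."""

    def now(self) -> float: ...

    def sleep(self, seconds: float) -> None: ...


class RealClock:
    """Production implementation backed by the OS clock."""

    def now(self) -> float:
        return time.monotonic()

    def sleep(self, seconds: float) -> None:
        time.sleep(seconds)


class FakeClock:
    """Test double: sleeping only advances a counter, so it returns instantly."""

    def __init__(self, start: float = 0.0) -> None:
        self._now = start

    def now(self) -> float:
        return self._now

    def sleep(self, seconds: float) -> None:
        self._now += seconds


def retry_until(predicate: Callable[[], bool], clock: Clock,
                timeout: float, interval: float) -> bool:
    """Hypothetical code under test: poll predicate until true or timed out."""
    deadline = clock.now() + timeout
    while clock.now() < deadline:
        if predicate():
            return True
        clock.sleep(interval)
    return False
```

With `FakeClock`, a test of a one-hour timeout finishes immediately; production code passes `RealClock()` instead. As the replies note, this only works when every time access in the code base goes through the interface.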

But you don't always have access to the code, so I think this tool is very useful.

Sounds like a good way to discover those timing-dependent scenarios before they happen in production ;)

One problem with creating your own time abstraction class is not only that you need full control of the code base and have to teach everyone new to the project to always use this class; some time-based operations are also quite hard to mock, such as waiting on a thread synchronization primitive or waiting for IO with a timeout. However, those are usually used in domains where you don't want to mock time because, as you say, wall time and processing time depend on each other more there, and you probably want to test everything exactly as it would run in production.

> You might ask how fluxcapacitor works. In short - it uses ptrace to catch syscalls like clock_gettime or gettimeofday and overwrite the kernel response with a fake time.

In Linux, ptrace is not re-entrant. This means you can't use fluxcapacitor on itself, or any other program that uses ptrace, e.g. a debugger.
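
The underlying constraint is that a Linux process can have at most one ptrace tracer at a time; PTRACE_ATTACH on a process that is already traced fails with EPERM. The current tracer is visible as the `TracerPid` field of `/proc/<pid>/status`. A small sketch for checking it (the helper name `tracer_pid` is hypothetical; it returns `None` on systems without `/proc`, such as macOS):

```python
from typing import Optional


def tracer_pid(pid: str = "self") -> Optional[int]:
    """Return the PID of the process ptrace-attached to `pid` (0 if none).

    Linux-only: parses the TracerPid field of /proc/<pid>/status.
    Returns None where /proc is unavailable.
    """
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("TracerPid:"):
                    return int(line.split()[1])
    except FileNotFoundError:
        return None
    return None


if __name__ == "__main__":
    # Run normally this prints 0; under strace or gdb it prints the tracer's PID.
    print(tracer_pid())
```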

I tried patching it for macOS, then quickly gave up.

Here's a patch to get you started: https://gist.github.com/anonymous/bab9c0fcafd9a03acfaa89741f...

I don't know whether it even makes sense to try porting this to macOS, but good luck.

You may want to take a look at task_for_pid and other Mach APIs; ptrace is crippled on macOS.

Finally, sleepsort is practical!

Sleepsort is already the fastest sorting algorithm in some cases. Its complexity is O(max(input) + n). It's a bad algorithm for a CPU, of course, but should be OK for a massively parallel architecture or an FPGA.
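
Where the O(max(input) + n) figure comes from is easier to see if time is treated as discrete ticks: each value wakes at tick `value`, and the clock has to pass through every tick up to the maximum. Seen this way, sleepsort is essentially a counting sort. The sketch below assumes non-negative integer inputs, and `sleepsort_ticks` is a hypothetical name. Under a virtual clock like fluxcapacitor's, the empty ticks would be skipped instead of waited out in real time.

```python
from typing import List


def sleepsort_ticks(values: List[int]) -> List[int]:
    """Simulate sleepsort over discrete time ticks (non-negative ints only).

    Each value "sleeps" for `value` ticks; at every tick we emit the values
    that wake up then. Walking ticks 0..max(values) costs O(max(input)),
    and emitting the items costs O(n).
    """
    if not values:
        return []
    if min(values) < 0:
        raise ValueError("sleep durations must be non-negative")
    # Bucket each value by its wake-up tick, like threads blocked in sleep().
    wake_at: List[List[int]] = [[] for _ in range(max(values) + 1)]
    for v in values:
        wake_at[v].append(v)
    out: List[int] = []
    for tick in range(len(wake_at)):  # the clock advancing one tick at a time
        out.extend(wake_at[tick])
    return out


if __name__ == "__main__":
    print(sleepsort_ticks([3, 1, 2, 0, 3]))
```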

> Unfortunately my operating system (bash?) can't express dates after the year 2550

Can anyone explain what happened here?

Very good question - nothing near that date shows up on this very exhaustive list: http://skeena.net/kb/big%20list%20of%20critical%20dates.html
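
One hypothesis worth checking, and it is only an assumption since neither the article nor the thread confirms it, is a 64-bit nanoseconds-since-epoch counter somewhere in the chain. A signed one runs out in 2262, while an unsigned one runs out in 2554, close to the year the author ran into. The arithmetic (helper name `overflow_date` is hypothetical):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def overflow_date(bits: int, signed: bool) -> datetime:
    """Last instant representable by a nanoseconds-since-epoch counter.

    Microsecond precision is enough here, since only the year matters.
    """
    max_ns = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return EPOCH + timedelta(microseconds=max_ns // 1000)


if __name__ == "__main__":
    print(overflow_date(64, signed=True))   # signed 64-bit nanoseconds
    print(overflow_date(64, signed=False))  # unsigned 64-bit nanoseconds
```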

This may have just made my production cluster WAY faster! ;)

Hey, there's a great song about this! https://www.youtube.com/watch?v=uSgQiltRqNI

Very cool. Unfortunately, it doesn't seem like it captures filesystem effects like 'mtime' or other ways the real time can leak out implicitly.
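
A quick way to see that leak: the kernel stamps a file's `st_mtime` from its own clock, so comparing a freshly written file's mtime with what the process believes the time is reveals any skew. Normally the difference is close to zero; under a tool that fakes only the clock syscalls' answers, the two would disagree. A minimal sketch (`mtime_skew` is a hypothetical helper):

```python
import os
import tempfile
import time


def mtime_skew() -> float:
    """Seconds between a fresh file's mtime and what time.time() reports.

    st_mtime comes from the kernel's clock, not from the clock_gettime()
    answer the process sees, so a faked clock shows up as a large skew.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x")  # the write updates the mtime
        return os.fstat(fd).st_mtime - time.time()
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(f"skew: {mtime_skew():.3f}s")
```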

This is fantastic! I was working on some tests for a scheduler, and I can speed them up using fluxcapacitor!
