ArXiv has an endorsement system for a reason. I would only endorse someone with whom I have a direct academic collaboration or mentorship, since I'm putting my own academic reputation on the line. This is also the standard of almost every serious academic researcher I am aware of. ArXiv is now making an effort to crack down on AI slop by banning accounts that upload low-quality research papers, which is a great initiative. Given what an "endorsement" means, I wish ArXiv would trace such papers back to their endorsers and at least issue warnings, and if this happens multiple times (let's say three), people giving out careless endorsements should also face consequences.

Submitted by /u/AffectionateLife5693
A colleague of mine was recently discussing with me a connectivity monitoring system he is working on. It’s nothing fancy, just sending ICMP Echo Requests to a couple of different servers, and monitoring latency and dropped packet averages over 1-minute, 5-minute, and 15-minute periods. Up came the topic of how this data should be stored. The natural thought was a 512-entry ring buffer, containing entries with separate fields for when each packet was sent and when its reply was received, alongside its source address and sequence number.

12 KiB. Pretty wasteful, right? We can certainly do better. Do we need to keep fields for both sent and received? What we’re really interested in is the latency. We need to know when a packet was sent only up until we know when it was received; at that point, the data we want to keep is received - sent. So why don’t we make it a tagged union?

Not bad, we’ve shaved off an entire page. We can still do better. Nanosecond precision? In our case, ping times are measured in the tens, hundreds, or even thousands of milliseconds. We don’t need to keep all of those extra bits around. If we change the unit from nanoseconds to 100-microsecond increments (0.1 ms), then 43 bits is sufficient for us to keep track of pings for up to 20 years [1]. 20 years is a bit excessive still, but it doesn’t hurt to be at least a little bit future-proof. And received? 8 bits for a true/false value seems altogether too much. The answer: bitfields.

Wait, what? Why haven’t we saved any space? The answer is struct padding. In ping_timestamp_2, there is a single padding byte at the end to satisfy alignment requirements. In ping_timestamp_3, on the other hand, the bits freed by the bitfields simply turn into more padding. So our optimization there didn’t actually save any space. We’re wasting 36 bits of padding. Is there any way we can somehow do better?

We keep track of the source address due to frequent changes while our product is in operation (on a mobile data network). When the address changes, we also reset the sequence number for reasons that aren’t relevant to the current topic.
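To make the struct padding argument above concrete, here is a minimal sketch of what the three layouts might look like. Only the names ping_timestamp_2, ping_timestamp_3, seq_no, elapsed_or_sent_ts, and received come from the text. The name ping_timestamp_1, the src_addr field (assumed to be a 32-bit IPv4 address), the field order, RING_ENTRIES, and ring_bytes are illustrative assumptions. Bit-field layout is implementation-defined, so the sizes in the comments are what GCC and Clang typically produce on common ABIs, not a guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 512 /* ring buffer length quoted in the text */

/* Assumed starting point: both timestamps kept, in nanoseconds. */
struct ping_timestamp_1 {
    uint32_t src_addr; /* hypothetical name: IPv4 source address */
    uint16_t seq_no;
    /* 2 bytes of padding so the next field is 8-byte aligned */
    uint64_t sent;
    uint64_t received;
};

/* One timestamp slot plus a flag saying which meaning it currently has. */
struct ping_timestamp_2 {
    uint64_t elapsed_or_sent_ts;
    uint32_t src_addr;
    uint16_t seq_no;
    bool     received;
    /* 1 byte of tail padding to round the struct up to 16 bytes */
};

/* Bit-fields shrink the members, but the struct is still padded to 16. */
struct ping_timestamp_3 {
    uint64_t elapsed_or_sent_ts : 43; /* 0.1 ms units */
    uint64_t received           : 1;
    uint64_t seq_no             : 16;
    /* 4 unused bits remain in the first 8 bytes */
    uint32_t src_addr;
    /* 4 bytes (32 bits) of tail padding: 36 wasted bits in total */
};

/* These two contain no bit-fields, so their sizes are predictable. */
static_assert(sizeof(struct ping_timestamp_1) == 24, "24 bytes per entry");
static_assert(sizeof(struct ping_timestamp_2) == 16, "16 bytes per entry");

/* Total ring buffer size for a given entry size. */
static inline size_t ring_bytes(size_t entry_size)
{
    return RING_ENTRIES * entry_size;
}
```

Under these assumptions, ring_bytes(sizeof(struct ping_timestamp_1)) gives the 12 KiB starting point and ring_bytes(sizeof(struct ping_timestamp_2)) the 8 KiB figure after the tagged union, while ping_timestamp_3 typically stays at 16 bytes per entry.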
We have seen, in the past, packets with differing source addresses but identical sequence numbers due to be processed by our application at the same time (the joys of asynchronous programming), so the source address serves to disambiguate them.

But there’s another way to disambiguate. An ICMP echo request has a 16-bit identifier field to allow applications to identify which echo request packets were sent by them. Its value is completely arbitrary. On Linux, iputils ping sets it to getpid() & 0xFFFF; on OpenBSD, a random number is used instead. Although it’s 16 bits long, we don’t actually need to use the full 16 bits. There are 4 free bits left in the first 8 bytes of our ping_timestamp_3. Our thought was to use a rolling 4-bit counter that is incremented whenever our source address changes (this is monitored elsewhere in the application), allowing us to uniquely identify [2] which source address the packet came from.

Much better. A whole 8 kilobytes of savings, and down to a single page of data. I have also changed the order of the fields slightly. This is to line seq_no up on a 16-bit boundary, so that loading it is a single ldrh instruction rather than requiring a shift. Similarly, reading from elapsed_or_sent_ts only requires a mask.

In the end, this was a completely pointless exercise. Our application isn’t remotely memory constrained.

I realized there’s a way to “optimize” this slightly further. By switching the order of the received and counter fields, accessing the received bit only requires a shift instruction rather than a shift and a mask.

There’s a slight “issue” [3] with this ordering: received is now much cheaper to access, at the expense of counter needing to mask out the received bit. But we can fix this! We only ever read counter when received is true, i.e. 1. If received were zero, and we could tell the compiler to assume that it were zero, then no mask would be necessary. The solution? Flip the meaning of the received bit.
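A sketch of how the flipped flag might look in a single 64-bit entry follows. The struct name ping_timestamp_packed and the three helper functions are hypothetical. Only elapsed_or_sent_ts, counter, not_received, and seq_no are names from the text. The bit positions in the comments assume GCC or Clang on a little-endian target, where bit-fields are typically allocated from the least significant bit upwards; the C standard leaves this implementation-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the final single-word entry. */
struct ping_timestamp_packed {
    uint64_t elapsed_or_sent_ts : 43; /* bits 0..42: read with a mask only */
    uint64_t counter            : 4;  /* bits 43..46: source address epoch */
    uint64_t not_received       : 1;  /* bit 47: 0 means the reply arrived */
    uint64_t seq_no             : 16; /* bits 48..63: 16-bit aligned */
};

static_assert(sizeof(struct ping_timestamp_packed) == 8,
              "entry is expected to pack into 64 bits");

/* Record an outgoing echo request; sent_ts is in 0.1 ms units. */
static inline void ping_mark_sent(struct ping_timestamp_packed *p,
                                  uint16_t seq_no, unsigned counter,
                                  uint64_t sent_ts)
{
    p->elapsed_or_sent_ts = sent_ts & ((UINT64_C(1) << 43) - 1);
    p->counter = counter & 0xFu; /* rolling 4-bit counter */
    p->not_received = 1;
    p->seq_no = seq_no;
}

/* Replace the send time with the elapsed time once the reply arrives. */
static inline void ping_mark_received(struct ping_timestamp_packed *p,
                                      uint64_t recv_ts)
{
    p->elapsed_or_sent_ts = recv_ts - p->elapsed_or_sent_ts;
    p->not_received = 0;
}

/*
 * Read counter only on the path where not_received is known to be zero.
 * Because the flag sits directly above counter, the compiler may be able
 * to drop the mask on this path; that is a codegen detail, not a promise.
 */
static inline bool ping_counter_if_received(
    const struct ping_timestamp_packed *p, unsigned *counter_out)
{
    if (p->not_received != 0)
        return false;
    *counter_out = (unsigned)p->counter;
    return true;
}
```

Whether the mask really disappears depends on the compiler, target, and optimization level, so inspecting the generated assembly is the only way to confirm it for a given build.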
Now, if the read of counter only ever happens inside of a conditional that checks that not_received is zero, then the compiler is able to completely elide the mask.

[1] The timestamps are taken from the Linux kernel’s monotonic clock, which measures time elapsed since boot. If we were measuring from the Unix epoch, then we would need 51 bits.
[2] As long as the IP address doesn’t change more than 16 times in the period we’re monitoring, which is not something we have ever seen.
[3] I’m using that term loosely; we haven’t even benchmarked anything.
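The 43-bit figure quoted earlier for 20 years of 0.1 ms ticks can be checked with a few lines of C. This is only a sketch: bits_needed, ticks_in_years, and the 365.25-day year are illustrative assumptions rather than anything from the original code.

```c
#include <assert.h>
#include <stdint.h>

#define TICKS_PER_SECOND UINT64_C(10000)    /* one tick = 0.1 ms */
#define SECONDS_PER_YEAR UINT64_C(31557600) /* assumes 365.25-day years */

/* Number of bits needed to represent value (0 for value == 0). */
static inline unsigned bits_needed(uint64_t value)
{
    unsigned bits = 0;
    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}

/* Number of 0.1 ms ticks in the given number of years. */
static inline uint64_t ticks_in_years(uint64_t years)
{
    return years * SECONDS_PER_YEAR * TICKS_PER_SECOND;
}
```

With these definitions, bits_needed(ticks_in_years(20)) evaluates to 43, and a 43-bit field does not overflow until a little under 28 years of uptime.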