> But what if we take two random choices (2-random) and just use LRU between those two choices?
> Also, we can see that pseudo 3-random is substantially better than pseudo 2-random, which indicates that k-random is probably an improvement over 2-random for some value of k. Some k-random policy might be an improvement over DIP.
This sounds very similar to tournament selection schemes in evolutionary algorithms. You can control the amount of selective pressure by adjusting the tournament size.
I think the biggest advantage here is performance. A 1v1 tournament is extremely cheap to run. You don't need to maintain a total global ordering of anything.
It's also similar to load balancing: least requests vs. best of two, the benefit being that you never serve the most loaded backend. I guess the feared failure mode of least requests and of LRU is similar: picking the obvious choice might be the worst choice in certain scenarios (fast failures and cache churning, respectively).
There was an article about this phenomenon with interactive visualisations showing packets moving and load being balanced. And there was an optimal number of random choices; going higher didn't improve things.
The idea of using randomness to extend cliffs really tickles my brain.
Consider repeatedly looping through n+1 objects when only n fit in cache. In that case LRU misses/evicts on every lookup! Your cache is useless and performance falls off a cliff! 2-random turns that performance cliff into a gentle slope with a long tail(?)
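Toy simulation, if anyone wants to see it (all parameters and names are mine, and "2-random" here just means evicting the less recently used of two randomly chosen entries):

```python
import random

def hit_rate(policy, cache_size=100, n_keys=101, loops=50):
    """Loop over n_keys items with a cache_size-slot cache; list order = recency."""
    cache, hits, misses = [], 0, 0
    for _ in range(loops):
        for key in range(n_keys):
            if key in cache:
                hits += 1
                cache.remove(key)
                cache.append(key)              # refresh recency on a hit
            else:
                misses += 1
                if len(cache) >= cache_size:
                    if policy == "lru":
                        cache.pop(0)           # evict the least recently used
                    else:                      # 2-random: LRU between two random entries
                        i, j = random.sample(range(len(cache)), 2)
                        cache.pop(min(i, j))   # lower index = less recently used
                cache.append(key)
    return hits / (hits + misses)

for policy in ("lru", "2-random"):
    print(policy, round(hit_rate(policy), 3))
```

On this looping workload LRU's hit rate is essentially zero, while 2-random keeps a decent fraction of the working set resident.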
I bet this effect happens when people try to be smart and loop through n items, but have too much additional data to fit in registers.
This feels similar to when I heard they use bubble sort in game development.
Bubble sort seems pretty terrible, until you realize that it's interruptible. The set is always a little more sorted than before. So if you have realtime requirements and best-effort sorting, you can sort things between renders and live with the possibility of two things relatively close to each other appearing a little glitched for a frame.
I thought it was insertion sort? Works well on nearly sorted lists.
That's a different problem. To quickly sort a nearly sorted list, we can use insertion sort. However, the goal here is to make progress with as few as one iteration.
One iteration of insertion sort will place one additional element in its correct place, but it leaves the unsorted portion basically untouched.
One iteration of bubble sort will place one additional element into the sorted section and along the way do small swaps/corrections. The % of data in the correct location is the same, but the overall "sortedness" of the data is much better.
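Roughly, the per-frame version looks like this (a sketch; function and field names are made up):

```python
def bubble_pass(items, key=lambda x: x):
    """One left-to-right bubble pass: swap adjacent out-of-order pairs.
    Returns True once the list is fully sorted, so the caller can stop early."""
    swapped = False
    for i in range(len(items) - 1):
        if key(items[i]) > key(items[i + 1]):
            items[i], items[i + 1] = items[i + 1], items[i]
            swapped = True
    return not swapped

# e.g. once per frame, nudge draw calls toward back-to-front depth order:
# sorted_now = bubble_pass(draw_calls, key=lambda d: -d.depth)
```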
That's interesting. I never considered this before. I came across this years ago and settled on insertion sort the first time I tried rendering a waterfall (translucency!). Will have to remember bubble sort for next time.
Quick sort usually uses something like insertion sort when the number of items is low, because the constants are better at low n, even if the big-O complexity isn't as good.
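Something like this, as a sketch (the cutoff value is arbitrary; real libraries tune it):

```python
CUTOFF = 16  # small-n threshold; real implementations tune this empirically

def insertion_sort(a, lo, hi):
    for i in range(lo + 1, hi + 1):
        x, j = a[i], i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi - lo < CUTOFF:              # small ranges: insertion sort's constants win
        insertion_sort(a, lo, hi)
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:                     # partition around the pivot value
        while a[i] < pivot: i += 1
        while a[j] > pivot: j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i, j = i + 1, j - 1
    quicksort(a, lo, j)
    quicksort(a, i, hi)
```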
I have never been able to wrap my head around why 2 random works better in load balancing than leastconn. At least in caching it makes sense why it would work better than another heuristic.
There are a few reasons:
1. Random is the one algorithm that can't be fooled. So even if there's something against number of connections as a load metric, not using that metric alone dampens the problems.
2. There is a lag between selection and actually incrementing the load metric for the next request, meaning that using the load metric alone is prone to oscillation.
3. A machine that's broken (immediately errors all requests) can capture almost all requests, while 2-random means its damage is limited to 2x its weight fraction
4. For requests which are a mix of CPU and IO work, reducing convoying (i.e. many requests in similar phases) is good for reducing CPU scheduling delay. You want some requests to be in CPU-heavy phases while others are in IO-heavy phases; not bunched.
I’m fine with the random part. What I don’t get is why 2 works just as well as four, or square root of n. It seems like 3 should do much, much better and it doesn’t.
It’s one of those things I put in the “unreasonably effective” category.
In code the semantic difference is pretty small between “select one at random” and “select two at random and perform a trivial comparison” - roughly the same difference as to “select three at random and perform two trivial comparisons”. That is, they are all just specific instances of the “best-of-x” algorithm: “select x at random and perform x-1 comparisons”. It’s natural to wonder why going from “best-of-1” to “best-of-2” makes such a big difference, but going from “best-of-2” to “best-of-3” doesn’t.
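To make that concrete, the whole family is basically one line (a sketch; `load` is whatever metric you care about, and the attribute name below is invented):

```python
import random

def best_of(candidates, x, load):
    """Pick x candidates at random and keep the one with the lowest load.
    x=1 is plain random selection, x=2 is the classic two-choices scheme."""
    return min(random.sample(candidates, x), key=load)

# server = best_of(servers, 2, lambda s: s.open_connections)
```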
In complexity analysis however it is the presence or absence of “comparisons” that makes all the difference. “Best-of-1” does not have comparisons, while “best-of-2”, “best-of-3”, etc., do have comparisons. There’s a weaker “selections” class, and a more powerful “selections+comparisons” class. Doing more comparisons might move you around within the internal rankings of the “selections+comparisons” class but the differences within the class are small compared to the differences between the classes.
An alternative, less rigorous intuition: behind door number 1 is a Lamborghini, behind door number 2 is a Toyota, and behind door number 3 is cancer. Upgrading to “best of 2” ensures you will never get cancer, while upgrading again to “best of 3” merely gets you a sweeter ride.
It's because going from 1 to 2 changes the expected worst-case load from asymptotically log to asymptotically log log, and further increases just change a constant.
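That jump is easy to see empirically with a balls-into-bins sketch (sizes arbitrary), where each ball goes to the least-loaded of d random bins:

```python
import random

def max_load(n_bins, n_balls, d):
    """Throw n_balls into n_bins, each into the least-loaded of d random bins."""
    bins = [0] * n_bins
    for _ in range(n_balls):
        target = min(random.sample(range(n_bins), d), key=lambda i: bins[i])
        bins[target] += 1
    return max(bins)

n = 100_000
for d in (1, 2, 3):
    print(d, max_load(n, n, d))  # the drop from d=1 to d=2 dwarfs the drop from d=2 to d=3
```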
I wonder if someone tried a probabilistic "best of 1.5" or similar and if two is just a relatively high number.
If I had to guess it’s related to e. In which case maybe choosing 2 30% of the time and 3 70% of the time is a better outcome.
It's not. You can get good behavior by choosing 1 candidate 90% of the time and 2 candidates 10% of the time.
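That mix is sometimes described as a "(1+β)-choices" process, I believe; a sketch (the β value here is arbitrary):

```python
import random

def pick(servers, load, beta=0.1):
    """With probability beta compare two random servers; otherwise take one at random."""
    if random.random() < beta:
        a, b = random.sample(servers, 2)
        return a if load(a) <= load(b) else b
    return random.choice(servers)
```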
Technically it doesn't; it's just really hard to implement leastconn correctly.
If you had perfect information and could just pick whichever was provably lowest, that would probably work. However, keeping that information up to date also takes effort. And if your information is outdated, it's easy to overload a server that you think doesn't have much to do, or underload one that's long since finished with its tasks. Picking between 2 random servers introduces some randomness without allowing the spread to become huge.
When the cost of different requests varies widely it’s difficult to get it right. When we rolled out Docker I saw a regression in p95 time. I countered this by doubling our instance size and halving the count, which made the number of processes per machine slightly more, instead of way less, than the number of machines. I reasoned that the local load balancing would be a bit fairer, and that proved out in the results.
I'm not 100% sure if it's just load balancing. It would depend on the details of the setup but that situation also allows you to throw more resources at each request.
I mean, obviously there is a point where splitting up the instances doesn't help, because you're just leaving more instances completely idle, or with too few resources to be helpful.
Try giving Marc Brooker’s blog on this a read: https://brooker.co.za/blog/2012/01/17/two-random.html
It is only better than leastconn when you have stale information to base your decision on. If you have perfect, live information, best will always be optimal.
By the time you decide to route to a particular node, conditions on that node might have already changed. So, from what I understand, there can be worst-case scenarios in usage patterns where the same nodes keep getting stressed due to repeatedly stale data in the load balancer. Randomization helps ensure the load is spread out more uniformly.
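A toy way to see that herding (all numbers invented): pretend the balancer's connection counts only refresh every 100 requests, and measure the worst burst any single server gets within one stale window.

```python
import random

def worst_burst(policy, n_servers=10, n_requests=10_000, refresh_every=100):
    """Return the largest number of requests one server received between
    two refreshes of the (stale) connection-count snapshot."""
    actual = [0] * n_servers
    worst = 0
    for _ in range(n_requests // refresh_every):
        snapshot = list(actual)                    # stale view for this window
        window = [0] * n_servers
        for _ in range(refresh_every):
            if policy == "least-conn":
                s = min(range(n_servers), key=lambda i: snapshot[i])
            else:                                  # 2-random against the same stale data
                a, b = random.sample(range(n_servers), 2)
                s = a if snapshot[a] <= snapshot[b] else b
            actual[s] += 1
            window[s] += 1
        worst = max(worst, max(window))
    return worst

for policy in ("least-conn", "2-random"):
    print(policy, worst_burst(policy))  # least-conn herds an entire window onto one server
```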