scheduling off the waiting list

Mon Dec 28 22:20:15 CET 2009

Hi Poul and all,

Nils Goroll wrote:
>>> What I would really like to see is that the waitinglist gets rescheduled when 
>>> the busy object actually becomes available in the cache. I am suspecting this has to do 
>>> with calling HSH_Deref(&Parent) in HSH_Unbusy and/or the fact that HSH_Drop 
>>> calls both Unbusy and Deref, but I don't understand this yet.
>> That is how it is supposed to work, and I believe, how it works.
> 
> Good. Then I am either messing up this behavior with my config, or I've hit a 
> corner case. I need to have a break now, but I will definitely get back to you 
> on this when I have gained new insights.

I'm trying to sort my thoughts on this in public:

- A fundamental issue seems to be that the waitinglist is attached to the object 
head, and if no proper match is found in the cache, we wait for whatever is to 
come, even if this is not what we are going to need.

On the other hand, while the object is busy, not all selection criteria will be 
known a priori (in particular not the Vary header), so this design might just be 
as good as it can be.

- The only way a session can get onto the waiting list is when there is a busy 
object being waited for

- but hsh_rush is not only called when an object gets unbusied (HSH_Unbusy), but 
also whenever it is dereferenced (HSH_Deref)

Call trees are:

cnt_fetch -> HSH_Unbusy -> hsh_rush
               ^     |
              /      |
     HSH_Drop     (parent)
              \      |
               v     v
             HSH_Deref -> hsh_rush

HSH_Deref is called from cache_expire EXP_NukeOne and exp_timer, as well as 
cache_center cnt_hit (if not delivering), cnt_lookup (if it's a pass) and 
cnt_deliver.

HSH_Drop is called from various functions in cache_center.

So basically there are two different scenarios when hsh_rush is called.

* Trigger delivery of an object which just got unbusied
* and trigger delivery of more sessions which did not fire in the first round

The point is that when many sessions are waiting on a busy object, there are 
many reasons for those to be rescheduled even if the object they are waiting for 
has not yet become available - in particular as many different objects may live 
under the object head.
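
To make the effect concrete, here is a toy model in C. It is not Varnish source: the struct, its fields and toy_rush are made up for illustration, and it assumes that every waiter needs the one object that is still busy.

```c
/* Toy model, NOT Varnish source: every name here is hypothetical. */
struct toy_objhead {
    unsigned waiting;   /* sessions parked on the waiting list */
    unsigned busy;      /* objects under this head still being fetched */
    unsigned spurious;  /* wakeups that found the object still busy */
};

/*
 * Wake at most rush_exponent sessions, the way a rush triggered by any
 * unbusy or deref would. Assumption: a woken session that still finds a
 * busy object simply goes back onto the waiting list.
 * Returns the number of sessions that actually made progress.
 */
unsigned toy_rush(struct toy_objhead *oh, unsigned rush_exponent)
{
    unsigned woken = oh->waiting < rush_exponent ? oh->waiting : rush_exponent;

    if (oh->busy > 0) {
        /* Nothing to deliver yet: each wakeup is wasted, sessions re-queue. */
        oh->spurious += woken;
        return 0;
    }
    oh->waiting -= woken;
    return woken;
}
```

With 10 waiters, one busy object and a limit of 3, a rush caused by an unrelated deref wakes 3 sessions and delivers to none of them; only once busy drops to 0 does the same call make progress.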

I think we need to change that.

The only reason why we need to call hsh_rush outside the cnt_fetch->HSH_Unbusy 
case is that we have the rush_exponent, which limits the number of sessions 
rescheduled with each hsh_rush. So one option would be to do away with the 
rush_exponent and let the waiters loose all at once. This would also solve the 
case where, by the time a session gets its thread, the cached content has been 
invalidated, so the session would itself have to fetch again.
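
The arithmetic behind this can be sketched as follows. The helper name is hypothetical, and treating a limit of 0 as "no limit" is my own convention for the sketch, not a statement about how rush_exponent behaves.

```c
/*
 * Hypothetical helper: number of hsh_rush-style calls needed to drain
 * `waiters` sessions when each call wakes at most `per_call` of them.
 * per_call == 0 is used here to mean "no limit" (sketch convention only).
 */
unsigned rush_calls_needed(unsigned waiters, unsigned per_call)
{
    if (waiters == 0)
        return 0;
    if (per_call == 0)
        return 1;   /* everyone is let loose in a single call */
    /* ceiling division, written to avoid overflow in waiters + per_call */
    return waiters / per_call + (waiters % per_call != 0);
}
```

With a per-call limit, a single call from HSH_Unbusy cannot empty a long waiting list, which is why the additional calls from HSH_Deref are needed today; without the limit, one call would do.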

I am not sure about an alternative solution.

When we unbusy an object, we have a good chance that it's actually worth 
rescheduling waiting sessions, but for the other cases, we can't easily tell if 
the session would wait again or not.

What if we noted in the object head the number of busy objects, so that hsh_rush 
would only actually schedule sessions if there are no busy objects left or when 
it is called from cnt_fetch?
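
A minimal sketch of what I have in mind, again with made-up names (sketch_objhead, n_busy, rush_origin) rather than the real structures, and ignoring locking entirely:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the object head; n_busy is the proposed counter. */
struct sketch_objhead {
    unsigned n_busy;    /* busy objects currently under this head */
};

/* Who is asking for the rush: the unbusy path (cnt_fetch) or a deref. */
enum rush_origin { RUSH_FROM_UNBUSY, RUSH_FROM_DEREF };

/* Bookkeeping that would go where an object is inserted busy / unbusied. */
void sketch_mark_busy(struct sketch_objhead *oh)
{
    oh->n_busy++;
}

void sketch_unbusy(struct sketch_objhead *oh)
{
    if (oh->n_busy > 0)
        oh->n_busy--;
}

/*
 * Gate for the rush: always go ahead when an object was just unbusied,
 * otherwise only when no busy object remains under this head.
 */
bool sketch_should_rush(const struct sketch_objhead *oh, enum rush_origin origin)
{
    if (origin == RUSH_FROM_UNBUSY)
        return true;
    return oh->n_busy == 0;
}
```

The idea is that a deref-triggered rush is suppressed while any fetch is still in flight under the same object head, because waiters rescheduled at that point may well end up waiting again.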

Any better ideas?

Thank you for reading,

Nils