[Nottingham] Eager Queues and GNU Hurd.

Jason Cozens jason.cozens at computer.org
Fri Jan 19 21:47:11 GMT 2007


Hi,

I've just registered for a SourceForge project relating to what I
mentioned last night. This is what I told them, I hope it's not too long
for this list:

Eager Queue Protocol
====================

The Problem - Simple Statement
==============================

A pool of processors is available to do work. The work is of a
heterogeneous nature so that the processors can not be organised into
a regular structure. The work requests come from the processors
themselves. Processors may also come and go from the pool.

How can the processors be organised so that the work is scheduled in
a reasonably good way so that there is no centralised scheduling
process.

Solution
========

A protocol that uses UDP is defined called the Eager Queue Protocol.
This protocol works in a similar way to a job exchange. Workers enter
the exchange and join a queue. Jobs are advertised on the job board at
the front of the queue. The worker at the front of the queue takes the
job and leaves the job exchange to sort out the contract with the
employer who advertised the job. Workers can act as both workers and
employers.

In EQP a channel is reserved solely for the purpose of scheduling. The
job exchange is implemented as a packet on the scheduling channel. All
workers broadcast to the scheduling channel and all workers can see the
broadcast packets.

When a worker joins the queue it broadcasts an update packet and enters
the front of the queue. If nothing is happening on the scheduling
channel the worker at the back of the queue moves to the front of the
queue and broadcasts an update. In this way the workers at the front
of the queue are likely to be awake. If a worker misses its broadcast
slot all other workers remove it from the queue assuming it has left the
job exchange.

When a job request is broadcast it is expected that the worker at the
front of the queue will take the job. If this doesn't happen the second
worker takes the job and marks the first worker as having left the job
exchange. If the second worker doesn't take the job the third worker
will take it and mark the first and second workers as having left. This
continues down the queue.

The protocol is then extended to take account of workers having
different capabilities. That is it can support multiple queues and a
worker can be in more than one queue.

EQP is only for scheduling. It doesn't guarantee completion of a job
once scheduled, this must be done out of band. This is similar to a
worker going for a job and getting lost on the way. The employer will
then need to re-advertise the job at the job exchange.

Benefits of Developing the Protocol
===================================

The protocol uses UDP a well established and widely available protocol.
The main benefit of the protocol will be to move away from a centralised
OS Kernel that is responsible for the scheduling of processes. Since
there is no central processor it should be able to make the system fault
tolerant and also have live hardware updates.

It is hoped that one of the main benefits of parallel computing might be
realised, that is that high performance is gained in a heterogeneous
environment by aggregating many cheap processors rather than by the use
of ever more powerful and expensive processors.

>From initial investigations it look as though EQP will be able to
combine with the translators of GNU Hurd.

Rough Outline of What Happens
=============================

The following is a very quick overview of the protocol.

The processors have the following structure.

          O-------
          | O----- Data Channels
          | | O---
          | | |
        |-----------|
        |  Worker   |
        |  Process  |
        |-----------|
             | |
        |-----------|
        | Scheduler |
        |  Process  |
        |-----------|
          |
          O-- Scheduling Channel

Key
===

Queues are represented as lists:

        [Head, . . . ,Tail]

Pn!<Message> represents processor n broadcasting a <Message> to all
the other processors. <Message> is a UDP packet. It will be seen that
the state of the system is constantly being broadcast.

New processors join the front of the queue. This is for fault
tolerance. An idle processor that has just broadcast is more likely to
be awake than a processor that hasn't broadcast for a while.

Idling Eager Queue
==================

The basic mechanism of an eager queue is when all processors are
idling. Processors can come and go from the queue.

The scheduling channel is broken down into equal time units. Processors
listen on the channel. If there is an empty slot the processor at the
back of the queue moves to the front and broadcasts. The first processor
to join the queue listens for two time slots if it hears nothing it
broadcasts a JOIN and then waits broadcasting UPDATE messages every
other time slot.

  P1 Listens.
  -
  P1!JOIN:[P1]                          1st Broadcast
  -
  P1!UPDATE:[P1]                        2nd Broadcast
  -
  P1!UPDATE:[P1]
  P2!JOIN:[P2,P1]
  -
  P1!UPDATE:[P1,P2]
  -
  P2!UPDATE:[P2,P1]
  P3!JOIN:[P3,P2,P1]
  -
  P1!UPDATE:[P1,P3,P2]
  P4!JOIN:[P4,P1,P3,P2]
  -
  P2!UPDATE:[P2,P4,P1,P3]
  P5!JOIN:[P5,P2,P4,P1,P3] X P6!JOIN:[P6,P2,P4,P1,P3]     Collision.
  P6!JOIN:[P6,P2,P4,P1,P3]
  P5!JOIN:[P5,P6,P2,P4,P1,P3]
  -
  P3!UPDATE:[P3,P5,P6,P2,P4,P1]
  -
  X P1!UPDATE:[P1,P3,P5,P6,P2,P4]       P1 fails to broadcast.
  P4!UPDATE:[P4,P3,P5,P6,P2]
  -

Job Requests
============

A job request that is corretly picked up.

  P4!UPDATE:[P4,P3,P5,P6,P2]
  P5!REQUEST:[P4,P3,P6,P2]
  P4!ACCEPT:[P3,P6,P2]
  .
  .
  .
  P3!UPDATE:[P3,P6,P2]
  P4!JOIN:[P4,P3,P6,P2]

  A job request that is missed:

  P4!UPDATE:[P4,P3,P5,P6,P2]
  P5!REQUEST:[P4,P3,P6,P2]
  -
  P3!ACCEPT:[P6,P2]

References
==========

Some earlier ideas in eager queues can be seen at:

        http://developer.novell.com/wiki/index.php/OpenEd_-_Lab_4






More information about the Nottingham mailing list