Class FD_SOCK

java.lang.Object
org.jgroups.stack.Protocol
org.jgroups.protocols.FD_SOCK
All Implemented Interfaces:
Runnable, Lifecycle

public class FD_SOCK extends Protocol implements Runnable
Failure detection protocol based on sockets. Failure detection is ring-based. Each member creates a server socket and announces its address together with the server socket's address in a multicast.

A pinger thread will be started when the membership goes above 1 and will be stopped when it drops below 2. The pinger thread connects to its neighbor on the right and waits until the socket is closed. When the socket is closed by the monitored peer in an abnormal fashion (IOException), the neighbor will be suspected.

The main feature of this protocol is that no ping messages need to be exchanged between any 2 peers, as failure detection relies entirely on TCP sockets. The advantage is that no activity will take place between 2 peers as long as they are alive (i.e. have their server sockets open). The disadvantage is that hung servers or crashed routers will not cause sockets to be closed, therefore they won't be detected.

The costs involved are 2 additional threads: one that monitors the client side of the socket connection (to monitor a peer) and another one that manages the server socket. However, those threads will be idle as long as both peers are running.
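
The socket-based detection described above can be illustrated with plain java.net classes. This is a minimal sketch under stated assumptions, not the FD_SOCK implementation: the class SocketMonitorSketch, its Outcome enum, and the GOODBYE signal byte are hypothetical names chosen for illustration. A monitor blocks on a read from its neighbor's socket; a goodbye byte counts as a regular close, while end-of-stream or an IOException leads to suspicion.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical illustration of socket-based failure detection. */
class SocketMonitorSketch {
    enum Outcome { REGULAR_CLOSE, SUSPECT }

    /** Assumed signal byte that a peer writes before a graceful shutdown. */
    static final int GOODBYE = 9;

    /** Blocks until the connection ends; no traffic flows while the peer is alive. */
    static Outcome awaitClose(Socket toPeer) {
        try (Socket s = toPeer) {
            int b = s.getInputStream().read();
            // Anything other than the goodbye byte (including EOF, -1) is abnormal.
            return b == GOODBYE ? Outcome.REGULAR_CLOSE : Outcome.SUSPECT;
        } catch (IOException e) {
            return Outcome.SUSPECT;
        }
    }

    /** Starts a throwaway peer on the loopback interface and monitors it once. */
    static Outcome demo(boolean graceful) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread peer = new Thread(() -> {
                try (Socket accepted = server.accept()) {
                    if (graceful) {
                        accepted.getOutputStream().write(GOODBYE);
                        accepted.getOutputStream().flush();
                    }
                    // Leaving the try block closes the accepted socket.
                } catch (IOException ignored) {
                    // The server socket was closed before a client connected.
                }
            });
            peer.start();
            Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
            Outcome result = awaitClose(client);
            peer.join();
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("graceful: " + demo(true));
        System.out.println("abrupt:   " + demo(false));
    }
}
```

As the description notes, a hung peer that keeps its socket open would leave awaitClose blocked indefinitely, which is why this style of detection misses hung servers.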

  • Field Details

    • NORMAL_TERMINATION

      protected static final int NORMAL_TERMINATION
    • ABNORMAL_TERMINATION

      protected static final int ABNORMAL_TERMINATION
    • bind_addr

      protected InetAddress bind_addr
    • external_addr

      protected InetAddress external_addr
    • external_port

      protected int external_port
    • get_cache_timeout

      protected long get_cache_timeout
    • cache_max_elements

      protected int cache_max_elements
    • cache_max_age

      protected long cache_max_age
    • suspect_msg_interval

      protected long suspect_msg_interval
    • num_tries

      protected int num_tries
    • start_port

      protected int start_port
    • client_bind_port

      protected int client_bind_port
    • port_range

      protected int port_range
    • keep_alive

      protected boolean keep_alive
    • sock_conn_timeout

      protected int sock_conn_timeout
    • num_suspect_events

      protected int num_suspect_events
    • suspect_history

      protected final BoundedList<String> suspect_history
    • members

      protected volatile List<Address> members
    • suspected_mbrs

      protected final Set<Address> suspected_mbrs
    • pingable_mbrs

      protected final List<Address> pingable_mbrs
    • srv_sock_sent

      protected volatile boolean srv_sock_sent
    • get_cache_promise

      protected final Promise<Map<Address,IpAddress>> get_cache_promise
      Used to rendezvous on GET_CACHE and GET_CACHE_RSP
    • got_cache_from_coord

      protected volatile boolean got_cache_from_coord
    • srv_sock

      protected ServerSocket srv_sock
    • srv_sock_handler

      protected FD_SOCK.ServerSocketHandler srv_sock_handler
    • srv_sock_addr

      protected IpAddress srv_sock_addr
    • ping_dest

      protected Address ping_dest
    • ping_sock

      protected Socket ping_sock
    • ping_input

      protected InputStream ping_input
    • pinger_thread

      protected volatile Thread pinger_thread
    • cache

      Cache of member addresses and their ServerSocket addresses
    • ping_addr_promise

      protected final Promise<IpAddress> ping_addr_promise
    • lock

      protected final Lock lock
    • timer

      protected TimeScheduler timer
    • bcast_task

      protected final FD_SOCK.BroadcastTask bcast_task
    • regular_sock_close

      protected volatile boolean regular_sock_close
    • shuttin_down

      protected volatile boolean shuttin_down
    • log_suspected_msgs

      protected boolean log_suspected_msgs
  • Constructor Details

    • FD_SOCK

      public FD_SOCK()
  • Method Details

    • getMembers

      public String getMembers()
    • getPingableMembers

      public String getPingableMembers()
    • getSuspectedMembers

      public String getSuspectedMembers()
    • getNumSuspectedMembers

      public int getNumSuspectedMembers()
    • getPingDest

      public String getPingDest()
    • getNumSuspectEventsGenerated

      public int getNumSuspectEventsGenerated()
    • isNodeCrashMonitorRunning

      public boolean isNodeCrashMonitorRunning()
    • isLogSuspectedMessages

      public boolean isLogSuspectedMessages()
    • setLogSuspectedMessages

      public FD_SOCK setLogSuspectedMessages(boolean log_suspected_msgs)
    • getClientBindPortActual

      public int getClientBindPortActual()
    • getBindAddress

      public InetAddress getBindAddress()
    • setBindAddress

      public FD_SOCK setBindAddress(InetAddress b)
    • getExternalAddress

      public InetAddress getExternalAddress()
    • setExternalAddress

      public FD_SOCK setExternalAddress(InetAddress e)
    • getExternalPort

      public int getExternalPort()
    • setExternalPort

      public FD_SOCK setExternalPort(int e)
    • getGetCacheTimeout

      public long getGetCacheTimeout()
    • setGetCacheTimeout

      public FD_SOCK setGetCacheTimeout(long g)
    • getCacheMaxElements

      public int getCacheMaxElements()
    • setCacheMaxElements

      public FD_SOCK setCacheMaxElements(int c)
    • getCacheMaxAge

      public long getCacheMaxAge()
    • setCacheMaxAge

      public FD_SOCK setCacheMaxAge(long c)
    • getSuspectMsgInterval

      public long getSuspectMsgInterval()
    • setSuspectMsgInterval

      public FD_SOCK setSuspectMsgInterval(long s)
    • getNumTries

      public int getNumTries()
    • setNumTries

      public FD_SOCK setNumTries(int n)
    • getStartPort

      public int getStartPort()
    • setStartPort

      public FD_SOCK setStartPort(int s)
    • getClientBindPort

      public int getClientBindPort()
    • setClientBindPort

      public FD_SOCK setClientBindPort(int c)
    • getPortRange

      public int getPortRange()
    • setPortRange

      public FD_SOCK setPortRange(int p)
    • keepAlive

      public boolean keepAlive()
    • keepAlive

      public FD_SOCK keepAlive(boolean k)
    • getSockConnTimeout

      public int getSockConnTimeout()
    • setSockConnTimeout

      public FD_SOCK setSockConnTimeout(int s)
    • printSuspectHistory

      public String printSuspectHistory()
    • printCache

      public String printCache()
    • startNodeCrashMonitor

      public boolean startNodeCrashMonitor()
    • init

      public void init() throws Exception
      Description copied from class: Protocol
      Called after a protocol has been created and before the protocol is started. Attributes are already set. Other protocols are not yet connected and events cannot yet be sent.
      Specified by:
      init in interface Lifecycle
      Overrides:
      init in class Protocol
      Throws:
      Exception - Thrown if protocol cannot be initialized successfully. This will cause the ProtocolStack to fail, so the channel constructor will throw an exception
    • start

      public void start() throws Exception
      Description copied from class: Protocol
      Called on JChannel.connect(String); starts work. Protocols are connected and ready to receive events. Will be called from bottom to top.
      Specified by:
      start in interface Lifecycle
      Overrides:
      start in class Protocol
      Throws:
      Exception - Thrown if protocol cannot be started successfully. This will cause the ProtocolStack to fail, so JChannel.connect(String) will throw an exception
    • stop

      public void stop()
      Description copied from class: Protocol
      Called on a JChannel.disconnect(); stops work (e.g. by closing multicast socket). Will be called from top to bottom.
      Specified by:
      stop in interface Lifecycle
      Overrides:
      stop in class Protocol
    • resetStats

      public void resetStats()
      Overrides:
      resetStats in class Protocol
    • up

      public Object up(Event evt)
      Description copied from class: Protocol
      An event was received from the protocol below. Usually the current protocol will want to examine the event type and - depending on its type - perform some computation (e.g. removing headers from a MSG event type, or updating the internal membership list when receiving a VIEW_CHANGE event). Finally, the event is either a) discarded, or b) an event is sent down the stack using down_prot.down() or c) the event (or another event) is sent up the stack using up_prot.up().
      Overrides:
      up in class Protocol
    • up

      public Object up(Message msg)
      Description copied from class: Protocol
      A single message was received. Protocols may examine the message and do something (e.g. add a header) with it before passing it up.
      Overrides:
      up in class Protocol
    • down

      public Object down(Event evt)
      Description copied from class: Protocol
      An event is to be sent down the stack. A protocol may want to examine its type and perform some action on it, depending on the event's type. If the event is a message MSG, then the protocol may need to add a header to it (or do nothing at all) before sending it down the stack using down_prot.down().
      Overrides:
      down in class Protocol
    • run

      public void run()
      Runs as long as there are 2 or more members. Determines the member to be monitored and fetches its server socket address (if n/a, sends a message to obtain it). Then creates a client socket and listens on it until the connection breaks. If it breaks, emits a SUSPECT message. If the connection is closed regularly, nothing happens. In both cases, a new member to be monitored will be chosen and monitoring continues (unless there are fewer than 2 members).
      Specified by:
      run in interface Runnable
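
      Choosing "the member to be monitored" in a ring can be sketched as picking the next entry after the local member, wrapping at the end of the list. This is an illustration only; RingNeighborSketch and rightNeighbor are hypothetical names, and returning null when the local member is not in the list is an assumption rather than documented FD_SOCK behavior.

```java
import java.util.List;

/** Hypothetical illustration of ring-based neighbor selection. */
class RingNeighborSketch {
    /**
     * Returns the member to the right of self, wrapping around at the end,
     * or null if there are fewer than 2 members or self is not a member.
     */
    static <T> T rightNeighbor(List<T> members, T self) {
        if (members == null || members.size() < 2) {
            return null; // nobody to monitor
        }
        int idx = members.indexOf(self);
        if (idx < 0) {
            return null; // assumption: an unknown local member monitors no one
        }
        return members.get((idx + 1) % members.size());
    }
}
```

      With members [A, B, C], A monitors B, B monitors C, and C monitors A. Once B is removed from the list, A's next choice becomes C.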
    • isPingerThreadRunning

      protected boolean isPingerThreadRunning()
    • resetPingableMembers

      protected void resetPingableMembers(Collection<Address> new_mbrs)
    • hasPingableMembers

      protected boolean hasPingableMembers()
    • removeFromPingableMembers

      protected boolean removeFromPingableMembers(Address mbr)
    • printPingableMembers

      protected String printPingableMembers()
    • suspect

      protected void suspect(Set<Address> suspects)
    • unsuspect

      protected void unsuspect(Address mbr)
    • handleSocketClose

      protected void handleSocketClose(Exception ex)
    • startPingerThread

      protected boolean startPingerThread()
      Does *not* need to be synchronized on pinger_mutex because the caller (down()) already has the mutex acquired
    • interruptPingerThread

      protected void interruptPingerThread(boolean sendTerminationSignal)
      Interrupts the pinger thread. The Thread.interrupt() method doesn't seem to work under Linux with JDK 1.3.1 (JDK 1.2.2 had no problems here), therefore we close the socket (setSoLinger has to be set!) if we are running under Linux. This should be tested under Windows. Solaris 8 with JDK 1.3.1 definitely works.

      Oct 29 2001 (bela): completely removed Thread.interrupt(), but used socket close on all OSs. This makes this code portable and we don't have to check for OSs.

    • stopPingerThread

      protected void stopPingerThread()
    • sendPingTermination

      protected void sendPingTermination()
    • sendPingSignal

      protected void sendPingSignal(int signal)
    • startServerSocket

      protected void startServerSocket() throws Exception
      Throws:
      Exception
    • stopServerSocket

      public void stopServerSocket(boolean graceful)
    • setupPingSocket

      protected boolean setupPingSocket(IpAddress dest)
      Creates a socket to dest and assigns it to ping_sock. Also assigns ping_input.
    • teardownPingSocket

      protected void teardownPingSocket()
    • getCacheFromCoordinator

      protected void getCacheFromCoordinator()
      Determines coordinator C. If C is null and we are the first member, return. Else loop: send GET_CACHE message to coordinator and wait for GET_CACHE_RSP response. Loop until valid response has been received.
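
      The request/response loop can be sketched with a blocking queue standing in for the rendezvous between GET_CACHE and GET_CACHE_RSP. This is a hedged illustration: CacheFetchSketch, onResponse, and fetch are hypothetical names, and bounding the loop with maxAttempts is an assumption made so that the sketch always terminates.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of a send-and-wait retry loop. */
class CacheFetchSketch<T> {
    private final BlockingQueue<T> responses = new LinkedBlockingQueue<>();
    private final Runnable sendRequest;

    CacheFetchSketch(Runnable sendRequest) {
        this.sendRequest = sendRequest;
    }

    /** Called by the receiving side when a response arrives. */
    void onResponse(T rsp) {
        responses.offer(rsp);
    }

    /** Sends the request, waits up to timeoutMs for a response, and retries. */
    T fetch(long timeoutMs, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            sendRequest.run();
            try {
                T rsp = responses.poll(timeoutMs, TimeUnit.MILLISECONDS);
                if (rsp != null) {
                    return rsp;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null; // no response within the allowed attempts
    }
}
```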
    • broadcastSuspectMessage

      protected void broadcastSuspectMessage(Address suspected_mbr)
      Sends a SUSPECT message to all group members. Only the coordinator (or the next member in line if the coord itself is suspected) will react to this message by installing a new view. To overcome the unreliability of the SUSPECT message (it may be lost because we are not above any retransmission layer), the following scheme is used: after sending the SUSPECT message, it is also added to the broadcast task, which will periodically re-send the SUSPECT until a view is received in which the suspected process is not a member anymore. The reason is that - at one point - either the coordinator or another participant taking over for a crashed coordinator, will react to the SUSPECT message and issue a new view, at which point the broadcast task stops.
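
      The re-send scheme can be sketched as a small piece of bookkeeping: send once, keep re-sending on every timer tick, and stop once a view no longer contains the suspect. This is a sketch under stated assumptions, not the BroadcastTask implementation; SuspectRebroadcastSketch and its methods are hypothetical names, and the timer is replaced by an explicit tick() call so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical illustration of re-sending SUSPECT until a new view arrives. */
class SuspectRebroadcastSketch<A> {
    private final Set<A> suspects = new LinkedHashSet<>();
    private final Consumer<List<A>> sender; // stands in for the multicast send

    SuspectRebroadcastSketch(Consumer<List<A>> sender) {
        this.sender = sender;
    }

    /** Sends the first SUSPECT immediately and registers it for re-sending. */
    synchronized void addSuspect(A mbr) {
        if (suspects.add(mbr)) {
            sender.accept(Collections.singletonList(mbr));
        }
    }

    /** Invoked periodically by a timer: re-sends all pending suspects. */
    synchronized void tick() {
        if (!suspects.isEmpty()) {
            sender.accept(new ArrayList<>(suspects));
        }
    }

    /** A new view arrived: drop suspects that are no longer members. */
    synchronized void viewInstalled(Collection<A> members) {
        suspects.retainAll(members);
    }

    /** True when nothing is left to re-send, i.e. the task could stop. */
    synchronized boolean isIdle() {
        return suspects.isEmpty();
    }
}
```

      Because a lost SUSPECT is simply sent again on the next tick, the scheme does not depend on a retransmission layer below it.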
    • broadcastUnuspectMessage

      protected void broadcastUnuspectMessage(Address mbr)
    • sendIHaveSockMessage

      protected void sendIHaveSockMessage(Address dst, Address mbr, IpAddress addr)
      Sends or broadcasts an I_HAVE_SOCK response. If 'dst' is null, the response will be broadcast, otherwise it will be unicast back to the requester.
    • fetchPingAddress

      protected IpAddress fetchPingAddress(Address mbr)
      Attempts to obtain the ping_addr first from the cache, then by unicasting a request to mbr, then by multicasting a request to all members.
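
      The cache, unicast, multicast order amounts to trying a list of sources and stopping at the first one that answers. The following is a minimal sketch of that fallback pattern; AddressLookupSketch and lookup are hypothetical names, and each source is modeled as a function that returns null when it has no answer.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of an ordered fallback lookup. */
class AddressLookupSketch {
    /**
     * Asks each source in order and returns the first non-null answer,
     * or null if every source comes up empty. Later (more expensive)
     * sources are not consulted once an earlier one succeeds.
     */
    static <K, V> V lookup(K key, List<Function<K, V>> sources) {
        for (Function<K, V> source : sources) {
            V value = source.apply(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

      Ordering the sources from cheapest to most expensive means a cache hit avoids any network traffic, and the multicast is only used as a last resort.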
    • determinePingDest

      protected Address determinePingDest()
    • marshal

      public static ByteArray marshal(LazyRemovalCache<Address,IpAddress> addrs)
    • unmarshal

      protected Map<Address,IpAddress> unmarshal(byte[] buffer, int offset, int length)
    • determineCoordinator

      protected Address determineCoordinator()
    • signalToString

      protected static String signalToString(int signal)