Class FD_SOCK
A pinger thread will be started when the membership goes above 1 and will be stopped when it drops below 2. The pinger thread connects to its neighbor on the right and waits until the socket is closed. When the socket is closed by the monitored peer in an abnormal fashion (IOException), the neighbor will be suspected.
The main feature of this protocol is that no ping messages need to be exchanged between any 2 peers, as failure detection relies entirely on TCP sockets. The advantage is that no activity will take place between 2 peers as long as they are alive (i.e. have their server sockets open). The disadvantage is that hung servers or crashed routers will not cause sockets to be closed, so such failures go undetected.
The costs involved are 2 additional threads: one that monitors the client side of the socket connection (to monitor a peer) and another one that manages the server socket. However, those threads will be idle as long as both peers are running.
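The idea described above can be illustrated with plain java.net sockets. This is a minimal sketch, not the FD_SOCK implementation: the class name, the Outcome enum, the NORMAL signal value and the rule "anything other than that signal counts as a failure" are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SocketMonitorSketch {
    // Hypothetical signal value for a regular shutdown; not taken from FD_SOCK.
    static final int NORMAL = 9;

    enum Outcome { CLOSED_NORMALLY, SUSPECTED }

    /** Connects to the monitored peer and stays idle in read() until the connection ends. */
    static Outcome monitor(InetAddress host, int port) {
        try (Socket sock = new Socket(host, port);
             InputStream in = sock.getInputStream()) {
            int c = in.read(); // no traffic while the peer is alive
            return c == NORMAL ? Outcome.CLOSED_NORMALLY : Outcome.SUSPECTED;
        } catch (IOException e) {
            return Outcome.SUSPECTED; // connection broke or could not be established
        }
    }

    /** Simulates a peer that accepts one connection and then goes away. */
    static Outcome simulate(boolean graceful) {
        try (ServerSocket srv = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread peer = new Thread(() -> {
                try (Socket s = srv.accept()) {
                    if (graceful) {
                        s.getOutputStream().write(NORMAL); // announce a regular close
                        s.getOutputStream().flush();
                    }
                } catch (IOException ignored) {
                    // nothing to do in this sketch
                }
            });
            peer.start();
            Outcome result = monitor(srv.getInetAddress(), srv.getLocalPort());
            peer.join();
            return result;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(true));
        System.out.println(simulate(false));
    }
}
```

Note that the monitoring side does no work while the connection is open, which matches the "idle threads" cost described above; a hung peer that keeps its socket open would never be reported by this sketch either.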
-
Nested Class Summary
Nested Classes
Modifier and Type, Class: Description
protected class FD_SOCK.BroadcastTask: Task that periodically broadcasts a list of suspected members to the group.
protected static class: Handles a client connection; multiple clients can connect at the same time.
static class
protected class FD_SOCK.ServerSocketHandler: Handles the server side of a client-server socket connection.
-
Field Summary
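Fields
Modifier and Type, Field: Description
protected static final int ABNORMAL_TERMINATION
protected final FD_SOCK.BroadcastTask bcast_task
protected InetAddress bind_addr
protected LazyRemovalCache<Address, IpAddress> cache: Cache of member addresses and their ServerSocket addresses
protected long cache_max_age
protected int cache_max_elements
protected int client_bind_port
protected InetAddress external_addr
protected int external_port
get_cache_promise: Used to rendezvous on GET_CACHE and GET_CACHE_RSP
protected long get_cache_timeout
protected boolean got_cache_from_coord
protected boolean keep_alive
protected final Lock lock
protected boolean log_suspected_msgs
protected static final int NORMAL_TERMINATION
protected int num_suspect_events
protected int num_tries
protected Address ping_dest
protected InputStream ping_input
protected Socket ping_sock
protected Thread pinger_thread
protected int port_range
protected boolean regular_sock_close
protected boolean shuttin_down
protected int sock_conn_timeout
protected ServerSocket srv_sock
protected IpAddress srv_sock_addr
protected FD_SOCK.ServerSocketHandler srv_sock_handler
protected boolean srv_sock_sent
protected int start_port
protected final BoundedList<String> suspect_history
protected long suspect_msg_interval
protected TimeScheduler timer
Fields inherited from class org.jgroups.stack.Protocol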
after_creation_hook, down_prot, ergonomics, id, local_addr, log, policies, stack, stats, up_prot -
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method: Description
protected void broadcastSuspectMessage(Address suspected_mbr): Sends a SUSPECT message to all group members.
protected void
protected Address
protected Address
down: An event is to be sent down the stack.
protected IpAddress fetchPingAddress(Address mbr): Attempts to obtain the ping_addr first from the cache, then by unicasting a request to mbr, then by multicasting a request to all members.
protected void getCacheFromCoordinator(): Determines coordinator C.
long getCacheMaxAge()
int getCacheMaxElements()
int getClientBindPort()
int getClientBindPortActual()
int getExternalPort()
long getGetCacheTimeout()
int getNumSuspectedMembers()
int getNumSuspectEventsGenerated()
int getNumTries()
int getPortRange()
int getSockConnTimeout()
int getStartPort()
long getSuspectMsgInterval()
protected void
protected boolean hasPingableMembers()
void init(): Called after a protocol has been created and before the protocol is started.
protected void interruptPingerThread(boolean sendTerminationSignal): Interrupts the pinger thread.
boolean isLogSuspectedMessages()
boolean isNodeCrashMonitorRunning()
protected boolean isPingerThreadRunning()
boolean keepAlive()
keepAlive(boolean k)
static ByteArray marshal(LazyRemovalCache<Address, IpAddress> addrs)
protected String
protected boolean
protected void resetPingableMembers(Collection<Address> new_mbrs)
void resetStats()
void run(): Runs as long as there are 2 or more members.
protected void sendIHaveSockMessage(Address dst, Address mbr, IpAddress addr): Sends or broadcasts an I_HAVE_SOCK response.
protected void sendPingSignal(int signal)
protected void sendPingTermination()
setCacheMaxAge(long c)
setCacheMaxElements(int c)
setClientBindPort(int c)
setExternalPort(int e)
setGetCacheTimeout(long g)
setLogSuspectedMessages(boolean log_suspected_msgs)
setNumTries(int n)
setPortRange(int p)
setSockConnTimeout(int s)
setStartPort(int s)
setSuspectMsgInterval(long s)
protected boolean setupPingSocket(IpAddress dest): Creates a socket to dest, and assigns it to ping_sock.
protected static String signalToString(int signal)
void start(): This method is called on a JChannel.connect(String); starts work.
boolean startNodeCrashMonitor()
protected boolean startPingerThread(): Does *not* need to be synchronized on pinger_mutex because the caller (down()) already has the mutex acquired.
protected void
void stop(): Called on a JChannel.disconnect(); stops work (e.g. by closing multicast socket).
protected void stopPingerThread()
void stopServerSocket(boolean graceful)
protected void
protected void teardownPingSocket()
unmarshal(byte[] buffer, int offset, int length)
protected void
up: An event was received from the protocol below.
up: A single message was received.
Methods inherited from class org.jgroups.stack.Protocol
accept, addPolicy, addr, addr, afterCreationHook, destroy, down, down, enableStats, getAddress, getComponents, getDownProtocol, getDownServices, getId, getIdsAbove, getLevel, getLog, getName, getPolicies, getProtocolStack, getSocketFactory, getThreadFactory, getTransport, getUpProtocol, getUpServices, getValue, isErgonomics, level, parse, policies, providedDownServices, providedUpServices, removePolicy, requiredDownServices, requiredUpServices, resetStatistics, setAddress, setDownProtocol, setErgonomics, setId, setLevel, setPolicies, setProtocolStack, setSocketFactory, setUpProtocol, setValue, statsEnabled, toString, up
-
Field Details
-
NORMAL_TERMINATION
protected static final int NORMAL_TERMINATION
-
ABNORMAL_TERMINATION
protected static final int ABNORMAL_TERMINATION
-
bind_addr
-
external_addr
-
external_port
protected int external_port -
get_cache_timeout
protected long get_cache_timeout -
cache_max_elements
protected int cache_max_elements -
cache_max_age
protected long cache_max_age -
suspect_msg_interval
protected long suspect_msg_interval -
num_tries
protected int num_tries -
start_port
protected int start_port -
client_bind_port
protected int client_bind_port -
port_range
protected int port_range -
keep_alive
protected boolean keep_alive -
sock_conn_timeout
protected int sock_conn_timeout -
num_suspect_events
protected int num_suspect_events -
suspect_history
-
members
-
suspected_mbrs
-
pingable_mbrs
-
srv_sock_sent
protected volatile boolean srv_sock_sent -
get_cache_promise
Used to rendezvous on GET_CACHE and GET_CACHE_RSP -
got_cache_from_coord
protected volatile boolean got_cache_from_coord -
srv_sock
-
srv_sock_handler
-
srv_sock_addr
-
ping_dest
-
ping_sock
-
ping_input
-
pinger_thread
-
cache
Cache of member addresses and their ServerSocket addresses -
ping_addr_promise
-
lock
-
timer
-
bcast_task
-
regular_sock_close
protected volatile boolean regular_sock_close -
shuttin_down
protected volatile boolean shuttin_down -
log_suspected_msgs
protected boolean log_suspected_msgs
-
-
Constructor Details
-
FD_SOCK
public FD_SOCK()
-
-
Method Details
-
getMembers
-
getPingableMembers
-
getSuspectedMembers
-
getNumSuspectedMembers
public int getNumSuspectedMembers() -
getPingDest
-
getNumSuspectEventsGenerated
public int getNumSuspectEventsGenerated() -
isNodeCrashMonitorRunning
public boolean isNodeCrashMonitorRunning() -
isLogSuspectedMessages
public boolean isLogSuspectedMessages() -
setLogSuspectedMessages
-
getClientBindPortActual
public int getClientBindPortActual() -
getBindAddress
-
setBindAddress
-
getExternalAddress
-
setExternalAddress
-
getExternalPort
public int getExternalPort() -
setExternalPort
-
getGetCacheTimeout
public long getGetCacheTimeout() -
setGetCacheTimeout
-
getCacheMaxElements
public int getCacheMaxElements() -
setCacheMaxElements
-
getCacheMaxAge
public long getCacheMaxAge() -
setCacheMaxAge
-
getSuspectMsgInterval
public long getSuspectMsgInterval() -
setSuspectMsgInterval
-
getNumTries
public int getNumTries() -
setNumTries
-
getStartPort
public int getStartPort() -
setStartPort
-
getClientBindPort
public int getClientBindPort() -
setClientBindPort
-
getPortRange
public int getPortRange() -
setPortRange
-
keepAlive
public boolean keepAlive() -
keepAlive
-
getSockConnTimeout
public int getSockConnTimeout() -
setSockConnTimeout
-
printSuspectHistory
-
printCache
-
startNodeCrashMonitor
public boolean startNodeCrashMonitor() -
init
Description copied from class: Protocol
Called after a protocol has been created and before the protocol is started. Attributes are already set. Other protocols are not yet connected and events cannot yet be sent. -
start
Description copied from class: Protocol
This method is called on a JChannel.connect(String); starts work. Protocols are connected and ready to receive events. Will be called from bottom to top. -
stop
public void stop()
Description copied from class: Protocol
Called on a JChannel.disconnect(); stops work (e.g. by closing multicast socket). Will be called from top to bottom. -
resetStats
public void resetStats()
Overrides: resetStats in class Protocol
-
up
Description copied from class: Protocol
An event was received from the protocol below. Usually the current protocol will want to examine the event type and, depending on its type, perform some computation (e.g. removing headers from a MSG event type, or updating the internal membership list when receiving a VIEW_CHANGE event). Finally, the event is either a) discarded, or b) an event is sent down the stack using down_prot.down(), or c) the event (or another event) is sent up the stack using up_prot.up(). -
up
Description copied from class: Protocol
A single message was received. Protocols may examine the message and do something (e.g. add a header) with it before passing it up. -
down
Description copied from class: Protocol
An event is to be sent down the stack. A protocol may want to examine its type and perform some action on it, depending on the event's type. If the event is a message MSG, then the protocol may need to add a header to it (or do nothing at all) before sending it down the stack using down_prot.down(). -
run
public void run()
Runs as long as there are 2 or more members. Determines the member to be monitored and fetches its server socket address (if n/a, sends a message to obtain it). Then creates a client socket and listens on it until the connection breaks. If it breaks, emits a SUSPECT message. If the connection is closed regularly, nothing happens. In both cases, a new member to be monitored will be chosen and monitoring continues (unless there are fewer than 2 members). -
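Choosing "the member to be monitored" as the neighbor on the right can be sketched as a ring lookup over the list of pingable members. This is an illustrative sketch only; the class and method names are hypothetical, and members are represented as plain strings rather than Address objects.

```java
import java.util.List;

class RingNeighborSketch {
    /**
     * Returns the member that follows 'self' in 'members', wrapping around at the end.
     * Returns null if 'self' is not in the list or there is no other member to monitor.
     */
    static <T> T neighborToTheRight(List<T> members, T self) {
        int idx = members.indexOf(self);
        if (idx < 0 || members.size() < 2)
            return null;
        return members.get((idx + 1) % members.size());
    }

    public static void main(String[] args) {
        List<String> members = List.of("A", "B", "C");
        System.out.println(neighborToTheRight(members, "A")); // B
        System.out.println(neighborToTheRight(members, "C")); // A (wraps around)
    }
}
```

With this scheme, removing a suspected member from the list before the next lookup makes the monitor move on to the following member in the ring.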
isPingerThreadRunning
protected boolean isPingerThreadRunning() -
resetPingableMembers
-
hasPingableMembers
protected boolean hasPingableMembers() -
removeFromPingableMembers
-
printPingableMembers
-
suspect
-
unsuspect
-
handleSocketClose
-
startPingerThread
protected boolean startPingerThread()
Does *not* need to be synchronized on pinger_mutex because the caller (down()) already has the mutex acquired. -
interruptPingerThread
protected void interruptPingerThread(boolean sendTerminationSignal)
Interrupts the pinger thread. The Thread.interrupt() method doesn't seem to work under Linux with JDK 1.3.1 (JDK 1.2.2 had no problems here), therefore we close the socket (setSoLinger has to be set!) if we are running under Linux. This should be tested under Windows. (Solaris 8 and JDK 1.3.1 definitely work.)
Oct 29 2001 (bela): completely removed Thread.interrupt(), but used socket close on all OSs. This makes this code portable and we don't have to check for OSs.
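The technique of stopping a blocked reader by closing its socket, rather than calling Thread.interrupt(), can be demonstrated with standard java.net classes. The sketch below is an assumption-laden illustration (hypothetical class and method names, a fixed 200 ms wait to let the reader block), not the code used by FD_SOCK.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicBoolean;

class SocketCloseInterruptSketch {
    /** Returns true if closing the socket released a thread that was blocked in read(). */
    static boolean closeUnblocksReader() {
        try (ServerSocket srv = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(srv.getInetAddress(), srv.getLocalPort());
             Socket accepted = srv.accept()) {
            AtomicBoolean unblocked = new AtomicBoolean(false);
            Thread reader = new Thread(() -> {
                try {
                    client.getInputStream().read(); // blocks: the peer never writes
                } catch (IOException e) {
                    unblocked.set(true);            // raised once the socket is closed
                }
            });
            reader.start();
            Thread.sleep(200);  // best effort: give the reader time to block
            client.close();     // closing from another thread ends the blocked read
            reader.join(5000);
            return unblocked.get() && !reader.isAlive();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reader released by close: " + closeUnblocksReader());
    }
}
```

This relies on the documented behavior of Socket.close(): any thread blocked in an I/O operation on the socket throws a SocketException, independent of the operating system.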
-
stopPingerThread
protected void stopPingerThread() -
sendPingTermination
protected void sendPingTermination() -
sendPingSignal
protected void sendPingSignal(int signal) -
startServerSocket
Throws: Exception
-
stopServerSocket
public void stopServerSocket(boolean graceful) -
setupPingSocket
Creates a socket to dest and assigns it to ping_sock. Also assigns ping_input. -
teardownPingSocket
protected void teardownPingSocket() -
getCacheFromCoordinator
protected void getCacheFromCoordinator()
Determines coordinator C. If C is null and we are the first member, return. Otherwise loop: send a GET_CACHE message to the coordinator and wait for the GET_CACHE_RSP response, until a valid response has been received. -
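The request/response loop can be sketched as follows. Everything here is hypothetical: the CacheRequest interface stands in for "send GET_CACHE and wait for GET_CACHE_RSP", members and socket addresses are plain strings, the early return is one reading of the "no coordinator" case, and the maxAttempts bound is an addition for the example (the description above loops until a valid response arrives).

```java
import java.util.Map;
import java.util.function.Supplier;

class CacheFetchSketch {
    /** Hypothetical stand-in for sending GET_CACHE and awaiting GET_CACHE_RSP; null means timeout. */
    interface CacheRequest {
        Map<String, String> sendAndAwait(String coordinator, long timeoutMs);
    }

    static Map<String, String> fetchCache(Supplier<String> coordinator, String self,
                                          CacheRequest request, long timeoutMs, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String coord = coordinator.get();   // re-determined on every pass
            if (coord == null || coord.equals(self))
                return null;                    // assumption: nobody else to ask
            Map<String, String> rsp = request.sendAndAwait(coord, timeoutMs);
            if (rsp != null)
                return rsp;                     // a valid response ends the loop
        }
        return null;                            // bound reached (assumption of this sketch)
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Map<String, String> cache = fetchCache(() -> "A", "B",
                (coord, timeout) -> ++calls[0] < 3 ? null : Map.of("A", "10.0.0.1:9000"),
                500, 5);
        System.out.println(cache + " after " + calls[0] + " attempts");
    }
}
```

Re-determining the coordinator on each pass means the loop also copes with a coordinator that changes while the request is outstanding.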
broadcastSuspectMessage
Sends a SUSPECT message to all group members. Only the coordinator (or the next member in line if the coord itself is suspected) will react to this message by installing a new view. To overcome the unreliability of the SUSPECT message (it may be lost because we are not above any retransmission layer), the following scheme is used: after sending the SUSPECT message, it is also added to the broadcast task, which will periodically re-send the SUSPECT until a view is received in which the suspected process is not a member anymore. The reason is that - at one point - either the coordinator or another participant taking over for a crashed coordinator, will react to the SUSPECT message and issue a new view, at which point the broadcast task stops. -
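The re-send scheme described above can be sketched as a small task that keeps a set of suspects, re-broadcasts it on every tick, and drops suspects once a view no longer contains them. The class, its methods and the use of strings for members are hypothetical; this is not the FD_SOCK.BroadcastTask code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class SuspectRebroadcastSketch implements Runnable {
    private final Set<String> suspects = new LinkedHashSet<>();
    private final Consumer<List<String>> sender; // stand-in for "send SUSPECT to the group"

    SuspectRebroadcastSketch(Consumer<List<String>> sender) {
        this.sender = sender;
    }

    synchronized void addSuspect(String member) {
        suspects.add(member);
    }

    /** New view received: suspects that are no longer members need no further re-sends. */
    synchronized void onView(Collection<String> members) {
        suspects.retainAll(members);
    }

    synchronized boolean isIdle() {
        return suspects.isEmpty();
    }

    /** One periodic tick: re-send the current suspect list, if there is one. */
    @Override
    public synchronized void run() {
        if (!suspects.isEmpty())
            sender.accept(new ArrayList<>(suspects));
    }

    public static void main(String[] args) {
        SuspectRebroadcastSketch task =
                new SuspectRebroadcastSketch(list -> System.out.println("SUSPECT " + list));
        task.addSuspect("C");
        task.run();                      // re-sent while C is still in the view
        task.onView(List.of("A", "B"));  // C was excluded from the new view
        task.run();                      // nothing left to send
        System.out.println("idle: " + task.isIdle());
    }
}
```

In a running system such a task would typically be scheduled periodically, for example with ScheduledExecutorService.scheduleWithFixedDelay, and cancelled once isIdle() returns true.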
broadcastUnuspectMessage
-
sendIHaveSockMessage
Sends or broadcasts an I_HAVE_SOCK response. If 'dst' is null, the response will be broadcast, otherwise it will be unicast back to the requester. -
fetchPingAddress
Attempts to obtain the ping_addr first from the cache, then by unicasting a request to mbr, then by multicasting a request to all members. -
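The three-step lookup order (cache, then unicast, then multicast) can be sketched with two pluggable request functions. The class name, the string-based addresses and the decision to store a successful answer in the cache are assumptions of this example, not statements about FD_SOCK.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class PingAddressLookupSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, Optional<String>> unicastRequest;   // hypothetical: ask the member itself
    private final Function<String, Optional<String>> multicastRequest; // hypothetical: ask the whole group

    PingAddressLookupSketch(Function<String, Optional<String>> unicastRequest,
                            Function<String, Optional<String>> multicastRequest) {
        this.unicastRequest = unicastRequest;
        this.multicastRequest = multicastRequest;
    }

    Optional<String> fetch(String member) {
        String cached = cache.get(member);
        if (cached != null)
            return Optional.of(cached);               // step 1: cache hit
        Optional<String> addr = unicastRequest.apply(member);   // step 2: unicast
        if (!addr.isPresent())
            addr = multicastRequest.apply(member);    // step 3: multicast fallback
        addr.ifPresent(a -> cache.put(member, a));    // assumption: remember the answer
        return addr;
    }

    public static void main(String[] args) {
        PingAddressLookupSketch lookup = new PingAddressLookupSketch(
                m -> Optional.empty(),                // the unicast gets no answer
                m -> Optional.of("10.0.0.2:7800"));   // the multicast is answered
        System.out.println(lookup.fetch("B"));
        System.out.println(lookup.fetch("B"));        // served from the cache this time
    }
}
```

Ordering the steps from cheapest to most expensive keeps group-wide traffic for the case where neither the cache nor the member itself can supply the address.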
determinePingDest
-
marshal
-
unmarshal
-
determineCoordinator
-
signalToString
-