$Id: specification.txt,v 1.20 2001/04/05 22:36:54 ned21 Exp $ Specification Document Place where I throw design decisions in a random order --------------------------------------------------------------------- Network Delay Measurement The sensible way of measuring network delay would be to make use of "ICMP echo" messages, so the server wouldn't have to run any special daemon. Unfortunately, you have to be root to open the raw sockets required to use ICMP (/bin/ping is suid) so I'm going to have to write a server-client combo that uses UDP. Each UDP packet should contain an 8 bit sequence number. The server simply echoes back a UDP packet containing the same number. Server: A simple UDP echo daemon. Will be non-concurrent so as to make programming easier and also to increase accuracy of measurements (want to measure network, not server load). HOWEVER, we do want to measure server load in a way. Need to think of a way to do this (or ignore it...) Client: Modified echo client. Send packets at 2s intervals. Stop sending once we get three back. Wait 2s. Take mean RTT of all packets that have returned. Return that value. NB: Need to know value for "timeout", after which we assume server is dead. This needs to take many different types of network into account. 30s w/o receiving a response it probably sufficient (if RTT is greater than this than network is too slow to be usable anyway!) (23/11/00) After writing the server and part of the client, I realised that in order to do this properly I need a concurrent client. It's much more sensible to pass the procedure a list of hosts to test and return the best one and I need to concurrently listen and write to the socket. A little investigation of Comer & Stevens shows a way of measuring network throughput by using TCP (chapter 16). This seems much more sensible, esp. as I've already written a TCPechod in module1 ! --------------------------------------------------------------------- Port numbers Since I don't really want my servers to *have* to be root, there's always the possibility that the port I choose will be taken. Therefore I propose to make port_base (default=1024) a parameter over the server name, and if connection fails on one port, it should try port+1. ping: 7 Initial client connections: 8 Initial server connections: 9 (03/01/01) This is actually pretty pointless since if something else is using the first port then the client probably won't know this - it will just think the server is not responding. Therefore, in the list of servers we will have a field for port number. This will be the port for initial client connections, the port for initial server connections will be this+1 and the port for network measurements will be this+2. If the server cannot bind all these ports at start up it will complain and choose a different "base port". To avoid confusing, canonical representation of a server will always be a (servername, client connection port) tuple. --------------------------------------------------------------------- Modules Common: - msg parser object Server: - main() loop: listen for new connections - client_connection object: 1 per connected client - server_connection object: 1 per server we're connected to - network delay measurer - database manager object - echo server (for measuring network delay) Client (object): - main(): listen for connections from other objects - server_connection object - network delay measurer - send messages - process received messages --------------------------------------------------------------------- User Information Database Three parts: user info (controlled by , contact list (i.e. list of people user wishes to be notified of status changes) and signature of the first two to prevent tampering by server admin. UserID (64 bit number) - could just use public key but this makes "insecure logins" harder to implement. NickName Name E-mail Location (e.g. "Cambridge, UK"; "UK"; etc) Website Status StatusMessage IPaddr Port EventCount ( unsigned longbool ) PublicKey Need to leave space in database for more fields - e.g. "password" on a "Trusted" Server. ==> use XML for database format Make each user record the same as what we send as ... Separate database for contact lists: UserID followed by comma separated list of UserIDs that user has on their contact list. NB: Since XML is case sensitive, tags must appear as they are written above. --------------------------------------------------------------------- Message Parsing Object The aim is for the program (client or server) to receive a "message" as a string, pass this to the parser which returns a the information in an easy to manipulate format. Most of the message formats I have defined are extremely shallow and don't contain nested data. Therefore, it seems sensible to return a vector which has the following format: [0] = version number of client that sent message [1] = IP address of sender [2] = message type [n] for n>2 = dependent on message type, but broadly: - for odd n: [n] is the name of the tag - for even n: [n] is the data stored in the tag (as a string) NB: This spec forces a different tag for each message - can't multiplex messages. Remember: XML tags are case sensitive! (28/01/01) Realised that this return format is very wasteful! A map (use multimap to be safe) would be much more efficient. Rewrote MessageParser to do this... In the new format there are a number of "special" keys: msgtype - type of message (e.g. UserInfo, txt, etc ) ipaddr - IP Address of sender version - Version number of protocol used --------------------------------------------------------------------- User Registration Need to find a way of allocating a UserID. Can't use locking as we might get a "netsplit". Therefore, we allocate a UserID semi-randomly, perhaps based on IP and then do a check to make sure we don't have a user with that ID already. (Invent some sort of IP->UserID hash). --------------------------------------------------------------------- UserDatabase Object This object will log every transaction, primarily for debugging purposes. When a server logs into the network, it will need the updated database files. Rather than sending the whole database (which would cause problems in the event of a netsplit) each database will contain a "LastModified" field. The server-server will scan its database files for a "LastModified" time > the one supplied by the server-client and then send a patch file. (see "GNU patch") Creating and sending the patch file The easiest way to do this seems to be for the UserDatabase object to do the scanning and return a "patch" object. The server_connection object then sends this over the network to a receiver which generates the patch files and calls "patch". OTOH, maybe it's just easier if the Database object returns a vector< string > of the changed UINs which is then sent over the network and processed by the Database object at that end. Of course, since we cannot rely on servers to have synchronised clocks, we have to find an alternative method of working out which parts of a servers database are out of date. The best way of doing this seems to be to keep an "event" count for each user, which is incremented each time their information is updated. (So we can keep separate event counts for UserInfo and Contact list). Two problems: 1. Overflow. Solution: Set an "overflow" bit. Each time we overflow we invert the bit. As long as we don't overflow twice in a short period of time, this should work. => make counter a 32-bit long and a bool. 2. Knowing which records to update. We either have to send the whole database across the network and do the comparison at the server-client end (using diff and patch?) or we have to send a list of UIDs and their most recent event counts, compare them at the server-server end and send back only the required entries. The first approach requires less work from the CPU, but more network bandwidth (what if the server only goes down briefly?). The second requires the whole database to be parsed three times. Since even with the first method, the server-client still has to parse the database three times anyway, the second method is preferable. --------------------------------------------------------------------- Datafile Formats Two types of datafile: .udb = user information format is:uidevent_countrolloverUserInfo(XML string) .cl = contact list information format is: uidevent_countrollovercl_uid1cl_uid2etc NB: In order to maintain the order of events we need to make event counts "global" and therefore part of the string that is going to be transferred over the network. Therefore, getInfo() will return a string: event_countrolloverUserInfo(XML string). setInfo() will take a string in the same format. getContactList() and setContactList may have to be changed to allow the extraction/setting of event counts too. --------------------------------------------------------------------- Server Connections The server object itself will parse all messages and then call the appropriate methods on a server connection to intiate action. If the message is parsed by the server connection object then we need to implement some sort of callback in order to propogate messages to other servers. --------------------------------------------------------------------- Message Type list: UserInfo (subtypes defined above) serverlist (subtypes: and reqServerlist addServer (hostname, port) deleteServer updateRTTs (hostname, r{IPAddr}) disconnect ChangeCL (UserID, contactList) newuser (for server->server indicating they have a new user info) eventCounts (userdb, contactlist) changedEntries serverID (hostname and port) client: authenticate - userid server->client: error (to indicate that a request could not be met) reqUserInfo reqContactList server->client: servermsg contactlist: UserInfo strings of all people on our contactlist client->client: ............ --------------------------------------------------------------------- Adding a New User Generating a new user's UID will be done using a hash on their e-mail address. If a good hash algorithm is chosen then collisions will be unlikely, but we still need to test for the (unlikely) case that two users are trying to create an account at the same time AND their e-mail address happens to hash to the same user ID. Solution: Find calculate what the user's ID should be. Check that it doesn't already exist in the local database. Return user id to server module which then contacts all the other servers it can currently reach. Two options: optimistic or pessimistic. Optimistic: Create the user on each server that we contact. In the event of finding a clash need to back off and abort all the creations. This is actually quite hard since there is no facility for deleting users and what if both creation processes abort - who gets to use that particular ID? --REJECTED Pessimistic: Lock each server, prevent them from adding a new user. Once we have all the servers locked, create the user and distribute the information to all the servers. This can get messy in the event of the initiating server crashing, so we'll timeout 15 minutes after the lock was taken out. To prevent DoS we will only lock a particular ID. (19/01/01) The hash on the e-mail address failed as the conversion from ASCII to a natural (radix-128) number produced such a huge number, that operations on it became fiddly to do properly. New plan: Most sig 32-bits = number of seconds since 1 Jan 2000 Least sig 32-bits = IP address This hash is guaranteed to be unique (unless a single user initiated two "new user" setups from the same IP at the same second (multi-user machine?). Can detect this if on same server, could be slightly tricky other wise! However, I think this is a cost we can live with! Of course, once everyone gets their own 128-bit IP address, still use LS 32-bits of IP with time, but collisions are more likely. At this point a scheme like the one described above could be used. --------------------------------------------------------------------- For future compatibility, store all IP addresses as IPv6 --------------------------------------------------------------------- Minimum Spanning Tree algorithm Notes: Since (nearly) every node is interconnected number of edges is n^2 where n is number of nodes. -> Should limit search through edges as far as possible. Prim's algorithm (i.e. grow a tree from one node). Two sets of nodes: servers in tree; servers not in tree. Priority queue of all edges from nodes in tree to nodes not in tree. List of edges in tree. Start: Add ourself to the tree. ( Actually this is broken - need all nodes to add same node to start in order to generate same MST; therefore start with first one alphabetically ). For each node we add to tree: Extract entry for node added from LookupTable. Iterate over this for nodes not in tree and add their RTT times to edge queue. Extract edge at front of queue. Check if both ends are in tree; if not, add it. Loop until all nodes in tree. I believe this algorithm to be more space efficient than Kruskal as that would require n sets to start with (assuming each set has some fixed overhead) plus it requires putting n^2 edges into the priority queue. OTOH, while Prim will examine the same number of edges, it will do less lookups on the table to extract the edge information as it will only add edges as they are required. --------------------------------------------------------------------- Handshaking When a server contacts another one the first XML message that should be sent is a "serverID" message. --------------------------------------------------------------------- The Client See OO design on paper. Basically the server will push changes to us whenever someone on our contact list updates their information. Therefore CtoSconnection needs facilities to request our contact list, and update our userinfo and contact list. Client<->Server Handshake protocol C->S: authenticate (uid) S->C: UserInfo (for uid) S->C: ContactList (for uid) with IP and port for all users that are online The last step means that whenever we call a sendMsg() method on the contacts database we can be sure we have the correct information in the database. --------------------------------------------------------------------- Client to Client Messages Message format: uin --------------------------------------------------------------------- Concurrency in the Client A problem with client<->server communication is that ideally it should be asynchronus. i.e. after a client sends the server a message such as "reqContactList" it shouldn't matter if the next message the server sends is not a response to that query but different message. Therefore incoming messages must be processed "centrally" like in a server BUT in a server incoming messages only affect internal data structures while in a client we need to do I/O (e.g. do somethign with the contact list we request). The solution seems to be to create a GossipClient class which offers an interface to a UI (e.g. a GUI or text based system). This would contain a message buffer which would contain parsed messages that are of interest to the user (e.g. updated UserInfo). A UI would poll this periodically, looking for messages of a certain type and then do something with them. (Separate buffers for each message type? Seems a bit wasteful and difficult to expand). --------------------------------------------------------------------- Network Partitioning Detecting the difference between a host going down and the network link itself disappearing is hard, if not impossible without storing large amounts of state. One possible method would be to retry hosts that have gone down using an exponential back off, however this is still somewhat messy when the host itself going down is a far more frequent occurence. Since the loss of a critical network link (such as the Cambridge-JANET link) is a fairly major occurence it does not seem unreasonable to have the server administrator "prod" the server in some way. Two possibilities: 1. "persistent" servers in configuration file. Clumsy - does not really fit into current implementation scheme (do we have to always maintain a connection to a persistent server even if it's not in the min spanning tree?) 2. Have administrator send a message (signal, named pipe, socket) to manually add a server. Question: should I send "addServer" (code exists for propogation of server, but no auto connection?) or "ServerList" (can tell it about all servers on other side of network link that has come back). addServer followed by a recalc() is probably best approach, but how do we then ask existing server for other servers? Another question: Signals are nasty (esp. with threads!) so named pipe or socket? Socket: I know how to use already, allow remote admin (need to implement security). Pipe: Opposite of socket. Use a local socket, since code for socket based connection already exists. prodServer program simply connects as if it was another server and sends addServer messages for all servers in config file (having done a DNS lookup on them first so we don't get multiple connections!) Unfortunately, this doesn't quite work as we don't get any RTT information from the servers we add and so the minSpanningTree() algorithm cannot function. A solution is probably some sort of specific remote administration code. --------------------------------------------------------------------- Client Authentication Original idea: client->server: {userid}_privateKey However this is vunerable to a replay attack, or a rogue server caching these messages. Since "logging in" doesn't need to entail very much anyway (there is no concept of a "session" really since every update must be signed) and all server information is considered public, the solution seems to be to let the client just request their UserInfo and then send an update userinfo message containing their IP address and port. The problem that then presents itself is the case where a user just drops offline without sending a new UserInfo message (e.g. dialup line drops). The server can't update their status without a signature and if I remove the "status" field from the signed data then a rogue server can play havoc with everyone's status. Possible solutions: 1. Every UserInfo message to contain a second version of itself with the status set to "Offline". This isn't forwarded but just cached by the server to be used in the event of the link dieing. 2. Clients spot users it can no longer reach and marks them appropriately. Solution 2 is probably easier to implement. Note: The client must still do an "authenticate" when it wishes to login so that the server knows which uid is on the end of the connection. is equivalent to a --------------------------------------------------------------------- Security Specification NB: The key part of a security policy has to identification of threats [Part1b - Intro to Security (RJA)]. Since we can't (easily) prevent the owner of a server process snooping information from it's memory we shouldn't even pretend that information stored on or passing through the server is anything but public. Threats: - unauthorised alteration of data stored on the server - impersonation of a user - eavesdropping on client-client conversations I /believe/ that all of these threats can be countered by using private-public keypairs. --------------------------------------------------------------------- Aim: To prevent a server admin from tampering with user information or impersonating a user. Each user has a public-private keypair. Public key is stored with user info on server. Attack: Admin overwrites file on disk to force a change to user information. Defence: Stored user information is signed. Server checks that information agrees with signature and if there is a discrepancy it re-requests the information from another server. The admin could also replace the signature with by using a new keypair. This is impossible to defend against but we can prevent propogation over the network by keeping a "key signature" too - this would be immutable (i.e. the key for a particular user would be immutable). The keypair also makes on the wire encryption possible and can be used to sign every message for authentication purposes. They keypair can be used to log into a server (challenge-response). Unfortunately this compromises portability. (You can't just walk up to any internet terminal with a client installed and login). Possible Solutions: 1. Take your key with you on disk 2. "Insecure/Portable" key that can be recreated with some sort of (username,password) hash. 3. Logins via password. Problem: Can't trust server not to mis-use password (how do you store it?). Solution: Trusted Server(s) that stores your password and will authenticate you. It would be nice if we could turn solutions 2 and 3 on and off (e.g. if we know we're going away, etc) but I don't think this is possible if keys are to be immutable --------------------------------------------------------------------- Black listing servers Need to be able to do this, but what mechanisms should there be? Who decides who is blacklisted? How is this enforced fairly to ensure that no one abuses this? We could allow users to specify what server not to log into but this seems a bit pointless if their information is passed to the server anyway. DMI suggested some sort of voting system for throwing a server off the network, but this still leaves the question as to who decides who is untrustworthy (well, malisious since we already trust no one). Solution: Allow a server admin to refuse connections from specified IPs. If all the server admins then agree that an IP is banned, then everyone will black list them and they won't have access to a network. Social solutions better than technical ones.