logiweb man page

logiweb — Logiweb protocol

Description

Logiweb is a system for storing, locating, and transmitting Logiweb pages. Logiweb pages may contain free text mixed with machine intelligible objects like computer programs, testsuites, and formal proofs.

Logiweb defines a referencing scheme in which each Logiweb page has a unique Logiweb reference. A Logiweb reference is typically around 30 bytes long. A Logiweb reference contains, among other things, a RIPEMD-160 hash key of the referenced page.

The purpose of the Logiweb protocol is to locate a Logiweb page, given its reference.

To maximize efficiency, the Logiweb protocol was originally intended to be a protocol of its own, using its own UDP port.

Because applying for a UDP port turned out to be too complicated, however, the Logiweb protocol will be channeled through HTTP instead. A new protocol will be defined based on the protocol defined below, cf. logiweb(5). The present document is included as logiweb(7) until the new protocol becomes available.

Protocol Definition

Internet Draft                                                   K. Grue
<draft-grue-logiweb-protocol-1-00.txt>               Associate Professor
Category: Experimental                                              DIKU
Expires September 8, 2007                                  March 8, 2007

Logiweb Protocol Version 1
<draft-grue-logiweb-protocol-1-00.txt>

    Status of this Memo
    Distribution of this memo is unlimited.

By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at 
http://www.ietf.org/1id-abstracts.html 

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

When publishing mechanically verified mathematics on the Internet, there is a need for referencing previously published documents. As an example, referenced documents may contain needed definitions, lemmas, and proofs. References from one mechanically verified document to another are much like any other Uniform Resource Locator, but there is a need to ensure that referenced documents do not change after publication. Otherwise, a change to e.g. a definition in a referenced document could invalidate the correctness of a referencing document.

The present document describes the protocol used by an experimental system named "Logiweb" which allows publishing mechanically verified, immutable mathematical documents.

Table of Contents

1. Introduction
2. Protocol
   2.1. Cardinals
   2.2. Identifiers
   2.3. Logiweb identifier
   2.4. Timestamps
   2.5. Ping requests
   2.6. Pong responses
   2.7. Event responses
   2.8. Nop requests
   2.9. Prefix messages
   2.10. Vectors
   2.11. Get messages
   2.12. Server states
   2.13. Server states are binary trees
   2.14. The type attribute
   2.15. The update attribute
   2.16. The left and right attributes
   2.17. The sibling attribute
   2.18. The url attribute
   2.19. The leap attribute
   2.20. Other attribute classes
   2.21. The initial state
   2.22. Got messages
   2.23. Put messages
3. Security Considerations
   3.1. Unwanted outgoing information
   3.2. State corruption
   3.3. Incoming denial-of-service attacks
   3.4. Outgoing denial-of-service attacks
4. IANA Considerations
   4.1. Well Known Port 332
   4.2. MIME type application/prs.logiweb
5. References
   5.1. Normative References
   5.2. Informative References

1. Introduction

This document defines the 'Logiweb protocol' version 1.

Logiweb is a system for publication of immutable documents of high typographic quality which contain computer programs and mathematical definitions, theorems, and proofs [Logiweb].

To understand the Logiweb protocol, only the following features of the Logiweb system are needed:

o A Logiweb document is a sequence of bytes. A Logiweb document consists of a version number followed by a RIPEMD-160 hash key [RIPEMD] followed by a time stamp followed by a sequence of bytes.
o Any Logiweb document has a 'Logiweb reference'. The reference is a sequence of bytes. The reference of a document is the version number followed by the hash key followed by the time stamp of the document.
o It is assumed (cf. the section on security considerations later in this document) that any two Logiweb documents with the same reference are identical. The RIPEMD-160 hash key makes this overwhelmingly probable.
o To be considered 'published', a Logiweb document must be accessible using the World Wide Web (WWW). A published Logiweb document may be mirrored such that it is available under more than one Uniform Resource Locator (URL) [RFC3986]. A published Logiweb document may be moved and copies of it may be deleted such that the set of URLs associated with a Logiweb document may change with time.
o The Logiweb system comprises Logiweb 'servers' and Logiweb 'clients'.
o A Logiweb server is a running computer program which communicates with Logiweb clients and other Logiweb servers using the Logiweb protocol, and which provides the services described in the following.
o A Logiweb client is a running computer program which communicates with Logiweb servers using the Logiweb protocol, and which uses the services described in the following.

The main task of Logiweb servers is to keep track of the relationship between Logiweb references and their associated fluctuating set of URLs. The main service provided by Logiweb servers is to translate Logiweb references to URLs. All Logiweb servers on the Internet shall cooperate on this.

As mentioned above, a Logiweb document must be accessible using the WWW to be considered 'published'. In addition, the URL of at least one copy of the document must be known to at least one of the cooperating Logiweb servers.

As secondary services, a Logiweb server can identify itself as a Logiweb server, it can tell what time it is according to the server's clock, and it can tell what leap seconds have occurred.

Logiweb servers are not supposed to deliver Logiweb documents. They are merely supposed to translate Logiweb references to URLs. The actual delivery of Logiweb documents is supposed to be performed by http servers.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2. Protocol

The Logiweb protocol defines the syntax and semantics of 'Logiweb messages'. Logiweb messages are the units of exchange when using the Logiweb protocol.

The Logiweb protocol is an application layer protocol. The Logiweb protocol can build on top of connection-based transport protocols like TCP [RFC0793] as well as datagram protocols like UDP [RFC0768].

When using a datagram protocol, each datagram contains one and only one Logiweb message. When using a connection-based protocol, Logiweb messages are transmitted back-to-back.

The Logiweb protocol specifies that some messages require a response and some do not. However, as an overall rule, whenever an application receives a datagram containing a Logiweb message, the application is allowed not to respond. Furthermore, whenever an application receives Logiweb messages over a connection-based transport, the application is allowed to close the connection at any time.

Applications should respond to messages which require a response unless they have a reason for not doing so. Reasons for not responding to a datagram or for closing a connection could be that the application is short of outgoing bandwidth, that the application thinks it is suffering a denial-of-service attack, or that the application thinks that the other end of the communication is broken or malicious.

Furthermore, whenever an application receives a message which requires a response, the application is allowed to respond with a Logiweb 'Sorry' message. A 'Sorry' message indicates that the application is unwilling to answer at the given time but may be willing to answer later if the same question is sent again.

Whenever an application receives a message which requires a response via a connection-based protocol, the application is required to respond properly OR respond with a 'Sorry' message OR disconnect. Not responding is not an option when using a connection-based transport.

The syntax of Logiweb messages is expressed in ABNF [RFC4234] in the following. The semantics are expressed in plain prose.

2.1. Cardinals
middle-septet = %d128-255
end-septet = %d000-127
cardinal = *middle-septet end-septet

A middle-septet X represents the number X minus 128. An end-septet X represents the number X. A cardinal represents a non-negative integer using little endian base 128. As an example, the cardinal

129 002

represents the non-negative integer 1 + 128 * 2 = 257. The cardinal

129 130 000

also represents 257, since 1 + 128 * 2 + 128 * 128 * 0 = 257.
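For information, the encoding rule above can be sketched in Python as follows. The function names encode_cardinal and decode_cardinal are illustrative assumptions and not part of the protocol, and the sample value 480 is chosen only for illustration.

```python
def encode_cardinal(n):
    """Encode a non-negative integer as a cardinal in its shortest form."""
    if n < 0:
        raise ValueError("cardinals are non-negative")
    out = bytearray()
    while n >= 128:
        out.append(128 + n % 128)  # middle-septet: low seven bits plus 128
        n //= 128
    out.append(n)                  # end-septet: a value below 128
    return bytes(out)


def decode_cardinal(data, pos=0):
    """Decode one cardinal starting at pos; return (value, next position)."""
    value, weight = 0, 1
    while True:
        if pos >= len(data):
            raise ValueError("truncated cardinal")
        septet = data[pos]
        pos += 1
        value += (septet % 128) * weight  # little endian: low digits first
        weight *= 128
        if septet < 128:                  # an end-septet ends the cardinal
            return value, pos
```

With this sketch, 480 = 96 + 128 * 3 encodes as 224 003, and the longer form 224 131 000 decodes to the same value, which is why a cardinal has more than one valid encoding.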

2.2. Identifiers
x00 = %d000 / %d128 x00
x01 = %d001 / %d129 x00
x02 = %d002 / %d130 x00
x03 = %d003 / %d131 x00
x04 = %d004 / %d132 x00
x05 = %d005 / %d133 x00
x06 = %d006 / %d134 x00
x07 = %d007 / %d135 x00

The syntax class x03 covers all cardinals whose value is three. The other syntax classes are similar.

2.3. Logiweb identifier
L = %d204
o = %d239
g = %d231
i = %d233
w = %d247
e = %d229
b = %d226
id-version = %d001
id-Logiweb = L o g i w e b id-version

Logiweb applications use id-Logiweb for indicating that they use the Logiweb protocol. Note that 204 is a middle-septet which represents the number 204 - 128 = 76, which is the Unicode [Unicode] code point of the Latin capital letter L.
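For information, the following Python sketch shows that the eight bytes of id-Logiweb form a single cardinal whose middle-septets spell "Logiweb" and whose end-septet is the version number. The names ID_LOGIWEB and describe_id_logiweb are illustrative assumptions.

```python
# The eight bytes of id-Logiweb as listed in the grammar above.
ID_LOGIWEB = bytes([204, 239, 231, 233, 247, 229, 226, 1])


def describe_id_logiweb(data):
    """Split id-Logiweb into its text part and its version number."""
    *middle, end = data
    if any(b < 128 for b in middle) or end >= 128:
        raise ValueError("not a single cardinal")
    # Each middle-septet X represents X - 128, here read as a code point.
    text = "".join(chr(b - 128) for b in middle)
    return text, end
```

Calling describe_id_logiweb(ID_LOGIWEB) yields the text "Logiweb" and the version 1.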

2.4. Timestamps
timestamp = mantissa exponent
mantissa = cardinal
exponent = cardinal

Logiweb measures the time at which an event occurred in 'Logiweb time'. Logiweb time measures the number of seconds that have elapsed according to International Atomic Time (TAI) since TAI 00:00:00 of Modified Julian Day (MJD) 0.

For information, MJD 0 is November 17, 1858 in the Gregorian calendar. TAI 00:00:00 of MJD 0 equals Coordinated Universal Time (UTC) 00:00:10 of MJD 0 since, by convention, TAI and UTC were 10 seconds apart before June 30, 1972. In short, UTC is a time scale in which it is noon when Greenwich is under the sun.

A timestamp consists of two cardinals, M and E and represents the number M*10^(-E) where u^v denotes u raised to power v. As an example

129 002 009

denotes 257 nanoseconds past the Logiweb epoch (the cardinal 129 002 is 257 and the exponent 009 scales it by 10^(-9)), where the Logiweb epoch is TAI 00:00:00 of MJD 0.
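For information, a timestamp can be decoded as sketched below in Python. The function names are illustrative assumptions, and exact fractions are used only to avoid rounding; an implementation is free to represent Logiweb time differently.

```python
from fractions import Fraction


def decode_cardinal(data, pos=0):
    """Read one little-endian base-128 cardinal; return (value, next position)."""
    value, weight = 0, 1
    while True:
        septet = data[pos]
        pos += 1
        value += (septet % 128) * weight
        weight *= 128
        if septet < 128:
            return value, pos


def decode_timestamp(data, pos=0):
    """Read mantissa M and exponent E; return (M * 10^(-E) seconds, next position)."""
    mantissa, pos = decode_cardinal(data, pos)
    exponent, pos = decode_cardinal(data, pos)
    return Fraction(mantissa, 10 ** exponent), pos
```

As a usage example, the bytes 224 003 003 have mantissa 480 and exponent 3 and therefore denote 480 milliseconds past the Logiweb epoch.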

2.5. Ping requests
message = ping
ping = id-ping
id-ping = x02

A ping request represents the question 'who are you, and what time is it'.

2.6. Pong responses
message =/ pong
pong = id-pong id-Logiweb timestamp
id-pong = x03

A Logiweb server which receives a ping request shall do one of the following:

o Respond by a pong message containing the current time.
o Respond by a 'Sorry' message.
o Avoid responding if the ping is transported by a datagram.
o Disconnect if the ping is transported by a connection-based transport.

Logiweb servers are supposed to respond to ping requests. A Logiweb client should consider the other end of the connection broken if the client receives a ping request.

Logiweb applications SHALL NOT respond to pong responses.
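For information, a client might recognize a pong response as sketched below in Python. The names PING, ID_LOGIWEB, and parse_pong are illustrative assumptions, and the sketch ignores prefixes and trailing bytes.

```python
ID_LOGIWEB = bytes([204, 239, 231, 233, 247, 229, 226, 1])
PING = bytes([2])  # id-ping in its shortest form


def decode_cardinal(data, pos=0):
    """Read one little-endian base-128 cardinal; return (value, next position)."""
    value, weight = 0, 1
    while True:
        septet = data[pos]
        pos += 1
        value += (septet % 128) * weight
        weight *= 128
        if septet < 128:
            return value, pos


def parse_pong(message):
    """Return (mantissa, exponent) of a pong, or None if it is not a pong."""
    ident, pos = decode_cardinal(message)
    if ident != 3:                                    # id-pong = x03
        return None
    if message[pos:pos + len(ID_LOGIWEB)] != ID_LOGIWEB:
        return None                                   # not a Logiweb server
    pos += len(ID_LOGIWEB)
    mantissa, pos = decode_cardinal(message, pos)
    exponent, pos = decode_cardinal(message, pos)
    return mantissa, exponent
```

A client would send PING and pass the reply to parse_pong. A result of None means the reply was something other than a pong, for instance a 'Sorry' message.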

2.7. Event responses
message =/ event
event = id-event notice
id-event = x01
notice = sorry / received / rejected
sorry = x00
received = x01
rejected = x02

A 'sorry' response indicates that the sender has received a request which it is unwilling to answer at the given time but may be willing to answer later. Logiweb applications are allowed to send a 'sorry' response to any request which requires a response.

A 'received' message indicates that the sender acknowledges the receipt of a request but is not going to give any further answer. A 'received' message is the proper response to the 'put' request described later.

A 'rejected' message indicates that the sender acknowledges the receipt of a request but will not and never will answer that particular request. Logiweb applications may respond by a 'rejected' message only when they receive a malformed request.

Logiweb applications SHALL NOT respond to event responses.

2.8. Nop requests
message =/ nop
nop = id-nop
id-nop = x00

Logiweb applications SHALL NOT respond to nop requests. Nop requests may be used for padding when using connection-based transports. There is no point in sending nop datagrams. Applications are allowed to disconnect connection-based transports at any time, so even though applications are not allowed to respond to nop requests, they may still disconnect on a 'nop' without violating the protocol.

2.9. Prefix messages

message =/ prefix
prefix = id-prefix code contents
id-prefix = x07
code = cardinal
contents = message

Whenever a Logiweb application receives a prefix message, it shall process the contents of the message. If the application responds to the contents, it shall prefix the given code to the response.

Example: Suppose an application receives a ping with two prefixes:

007 100 007 101 002

Furthermore, suppose the application decides to respond with a 'sorry' message. Then the response should be:

007 100 007 101 001 000

Because of prefixes, messages can be arbitrarily long. Messages are typically less than 100 bytes in length. Applications are advised to process messages that are up to 65536 bytes long. When receiving messages longer than that, applications are advised to disconnect if the message is received over a connection-based transport and to discard the message if it is received as a mammoth datagram.
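For information, the prefix rule can be sketched in Python as follows, reusing the example above. The function names split_prefixes and add_prefixes are illustrative assumptions.

```python
def encode_cardinal(n):
    """Encode a non-negative integer as a cardinal in its shortest form."""
    out = bytearray()
    while n >= 128:
        out.append(128 + n % 128)
        n //= 128
    out.append(n)
    return bytes(out)


def decode_cardinal(data, pos=0):
    """Read one cardinal; return (value, next position)."""
    value, weight = 0, 1
    while True:
        septet = data[pos]
        pos += 1
        value += (septet % 128) * weight
        weight *= 128
        if septet < 128:
            return value, pos


def split_prefixes(message):
    """Return (list of prefix codes, innermost contents)."""
    codes, pos = [], 0
    while True:
        ident, after_ident = decode_cardinal(message, pos)
        if ident != 7:                        # id-prefix = x07
            return codes, message[pos:]
        code, pos = decode_cardinal(message, after_ident)
        codes.append(code)


def add_prefixes(codes, response):
    """Put the given codes, in their original order, in front of a response."""
    out = bytearray()
    for code in codes:
        out += encode_cardinal(7) + encode_cardinal(code)
    return bytes(out) + response
```

Splitting 007 100 007 101 002 gives the codes 100 and 101 and the contents 002 (a ping). Prefixing the same codes to the 'sorry' event 001 000 gives 007 100 007 101 001 000.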

2.10. Vectors
vector = length bytes
length = cardinal
bytes = *byte
byte = %d0-255

A vector represents a list of bits. The given length is the number of bits in the list. The syntax of vectors is NOT context free since the number of bytes must be equal to the given length divided by eight and rounded up to the nearest integer. As an example,

012 128 015

represents a list comprising twelve bits. The length field occupies the first byte. Twelve divided by eight and rounded up equals two, indicating that the next two bytes are part of the vector.

The vector 012 128 015 is translated to a list of bits as follows. First write the bytes in binary big endian:

1000 0000  0000 1111

Then bit swap each byte:

0000 0001  1111 0000

Then pick the first twelve bits:

0000 0001 1111

Sane programmers don't bit swap. Sane programmers realize and utilize that Logiweb is little endian.
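For information, the little-endian reading can be sketched in Python without any bit swapping. The function names are illustrative assumptions.

```python
def decode_cardinal(data, pos=0):
    """Read one cardinal; return (value, next position)."""
    value, weight = 0, 1
    while True:
        septet = data[pos]
        pos += 1
        value += (septet % 128) * weight
        weight *= 128
        if septet < 128:
            return value, pos


def decode_vector(data, pos=0):
    """Read one vector; return (list of bits, next position)."""
    length, pos = decode_cardinal(data, pos)
    nbytes = (length + 7) // 8            # length divided by eight, rounded up
    body = data[pos:pos + nbytes]
    if len(body) != nbytes:
        raise ValueError("truncated vector")
    # Bit i lives in byte i // 8 at bit position i % 8, least significant first.
    bits = [(body[i // 8] >> (i % 8)) & 1 for i in range(length)]
    return bits, pos + nbytes
```

Applied to 012 128 015, the sketch returns the twelve bits 0000 0001 1111 derived above.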

2.11. Get messages
message =/ get
get = id-get address class index
id-get = x04
address = vector
class = update / type / left / right
class =/ sibling / url / leap
update = x00
type = x01
left = x02
right = x03
sibling = x04
url = x05
leap = x06
index = cardinal

Logiweb servers are supposed to maintain a 'state' which is visible from the outside. Clients and other servers may query the state of a Logiweb server using get messages. A get message requests a Logiweb server to return the 'attribute' which the server associates to the given address, class, and index.

A Logiweb server has no other visible state than what can be queried using get messages.
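For information, a get message can be assembled as sketched below in Python. The names CLASSES, encode_vector, and encode_get are illustrative assumptions. Addresses are given as lists of bits.

```python
CLASSES = {"update": 0, "type": 1, "left": 2, "right": 3,
           "sibling": 4, "url": 5, "leap": 6}


def encode_cardinal(n):
    """Encode a non-negative integer as a cardinal in its shortest form."""
    out = bytearray()
    while n >= 128:
        out.append(128 + n % 128)
        n //= 128
    out.append(n)
    return bytes(out)


def encode_vector(bits):
    """Encode a list of bits as a length cardinal followed by packed bytes."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 1 << (i % 8)   # least significant bit first
    return encode_cardinal(len(bits)) + bytes(packed)


def encode_get(address_bits, class_name, index=0):
    """Build id-get, address, class, and index in that order."""
    return (encode_cardinal(4)               # id-get = x04
            + encode_vector(address_bits)
            + encode_cardinal(CLASSES[class_name])
            + encode_cardinal(index))
```

As a usage example, a request for the newest url attribute of the empty address is the four bytes 004 000 005 000.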

2.12. Server states

The state of a server is a function which, given an address and a class, returns a list of attributes. Addresses and classes were defined in the previous section. An attribute consists of a timestamp and a value where the value is a vector as defined in Section 2.10.

Server states may change with time. When a server receives a 'get' message as described in the previous section, it responds with a 'got' message as described later. The contents of the 'got' message reflect the server state at the time the 'get' is processed by the server.

The server state may change at any time. Processing of each 'get' message is atomic, but the server state may change between any two 'get' messages.

The server state can only change in two ways: an attribute may be added or an attribute may be removed. Whenever an attribute is removed, it is removed from the list of attributes it belongs to without reordering the remaining attributes on that list. Whenever an attribute is added, it is added at the end of an attribute list. For that reason, all attribute lists are chronological with the oldest attribute first.

Every attribute comprises a timestamp and a value. The value is an arbitrary vector. The timestamp indicates at what time the given attribute was added to the server state.

A get message with address A, class C, and index I requests the I'th oldest attribute with address A and class C. The oldest attribute has index one. A get message with index zero or an index larger than the number of attributes with the given address and class requests the newest attribute with the given address and class.
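For information, the index rule can be sketched in Python as follows. The function name select_attribute is an illustrative assumption, and the attribute list is assumed to be chronological with the oldest attribute first.

```python
def select_attribute(attributes, index):
    """Pick the attribute a get message with the given index asks for.

    attributes is a chronological list, oldest first. The result is None
    when the list is empty.
    """
    if not attributes:
        return None
    if 1 <= index <= len(attributes):
        return attributes[index - 1]   # index one is the oldest attribute
    return attributes[-1]              # index zero or too large: the newest
```

A client that wants only the most recent attribute can thus always ask for index zero.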

2.13. Server states are binary trees

As mentioned, the state of a server is a function which, given an address and a class, returns a list of attributes. Addresses are bit vectors. We shall refer to all attributes with a given address on a given server as the 'node' at that server at that address.

We shall refer to the empty list of bits as the 'root address' and to the node with that address as the 'root node'. For all addresses A, we refer to A with a zero or one bit added at the end as the 'left' and 'right subaddress', respectively. For non-empty addresses A, we refer to A with one bit removed at the end as the 'superaddress' of A.

As an example,
1110 is the left subaddress of 111
1111 is the right subaddress of 111
11   is the superaddress of 111

We shall say that a server 'has a node with address A' if its state contains at least one attribute with address A.

A server state is a binary tree in the sense that whenever a server has a node N1 with non-empty address A then it also has a node N2 whose address is the superaddress of A. We shall refer to N2 as the supernode of N1.

If a server has a node N with address A, then we shall refer to N as a 'leaf' node if the server has no nodes whose addresses are the left or right subaddresses of A. We shall refer to N as a 'branch' node if the server has nodes for both the left and the right subaddress of A. Server states only contain leaf and branch nodes. A server state cannot contain a node that has a left but not a right subnode or vice versa.
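For information, the tree conditions above can be checked as sketched below in Python. Addresses are written as strings of '0' and '1' purely for readability, and all function names are illustrative assumptions.

```python
def left_subaddress(address):
    return address + "0"


def right_subaddress(address):
    return address + "1"


def superaddress(address):
    if not address:
        raise ValueError("the root address has no superaddress")
    return address[:-1]


def is_binary_tree(addresses):
    """Check the two conditions above for a set of node addresses."""
    for a in addresses:
        # Every node with a non-empty address needs a supernode.
        if a and superaddress(a) not in addresses:
            return False
        # A node has both subnodes (branch) or neither (leaf).
        has_left = left_subaddress(a) in addresses
        has_right = right_subaddress(a) in addresses
        if has_left != has_right:
            return False
    return True
```

For instance, the set of addresses consisting of the root address, '0', and '1' passes the check, whereas the root address together with '0' alone does not.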

2.14. The type attribute

Every node of a server contains exactly one attribute of class 'type' (i.e. of class 1). The value of that attribute is the empty bit vector if the node is a leaf node. The value is a one-element bit vector whose sole bit is a one-bit if the node is a branch node. The time stamp of the attribute equals the time at which the node was created or last changed type.

2.15. The update attribute

Every node of a server has six attributes of class 'update' (i.e. of class 0). The six update attributes have values 1, 10, 11, 100, 101, and 110, respectively. The timestamps of those attributes are as follows:

1 Identical to the timestamp of the 'type' attribute.
10 The time of the last change in the left subtree of the node.
11 The time of the last change in the right subtree of the node.
100 The time of the last change in the 'sibling' attribute list of the node.
101 The time of the last change in the 'url' attribute list of the node.
110 The time of the last change in the 'leap' attribute list of the node.

The time stamps for the update attributes with value 10 and 11 equal the timestamp of the 'type' attribute for leaf nodes. The time stamps for the update attributes with value 100, 101, and 110 equal the timestamp of the 'type' attribute if the node never has had attributes of class 'sibling', 'url', or 'leap', respectively.

Contrary to other attribute lists, update attribute lists may contain several attributes with identical timestamps. That occurs when a single addition or deletion of an attribute has consequential changes. Among other things, all update attributes are set to the current server time when a node is created.

2.16. The left and right attributes

Server states have no attributes of class 2 (left) or 3 (right). These two classes only occur as values in update attributes.

2.17. The sibling attribute

Two nodes with the same address on different servers are 'siblings'. A 'branch sibling' of a node is a sibling which is at the same time a branch node. Sibling attributes of a node are references to servers that store branch siblings of the given node.

The value of a sibling attribute is a byte vector, i.e. a bit vector whose length is a multiple of 8. The bytes part of the bit vector may have a value like

"udp/logiweb.eu/65535/http://logiweb.eu/logiweb/server/relay/"

The string above contains 60 characters and, hence, 480 bits. For that reason its encoding is

224 003 117 100 112 047 108 ...

Above, the middle-septet 224 represents 224-128=96 and the length field 224 003 represents 96+128*3=480. The number 117 is a Latin small letter u as in "udp". The little-endian nature of bit vectors has no observable effect here.

In general, sibling attributes have form

protocol "/" host "/" port "/" relay

The protocol may be 'tcp' or 'udp'. The host and port identify the Logiweb server. The relay must be a URL [RFC3986].
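For information, the byte-vector encoding and the general form of a sibling value can be sketched in Python as follows. The function names are illustrative assumptions. The sketch further assumes that the characters are ASCII, that the host contains no slash, and that the port is written in decimal.

```python
def encode_cardinal(n):
    """Encode a non-negative integer as a cardinal in its shortest form."""
    out = bytearray()
    while n >= 128:
        out.append(128 + n % 128)
        n //= 128
    out.append(n)
    return bytes(out)


def encode_byte_vector(text):
    """Encode a string as a vector whose length is counted in bits."""
    payload = text.encode("ascii")            # assumption: ASCII characters
    return encode_cardinal(8 * len(payload)) + payload


def parse_sibling(text):
    """Split protocol "/" host "/" port "/" relay into its four parts."""
    protocol, host, port, relay = text.split("/", 3)
    if protocol not in ("tcp", "udp"):
        raise ValueError("protocol must be 'tcp' or 'udp'")
    return protocol, host, int(port), relay
```

Applied to the sample value above, encode_byte_vector produces 62 bytes beginning with 224 003 117 100 112 047 108, and parse_sibling returns the protocol 'udp', the host 'logiweb.eu', the port 65535, and the relay URL. Only the first three slashes act as separators, so the slashes inside the relay URL are left alone.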

The purpose and function of a 'relay' are outside the scope of the present document. For information, however, a relay is a special Logiweb client which runs as a CGI-program [CGI]. If a relay is invoked with a path of '/64/...' or '/32/...' or '/16/...' where the dots stand for a Logiweb reference expressed in base 64, 32, or 16, then the relay contacts a Logiweb server to get the reference translated to a URL and returns a redirection to that URL. As an example, looking up http://logiweb.eu/logiweb/server/relay/…... in a web browser is supposed to open the Logiweb document with the given reference. Looking up e.g. http://logiweb.eu/logiweb/server/relay/… is supposed to do the same but then to back up two slashes and add index.html.

Logiweb relays typically have further facilities. At the time of writing, the relay at http://logiweb.eu/server/relay contains a self-documenting interface to a Logiweb server which allows any user to experiment with the protocol described in the present document. The given relay was the first Logiweb relay established on the Internet and is supposed to exist as long as Logiweb itself exists.

Logiweb relays will not be mentioned any more in the present document.

We shall refer to sibling attributes as sibling pointers. Sibling pointers are said to be 'valid' if they point to servers which store a branch sibling of the given node. A sibling pointer is said to be 'dangling' otherwise. Hence, a sibling pointer is dangling if the server pointed to stores no sibling of the given node. Furthermore, a sibling pointer is dangling if the server pointed to does store a sibling but that sibling is a leaf node.

A server SHALL try its best to avoid dangling pointers. No server can be perfect here because the state of other servers may change without notice. But a server is supposed to validate its sibling pointers regularly.

Furthermore, each server SHALL try its best to populate all its nodes with sibling pointers. The only excuse for not populating a node with sibling pointers is if no Logiweb server in the world stores a branch sibling of the given node.

Finally, each server SHALL do its best to ensure that all branch siblings in the world of each node of the server are reachable from the node by following sibling pointers. This is even more difficult to satisfy than the two previous requirements, however, since not only may other server states change without notice but, furthermore, no server has any control over any other server. So, servers are basically required to be reasonable and cooperative.

2.18. The url attribute

The address of a node is a bit vector. A Logiweb reference is also a bit vector. If the address of a node is a valid Logiweb reference then the url attributes of the node shall be Uniform Resource Locators (URLs) [RFC3986] of Logiweb documents with the given reference.

Url attributes of nodes whose addresses are not valid Logiweb references are reserved for future extensions.

2.19. The leap attribute

Only root nodes have leap attributes. Each leap attribute indicates the location of a leap second. Leap attributes are byte vectors, i.e. bit vectors whose length is a multiple of eight. Leap attributes have format

leap = step mjd
step = cardinal
mjd = cardinal

Each leap second occurs at the end of a UTC day (i.e. at midnight in Greenwich). The mjd field indicates which Modified Julian Day (MJD) is affected by the leap. The step is 1 if that day is prolonged by one second. The step is 2 if that day is shortened by one second. Hence, step is 1 for a +1 leap and 2 for a -1 leap. If the International Earth Rotation Service (IERS) ever decides to make multiple leaps, the relationship is intended to be as follows:

step  0  1  2  3  4  5  6 ...
leap  0 +1 -1 +2 -2 +3 -3 ...

IERS only intends to use leaps of +1 and -1. Leaps of -1 have never occurred and maybe never will. IERS intends to let leaps occur at the end of June 30 and December 31. IERS intends to announce leaps in advance. Leaps affect the length of the last minute of the last hour of the affected UTC day.
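For information, the relationship between step and leap in the table above can be sketched in Python as follows. The function names are illustrative assumptions.

```python
def step_to_leap(step):
    """Map step 0, 1, 2, 3, 4, ... to leap 0, +1, -1, +2, -2, ..."""
    if step % 2 == 1:
        return (step + 1) // 2    # odd steps are positive leaps
    return -(step // 2)           # even steps are negative leaps (or zero)


def leap_to_step(leap):
    """Inverse of step_to_leap."""
    if leap > 0:
        return 2 * leap - 1
    return -2 * leap
```

This interleaving lets a signed leap be stored in a cardinal, which can only hold non-negative integers.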

As for all other attributes, the timestamps of leap attributes indicate the time at which the attribute entered the state of the server. At startup, a server is likely to read leap second information from a configuration file or fetch it from another Logiweb server. Servers should arrange leaps chronologically with the oldest leap first.

Leap attributes shall comprise all past leaps announced by the IERS. Leap attributes should comprise all past and future leaps announced by the IERS. In other words, newly announced leaps shall enter the state before the leap occurs.

2.20. Other attribute classes

Only attributes of class 0, 1, 4, 5, and 6 may occur in server states. Attribute classes 2 and 3 will never occur in server states. Attribute class 7 is reserved for information about which future classes a server supports. Classes 8-15 are reserved for experiments. Classes from 16 to 2^160-1 inclusive are reserved for first come first served classes. Classes from 2^160 and up are reserved for classes based on the value of Logiweb references. Only classes 0, 1, 4, 5, and 6 are permitted according to the present document.

2.21. The initial state

When a server starts up, its state contains one node. That node is a root node and it contains seven attributes: one 'type' attribute and six 'update' attributes. The value of the 'type' attribute is the empty bit vector indicating that the root node is a leaf. The values of the update attributes are 1, 10, 11, 100, 101, and 110. All seven timestamps are equal and indicate the time at which the root node was created.

We shall refer to sibling, url, and leap attributes as 'proper' attributes. After creation of the root node, the state is changed by adding and removing proper attributes. Update and type attributes only change as a consequence of adding and removing proper attributes. At any time, the server must contain the least number of nodes which are enough to contain the stored proper attributes. For that reason, removing a proper attribute may cause an avalanche of node deletions and adding a proper attribute may cause an avalanche of node creations.

When adding a proper attribute, the timestamp of all consequential changes must be equal to the timestamp of the new attribute which in turn must reflect the time at which the attribute was added. When removing a proper attribute, all consequential changes must have the same timestamp and that timestamp must reflect the time at which the attribute was removed. The timestamps of successive additions and removals of proper attributes must be strictly increasing. If the resolution of the server clock is insufficient for that, then the server must fake a higher resolution.

Consequential changes may involve changing the value of update and type attributes. Such changes shall be treated as a simultaneous removal of the old attribute and addition of a new one such that the new attribute appears at the end of its attribute list.

2.22. Got messages
message =/ got
got = id-got address class index
      norm count timestamp value
id-got = x05
norm = cardinal
count = cardinal
value = vector

A Logiweb server which receives a get request shall do one of the following:

o Respond by a got message as described later in this section.
o Respond by a 'Sorry' message.
o Avoid responding if the get is transported by a datagram.
o Disconnect if the get is transported by a connection-based transport.

Logiweb servers are supposed to respond to get requests. A Logiweb client should consider the other end of the connection broken if the client receives a get request.

Logiweb applications SHALL NOT respond to got responses.

If a Logiweb server responds with a 'got' response to a 'get' request, then the 'got' response shall reflect the state of the server at the time the 'get' is processed. The address, class, and index of the 'got' response shall be identical to the address, class, and index of the associated 'get' request. The norm, count, timestamp, and value shall be as follows:

CASE 1: the state contains an attribute with the given address, class, and index. The norm shall be the length of the address. The count shall be the number of attributes in the state that have the given address and class. The timestamp and value shall be the time stamp and value, respectively, of the attribute with the given address, class, and index.

CASE 2: the state contains an attribute with the given address and class, but none with the given index. The norm shall be the length of the address. The count shall be the number of attributes in the state that have the given address and class. The timestamp and value shall be the time stamp and value, respectively, of the attribute with the largest index of the given address and class.

CASE 3: the state contains an attribute with the given address, but none with the given class. The norm shall be the length of the address. The count shall be zero. The timestamp shall be the current server time. The value shall be the empty bit vector.

CASE 4: the state contains no attributes with the given address. In this case, let A2 be the longest prefix of the given address for which the state does contain an attribute.

CASE 4A: the state contains a sibling attribute with address A2. The norm shall be the length of A2. The count shall be the number of sibling attributes in the state that have address A2. The timestamp and value shall be the time stamp and value, respectively, of a randomly picked attribute with address A2 and class sibling.

CASE 4B: the state contains no sibling attributes with address A2. The norm shall be the length of A2. The count shall be zero. The timestamp shall be the current server time. The value shall be the empty bit vector.

CASE 4A covers the case where the given server is unable to answer the given question (the one encoded in the get request), but is able to refer to some other server which stores a branch node with address A2. In other words, CASE 4A covers the case where a server can refer to a server more knowledgeable on the given question.

CASE 4B covers the case where the given server is unable to answer the given question and unable to refer to a server which stores a branch node with address A2. Logiweb servers SHALL try their best to avoid CASE 4B in cases where there exists a server which has a branch node with address A2. No server can be perfect here, however, since all states of all other servers may change without notice. But servers are required to crawl Logiweb to ensure they have a plentiful supply of sibling attributes for all their nodes.
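
The four cases above can be sketched as a single lookup function. This is only an illustration under stated assumptions, not a definitive implementation: addresses are modelled as Python strings of bits, the class names "url" and "sibling" are plain strings, indices are assumed to run from 1 (oldest) to count (newest) so that index 0 falls under CASE 2, and the names Attribute, Got, and answer_get are hypothetical. The fallback to the empty address when no prefix matches is also an assumption that the text does not cover.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    address: str    # bit string such as "0110" (representation is an assumption)
    cls: str        # attribute class, e.g. "url" or "sibling"
    timestamp: int
    value: bytes


@dataclass(frozen=True)
class Got:
    norm: int
    count: int
    timestamp: int
    value: bytes


def answer_get(state, address, cls, index, now, rng=random):
    """Build the got response for a get request against `state`.

    `state` is a list of Attribute objects, oldest first within each
    attribute list. `now` is the current server time.
    """
    same_address = [a for a in state if a.address == address]
    if same_address:
        same_class = [a for a in same_address if a.cls == cls]
        if not same_class:
            # CASE 3: address known, class unknown.
            return Got(len(address), 0, now, b"")
        if 1 <= index <= len(same_class):
            chosen = same_class[index - 1]   # CASE 1: exact index
        else:
            chosen = same_class[-1]          # CASE 2: largest index
        return Got(len(address), len(same_class),
                   chosen.timestamp, chosen.value)

    # CASE 4: find A2, the longest prefix of the address with an attribute.
    # Assumption: fall back to the empty address if no prefix matches.
    prefixes = [a.address for a in state if address.startswith(a.address)]
    a2 = max(prefixes, key=len, default="")
    siblings = [a for a in state if a.address == a2 and a.cls == "sibling"]
    if siblings:
        # CASE 4A: refer the client to a randomly picked sibling.
        pick = rng.choice(siblings)
        return Got(len(a2), len(siblings), pick.timestamp, pick.value)
    # CASE 4B: nothing to refer to.
    return Got(len(a2), 0, now, b"")
```

A norm shorter than the requested address tells the client that the value, if any, is a referral rather than an answer.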

Clients who need, for example, to translate a Logiweb reference R into a URL are supposed to issue a get message with address R, class URL, and index 0. When the client receives a got message whose norm equals the length of R, it uses the returned URL (if any). If the client receives a got message whose norm is less than the length of R, it resends the get request to the indicated sibling (if any). At each redirection, the norm is supposed to increase. If the norm does not increase, then the state of the penultimate server is outdated. In this case, the client may as a courtesy send the penultimate server a 'put' message which tells the server to remove its dangling sibling pointer. Put messages are described later.
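
The redirection loop described above might look as follows. This is a minimal sketch under assumptions: the transport is hidden behind a hypothetical callable ask(server, address) that returns a pair (norm, value), where value is None when the got response carries no attribute; the reference is modelled as a string whose Python length stands in for the address length; and the names resolve, StaleSibling, and max_hops are illustrative only.

```python
class StaleSibling(Exception):
    """Raised when a redirection fails to increase the norm."""

    def __init__(self, server):
        super().__init__(f"stale sibling pointer at {server!r}")
        self.server = server   # the penultimate server, whose state is outdated


def resolve(reference, first_server, ask, max_hops=16):
    """Follow sibling referrals until some server answers for `reference`.

    Returns the URL, or None if the chain ends without one.
    """
    server, previous = first_server, None
    last_norm = -1
    for _ in range(max_hops):
        norm, value = ask(server, reference)
        if norm == len(reference):
            return value            # the URL, or None if there is none
        if previous is not None and norm <= last_norm:
            # The norm did not increase: `previous` holds a dangling pointer.
            raise StaleSibling(previous)
        if value is None:
            return None             # no sibling to try next
        previous, last_norm, server = server, norm, value
    return None
```

A caller that catches StaleSibling may then send the courtesy 'put' message with operation 'remove' to the server named in the exception.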

When a server or a client crawls Logiweb, it may do so iteratively. As an example, a client may remember when it last visited a given server. The next time the client visits the server, it may start by querying the server time with a ping request. Then the client may find out what has changed using update attributes without wasting time on attribute classes and subtrees that have not changed since the last visit. Finally, the client may set its time of last visit to the response from the initial ping.

Whenever such a client reads a changed attribute list, it should read it in reverse chronological order. To do so, it may start with index 0 to get the newest attribute and the number C of attributes. Then it may query index C minus one, C minus two, and so on in that order. If attributes are removed between queries, then the client may receive the same attribute more than once, but it will never miss an attribute. For attributes other than update attributes, distinct attributes have distinct timestamps, so the client can eliminate duplicates on the basis of timestamps.
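
The reading order above can be sketched as follows. The sketch assumes a hypothetical callable query(index) that sends one get request and returns (count, timestamp, value) from the got response, and it assumes that indices run from 1 (oldest) to count (newest), which is inferred from the text rather than stated in it. Duplicate elimination by timestamp only applies to attributes other than update attributes.

```python
def read_newest_first(query):
    """Read one attribute list in reverse chronological order.

    Returns a list of (timestamp, value) pairs, newest first,
    with duplicates caused by concurrent removals dropped.
    """
    count, timestamp, value = query(0)   # index 0 yields the newest attribute
    if count == 0:
        return []
    seen = {timestamp}
    result = [(timestamp, value)]
    # Walk down from C - 1 to 1; removals can only shift attributes
    # to lower indices, so nothing is skipped.
    for index in range(count - 1, 0, -1):
        current_count, timestamp, value = query(index)
        if current_count == 0:
            break                        # the list was emptied meanwhile
        if timestamp not in seen:
            seen.add(timestamp)
            result.append((timestamp, value))
    return result
```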

2.23. Put messages
message =/ put
put = id-put address class operation value
id-put = x06
operation = remove / add
remove = x00
add = x01

A Logiweb server which receives a put request shall do one of the following:

o Respond by a 'Received' message.
o Respond by a 'Sorry' message.
o Avoid responding if the put is transported by a datagram.
o Disconnect if the put is transported by a connection-based transport.

Logiweb servers are supposed to respond to put requests. A Logiweb client should consider the other end of the connection broken if the client receives a put request.

A server which receives a put message whose operation is 'remove' may consider removing an attribute with the given address, class, and value. The remove message contains no index since the index of an attribute can decrease at any time because of removal of older attributes on the same attribute list.

A server which receives a put message whose operation is 'add' may consider adding an attribute with the given address, class, and value. The add message contains no timestamp since the timestamp of the new attribute should be set to the current server time rather than being supplied.

A server should consider almost all put requests with almost infinite suspicion. A put request could be forged to corrupt the state of a server or could be forged to fool the server into participating in a denial-of-service attack on some other Logiweb server or some other service on the Internet. This is why a server only tells the sender of a put request that the server has 'received' the request. It does not reveal any information about what the server is going to do with the request. It is perfectly legitimate for a server to ignore all put requests.

3. Security Considerations

3.1. Unwanted outgoing information

A Logiweb server provides information to the outside world through pong responses, event responses, and got responses.

Pong responses identify the server as a Logiweb server and tell what time it is. The owner of a Logiweb server must be prepared to share this information with the world.

Event responses (received, rejected, and sorry responses) tell the world about the mood of the server. The owner of the server must be prepared to share that as well.

Got responses tell the world about the publicly available state of the server. In principle, the owner of the server should be prepared to share that as well.

A Logiweb server, however, typically indexes given subtrees of the owner's web site. A Logiweb server typically does so by crawling the file system of the host. In doing so, the server could find documents whose existence the owner wants to keep secret, and then make the existence of those documents publicly known. After that, the secret documents may be retrieved from the owner's web server.

As a countermeasure for that, Logiweb servers should only index files with extension 'lgw' ('lgw' for 'logiweb'). Among those files, the server should check that the first byte of the file contains the number 1, and that the next twenty bytes contain the RIPEMD-160 hash key of the remaining bytes of the file. That ensures with great likelihood that only genuine Logiweb documents are indexed, avoiding inadvertent indexing of other kinds of documents. Authors of Logiweb documents who want their Logiweb documents to remain secret should keep them out of reach of the local Logiweb server.
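
The file check described above can be sketched as follows. The function name looks_like_logiweb and the pluggable digest parameter are assumptions made for illustration. The default digest relies on hashlib.new("ripemd160"), which is only available when the underlying OpenSSL build provides RIPEMD-160; where it does not, a different RIPEMD-160 implementation would have to be passed in.

```python
import hashlib


def ripemd160(data):
    """RIPEMD-160 digest; raises ValueError if the OpenSSL build lacks it."""
    return hashlib.new("ripemd160", data).digest()


def looks_like_logiweb(file_name, data, digest=ripemd160):
    """Return True if the file passes the indexing checks.

    file_name: name of the candidate file
    data:      the complete file contents as bytes
    digest:    callable mapping bytes to a 20-byte hash (assumed pluggable)
    """
    if not file_name.endswith(".lgw"):
        return False
    if len(data) < 21 or data[0] != 1:
        return False                      # first byte must be the number 1
    # The next twenty bytes must be the hash of all remaining bytes.
    return data[1:21] == digest(data[21:])
```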

As another use of got messages, an attacker may use got responses to figure out how the server reacts to put requests. In doing so, the attacker may be able to find a security hole which allows the attacker to fool the server into participating in a denial-of-service attack on some other service. The ultimate countermeasure to this is to let the server ignore all put messages. Otherwise, one must try to avoid security holes in the server.

3.2. State corruption

Using put messages, an attacker may try to persuade a server to place incorrect information in the server state. The ultimate countermeasure to this is to let the server ignore all put messages. Otherwise, a server should not react directly to put messages. Rather, the server should repeatedly crawl its host file system to keep its url attributes up to date and should repeatedly crawl Logiweb to keep its sibling attributes up to date. In doing so, a server could take a put message as a hint to crawl some particular area earlier than it would otherwise do.

One source of put messages is notifications from inside the owner's firewall that some Logiweb document has been added to or removed from the file system. To respond reasonably, servers are advised to classify sender IPs suitably in order to follow up more promptly on put requests from more trusted senders. This only works, of course, for sender IPs which an attacker cannot tamper with.
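
One possible way to treat put messages as crawl hints ranked by sender trust is sketched below. Everything here is an assumption for illustration: the class name CrawlHints, the two trust levels, and the example network 10.0.0.0/8 standing in for addresses behind the owner's firewall. The reply is always 'received', whatever the server later decides to do.

```python
import heapq
import ipaddress
import itertools

# Hypothetical set of networks the owner trusts.
TRUSTED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


class CrawlHints:
    """Queue of areas to crawl early, fed by put messages."""

    def __init__(self, trusted=TRUSTED_NETWORKS):
        self._trusted = list(trusted)
        self._heap = []
        self._order = itertools.count()   # keeps arrival order within a level

    def on_put(self, sender_ip, address, cls):
        """Record a hint and return the only reply a put ever gets."""
        ip = ipaddress.ip_address(sender_ip)
        trusted = any(ip in net for net in self._trusted)
        priority = 0 if trusted else 1    # lower number is served first
        heapq.heappush(self._heap, (priority, next(self._order), address, cls))
        return "received"

    def next_hint(self):
        """Return the next (address, class) to crawl, or None if empty."""
        if not self._heap:
            return None
        _, _, address, cls = heapq.heappop(self._heap)
        return address, cls
```

The state itself is never changed by on_put; only the crawler, which verifies what it finds, updates attributes.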

Even if a server is persuaded to place incorrect information in its state, this will at most prevent clients from finding Logiweb documents. If a server translates a reference into a URL, then the client is supposed to retrieve the associated Logiweb document and to verify using RIPEMD-160 [RIPEMD] that the retrieved document is the one requested.

3.3. Incoming denial-of-service attacks

If a large number of clients start sending requests to a single Logiweb server, the ingoing bandwidth of the server may get saturated. To avoid saturating the outgoing bandwidth if this occurs, the 'sorry' message has been included in the protocol. The 'sorry' message allows the server to respond to incoming messages using little bandwidth and little computational resources. Furthermore, the protocol allows the server not to respond at all, which accounts for messages lost due to limitation of ingoing bandwidth.

Logiweb clients should maintain a list of Logiweb servers, and if one server does not respond or responds with a 'sorry', then the client should switch to another Logiweb server.
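
The failover rule above can be sketched in a few lines. The callable send(server) is hypothetical: it is assumed to return None when no response arrives in time, the string "sorry" for a 'sorry' message, and any other object for a usable response.

```python
def query_with_failover(servers, send):
    """Try each server in turn until one gives a usable response.

    Returns (server, reply), or None if every server was silent
    or answered 'sorry'.
    """
    for server in servers:
        reply = send(server)
        if reply is None or reply == "sorry":
            continue          # silent or overloaded: switch to the next server
        return server, reply
    return None
```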

3.4. Outgoing denial-of-service attacks

An attacker may launch an indirect denial-of-service attack by sending requests to a Logiweb server whose sender field contains the IP of the victim. To counter that, the Logiweb protocol specifies that each request can result in at most one response. In that way, an attacker cannot use a Logiweb server to 'amplify' the attack.

Logiweb servers are supposed to crawl Logiweb on their own initiative. Furthermore, put messages may suggest to Logiweb servers that they should promote crawling of particular servers. An attacker could use this to persuade a number of Logiweb servers to crawl one victim simultaneously. To counter that, the present document does not specify exactly what a Logiweb server is supposed to do with put messages. Furthermore, Logiweb servers should approach other servers gently, waiting for their responses to see that the contacted servers do respond and do not send out 'sorry' messages. Finally, Logiweb servers should check that they actually do talk with Logiweb servers and not with some innocent other service. Logiweb servers may do so by sending a ping request to services whose identity they are not sure of.

4. IANA Considerations

4.1. Well Known Port 332

The format of sibling attributes allows Logiweb servers to run on arbitrary UDP and TCP ports. At present, Logiweb servers use UDP port 65535 by default.

To avoid making the use of port 65535 permanent, udp and tcp Well Known Port 332 is requested to be registered.

Port number 332 is suggested because 332 = 256 + 76 where 76 is the Unicode code point of Latin capital letter L, which is the first letter in "Logiweb". On some occasions not covered in the present document, the Logiweb system represents strings by numbers, in which case the one character string "L" happens to be represented by the number 332. Furthermore, port 332 is unassigned and appears at the end of an interval of unassigned numbers so that assignment will not lead to fragmentation.
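
The arithmetic behind the port number can be checked directly. The sketch only verifies the sum given above; it does not model how Logiweb represents strings as numbers, which the present document does not cover.

```python
# Code point of Latin capital letter L.
code_point = ord("L")

# The suggested port is 256 plus that code point.
port = 256 + code_point

print(code_point, port)
```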

Suggested port name: "Logiweb".

4.2. MIME type application/prs.logiweb

As mentioned, the main purpose of Logiweb servers is to translate Logiweb references into a URL of an associated Logiweb document. When looking up the URL of the Logiweb document, http servers currently deliver the Logiweb document with MIME type application/x-logiweb.

To avoid making the use of MIME type application/x-logiweb permanent, MIME type application/prs.logiweb is requested to be registered.

The format of Logiweb documents is:

document = id-version ripemd timestamp contents
id-version = %d001
ripemd = 20*20 byte
contents = *byte
byte = %d0-255

For the syntax of timestamps, see the section entitled "Timestamps".

The ripemd field of a document must be the RIPEMD-160 hash key [RIPEMD] of all bytes following the ripemd field (including the timestamp).

The reference of a Logiweb document comprises the document with the contents removed.

The description above of the contents as a sequence of bytes is sufficient as far as the Logiweb protocol is concerned. A more complete description may be found at http://logiweb.eu/logiweb/doc/server/pr….

5. References
5.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 4234, October 2005.
5.2. Informative References
[Logiweb] http://logiweb.eu/ (see also Grue, K., "Logiweb - A System for Web Publication of Mathematics", Mathematical Software - ICMS 2006, Lecture Notes in Computer Science, pp.343--353, vol.4151, Springer, 2006).
[CGI] http://www.w3.org/CGI/
[RIPEMD] Dobbertin, H., Bosselaers, A., and B. Preneel, "RIPEMD-160: A Strengthened Version of RIPEMD", Fast Software Encryption, pp. 71-82, 1996.
[Unicode] http://www.unicode.org/
[RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, August 1980.
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.
Authors' Address
      Klaus Grue
      DIKU
      University of Copenhagen
      Universitetsparken 1
      DK-2100 Copenhagen
      Denmark

      email - grue@diku.dk

    Full Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.

This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST, AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

    Intellectual Property

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed
to pertain to the implementation or use of the technology
described in this document or the extent to which any license
under such rights might or might not be available; nor does it
represent that it has made any independent effort to identify any
such rights.  Information on the procedures with respect to
rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use
of such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository
at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention
any copyrights, patents or patent applications, or other
proprietary rights that may cover technology that may be required
to implement this standard.  Please address the information to the
IETF at ietf-ipr@ietf.org.

Referenced By

logiweb(5).

JULY 2009 Logiweb File formats and protocols