Chapter 2: Streams

In its simplest form, virgule is an HTML (or more specifically, an XML) validator. However, virgule can also treat tag names as requests to execute code. One such possibility is to change the I/O stream half-way through a node walk. The simplest way to do this is with an <include/> tag. So, with this:


	<page>
	  <html>
		<include file="/site/title.html"/>
	  </html>
	</page>

The tag <html/> is not understood by virgule (unless someone has written and installed an <html/> module), so it is passed straight through to the output stream. The <include/> tag, however, says: read in the contents of /site/title.html and parse it as if it were inline in the current document.
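
The include behaviour described above can be modelled in a few lines. This is only a sketch in Python, not virgule's own code: the function name expand_includes and the temporary file are made up for illustration, and it assumes the included file holds a single root element.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def expand_includes(node):
    """Copy `node`, replacing each <include file="..."/> with the parsed
    root element of the named file. Unknown tags are copied unchanged,
    which mimics passing them straight through to the output."""
    out = ET.Element(node.tag, dict(node.attrib))
    out.text = node.text
    out.tail = node.tail
    for child in node:
        if child.tag == "include":
            # Parse the included file as if it were inline at this point.
            included_root = ET.parse(child.get("file")).getroot()
            replacement = expand_includes(included_root)
            replacement.tail = child.tail
        else:
            replacement = expand_includes(child)
        out.append(replacement)
    return out


def demo():
    """Build a throwaway include file and expand a small script around it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "title.html")
        with open(path, "w") as handle:
            handle.write("<title>Hello</title>")
        script = '<page><html><include file="%s"/></html></page>' % path
        result = expand_includes(ET.fromstring(script))
        return ET.tostring(result, encoding="unicode")


if __name__ == "__main__":
    print(demo())
```

Note that the sketch builds a whole tree first, whereas virgule switches streams during the node walk itself; the point here is only the "parse it as if it were inline" part.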

The code under development is a more sophisticated form of the <include/> tag: a <sock:connect/>, <sock:recv/>, <sock:send/> set. Because streaming is possible, where XML document fragments are exchanged over the same socket without closing it, this is hair-raising work.

I am certain that an experienced Lisp network software engineer would feel right at home if they wrote a program that exchanged remote Lisp programs and data and dynamically executed or displayed the programs or data.

A name is associated with each stream, whether input or output. It is sensible, but not necessary, to use the same name for both. <sock:connect/> creates a new input stream and a new output stream from the same socket. To read from the socket, the <sock:recv/> tag must name the input stream so created; the destination for the data that comes back from the socket can also be named. Theoretically, it will be possible to have two sockets created by <sock:connect/>, with a script acting as a gateway between them!

The following script listens for data on sock2 and sends it to sock1, then listens on sock1 for data and sends it to sock2:


	<sock:connect in="sock1" out="sock1">
		<sock:connect in="sock2" out="sock2">
			<sock:recv in="sock2" out="sock1"/>
			<sock:recv in="sock1" out="sock2"/>
		</sock:connect>
	</sock:connect>
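
To make the gateway idea concrete, here is a rough Python model of two named streams with data relayed between them. It is a sketch under assumptions, not virgule code: the streams dictionary and the relay() helper are hypothetical, and local socket pairs stand in for real network connections.

```python
import socket


def relay(streams, src, dst, bufsize=4096):
    """Read one buffer from the stream named `src` and write it to the
    stream named `dst`, like <sock:recv in="src" out="dst"/>."""
    data = streams[src].recv(bufsize)
    streams[dst].sendall(data)
    return data


def demo():
    # Each pair is (remote peer, our end). Our ends are registered by name.
    peer1, sock1 = socket.socketpair()
    peer2, sock2 = socket.socketpair()
    streams = {"sock1": sock1, "sock2": sock2}
    try:
        # Data arriving on sock2 is forwarded to sock1 ...
        peer2.sendall(b"<ping/>")
        relay(streams, "sock2", "sock1")
        seen_by_peer1 = peer1.recv(4096)
        # ... then data arriving on sock1 is forwarded to sock2.
        peer1.sendall(b"<pong/>")
        relay(streams, "sock1", "sock2")
        seen_by_peer2 = peer2.recv(4096)
        return seen_by_peer1, seen_by_peer2
    finally:
        for sock in (peer1, sock1, peer2, sock2):
            sock.close()


if __name__ == "__main__":
    print(demo())
```

The relay here moves raw bytes; the virgule tags work on parsed XML nodes, so this only illustrates the naming and the direction of flow.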

2.1: Streams Implementation

This section outlines some notes to the designer so as not to lose track of some of the most hairy code. Raise your eyebrows all you like, it's not going to help.

libxml 2.2.5 includes something called a SAX interface. This separates the parsing of an XML input stream from what to do with it. As each construct in an XML input stream (an element, some text, a comment and so on) is identified, the appropriate function in a table of SAX routines is called. For example, when an element node is identified, the startElement function is called.
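
For anyone who has not met SAX before, the callback style looks roughly like this. The sketch uses Python's standard xml.sax module as a stand-in for libxml's C table of routines; the Recorder class and record_events() are illustrative names, not part of virgule or libxml.

```python
import xml.sax


class Recorder(xml.sax.ContentHandler):
    """Collects a log of the element callbacks the parser makes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called by the parser as soon as a start tag has been identified.
        self.events.append(("start", name))

    def endElement(self, name):
        # Called by the parser when the matching end tag is reached.
        self.events.append(("end", name))


def record_events(text):
    """Parse `text` and return the sequence of start/end callbacks."""
    handler = Recorder()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in record_events("<page><html/></page>"):
        print(event)
```

The parser never hands back a tree; everything the program learns about the document arrives through these calls, in document order.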

The virgule streams code uses the SAX routines to good effect. The problem at hand is that we are parsing a potentially never-ending data stream! This is dealt with by hooking the two functions startElement and endElement. site_render_sock_recv() then calls xmlParseDocument, which results in calls to sock_start_element() and sock_end_element() whenever the start or end of an element node is reached.
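
The reason callbacks matter for a never-ending stream is that they fire before the document is complete. A small sketch of that, again using Python's xml.sax rather than libxml: ExpectElement and feed_until_match() are hypothetical names, and the chunks stand in for reads from a socket.

```python
import xml.sax


class ExpectElement(xml.sax.ContentHandler):
    """Sets `matched` once a start tag with the expected name is seen."""

    def __init__(self, expected):
        super().__init__()
        self.expected = expected
        self.matched = False

    def startElement(self, name, attrs):
        if name == self.expected:
            self.matched = True


def feed_until_match(chunks, expected):
    """Feed chunks to an incremental parser until `expected` starts.

    Returns (matched, number_of_chunks_consumed). The parser is never
    closed, because the outer element of the stream is never closed.
    """
    parser = xml.sax.make_parser()
    handler = ExpectElement(expected)
    parser.setContentHandler(handler)
    consumed = 0
    for chunk in chunks:
        parser.feed(chunk)
        consumed += 1
        if handler.matched:
            break
    return handler.matched, consumed


# The opening <stream:stream> is never closed, as on a live connection.
CHUNKS = ['<stream:stream to="knight">', '<iq type="result"/>']

if __name__ == "__main__":
    print(feed_until_match(CHUNKS, "stream:stream"))
    print(feed_until_match(CHUNKS, "iq"))
```

Namespace processing is left at the xml.sax default (off), so the prefixed name is compared as a plain string.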

The trick is that a nodewalk is performed, matching the incoming data stream's XML nodes with the current input script:

<page raw="yes" xmlns:xvl="http://undefined"
		xmlns:sock="http://undefined"
		xmlns:schema="http://undefined"
		xmlns="xvl:xvl">

<sock:connect host="127.0.0.1" port="5222" protocol="tcp" type="connect" 
    in="jabber" out="jabber"
   ><sock:send in="jabber" out="jabber">
	<stream:stream xmlns="jabber:client" to="knight"
	      xmlns:stream="http://etherx.jabber.org/streams">
	    <sock:recv in="jabber">
		<stream:stream raw="yes">
		    <sock:send out="jabber">
		      <iq type="set" id="0" >
			  <query xmlns="jabber:iq:auth">
			      <username>tharg</username>
			      <password>nopasswordtoday</password>
			      <resource>jabber</resource>
			  </query>
		      </iq>
		      <sock:recv in="jabber">
			  <iq />
		      </sock:recv>
		    </sock:send>
		    <sock:send out="jabber">
		      <message to="test3@knight">
		          <body>hello</body>
		      </message>
		      <sock:recv in="jabber">
			  <message />
		      </sock:recv>
		    </sock:send>
		</stream:stream>
	    </sock:recv>
	</stream:stream>
    </sock:send>
</sock:connect>
</page>

Conceptually, this script looks very obvious. Send a <stream:stream xmlns="jabber:client" to="knight"...> tag, then read a <stream:stream> tag, and if one arrives, send <iq type="set" ...>, and so on.

Underneath, what is happening is that the <sock:send> tag first switched its output stream to the out="jabber" socket, and then sent everything up to the <sock:recv> tag down that output stream. When the <sock:recv> tag was reached, xmlParseDocument was called, which resulted in a read from the in="jabber" stream, as instructed. In the input stream, a <stream:stream> tag was received, which matched the <stream:stream> node in the script's nodewalk, so further processing was carried out. That processing reached the next <sock:recv>, which called xmlParseDocument again, and we still haven't returned from the first xmlParseDocument!

Not only that, but if the <stream:stream> had not been found, then the first <iq> tag would not have been sent, which would have meant that the second xmlParseDocument would not have had anything to receive (assuming that the server at the other end actually responded).
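
The dependency between receiving and sending can be seen in a toy model of the matching nodewalk. This is a simplified sketch, not the real implementation: the tag names send and recv replace the sock: prefixed tags, a list of element names replaces the socket, and walk(), expect() and run() are made-up helpers.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the Jabber script: literal tags are "sent",
# and the children of <recv> are the tags we hope to receive.
SCRIPT = """
<send>
  <stream>
    <recv>
      <stream>
        <send><iq/><recv><iq/></recv></send>
        <send><message/><recv><message/></recv></send>
      </stream>
    </recv>
  </stream>
</send>
"""


def walk(node, incoming, sent):
    """Walk the script below `node`, sending literal tags in order."""
    for child in node:
        if child.tag == "recv":
            expect(child, incoming, sent)
        elif child.tag == "send":
            walk(child, incoming, sent)
        else:
            sent.append(child.tag)  # literal tag goes to the output
            walk(child, incoming, sent)


def expect(recv_node, incoming, sent):
    """Take one incoming element name; carry on only if the script has
    a matching child. The nested walk may reach another <recv> before
    this call returns, which is the recursion described above."""
    if not incoming:
        return
    name = incoming.pop(0)
    for candidate in recv_node:
        if candidate.tag == name:
            walk(candidate, incoming, sent)
            return


def run(incoming):
    """Return the list of tags sent for a given sequence of replies."""
    sent = []
    walk(ET.fromstring(SCRIPT), list(incoming), sent)
    return sent


if __name__ == "__main__":
    print(run(["stream", "iq", "message"]))
    print(run(["error"]))
```

With the expected replies every tag is sent; if the first reply is something other than stream, only the opening tag goes out and the iq is never sent.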

So, we'd better hope that this all works out. It seems to, although it relies heavily on the stackable and recursive nature of the SAX interface code.