Scottish Lighthouses

Links

Conversion Process

To do

This is a demonstration of schema-driven HTML parsing, which still has some way to go.

Research

One interesting question is whether an RDF triple store would assist this process. In this approach, the first parse of the index would generate an initial set of RDF triples, which would be uploaded to the store. The second script would use XQuery and SPARQL to iterate over the NLB pages and generate further RDF; whether this would be written back progressively is an open question. The additional data could be authored as RDF and uploaded. Here the triple store implicitly merges the data, rather than the merge being done in XML documents. The visualisation scripts would use SPARQL queries to get the relevant data. One complication is the presence of multiple values, e.g. for images: the code would need to group the RDF result rows by a key to handle this.
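The grouping step mentioned above could look something like the following. This is only a sketch in Python rather than XQuery, and it assumes the SPARQL results have already been turned into one dictionary per row. The function name `group_rows`, the variable names `lighthouse`, `name` and `image`, and the sample rows are all made up for illustration; they are not taken from the actual scripts or data.

```python
def group_rows(rows, key, multi):
    """Collapse result rows that share `key` into one record each.

    rows  : iterable of dicts, one per result row
    key   : variable that identifies a record (hypothetical: "lighthouse")
    multi : variables that may take several values (hypothetical: "image")
    """
    grouped = {}  # dicts keep insertion order, so records stay in first-seen order
    for row in rows:
        record = grouped.setdefault(row[key], {key: row[key]})
        for name, value in row.items():
            if name == key:
                continue
            if name in multi:
                # Multi-valued variable: collect distinct values in a list.
                values = record.setdefault(name, [])
                if value not in values:
                    values.append(value)
            else:
                # Single-valued variable: keep the first value seen.
                record.setdefault(name, value)
    return list(grouped.values())


# Illustrative rows only: a query that joins a lighthouse to its images
# returns one row per image, repeating the other columns.
rows = [
    {"lighthouse": "ex:bell-rock", "name": "Bell Rock", "image": "bell-rock-1.jpg"},
    {"lighthouse": "ex:bell-rock", "name": "Bell Rock", "image": "bell-rock-2.jpg"},
    {"lighthouse": "ex:skerryvore", "name": "Skerryvore", "image": "skerryvore-1.jpg"},
]
records = group_rows(rows, key="lighthouse", multi={"image"})
```

With these sample rows, `records` holds one entry per lighthouse, and the Bell Rock entry carries both image values in a list. The same grouping could presumably be done in XQuery over the SPARQL XML results, or avoided in the query itself with an aggregate such as `GROUP_CONCAT` if the triple store supports SPARQL 1.1.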
