Database

In database terminology, a page is like a record or a row, a field is like a column, and a value is still a value. So the structure of this wiki can be viewed as a database. There is an important difference in that most databases require a fixed and unchanging (or rarely changing) structure for their data. Swank has no fixed structure, and no limit to the number of extra fields which can be attached to a page even beyond the ones defined in a template. A semi-structured database would be a good match for a structured wiki, but such databases are still research topics.
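As a rough illustration of the record analogy, here is a minimal Python sketch. The names PERSON_TEMPLATE_FIELDS and new_page are hypothetical and not part of Swank; the only point is that a page starts with its template's fields and can take on extra fields at any time.

```python
# Hypothetical template: the fields a "person" page starts with.
PERSON_TEMPLATE_FIELDS = ["name", "email"]


def new_page(title, template_fields):
    """Create a page (record) holding the template's fields, all initially empty."""
    return {"title": title, "fields": {name: "" for name in template_fields}}


page = new_page("Ada Lovelace", PERSON_TEMPLATE_FIELDS)
page["fields"]["name"] = "Ada Lovelace"

# An ad-hoc field that no template defines can be attached at any time.
page["fields"]["favorite_engine"] = "Analytical Engine"

print(sorted(page["fields"]))
```

A relational table would need a schema change to accept favorite_engine; here the page simply gains another key.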

By default, we store the pages in ordinary XML files. They are indexed with a full-text search engine called Lucene (or one of its server implementations, such as Elasticsearch). Lucene has a big advantage over ordinary search engines in that it understands fields, so it provides all the indexing and search functions that a database would normally provide. (We do not have joins yet. Bummer.) This gives us very powerful full-text searching, which is a weakness in most databases, but certain types of database searches, such as "field is null", are more difficult.
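To show why a field-aware index makes fielded lookups easy but "field is null" awkward, here is a toy sketch in Python. It is not Lucene and does not use Lucene's API; the XML layout in PAGES and the build_index helper are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical page files; the real on-disk XML layout is not described here.
PAGES = {
    "ada": "<page><field name='name'>Ada Lovelace</field>"
           "<field name='born'>1815-12-10</field></page>",
    "alan": "<page><field name='name'>Alan Turing</field></page>",
}


def build_index(pages):
    """Map (field name, term) -> set of page ids: a toy field-aware inverted index."""
    index = defaultdict(set)
    for page_id, xml_text in pages.items():
        for field in ET.fromstring(xml_text).iter("field"):
            for term in (field.text or "").lower().split():
                index[(field.get("name"), term)].add(page_id)
    return index


index = build_index(PAGES)

# A fielded term lookup is a single dictionary access.
name_hits = index.get(("name", "turing"), set())

# "born is null" has no term to look up. We have to gather every page that has
# any "born" term and subtract that from the set of all pages.
has_born = set().union(*(ids for (field, _), ids in index.items() if field == "born"))
born_is_null = set(PAGES) - has_born
```

The index only records what is present, so a query for absence has to be phrased as "everything minus the pages that have the field".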

Structured-ness

What do we want from structure for organizing our data?

Track specific parts of data by name. (fields)
Other meta-data, such as data types.
Visual presentation of the data.
Input parsing and validating.
Group similar pages together.
Query pages by field.
Relate pages to each other.
Sort pages when an order is possible.
Make reports.
Reorganize on the fly (joins, group-by, sums).
Update multiple records in similar ways.
Consistency and Integrity checks.
Maybe even ACID robustness?

It should be noted that we do not want a fixed table structure, as in relational databases. We want to be able to change structure at any time, add ad-hoc fields, and even impose more than one structure on a page. Structure does not imply Schema in the context of a living document.

How does the current approach fulfill these requirements?

A structure is (usually) defined by a template page. This template combines several of the above functions. First, it defines the visual presentation of the data. Within this presentation, the data fields are defined by calls to (inclusion of) field pages. Each field page defines a type (or subtype), and provides methods for displaying the data in view mode, edit mode, and for parsing input data. Thus the template page provides the general presentation of related data on a page, and the field pages provide specific modal presentation for each data type.
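The split between template page and field pages can be sketched roughly as follows. The classes TextField and DateField, the TEMPLATE list, and the render function are hypothetical stand-ins for field pages and a template page; Swank's actual pages are wiki pages, not Python classes.

```python
from datetime import date
from html import escape


class TextField:
    """Hypothetical field page for plain text: parse, view mode, edit mode."""

    def parse(self, raw):
        return raw.strip()

    def view(self, value):
        return escape(value)

    def edit(self, name, value):
        return f'<input name="{escape(name)}" value="{escape(value)}">'


class DateField(TextField):
    """Hypothetical subtype that validates input and stores ISO dates."""

    def parse(self, raw):
        # date.fromisoformat raises ValueError on malformed input.
        return date.fromisoformat(raw.strip()).isoformat()


# Hypothetical template: an ordered list of (field name, field page) pairs.
TEMPLATE = [("name", TextField()), ("born", DateField())]


def render(template, values, mode="view"):
    """The template decides the overall layout; each field renders itself."""
    lines = []
    for name, field in template:
        value = values.get(name, "")
        shown = field.view(value) if mode == "view" else field.edit(name, value)
        lines.append(f"{name}: {shown}")
    return "\n".join(lines)


raw_input = {"name": "  Ada Lovelace ", "born": "1815-12-10"}
values = {name: field.parse(raw_input[name]) for name, field in TEMPLATE}
print(render(TEMPLATE, values))
```

Adding a new data type then means adding one field page, without touching the templates that already exist.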

Each page is considered to have the page type of the template used to create it, although this can be changed later. The page type is helpful when querying for pages of the same type, and the page template can be asked which fields are useful on reports.

The full-text index covers all fields, so we can query on partial or complete values, do boolean and range searches, and sort results. Queries that are not possible: "not exists" tests and math expressions.
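As a plain-Python illustration of what a boolean query combined with a range query and a sort amounts to, consider the sketch below. The sample pages and the in_range helper are made up for the example. In Lucene's classic query syntax the same search would look something like type:person AND born:[1900-01-01 TO 1999-12-31], though the exact form depends on how the fields are indexed.

```python
# Hypothetical pages, reduced to their indexed field values.
pages = [
    {"title": "Ada", "type": "person", "born": "1815-12-10"},
    {"title": "Alan", "type": "person", "born": "1912-06-23"},
    {"title": "Grace", "type": "person", "born": "1906-12-09"},
    {"title": "ENIAC", "type": "machine"},
]


def in_range(page, field, low, high):
    """Inclusive range test on string values; pages lacking the field never match."""
    value = page.get(field)
    return value is not None and low <= value <= high


# Boolean AND of a term match and a range match.
hits = [
    p for p in pages
    if p.get("type") == "person" and in_range(p, "born", "1900-01-01", "1999-12-31")
]

# ISO dates (YYYY-MM-DD) sort chronologically when compared as plain strings.
hits.sort(key=lambda p: p["born"])
titles = [p["title"] for p in hits]
print(titles)
```

Storing dates in ISO format is what lets a text index do range searches and sorting without knowing anything about calendars.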

Named fields -- provided by calls to field pages on template page.
Data types -- provided by each field page.
Visual presentation -- provided by template page.
Input parsing and validating -- provided by field pages. Full page validation by template page method.
Group similar pages -- by page type (template page used), or other fields.
Query pages by field -- Lucene full-text index.
Relate pages to each other -- some fields are references to other pages.
Sort pages -- e.g. by the internal ISO date format.
Make reports -- query with default (template) fields or requested fields.
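A report built this way could be sketched as below: select pages by page type, then show either the template's default fields or the fields the caller asks for. DEFAULT_REPORT_FIELDS and report are hypothetical names, and the sample pages are invented.

```python
# Hypothetical: the default report fields each template would supply for its page type.
DEFAULT_REPORT_FIELDS = {"person": ["title", "born"]}

pages = [
    {"title": "Ada", "type": "person", "born": "1815-12-10", "email": "ada@example.org"},
    {"title": "Alan", "type": "person", "born": "1912-06-23"},
    {"title": "ENIAC", "type": "machine"},
]


def report(pages, page_type, fields=None):
    """Return a header row plus one row per page of the given type."""
    columns = fields or DEFAULT_REPORT_FIELDS[page_type]
    rows = [
        [page.get(column, "") for column in columns]
        for page in pages
        if page.get("type") == page_type
    ]
    return [columns] + rows


default_report = report(pages, "person")
custom_report = report(pages, "person", fields=["title", "email"])
```

A page that lacks a requested field simply yields an empty cell, which fits the idea that fields are optional and ad hoc.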

Not yet implemented

Joins, group-by, sums.
Update multiple records in similar ways. (may be added to reports)
Consistency and Integrity checks. (planned with on-write triggers)
ACID robustness

Document last modified: 05 Apr 2023 08:32pm