Marvin Humphrey and 1 contributor


KinoSearch::Docs::DevGuide - Hacking/debugging KinoSearch.


Developer-only documentation. If you just want to build a search engine, you probably don't need to read this.

Fundamental Classes

Most of the Perl classes in KinoSearch rely on KinoSearch::Util::Class and KinoSearch::Util::ToolSet.

At the C level, inheritance is implemented using the devel/ utility. The base class is KinoSearch::Util::Obj. If what's going on is not immediately apparent to you after spelunking a few files in the c_src directory, see boilerplater's documentation.

Object Oriented Design

Access levels

There are three access levels in KinoSearch.

  1. public: documented in "visible" pod.

  2. private: local to one source file. Private subs are distinguished by a leading _underscore in their names, as per perlstyle guidelines.

  3. distro: anything which doesn't fall into either category above may be used anywhere within the KinoSearch distribution. If this were Java and all of KS were in the same Java package, we'd call this access level "package-private", but since "package" already has a distinct meaning in Perl, we coin a new term.
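
As a minimal sketch of the naming convention for the first two levels, the class below pairs public methods with an underscore-prefixed private sub. The class My::Counter and all of its subs are hypothetical and are not part of KinoSearch.

```perl
package My::Counter;    # hypothetical class, not part of KinoSearch
use strict;
use warnings;

# Public: would be documented in visible POD.
sub new {
    my ( $class, %args ) = @_;
    return bless { count => _clean_start( $args{start} ) }, $class;
}

# Public accessor for the private member variable.
sub get_count { return $_[0]->{count} }

# Private: the leading underscore marks it as local to this file.
sub _clean_start {
    my $start = shift;
    return defined $start ? int($start) : 0;
}

package main;

my $counter = My::Counter->new( start => 3 );
print $counter->get_count, "\n";    # prints 3
```

Nothing in Perl enforces the underscore; it is purely a signal to other developers that the sub should not be called from outside its source file.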

No public member variables.

All Perl member variables are treated as private. Multiple classes defined within a single source-code file may use direct access to get at each other's member variables. Everybody else has to use accessor methods.

All C-struct member variables allow distro-level access. C vars can have a more permissive scheme because C structs don't suffer from the problem of autovivification of misspelled names. This does tend to encourage tight binding between classes, which is unfortunate but manageable so long as the bad designs are purely internal. In the future, it may make sense to make C-vars private by default, but introduce voluntary conventions for identifying protected and distro-level members.
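
The misspelled-name problem mentioned above can be illustrated with a short sketch, assuming a plain hash-based object. The field names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical hash-based object; the field name is illustrative only.
my $self = { doc_freq => 10 };

# Typo: "doc_frq" instead of "doc_freq". Perl raises no error; it
# quietly creates a second key and leaves the real one untouched.
$self->{doc_frq} = 20;

print join( ', ', sort keys %$self ), "\n";    # prints doc_freq, doc_frq
print $self->{doc_freq}, "\n";                 # prints 10
```

The equivalent mistake on a C struct member fails at compile time, which is why direct access is less risky there.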

Parameter Validation

Hash-style argument lists are verified to ensure that no parameter label has been misspelled. Stronger validation is performed ad hoc.
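
A label check of this kind might be sketched as follows. The helper check_labels and the defaults hash are hypothetical names chosen for illustration; they are not KinoSearch's actual validation routine.

```perl
use strict;
use warnings;
use Carp qw( croak );

# Hypothetical helper: croak if any supplied label is absent from the defaults.
sub check_labels {
    my ( $defaults, %args ) = @_;
    for my $label ( sort keys %args ) {
        croak("Invalid parameter: '$label'")
            unless exists $defaults->{$label};
    }
    return 1;
}

# Hypothetical constructor defaults.
my %defaults = ( analyzer => undef, create => 0 );

# A correctly spelled label passes.
check_labels( \%defaults, create => 1 );

# A misspelled label is caught at call time.
my $ok = eval { check_labels( \%defaults, analyser => 'x' ); 1 };
print "caught: $@" unless $ok;
```

Checking labels against the keys of a defaults hash keeps the list of legal parameters in one place. Type or range checks would still have to be written separately.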

Documentation Conventions

KinoSearch's public API is defined by what you get when you run the suite through a well-behaved pod-to-whatever converter. Developer-only documentation is limited to comments and "invisible" =for/=begin POD blocks.
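
The split between visible and invisible POD might look like the sketch below. The module My::Example and the format name "devdocs" are assumptions made for illustration.

```perl
package My::Example;    # hypothetical module
use strict;
use warnings;

=head1 NAME

My::Example - shows up in converted documentation

=head1 METHODS

=head2 answer

Public: returns 42.

=cut

sub answer { return 42 }

=begin devdocs

Developer-only notes. Formatters that are not asked for the
"devdocs" format skip this whole region.

=end devdocs

=for comment
A single-paragraph developer note, likewise skipped.

=cut

sub helper { return answer() * 2 }

package main;
print My::Example::helper(), "\n";    # prints 84
```

Only the =head1 and =head2 sections would appear in converted output, so answer is public while helper, which has no visible POD, falls under distro-level access.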

Integration of XS code

XS code in KinoSearch is stored faux-Inline-style, after an __END__ token, and delimited by __XS__ and __POD__. A heavily customized Build.PL detects these code blocks and writes out hard files at install-time, so the inlining is mostly for convenience while editing: the XS code is often tightly coupled to the Perl code in a given module, and having everything in one place makes it easier to see what's going on and move things back and forth.
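
To make the layout concrete, here is a minimal sketch of pulling an XS block out of a module's source text. It is not the logic in Build.PL: the helper extract_xs and the sample module are hypothetical, and the sketch assumes each delimiter sits alone on its own line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between __XS__ and __POD__ that
# follows an __END__ token, or undef if the module has no such block.
sub extract_xs {
    my $source = shift;
    my ( undef, $tail ) = split /^__END__[ \t]*$/m, $source, 2;
    return unless defined $tail;
    return $tail =~ /^__XS__[ \t]*\n(.*?)^__POD__[ \t]*$/ms ? $1 : undef;
}

# Made-up module text laid out as described above.
my $module = <<'END_MODULE';
package My::Module;
sub perl_side { return 1 }
1;

__END__

__XS__

MODULE = My   PACKAGE = My::Module

int
xs_side()
CODE:
    RETVAL = 2;
OUTPUT:
    RETVAL

__POD__

=head1 NAME

My::Module - example

=cut
END_MODULE

print extract_xs($module);
```

Because everything after __END__ is ignored by the Perl compiler, the XS text can sit in the .pm file without affecting the module when it is loaded.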

The content of KinoSearch.xs consists of the XS block from, followed by all the other XS blocks in an undetermined order. Ultimately, only a single compiled library gets installed along with the Perl modules.

At runtime, the only module which calls XSLoader::load is KinoSearch. Because the KinoSearch MODULE has many PACKAGEs, "use KinoSearch;" loads all of the XS routines in the entire KinoSearch suite. A pure-Perl version that did the same thing might look like this...

    package KinoSearch;
    our $VERSION = 1.0;

    package KinoSearch::Index::TermInfo;
    sub get_doc_freq {
        # ...
    }

    package KinoSearch::Store::InStream;
    sub lu_read {
        # ...
    }

    # ...

Division of labor between Perl, XS, and C

To maximize clarity, when possible XS in KinoSearch is limited to "glue" code, while Perl and C do the heavy lifting. Exceptions occur when XS functions need to manipulate the Perl stack, for instance when returning more than one value.

Relationship to Lucene

Given pure-ASCII source material, KinoSearch 0.05 produced indexes that could be read by Java Lucene 1.4.3 and vice versa. That was the high-water mark for Lucene compatibility.

The file format changed in version 0.06, the API was never that close, and KinoSearch 0.20 represents a further break both in terms of API and file format.

It has turned out to be impossible to provide full Lucene compatibility without making extraordinary sacrifices in both performance and code complexity -- so we have moved on without looking back.

Coding style

When possible, KinoSearch's Perl code follows the recommendations set out in Damian Conway's book, "Perl Best Practices", and its XS/C code follows Apache's guidelines.

Perl code is auto-formatted using a PerlTidy-based helper app called kinotidy, which is basically perltidy with a profile set up to use the PBP settings.

It would be nice if there were a formatter for XS and C code that was as good as PerlTidy. Since there isn't, the code is formatted by hand to look as though it had been run through one, with one important difference: a bias towards maximum parenthetical tightness.

In both Perl and XS/C, code is organized into commented paragraphs a few lines in length, as per PBP recommendations. Strong efforts are made to keep the comment to a single line. Stupefyingly obvious "code narration" comments are used when something more literate doesn't present itself -- the goal is to be able to grok the intended flow of a function by scanning the first line of each "paragraph" -- especially when the paragraph-summarizing comments are set off by syntax highlighting in a programmer's text editor.
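
A short sketch of the commented-paragraph layout follows. The function count_terms is hypothetical and exists only to show the style.

```perl
use strict;
use warnings;

# Hypothetical function, written only to illustrate the layout.
sub count_terms {
    my $text = shift;

    # Split the text into lowercased tokens.
    my @tokens = map { lc $_ } $text =~ /(\w+)/g;

    # Tally how often each token occurs.
    my %freq;
    $freq{$_}++ for @tokens;

    # Return the tallies as a hash reference.
    return \%freq;
}

my $freq = count_terms('The cat and the hat');
print "$_: $freq->{$_}\n" for sort keys %$freq;
```

Reading only the three one-line comments gives the flow of the function: split, tally, return.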


Copyright 2005-2007 Marvin Humphrey


See KinoSearch version 0.20.