=head1 NAME

KinoSearch::Docs::DevGuide - Hacking/debugging KinoSearch.

=head1 DESCRIPTION

Developer-only documentation.  If you just want to build a search engine, you
probably don't need to read this.

=head1 Fundamental Classes

Most of the Perl classes in KinoSearch rely on L<KinoSearch::Util::Class> and

At the C level, inheritance is implemented using the C<devel/boilerplater.pl>
utility.  The base class is KinoSearch::Util::Obj.  If what's going on is not
immediately apparent to you after spelunking a few files in the c_src
directory, see boilerplater's documentation.

=head1 Object Oriented Design

=head2 Access levels

There are three access levels in KinoSearch.

=over

=item 1

B<public>: documented in "visible" pod.

=item 2

B<private>: local to one source file.  Private subs are distinguished by
prefixing their names with an underscore, as per L<perlstyle> guidelines.

=item 3

B<distro>: anything which doesn't fall into either category above may be used
anywhere within the KinoSearch distribution.  If this were Java and all of KS
were in the same Java package, we'd call this access level "package-private",
but since "package" already has a distinct meaning in Perl, we coin a new term.

=back

=head2 No public member variables.

All Perl member variables are treated as private.  Multiple classes defined
within a single source-code file may use direct access to get at each other's
member variables.  Everybody else has to use accessor methods.
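
A minimal sketch of the rule, using a hypothetical hash-based class
C<Counter> that is not part of KinoSearch.  Outside code goes through the
accessors; the comments show why direct hash access is fragile.

    package Counter;
    use strict;
    use warnings;

    sub new {
        my $class = shift;
        return bless { count => 0 }, $class;
    }

    # Accessors: the only sanctioned way in from other source files.
    sub get_count { $_[0]->{count} }
    sub set_count { $_[0]->{count} = $_[1] }

    package main;

    my $counter = Counter->new;
    $counter->set_count(3);

    # Direct access with a misspelled key silently creates a new hash entry.
    $counter->{cuont} = 10;
    print $counter->get_count, "\n";    # still 3

    # The same typo in a method name dies at runtime instead.
    eval { $counter->set_cuont(10) };
    print "caught: $@" if $@;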

All C-struct member variables allow distro-level access.  C vars can have a
more permissive scheme because C structs don't suffer from the problem of
autovivification of misspelled names.  This does tend to encourage tight
binding between classes, which is unfortunate but manageable so long as the
bad designs are purely internal.  In the future, it may make sense to make
C-vars private by default, but introduce voluntary conventions for identifying
protected and distro-level members.

=head2 Parameter Validation

Hash-style argument lists are verified to ensure that no parameter label has
been misspelled.  Stronger validation is performed ad hoc.
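
As an illustration only, label checking could look like the following.  The
helper name C<check_labels> and the C<%defaults> hash are assumptions made for
this sketch, not KinoSearch's actual validation code.

    use strict;
    use warnings;
    use Carp qw( confess );

    # Hypothetical helper: die if a supplied label is absent from the defaults.
    sub check_labels {
        my ( $defaults, %args ) = @_;
        for my $label ( sort keys %args ) {
            confess("Invalid parameter: '$label'")
                unless exists $defaults->{$label};
        }
        return 1;
    }

    my %defaults = ( analyzer => undef, create => 0 );

    # Correctly spelled labels pass.
    check_labels( \%defaults, create => 1 );

    # A misspelled label is caught immediately.
    eval { check_labels( \%defaults, craete => 1 ) };
    print "rejected\n" if $@;

Checking labels against a hash of defaults keeps the list of legal parameters
in one place, next to their default values.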

=head1 Documentation Conventions

KinoSearch's public API is defined by what you get when you run the suite
through a well-behaved pod-to-whatever converter.  Developer-only
documentation is limited to comments and "invisible" =for/=begin POD blocks.
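
For example, a module might pair a developer-only note with a public entry
like this.  The module name C<My::Example>, the sub C<get_size>, and the
format name C<devel> are made up for illustration; pod formatters ignore
=begin/=end regions whose format name they do not handle.

    package My::Example;    # hypothetical module
    use strict;
    use warnings;

    =begin devel

    Developer-only note: skipped by ordinary pod converters.

    =end devel

    =cut

    sub get_size { return $_[0]->{size} }

    =head2 get_size

    Public documentation, rendered by pod converters.

    =cut

    1;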

=head1 Integration of XS code

XS code in KinoSearch is stored faux-L<Inline>-style, after an
C<__END__> token, and delimited by C<__XS__> and C<__POD__>.  A heavily
customized Build.PL detects these code blocks and writes out hard files at
install-time, so the inlining is mostly for convenience while editing: the XS
code is often tightly coupled to the Perl code in a given module, and having
everything in one place makes it easier to see what's going on and move things
back and forth.
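
The layout, and one way a build script could pull the XS block out of a
module, can be sketched as follows.  The package name C<My::Module>, the
sample XS function, and the regex are illustrative assumptions; the real
Build.PL logic may differ.

    use strict;
    use warnings;

    # Hypothetical module source using the __END__ / __XS__ / __POD__ layout.
    my $module_source = join "\n",
        'package My::Module;',
        '1;',
        '__END__',
        '__XS__',
        'MODULE = KinoSearch    PACKAGE = My::Module',
        '',
        'int',
        'answer()',
        'CODE:',
        '    RETVAL = 42;',
        'OUTPUT: RETVAL',
        '__POD__',
        '=head1 NAME',
        '';

    # Capture everything between the __XS__ and __POD__ delimiter lines.
    my ($xs_block) = $module_source =~ /^__XS__\n(.*?)^__POD__$/ms;
    print $xs_block;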

The content of KinoSearch.xs consists of the XS block from KinoSearch.pm,
followed by all the other XS blocks in an undetermined order.  Ultimately,
only a single compiled library gets installed along with the Perl modules.

At runtime, the only module which calls XSLoader::load is KinoSearch.  Because
the KinoSearch C<MODULE> has many C<PACKAGE>s, C<use KinoSearch;> loads I<all>
of the XS routines in the entire KinoSearch suite.  A pure-Perl version of
KinoSearch.pm which did the same thing might look like this...

    package KinoSearch;
    our $VERSION = 1.0;

    package KinoSearch::Index::TermInfo;
    sub get_doc_freq {
        # ...
    }

    package KinoSearch::Store::InStream;
    sub lu_read {
        # ...
    }

    # ...

=head2 Division of labor between Perl, XS, and C

To maximize clarity, when possible XS in KinoSearch is limited to "glue"
code, while Perl and C do the heavy lifting.  Exceptions occur when XS
functions need to manipulate the Perl stack, for instance when returning more
than one value.

=head1 Relationship to Lucene

Given pure-ASCII source material, KinoSearch 0.05 produced indexes that could
be read by Java Lucene 1.4.3 and vice versa.  That was the high-water mark for
Lucene compatibility.

The file format changed in version 0.06, the API was never that close, and
KinoSearch 0.20 represents a further break in terms of both API and file
format.
It has turned out to be impossible to provide full Lucene compatibility
without making extraordinary sacrifices in both performance and code
complexity -- so we have moved on without looking back.

=head1 Coding style

When possible, KinoSearch's Perl code follows the recommendations set out in
Damian Conway's book, "Perl Best Practices", and its XS/C code follows
Apache's guidelines.

Perl code is auto-formatted using a PerlTidy-based helper app called kinotidy,
which is basically perltidy with a profile set up to use the PBP settings.  

It would be nice if there were a formatter for XS and C code that was as good
as PerlTidy.  Since there isn't, the code is manually set to look as though it
had been, with one important difference: a bias towards maximum parenthetical

In both Perl and XS/C, code is organized into commented paragraphs a few lines
in length, as per PBP recommendations.  Strong efforts are made to keep the
comment to a single line.  Stupefyingly obvious "code narration" comments are
used when something more literate doesn't present itself -- the goal is to
be able to grok the intended flow of a function by scanning the first line of
each "paragraph" -- especially when the paragraph-summarizing comments are set
off by syntax highlighting in a programmer's text editor.
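
A small illustration of the paragraph style, using a made-up helper
C<count_terms> that is not part of KinoSearch:

    use strict;
    use warnings;

    sub count_terms {
        my ($text) = @_;

        # Split the text into lowercase tokens.
        my @tokens = map { lc $_ } $text =~ /(\w+)/g;

        # Tally how often each token occurs.
        my %freq;
        $freq{$_}++ for @tokens;

        # Return the tallies as a hash reference.
        return \%freq;
    }

    my $freq = count_terms('the cat and the hat');
    print "the: $freq->{the}\n";    # the: 2

Reading only the three comment lines is enough to follow what the function
does.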

=head1 COPYRIGHT

Copyright 2005-2007 Marvin Humphrey


See L<KinoSearch> version 0.20.