package FsDB;

use strict;
use warnings;

use Carp;
# use Digest::MurmurHash qw(murmur_hash);
use Scalar::Util qw( looks_like_number );
use Digest::MurmurHash3 qw( murmur32 );
use File::Basename;
use File::Path qw( make_path );
use File::Spec;
use Path::Tiny;
use Storable qw( thaw nfreeze );

our $VERSION = '0.02';

sub DEBUG () { 0 }

sub TIEHASH
{
    my( $class, $p ) = @_;
    DEBUG and warn "TIEHASH";
    my $self = bless { depth=>0 }, $class;

    if( ref $p ) {
        $self->{dir} = delete $p->{dir};
        $self->set_depth( delete $p->{depth} ) if exists $p->{depth};
        die "Unknown parameters: ", keys %$p if keys %$p;
    }
    else {
        $self->{dir} = $p;
    }
    return $self;
}

sub set_depth
{
    my( $self, $d ) = @_;
    croak "Depth must be a number" unless looks_like_number( $d );
    croak "A depth over 4 is not allowed" if $d > 4;
    croak "A depth may not be negative" if $d < 0;
    $self->{depth} = $d;
}

sub __hash
{
    my( $self, $key ) = @_;
    my $hash = murmur32( $key );
    return sprintf "%08x", $hash if $self->{depth} > 0;
    return $hash;
}

sub __freeze
{
    return nfreeze( [ $_[1][0], "$_[1][1]" ] );
}

sub __thaw
{
    return thaw( $_[1] );
}


sub filename
{
    my( $self, $hashed ) = @_;
    my @parts = ( $self->{dir} );
    my $N = $self->{depth};
    while( $N ) {
        push @parts, substr( $hashed, -$N*2, 2 );
        $N--;
    }
    push @parts, $hashed;
    return File::Spec->catfile( @parts );
}

sub our_file
{
    return unless defined $_[1];
    $_[1] =~ /^[a-f\d]+$/;
}


sub FETCH
{
    DEBUG and warn "FETCH";
    my( $self, $key ) = @_;
    my $file = $self->filename( $self->__hash( $key ) );
    return unless $self->is_file( $file );
    my $data = path( $file )->slurp;
    my $ret = eval { $self->__thaw( $data )->[1] };
    warn $@ if $@;
    return $ret;
}

sub EXISTS
{
    DEBUG and warn "EXISTS";
    my( $self, $key ) = @_;
    my $file = $self->filename( $self->__hash( $key ) );
    return 1 if $self->is_file( $file );
    return;
}

sub STORE
{
    DEBUG and warn "STORE";
    my( $self, $key, $data ) = @_;
    my $file = $self->filename( $self->__hash( $key ) );
    my $dir = $self->{dir};
    $dir = dirname( $file ) if $self->{depth} > 0;
    make_path( $dir ) unless -d $dir;
    my $ret = eval {
        # NB: ->spew_raw creates a temp file and then moves to dest file, so it's atomic
        path( $file )->spew_raw( $self->__freeze( [ $key, $data ] ) );
        $data;
    };
    warn $@ if $@;
    return $ret;
}

sub DELETE
{
    my( $self, $key ) = @_;
    my $file = $self->filename( $self->__hash( $key ) );
    return unless $self->is_file( $file );
    unlink $file or carp "Unable to delete $file: $!";
}

sub FIRSTKEY
{
    DEBUG and warn "FIRSTKEY";
    my( $self ) = @_;
    return unless -d $self->{dir};
    $self->{iter} = path( $self->{dir} )->iterator;
    return $self->read_key;
}

sub NEXTKEY
{
    DEBUG and warn "NEXTKEY";
    my( $self ) = @_;
    return $self->read_key;
}

sub is_file
{
    return 1 if -r $_[1] and -f _;
    return;
}


sub read_key
{
    my( $self ) = @_;
    return unless $self->{iter};
    my $file;
    do {
        $file = $self->{iter}->();
        return unless defined $file;
        if( $self->is_file( $file ) ) {
            if( $self->our_file( basename $file ) ) {
                my $ret = eval { $self->__thaw( path( $file )->slurp )->[0] };
                warn $@ if $@;
                return $ret;
            }
        }
    } while( defined $file );
    return;
}


1;
__END__

=head1 NAME

FsDB - Use the filesystem as a DB

=head1 SYNOPSIS

    use FsDB;

    my %hash;
    tie %hash, 'FsDB', "mydb";
    $hash{ $key } = $value;

    # If you are creating multiple thousands of entries:
    tie %hash, 'FsDB', { dir=>"mydb", depth=>1 };


=head1 DESCRIPTION

FsDB uses the filesystem as a DBM or more correctly, a persistent key-value
store.  FsDB will create a file for each value stored in the DBM.  The name
of the file is a hash of the key.

FsDB uses a directory per database instead of a file.  Each value is stored
in a unique file.  The unique filename is created by hashing the key with
L<Digest::MurmurHash3/murmur32>.  The value is stringified and stored in the
file.  The opposite operations are done to retrieve a value: the unique
filename is created with the hash, the file is read and the value is
returned.

FsDB is not intended to be portable nor distributed.

FsDB's defaults (depth=0) are intended for situations where you have a few
keys, less then a few thousand.  An example application would be persistent
session information in a web application.  Each session would be its own
database (aka directory) and rarely will you need to write thousands of
distinct keys.

The limit is what your filesystem can handle easily.  For example, ext4 is
fine up to a few thousand entries.  Other filesystems will have different
performance limits.

FsDB is surprisingly fast:

    Benchmark: timing 5000 iterations of BerkeleyDB, DB_File, FsDB,
                    FsDB;depth=1, FsDB;depth=1;primed, GDBM_File, QDBM_File, SQLite_File...
             BerkeleyDB: 62 wallclock secs ( 0.93 usr +  1.23 sys =  2.16 CPU) @ 2314.81/s (n=5000)
                DB_File: 61 wallclock secs ( 0.77 usr +  1.17 sys =  1.94 CPU) @ 2577.32/s (n=5000)
                   FsDB:  3 wallclock secs ( 1.47 usr +  1.11 sys =  2.58 CPU) @ 1937.98/s (n=5000)
           FsDB;depth=1:  8 wallclock secs ( 1.80 usr +  1.07 sys =  2.87 CPU) @ 1742.16/s (n=5000)
    FsDB;depth=1;primed:  3 wallclock secs ( 1.74 usr +  1.06 sys =  2.80 CPU) @ 1785.71/s (n=5000)
              GDBM_File: 44 wallclock secs ( 0.42 usr +  1.76 sys =  2.18 CPU) @ 2293.58/s (n=5000)
              QDBM_File:  1 wallclock secs ( 0.11 usr +  0.16 sys =  0.27 CPU) @ 18518.52/s (n=5000)
                (warning: too few iterations for a reliable count)
           SQLite_File: 125 wallclock secs ( 5.76 usr +  2.76 sys =  8.52 CPU) @ 586.85/s (n=5000)

The above Benchmarks were run on a VM with a ext4 filesystem, qcow2 disk
image.  The host uses ext4 and NVMe.  None of which really matters as the
operations are small enough to stay in memory buffers/cache.

=head2 Small rant

The use of DB or DBM in this module and others like it (L<DB_File>
L<Berkeley_DB>, L<DBM_File>) is misleading.  They are in fact persistent
key-value stores.


=head1 METHODS

=head2 TIEHASH

    my %hash
    tie %hash, 'FsDB', \%params;    
    tie %hash, 'FsDB', $dir, [IGNORED]; # compatible with DB_File et al

The first form is prefered.  The 2nd form makes FsDB a drop-in replacement
for DB_File.

=over 4

=item dir

Directory where the database will be stored.  This directory is created if
it doesn't exist.

=item depth

What depth of subdirectories should be crated.  C<depth=0> means that all
the files are created in the top directory.  C<depth=1> means that the top
directory will contain one level of subdirectories that will themselves
contain the files.  

The names of subdirectories are created by using 2 characters from the end
of the hashed key.  For example:

    Hash is 02f45789.
    depth=0, $dir/02f456789
    depth=1, $dir/89/02f456789
    depth=2, $dir/89/67/02f456789

=back

=head1 OVERLOADING

The following 3 methods are useful if you want to create a subclass and
modify the behaviour of FsDB.

=head2 __hash

    sub __hash 
    {
        my( $self, $key ) = @_;
        # ...
    }

Allows you to change the hashing algorythm.  Please return a string that is
at least twice as long as L</depth>.

=head2 set_depth
    
    sub set_depth
    {
        my( $self, $depth ) = @_;
        # ...
    }

If you want change the hashing algorythm to one that returns more then 32
bits, you might want more then a depth of 4.

=head2 __freeze

    sub __freeze
    {
        my( $self, $data ) = @_;
        # ...
    }

Allows you to change serialization method.  Note that C<$data> will be an
arrayref : first element is the key, second element is the value.

=head2 __thaw

    sub __thaw
    {
        my( $self, $data ) = @_;
        # ...
    }

Allows you to change serialization method.  You should return an arrayref,
the first element is the key, the second is the value.


=head1 SEE ALSO

L<perltie>, L<BerkeleyDB>, L<DB_File>, L<GDBM_File>, L<QDBM_File>

=head1 AUTHOR

Philip Gwyn, E<lt>gwyn -AT- cpan.orgE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2023 by Philip Gwyn

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.26.3 or,
at your option, any later version of Perl 5 you may have available.


=cut
