NAME

Statistics::Covid::Utils - assorted, convenient, stand-alone, public and semi-private subroutines

VERSION

Version 0.23

DESCRIPTION

This package contains assorted convenience subroutines. Most of them are private or semi-private, but some are needed by users of the module.

SYNOPSIS

        use Statistics::Covid;
        use Statistics::Covid::Datum;
        use Statistics::Covid::Utils;

        # read data from db
        my $covid = Statistics::Covid->new({
                'config-file' => 't/config-for-t.json',
                'debug' => 2,
        }) or die "Statistics::Covid->new() failed";
        # retrieve data from DB for selected locations (in the UK)
        # data will come out as an array of Datum objects sorted wrt time
        # (the 'datetimeUnixEpoch' field)
        my $objs = $covid->select_datums_from_db_for_specific_location_time_ascending(
                #{'like' => 'Ha%'}, # the location (wildcard)
                ['Halton', 'Havering'],
                #{'like' => 'Halton'}, # the location (wildcard)
                #{'like' => 'Havering'}, # the location (wildcard)
                'UK', # the belongsto (could have been wildcarded)
        );
        # create a dataframe
        my $df = Statistics::Covid::Utils::datums2dataframe({
                'datum-objs' => $objs,
                'groupby' => ['name'],
                'content' => ['confirmed', 'datetimeUnixEpoch'],
        });
        # convert all 'datetimeUnixEpoch' data to hours, the oldest will be hour 0
        for(sort keys %$df){
                Statistics::Covid::Utils::discretise_increasing_sequence_of_seconds(
                        $df->{$_}->{'datetimeUnixEpoch'}, # in-place modification
                        3600 # seconds->hours
                );
        }

        # This is what the dataframe looks like:
        #  {
        #  Halton   => {
        #               confirmed => [0, 0, 3, 4, 4, 5, 7, 7, 7, 8, 8, 8],
        #               datetimeUnixEpoch => [
        #                 1584262800,
        #                 1584349200,
        #                 1584435600,
        #                 1584522000,
        #                 1584637200,
        #                 1584694800,
        #                 1584781200,
        #                 1584867600,
        #                 1584954000,
        #                 1585040400,
        #                 1585126800,
        #                 1585213200,
        #               ],
        #             },
        #  Havering => {
        #               confirmed => [5, 5, 7, 7, 14, 19, 30, 35, 39, 44, 47, 70],
        #               datetimeUnixEpoch => [
        #                 1584262800,
        #                 1584349200,
        #                 1584435600,
        #                 1584522000,
        #                 1584637200,
        #                 1584694800,
        #                 1584781200,
        #                 1584867600,
        #                 1584954000,
        #                 1585040400,
        #                 1585126800,
        #                 1585213200,
        #               ],
        #             },
        #  }

        # and after converting the datetimeUnixEpoch values to hours and setting the oldest to t=0
        #  {
        #  Halton   => {
        #                confirmed => [0, 0, 3, 4, 4, 5, 7, 7, 7, 8, 8, 8],
        #                datetimeUnixEpoch => [0, 24, 48, 72, 104, 120, 144, 168, 192, 216, 240, 264],
        #              },
        #  Havering => {
        #                confirmed => [5, 5, 7, 7, 14, 19, 30, 35, 39, 44, 47, 70],
        #                datetimeUnixEpoch => [0, 24, 48, 72, 104, 120, 144, 168, 192, 216, 240, 264],
        #              },
        #  }
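
The time conversion shown above can be illustrated with a minimal, stand-alone sketch. The sub name discretise_seconds_sketch() is hypothetical and this is not the module's own code: it only assumes that the input is sorted ascending, that the first element becomes 0, and that offsets are truncated to whole units with int(). The module's actual rounding behaviour may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's implementation: shift a sorted
# (ascending) list of epoch seconds so that the first becomes 0, then
# express each offset in units of $unit seconds. Modifies the array in place.
sub discretise_seconds_sketch {
    my ($times, $unit) = @_;
    return unless @$times;
    my $t0 = $times->[0];    # the oldest time point becomes t=0
    # Assumption: truncate to whole units with int()
    $_ = int(($_ - $t0) / $unit) for @$times;
    return $times;
}

# the first five time points from the dataframe shown above
my @epochs = (1584262800, 1584349200, 1584435600, 1584522000, 1584637200);
discretise_seconds_sketch(\@epochs, 3600);    # seconds -> hours
print join(', ', @epochs), "\n";    # prints: 0, 24, 48, 72, 104
```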

datums2dataframe

Takes an array of Datum objects and a set of one or more (table) column names (attributes of each object), e.g. 'confirmed', and creates a dataframe: a hash keyed on group name, whose values are hashes mapping each column name to an array of that column's values, one per object, in the order the objects appear in the input array.

A datum object has column names and each one has a value (e.g. 'confirmed', 'name' etc.). For clarity, say that our datum objects have the column names sex, age and A, with these values (unquoted): (m,30,1), (m,30,2), (f,40,3), (f,40,4). A DF (dataframe) created with no params will be returned as:

    { '*' => {sex=>[m,m,f,f], age=>[30,30,40,40], A=>[1,2,3,4]} }

which is equivalent to @groupby=() and @content_columns=(sex,age,A), i.e. no grouping and all columns. A DF grouped by column 'sex' will be:

    {
      'm' => {sex=>[m,m], age=>[30,30], A=>[1,2]},
      'f' => {sex=>[f,f], age=>[40,40], A=>[3,4]},
    }

and a DF grouped by 'sex' and 'age':

    {
      'm|30' => {sex=>[m,m], age=>[30,30], A=>[1,2]},
      'f|40' => {sex=>[f,f], age=>[40,40], A=>[3,4]},
    }

Notice that 'm|40' does not exist, because that combination does not occur in the data. Notice also that specifying @content_columns makes your DF leaner: why keep 'sex' in the hash when it is also a key?
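
The grouping described above can be sketched in a few lines of plain Perl. This is only an illustration under stated assumptions: rows2dataframe_sketch() is a hypothetical stand-in that works on plain hashrefs and takes positional arguments, whereas the real datums2dataframe() takes Statistics::Covid::Datum objects and named parameters. The '|' separator and the '*' key are taken from the examples above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for datums2dataframe(): rows are plain hashrefs
# here, and arguments are positional; this is NOT the module's own code.
sub rows2dataframe_sketch {
    my ($rows, $groupby, $content) = @_;
    my %df;
    for my $row (@$rows) {
        # with no groupby columns everything goes under the '*' key
        my $key = @$groupby ? join('|', map { $row->{$_} } @$groupby) : '*';
        # append each requested column's value, preserving input order
        push @{ $df{$key}{$_} }, $row->{$_} for @$content;
    }
    return \%df;
}

my @rows = (
    { sex => 'm', age => 30, A => 1 },
    { sex => 'm', age => 30, A => 2 },
    { sex => 'f', age => 40, A => 3 },
    { sex => 'f', age => 40, A => 4 },
);

# group by 'sex' and 'age', keep only column 'A' for a leaner dataframe
my $df = rows2dataframe_sketch(\@rows, ['sex', 'age'], ['A']);
for my $key (sort keys %$df) {
    print "$key => A=[", join(',', @{ $df->{$key}{A} }), "]\n";
}
# prints:
#   f|40 => A=[3,4]
#   m|30 => A=[1,2]
```

Only combinations that occur in the input produce a key, which is why no 'm|40' entry appears.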

The reason to use a dataframe instead of an array of Statistics::Covid::Datum objects is economy. One Datum object represents data at a single time point. Plotting or fitting data requires a lot of Datum objects, whose data from specific columns/fields/attributes must be collected together in an array, possibly transformed, and then plotted or fitted. If you want to both plot and fit the same data, you have to repeat this process twice. By inserting the data into a dataframe once, you can pass it around instead. The dataframe is a higher-level collection of data.
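
As a rough illustration of passing a dataframe around, the sketch below walks a literal dataframe in the shape shown in the SYNOPSIS (values shortened) and pairs up time and value for each group. The sub xy_pairs_sketch() is hypothetical and is not part of this module; a plotter or fitter would consume the columns in a similar way.

```perl
use strict;
use warnings;

# A literal dataframe in the shape shown in the SYNOPSIS (values shortened)
my $df = {
    Halton   => { confirmed => [0, 0, 3, 4], datetimeUnixEpoch => [0, 24, 48, 72] },
    Havering => { confirmed => [5, 5, 7, 7], datetimeUnixEpoch => [0, 24, 48, 72] },
};

# Hypothetical consumer: pair up two columns of one group as [x, y] points
sub xy_pairs_sketch {
    my ($group, $xcol, $ycol) = @_;
    my ($x, $y) = @{$group}{$xcol, $ycol};    # hash slice: two array refs
    return [ map { [ $x->[$_], $y->[$_] ] } 0 .. $#$x ];
}

for my $name (sort keys %$df) {
    my $pairs = xy_pairs_sketch($df->{$name}, 'datetimeUnixEpoch', 'confirmed');
    print "$name: ", join(' ', map { "($_->[0],$_->[1])" } @$pairs), "\n";
}
# prints:
#   Halton: (0,0) (24,0) (48,3) (72,4)
#   Havering: (0,5) (24,5) (48,7) (72,7)
```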

A good question is why create a new dataframe structure when Data::Frame already exists. The reason is that Data::Frame is based on PDL, which I considered too heavy a dependency when neither the plotter (Statistics::Covid::Analysis::Plot::Simple) nor the model fitter (Statistics::Covid::Analysis::Model::Simple) uses PDL (yet).

This dataframe has not been turned into a class because I do not want to write one before I have exhausted my search for an existing solution.

See Statistics::Covid::Analysis::Plot::Simple for how to plot dataframes and Statistics::Covid::Analysis::Model::Simple for how to fit models to data. Both take dataframes as input.

EXPORT

None by default. Call subroutines by their fully qualified name, e.g. Statistics::Covid::Utils::datums2dataframe().

AUTHOR

Andreas Hadjiprocopis, <bliako at cpan.org>, <andreashad2 at gmail.com>

BUGS

This module has been put together very quickly and under pressure. There must be quite a few bugs.

Please report any bugs or feature requests to bug-statistics-Covid at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Statistics-Covid. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Statistics::Covid::Utils

DEDICATIONS

Almaz

ACKNOWLEDGEMENTS

Perlmonks, for supporting the world with answers and programming enlightenment

DBIx::Class

The data providers:

    Johns Hopkins University,
    UK government,
    https://www.bbc.co.uk (for disseminating official results)

LICENSE AND COPYRIGHT

Copyright 2020 Andreas Hadjiprocopis.

This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0

Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.