# This script will gather statistical information from a database
# containing headers and other information from an INN feed.
#
# It is part of the NewsStats package.
#
# Copyright (c) 2010 Thomas Hochstein <thh@inter.net>
#
# It can be redistributed and/or modified under the same terms under
# which Perl itself is published.
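
BEGIN {
  our $VERSION = "0.01";
  # dirname() is provided by File::Basename
  use File::Basename;
  # add the script's own directory to the module search path at compile
  # time, so that NewsStats.pm can be found by the 'use' below
  push(@INC, dirname($0));
}

use NewsStats qw(:DEFAULT :TimePeriods ListNewsgroups);
use DBI;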

################################# Definitions ##################################

# define types of information that can be gathered
# all / groups (/ clients / hosts)
my %LegalTypes;
@LegalTypes{('all','groups')} = ();

################################# Main program #################################

### read commandline options
my %Options = &ReadOptions('dom:p:t:n:r:g:c:s:');

### read configuration
my %Conf = %{ReadConfig('newsstats.conf')};

### override configuration via commandline options
my %ConfOverride;
$ConfOverride{'DBTableRaw'} = $Options{'r'} if $Options{'r'};
$ConfOverride{'DBTableGrps'} = $Options{'g'} if $Options{'g'};
$ConfOverride{'DBTableClnts'} = $Options{'c'} if $Options{'c'};
$ConfOverride{'DBTableHosts'} = $Options{'s'} if $Options{'s'};
$ConfOverride{'TLH'} = $Options{'n'} if $Options{'n'};
&OverrideConfig(\%Conf,\%ConfOverride);

### get type of information to gather, defaulting to 'all'
$Options{'t'} = 'all' if !$Options{'t'};
die "$MySelf: E: Unknown type '-t $Options{'t'}'!\n" if !exists($LegalTypes{$Options{'t'}});

### get time period (-m or -p)
my ($StartMonth,$EndMonth) = &GetTimePeriod($Options{'m'},$Options{'p'});

### init database
my $DBHandle = InitDB(\%Conf,1);

### get data for each month
warn "$MySelf: W: Output only mode. Database is not updated.\n" if $Options{'o'};
foreach my $Month (&ListMonth($StartMonth,$EndMonth)) {

  print "---------- $Month ----------\n" if $Options{'d'};

  if ($Options{'t'} eq 'all' or $Options{'t'} eq 'groups') {
    ### ----------------------------------------------
    ### get groups data (number of postings per group)
    # get groups data from raw table for given month
    my $DBQuery = $DBHandle->prepare(sprintf("SELECT newsgroups FROM %s.%s WHERE day LIKE ? AND NOT disregard",$Conf{'DBDatabase'},$Conf{'DBTableRaw'}));
    $DBQuery->execute($Month.'-%') or die sprintf("$MySelf: E: Can't get groups data for %s from %s.%s: $DBI::errstr\n",$Month,$Conf{'DBDatabase'},$Conf{'DBTableRaw'});

    # count postings per group
    my %Postings;

    while (($_) = $DBQuery->fetchrow_array) {
      # get list of newsgroups and hierarchies from Newsgroups:
      my %Newsgroups = ListNewsgroups($_);
      # count each newsgroup and hierarchy once
      foreach (sort keys %Newsgroups) {
        # don't count newsgroup/hierarchy in wrong TLH
        next if(defined($Conf{'TLH'}) and !/^$Conf{'TLH'}/);
        $Postings{$_}++;
      };
    };

    print "----- GroupStats -----\n" if $Options{'d'};
    foreach my $Newsgroup (sort keys %Postings) {
      print "$Newsgroup => $Postings{$Newsgroup}\n" if $Options{'d'};
      # write to database unless in output only mode (-o)
      if (!$Options{'o'}) {
        $DBQuery = $DBHandle->prepare(sprintf("REPLACE INTO %s.%s (month,newsgroup,postings) VALUES (?, ?, ?)",$Conf{'DBDatabase'},$Conf{'DBTableGrps'}));
        $DBQuery->execute($Month, $Newsgroup, $Postings{$Newsgroup}) or die sprintf("$MySelf: E: Can't write groups data for %s/%s to %s.%s: $DBI::errstr\n",$Month,$Newsgroup,$Conf{'DBDatabase'},$Conf{'DBTableGrps'});
      };
    };
  };

  # other types of information go here - later on
};

### close handles
$DBHandle->disconnect;

################################ Documentation #################################

=head1 NAME

gatherstats - process statistical data from a raw source

=head1 SYNOPSIS

B<gatherstats> [B<-Vhdo>] [B<-m> I<YYYY-MM>] [B<-p> I<YYYY-MM:YYYY-MM>] [B<-t> I<type>] [B<-n> I<TLH>] [B<-r> I<database table>] [B<-g> I<database table>] [B<-c> I<database table>] [B<-s> I<database table>]

=head1 REQUIREMENTS

See doc/README: Perl 5.8.x itself and modules from CPAN, including
Config::Auto and DBI.

=head1 DESCRIPTION

This script will extract and process statistical information from a
database table which is fed from F<feedlog.pl> for a given time period
and write its results to (an)other database table(s). Entries marked
with I<'disregard'> in the database will be ignored; currently, you have
to set this flag yourself, using your database management tools. You
can exclude erroneous entries that way (e.g. automatic reposts, as seen
with cancel floods and resurrectors, or spam).

The time period to act on defaults to last month; you can assign
another month via the B<-m> switch or a time period via the B<-p>
switch; the latter takes precedence.
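
As a rough illustration of this precedence, the following sketch resolves
B<-m> and B<-p> into a start and an end month and lists the months in
between. C<resolve_period> and C<list_months> are hypothetical helpers
written for this example only; they are not the C<GetTimePeriod> and
C<ListMonth> functions from F<NewsStats.pm>, which may work differently.
The default month is passed in by the caller instead of being computed.

    use strict;
    use warnings;

    # Hypothetical helper: -p (if given) wins over -m; otherwise fall
    # back to a caller-supplied default month.
    sub resolve_period {
      my ($month, $period, $default) = @_;
      if (defined $period) {
        my ($start, $end) = $period =~ /^(\d{4}-\d{2}):(\d{4}-\d{2})$/
          or die "invalid period '$period'\n";
        return ($start, $end);
      }
      $month = $default unless defined $month;
      $month =~ /^\d{4}-\d{2}$/ or die "invalid month '$month'\n";
      return ($month, $month);
    }

    # Hypothetical helper: list all months from $start to $end (inclusive).
    sub list_months {
      my ($start, $end) = @_;
      my ($year, $mon) = split /-/, $start;
      my @months;
      # YYYY-MM strings sort chronologically, so a string compare suffices
      while (sprintf('%04d-%02d', $year, $mon) le $end) {
        push @months, sprintf('%04d-%02d', $year, $mon);
        if (++$mon > 12) { $mon = 1; $year++ }
      }
      return @months;
    }

    # both -m and -p given: the period is used, the month is ignored
    my ($start, $end) = resolve_period('2010-01', '2010-11:2011-02', '2010-06');
    print join(' ', list_months($start, $end)), "\n";

With the values above, the sketch prints the four months from 2010-11 to
2011-02; the month given as if via B<-m> has no effect.
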

By default B<gatherstats> will process all types of information; you
can change that using the B<-t> switch and assigning the type of
information to process. Currently only processing of the number of
postings per group per month is implemented, so the switch makes no
practical difference.

Possible information types include:

=over 4

=item B<groups> (postings per group per month)

B<gatherstats> will examine Newsgroups: headers. Crosspostings will be
counted for each single group they appear in. Groups not in I<TLH>
will be ignored.

B<gatherstats> will also add up the number of postings for each
hierarchy level, but only count each posting once. A posting to
de.alt.test will be counted once each for de.alt.test, de.alt.ALL and
de.ALL. A crossposting to de.alt.test and de.alt.admin, on the
other hand, will be counted for de.alt.test and de.alt.admin each, but
only once for de.alt.ALL and de.ALL.

Data is written to I<DBTableGrps> (see doc/INSTALL).

=back
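
The counting rule for groups and hierarchy levels described above can be
sketched as follows. C<groups_and_hierarchies> is a hypothetical stand-in
for C<ListNewsgroups> from F<NewsStats.pm>; it only assumes that every
group and every hierarchy level of a posting ends up as exactly one hash
key, which is what makes each posting count once per key.

    use strict;
    use warnings;

    # Hypothetical stand-in for ListNewsgroups: return a hash whose keys
    # are all groups from a Newsgroups: header plus their hierarchy levels.
    sub groups_and_hierarchies {
      my ($header) = @_;
      my %seen;
      foreach my $group (split /\s*,\s*/, $header) {
        next unless length $group;
        $seen{$group} = 1;
        my @parts = split /\./, $group;
        pop @parts;                          # drop the last component
        while (@parts) {
          $seen{join('.', @parts) . '.ALL'} = 1;
          pop @parts;
        }
      }
      return %seen;
    }

    # one single posting and one crossposting
    my %postings;
    foreach my $header ('de.alt.test', 'de.alt.test,de.alt.admin') {
      my %groups = groups_and_hierarchies($header);
      $postings{$_}++ foreach keys %groups;  # each key once per posting
    }
    print "$_ => $postings{$_}\n" foreach sort keys %postings;

For these two postings the sketch counts 2 for de.alt.test, 1 for
de.alt.admin and 2 each for de.alt.ALL and de.ALL, because the
crossposting adds only one to each hierarchy level.
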

=head1 CONFIGURATION

F<gatherstats.pl> will read its configuration via Config::Auto from
F<newsstats.conf>, which should be present in the same directory.

See doc/INSTALL for an overview of possible configuration options.

You can override the I<TLH>, I<DBTableRaw>, I<DBTableGrps>,
I<DBTableClnts> and I<DBTableHosts> options via the B<-n>, B<-r>,
B<-g>, B<-c> and B<-s> switches, respectively.
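
A minimal sketch of how such an override could work is shown below.
C<override_config> is a hypothetical equivalent of C<OverrideConfig>
from F<NewsStats.pm>, and the configuration values are made up for the
example; only switches that were actually given replace a configured
value.

    use strict;
    use warnings;

    # Hypothetical equivalent of OverrideConfig: copy every overriding
    # value into the configuration hash.
    sub override_config {
      my ($conf, $override) = @_;
      $conf->{$_} = $override->{$_} foreach keys %{$override};
      return;
    }

    my %conf    = (DBTableRaw => 'raw_de', TLH => 'de');  # assumed values
    my %options = (n => 'at');                # as if "-n at" was given

    # collect overrides only for switches that were set
    my %override;
    $override{'TLH'}        = $options{'n'} if $options{'n'};
    $override{'DBTableRaw'} = $options{'r'} if $options{'r'};
    override_config(\%conf, \%override);

    print "$_ = $conf{$_}\n" foreach sort keys %conf;

Here I<TLH> changes to the value given on the command line while
I<DBTableRaw> keeps its configured value.
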

=head1 OPTIONS

=over 4

=item B<-V> (version)

Print out version and copyright information on B<gatherstats> and exit.

=item B<-h> (help)

Print this man page and exit.

=item B<-d> (debug)

Output debugging information to STDOUT while processing (number of
postings per group).

=item B<-o> (output only)

Do not write results to database. You should use B<-d> in conjunction
with B<-o>; everything else seems a bit pointless.

=item B<-m> I<YYYY-MM> (month)

Set processing period to a month in YYYY-MM format. Ignored if B<-p>
is set.

=item B<-p> I<YYYY-MM:YYYY-MM> (period)

Set processing period to a time period between two months, each in
YYYY-MM format, separated by a colon. Overrides B<-m>.

=item B<-t> I<type> (type)

Set processing type to either I<all> or I<groups>. Defaults to I<all>
(and is currently rather pointless as only I<groups> has been
implemented).

=item B<-n> I<TLH> (newsgroup hierarchy)

Override I<TLH> from F<newsstats.conf>.

=item B<-r> I<table> (raw data table)

Override I<DBTableRaw> from F<newsstats.conf>.

=item B<-g> I<table> (postings per group table)

Override I<DBTableGrps> from F<newsstats.conf>.

=item B<-c> I<table> (client data table)

Override I<DBTableClnts> from F<newsstats.conf>.

=item B<-s> I<table> (server/host data table)

Override I<DBTableHosts> from F<newsstats.conf>.

=back

=head1 EXAMPLES

Process all types of information for last month:

    gatherstats

Do a dry run, showing results of processing:

    gatherstats -do

Process all types of information for January of 2010:

    gatherstats -m 2010-01

Process only number of postings for the year of 2010:

    gatherstats -p 2010-01:2010-12 -t groups

=head1 FILES

=over 4

=item F<gatherstats.pl>

The script itself.

=item F<NewsStats.pm>

Library functions for the NewsStats package.

=item F<newsstats.conf>

Runtime configuration file for B<gatherstats>.

=back

=head1 BUGS

Please report any bugs or feature requests to the author or use the
bug tracker at L<http://bugs.th-h.de/>!

=head1 SEE ALSO

This script is part of the B<NewsStats> package.

=head1 AUTHOR

Thomas Hochstein <thh@inter.net>

=head1 COPYRIGHT AND LICENSE

Copyright (c) 2010 Thomas Hochstein <thh@inter.net>

This program is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.

=cut