Index of /~tskirvin/software/news-archive

  Name                        Last modified      Size
  Changes                     11-Mar-2007 11:20  1.0K
  LICENSE                     29-Apr-2004 11:51  6.0K
  News-Archive-0.10.tar.gz    11-Mar-2007 11:20  29K
  News-Archive-0.11.tar.gz    11-Mar-2007 11:20  32K
  News-Archive-0.12.tar.gz    11-Mar-2007 11:20  32K
  News-Archive-0.13.tar.gz    11-Mar-2007 11:20  33K
  News-Archive-0.14.5.tar.gz  11-Mar-2007 11:20  40K
  News-Archive-0.14.tar.gz    11-Mar-2007 11:20  33K

NAME
    News::Archive - archive news articles for later use

SYNOPSIS
      use News::Archive;
      use News::Article;

      my $archive = News::Archive->new( 'basedir' => '/home/tskirvin/kiboze' );

      # Get a news article
      my $article = News::Article->new(\*STDIN);
      my $msgid   = $article->header('message-id');

      die "Already processed '$msgid'\n"
                    if ($archive->article( $msgid ));

      # Get the list of groups we're supposed to be saving the article into
      my @groups = split( /\s*,\s*/, $article->header('newsgroups') );
      s/\s+//g for @groups;

      # Make sure we're subscribed to these groups
      foreach (@groups) { $archive->subscribe($_) }

      # Actually save the article.
      my $ret = $archive->save_article(
            [ @{$article->rawheaders}, '', @{$article->body} ], @groups );
      print( $ret ? "Accepted article $msgid\n"
                  : "Couldn't save article $msgid\n" );

    See below for more options.

DESCRIPTION
    News::Archive is a package for storing news articles in an accessible
    form. Articles are stored one per file, and can be looked up either by
    Message-ID or through overview information. The archive can then be
    read through a Net::NNTP-compatible interface, so that other packages
    can access it easily.

    News::Archive maintains several files to keep track of its archive:

    active file
        Keeps track of all newsgroups we are "subscribed" to, along with the
        information about them that changes regularly: the number of
        articles we have archived, the current first and last article
        numbers, etc.

        Watched over with News::Active.

    history database
        A simple database that tracks articles by Message-ID. It makes
        access by ID easy, and ensures that we don't save the same article
        twice. The type of database used is chosen by the user (see
        "db_type" under "new()").

    newsgroup file
        Keeps track of more static information about the newsgroups we are
        subscribed to - descriptions, creation dates, etc.

        Watched over with News::GroupInfo.

    archive directory
        Directory structure of all articles, with each article saved as a
        single text file. The tree has one directory level per component of
        the group name, such as "rec/games/mecha". Crossposts are hardlinked
        into the other groups' directories.

        Articles are further divided into sub-directories of up to 500
        articles each, to avoid the performance problems that very large
        directories cause on many Unix filesystems. An individual article
        is thus stored at a path such as "rec/games/mecha/1.500/1".

        Each newsgroup also contains overview information, watched over with
        News::Overview. This overview file sits at the top of the group's
        directory, such as "rec/games/mecha/.overview".
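
    The directory layout above can be sketched as a small path calculator.
    This is a minimal illustration, not News::Archive's own code: the
    function names are hypothetical, and the "FIRST.LAST" naming of
    sub-directories beyond the first is an assumption extrapolated from the
    single "1.500" example.

```perl
use strict;
use warnings;

# Hypothetical helper: map a group name and article number to a relative
# path, assuming 500 articles per sub-directory (the documented default).
sub article_path {
    my ($group, $num, $hash) = @_;
    $hash = 500 unless defined $hash;

    # "rec.games.mecha" -> "rec/games/mecha"
    my $dir = join '/', split /\./, $group;

    # Assumed naming: articles 1-500 in "1.500", 501-1000 in "501.1000", ...
    my $first = int( ($num - 1) / $hash ) * $hash + 1;
    my $last  = $first + $hash - 1;

    return "$dir/$first.$last/$num";
}

# Hypothetical helper: the per-group overview file sits at the top of the tree.
sub overview_path {
    my ($group, $overfilename) = @_;
    $overfilename = '.overview' unless defined $overfilename;
    return join('/', split /\./, $group) . "/$overfilename";
}

print article_path('rec.games.mecha', 1), "\n";     # rec/games/mecha/1.500/1
print article_path('rec.games.mecha', 501), "\n";   # rec/games/mecha/501.1000/501
print overview_path('rec.games.mecha'), "\n";       # rec/games/mecha/.overview
```

    Because the sub-directory is derived from the per-directory article
    count, changing that count after articles have been saved would make
    the computed paths disagree with where existing articles actually live.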

    These files closely mirror the ones INN uses. This is intentional: this
    package is meant to act in many ways like a lighter-weight INN.
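
    The duplicate check performed by the history database can be sketched
    with any tied hash keyed by Message-ID. This minimal example is not the
    module's actual code: it assumes SDBM_File (bundled with Perl) in place
    of the default DB_File, a hypothetical "remember()" helper, and storing
    the article's primary path as the value.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Open (or create) an on-disk hash keyed by Message-ID.  SDBM_File stands in
# here for the configurable db_type (DB_File by default).
my $dir = tempdir( CLEANUP => 1 );
tie my %history, 'SDBM_File', "$dir/history", O_RDWR | O_CREAT, 0666
    or die "Couldn't tie history: $!";

# Hypothetical helper: record an article unless its Message-ID is known.
# Storing the primary path as the value is an assumption.
sub remember {
    my ($hist, $msgid, $path) = @_;
    return 0 if exists $hist->{$msgid};   # already archived; refuse
    $hist->{$msgid} = $path;
    return 1;
}

print remember(\%history, '<abc@example.com>', 'rec/games/mecha/1.500/1'), "\n";  # 1
print remember(\%history, '<abc@example.com>', 'rec/games/mecha/1.500/2'), "\n";  # 0
print $history{'<abc@example.com>'}, "\n";   # rec/games/mecha/1.500/1
```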

USAGE
  Global Variables
    The following variables are set within News::Archive, and are global
    throughout all invocations.

    $News::Archive::DEBUG
        Default value for "debug()" in new objects.

    $News::Archive::HOSTNAME
        Default value for "hostname()" in new objects. Obtained using
        "Sys::Hostname::hostname()".

    $News::Archive::HASH
        The number of articles to keep in each directory. Default is 500.
        Change this at your own peril: changing it after any articles have
        been archived may leave existing articles in directories that no
        longer match the computed paths.

  Basic Functions
    These functions create and deal with the object itself.

    new ( HASHREF )
        Creates the News::Archive object. "HASHREF" contains initialization
        information for this object; currently supported options:

          basedir       Base directory for this object to work with.  
                        Required; we will fail without this.
          archives      Location of the post archives.  Defaults to 
                        $basedir/archives
          historyfile   Location of the history database.  Defaults to
                        $basedir/historyfile
          activefile    Location of the active file.  Defaults to
                        $basedir/active
          overfilename  File name for the overview database files in each
                        newsgroup hierarchy.  Defaults to ".overview".
          db_type       The type of Perl database module we will use to
                        store files that need that level of service.
                        Defaults to 'DB_File'.
          groupinfofile Location of the groupinfo file.  Defaults to
                        $basedir/newsgroups.
          hostname      String to use when a local hostname is required.  
                        Defaults to $News::Archive::HOSTNAME.
          debug         Should we print debugging information?  Defaults to
                        $News::Archive::DEBUG.
          readonly      Should we open this read-only?  

        The History, Active, and GroupInfo objects are loaded, and the
        archive is locked with News::Lock.

        Returns the blessed object on success, or undef on failure.

    reload ( )
        Closes and re-opens all of the objects in the archive - History,
        Active file, and GroupInfo, at present - and locks the archive with
        a read lock. Necessary for News::Lock compatibility.

    activefile ()
        Returns the News::Active object based on "activefile", set in new().
        If this object has not already been opened and created, creates it;
        otherwise, just returns the existing object. Passes on the
        'readonly' flag.

    activeclose ()
        Writes out and closes the News::Active object.

    groupinfo ()
        Returns the News::GroupInfo object based on "groupinfofile", set in
        new(). If this object has not already been opened and created,
        creates it; otherwise, just returns the existing object. Passes on
        the 'readonly' flag.

    groupclose ()
        Writes out and closes the News::GroupInfo object.

    history ()
        Returns a tied hashref based on "historyfile", set in new(). If this
        object has not already been opened and created, creates it;
        otherwise, just returns the existing object.

    debug ()
        Returns true if we want to print debugging information, false
        otherwise. Used heavily internally; may also be used externally.

    activeentry ( GROUP )
        Returns the News::Active::Entry information for the given "GROUP".

    groupentry ( GROUP )
        Returns the News::GroupInfo::Entry information for the given
        "GROUP".

    close ()
        Close all open files.
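
    The defaults listed for "new()" all derive from "basedir". The sketch
    below models only that option handling; it is not the module's
    constructor. "archive_config()" is a hypothetical name. The 0 defaults
    for "debug" and "readonly" are assumptions standing in for
    $News::Archive::DEBUG and the undocumented readonly default.

```perl
use strict;
use warnings;
use Sys::Hostname ();

# Hypothetical sketch: fill in the defaults documented for new().
# Returns undef when the required basedir is missing.
sub archive_config {
    my (%opts) = @_;
    my $base = $opts{basedir};
    return unless defined $base;

    my %defaults = (
        archives      => "$base/archives",
        historyfile   => "$base/historyfile",
        activefile    => "$base/active",
        groupinfofile => "$base/newsgroups",
        overfilename  => '.overview',
        db_type       => 'DB_File',
        hostname      => Sys::Hostname::hostname(),
        debug         => 0,    # assumption: stands in for $News::Archive::DEBUG
        readonly      => 0,    # assumption: no default is documented
    );

    # Caller-supplied values override the defaults.
    return { %defaults, %opts };
}

my $cfg = archive_config( basedir => '/home/tskirvin/kiboze', debug => 1 );
print "$cfg->{historyfile}\n";   # /home/tskirvin/kiboze/historyfile
```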

  Error Functions
    These functions deal with the global error variable, which is currently
    not being used very effectively.

    error ( [ERROR] )
        Returns the text (a scalar) describing the last error message. If
        "ERROR" is offered, then it sets the error message to this first.

    clear_error ()
        Clears the error message.
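
    The get-or-set convention of "error()" and "clear_error()" can be
    sketched as follows. The package name "ErrorDemo" is hypothetical. So is
    clearing the message to an empty string rather than undef, since the
    cleared value is not documented above. The message lives in one
    package-wide variable to match the "global error variable" described.

```perl
package ErrorDemo;    # hypothetical package, for illustration only
use strict;
use warnings;

our $ERROR = '';      # single, package-wide error message

sub new { my ($class) = @_; return bless {}, $class }

# error( [ERROR] ): optionally set the message first, then return it.
sub error {
    my ($self, $error) = @_;
    $ERROR = $error if defined $error;
    return $ERROR;
}

# clear_error(): reset the stored message (assumed to be the empty string).
sub clear_error { $ERROR = ''; return }

package main;

my $obj = ErrorDemo->new;
$obj->error("Couldn't open active file");
print $obj->error, "\n";    # Couldn't open active file
$obj->clear_error;
```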

  Net::NNTP Equivalents
    The following functions are the equivalent of the Net::NNTP commands;
    they are provided for compatibility with News::Web and other news
    functions. More information on their use is available in those manual
    pages.

    article ( [ MSGID|MSGNUM ], [FH] )
        Retrieves the article indicated by "MSGID" or "MSGNUM" (as in
        Net::NNTP) as the headers, a blank line, and then the body of the
        article. Either
        prints it to "FH" (if offered) or returns an array reference
        containing the text.

        Returns undef if the article is not found.

    head ( [ MSGID|MSGNUM ], [FH] )
        As with "article()", but only returns the header of the article.

    body ( [ MSGID|MSGNUM ], [FH] )
        As with "article()", but only returns the body of the article.

    nntpstat ( [ MSGID|MSGNUM ] )
        As with "article()", but only returns the article's message-id.
        Returns undef if not set or the article didn't exist.

    group ( [GROUP] )
        Sets the current group pointer; necessary if we want to use
        "article()" or its ilk by message number and not message-ID. In list
        context, returns the active information of the group as a list
        (number of articles, first article number, last article number,
        group name). In scalar context, just returns the group name.

    ihave ( MSGID, MESSAGE )
        Writes an article to the archive with Message-ID "MSGID". "MESSAGE"
        is the actual message. Invokes "save_article()".

        (Note that this is preferred to "post()", at least here, because it
        lets us tell much earlier if we don't want the article.)

    last ()
        Unimplemented.

    date ()
        Returns the local time (in seconds since the epoch).

    postok ()
        Returns 0; we don't want anything to get the idea that it can post.

    authinfo ()
        Unimplemented.

    list ()
        Same as "active('*')", listing all active groups.

    newgroups ()
        Unimplemented.

    newnews ()
        Unimplemented.

    post ( MESSAGE )
        Writes an article to the archive. "MESSAGE" is the actual message.
        Invokes "save_article()".

    slave ()
        Unimplemented.

    quit ()
        Closes the current connection, clears the current group, and resets
        the pointer. Returns 1.

    newsgroups ( [PATTERN] )
        Returns a hashref where the keys are the newsgroups that match the
        pattern "PATTERN" (uses "active()"), and the values are description
        text for the newsgroup.

    distributions ()
        Unimplemented.

    subscriptions ()
        Returns a listref to all groups that we are subscribed to. This is
        not ideal; we may only want the ones that we have descriptions for,
        or a specific flag set in News::GroupInfo, or something. It works
        for now, though.

    overview_fmt ()
        Returns the overview format information from News::Overview, since
        that's what we're currently using.

    active_times ( [PATTERN] )
        Returns a hashref where the keys are the group names, and the values
        are the results from "News::GroupInfo::Entry->arrayref()".

    active ( [PATTERN] )
        Returns a hashref where the keys are the group names, and the values
        are the results from "News::Active::Entry->arrayref()".

    xgtitle ( [PATTERN] )
        Same as "newsgroups()".

    xhdr ( HEADER, SPEC [, PATTERN] )
    xover ( MATCH, HDR )
        Gets information from the stored overview database. See
        News::Overview for more information on how this works.

    xpath ( MID )
        Returns the full path name on the server of the location of the
        given article.

    xpat ( HEADER, SPEC [, PATTERN] )
        Same as "xhdr()".

    xrover ( SPEC )
        Same as $self->xhdr('References', SPEC).

    listgroup
        Unimplemented.

    reader ()
        Unimplemented.
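
    Two conventions described above can be sketched in isolation: the
    header/body split that separates "article()", "head()", and "body()",
    and printing to "FH" when one is given versus returning an array
    reference. Both helper names are hypothetical. Returning 1 after
    printing is an assumption, since that return value is not documented
    above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a stored article (arrayref of lines without
# trailing newlines) into header and body at the first blank line.
sub split_article {
    my ($lines) = @_;
    my @head;
    my $i = 0;
    while ( $i < @$lines && $lines->[$i] ne '' ) {
        push @head, $lines->[$i++];
    }
    my @body = @{$lines}[ $i + 1 .. $#{$lines} ];   # skip the blank line
    return ( \@head, \@body );
}

# Hypothetical helper mirroring the [FH] convention: print to FH if given,
# otherwise return an array reference holding a copy of the lines.
sub emit_lines {
    my ($lines, $fh) = @_;
    return [@$lines] unless $fh;
    print {$fh} "$_\n" for @$lines;
    return 1;    # assumption: the real return value when printing is undocumented
}

my $article = [ 'From: someone@example.com', 'Subject: test', '', 'Hello,', 'world' ];
my ($head, $body) = split_article($article);
emit_lines($body, \*STDOUT);    # prints "Hello," and "world" on separate lines
```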

  Archive Functions
    The following functions actually deal with the archive itself.

    save_article ( LINES [, GROUPS] )
        Saves an article into the archive. "LINES" is an arrayref that is
        passed to News::Article; "GROUPS" is an array of groups that we want
        to save the article to, if not those listed in the Newsgroups:
        header.

        The article is modified by adding "hostname()" onto the Path: header
        and creating a new Xref: header to match where we will save the
        article. The file is written to a single primary location, and
        hardlinks are made to the other locations. Overview information is
        generated for each group, history information is saved to ensure
        that we don't save the same article twice, and directories are
        created as needed.

        Note that there are currently some possible race conditions in this
        function, which adding file and directory locking should partially
        solve.

    article_is_in_archive ( MSGID )
        Returns 1 if the article is in the archive, 0 otherwise.

    subscribe ( GROUP )
        Subscribe to the given "GROUP", by adding information about the
        group to the active and groupinfo files and starting the directory
        tree.

    unsubscribe ( GROUP )
        Unsubscribe from "GROUP", by removing information about it from the
        active and groupinfo files.

    subscribed ( GROUP )
        Returns 1 if we are subscribed to "GROUP", 0 otherwise.

    overview_add ( NUMBER, GROUP, ARTICLE )
        Add information to "GROUP"'s overview information regarding article
        "NUMBER", which is "ARTICLE". Just appends the information to the
        overview database; we don't need to do anything more at this point.

    overview_read ( GROUP, MESSAGE-SPEC [, HDR ] )
        Get the overview information from "GROUP" for the articles specified
        by "MESSAGE-SPEC" (see Net::NNTP). If "HDR" is offered, only return
        that header information. Mostly invokes "xover()".
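
    The header changes described for "save_article()" can be sketched as
    below. The helper names are hypothetical. The Path: and Xref: formats
    follow the usual Usenet conventions ("host!rest-of-path", and
    "server group:number ...") rather than being taken from the module's
    source.

```perl
use strict;
use warnings;

# Hypothetical helper: prepend the local hostname to an existing Path: value.
sub add_to_path {
    my ($path, $hostname) = @_;
    return ( defined $path && length $path ) ? "$hostname!$path" : $hostname;
}

# Hypothetical helper: build an Xref: value from the hostname and the
# [ group, article-number ] pairs chosen for this article.
sub make_xref {
    my ($hostname, @pairs) = @_;
    return join ' ', $hostname, map { "$_->[0]:$_->[1]" } @pairs;
}

print add_to_path('news.example.com!not-for-mail', 'archive.example.org'), "\n";
# archive.example.org!news.example.com!not-for-mail
print make_xref('archive.example.org',
                [ 'rec.games.mecha', 42 ], [ 'rec.games.misc', 7 ]), "\n";
# archive.example.org rec.games.mecha:42 rec.games.misc:7
```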

NOTES
    This module has grown out of my original kiboze.pl scripts, which
    accomplished essentially the same writing functions but none of the
    reading ones. While a write-only interface has been somewhat beneficial,
    this should be much more helpful.

TODO
    Start using the AutoLoader (or something like it).

    Close and re-open the databases periodically, to flush data to disk in
    the middle of long operations.

    While we currently have basic hashing taking place on the newsgroups to
    prevent the directories from getting too large, it would be nice if this
    were instead done as a time-hash - that is, if the article was from 28
    Apr 2004, we could make directories that looked like 2004.01.01 (yearly
    hashing), 2004.04.01 (monthly), or 2004.04.28 (daily).
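
    The proposed time-hash could be computed along these lines. This is a
    sketch of an unimplemented idea. "time_hash_dir()" is a hypothetical
    name, and interpreting article dates as UTC epoch seconds is an
    assumption.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# Hypothetical sketch of the proposed time-hash: map an article's date
# (epoch seconds, interpreted as UTC) to a directory name.
sub time_hash_dir {
    my ($epoch, $granularity) = @_;
    my @t = gmtime $epoch;
    return strftime( '%Y.01.01', @t ) if $granularity eq 'yearly';
    return strftime( '%Y.%m.01', @t ) if $granularity eq 'monthly';
    return strftime( '%Y.%m.%d', @t ) if $granularity eq 'daily';
    die "Unknown granularity '$granularity'\n";
}

my $t = timegm( 0, 0, 12, 28, 3, 2004 );     # 28 Apr 2004, 12:00 UTC
print time_hash_dir( $t, 'monthly' ), "\n";  # 2004.04.01
```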

    More News::Web changes to better connect with News::Archive would be
    nice.

    Using a different Overview format may make sense.

    Offer some functions to rebuild overview information later.

    Offer something to make default ~/.kibozerc files.

REQUIREMENTS
    "Net::NNTP::Functions", News::Article, News::Overview, News::Lock,
    News::Active, News::GroupInfo, DB_File

SEE ALSO
    Modules: News::Active, News::GroupInfo, News::Article, News::Web,
    newslib, newsrecurse.pl

    Scripts: kiboze.pl, newsarchive.pl, mbox2news.pl

AUTHOR
    Tim Skirvin <tskirvin@killfile.org>

HOMEPAGE
    http://www.killfile.org/~tskirvin/software/news-archive/

LICENSE
    This code may be redistributed under the same terms as Perl itself.

COPYRIGHT
    Copyright 2003-2007, Tim Skirvin.