libreary package¶
Subpackages¶
Submodules¶
libreary.adapter_manager module¶
-
class
libreary.adapter_manager.AdapterManager(config: dict)[source]¶ Bases:
object
The AdapterManager is responsible for all interaction with adapters, except for initial ingestion.
It is able to keep track of all of the adapters we have, do integrity checks on them, perform initial distribution, compare versions from different adapters, and make insert and delete calls.
The Adapter Manager is responsible for most of the operation of ingestion, deletion, and management of digital objects within LIBREary. Most customization will occur by subclassing the AdapterManager.
This class currently contains the following methods:
- reload_levels_adapters (create adapter objects and set levels based on configuration)
- get_all_levels (get levels based on what exists in metadata db)
- get_all_adapters (create adapter objects based on levels)
- set_additional_adapter (manually create an adapter object and add it to the AdapterManager’s list of adapters)
- verify_adapter (make sure an adapter is working properly)
- create_adapter [static method] (factory function for adapter objects)
- send_resource_to_adapters (send copies of a resource to all the places they need to be)
- get_adapters_by_level (get all adapters from a level)
- delete_resource_from_adapters (delete non-canonical copies of an object)
- change_resource_level (change the level of an object)
- get_canonical_copy_metadata
- summarize_copies
- retrieve_by_preference (retrieve an object, preferring the canonical adapter)
- check_single_resource_single_adapter (make sure that a copy matches the canonical copy)
- verify_adapter_metadata (verify that a file can be retrieved)
- get_resource_metadata
- restore_canonical_copy (restore a faulty canonical copy)
- compare_copies (check if two copies of a resource are the same)
- verify_copy (check if a copy matches the canonical copy)
-
change_resource_level(r_id: str, new_levels: List[str]) → None[source]¶ Assign a new set of levels to a resource. Removes all existing levels from the resource and replaces them with new_levels.
:param r_id - UUID of the resource whose levels you’d like to change
:param new_levels - list of names of levels to assign to the resource
-
check_single_resource_single_adapter(r_id: str, adapter_type: str, adapter_id: str) → bool[source]¶ Ensure that a copy of an object matches its canonical checksum. This method trusts that the metadata db has the proper canonical checksum.
If a copy is found to be faulty, a restore is attempted. If a copy is found not to exist, it is created.
:param r_id - UUID of resource you’d like to check
:param adapter_id - adapter_id for copy of resource you are checking
-
compare_copies(r_id: str, adapter_id_1: str, adapter_id_2: str, deep: bool = False) → bool[source]¶ Compare copies of a resource in two adapters. Returns True iff the checksums of each copy match.
A deep compare will actually compute the current checksum of the file stored in the adapter specified. Some adapters can do this with no file I/O, while others will have to actually retrieve the file to perform this operation.
:param r_id - the UUID of the resource to compare
:param adapter_id_1 - Adapter ID of the first adapter
:param adapter_id_2 - Adapter ID of the second adapter
:param deep - specify whether to run a deep or shallow check
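The difference between a shallow and a deep compare can be sketched as follows. This is an illustration only, not LIBREary's implementation: the `Copy` dataclass, the `compare` helper, and the choice of SHA-1 are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Copy:
    """Hypothetical stand-in for one stored copy of a resource."""
    recorded_checksum: str  # checksum as recorded in the metadata db
    contents: bytes         # bytes actually held by the adapter


def checksum(data: bytes) -> str:
    # The hash algorithm is an assumption of this sketch.
    return hashlib.sha1(data).hexdigest()


def compare(copy_1: Copy, copy_2: Copy, deep: bool = False) -> bool:
    """Return True iff the checksums of the two copies match."""
    if deep:
        # Deep: recompute both checksums from the stored bytes.
        return checksum(copy_1.contents) == checksum(copy_2.contents)
    # Shallow: trust the recorded checksums.
    return copy_1.recorded_checksum == copy_2.recorded_checksum


good = Copy(checksum(b"data"), b"data")
corrupt = Copy(checksum(b"data"), b"dat4")  # bytes changed after recording
```

In this sketch a shallow compare of `good` and `corrupt` reports a match, because only the recorded checksums are consulted; a deep compare detects the changed bytes.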
-
static
create_adapter(adapter_type: str, adapter_id: str, config_dir: str) → libreary.adapters.BaseAdapter.BaseAdapter[source]¶ Static method for creating and returning an adapter object. This is essentially an Adapter factory.
:param adapter_type - must be the name of a valid adapter class.
:param adapter_id - the identifier you want to label this adapter with
:param config_dir - configuration directory. Must contain a file called {adapter_id}_config.json
-
static
create_config_for_adapter(adapter_id: str, adapter_type: str, config_dir: str) → dict[source]¶ Static method for creating an adapter configuration. This is necessary for the adapter factory.
:param adapter_type - must be the name of a valid adapter class.
:param adapter_id - the identifier you want to label this adapter with
:param config_dir - configuration directory. Must contain a file called {adapter_id}_config.json
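A factory of this kind might look roughly like the sketch below. Only the `{adapter_id}_config.json` file name pattern comes from the documentation above; the `ADAPTER_CLASSES` registry, the toy `LocalAdapter` class, the helper names, and the `storage_dir` config key are hypothetical.

```python
import json
import os
import tempfile


class LocalAdapter:
    """Toy adapter class used only for this sketch."""

    def __init__(self, config: dict):
        self.config = config


# Registry mapping class names to classes; an assumption of this sketch.
ADAPTER_CLASSES = {"LocalAdapter": LocalAdapter}


def load_adapter_config(adapter_id: str, config_dir: str) -> dict:
    # Read <config_dir>/<adapter_id>_config.json.
    path = os.path.join(config_dir, f"{adapter_id}_config.json")
    with open(path) as f:
        return json.load(f)


def make_adapter(adapter_type: str, adapter_id: str, config_dir: str):
    # Reject type names that do not correspond to a known class.
    if adapter_type not in ADAPTER_CLASSES:
        raise ValueError(f"unknown adapter type: {adapter_type}")
    config = load_adapter_config(adapter_id, config_dir)
    return ADAPTER_CLASSES[adapter_type](config)


with tempfile.TemporaryDirectory() as config_dir:
    with open(os.path.join(config_dir, "local1_config.json"), "w") as f:
        json.dump({"storage_dir": "/tmp/example"}, f)  # illustrative key
    adapter = make_adapter("LocalAdapter", "local1", config_dir)
```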
-
delete_resource_from_adapters(r_id: str) → None[source]¶ Deletes a resource from all adapters it’s stored in. Does not delete the canonical copy.
:param r_id - UUID of resource to delete copies of
-
get_adapters_by_level(level: str) → List[libreary.adapters.BaseAdapter.BaseAdapter][source]¶ Get a list of adapter objects based on a level. Returns a list of callable adapter objects.
:param level - the name of the level you want the adapters for
-
get_all_adapters() → List[dict][source]¶ Set up all of the adapters we will need, based on all levels from the metadata database.
Parses self.levels to create adapter objects.
The return value is a dictionary structured as follows:
```
{
    "adapter_id1": <AdapterObject>,
    "adapter_id2": <AdapterObject>,
    …
}
```
Ensure that self.levels is set properly before running this, by calling self._set_levels().
-
get_all_levels() → List[dict][source]¶ Returns all levels in the metadata database.
Returns a list of dictionaries, each with the following format:
```
{
    "id": (int) level ID,
    "name": (str) level name,
    "frequency": (int) scheduled check frequency,
    "adapters": (dict) dictionary of adapters associated with this level
}
```
-
get_canonical_copy_metadata(r_id: str) → List[List[str]][source]¶ Get a summary of the metadata of an object’s canonical copy. That summary includes: copy_id, resource_id, adapter_identifier, locator, checksum, adapter type, canonical (bool)
:param r_id - UUID of resource you’d like to learn about
-
get_resource_metadata(r_id: str) → List[List[str]][source]¶ Get a summary of information about a resource. That summary includes:
id, path, levels, file name, checksum, object uuid, description
This method trusts the metadata database. There should be a separate method to verify the metadata db so that we know we can trust this info
:param r_id - UUID of resource you’d like to learn about
-
reload_levels_adapters() → None[source]¶ Set the self.adapters and self.levels instance variables.
This object needs to be stateful in this way because each adapter might either require time-sensitive authentication information (tokens, etc), or may be computationally expensive to create. For this reason, we want the AdapterManager to have instance variables with adapter objects.
-
restore_canonical_copy(r_id: str) → None[source]¶ Attempt to repair a detected fault in the canonical copy of an object.
Delete the canonical copy of an object, but keep non-canonical copies. After that, create a new canonical copy, preserving the resource UUID, but with the correct object contents.
:param r_id - UUID of resource you’d like to restore
-
restore_from_canonical_copy(adapter_id: str, r_id: str) → None[source]¶ Restore a copy of an object from its canonical copy. To restore from the canonical copy, we can simply delete and re-ingest the faulty copy.
:param adapter_id - the ID of the adapter with the broken copy
:param r_id - the resource UUID of the resource we’ve detected an issue with
-
retrieve_by_preference(r_id: str) → str[source]¶ Retrieve a resource.
Get a copy of a file, preferring the canonical adapter, then enforcing some preference hierarchy. This will be called when Libreary is asked to retrieve.
This places a file in the configured output_dir and returns a path to the retrieved file.
Keep in mind that the output directory may be volatile and should not be used for storage.
:param r_id - UUID of resource you’d like to retrieve
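The documentation does not say what the preference hierarchy is. As a hedged illustration, one possible ordering is "canonical first, then a caller-supplied list of adapter IDs". The `pick_copy` helper and the dictionary representation of a copy are assumptions; only the `adapter_identifier` and `canonical` field names are borrowed from the copy summaries described on this page.

```python
from typing import List, Optional


def pick_copy(copies: List[dict], preference: List[str]) -> Optional[dict]:
    """Pick the canonical copy if there is one, else the best-ranked copy."""

    def rank(copy: dict) -> int:
        if copy["canonical"]:
            return -1  # the canonical copy always wins
        adapter_id = copy["adapter_identifier"]
        if adapter_id in preference:
            return preference.index(adapter_id)
        return len(preference)  # adapters not listed come last

    return min(copies, key=rank) if copies else None


copies = [
    {"adapter_identifier": "s3_1", "canonical": False},
    {"adapter_identifier": "local1", "canonical": False},
]
chosen = pick_copy(copies, preference=["local1", "s3_1"])
```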
-
send_resource_to_adapters(r_id: str, delete_after_send: bool = False) → None[source]¶ Sends a resource to all the places it should go. The resource must have already been ingested through the Ingester. This method:
- Figures out what levels a resource has been assigned
- Figures out what adapters are associated with that level
- Figures out any overlap, to avoid storing things twice in one adapter
- Stores copies to each adapter
- Optionally, deletes any remaining files in the dropbox directory
:param r_id - resource UUID you wish to distribute
:param delete_after_send - boolean indicating whether to delete files after storage
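The overlap step in the list above can be sketched as a simple de-duplication over the adapters of every assigned level. The `adapters_for_levels` helper and the level-to-adapter mapping shown here are hypothetical and only illustrate the idea.

```python
from typing import Dict, List


def adapters_for_levels(resource_levels: List[str],
                        level_adapters: Dict[str, List[str]]) -> List[str]:
    """Collect adapter IDs for the given levels, each at most once."""
    seen = set()
    targets = []
    for level in resource_levels:
        for adapter_id in level_adapters.get(level, []):
            if adapter_id not in seen:  # skip overlap between levels
                seen.add(adapter_id)
                targets.append(adapter_id)
    return targets


# "local1" belongs to both levels but should receive only one copy.
level_adapters = {"low": ["local1"], "high": ["local1", "s3_1"]}
targets = adapters_for_levels(["low", "high"], level_adapters)
```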
-
set_additional_adapter(adapter_id: str, adapter_type: str) → libreary.adapters.BaseAdapter.BaseAdapter[source]¶ Manually add an adapter to the pool of adapters.
:param adapter_id - the adapter ID of the adapter you’re creating. There should be a matching config file for this adapter ID.
:param adapter_type - the type of the adapter you wish to create. Must be the actual class name, e.g. “LocalAdapter”.
-
summarize_copies(r_id: str) → List[List[str]][source]¶ Get a summary of all copies of a single resource. That summary includes:
copy_id, resource_id, adapter_identifier, locator, checksum, adapter type, canonical (bool) for each copy
This method trusts the metadata database. There should be a separate method to verify the metadata db so that we know we can trust this info
:param r_id - UUID of resource you’d like to learn about
-
verify_adapter(adapter_id: str) → bool[source]¶ Make sure an adapter is working. To do this, we store, retrieve, and delete a file that we know the contents of, and make sure the checksums are as they should be.
:param adapter_id - The adapter ID you’d like to verify.
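A store, retrieve, delete round trip of this kind might look like the sketch below. The `InMemoryAdapter` class, its method names, and the use of SHA-1 are assumptions for illustration; real adapters talk to real storage and may expose a different interface.

```python
import hashlib


class InMemoryAdapter:
    """Toy adapter for this sketch, backed by a dictionary."""

    def __init__(self):
        self._store = {}

    def store(self, key: str, data: bytes) -> None:
        self._store[key] = data

    def retrieve(self, key: str) -> bytes:
        return self._store[key]

    def delete(self, key: str) -> None:
        del self._store[key]


def round_trip_ok(adapter, probe: bytes = b"known contents") -> bool:
    """Store a known file, read it back, delete it, and compare checksums."""
    expected = hashlib.sha1(probe).hexdigest()
    adapter.store("probe", probe)
    try:
        actual = hashlib.sha1(adapter.retrieve("probe")).hexdigest()
    finally:
        adapter.delete("probe")  # always clean up the probe file
    return actual == expected


healthy = InMemoryAdapter()
result = round_trip_ok(healthy)
```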
-
verify_adapter_metadata(adapter_id: str, r_id: str, delete_after_check: bool = True) → bool[source]¶ Ensure that a copy of an object matches its canonical checksum. This method does not trust that the metadata db has the proper canonical checksum.
Verifies that the file is actually retrievable via the adapter ID, not just present according to the metadata.
Note that this retrieves the file, so it’s relatively expensive.
If a copy is found to be faulty, a restore is attempted. If a copy is found not to exist, it is created.
:param r_id - UUID of resource you’d like to check
:param adapter_id - adapter_id for copy of resource you are checking
-
verify_copy(r_id: str, adapter_id: str, deep: bool = False) → bool[source]¶ Compare copies of a resource in two adapters, one being canonical. Returns True iff the checksums of each copy match.
A deep compare will actually compute the current checksum of the file stored in the adapter specified. Some adapters can do this with no file I/O, while others will have to actually retrieve the file to perform this operation.
:param r_id - the UUID of the resource to compare
:param adapter_id - Adapter ID of the adapter to check against
:param deep - specify whether to run a deep or shallow check
libreary.exceptions module¶
Bases:
Exception
libreary.ingester module¶
-
class
libreary.ingester.Ingester(config: dict)[source]¶ Bases:
object
-
delete_resource(r_id: str) → None[source]¶ Delete a resource from the LIBREary.
This method deletes the canonical copy and removes the corresponding entry in the resources table.
:param r_id - the UUID of the resource you’re deleting
-
ingest(current_file_path: str, levels: List[str], description: str, delete_after_store: bool = False) → str[source]¶ Ingest an object to LIBREary. This method:
- Creates the canonical copy of the object
- Creates the entry in the resources table describing the resource
- Optionally, deletes the file from the dropbox dir
:param current_file_path -
-
list_resources() → List[List[str]][source]¶ Return a list of summaries of each resource. This summary includes:
id, path, levels, file name, checksum, object uuid, description
This method trusts the metadata database. There should be a separate method to verify the metadata db so that we know we can trust this info
-
libreary.libreary module¶
-
class
libreary.libreary.Libreary(config_dir: str)[source]¶ Bases:
object
This is the user-facing class for LIBRE-ary. Users of LIBRE-ary should only interact with this class directly. LIBRE-ary objects are able to handle all of the functionality of this module. Developers should feel free to extend the functionality of this class and are encouraged to submit pull requests to the main repository.
This class currently contains the following methods:
- ingest (load a resource into LIBRE-ary)
- retrieve (retrieve a copy of an object)
- delete (delete an object)
- update (update an object)
- search (search for information about objects)
- run_full_check (check all resources to verify integrity)
- check_single_resource (check only a single resource)
-
add_level(name: str, frequency: int, adapters: List[dict], copies=1) → None[source]¶ Add a level to the metadata database.
:param name - name for the level
:param frequency - check frequency for level. Currently unimplemented
:param adapters - dict object specifying adapters the level uses. Example:
:param copies - copies to store for each adapter. Currently, only 1 is supported
-
check_single_resource(r_id: str, deep: bool = False) → bool[source]¶ Check a single object in the LIBRE-ary. This uses the following process:
Get the canonical copy and its actual checksum.
Make sure the canonical copy matches the expected checksum. If it doesn’t:
    Attempt to recover the canonical copy.
Get a list of all levels that the object has been labelled as. For each level:
    Get a list of adapters it is stored in. For each adapter:
        Check to make sure that copy’s checksum matches what it should. If it doesn’t:
            Attempt to recover it.
:param deep - specifies whether to use a deep search. A deep search will calculate actual checksums of each copy of each object, while a shallow one will trust that the checksum in the metadata database matches that of the actual object.
:param r_id - the resource ID of the object you’d like to check
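The process described above can be sketched as a nested loop. This is a simplified illustration, not the library's code: `check_resource`, its arguments, and the convention of returning False when any mismatch was found are assumptions, and recovery is reduced to a callback.

```python
from typing import Callable, Dict, List


def check_resource(expected: str,
                   canonical_checksum: str,
                   copies_by_level: Dict[str, Dict[str, str]],
                   recover: Callable[[str], None]) -> bool:
    """Return True if every copy matched; call recover() for each mismatch."""
    ok = True
    # Step 1: the canonical copy must match the expected checksum.
    if canonical_checksum != expected:
        recover("canonical")
        ok = False
    # Step 2: every copy in every adapter of every level must match too.
    for level, adapters in copies_by_level.items():
        for adapter_id, copy_checksum in adapters.items():
            if copy_checksum != expected:
                recover(adapter_id)
                ok = False
    return ok


recovered: List[str] = []
result = check_resource(
    expected="abc",
    canonical_checksum="abc",
    copies_by_level={"low": {"local1": "abc"}, "high": {"s3_1": "xyz"}},
    recover=recovered.append,
)
```

Here the copy held by the hypothetical `s3_1` adapter does not match, so it is the only one handed to the recovery callback.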
-
delete(r_id: str) → None[source]¶ Delete an object from LIBRE-ary. This:
- Deletes the resource from all of the adapters it was stored in
- Deletes the resource from the metadata db entirely
- Removes the canonical copy
Be careful with this function, as there is no undo option.
-
ingest(current_file_path: str, levels: List[str], description: str, delete_after_store: bool = False) → str[source]¶ Ingest a new object to the LIBRE-ary. This:
- Creates an entry in the resources table in the metadata db
- Creates an object UUID
- Ingests the canonical copy
- Sends copies to adapters which match specified levels
- Returns object ID
:param current_file_path - the current path to the file you wish to ingest
:param levels - a list of names of levels. These levels must exist in the levels table in the metadata db
:param description - a description of this object. This is useful when you want to search for objects later
:param delete_after_store - Boolean. If True, the Ingester will delete the object after it’s stored.
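To make the first two ingestion steps concrete, the sketch below builds the kind of record that could describe a newly ingested file. The `build_resource_record` helper, the dictionary keys, and the SHA-1 checksum are assumptions loosely modelled on the resource summary fields listed on this page (path, levels, file name, checksum, object uuid, description); the real table schema may differ.

```python
import hashlib
import os
import tempfile
import uuid
from typing import List


def build_resource_record(path: str, levels: List[str],
                          description: str) -> dict:
    """Describe a file with a fresh UUID and a checksum of its contents."""
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    return {
        "uuid": str(uuid.uuid4()),  # new object UUID
        "path": path,
        "filename": os.path.basename(path),
        "levels": list(levels),
        "checksum": digest,
        "description": description,
    }


with tempfile.TemporaryDirectory() as dropbox:
    sample = os.path.join(dropbox, "report.txt")
    with open(sample, "wb") as f:
        f.write(b"hello")
    record = build_resource_record(sample, ["low"], "sample report")
```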
-
retrieve(r_id: str) → str[source]¶ Retrieve an object. This will save a copy of the object as <self.output_dir>/<object_filename>.
The output and dropbox directories are volatile and should not be used for object storage. Adapters and other objects may frequently delete or write files in these directories.
:param r_id - The resource UUID that corresponds to the object you’d like to retrieve.
Returns a path to the retrieved object.
-
run_check() → List[str][source]¶ Check all of the objects in the LIBRE-ary. This uses the following process:
- For each object:
    Get the canonical copy and its actual checksum.
    Make sure the canonical copy matches the expected checksum. If it doesn’t:
        Attempt to recover the canonical copy.
    Get a list of all levels that the object has been labelled as. For each level:
        Get a list of adapters it is stored in. For each adapter:
            Check to make sure that copy’s checksum matches what it should. If it doesn’t:
                Attempt to recover it.
:param deep - specifies whether to use a deep search. A deep search will calculate actual checksums of each copy of each object, while a shallow one will trust that the checksum in the metadata database matches that of the actual object.
libreary.scheduler module¶
libreary.version module¶
Set module version. <Major>.<Minor>.<maintenance>[alpha/beta/..] Alphas will be numbered like this -> 0.4.0-a0
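As an illustration of this numbering scheme, a version string such as 0.4.0-a0 could be split into its parts with a regular expression. The pattern and the `parse_version` helper below are assumptions for the example and are not part of libreary.

```python
import re

# <Major>.<Minor>.<maintenance>, optionally followed by "-<tag><number>".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([a-z]+)(\d+))?$")


def parse_version(version: str):
    """Split a version string into (major, minor, maintenance, tag, tag_num)."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, maintenance, tag, tag_num = match.groups()
    return (
        int(major),
        int(minor),
        int(maintenance),
        tag,                                          # e.g. "a" for an alpha
        int(tag_num) if tag_num is not None else None,
    )


alpha = parse_version("0.4.0-a0")
release = parse_version("1.2.3")
```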
Module contents¶
-
libreary.AUTO_LOGNAME= -1¶ Following the general logging philosophy of Python libraries, by default LIBREary doesn’t log anything.
However, the following helper functions are provided for logging:
1. set_stream_logger
This sets the logger to the StreamHandler. This is quite useful when working from a Jupyter notebook.
2. set_file_logger
This sets the logging to a file. This is ideal for reporting issues to the dev team.
AUTO_LOGNAME
Special value that indicates libreary should construct a filename for logging.
-
class
libreary.NullHandler(level=0)[source]¶ Bases:
logging.Handler
Set up default logging to /dev/null, since this is a library.
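The same "silent by default" pattern can be shown with the standard library's own `logging.NullHandler`. The logger name `example_library` is hypothetical; the sketch only demonstrates the general technique, not how libreary wires up its handler.

```python
import logging

# A library attaches a NullHandler to its own logger so that, unless the
# application configures logging, its messages are silently discarded.
lib_logger = logging.getLogger("example_library")
lib_logger.addHandler(logging.NullHandler())

# This call produces no output and no "no handlers" warning.
lib_logger.warning("nothing is printed for this message")
```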
-
libreary.set_file_logger(filename: str, name: str = 'libreary', level: int = 10, format_string: Optional[str] = None)[source]¶ Add a file log handler. Args:
- filename (string): Name of the file to write logs to
- name (string): Logger name
- level (logging.LEVEL): Set the logging level.
- format_string (string): Set the format string
- Returns:
- None
-
libreary.set_stream_logger(name: str = 'libreary', level: int = 10, format_string: Optional[str] = None)[source]¶ Add a stream log handler. Args:
- name (string) : Set the logger name.
- level (logging.LEVEL) : Set to logging.DEBUG by default.
- format_string (string) : Set to None by default.
- Returns:
- None
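A helper with a similar shape can be written with the standard `logging` module alone. The sketch below is not libreary's implementation: `add_stream_logger`, its default format string, and the extra `stream` argument (added so the output can be captured) are assumptions. The default level of 10 in the signatures above is the numeric value of `logging.DEBUG`.

```python
import io
import logging
from typing import Optional


def add_stream_logger(name: str = "example_library",
                      level: int = logging.DEBUG,
                      format_string: Optional[str] = None,
                      stream=None) -> logging.Logger:
    """Attach a StreamHandler to the named logger and return the logger."""
    if format_string is None:
        format_string = "%(asctime)s %(name)s [%(levelname)s] %(message)s"
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter(format_string))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = add_stream_logger(format_string="%(levelname)s:%(message)s",
                        stream=buffer)
log.debug("adapter verified")
```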