Datasets
Managers for loading datasets
- construe.datasets.loaders.load_all_datasets(sample=True, data_home=None)
Load all available datasets as defined by __all__
- construe.datasets.loaders.cleanup_all_datasets(data_home=None)
Delete everything in the data home directory
- construe.datasets.loaders.load_dialects(sample=True, data_home=None, no_dirs=True, pattern=None)
- construe.datasets.loaders.cleanup_dialects(sample=True, data_home=None)
- construe.datasets.loaders.load_lowlight(sample=True, data_home=None, no_dirs=True, *, pattern='lowlight/**/*.png')
- construe.datasets.loaders.cleanup_lowlight(sample=True, data_home=None)
- construe.datasets.loaders.load_reddit(sample=True, data_home=None)
- construe.datasets.loaders.cleanup_reddit(sample=True, data_home=None)
- construe.datasets.loaders.load_movies(sample=True, data_home=None, no_dirs=True, pattern=None)
- construe.datasets.loaders.cleanup_movies(sample=True, data_home=None)
- construe.datasets.loaders.load_essays(sample=True, data_home=None)
- construe.datasets.loaders.cleanup_essays(sample=True, data_home=None)
- construe.datasets.loaders.load_aegis(sample=True, data_home=None)
- construe.datasets.loaders.cleanup_aegis(sample=True, data_home=None)
- construe.datasets.loaders.load_nsfw(sample=True, data_home=None, no_dirs=True, *, pattern='nsfw/**/*.jpg')
- construe.datasets.loaders.cleanup_nsfw(sample=True, data_home=None)
Manifest
Manifest handlers for datasets
- construe.datasets.manifest.load_manifest(path='/home/runner/work/llm-benchmark/llm-benchmark/llm-benchmark/construe/datasets/manifest.json')
Path Helpers
Path handling for downloads
- construe.datasets.path.get_data_home(path=None)
Return the path of the Construe data directory. Dataset loaders use this folder to avoid downloading data several times.
By default, this folder is colocated with the code in the install directory so that data shipped with the package can be easily located. Alternatively, it can be set by the $CONSTRUE_DATA environment variable, or programmatically by giving a folder path. Note that the '~' symbol is expanded to the user home directory, and environment variables are also expanded when resolving the path.
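The resolution rules described for get_data_home can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name resolve_data_home, the placeholder default folder, and the precedence of an explicit path over $CONSTRUE_DATA are all assumptions made for the example.

```python
import os

ENV_VAR = "CONSTRUE_DATA"  # environment variable named in the docs above


def resolve_data_home(path=None, default="construe-data"):
    """Hypothetical stand-in for get_data_home; not the library's code.

    `default` is a placeholder for whatever folder the package really uses.
    """
    if path is None:
        # Assumed precedence: explicit argument, then $CONSTRUE_DATA, then default.
        path = os.environ.get(ENV_VAR, default)

    # Expand environment variables and '~' before normalizing the path.
    path = os.path.expandvars(path)
    path = os.path.expanduser(path)
    return os.path.abspath(path)
```

With CONSTRUE_DATA set to ~/cdata, calling resolve_data_home() in this sketch returns the absolute path of the cdata folder in the user home directory, while resolve_data_home("/tmp/x") ignores the environment variable.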
- construe.datasets.path.find_dataset_path(dataset, data_home=None, fname=None, ext=None, raises=True)
Looks up the path to the specified dataset in the data home directory, which is found using the get_data_home function. By default, data home is in a config directory in the user’s home folder, but it can be changed with the $CONSTRUE_DATA environment variable or by passing in a different directory. If the dataset is not found, a DatasetsError is raised by default.
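The lookup behavior can be illustrated with a small self-contained sketch. The directory layout (a folder per dataset under data home, with an optional file inside it), the helper name find_path, the local DatasetsError class, and returning None when raises is False are assumptions for illustration; the documentation above only states that a DatasetsError is raised by default when the dataset is missing.

```python
import os


class DatasetsError(Exception):
    """Local stand-in for the DatasetsError named in the docs."""


def find_path(dataset, data_home, fname=None, ext=None, raises=True):
    """Hypothetical lookup; the directory layout here is an assumption."""
    if fname is None and ext is None:
        # No file requested: point at the dataset directory itself.
        path = os.path.join(data_home, dataset)
    else:
        # Assumed layout: <data_home>/<dataset>/<fname or dataset><ext>
        name = (fname or dataset) + (ext or "")
        path = os.path.join(data_home, dataset, name)

    if os.path.exists(path):
        return path
    if raises:
        raise DatasetsError(f"could not find {path!r}")
    # Assumed behavior when raises=False: signal absence with None.
    return None
```

Passing raises=False lets a caller probe for a dataset and fall back to a download step without wrapping the call in a try block.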
- construe.datasets.path.dataset_exists(dataset, data_home=None)
Checks to see if a directory with the name of the specified dataset exists in the data home directory, found with get_data_home.
- construe.datasets.path.dataset_archive(dataset, signature, data_home=None, ext='.zip')
Checks to see if the dataset archive file exists in the data home directory, found with get_data_home. By specifying the signature, this function also checks whether the archive is the latest version by comparing the sha256sum of the local archive with the specified signature.
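The signature check amounts to hashing the local archive and comparing the digest with the expected value. The sketch below shows one way this could work using only the standard library. The helper names sha256sum and archive_is_current, and the assumption that the archive is stored as <data_home>/<dataset><ext>, are illustrative and not taken from the library.

```python
import hashlib
import os


def sha256sum(path, blocksize=65536):
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(blocksize), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_current(dataset, signature, data_home, ext=".zip"):
    """Hypothetical check; assumes the archive is <data_home>/<dataset><ext>."""
    path = os.path.join(data_home, dataset + ext)
    if not os.path.isfile(path):
        # A missing archive can never match the expected signature.
        return False
    return sha256sum(path) == signature
```

Reading the file in fixed-size chunks keeps memory use constant even for large archives.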