Development Guidelines
If you are a construe developer, several helper utilities built into the library let you manage datasets and models both locally and in the cloud. First, however, you must install some additional dependencies.
In requirements.txt, uncomment the section that says "# Packaging Dependencies", i.e. your requirements should now have a section that looks similar to:
# Packaging Dependencies
black==24.10.0
build==1.2.2.post1
datasets==3.1.0
flake8==7.1.1
google-cloud-storage==2.19.0
packaging==24.2
pip==24.3.1
setuptools==75.3.0
twine==5.1.1
wheel==0.45.0
NOTE: the docs might not be up to date with all required dependencies, so make sure you use the latest requirements.txt.
Then install these dependencies and the test dependencies:
$ pip install -r requirements.txt
$ pip install -r tests/requirements.txt
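After installing, it can be handy to confirm that the installed versions match the pins. The following is a minimal sketch, not part of construe: parse_pins and find_mismatches are hypothetical helper names, and the parser assumes simple name==version lines with no extras or environment markers.

```python
from importlib import metadata


def parse_pins(text):
    """Return {package: version} for every `name==version` line in text."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments (including still-commented pins),
        # and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed)} for every pin not satisfied.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Small inline sample; in practice you would read requirements.txt instead.
sample = """
# Packaging Dependencies
black==24.10.0
# twine==5.1.1
"""
print(parse_pins(sample))
```

To check a real environment, pass the contents of requirements.txt to parse_pins and then call find_mismatches on the result; an empty dictionary means every pin is satisfied.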
Tests and Linting
All tests are in the tests folder and are structured similarly to the construe module. All tests can be run with pytest:
$ pytest
We use flake8 for linting as configured in setup.cfg. Note that the .flake8 file is for IDEs only and is not used when running tests. If you want to use black to automatically format your files:
$ black path/to/file.py
Dataset Management
The python -m construe.datasets utility provides helper functionality for managing datasets, including the following commands:
manifest: Generate a manifest file from local fixtures.
originals: Download original datasets and store them in fixtures.
sample: Create a smaller sample dataset from the original.
upload: Upload datasets to GCP for user downloads.
To regenerate the datasets, first run the originals command to download the datasets from HuggingFace or elsewhere on the web, then run sample to create statistical samples of those datasets. Run manifest to generate the new manifest for the datasets along with their SHA256 signatures, then run upload to save them to our GCP bucket.
You must have valid GCP service account credentials to upload datasets.
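The regeneration order described above can be sketched as a small driver script. This is an illustration only: build_commands and run_pipeline are hypothetical helpers, and the sketch assumes each subcommand can be invoked without further arguments, which may not hold (check each command's --help first). It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Step order taken from the description above.
DATASET_STEPS = ["originals", "sample", "manifest", "upload"]


def build_commands(module, steps):
    """Return one `python -m <module> <step>` argument list per step."""
    return [[sys.executable, "-m", module, step] for step in steps]


def run_pipeline(module, steps, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    commands = build_commands(module, steps)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so the pipeline stops
            # at the first failing step instead of uploading stale data.
            subprocess.run(cmd, check=True)
    return commands


# Dry run: prints the four commands without executing anything.
run_pipeline("construe.datasets", DATASET_STEPS)
```

Stopping at the first failure matters here because upload should never run after a failed manifest step. Pass dry_run=False only once your GCP credentials are in place.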
Model Management
The python -m construe.models utility provides helpers for managing models and converting them to the tflite format, including the following commands:
convert: Convert source models to the tflite format for use in embedded systems.
manifest: Generate a manifest file from local fixtures.
originals: Download original models and store them in fixtures.
upload: Upload converted models to GCP for user downloads.
To regenerate the models, first run the originals command to download the models from HuggingFace, then run convert to transform them into the tflite format. Run manifest to generate the new manifest for the models along with their SHA256 signatures, then run upload to save them to our GCP bucket.
You must have valid GCP service account credentials to upload models.
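To make the SHA256 signatures mentioned above concrete, here is a minimal sketch of hashing every file in a fixtures directory. It is not the manifest command's actual implementation or output format, which this document does not describe: sha256sum and build_manifest are hypothetical helpers, and the path-to-digest mapping is an assumed layout.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256sum(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat for large model files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(fixtures_dir):
    """Map each file's relative path (POSIX style) to its SHA256 digest."""
    fixtures_dir = Path(fixtures_dir)
    return {
        p.relative_to(fixtures_dir).as_posix(): sha256sum(p)
        for p in sorted(fixtures_dir.rglob("*"))
        if p.is_file()
    }


# Demonstration on a throwaway directory with a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.tflite").write_bytes(b"example")
    print(json.dumps(build_manifest(tmp), indent=2))
```

A downloader can recompute the digest of each fetched file and compare it against the manifest entry to detect corrupted or outdated downloads.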
Releases
To release the construe library and deploy it to PyPI, run the following commands:
$ python -m build
$ twine upload dist/*