Development Guidelines

If you are a construe developer, there are several helper utilities built into the library that will allow you to manage datasets and models both locally and in the cloud. Before using them, you must install some additional dependencies.

In requirements.txt, uncomment the section headed "# Packaging Dependencies". Your requirements file should then contain a section similar to:

# Packaging Dependencies
black==24.10.0
build==1.2.2.post1
datasets==3.1.0
flake8==7.1.1
google-cloud-storage==2.19.0
packaging==24.2
pip==24.3.1
setuptools==75.3.0
twine==5.1.1
wheel==0.45.0

NOTE: these docs may not list every required dependency, so always use the latest requirements.txt.
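If you would rather not edit requirements.txt by hand, the uncommenting step could be scripted. The helper below is a hypothetical sketch and is not part of construe. It assumes each pin in that section is commented out as "# name==version" and that the section ends at the first blank line.

```python
# Hypothetical helper (not part of construe): uncomment the pins under the
# "# Packaging Dependencies" header in requirements.txt. Assumes each pin is
# written as "# name==version" and that a blank line ends the section.
import re
import sys
from pathlib import Path

SECTION_HEADER = "# Packaging Dependencies"
# Matches a commented-out pin such as "# black==24.10.0" (assumed format)
COMMENTED_PIN = re.compile(r"^#\s*([A-Za-z0-9][A-Za-z0-9._-]*==\S+)\s*$")


def uncomment_section(text: str, header: str = SECTION_HEADER) -> str:
    """Return text with the pins under `header` uncommented."""
    out = []
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == header:
            in_section = True
            out.append(line)  # keep the section header itself as a comment
            continue
        if in_section and not stripped:
            in_section = False  # a blank line ends the section
        match = COMMENTED_PIN.match(stripped) if in_section else None
        out.append(match.group(1) if match else line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "requirements.txt")
    path.write_text(uncomment_section(path.read_text()))
```

Lines outside the section, including other commented-out sections, are left untouched.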

Then install these dependencies and the test dependencies:

$ pip install -r requirements.txt
$ pip install -r tests/requirements.txt

Tests and Linting

All tests live in the tests folder, which mirrors the structure of the construe module. Run the full suite with pytest:

$ pytest

We use flake8 for linting, as configured in setup.cfg. Note that the .flake8 file is for IDEs only and is not used when running tests. If you want to use black to automatically format a file:

$ black path/to/file.py
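The linting and test steps can be combined into a single check before pushing. This is a hypothetical convenience script, not something the repository ships. It passes --config setup.cfg so flake8 uses the setup.cfg configuration rather than the IDE-only .flake8 file, and it stops at the first failing step.

```python
# Hypothetical helper (not shipped with construe): run flake8 against the
# setup.cfg configuration, then pytest, stopping at the first failure.
import subprocess
import sys

CHECKS = [
    # --config makes setup.cfg the authoritative flake8 config (not .flake8)
    [sys.executable, "-m", "flake8", "--config", "setup.cfg"],
    [sys.executable, "-m", "pytest"],
]


def run_checks(checks=CHECKS, dry_run: bool = False) -> int:
    """Run each command in order and return the first non-zero exit code, or 0."""
    for cmd in checks:
        print("$", " ".join(cmd))
        if dry_run:
            continue  # only print the command
        returncode = subprocess.run(cmd).returncode
        if returncode != 0:
            return returncode
    return 0


if __name__ == "__main__":
    sys.exit(run_checks())
```

Adding [sys.executable, "-m", "black", "--check", "."] to CHECKS would also report files that black would reformat, without modifying them.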

Dataset Management

The python -m construe.datasets utility provides helper functionality for managing datasets, including the following commands:

  • manifest: Generate a manifest file from local fixtures.

  • originals: Download original datasets and store them in fixtures.

  • sample: Create a smaller sample dataset from the original.

  • upload: Upload datasets to GCP for user downloads.

To regenerate the datasets, first run the originals command to download the datasets from HuggingFace or elsewhere on the web, then run sample to create statistical samples of those datasets. Next, run manifest to generate a new manifest listing the datasets with their SHA256 signatures, and finally run upload to save them to our GCP bucket.
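To illustrate what the manifest step records, the sketch below hashes every file under a fixtures directory with SHA256. It is an illustration under assumptions, not construe's implementation: the "fixtures" path and the JSON layout are placeholders. A user could recompute a digest the same way to verify a download.

```python
# Illustrative sketch only: map each file under a fixtures directory to its
# size and SHA256 digest. The JSON layout is an assumption and is not
# necessarily the format construe's manifest command writes.
import hashlib
import json
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(fixtures: Path) -> dict:
    """Map relative POSIX paths to {"size", "sha256"} for every file."""
    manifest = {}
    for path in sorted(p for p in fixtures.rglob("*") if p.is_file()):
        rel = path.relative_to(fixtures).as_posix()
        manifest[rel] = {"size": path.stat().st_size, "sha256": sha256sum(path)}
    return manifest


if __name__ == "__main__":
    fixtures = Path("fixtures")  # hypothetical location of local fixtures
    print(json.dumps(build_manifest(fixtures), indent=2))
```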

You must have valid GCP service account credentials to upload datasets.
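For reference, an upload along these lines could be written with the google-cloud-storage client from the packaging dependencies. This is a hypothetical sketch, not construe's upload command: the bucket name and "datasets/" prefix are placeholders, and it does not set any access controls on the uploaded objects.

```python
# Hypothetical upload helper using google-cloud-storage. The bucket name and
# object prefix are placeholders, not construe's actual bucket layout.
from pathlib import Path


def upload_directory(bucket, local_dir: Path, prefix: str = "") -> list:
    """Upload every file under local_dir to `bucket`, keyed by relative path.

    `bucket` is a google.cloud.storage.Bucket (or any object exposing
    .blob(name).upload_from_filename(path)), which keeps this testable offline.
    """
    uploaded = []
    for path in sorted(p for p in local_dir.rglob("*") if p.is_file()):
        name = f"{prefix}{path.relative_to(local_dir).as_posix()}"
        bucket.blob(name).upload_from_filename(str(path))
        uploaded.append(name)
    return uploaded


if __name__ == "__main__":
    from google.cloud import storage

    # storage.Client() uses Application Default Credentials, e.g. a service
    # account key referenced by the GOOGLE_APPLICATION_CREDENTIALS variable.
    client = storage.Client()
    bucket = client.bucket("example-construe-bucket")  # placeholder name
    print(upload_directory(bucket, Path("fixtures"), prefix="datasets/"))
```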

Models Management

The python -m construe.models utility provides helpers for managing models and converting them to the tflite format, including the following commands:

  • convert: Convert source models to the tflite format for use in embedded systems.

  • manifest: Generate a manifest file from local fixtures.

  • originals: Download original models and store them in fixtures.

  • upload: Upload converted models to GCP for user downloads.

To regenerate the models, run the originals command to download the models from HuggingFace, then run convert to transform them into the tflite format. Next, run manifest to generate a new manifest listing the models with their SHA256 signatures, and finally run upload to save them to our GCP bucket.
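The regeneration order above can be driven from a small script. This is a hypothetical sketch that assumes each subcommand runs with no extra arguments; consult the utility's own usage output for the options it actually accepts.

```python
# Hypothetical driver: run the construe.models subcommands in the documented
# order, assuming each one works without extra arguments.
import subprocess
import sys

STEPS = ["originals", "convert", "manifest", "upload"]


def pipeline(module: str = "construe.models", steps=STEPS) -> list:
    """Build one command per step, e.g. [python, "-m", "construe.models", "convert"]."""
    return [[sys.executable, "-m", module, step] for step in steps]


def run(commands, dry_run: bool = False) -> int:
    """Run commands in order, stopping at the first non-zero exit code."""
    for cmd in commands:
        print("+", " ".join(cmd))
        if not dry_run:
            result = subprocess.run(cmd)
            if result.returncode != 0:
                return result.returncode
    return 0


if __name__ == "__main__":
    # Pass --dry-run to this script to print the commands without running them
    sys.exit(run(pipeline(), dry_run="--dry-run" in sys.argv[1:]))
```

The same driver covers the dataset workflow with pipeline("construe.datasets", ["originals", "sample", "manifest", "upload"]).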

You must have valid GCP service account credentials to upload models.

Releases

To release the construe library and deploy it to PyPI, run the following commands:

$ python -m build
$ twine upload dist/*
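One pitfall with twine upload dist/* is that the glob also picks up artifacts left over from earlier builds. A hypothetical pre-upload check (not part of construe's tooling) can read the version from each wheel and sdist filename and refuse to continue unless exactly one version is present.

```python
# Hypothetical pre-upload check: report the versions found in dist/ so stale
# artifacts from earlier builds are not uploaded alongside the new release.
import sys
from pathlib import Path


def artifact_version(filename: str):
    """Return the version encoded in a wheel or sdist filename, else None."""
    if filename.endswith(".whl"):
        # {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl; the name
        # uses underscores, so the version is the second dash-separated field
        parts = filename.split("-")
        return parts[1] if len(parts) >= 5 else None
    if filename.endswith(".tar.gz"):
        # {name}-{version}.tar.gz; the version follows the last dash
        _, sep, version = filename[: -len(".tar.gz")].rpartition("-")
        return version if sep else None
    return None


def dist_versions(dist: Path = Path("dist")) -> set:
    """Collect the distinct versions of all recognized artifacts in dist."""
    names = (p.name for p in dist.iterdir() if p.is_file())
    return {v for v in map(artifact_version, names) if v is not None}


if __name__ == "__main__":
    versions = dist_versions()
    if len(versions) != 1:
        sys.exit(f"expected exactly one version in dist/, found {sorted(versions)}")
    print("ok:", versions.pop())
```

Clearing dist/ before running python -m build avoids the problem entirely, and twine check dist/* can validate the package metadata before uploading.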