Benchmarks
A benchmark is the combination of a model with a dataset: the model is applied to each instance in the dataset, and the time taken for preprocessing and for inference is measured. Preprocessing refers to the time it takes to convert the raw data into a feature tensor array; inference refers to the time it takes to apply the model to the feature array to generate an output.
Note that some models have multiple preprocessing and inference steps; for example, the moondream benchmark applies a captioning model to an image, then a classification model to the caption.
All benchmarks currently use the tflite runtime for embedded devices.
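The measurement described above can be sketched as follows. This is a minimal illustration, not the project's actual harness: `Timing`, `time_instance`, and `run_benchmark` are hypothetical names, and `preprocess` and `infer` stand in for whatever callables a given benchmark supplies.

```python
import time
from typing import Any, Callable, Iterable, List, NamedTuple


class Timing(NamedTuple):
    preprocess_s: float  # seconds spent turning raw data into features
    inference_s: float   # seconds spent applying the model to the features


def time_instance(raw: Any,
                  preprocess: Callable[[Any], Any],
                  infer: Callable[[Any], Any]) -> Timing:
    """Time preprocessing and inference separately for one instance."""
    t0 = time.perf_counter()
    features = preprocess(raw)
    t1 = time.perf_counter()
    infer(features)
    t2 = time.perf_counter()
    return Timing(preprocess_s=t1 - t0, inference_s=t2 - t1)


def run_benchmark(dataset: Iterable[Any],
                  preprocess: Callable[[Any], Any],
                  infer: Callable[[Any], Any]) -> List[Timing]:
    """Apply the model to every instance and collect one Timing per instance."""
    return [time_instance(raw, preprocess, infer) for raw in dataset]


if __name__ == "__main__":
    # Stand-in "model": split a string into tokens, then count them.
    toy_dataset = ["a short essay", "another slightly longer essay"]
    for timing in run_benchmark(toy_dataset, str.split, len):
        print(f"preprocess={timing.preprocess_s:.6f}s "
              f"inference={timing.inference_s:.6f}s")
```

For a model with several stages, one option is to call `time_instance` once per stage and add up the per-stage timings.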
The following benchmarks are implemented:
gliner: applies the GLiNER model to identify and classify named entities in long-form essays.
lowlight: enhances image quality using a convolutional model trained to enrich low-light images.
mobilenet: uses the MobileNet v2 model to classify objects in scenes from movie stills.
mobilevit: uses the MobileViT model to identify objects in scenes from movie stills.
moondream: uses an image captioning model and a text classification model for content moderation on NSFW images.
nsfw: uses a fine-tuned computer vision model to classify images as safe or not safe for work (NSFW).
offensive: applies the offensive speech detection model to the AEGIS content safety dataset.
whisper: uses the whisper-tiny English transcription model to transcribe audio of various UK dialects.
Further information about the models and datasets can be found below.
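Because each benchmark pairs a model with a dataset, the list above could be captured in a small registry. The sketch below is illustrative only: the `Benchmark` class and `BENCHMARKS` mapping are hypothetical, the model identifiers are taken from the Model References section, and the dataset keys are inferred from the benchmark descriptions rather than confirmed by the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    name: str     # benchmark name as listed above
    model: str    # model identifier from the Model References section
    dataset: str  # dataset key, inferred from the benchmark description


# Assumption: dataset keys follow the names in the Dataset Download Size table.
BENCHMARKS = {
    b.name: b
    for b in [
        Benchmark("gliner", "knowledgator/gliner-bi-small-v1.0", "essays"),
        Benchmark("lowlight", "keras-io/lowlight-enhance-mirnet", "lowlight"),
        Benchmark("mobilenet", "google/mobilenet_v2_1.0_224", "movies"),
        Benchmark("mobilevit", "apple/mobilevit-xx-small", "movies"),
        # moondream also applies a text classification model to the caption.
        Benchmark("moondream", "vikhyatk/moondream2", "nsfw"),
        Benchmark("nsfw", "Falconsai/nsfw_image_detection", "nsfw"),
        Benchmark("offensive", "KoalaAI/OffensiveSpeechDetector", "aegis"),
        Benchmark("whisper", "openai/whisper-tiny.en", "dialects"),
    ]
}

if __name__ == "__main__":
    for name, bench in sorted(BENCHMARKS.items()):
        print(f"{name}: {bench.model} on {bench.dataset}")
```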
Model References
Image to Text: Moondream (vikhyatk/moondream2)
Speech to Text: Whisper (openai/whisper-tiny.en)
Image Classification: MobileNet (google/mobilenet_v2_1.0_224)
Object Detection: MobileViT (apple/mobilevit-xx-small)
NSFW Image Classification: Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification (Falconsai/nsfw_image_detection)
Image Enhancement: LoL MIRNet (keras-io/lowlight-enhance-mirnet)
Text Classification: Offensive Speech Detector (KoalaAI/OffensiveSpeechDetector)
Token Classification: GLiNER (knowledgator/gliner-bi-small-v1.0)
Model Download Size
model | compressed | decompressed |
---|---|---|
lowlight | 35.04 MiB (36,745,264 B) | 35.96 MiB (37,709,200 B) |
whisper | 27.02 MiB (28,328,063 B) | 40.45 MiB (42,416,036 B) |
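The sizes in these tables pair a binary-prefixed value (KiB, MiB, GiB, in powers of 1024) with the exact byte count. As a rough sketch of how such strings can be derived from a byte count, consider the hypothetical helper `format_size` below; it is not taken from the project.

```python
def format_size(num_bytes: int) -> str:
    """Render a byte count as e.g. '35.04 MiB (36,745,264 B)'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    unit = units[0]
    for unit in units:
        # Stop at the first unit where the value fits below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024
    if unit == "B":
        return f"{num_bytes:,} B"
    return f"{value:.2f} {unit} ({num_bytes:,} B)"


if __name__ == "__main__":
    print(format_size(36_745_264))  # 35.04 MiB (36,745,264 B)
    print(format_size(916_334))     # 894.86 KiB (916,334 B)
```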
Dataset References
AEGIS AI Content Safety v1.0: Text data that is used to show examples of content safety (e.g. harmful text) described by Nvidia’s content safety taxonomy.
LoL (Low-Light) Dataset: Contains 500 low-light and normal-light image pairs for image enhancement.
English Dialects: Contains 31 hours of audio from 120 individuals speaking with different accents of the British Isles; used for speech to text.
Reddit Posts Comments: A text dataset of comments on Reddit posts that can be used for NER and content moderation tasks on short-form text.
Student and LLM Essays: A text dataset of essays written by students (and LLMs) that can be used for NER and content moderation tasks on longer-form text.
NSFW Detection: An image dataset that contains NSFW and SFW images used for content moderation.
Movie Scenes: An image dataset that contains stills from commercial movies and can be used for image classification and content-moderation tasks.
Dataset Download Size
dataset | instances | compressed | decompressed |
---|---|---|---|
aegis | 11,997 | 3.45 MiB (3,619,910 B) | 10.84 MiB (11,362,916 B) |
aegis-sample | 3,030 | 894.86 KiB (916,334 B) | 2.75 MiB (2,878,359 B) |
dialects | 17,877 | 2.30 GiB (2,466,918,919 B) | 3.36 GiB (3,605,272,328 B) |
dialects-sample | 1,785 | 231.87 MiB (243,136,640 B) | 340.18 MiB (356,704,802 B) |
essays | 2,078 | 6.79 MiB (7,116,584 B) | 33.87 MiB (35,516,576 B) |
essays-sample | 512 | 1.71 MiB (1,796,330 B) | 8.38 MiB (8,785,856 B) |
lowlight | 1,000 | 331.37 MiB (347,470,078 B) | 332.12 MiB (348,256,471 B) |
lowlight-sample | 475 | 158.52 MiB (166,217,847 B) | 158.89 MiB (166,608,858 B) |
movies | 106,844 | 6.85 GiB (7,351,355,869 B) | 6.97 GiB (7,479,027,563 B) |
movies-sample | 5,465 | 363.52 MiB (381,174,092 B) | 369.81 MiB (387,776,108 B) |
nsfw | 215 | 26.64 MiB (27,937,058 B) | 26.96 MiB (28,266,876 B) |
nsfw-sample | 53 | 6.13 MiB (6,429,140 B) | 6.23 MiB (6,535,438 B) |
reddit | 3,844 | 238.64 KiB (244,363 B) | 1.06 MiB (1,117,785 B) |
reddit-sample | 957 | 62.48 KiB (63,979 B) | 272.20 KiB (278,734 B) |