User manual for the mind-model-services collection of projects.

Introduction

Java libraries for enabling and simplifying the inference of Deep Learning models.

Provides Java wrappers for various state-of-the-art deep learning models, to be used in standalone Java applications, Spring Boot microservices, as well as Streaming, Task/Batch and Serverless programming models.

The implementations use the TensorFlow Java API as well as the DL4J API stack, including JavaCV, ND4J and nd4j-tensorflow.

The pre-trained TensorFlow models can be loaded from classpath, file or http resource locations. The models can also be loaded directly from their model-zoo pages, using the following URI convention:

http://<Model's tar.gz URL>#frozen_inference_graph.pb

Here frozen_inference_graph.pb is the frozen model’s file name within the archive specified by the URI. Change the fragment name to match the file name within the model archive you use; frozen_inference_graph.pb is the name used by default in many of the published model archives.
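
For example, all of the following are valid model locations (the archive URL below is only illustrative; substitute the model you actually use):

classpath:/models/my_frozen_model.pb
file:///opt/models/frozen_inference_graph.pb
http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz#frozen_inference_graph.pb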

Service Catalog


Object Detection

Java model inference library for the TensorFlow Object Detection API. Allows real-time localization and identification of multiple objects in a single or batch of images. Works with all pre-trained zoo models and object labels.


The ObjectDetectionService takes an image or a batch of images and outputs a list of predicted object bounding boxes, each represented by an ObjectDetection. For models that support Instance Segmentation, the ObjectDetectionService can predict the instance segmentation masks in addition to the object bounding boxes.

The JsonMapperFunction permits converting the List<ObjectDetection> into JSON objects and the ObjectDetectionImageAugmenter allows augmenting the input image with the detected bounding boxes and segmentation masks.

Usage

Add the object-detection dependency to the pom (use the latest version available):

<dependency>
    <groupId>io.mindmodel.services</groupId>
    <artifactId>object-detection</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
Example 1: Object Detection

The ExampleObjectDetection.java sample demonstrates how to use the ObjectDetectionService for detecting objects in input images. It also shows how to convert the result into JSON format and augment the input image with the detected object bounding boxes.

ObjectDetectionService detectionService = new ObjectDetectionService(
 "http://download.tensorflow.org/models/object_detection/faster_rcnn_nas_coco_2018_01_28.tar.gz#frozen_inference_graph.pb", (1)
 "https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/data/mscoco_label_map.pbtxt", (2)
 0.4f, (3)
 false, (4)
 true); (5)

byte[] image = GraphicsUtils.loadAsByteArray("classpath:/images/object-detection.jpg"); (6)

List<ObjectDetection> detectedObjects = detectionService.detect(image); (7)
1 Downloads and loads a pre-trained frozen_inference_graph.pb model directly from the faster_rcnn_nas_coco.tar.gz archive in the Tensorflow model zoo. Note that the first run downloads a few hundred MB; subsequent runs use the cached copy (5) instead.
2 Object category labels (e.g. names) for the model
3 Confidence threshold - only objects with an estimate above the threshold are returned
4 Indicates that this is not a mask (i.e. not an instance segmentation) model type
5 Cache the model on the local file system.
6 Load the input image to evaluate
7 Detect the objects in the image and represent the result as a list of ObjectDetection instances.

Next you can convert the result into JSON format.

String jsonObjectDetections = new JsonMapperFunction().apply(detectedObjects);
System.out.println(jsonObjectDetections);
Sample Object Detection JSON representation
[{"name":"person","estimate":0.998,"x1":0.160,"y1":0.774,"x2":0.201,"y2":0.946,"cid":1},
 {"name":"kite","estimate":0.998,"x1":0.437,"y1":0.089,"x2":0.495,"y2":0.169,"cid":38},
 {"name":"person","estimate":0.997,"x1":0.084,"y1":0.681,"x2":0.121,"y2":0.848,"cid":1},
 {"name":"kite","estimate":0.988,"x1":0.206,"y1":0.263,"x2":0.225,"y2":0.314,"cid":38}]]

Use the ObjectDetectionImageAugmenter to draw the detected objects on top of the input image.

byte[] annotatedImage = new ObjectDetectionImageAugmenter().apply(image, detectedObjects); (1)
IOUtils.write(annotatedImage, new FileOutputStream("./object-detection/target/object-detection-augmented.jpg")); (2)
1 Augment the image with the detected object bounding boxes (Uses Java2D internally).
2 Stores the augmented image as object-detection-augmented.jpg image file.
Augmented object-detection-augmented.jpg file


Set the ObjectDetectionImageAugmenter#agnosticColors property to true to use a monochrome color scheme.
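
For example, assuming the property is exposed through a standard setter (a sketch, not a verbatim API reference):

ObjectDetectionImageAugmenter augmenter = new ObjectDetectionImageAugmenter();
augmenter.setAgnosticColors(true); // monochrome color scheme for all categories
byte[] monochromeImage = augmenter.apply(image, detectedObjects);
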
Example 2: Instance Segmentation

The ExampleInstanceSegmentation.java sample shows how to use the ObjectDetectionService for Instance Segmentation. NOTE: it requires a pre-trained model that supports masks, as well as setting the instance segmentation (e.g. useMasks) flag to true.

ObjectDetectionService detectionService = new ObjectDetectionService(
   "http://download.tensorflow.org/models/object_detection/mask_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz#frozen_inference_graph.pb", (1)
   "https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/data/mscoco_label_map.pbtxt", (2)
   0.4f, (3)
   true, (4)
   true); (5)

byte[] image = GraphicsUtils.loadAsByteArray("classpath:/images/object-detection.jpg");

List<ObjectDetection> detectedObjects = detectionService.detect(image); (6)

String jsonObjectDetections = new JsonMapperFunction().apply(detectedObjects); (7)
System.out.println(jsonObjectDetections);

byte[] annotatedImage = new ObjectDetectionImageAugmenter(true) (8)
    .apply(image, detectedObjects);
IOUtils.write(annotatedImage, new FileOutputStream("./object-detection/target/object-detection-segmentation-augmented.jpg"));
1 Uses one of the four pre-trained MASK models
2 Object category labels (e.g. names) for the model
3 Confidence threshold - only objects with an estimate above the threshold are returned
4 Use masks output - for the pre-trained models, instructs to use the extended fetch names that include the instance segmentation masks as well.
5 Cache model - create a local copy of the model to speed up consecutive runs.
6 Evaluate the model to predict the objects in the input image.
7 Convert the detected objects into a JSON array. NOTE: with mask models there is an additional field: mask
8 Draw the detected objects on top of the input image. Note that the true constructor parameter enables drawing the detected masks; if false, only the bounding boxes are drawn.
Result augmented object-detection-segmentation-augmented.jpg file


Models

All pre-trained detection_model_zoo.md models are supported. The following URI notation can be used to download any of the models directly from the zoo.

http://<zoo model tar.gz url>#frozen_inference_graph.pb

The frozen_inference_graph.pb is the frozen model file name within the archive.

For some models this name may differ; you have to download and open the archive to find the actual name.
To speed up bootstrap performance you may consider extracting the frozen_inference_graph.pb and caching it locally. Then you can use the file://path-to-my-local-copy URI schema to access it.
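
For example, a minimal sketch assuming the model and the label file have already been downloaded and extracted to local paths of your choice (the paths below are illustrative):

ObjectDetectionService detectionService = new ObjectDetectionService(
   "file:///opt/models/faster_rcnn_nas_coco_2018_01_28/frozen_inference_graph.pb", // locally extracted frozen model
   "file:///opt/models/labels/mscoco_label_map.pbtxt", // locally downloaded label file
   0.4f,    // confidence threshold
   false,   // not an instance segmentation model
   false);  // no need to cache a model that is already local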

The following models can be used for Instance Segmentation as well:

mask_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz

mask_rcnn_inception_v2_coco_2018_01_28.tar.gz

mask_rcnn_resnet101_atrous_coco_2018_01_28.tar.gz

mask_rcnn_resnet50_atrous_coco_2018_01_28.tar.gz

In addition to the model, the ObjectDetectionService requires a list of labels that correspond to the categories detectable by the selected model. All labels files are available in the object_detection/data folder.

It is important to use the labels that correspond to the model being used! The table below highlights this mapping.
Table 1. Relationship between trained model types and category labels
Model                  Labels

coco                   mscoco_label_map.pbtxt
kitti                  kitti_label_map.pbtxt
open-images            oid_bbox_trainable_label_map.pbtxt
inaturalist-species    fgvc_2854_classes_label_map.pbtxt
ava                    ava_label_map_v2.1.pbtxt

For performance reasons you may consider downloading the required label files to the local file system.

Build

$ ./mvnw clean install

Semantic Segmentation

Image Semantic Segmentation based on the state-of-the-art DeepLab TensorFlow model.


Semantic Segmentation is the process of associating each pixel of an image with a class label (such as flower, person, road, sky, ocean, or car). Unlike Instance Segmentation, which produces instance-aware region masks, Semantic Segmentation produces class-aware masks. For Instance Segmentation consult the Object Detection service instead.

The SemanticSegmentationService can return the computed segmentation mask as a separate image or blend it on top of the original input image.

Usage

Add the semantic-segmentation dependency to your pom (use the latest version available):

<dependency>
    <groupId>io.mindmodel.services</groupId>
    <artifactId>semantic-segmentation</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</dependency>

The following snippet demonstrates how to use the PASCAL VOC model to apply a segmentation mask to an input image:

SemanticSegmentationService segmentationService = new SemanticSegmentationService(
  "http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz#frozen_inference_graph.pb", (1)
  true); (2)

byte[] inputImage = GraphicsUtils.loadAsByteArray("classpath:/images/VikiMaxiAdi.jpg"); (3)

byte[] imageMask = segmentationService.masksAsImage(inputImage); (4)
BufferedImage bi = ImageIO.read(new ByteArrayInputStream(imageMask));
ImageIO.write(bi, "png", new FileOutputStream("./semantic-segmentation/target/VikiMaxiAdi_masks.png"));

byte[] augmentedImage = segmentationService.augment(inputImage); (5)
IOUtils.write(augmentedImage, new FileOutputStream("./semantic-segmentation/target/VikiMaxiAdi_augmented.jpg"));
1 Download the PASCAL 2012 trained model directly from the web. The frozen_inference_graph.pb is the name of the model file inside the tar.gz archive.
2 Cache the downloaded model locally
3 Load the input image as byte array
4 Get the segmentation mask as a separate image
5 Blend the segmentation mask on top of the original image

Models

Based on the training datasets, three groups of pre-trained models are provided:


DeepLab models trained on PASCAL VOC 2012


DeepLab models trained on Cityscapes


DeepLab models trained on ADE20K

Select the model you want to use, copy its archive download URL and add a #frozen_inference_graph.pb fragment to it. The fragment is the frozen model’s file name inside the archive.

Alternatively, download the archive and extract the frozen_inference_graph.pb for the required model, then use the file://<local-file-name> URI schema.

Also, for convenience, a couple of models have been extracted from their archives and uploaded to Bintray (a usage sketch follows the list):

PASCAL VOC 2012 (default)

http://dl.bintray.com/big-data/generic/deeplabv3_mnv2_pascal_train_aug_frozen_inference_graph.pb

CITYSCAPE

http://dl.bintray.com/big-data/generic/deeplabv3_mnv2_cityscapes_train_2018_02_05_frozen_inference_graph.pb

ADE20K

http://dl.bintray.com/big-data/generic/deeplabv3_xception_ade20k_train_2018_05_29_frozen_inference_graph.pb
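
Since these are plain frozen graph files (not archives), no #frozen_inference_graph.pb fragment is needed. For example, a minimal sketch using the Cityscapes model:

SemanticSegmentationService segmentationService = new SemanticSegmentationService(
  "http://dl.bintray.com/big-data/generic/deeplabv3_mnv2_cityscapes_train_2018_02_05_frozen_inference_graph.pb",
  true); // cache the downloaded model locally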

Build

$ ./mvnw clean install

Pose Estimation

Multi-person pose estimation service for detecting human figures in images and videos.


The pose estimation service predicts where different body parts are located and how they spatially relate to each other. The implementation is based on Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, OpenPose and tf-pose-estimation.

The service uses pre-trained tf-pose-estimation TensorFlow models to predict the locations and the affinity of the body parts. The predictions are represented as heatmaps. The service then applies a few greedy algorithms to refine the parts in the heatmaps and to group them into poses.


Usage

Add the pose-estimation dependency to your pom (Use the latest version available):

<dependency>
    <groupId>io.mindmodel.services</groupId>
    <artifactId>pose-estimation</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</dependency>

Create a PoseEstimationService with the cmu-graph_opt.pb pre-trained model and use it to detect the poses in the tourists.jpg image:

PoseEstimationService poseEstimationService = new PoseEstimationService(
    "https://dl.bintray.com/big-data/generic/2018-05-14-cmu-graph_opt.pb", (1)
    true);(2)
byte[] inputImage = GraphicsUtils.loadAsByteArray("classpath:/images/tourists.jpg");
List<Body> bodies = poseEstimationService.detect(inputImage);(3)
String bodiesJson = new JsonMapperFunction().apply(bodies); (4)
1 URI of the pre-trained, frozen Tensorflow model
2 Download and cache the model locally.
3 The service takes an image (or a batch of images) and produces a list of detected Bodies. A Body represents a single body posture found in the image and is composed of Parts connected by Limbs. A Limb contains a PAF (Part Affinity Field) estimate score and the from and to Parts it connects. A Part has a type and coordinates in the image. (A small iteration sketch follows the JSON example below.)
4 Use the JsonMapperFunction to turn the Body list into JSON objects. The output JSON format looks like:
[{"id":0, "limbs": [
    {"score":8.4396105,"from":{"type":"lShoulder","y":56,"x":160},
                         "to":{"type":"lEar","y":24,"x":152}},
    {"score":10.145516,"from":{ "type":"neck","y": 56,"x":144},
                         "to":{"type":"rShoulder","y":56,"x":128}},
 {"id":1, "limbs": [
    {"score":7.85779,  "from":{"type":"neck","y":48,"x":328},
                         "to":{"type":"rHip","y":128,"x":328}},
    {"score":6.8949876,"from":{"type":"neck","y":48,"x":328 },
                         "to":{"type":"lHip","y":128,"x":304}}]
}]
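
As a minimal sketch of navigating the result programmatically (the getter names below are assumptions inferred from the Body/Limb/Part structure described above, not a verbatim API reference):

for (Body body : bodies) {
    // NOTE: getter names are illustrative assumptions
    for (Limb limb : body.getLimbs()) {
        // each limb connects a from-part to a to-part and carries a PAF score
        System.out.println(body.getId() + ": "
            + limb.getFromPart().getType() + " -> " + limb.getToPart().getType()
            + " (score: " + limb.getScore() + ")");
    }
}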

Or use the PoseEstimateImageAugmenter function to draw the detected body skeletons on top of the input image:

byte[] augmentedImage = new PoseEstimateImageAugmenter().apply(inputImage, bodies);
IOUtils.write(augmentedImage, new FileOutputStream("./pose-estimation/target/tourists-augmented.jpg"));

The annotated images would look like this:

Augmented tourists-augmented.jpg file

You can configure the PoseEstimateImageAugmenter to use a different color scheme or graphic characteristics.

Models

Table 2. Pre-built models to use with the Pose Estimation Service
Model Name Model URI

Thin - faster but less accurate (default)

http://dl.bintray.com/big-data/generic/2018-30-05-mobilenet_thin_graph_opt.pb

CMU - better accuracy but slower and with a larger footprint

http://dl.bintray.com/big-data/generic/2018-05-14-cmu-graph_opt.pb

Build

$ git clone git@github.com:tzolov/mind-model-services.git
$ cd mind-model-services
$ mvn clean install

Face Detection (MTCNN)

Placeholder for the existing Face Detection MTCNN-Java project.

The FaceDetection service uses a TensorFlow binding provided by the ND4J stack. The latter is not compatible with Google’s TensorFlow Java API, which means you cannot have dependencies on the other mind-model-services in the same project.


Java and Tensorflow implementation of the MTCNN Face Detector. Based on David Sandberg’s FaceNet’s MTCNN python implementation and the original Zhang, K et al. (2016) ZHANG2016 paper and Matlab implementation.

We’re considering moving the mtcnn-java project code under this project’s umbrella. Until then follow the instructions provided there.

Usage

Use the following dependency to add the mtcnn utility to your project:

<dependency>
  <groupId>net.tzolov.cv</groupId>
  <artifactId>mtcnn</artifactId>
  <version>0.0.4</version>
</dependency>

Also register jcenter in your list of Maven repositories (it is available out of the box for Gradle).

<repositories>
    <repository>
      <id>jcenter</id>
      <url>https://jcenter.bintray.com/</url>
    </repository>
</repositories>

The FaceDetectionSample1.java demonstrates how to use MtcnnService for detecting faces in images.


Here is the essence of this sample:

// 1. Create face detection service (minimum face size, scale factor, per-stage thresholds).
MtcnnService mtcnnService = new MtcnnService(30, 0.709, new double[] { 0.6, 0.7, 0.7 });

try (InputStream imageInputStream = new DefaultResourceLoader().getResource("classpath:/pivotal-ipo-nyse.jpg").getInputStream()) {
    // 2. Load the input image (you can use http:, file: or classpath: URIs to resolve the input image)
    BufferedImage inputImage = ImageIO.read(imageInputStream);
    // 3. Run face detection
    FaceAnnotation[] faceAnnotations = mtcnnService.faceDetection(inputImage);
    // 4. Augment the input image with the detected faces
    BufferedImage annotatedImage = MtcnnUtil.drawFaceAnnotations(inputImage, faceAnnotations);
    // 5. Store face-annotated image
    ImageIO.write(annotatedImage, "png", new File("./AnnotatedImage.png"));
    // 6. Print the face annotations as JSON
    System.out.println("Face Annotations (JSON): " + new ObjectMapper().writeValueAsString(faceAnnotations));
}

It takes an input image, detects the faces, produces JSON annotations and augments the image with the detected faces.

The face annotation JSON format looks like this:

[ {
  "bbox" : { "x" : 331, "y" : 92, "w" : 58, "h" : 71 }, "estimate" : 0.9999871253967285,
  "landmarks" : [ {
    "type" : "LEFT_EYE", "position" : { "x" : 346, "y" : 120 } }, {
    "type" : "RIGHT_EYE", "position" : { "x" : 374, "y" : 119 } }, {
    "type" : "NOSE", "position" : { "x" : 359, "y" : 133 } }, {
    "type" : "MOUTH_LEFT", "position" : { "x" : 347, "y" : 147 } }, {
    "type" : "MOUTH_RIGHT", "position" : { "x" : 371, "y" : 147 },
  } ]
}, {

Twitter Sentiment

Performs sentiment classification on tweets. Uses a pre-trained TensorFlow model built with twitter-sentiment-cnn.

The service evaluates Tweet messages (in JSON format) and detects the sentiment: POSITIVE, NEGATIVE or NEUTRAL.

For a real-life application of Twitter Sentiment Analysis see:

Figure 1. Real-time Twitter Sentiment Analytics with TensorFlow and Spring Cloud Dataflow

Usage

String tweet = "{\"text\": \"This is really bad\", \"id\":666, \"lang\":\"en\" }"; (1)

TwitterSentimentService twitterSentimentService = new TwitterSentimentService(
        "http://dl.bintray.com/big-data/generic/minimal_graph.proto", (2)
        "http://dl.bintray.com/big-data/generic/vocab.csv",           (3)
        true); (4)

SentimentResult tweetSentiment = twitterSentimentService.tweetSentiment(tweet); (5)

System.out.println(tweetSentiment.getSentiment() + " : " + tweetSentiment.getEstimate());
1 Sample tweet message in JSON format.
2 Pre-trained model URI.
3 The URI of the word vocabulary used to train the model.
4 Cache the TensorFlow model on the local file system.
5 Use the service to detect the sentiment

This would yield a result like:

NEGATIVE : 0.03941632

Next you can convert the result into JSON format.

String jsonTweetSentiment = new JsonMapperFunction().apply(tweetSentiment);
System.out.println(jsonTweetSentiment);
Sample tweet sentiment JSON representation
{
 "sentiment":"NEGATIVE",
 "estimate":0.03941632
}

Models

Table 3. Pre-built models to use with the Twitter Sentiment Service
Model URI

minimal_graph (default)

http://dl.bintray.com/big-data/generic/minimal_graph.proto

Table 4. Pre-built vocabulary to use with the Twitter Sentiment Service
Vocabulary URI

vocab.csv for minimal_graph(default)

http://dl.bintray.com/big-data/generic/vocab.csv

Build

$ mvn clean install

Image Recognition

Java model inference library for the Inception, MobileNetV1 and MobileNetV2 image recognition architectures. Provides real-time recognition of the LSVRC-2012-CLS categories in the input images.


The ImageRecognitionService takes an image and outputs a list of probable categories the image contains. The response is represented by the RecognitionResponse class.

The JsonMapperFunction permits converting the RecognitionResponse into JSON objects and the ImageRecognitionAugmenter can augment the input image with the detected categories (as shown in pic. 1).

Usage

Add the image-recognition dependency to the pom (use the latest version available):

<dependency>
    <groupId>io.mindmodel.services</groupId>
    <artifactId>image-recognition</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
Example 1: Image Recognition

The ImageRecognitionExample.java demonstrates how to use the ImageRecognitionService for detecting the categories present in an input image. It also shows how to convert the result into JSON format and augment the input image with the detected category labels.

ImageRecognitionService recognitionService = ImageRecognitionService.mobilenetModeV2(
  "https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_1.4_224.tgz#mobilenet_v2_1.4_224_frozen.pb", (1)
  224, (2)
  5, (3)
  true); (4)

byte[] inputImage = GraphicsUtils.loadAsByteArray("classpath:/images/giant_panda_in_beijing_zoo_1.jpg"); (5)

List<RecognitionResponse> recognizedObjects = recognitionService.recognize(inputImage); (6)
1 Downloads and loads a pre-trained mobilenet_v2_1.4_224_frozen.pb model. Note that the first run downloads a few hundred MB; subsequent runs use the cached copy (4) instead. The category labels for MobileNetV2 are resolved from src/main/resources/labels/mobilenet_labels.txt.
2 The width x height size of the normalized input image.
3 Top K results to return.
4 Cache the model on the local file system.
5 Load the image to recognize.
6 Return a list of the top-k most probable category names and their probabilities.

The ImageRecognitionService.mobilenetModeV1 and ImageRecognitionService.inception factory methods help to load and configure pre-trained MobileNetV1 and Inception models.
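
For example, a sketch of loading an Inception model through the inception factory method, assuming it takes the same (model URI, input image size, top-K, cache) arguments as mobilenetModeV2. The archive URL and the 299x299 input size correspond to the published Inception V3 frozen graph, but verify them against the model you actually use:

ImageRecognitionService inceptionService = ImageRecognitionService.inception(
  "https://storage.googleapis.com/download.tensorflow.org/models/inception_v3_2016_08_28_frozen.pb.tar.gz#inception_v3_2016_08_28_frozen.pb", // assumed model location
  299,   // Inception V3 input image size
  5,     // top K results to return
  true); // cache the model locally

List<RecognitionResponse> recognizedObjects = inceptionService.recognize(inputImage);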

Next you can convert the result into JSON format.

String jsonRecognizedObjects = new JsonMapperFunction().apply(recognizedObjects);
Sample Image Recognition JSON representation
[{"label":"giant panda","probability":0.9946687817573547},{"label":"Arctic fox","probability":0.0036631098482757807},{"label":"ice bear","probability":3.3782739774324E-4},{"label":"American black bear","probability":2.3452856112271547E-4},{"label":"skunk","probability":1.6454080468975008E-4}]

Use the ImageRecognitionAugmenter to draw the recognized categories on top of the input image.

byte[] augmentedImage = new ImageRecognitionAugmenter().apply(inputImage, recognizedObjects); (1)
IOUtils.write(augmentedImage, new FileOutputStream("./image-recognition/target/image-augmented.jpg"));(2)
1 Augment the image with the recognized categories (uses Java2D internally).
2 Stores the augmented image as image-augmented.jpg image file.
Augmented image-augmented.jpg file


Models

This implementation supports all pre-trained Inception, MobileNetV1 and MobileNetV2 models. The following URI notation can be used to download any of the models directly from the zoo.

http://<zoo model tar.gz url>#<frozen inference graph name.pb>

The <frozen inference graph name.pb> is the frozen model file name within the archive.

To speed up bootstrap performance you may consider extracting the model and caching it locally. Then you can use the file://path-to-my-local-copy URI schema to access it.
It is important to use the category labels that correspond to the model being used.
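
As mentioned above, the service can also be pointed at a locally extracted frozen graph. A sketch, assuming the graph has been extracted to a local path (the path is illustrative):

ImageRecognitionService recognitionService = ImageRecognitionService.mobilenetModeV2(
  "file:///opt/models/mobilenet_v2_1.4_224_frozen.pb", // locally extracted frozen graph
  224,    // input image size
  5,      // top K results to return
  false); // no need to cache a model that is already local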

Build

$ ./mvnw clean install