Image Recognition Engine
The image recognition engine recognizes the objects in an image and classifies the scene from the recognized results. It supports future applications such as visual search, video captioning, autonomous driving, and visual Q&A. Saltlux plans to advance the current engine beyond simple description of an image to a level at which it understands the meaning of the scene, achieving the technology stack shown in the following picture.
< Image recognition engine technology stack >
- More detailed and accurate image understanding using knowledge graph
The image recognition engine uses the knowledge graph when interpreting what an image means through object recognition. Unlike existing image recognition products that simply learn images tagged with words or sentences, it can be linked with a meaning-based knowledge graph, enabling more detailed and accurate image understanding.
- Domain-specific image understanding
Interpretation criteria may vary depending on domain knowledge. Because it is based on a knowledge graph, the image recognition engine can be linked with individual knowledge graphs built for various domains, producing image understanding results specific to each domain.
Main features and specifications
The image recognition engine processes images from cameras in real time, recognizes the situation, and provides the recognized information to applications. To this end, it consists of a visual analysis module that analyzes various information in the image and a visual understanding module that interprets the situation based on that information.
< Image Recognition engine block diagram >
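The two-module pipeline above can be sketched in code. This is a minimal illustration, not the product's actual API: the function names (`visual_analysis`, `visual_understanding`, `process_frame`), the `Detection` record, and the stub outputs are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # recognized object class
    confidence: float   # detector score in [0, 1]

def visual_analysis(frame) -> list:
    """Stand-in for the visual analysis module: in the real engine this
    would run detection/segmentation on a camera frame. Here it returns
    fixed, illustrative detections."""
    return [Detection("car", 0.92), Detection("traffic_light", 0.88)]

def visual_understanding(detections: list) -> str:
    """Stand-in for the visual understanding module: maps analyzed
    objects to a scene-level interpretation with a toy rule."""
    labels = {d.label for d in detections if d.confidence > 0.5}
    if {"car", "traffic_light"} <= labels:
        return "road_scene"
    return "unknown_scene"

def process_frame(frame) -> str:
    """Full pipeline: analysis followed by understanding."""
    return visual_understanding(visual_analysis(frame))
```

In the actual engine, the understanding step consults the knowledge graph rather than a hard-coded rule, as described in the image understanding section below.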
- Semantic Segmentation
Image recognition technology, which identifies what kinds of objects appear in a photograph, is advancing together with object detection technology, which locates those objects in the image. Semantic segmentation goes a step further: it determines which object every pixel in the image belongs to, and is used to find the boundary (line) of each object along with its exact extent.
< Object recognition and object segmentation examples >
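To make the per-pixel idea concrete, here is a small sketch of one post-processing step a segmentation feature implies: extracting an object's boundary pixels from a per-pixel class map. The function name and the toy 4x4 map are illustrative, not part of the engine.

```python
def class_boundary(seg_map, target):
    """Given a per-pixel class map (list of lists of class ids), return
    the set of (row, col) pixels of `target` that touch a different
    class or the image border -- i.e. the object's boundary."""
    h, w = len(seg_map), len(seg_map[0])
    boundary = set()
    for r in range(h):
        for c in range(w):
            if seg_map[r][c] != target:
                continue
            # A pixel is on the boundary if any 4-neighbor differs.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < h and 0 <= nc < w) or seg_map[nr][nc] != target:
                    boundary.add((r, c))
                    break
    return boundary

# Toy class map: a 2x2 object of class 1 surrounded by background (class 0).
seg = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
```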
- Pose Estimation
Pose estimation is a technique that detects and measures the positions of human anatomical key points, such as the head, neck, shoulders, and knees, and uses them to determine the state of the subject or estimate its posture.
< Pose Estimation examples >
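Once key points are detected, posture can be estimated from the geometry between them. The following sketch computes the angle at a joint and applies a hypothetical rule (a nearly straight knee suggests standing); the threshold and rule are illustrative assumptions, not the engine's actual logic.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by key points a-b-c,
    each given as an (x, y) coordinate."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

def posture(hip, knee, ankle, threshold=150.0):
    """Toy classifier: a knee angle near 180 degrees (straight leg)
    is read as standing, a bent knee as sitting."""
    return "standing" if joint_angle(hip, knee, ankle) >= threshold else "sitting"
```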
- Hand Gesture Recognition
Hand gesture recognition is a feature that extracts and recognizes meaningful gestures from visual data, such as video and motion information, acquired with a camera. Typically, it analyzes hand poses to recognize gestures from a predefined category set. It detects hand movements such as clicks and scrolls and interprets their meaning without direct input.
< Hand gesture recognition examples >
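A trajectory-based classifier of the kind described above can be sketched with a simple rule over fingertip positions. The thresholds and gesture names here are illustrative assumptions only; a real recognizer would learn these categories from hand-pose data.

```python
def classify_gesture(trajectory):
    """Classify a sequence of fingertip (x, y) positions with a toy rule:
    little overall motion -> 'click'; mostly vertical motion -> 'scroll';
    otherwise 'swipe'."""
    (x0, y0), (x1, y1) = trajectory[0], trajectory[-1]
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    if dx < 5 and dy < 5:
        return "click"        # fingertip barely moved
    if dy > 2 * dx:
        return "scroll"       # dominantly vertical movement
    return "swipe"
```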
- Face Landmark Detection
Facial feature extraction detects and tracks key facial landmarks (eyes, nose, mouth, jaw line, eyebrows, etc.). This makes it possible to correct rigid and non-rigid facial deformations caused by head movement and facial expressions, and to interpret the facial expression itself.
< Facial feature extraction and validation examples >
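One common use of landmarks is the deformation correction mentioned above: estimating head roll from the two eye positions so the face can be rotated upright before further analysis. This is a generic alignment sketch, not the engine's specific method.

```python
import math

def roll_angle(left_eye, right_eye):
    """Estimate head roll in degrees from the line between the two
    eye landmarks (each an (x, y) point). Rotating the face image by
    the negative of this angle makes the eyes horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))
```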
- Age-Group/Gender Classification
This feature recognizes a person's face in the image and estimates the person's age group and gender. It can also extract additional information such as facial expression, emotional state, and race.
< Age/gender category example >
- Face Recognition and Verification
This feature identifies a person through face recognition. Face recognition is the 1:N task of detecting a face in an image and identifying who it is by comparing it against pre-registered face information, while face verification is the 1:1 task of checking whether a detected face matches a specific registered face.
< Face recognition and verification examples >
Since the face recognition step can produce errors, as shown in the examples above, the image recognition engine applies face verification as a post-processing step to correct them.
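The recognition/verification split can be illustrated with face embeddings and cosine similarity, a standard formulation; the tiny 2-D embeddings, gallery, and threshold below are illustrative assumptions, not values from the product.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def recognize(query, gallery):
    """Recognition (1:N): return the registered identity whose
    embedding is most similar to the query face."""
    return max(gallery, key=lambda name: cosine(query, gallery[name]))

def verify(query, reference, threshold=0.8):
    """Verification (1:1): accept the claimed identity only if the
    similarity to its registered embedding exceeds a threshold --
    the post-processing check described above."""
    return cosine(query, reference) >= threshold
```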
- Image understanding
The image recognition engine includes a process that uses the knowledge graph to understand what the analyzed image means. In addition to general facts, it can provide specialized image understanding results that may be interpreted differently depending on the domain knowledge.
① Knowledge Graph for Fact
It is a form of knowledge representation in which two objects and the relation between them are expressed as a triple. The Visual Genome project, led by Stanford University in the United States, likewise builds a dataset in which detailed information obtained by analyzing photographs is expressed as a knowledge graph. The image understanding module of the image recognition engine uses the fact KG to express the various analysis results produced by the image analysis module.
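A subject-predicate-object triple store can be sketched in a few lines. The class name, methods, and the example triples (a person riding a bicycle, in the spirit of Visual Genome scene graphs) are all hypothetical illustrations.

```python
class TripleStore:
    """Minimal (subject, predicate, object) store, illustrating how
    image analysis results can be expressed as knowledge-graph triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subj, pred, obj):
        self.triples.add((subj, pred, obj))

    def query(self, subj=None, pred=None, obj=None):
        """Pattern match: None acts as a wildcard on that position."""
        return [t for t in self.triples
                if (subj is None or t[0] == subj)
                and (pred is None or t[1] == pred)
                and (obj is None or t[2] == obj)]

kg = TripleStore()
# Triples a visual analysis module might emit for one photo.
kg.add("person_1", "rides", "bicycle_1")
kg.add("bicycle_1", "isA", "Bicycle")
```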
② Knowledge Graph for Domain Knowledge
In order to better understand a situation from images, you need to consider not only the facts but also a variety of related knowledge. In particular, the domain knowledge associated with a given application service is essential for understanding specific to that field. For this purpose, knowledge specific to each domain is expressed as a per-domain KG in the image understanding module and combined with the fact KG in each application service.
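Combining the two graphs can be sketched as follows: the fact KG holds what was seen in the frame, the domain KG holds application-specific knowledge (here, a made-up safety-inspection domain), and a query over their union yields a domain-specific interpretation. All triples, predicates, and the rule itself are illustrative assumptions.

```python
# Fact KG: what the image analysis found in this frame (hypothetical).
fact_kg = {
    ("person_1", "wears", "helmet_1"),
    ("helmet_1", "isA", "Helmet"),
}

# Domain KG: safety-inspection knowledge configured per application (hypothetical).
domain_kg = {
    ("Helmet", "counts_as", "SafetyEquipment"),
}

def domain_interpretation(facts, domain):
    """Combine both graphs and interpret: a person wearing an object
    whose class counts as safety equipment is considered compliant
    in this (made-up) domain."""
    combined = facts | domain
    classes = {s: o for s, p, o in combined if p == "isA"}
    safe = {s for s, p, o in combined if p == "counts_as" and o == "SafetyEquipment"}
    return {s for s, p, o in combined
            if p == "wears" and classes.get(o) in safe}
```

The same fact KG combined with a different domain KG (say, a fashion domain) would yield a different interpretation, which is the point of keeping the two graphs separate per application service.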
Each of the features of the image recognition engine described above is under constant research and development. The state of the art (SOTA) so far is shown in the following table.
< Status-Of-The-Art per image recognition feature >