Neural Search and Jina AI
Abstract
This report examines how deep learning advances neural search and introduces Jina AI, a framework that researchers built on a similar approach. Jina AI uses deep learning algorithms to build search applications for any kind of modality, such as videos, images, source code, and long text. The framework allows users to import a “lightweight” version of the Google search engine into a project.
Neural Search
In this era of deep learning, the rapid development of the Internet and of machine/deep learning has brought convenience and comfort to users’ lives. Neural search is a type of AI engine that utilizes neural hashes to compress and drastically speed up queries. Search can be made more effective with three kinds of methods: rules, statistics, and neural networks.
Neural search uses deep learning algorithms to search; it combines the power of vector search with fast, efficient performance and self-learning capabilities. There is no need to write a set of rules: the system trains itself to improve after each iteration of producing output. Using neural networks, researchers created a new hybrid engine that combines contextual and keyword-based results in single-digit millisecond time frames, regardless of data size. Neural search has several pros and cons compared to symbolic search.
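The vector search idea mentioned above can be illustrated with a minimal sketch. It assumes documents have already been encoded into embedding vectors; the three-dimensional vectors and the function names below are hypothetical and chosen only for illustration. A production engine would use a trained encoder and an approximate nearest-neighbour index rather than this brute-force scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=2):
    """Brute-force vector search: score every entry, return the best top_k."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical embeddings; a real system would obtain these from an encoder.
index = {
    "red running shoes": [0.9, 0.1, 0.0],
    "blue sneakers": [0.8, 0.3, 0.1],
    "garden hose": [0.0, 0.2, 0.9],
}

results = search([1.0, 0.2, 0.0], index)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The query vector lies close to the two shoe entries, so they rank ahead of the unrelated item even though no keyword rule was written.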
The framework used for neural search is composed of multiple encoders, a metric layer, and a loss layer. First, input data is fed to the encoders, which generate vector representations. Note that product information is encoded by both an image encoder and an attribute encoder. The metric layer computes the similarity of a query vector with an image vector and with an attribute vector, respectively. Finally, the loss layer computes the difference of similarities between positive and negative pairs, which is used as feedback to train the encoders via backpropagation.
Pre-trained neural networks are deployed to retrieve information. These networks are trained for information retrieval and improve as they are trained on larger amounts of data.
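The metric and loss layers described above can be sketched in a few lines. This is an illustration under stated assumptions, not the framework's actual implementation: the encoder outputs are replaced by fixed two-dimensional vectors, cosine similarity is assumed as the metric, and a hinge loss with a hypothetical margin of 0.2 is assumed as the way of turning the similarity difference into a training signal. The backpropagation step is omitted.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def metric_layer(query_vec, image_vec, attr_vec):
    """Similarity of the query with the image and attribute vectors."""
    return cosine(query_vec, image_vec), cosine(query_vec, attr_vec)

def loss_layer(pos_sims, neg_sims, margin=0.2):
    """Hinge loss on the similarity gap between positive and negative pairs.

    The loss is zero once every positive similarity exceeds the matching
    negative similarity by at least `margin` (an assumed hyperparameter).
    """
    return sum(max(0.0, margin - (p - n)) for p, n in zip(pos_sims, neg_sims))

# Stand-ins for encoder outputs (hypothetical values).
query = [1.0, 0.0]
positive_product = {"image": [0.9, 0.1], "attr": [0.8, 0.2]}
negative_product = {"image": [0.1, 0.9], "attr": [0.2, 0.8]}

pos = metric_layer(query, positive_product["image"], positive_product["attr"])
neg = metric_layer(query, negative_product["image"], negative_product["attr"])

good_loss = loss_layer(pos, neg)   # relevant product already scores higher
bad_loss = loss_layer(neg, pos)    # roles swapped: encoders would be penalised
print(good_loss, bad_loss)
```

When the relevant product is already well separated from the irrelevant one, the loss vanishes and the encoders receive no update; when the order is wrong, the positive loss is what would flow back to the encoders.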
Jina AI
Jina AI is an open-source, cloud-native neural search framework that uses deep neural networks to perform search. It is used for building state-of-the-art, scalable deep learning search applications for any kind of modality, for example videos, images, source code, and long text. The framework allows users to import a “lightweight” version of the Google search engine into a project.
Jina AI also provides DocArray, a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, and 3D mesh. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API.
Components of Jina AI
Jina has three basic components: Document, Executor, and Flow.
1. A Document is the basic data type in Jina
2. An Executor is how Jina processes Documents
3. A Flow is how Jina streamlines and scales Executors
Document
Document is the basic data type that Jina operates on, and it is agnostic to the type and format of the data. Text, pictures, audio, and video are all treated as Documents in Jina. The container type for Documents is DocumentArray, which wraps multiple individual Documents and holds them together.
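The relationship between a single document and its container can be pictured with the following sketch. The classes ToyDocument and ToyDocumentArray are hypothetical stand-ins written for this report; they are not Jina's Document or DocumentArray classes and only mimic the idea of a modality-agnostic record held in a list-like container.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class ToyDocument:
    """Hypothetical stand-in: one piece of data of any modality."""
    content: Any
    mime_type: str = "text/plain"
    embedding: Optional[List[float]] = None  # filled in later by an encoder

class ToyDocumentArray:
    """Hypothetical stand-in: a list-like container of ToyDocument objects."""
    def __init__(self, docs=None):
        self._docs = list(docs or [])

    def append(self, doc):
        self._docs.append(doc)

    def __len__(self):
        return len(self._docs)

    def __iter__(self):
        return iter(self._docs)

    def __getitem__(self, position):
        return self._docs[position]

docs = ToyDocumentArray([
    ToyDocument("hello world"),
    ToyDocument(b"\x89PNG", mime_type="image/png"),  # placeholder bytes
])
docs.append(ToyDocument("def f(): pass", mime_type="text/x-python"))

mime_types = [d.mime_type for d in docs]
print(len(docs), mime_types)
```

Text, image bytes, and source code sit in the same container, and downstream steps can treat them uniformly.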
Executor
Executor is the smallest algorithmic unit in Jina and is used to process Documents. Encoding images into vectors, storing vectors on disk, and ranking results are all formulated as Executors.
Some common executors are as follows:
Crafter: Crafter is used for pre-processing the documents into chunks.
Encoder: The encoder takes the pre-processed chunks of documents from the crafter and encodes them into embedding vectors.
Indexer: Indexer takes the encoded vectors as input and indexes and stores the vectors in a key-value fashion.
Ranker: Ranker runs on the indexed storage and sorts the results based on a certain ranking.
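The four roles listed above can be chained in a small, self-contained sketch. All function and class names here are hypothetical, and the encoder is a simple word-count vector over an assumed five-word vocabulary, used only as a stand-in for a neural encoder.

```python
def crafter(text, chunk_size=3):
    """Pre-process a document by splitting it into chunks of words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

VOCAB = ["neural", "search", "jina", "flow", "executor"]  # assumed vocabulary

def encoder(chunk):
    """Stand-in encoder: word counts over VOCAB instead of a neural network."""
    words = chunk.lower().split()
    return [float(words.count(term)) for term in VOCAB]

class Indexer:
    """Stores vectors in a key-value fashion."""
    def __init__(self):
        self.store = {}

    def add(self, key, vector):
        self.store[key] = vector

def ranker(query_vec, indexer, top_k=1):
    """Sort indexed entries by dot-product score against the query."""
    def score(vec):
        return sum(q * x for q, x in zip(query_vec, vec))
    ranked = sorted(indexer.store.items(),
                    key=lambda item: score(item[1]), reverse=True)
    return [key for key, _ in ranked[:top_k]]

# Index time: crafter -> encoder -> indexer
indexer = Indexer()
text = "Jina flow orchestrates executor units neural search finds meaning"
for chunk in crafter(text):
    indexer.add(chunk, encoder(chunk))

# Query time: encoder -> ranker
best = ranker(encoder("neural executor"), indexer)
print(best)
```

Each stage only consumes the output of the previous one, which is what lets the stages be swapped or scaled independently.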
· An Executor should subclass the jina.Executor class directly.
· An Executor class is a bag of functions with a shared state (via self), so it can contain an arbitrary number of functions with arbitrary names.
· Functions decorated with the @requests decorator are invoked according to their on= endpoint.
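The three points above describe a decorator-based dispatch pattern, which can be imitated in plain Python. The sketch below does not use Jina: toy_requests, ToyExecutor, and dispatch are hypothetical names that only mirror the idea of binding functions to endpoints through an on= argument while sharing state via self.

```python
def toy_requests(on):
    """Hypothetical decorator: tag a function with the endpoint it serves."""
    def decorator(func):
        func.endpoint = on
        return func
    return decorator

class ToyExecutor:
    """Hypothetical base class that routes a request to the tagged function."""
    def dispatch(self, endpoint, docs):
        for name in dir(type(self)):
            candidate = getattr(type(self), name)
            if callable(candidate) and getattr(candidate, "endpoint", None) == endpoint:
                return candidate(self, docs)
        raise KeyError(f"no function bound to {endpoint!r}")

class TextExecutor(ToyExecutor):
    def __init__(self):
        self.calls = 0  # state shared by all functions via self

    @toy_requests(on="/index")
    def make_uppercase(self, docs):
        self.calls += 1
        return [d.upper() for d in docs]

    @toy_requests(on="/search")
    def keep_matches(self, docs):
        self.calls += 1
        return [d for d in docs if "jina" in d.lower()]

executor = TextExecutor()
indexed = executor.dispatch("/index", ["neural search"])
found = executor.dispatch("/search", ["Jina AI", "other text"])
print(indexed, found, executor.calls)
```

The function names are arbitrary; only the endpoint tag decides which function handles a request.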
Flow
A Flow streamlines and scales Executors; it represents high-level tasks such as indexing, searching, and training. It acts as a context manager and orchestrates a group of Executors to accomplish a single task. For example, to index data you need a sequence of Executors, such as a crafter, an encoder, and an indexer, working in tandem to achieve the desired result. A Flow is also a service, allowing multiple clients to access it via gRPC, REST, or WebSocket from a public or private network.
A Flow follows a lazy construction pattern, so it does not actually run until you open it with a with statement. A Flow is created by importing the Flow class from the jina core library and then adding Executors to it.
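The lazy construction pattern can be shown with a small context manager. ToyFlow is a hypothetical class written for this report, not Jina's Flow; its steps are plain Python callables rather than Executors, and nothing is networked. It only demonstrates that adding steps does no work and that requests are accepted only inside a with block.

```python
class ToyFlow:
    """Hypothetical sketch of a lazily constructed pipeline."""
    def __init__(self):
        self._steps = []
        self.running = False

    def add(self, step):
        """Register a step; nothing is executed yet."""
        self._steps.append(step)
        return self  # allows chained .add(...) calls

    def __enter__(self):
        self.running = True  # resources would be started here
        return self

    def __exit__(self, exc_type, exc, traceback):
        self.running = False  # resources would be released here
        return False  # do not suppress exceptions

    def post(self, data):
        """Send data through every step, in order."""
        if not self.running:
            raise RuntimeError("open the flow with a 'with' block first")
        for step in self._steps:
            data = step(data)
        return data

flow = ToyFlow().add(str.strip).add(str.lower).add(str.split)

with flow:
    result = flow.post("  Neural Search  ")
print(result)
```

Building the pipeline and running it are separate moments, so a definition can be inspected or extended before any resources are started.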
Compared to symbolic search, neural search:
Removes the fragile pipeline, making the system more resilient and scalable
Finds a better way to represent the underlying semantics of products and search queries
Learns as it goes along, so improves over time
Jina AI improves engineering efficiency across its ecosystem, so developers can focus on innovating with data applications
Jina AI Cloud provides free CPU/GPU hosting
Alternatives to Jina AI
In the area of neural search using deep learning, there are a few alternatives to Jina AI, such as Hugging Face[1], Search.io[2], and Baidu[3].
References
2. https://www.search.io/features/neuralsearch
3. https://abhibisht89-neural-search-engine.hf.space/api
5. https://ieeexplore.ieee.org/abstract/document/9445247
6. https://link.springer.com/chapter/10.1007/978-3-030-96791-8_20