Jina: Cloud-Native Neural Search Framework for Any Kind of Data

x86/64, arm64, v6, v7	Linux/macOS with Python 3.7/3.8/3.9	Docker users
Minimum (no HTTP, WebSocket, or Docker support)	`pip install jina`	`docker run jinaai/jina:latest`
Daemon	`pip install "jina[daemon]"`	`docker run --network=host jinaai/jina:latest-daemon`
With extras	`pip install "jina[devel]"`	`docker run jinaai/jina:latest-devel`

Version identifiers are explained here. Jina can run on Windows Subsystem for Linux. We welcome the community to help us with native Windows support.

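Get Started

Document, Executor, and Flow are the three fundamental concepts in Jina.

📄 Document is the basic data type in Jina;
⚙️ Executor is how Jina processes Documents;
🔀 Flow is how Jina streamlines and distributes Executors.

1️⃣ Copy-paste the minimum example below and run it: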
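
To make the division of labor concrete, here is a toy sketch in plain Python. It does not use Jina at all: `ToyDocument`, `ToyExecutor`, and `ToyFlow` are hypothetical stand-ins invented for illustration, and they only mimic the idea that data travels as documents, that executors bind handlers to endpoints, and that a flow passes documents through its executors in order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a Document: content plus derived data."""
    text: str
    tags: Dict[str, object] = field(default_factory=dict)


Handler = Callable[[List[ToyDocument]], None]


class ToyExecutor:
    """Hypothetical stand-in for an Executor: binds handlers to endpoints."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Handler] = {}

    def on(self, endpoint: str, handler: Handler) -> "ToyExecutor":
        self.handlers[endpoint] = handler
        return self

    def handle(self, endpoint: str, docs: List[ToyDocument]) -> None:
        handler = self.handlers.get(endpoint)
        if handler is not None:  # endpoints without a handler are skipped
            handler(docs)


class ToyFlow:
    """Hypothetical stand-in for a Flow: runs executors in sequence."""

    def __init__(self) -> None:
        self.executors: List[ToyExecutor] = []

    def add(self, executor: ToyExecutor) -> "ToyFlow":
        self.executors.append(executor)
        return self

    def post(self, endpoint: str, docs: List[ToyDocument]) -> List[ToyDocument]:
        for executor in self.executors:
            executor.handle(endpoint, docs)
        return docs


def add_length(docs: List[ToyDocument]) -> None:
    for d in docs:
        d.tags["length"] = len(d.text)


flow = ToyFlow().add(ToyExecutor().on("/index", add_length))
indexed = flow.post("/index", [ToyDocument("hello")])   # handler runs
ignored = flow.post("/search", [ToyDocument("hello")])  # no handler bound
print(indexed[0].tags, ignored[0].tags)
```

The real classes do far more (networking, parallelism, serialization); the sketch only shows how the three roles relate.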

💡 Preliminaries: character embedding, pooling, Euclidean distance
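
For readers new to these three terms, the following standalone NumPy sketch may help. It assumes the same encoding scheme as the example in this section (ASCII code points 32 to 127 mapped to one-hot slots, with the last slot reserved for unknown characters); the names `embed` and `euclidean` are hypothetical helpers, not part of Jina.

```python
import numpy as np

OFFSET = 32             # ASCII code of the first printable character (space)
DIM = 127 - OFFSET + 1  # 96 slots; the last one is reserved for unknown chars
ONE_HOT = np.eye(DIM)   # row i is the one-hot vector for slot i


def embed(text: str) -> np.ndarray:
    """Character embedding with mean pooling; `text` must be non-empty."""
    slots = [ord(c) - OFFSET if OFFSET <= ord(c) <= 127 else DIM - 1 for c in text]
    # One one-hot row per character, averaged into a single fixed-size vector.
    return ONE_HOT[slots, :].mean(axis=0)


def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance between two embeddings."""
    return float(np.linalg.norm(a - b))


same_letters = euclidean(embed("abc"), embed("cab"))
different = euclidean(embed("abc"), embed("xyz"))
print(same_letters, different)
```

Because mean pooling discards character order, two strings with the same letters in a different order get identical embeddings. That is acceptable for a demo but is one reason this embedding is only a toy.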

import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests


class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ASCII code of the first printable character (space)
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling


class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        q = np.stack(docs.get_attributes('embedding'))  # get all embeddings from query docs
        d = np.stack(self._docs.get_attributes('embedding'))  # get all embeddings from stored docs
        euclidean_dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # pairwise euclidean distance
        for dist, query in zip(euclidean_dist, docs):  # add & sort match
            query.matches = [Document(self._docs[int(idx)], copy=True, scores={'euclid': d}) for idx, d in enumerate(dist)]
            query.matches.sort(key=lambda m: m.scores['euclid'].value)  # sort matches by their values


# build a Flow, with 2 parallel CharEmbed, tho unnecessary
f = Flow(port_expose=12345, protocol='http', cors=True).add(uses=CharEmbed, parallel=2).add(uses=Indexer)
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(__file__) if t.strip()))  # index all lines of _this_ file
    f.block()  # block for listening request
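
The `/search` handler above computes every query-to-document distance in one expression by inserting length-1 axes so that NumPy broadcasts the subtraction. The standalone sketch below isolates that step on small hand-made vectors; `pairwise_euclidean` and `rank` are hypothetical helper names, and the input arrays are made-up illustration data.

```python
import numpy as np


def pairwise_euclidean(q: np.ndarray, d: np.ndarray) -> np.ndarray:
    """q has shape (n_q, dim), d has shape (n_d, dim); result is (n_q, n_d)."""
    # (n_q, 1, dim) - (1, n_d, dim) broadcasts to (n_q, n_d, dim)
    diff = q[:, None, :] - d[None, :, :]
    return np.linalg.norm(diff, axis=-1)  # reduce over the embedding axis


def rank(dist_row: np.ndarray) -> list:
    """Indices of stored vectors, nearest first."""
    return [int(i) for i in np.argsort(dist_row, kind="stable")]


queries = np.array([[0.0, 0.0], [1.0, 1.0]])
stored = np.array([[3.0, 4.0], [1.0, 1.0], [0.0, 1.0]])

dist = pairwise_euclidean(queries, stored)
for row in dist:
    print(rank(row))
```

The intermediate array has `n_q * n_d * dim` elements, so this approach suits small in-memory indexes like the one in the example; larger collections would need a more memory-frugal strategy.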

2️⃣ Open http://localhost:12345/docs (an extended Swagger UI) in your browser, click the /search tab and input:

{"data": [{"text": "@requests(on=something)"}]}

That is, we want to find the lines in the code snippet above that are most similar to @requests(on=something). Now click the Execute button!
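
The same request can also be built outside the browser. The sketch below uses only the Python standard library and rests on an assumption: that the HTTP gateway accepts the JSON body shown above as a POST to the `/search` path listed in the Swagger UI. `build_search_request` and `send` are hypothetical helpers, and since the response schema is not described here, `send` returns the raw response text without parsing it.

```python
import json
import urllib.request


def build_search_request(text: str, host: str = "localhost", port: int = 12345) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the JSON body shown above."""
    body = json.dumps({"data": [{"text": text}]}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/search",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the raw response text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_search_request("@requests(on=something)")
# With the server from step 1 running, uncomment to send it:
# print(send(req))
```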

3️⃣ Not a GUI person? Let's do it in Python then! Keep the server above running and start a simple client:

from jina import Client, Document
from jina.types.request import Response


def print_matches(resp: Response):  # the callback function invoked when task is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.scores["euclid"].value:2f}: "{d.text}"')


c = Client(protocol='http', port_expose=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)

This prints the following results:

         Client@1608[S]:connected to the gateway at localhost:12345!
[0]0.168526: "@requests(on='/index')"
[1]0.181676: "@requests(on='/search')"
[2]0.192049: "query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]"

😔 Doesn't work? Our bad! Please report it here.

Read Tutorials

🧠 What is "Neural Search"?
📄 Document & DocumentArray: the basic data type in Jina
⚙️ Executor: how Jina processes Documents
🔀 Flow: how Jina streamlines and distributes Executors
- Minimum Working Example
- Flow API
🤹 Serving Jina
📓 Developer Reference
🧼 Clean & Efficient Coding in Jina
😎 3 Reasons to Use Jina 2.0

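Support

Join our Slack community to chat with our engineers about your use cases, questions, and support queries.
Join our Engineering All Hands meet-up to discuss your use case and learn about Jina's new features.
- When? The second Tuesday of every month
- Where? Zoom (see our public events calendar/.ical) and live stream on YouTube
Subscribe to the latest video tutorials on our YouTube channel.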