Introduce Plugins (#13836)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Signed-off-by: -LAN- <laipz8200@outlook.com>
Signed-off-by: xhe <xw897002528@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: takatost <takatost@gmail.com>
Co-authored-by: kurokobo <kuro664@gmail.com>
Co-authored-by: Novice Lee <novicelee@NoviPro.local>
Co-authored-by: zxhlyh <jasonapring2015@outlook.com>
Co-authored-by: AkaraChen <akarachen@outlook.com>
Co-authored-by: Yi <yxiaoisme@gmail.com>
Co-authored-by: Joel <iamjoel007@gmail.com>
Co-authored-by: JzoNg <jzongcode@gmail.com>
Co-authored-by: twwu <twwu@dify.ai>
Co-authored-by: Hiroshi Fujita <fujita-h@users.noreply.github.com>
Co-authored-by: AkaraChen <85140972+AkaraChen@users.noreply.github.com>
Co-authored-by: NFish <douxc512@gmail.com>
Co-authored-by: Wu Tianwei <30284043+WTW0313@users.noreply.github.com>
Co-authored-by: 非法操作 <hjlarry@163.com>
Co-authored-by: Novice <857526207@qq.com>
Co-authored-by: Hiroki Nagai <82458324+nagaihiroki-git@users.noreply.github.com>
Co-authored-by: Gen Sato <52241300+halogen22@users.noreply.github.com>
Co-authored-by: eux <euxuuu@gmail.com>
Co-authored-by: huangzhuo1949 <167434202+huangzhuo1949@users.noreply.github.com>
Co-authored-by: huangzhuo <huangzhuo1@xiaomi.com>
Co-authored-by: lotsik <lotsik@mail.ru>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
Co-authored-by: nite-knite <nkCoding@gmail.com>
Co-authored-by: Jyong <76649700+JohnJyong@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: gakkiyomi <gakkiyomi@aliyun.com>
Co-authored-by: CN-P5 <heibai2006@gmail.com>
Co-authored-by: CN-P5 <heibai2006@qq.com>
Co-authored-by: Chuehnone <1897025+chuehnone@users.noreply.github.com>
Co-authored-by: yihong <zouzou0208@gmail.com>
Co-authored-by: Kevin9703 <51311316+Kevin9703@users.noreply.github.com>
Co-authored-by: -LAN- <laipz8200@outlook.com>
Co-authored-by: Boris Feld <lothiraldan@gmail.com>
Co-authored-by: mbo <himabo@gmail.com>
Co-authored-by: mabo <mabo@aeyes.ai>
Co-authored-by: Warren Chen <warren.chen830@gmail.com>
Co-authored-by: JzoNgKVO <27049666+JzoNgKVO@users.noreply.github.com>
Co-authored-by: jiandanfeng <chenjh3@wangsu.com>
Co-authored-by: zhu-an <70234959+xhdd123321@users.noreply.github.com>
Co-authored-by: zhaoqingyu.1075 <zhaoqingyu.1075@bytedance.com>
Co-authored-by: 海狸大師 <86974027+yenslife@users.noreply.github.com>
Co-authored-by: Xu Song <xusong.vip@gmail.com>
Co-authored-by: rayshaw001 <396301947@163.com>
Co-authored-by: Ding Jiatong <dingjiatong@gmail.com>
Co-authored-by: Bowen Liang <liangbowen@gf.com.cn>
Co-authored-by: JasonVV <jasonwangiii@outlook.com>
Co-authored-by: le0zh <newlight@qq.com>
Co-authored-by: zhuxinliang <zhuxinliang@didiglobal.com>
Co-authored-by: k-zaku <zaku99@outlook.jp>
Co-authored-by: luckylhb90 <luckylhb90@gmail.com>
Co-authored-by: hobo.l <hobo.l@binance.com>
Co-authored-by: jiangbo721 <365065261@qq.com>
Co-authored-by: 刘江波 <jiangbo721@163.com>
Co-authored-by: Shun Miyazawa <34241526+miya@users.noreply.github.com>
Co-authored-by: EricPan <30651140+Egfly@users.noreply.github.com>
Co-authored-by: crazywoola <427733928@qq.com>
Co-authored-by: sino <sino2322@gmail.com>
Co-authored-by: Jhvcc <37662342+Jhvcc@users.noreply.github.com>
Co-authored-by: lowell <lowell.hu@zkteco.in>
Co-authored-by: Boris Polonsky <BorisPolonsky@users.noreply.github.com>
Co-authored-by: Ademílson Tonato <ademilsonft@outlook.com>
Co-authored-by: Ademílson Tonato <ademilson.tonato@refurbed.com>
Co-authored-by: IWAI, Masaharu <iwaim.sub@gmail.com>
Co-authored-by: Yueh-Po Peng (Yabi) <94939112+y10ab1@users.noreply.github.com>
Co-authored-by: Jason <ggbbddjm@gmail.com>
Co-authored-by: Xin Zhang <sjhpzx@gmail.com>
Co-authored-by: yjc980121 <3898524+yjc980121@users.noreply.github.com>
Co-authored-by: heyszt <36215648+hieheihei@users.noreply.github.com>
Co-authored-by: Abdullah AlOsaimi <osaimiacc@gmail.com>
Co-authored-by: Abdullah AlOsaimi <189027247+osaimi@users.noreply.github.com>
Co-authored-by: Yingchun Lai <laiyingchun@apache.org>
Co-authored-by: Hash Brown <hi@xzd.me>
Co-authored-by: zuodongxu <192560071+zuodongxu@users.noreply.github.com>
Co-authored-by: Masashi Tomooka <tmokmss@users.noreply.github.com>
Co-authored-by: aplio <ryo.091219@gmail.com>
Co-authored-by: Obada Khalili <54270856+obadakhalili@users.noreply.github.com>
Co-authored-by: Nam Vu <zuzoovn@gmail.com>
Co-authored-by: Kei YAMAZAKI <1715090+kei-yamazaki@users.noreply.github.com>
Co-authored-by: TechnoHouse <13776377+deephbz@users.noreply.github.com>
Co-authored-by: Riddhimaan-Senapati <114703025+Riddhimaan-Senapati@users.noreply.github.com>
Co-authored-by: MaFee921 <31881301+2284730142@users.noreply.github.com>
Co-authored-by: te-chan <t-nakanome@sakura-is.co.jp>
Co-authored-by: HQidea <HQidea@users.noreply.github.com>
Co-authored-by: Joshbly <36315710+Joshbly@users.noreply.github.com>
Co-authored-by: xhe <xw897002528@gmail.com>
Co-authored-by: weiwenyan-dev <154779315+weiwenyan-dev@users.noreply.github.com>
Co-authored-by: ex_wenyan.wei <ex_wenyan.wei@tcl.com>
Co-authored-by: engchina <12236799+engchina@users.noreply.github.com>
Co-authored-by: engchina <atjapan2015@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: 呆萌闷油瓶 <253605712@qq.com>
Co-authored-by: Kemal <kemalmeler@outlook.com>
Co-authored-by: Lazy_Frog <4590648+lazyFrogLOL@users.noreply.github.com>
Co-authored-by: Yi Xiao <54782454+YIXIAO0@users.noreply.github.com>
Co-authored-by: Steven sun <98230804+Tuyohai@users.noreply.github.com>
Co-authored-by: steven <sunzwj@digitalchina.com>
Co-authored-by: Kalo Chin <91766386+fdb02983rhy@users.noreply.github.com>
Co-authored-by: Katy Tao <34019945+KatyTao@users.noreply.github.com>
Co-authored-by: depy <42985524+h4ckdepy@users.noreply.github.com>
Co-authored-by: 胡春东 <gycm520@gmail.com>
Co-authored-by: Junjie.M <118170653@qq.com>
Co-authored-by: MuYu <mr.muzea@gmail.com>
Co-authored-by: Naoki Takashima <39912547+takatea@users.noreply.github.com>
Co-authored-by: Summer-Gu <37869445+gubinjie@users.noreply.github.com>
Co-authored-by: Fei He <droxer.he@gmail.com>
Co-authored-by: ybalbert001 <120714773+ybalbert001@users.noreply.github.com>
Co-authored-by: Yuanbo Li <ybalbert@amazon.com>
Co-authored-by: douxc <7553076+douxc@users.noreply.github.com>
Co-authored-by: liuzhenghua <1090179900@qq.com>
Co-authored-by: Wu Jiayang <62842862+Wu-Jiayang@users.noreply.github.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: kimjion <45935338+kimjion@users.noreply.github.com>
Co-authored-by: AugNSo <song.tiankai@icloud.com>
Co-authored-by: llinvokerl <38915183+llinvokerl@users.noreply.github.com>
Co-authored-by: liusurong.lsr <liusurong.lsr@alibaba-inc.com>
Co-authored-by: Vasu Negi <vasu-negi@users.noreply.github.com>
Co-authored-by: Hundredwz <1808096180@qq.com>
Co-authored-by: Xiyuan Chen <52963600+GareArc@users.noreply.github.com>
2025-02-17 17:05:13 +08:00
parent 222df44d21
commit 403e2d58b9
3272 changed files with 66339 additions and 281594 deletions
--- a/api/tests/integration_tests/model_runtime/__mock/anthropic.py
+++ b/api/tests/integration_tests/model_runtime/__mock/anthropic.py
@@ -1,98 +0,0 @@
-import os
-from collections.abc import Iterable
-from typing import Any, Literal, Union
-
-import anthropic
-import pytest
-from _pytest.monkeypatch import MonkeyPatch
-from anthropic import Stream
-from anthropic.resources import Messages
-from anthropic.types import (
-    ContentBlock,
-    ContentBlockDeltaEvent,
-    Message,
-    MessageDeltaEvent,
-    MessageDeltaUsage,
-    MessageParam,
-    MessageStartEvent,
-    MessageStopEvent,
-    MessageStreamEvent,
-    TextDelta,
-    Usage,
-)
-from anthropic.types.message_delta_event import Delta
-
-MOCK = os.getenv("MOCK_SWITCH", "false") == "true"
-
-
-class MockAnthropicClass:
-    @staticmethod
-    def mocked_anthropic_chat_create_sync(model: str) -> Message:
-        return Message(
-            id="msg-123",
-            type="message",
-            role="assistant",
-            content=[ContentBlock(text="hello, I'm a chatbot from anthropic", type="text")],
-            model=model,
-            stop_reason="stop_sequence",
-            usage=Usage(input_tokens=1, output_tokens=1),
-        )
-
-    @staticmethod
-    def mocked_anthropic_chat_create_stream(model: str) -> Stream[MessageStreamEvent]:
-        full_response_text = "hello, I'm a chatbot from anthropic"
-
-        yield MessageStartEvent(
-            type="message_start",
-            message=Message(
-                id="msg-123",
-                content=[],
-                role="assistant",
-                model=model,
-                stop_reason=None,
-                type="message",
-                usage=Usage(input_tokens=1, output_tokens=1),
-            ),
-        )
-
-        index = 0
-        for i in range(0, len(full_response_text)):
-            yield ContentBlockDeltaEvent(
-                type="content_block_delta", delta=TextDelta(text=full_response_text[i], type="text_delta"), index=index
-            )
-
-            index += 1
-
-        yield MessageDeltaEvent(
-            type="message_delta", delta=Delta(stop_reason="stop_sequence"), usage=MessageDeltaUsage(output_tokens=1)
-        )
-
-        yield MessageStopEvent(type="message_stop")
-
-    def mocked_anthropic(
-        self: Messages,
-        *,
-        max_tokens: int,
-        messages: Iterable[MessageParam],
-        model: str,
-        stream: Literal[True],
-        **kwargs: Any,
-    ) -> Union[Message, Stream[MessageStreamEvent]]:
-        if len(self._client.api_key) < 18:
-            raise anthropic.AuthenticationError("Invalid API key")
-
-        if stream:
-            return MockAnthropicClass.mocked_anthropic_chat_create_stream(model=model)
-        else:
-            return MockAnthropicClass.mocked_anthropic_chat_create_sync(model=model)
-
-
-@pytest.fixture
-def setup_anthropic_mock(request, monkeypatch: MonkeyPatch):
-    if MOCK:
-        monkeypatch.setattr(Messages, "create", MockAnthropicClass.mocked_anthropic)
-
-    yield
-
-    if MOCK:
-        monkeypatch.undo()
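
Note on the removed fixture above: it patches `Messages.create` only when the `MOCK_SWITCH` environment variable is set, and this file compares the value to `"true"` without lowercasing, unlike the other removed mock modules. The gating idea can be sketched with the standard library alone. This is an illustration, not the project's code: `Messages`, `fake_create`, `mock_enabled`, and `maybe_mock_messages` are hypothetical names, and `Messages` is a stand-in class rather than the real Anthropic SDK resource.

```python
import os
from contextlib import contextmanager
from unittest import mock


class Messages:
    """Hypothetical stand-in for a client resource class (not the real SDK)."""

    def create(self, *, model: str) -> str:
        raise RuntimeError("would call the real API")


def fake_create(self, *, model: str) -> str:
    # Canned reply, in the spirit of the removed mock's fixed response text.
    return f"hello from {model}"


def mock_enabled() -> bool:
    # Case-insensitive check, matching the other removed mock modules.
    return os.getenv("MOCK_SWITCH", "false").lower() == "true"


@contextmanager
def maybe_mock_messages():
    """Patch Messages.create only while the switch is on; restore on exit."""
    if not mock_enabled():
        yield
        return
    with mock.patch.object(Messages, "create", fake_create):
        yield
```

With the switch off, the context manager is a no-op and the stand-in's real method runs, which is how the removed fixtures let the same tests hit either a mock or a live endpoint.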
--- a/api/tests/integration_tests/model_runtime/__mock/fishaudio.py
+++ b/api/tests/integration_tests/model_runtime/__mock/fishaudio.py
@@ -1,82 +0,0 @@
-import os
-from collections.abc import Callable
-from typing import Literal
-
-import httpx
-import pytest
-from _pytest.monkeypatch import MonkeyPatch
-
-
-def mock_get(*args, **kwargs):
-    if kwargs.get("headers", {}).get("Authorization") != "Bearer test":
-        raise httpx.HTTPStatusError(
-            "Invalid API key",
-            request=httpx.Request("GET", ""),
-            response=httpx.Response(401),
-        )
-
-    return httpx.Response(
-        200,
-        json={
-            "items": [
-                {"title": "Model 1", "_id": "model1"},
-                {"title": "Model 2", "_id": "model2"},
-            ]
-        },
-        request=httpx.Request("GET", ""),
-    )
-
-
-def mock_stream(*args, **kwargs):
-    class MockStreamResponse:
-        def __init__(self):
-            self.status_code = 200
-
-        def __enter__(self):
-            return self
-
-        def __exit__(self, exc_type, exc_val, exc_tb):
-            pass
-
-        def iter_bytes(self):
-            yield b"Mocked audio data"
-
-    return MockStreamResponse()
-
-
-def mock_fishaudio(
-    monkeypatch: MonkeyPatch,
-    methods: list[Literal["list-models", "tts"]],
-) -> Callable[[], None]:
-    """
-    mock fishaudio module
-
-    :param monkeypatch: pytest monkeypatch fixture
-    :return: unpatch function
-    """
-
-    def unpatch() -> None:
-        monkeypatch.undo()
-
-    if "list-models" in methods:
-        monkeypatch.setattr(httpx, "get", mock_get)
-
-    if "tts" in methods:
-        monkeypatch.setattr(httpx, "stream", mock_stream)
-
-    return unpatch
-
-
-MOCK = os.getenv("MOCK_SWITCH", "false").lower() == "true"
-
-
-@pytest.fixture
-def setup_fishaudio_mock(request, monkeypatch):
-    methods = request.param if hasattr(request, "param") else []
-    if MOCK:
-        unpatch = mock_fishaudio(monkeypatch, methods=methods)
-
-    yield
-
-    if MOCK:
-        unpatch()
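
Note on the removed `mock_stream` above: it returns an object that works as a context manager and exposes `status_code` and `iter_bytes()`, which is all a streaming caller needs. A minimal sketch of how such a stand-in is consumed, assuming a hypothetical consumer function `collect_audio` (not part of the removed code) and a class `FakeStreamResponse` modeled on the removed `MockStreamResponse`:

```python
from collections.abc import Callable, Iterator


class FakeStreamResponse:
    """Stand-in modeled on the removed MockStreamResponse."""

    status_code = 200

    def __enter__(self) -> "FakeStreamResponse":
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> bool:
        # Returning False lets exceptions raised in the with-body propagate.
        return False

    def iter_bytes(self) -> Iterator[bytes]:
        yield b"Mocked audio data"


def collect_audio(open_stream: Callable[[], FakeStreamResponse]) -> bytes:
    """Hypothetical consumer: read the whole stream into one bytes object."""
    with open_stream() as response:
        if response.status_code != 200:
            raise RuntimeError(f"unexpected status {response.status_code}")
        return b"".join(response.iter_bytes())
```

Because the consumer only relies on the context-manager protocol and `iter_bytes()`, it cannot tell the stand-in from a real streaming response in a test.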
--- a/api/tests/integration_tests/model_runtime/__mock/google.py
+++ b/api/tests/integration_tests/model_runtime/__mock/google.py
@@ -1,115 +0,0 @@
-from unittest.mock import MagicMock
-
-import google.generativeai.types.generation_types as generation_config_types  # type: ignore
-import pytest
-from _pytest.monkeypatch import MonkeyPatch
-from google.ai import generativelanguage as glm
-from google.ai.generativelanguage_v1beta.types import content as gag_content
-from google.generativeai import GenerativeModel
-from google.generativeai.types import GenerateContentResponse, content_types, safety_types
-from google.generativeai.types.generation_types import BaseGenerateContentResponse
-
-from extensions import ext_redis
-
-
-class MockGoogleResponseClass:
-    _done = False
-
-    def __iter__(self):
-        full_response_text = "it's google!"
-
-        for i in range(0, len(full_response_text) + 1, 1):
-            if i == len(full_response_text):
-                self._done = True
-                yield GenerateContentResponse(
-                    done=True, iterator=None, result=glm.GenerateContentResponse({}), chunks=[]
-                )
-            else:
-                yield GenerateContentResponse(
-                    done=False, iterator=None, result=glm.GenerateContentResponse({}), chunks=[]
-                )
-
-
-class MockGoogleResponseCandidateClass:
-    finish_reason = "stop"
-
-    @property
-    def content(self) -> gag_content.Content:
-        return gag_content.Content(parts=[gag_content.Part(text="it's google!")])
-
-
-class MockGoogleClass:
-    @staticmethod
-    def generate_content_sync() -> GenerateContentResponse:
-        return GenerateContentResponse(done=True, iterator=None, result=glm.GenerateContentResponse({}), chunks=[])
-
-    @staticmethod
-    def generate_content_stream() -> MockGoogleResponseClass:
-        return MockGoogleResponseClass()
-
-    def generate_content(
-        self: GenerativeModel,
-        contents: content_types.ContentsType,
-        *,
-        generation_config: generation_config_types.GenerationConfigType | None = None,
-        safety_settings: safety_types.SafetySettingOptions | None = None,
-        stream: bool = False,
-        **kwargs,
-    ) -> GenerateContentResponse:
-        if stream:
-            return MockGoogleClass.generate_content_stream()
-
-        return MockGoogleClass.generate_content_sync()
-
-    @property
-    def generative_response_text(self) -> str:
-        return "it's google!"
-
-    @property
-    def generative_response_candidates(self) -> list[MockGoogleResponseCandidateClass]:
-        return [MockGoogleResponseCandidateClass()]
-
-
-def mock_configure(api_key: str):
-    if len(api_key) < 16:
-        raise Exception("Invalid API key")
-
-
-class MockFileState:
-    def __init__(self):
-        self.name = "FINISHED"
-
-
-class MockGoogleFile:
-    def __init__(self, name: str = "mock_file_name"):
-        self.name = name
-        self.state = MockFileState()
-
-
-def mock_get_file(name: str) -> MockGoogleFile:
-    return MockGoogleFile(name)
-
-
-def mock_upload_file(path: str, mime_type: str) -> MockGoogleFile:
-    return MockGoogleFile()
-
-
-@pytest.fixture
-def setup_google_mock(request, monkeypatch: MonkeyPatch):
-    monkeypatch.setattr(BaseGenerateContentResponse, "text", MockGoogleClass.generative_response_text)
-    monkeypatch.setattr(BaseGenerateContentResponse, "candidates", MockGoogleClass.generative_response_candidates)
-    monkeypatch.setattr(GenerativeModel, "generate_content", MockGoogleClass.generate_content)
-    monkeypatch.setattr("google.generativeai.configure", mock_configure)
-    monkeypatch.setattr("google.generativeai.get_file", mock_get_file)
-    monkeypatch.setattr("google.generativeai.upload_file", mock_upload_file)
-
-    yield
-
-    monkeypatch.undo()
-
-
-@pytest.fixture
-def setup_mock_redis() -> None:
-    ext_redis.redis_client.get = MagicMock(return_value=None)
-    ext_redis.redis_client.setex = MagicMock(return_value=None)
-    ext_redis.redis_client.exists = MagicMock(return_value=True)
--- a/api/tests/integration_tests/model_runtime/__mock/huggingface.py
+++ b/api/tests/integration_tests/model_runtime/__mock/huggingface.py
@@ -1,20 +0,0 @@
-import os
-
-import pytest
-from _pytest.monkeypatch import MonkeyPatch
-from huggingface_hub import InferenceClient  # type: ignore
-
-from tests.integration_tests.model_runtime.__mock.huggingface_chat import MockHuggingfaceChatClass
-
-MOCK = os.getenv("MOCK_SWITCH", "false").lower() == "true"
-
-
-@pytest.fixture
-def setup_huggingface_mock(request, monkeypatch: MonkeyPatch):
-    if MOCK:
-        monkeypatch.setattr(InferenceClient, "text_generation", MockHuggingfaceChatClass.text_generation)
-
-    yield
-
-    if MOCK:
-        monkeypatch.undo()
--- a/api/tests/integration_tests/model_runtime/__mock/huggingface_chat.py
+++ b/api/tests/integration_tests/model_runtime/__mock/huggingface_chat.py
@@ -1,56 +0,0 @@
-import re
-from collections.abc import Generator
-from typing import Any, Literal, Optional, Union
-
-from _pytest.monkeypatch import MonkeyPatch
-from huggingface_hub import InferenceClient  # type: ignore
-from huggingface_hub.inference._text_generation import (  # type: ignore
-    Details,
-    StreamDetails,
-    TextGenerationResponse,
-    TextGenerationStreamResponse,
-    Token,
-)
-from huggingface_hub.utils import BadRequestError  # type: ignore
-
-
-class MockHuggingfaceChatClass:
-    @staticmethod
-    def generate_create_sync(model: str) -> TextGenerationResponse:
-        response = TextGenerationResponse(
-            generated_text="You can call me Miku Miku o~e~o~",
-            details=Details(
-                finish_reason="length",
-                generated_tokens=6,
-                tokens=[Token(id=0, text="You", logprob=0.0, special=False) for i in range(0, 6)],
-            ),
-        )
-
-        return response
-
-    @staticmethod
-    def generate_create_stream(model: str) -> Generator[TextGenerationStreamResponse, None, None]:
-        full_text = "You can call me Miku Miku o~e~o~"
-
-        for i in range(0, len(full_text)):
-            response = TextGenerationStreamResponse(
-                token=Token(id=i, text=full_text[i], logprob=0.0, special=False),
-            )
-            response.generated_text = full_text[i]
-            response.details = StreamDetails(finish_reason="stop_sequence", generated_tokens=1)
-
-            yield response
-
-    def text_generation(
-        self: InferenceClient, prompt: str, *, stream: Literal[False] = ..., model: Optional[str] = None, **kwargs: Any
-    ) -> Union[TextGenerationResponse, Generator[TextGenerationStreamResponse, None, None]]:
-        # check if key is valid
-        if not re.match(r"Bearer\shf\-[a-zA-Z0-9]{16,}", self.headers["authorization"]):
-            raise BadRequestError("Invalid API key")
-
-        if model is None:
-            raise BadRequestError("Invalid model")
-
-        if stream:
-            return MockHuggingfaceChatClass.generate_create_stream(model)
-        return MockHuggingfaceChatClass.generate_create_sync(model)
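
Note on the key check in the removed `text_generation` above: `re.match` anchors only at the start of the string, so the pattern accepts any header that begins with `Bearer hf-` followed by at least 16 alphanumeric characters, whatever comes after. A small sketch of that behavior, where `looks_valid` is a hypothetical helper name and the pattern is copied from the removed code:

```python
import re

# Pattern copied from the removed mock; it describes the mock's own notion
# of a well-formed key, not necessarily the real service's token format.
KEY_PATTERN = re.compile(r"Bearer\shf\-[a-zA-Z0-9]{16,}")


def looks_valid(authorization_header: str) -> bool:
    """Return True if the header passes the mock's prefix check."""
    return KEY_PATTERN.match(authorization_header) is not None
```

If a stricter check were wanted, `KEY_PATTERN.fullmatch` would also reject trailing characters; the removed mock did not do this.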
--- a/api/tests/integration_tests/model_runtime/__mock/huggingface_tei.py
+++ b/api/tests/integration_tests/model_runtime/__mock/huggingface_tei.py
@@ -1,94 +0,0 @@
-from core.model_runtime.model_providers.huggingface_tei.tei_helper import TeiModelExtraParameter
-
-
-class MockTEIClass:
-    @staticmethod
-    def get_tei_extra_parameter(server_url: str, model_name: str) -> TeiModelExtraParameter:
-        # During mock, we don't have a real server to query, so we just return a dummy value
-        if "rerank" in model_name:
-            model_type = "reranker"
-        else:
-            model_type = "embedding"
-
-        return TeiModelExtraParameter(model_type=model_type, max_input_length=512, max_client_batch_size=1)
-
-    @staticmethod
-    def invoke_tokenize(server_url: str, texts: list[str]) -> list[list[dict]]:
-        # Use space as token separator, and split the text into tokens
-        tokenized_texts = []
-        for text in texts:
-            tokens = text.split(" ")
-            current_index = 0
-            tokenized_text = []
-            for idx, token in enumerate(tokens):
-                s_token = {
-                    "id": idx,
-                    "text": token,
-                    "special": False,
-                    "start": current_index,
-                    "stop": current_index + len(token),
-                }
-                current_index += len(token) + 1
-                tokenized_text.append(s_token)
-            tokenized_texts.append(tokenized_text)
-        return tokenized_texts
-
-    @staticmethod
-    def invoke_embeddings(server_url: str, texts: list[str]) -> dict:
-        # {
-        #     "object": "list",
-        #     "data": [
-        #         {
-        #             "object": "embedding",
-        #             "embedding": [...],
-        #             "index": 0
-        #         }
-        #     ],
-        #     "model": "MODEL_NAME",
-        #     "usage": {
-        #         "prompt_tokens": 3,
-        #         "total_tokens": 3
-        #     }
-        # }
-        embeddings = []
-        for idx in range(len(texts)):
-            embedding = [0.1] * 768
-            embeddings.append(
-                {
-                    "object": "embedding",
-                    "embedding": embedding,
-                    "index": idx,
-                }
-            )
-        return {
-            "object": "list",
-            "data": embeddings,
-            "model": "MODEL_NAME",
-            "usage": {
-                "prompt_tokens": sum(len(text.split(" ")) for text in texts),
-                "total_tokens": sum(len(text.split(" ")) for text in texts),
-            },
-        }
-
-    @staticmethod
-    def invoke_rerank(server_url: str, query: str, texts: list[str]) -> list[dict]:
-        #         Example response:
-        # [
-        #     {
-        #         "index": 0,
-        #         "text": "Deep Learning is ...",
-        #         "score": 0.9950755
-        #     }
-        # ]
-        reranked_docs = []
-        for idx, text in enumerate(texts):
-            reranked_docs.append(
-                {
-                    "index": idx,
-                    "text": text,
-                    "score": 0.9,
-                }
-            )
-            # For mock, only return the first document
-            break
-        return reranked_docs
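
Note on the removed `invoke_tokenize` above: it splits on single spaces and tracks character offsets by advancing past each token plus one separator. The same bookkeeping for a single string can be sketched as follows, where `whitespace_tokenize` is a hypothetical name and the dictionary keys are the ones used in the removed code:

```python
def whitespace_tokenize(text: str) -> list[dict]:
    """Split on single spaces and record each token's character span."""
    tokens = []
    offset = 0
    for idx, piece in enumerate(text.split(" ")):
        tokens.append(
            {
                "id": idx,
                "text": piece,
                "special": False,
                "start": offset,
                "stop": offset + len(piece),
            }
        )
        # Skip the token and the single space that followed it.
        offset += len(piece) + 1
    return tokens
```

For every token the invariant `text[token["start"]:token["stop"]] == token["text"]` holds, including the empty tokens produced by consecutive spaces.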
--- a/api/tests/integration_tests/model_runtime/__mock/nomic_embeddings.py
+++ b/api/tests/integration_tests/model_runtime/__mock/nomic_embeddings.py
@@ -1,59 +0,0 @@
-import os
-from collections.abc import Callable
-from typing import Any, Literal
-
-import pytest
-
-# import monkeypatch
-from _pytest.monkeypatch import MonkeyPatch
-from nomic import embed  # type: ignore
-
-
-def create_embedding(texts: list[str], model: str, **kwargs: Any) -> dict:
-    texts_len = len(texts)
-
-    foo_embedding_sample = 0.123456
-
-    combined = {
-        "embeddings": [[foo_embedding_sample for _ in range(768)] for _ in range(texts_len)],
-        "usage": {"prompt_tokens": texts_len, "total_tokens": texts_len},
-        "model": model,
-        "inference_mode": "remote",
-    }
-
-    return combined
-
-
-def mock_nomic(
-    monkeypatch: MonkeyPatch,
-    methods: list[Literal["text_embedding"]],
-) -> Callable[[], None]:
-    """
-    mock nomic module
-
-    :param monkeypatch: pytest monkeypatch fixture
-    :return: unpatch function
-    """
-
-    def unpatch() -> None:
-        monkeypatch.undo()
-
-    if "text_embedding" in methods:
-        monkeypatch.setattr(embed, "text", create_embedding)
-
-    return unpatch
-
-
-MOCK = os.getenv("MOCK_SWITCH", "false").lower() == "true"
-
-
-@pytest.fixture
-def setup_nomic_mock(request, monkeypatch):
-    methods = request.param if hasattr(request, "param") else []
-    if MOCK:
-        unpatch = mock_nomic(monkeypatch, methods=methods)
-
-    yield
-
-    if MOCK:
-        unpatch()
--- a/api/tests/integration_tests/model_runtime/__mock/openai.py
+++ b/api/tests/integration_tests/model_runtime/__mock/openai.py
@@ -1,71 +0,0 @@
-import os
-from collections.abc import Callable
-from typing import Literal
-
-import pytest
-
-# import monkeypatch
-from _pytest.monkeypatch import MonkeyPatch
-from openai.resources.audio.transcriptions import Transcriptions
-from openai.resources.chat import Completions as ChatCompletions
-from openai.resources.completions import Completions
-from openai.resources.embeddings import Embeddings
-from openai.resources.models import Models
-from openai.resources.moderations import Moderations
-
-from tests.integration_tests.model_runtime.__mock.openai_chat import MockChatClass
-from tests.integration_tests.model_runtime.__mock.openai_completion import MockCompletionsClass
-from tests.integration_tests.model_runtime.__mock.openai_embeddings import MockEmbeddingsClass
-from tests.integration_tests.model_runtime.__mock.openai_moderation import MockModerationClass
-from tests.integration_tests.model_runtime.__mock.openai_remote import MockModelClass
-from tests.integration_tests.model_runtime.__mock.openai_speech2text import MockSpeech2TextClass
-
-
-def mock_openai(
-    monkeypatch: MonkeyPatch,
-    methods: list[Literal["completion", "chat", "remote", "moderation", "speech2text", "text_embedding"]],
-) -> Callable[[], None]:
-    """
-    mock openai module
-
-    :param monkeypatch: pytest monkeypatch fixture
-    :return: unpatch function
-    """
-
-    def unpatch() -> None:
-        monkeypatch.undo()
-
-    if "completion" in methods:
-        monkeypatch.setattr(Completions, "create", MockCompletionsClass.completion_create)
-
-    if "chat" in methods:
-        monkeypatch.setattr(ChatCompletions, "create", MockChatClass.chat_create)
-
-    if "remote" in methods:
-        monkeypatch.setattr(Models, "list", MockModelClass.list)
-
-    if "moderation" in methods:
-        monkeypatch.setattr(Moderations, "create", MockModerationClass.moderation_create)
-
-    if "speech2text" in methods:
-        monkeypatch.setattr(Transcriptions, "create", MockSpeech2TextClass.speech2text_create)
-
-    if "text_embedding" in methods:
-        monkeypatch.setattr(Embeddings, "create", MockEmbeddingsClass.create_embeddings)
-
-    return unpatch
-
-
-MOCK = os.getenv("MOCK_SWITCH", "false").lower() == "true"
-
-
-@pytest.fixture
-def setup_openai_mock(request, monkeypatch):
-    methods = request.param if hasattr(request, "param") else []
-    if MOCK:
-        unpatch = mock_openai(monkeypatch, methods=methods)
-
-    yield
-
-    if MOCK:
-        unpatch()
--- a/api/tests/integration_tests/model_runtime/__mock/openai_chat.py
+++ b/api/tests/integration_tests/model_runtime/__mock/openai_chat.py
@@ -1,267 +0,0 @@
-import re
-from collections.abc import Generator
-from json import dumps
-from time import time
-
-# import monkeypatch
-from typing import Any, Literal, Optional, Union
-
-from openai import AzureOpenAI, OpenAI
-from openai._types import NOT_GIVEN, NotGiven
-from openai.resources.chat.completions import Completions
-from openai.types import Completion as CompletionMessage
-from openai.types.chat import (
-    ChatCompletionChunk,
-    ChatCompletionMessageParam,
-    ChatCompletionMessageToolCall,
-    ChatCompletionToolParam,
-    completion_create_params,
-)
-from openai.types.chat.chat_completion import ChatCompletion as _ChatCompletion
-from openai.types.chat.chat_completion import Choice as _ChatCompletionChoice
-from openai.types.chat.chat_completion_chunk import (
-    Choice,
-    ChoiceDelta,
-    ChoiceDeltaFunctionCall,
-    ChoiceDeltaToolCall,
-    ChoiceDeltaToolCallFunction,
-)
-from openai.types.chat.chat_completion_message import ChatCompletionMessage, FunctionCall
-from openai.types.chat.chat_completion_message_tool_call import Function
-from openai.types.completion_usage import CompletionUsage
-
-from core.model_runtime.errors.invoke import InvokeAuthorizationError
-
-
-class MockChatClass:
-    @staticmethod
-    def generate_function_call(
-        functions: list[completion_create_params.Function] | NotGiven = NOT_GIVEN,
-    ) -> Optional[FunctionCall]:
-        if not functions or len(functions) == 0:
-            return None
-        function: completion_create_params.Function = functions[0]
-        function_name = function["name"]
-        function_description = function["description"]
-        function_parameters = function["parameters"]
-        function_parameters_type = function_parameters["type"]
-        if function_parameters_type != "object":
-            return None
-        function_parameters_properties = function_parameters["properties"]
-        function_parameters_required = function_parameters["required"]
-        parameters = {}
-        for parameter_name, parameter in function_parameters_properties.items():
-            if parameter_name not in function_parameters_required:
-                continue
-            parameter_type = parameter["type"]
-            if parameter_type == "string":
-                if "enum" in parameter:
-                    if len(parameter["enum"]) == 0:
-                        continue
-                    parameters[parameter_name] = parameter["enum"][0]
-                else:
-                    parameters[parameter_name] = "kawaii"
-            elif parameter_type == "integer":
-                parameters[parameter_name] = 114514
-            elif parameter_type == "number":
-                parameters[parameter_name] = 1919810.0
-            elif parameter_type == "boolean":
-                parameters[parameter_name] = True
-
-        return FunctionCall(name=function_name, arguments=dumps(parameters))
-
-    @staticmethod
-    def generate_tool_calls(tools=NOT_GIVEN) -> Optional[list[ChatCompletionMessageToolCall]]:
-        list_tool_calls = []
-        if not tools or len(tools) == 0:
-            return None
-        tool = tools[0]
-
-        if "type" in tools and tools["type"] != "function":
-            return None
-
-        function = tool["function"]
-
-        function_call = MockChatClass.generate_function_call(functions=[function])
-        if function_call is None:
-            return None
-
-        list_tool_calls.append(
-            ChatCompletionMessageToolCall(
-                id="sakurajima-mai",
-                function=Function(
-                    name=function_call.name,
-                    arguments=function_call.arguments,
-                ),
-                type="function",
-            )
-        )
-
-        return list_tool_calls
-
-    @staticmethod
-    def mocked_openai_chat_create_sync(
-        model: str,
-        functions: list[completion_create_params.Function] | NotGiven = NOT_GIVEN,
-        tools: list[ChatCompletionToolParam] | NotGiven = NOT_GIVEN,
-    ) -> CompletionMessage:
-        tool_calls = []
-        function_call = MockChatClass.generate_function_call(functions=functions)
-        if not function_call:
-            tool_calls = MockChatClass.generate_tool_calls(tools=tools)
-
-        return _ChatCompletion(
-            id="cmpl-3QJQa5jXJ5Z5X",
-            choices=[
-                _ChatCompletionChoice(
-                    finish_reason="content_filter",
-                    index=0,
-                    message=ChatCompletionMessage(
-                        content="elaina", role="assistant", function_call=function_call, tool_calls=tool_calls
-                    ),
-                )
-            ],
-            created=int(time()),
-            model=model,
-            object="chat.completion",
-            system_fingerprint="",
-            usage=CompletionUsage(
-                prompt_tokens=2,
-                completion_tokens=1,
-                total_tokens=3,
-            ),
-        )
-
-    @staticmethod
-    def mocked_openai_chat_create_stream(
-        model: str,
-        functions: list[completion_create_params.Function] | NotGiven = NOT_GIVEN,
-        tools: list[ChatCompletionToolParam] | NotGiven = NOT_GIVEN,
-    ) -> Generator[ChatCompletionChunk, None, None]:
-        tool_calls = []
-        function_call = MockChatClass.generate_function_call(functions=functions)
-        if not function_call:
-            tool_calls = MockChatClass.generate_tool_calls(tools=tools)
-
-        full_text = "Hello, world!\n\n```python\nprint('Hello, world!')\n```"
-        for i in range(0, len(full_text) + 1):
-            if i == len(full_text):
-                yield ChatCompletionChunk(
-                    id="cmpl-3QJQa5jXJ5Z5X",
-                    choices=[
-                        Choice(
-                            delta=ChoiceDelta(
-                                content="",
-                                function_call=ChoiceDeltaFunctionCall(
-                                    name=function_call.name,
-                                    arguments=function_call.arguments,
-                                )
-                                if function_call
-                                else None,
-                                role="assistant",
-                                tool_calls=[
-                                    ChoiceDeltaToolCall(
-                                        index=0,
-                                        id="misaka-mikoto",
-                                        function=ChoiceDeltaToolCallFunction(
-                                            name=tool_calls[0].function.name,
-                                            arguments=tool_calls[0].function.arguments,
-                                        ),
-                                        type="function",
-                                    )
-                                ]
-                                if tool_calls and len(tool_calls) > 0
-                                else None,
-                            ),
-                            finish_reason="function_call",
-                            index=0,
-                        )
-                    ],
-                    created=int(time()),
-                    model=model,
-                    object="chat.completion.chunk",
-                    system_fingerprint="",
-                    usage=CompletionUsage(
-                        prompt_tokens=2,
-                        completion_tokens=17,
-                        total_tokens=19,
-                    ),
-                )
-            else:
-                yield ChatCompletionChunk(
-                    id="cmpl-3QJQa5jXJ5Z5X",
-                    choices=[
-                        Choice(
-                            delta=ChoiceDelta(
-                                content=full_text[i],
-                                role="assistant",
-                            ),
-                            finish_reason="content_filter",
-                            index=0,
-                        )
-                    ],
-                    created=int(time()),
-                    model=model,
-                    object="chat.completion.chunk",
-                    system_fingerprint="",
-                )
-
-    def chat_create(
-        self: Completions,
-        *,
-        messages: list[ChatCompletionMessageParam],
-        model: Union[
-            str,
-            Literal[
-                "gpt-4-1106-preview",
-                "gpt-4-vision-preview",
-                "gpt-4",
-                "gpt-4-0314",
-                "gpt-4-0613",
-                "gpt-4-32k",
-                "gpt-4-32k-0314",
-                "gpt-4-32k-0613",
-                "gpt-3.5-turbo-1106",
-                "gpt-3.5-turbo",
-                "gpt-3.5-turbo-16k",
-                "gpt-3.5-turbo-0301",
-                "gpt-3.5-turbo-0613",
-                "gpt-3.5-turbo-16k-0613",
-            ],
-        ],
-        functions: list[completion_create_params.Function] | NotGiven = NOT_GIVEN,
-        response_format: completion_create_params.ResponseFormat | NotGiven = NOT_GIVEN,
-        stream: Optional[Literal[False]] | NotGiven = NOT_GIVEN,
-        tools: list[ChatCompletionToolParam] | NotGiven = NOT_GIVEN,
-        **kwargs: Any,
-    ):
-        openai_models = [
-            "gpt-4-1106-preview",
-            "gpt-4-vision-preview",
-            "gpt-4",
-            "gpt-4-0314",
-            "gpt-4-0613",
-            "gpt-4-32k",
-            "gpt-4-32k-0314",
-            "gpt-4-32k-0613",
-            "gpt-3.5-turbo-1106",
-            "gpt-3.5-turbo",
-            "gpt-3.5-turbo-16k",
-            "gpt-3.5-turbo-0301",
-            "gpt-3.5-turbo-0613",
-            "gpt-3.5-turbo-16k-0613",
-        ]
-        azure_openai_models = ["gpt35", "gpt-4v", "gpt-35-turbo"]
-        if not re.match(r"^(https?):\/\/[^\s\/$.?#].[^\s]*$", str(self._client.base_url)):
-            raise InvokeAuthorizationError("Invalid base url")
-        if model in openai_models + azure_openai_models:
-            if not re.match(r"sk-[a-zA-Z0-9]{24,}$", self._client.api_key) and type(self._client) == OpenAI:
-                # sometime, provider use OpenAI compatible API will not have api key or have different api key format
-                # so we only check if model is in openai_models
-                raise InvokeAuthorizationError("Invalid api key")
-            if len(self._client.api_key) < 18 and type(self._client) == AzureOpenAI:
-                raise InvokeAuthorizationError("Invalid api key")
-        if stream:
-            return MockChatClass.mocked_openai_chat_create_stream(model=model, functions=functions, tools=tools)
-
-        return MockChatClass.mocked_openai_chat_create_sync(model=model, functions=functions, tools=tools)
--- a/api/tests/integration_tests/model_runtime/__mock/openai_completion.py
+++ b/api/tests/integration_tests/model_runtime/__mock/openai_completion.py
@@ -1,130 +0,0 @@
-import re
-from collections.abc import Generator
-from time import time
-
-# import monkeypatch
-from typing import Any, Literal, Optional, Union
-
-from openai import AzureOpenAI, BadRequestError, OpenAI
-from openai._types import NOT_GIVEN, NotGiven
-from openai.resources.completions import Completions
-from openai.types import Completion as CompletionMessage
-from openai.types.completion import CompletionChoice
-from openai.types.completion_usage import CompletionUsage
-
-from core.model_runtime.errors.invoke import InvokeAuthorizationError
-
-
-class MockCompletionsClass:
-    @staticmethod
-    def mocked_openai_completion_create_sync(model: str) -> CompletionMessage:
-        return CompletionMessage(
-            id="cmpl-3QJQa5jXJ5Z5X",
-            object="text_completion",
-            created=int(time()),
-            model=model,
-            system_fingerprint="",
-            choices=[
-                CompletionChoice(
-                    text="mock",
-                    index=0,
-                    logprobs=None,
-                    finish_reason="stop",
-                )
-            ],
-            usage=CompletionUsage(
-                prompt_tokens=2,
-                completion_tokens=1,
-                total_tokens=3,
-            ),
-        )
-
-    @staticmethod
-    def mocked_openai_completion_create_stream(model: str) -> Generator[CompletionMessage, None, None]:
-        full_text = "Hello, world!\n\n```python\nprint('Hello, world!')\n```"
-        for i in range(0, len(full_text) + 1):
-            if i == len(full_text):
-                yield CompletionMessage(
-                    id="cmpl-3QJQa5jXJ5Z5X",
-                    object="text_completion",
-                    created=int(time()),
-                    model=model,
-                    system_fingerprint="",
-                    choices=[
-                        CompletionChoice(
-                            text="",
-                            index=0,
-                            logprobs=None,
-                            finish_reason="stop",
-                        )
-                    ],
-                    usage=CompletionUsage(
-                        prompt_tokens=2,
-                        completion_tokens=17,
-                        total_tokens=19,
-                    ),
-                )
-            else:
-                yield CompletionMessage(
-                    id="cmpl-3QJQa5jXJ5Z5X",
-                    object="text_completion",
-                    created=int(time()),
-                    model=model,
-                    system_fingerprint="",
-                    choices=[
-                        CompletionChoice(text=full_text[i], index=0, logprobs=None, finish_reason="content_filter")
-                    ],
-                )
-
-    def completion_create(
-        self: Completions,
-        *,
-        model: Union[
-            str,
-            Literal[
-                "babbage-002",
-                "davinci-002",
-                "gpt-3.5-turbo-instruct",
-                "text-davinci-003",
-                "text-davinci-002",
-                "text-davinci-001",
-                "code-davinci-002",
-                "text-curie-001",
-                "text-babbage-001",
-                "text-ada-001",
-            ],
-        ],
-        prompt: Union[str, list[str], list[int], list[list[int]], None],
-        stream: Optional[Literal[False]] | NotGiven = NOT_GIVEN,
-        **kwargs: Any,
-    ):
-        openai_models = [
-            "babbage-002",
-            "davinci-002",
-            "gpt-3.5-turbo-instruct",
-            "text-davinci-003",
-            "text-davinci-002",
-            "text-davinci-001",
-            "code-davinci-002",
-            "text-curie-001",
-            "text-babbage-001",
-            "text-ada-001",
-        ]
-        azure_openai_models = ["gpt-35-turbo-instruct"]
-
-        if not re.match(r"^(https?):\/\/[^\s\/$.?#].[^\s]*$", str(self._client.base_url)):
-            raise InvokeAuthorizationError("Invalid base url")
-        if model in openai_models + azure_openai_models:
-            if not re.match(r"sk-[a-zA-Z0-9]{24,}$", self._client.api_key) and type(self._client) == OpenAI:
-                # sometime, provider use OpenAI compatible API will not have api key or have different api key format
-                # so we only check if model is in openai_models
-                raise InvokeAuthorizationError("Invalid api key")
-            if len(self._client.api_key) < 18 and type(self._client) == AzureOpenAI:
-                raise InvokeAuthorizationError("Invalid api key")
-
-        if not prompt:
-            raise BadRequestError("Invalid prompt")
-        if stream:
-            return MockCompletionsClass.mocked_openai_completion_create_stream(model=model)
-
-        return MockCompletionsClass.mocked_openai_completion_create_sync(model=model)
--- a/api/tests/integration_tests/model_runtime/__mock/openai_embeddings.py
+++ b/api/tests/integration_tests/model_runtime/__mock/openai_embeddings.py
--- a/api/tests/integration_tests/model_runtime/__mock/openai_moderation.py
+++ b/api/tests/integration_tests/model_runtime/__mock/openai_moderation.py
@@ -1,140 +0,0 @@
-import re
-from typing import Any, Literal, Union
-
-from openai._types import NOT_GIVEN, NotGiven
-from openai.resources.moderations import Moderations
-from openai.types import ModerationCreateResponse
-from openai.types.moderation import Categories, CategoryScores, Moderation
-
-from core.model_runtime.errors.invoke import InvokeAuthorizationError
-
-
-class MockModerationClass:
-    def moderation_create(
-        self: Moderations,
-        *,
-        input: Union[str, list[str]],
-        model: Union[str, Literal["text-moderation-latest", "text-moderation-stable"]] | NotGiven = NOT_GIVEN,
-        **kwargs: Any,
-    ) -> ModerationCreateResponse:
-        if isinstance(input, str):
-            input = [input]
-
-        if not re.match(r"^(https?):\/\/[^\s\/$.?#].[^\s]*$", str(self._client.base_url)):
-            raise InvokeAuthorizationError("Invalid base url")
-
-        if len(self._client.api_key) < 18:
-            raise InvokeAuthorizationError("Invalid API key")
-
-        for text in input:
-            result = []
-            if "kill" in text:
-                moderation_categories = {
-                    "harassment": False,
-                    "harassment/threatening": False,
-                    "hate": False,
-                    "hate/threatening": False,
-                    "self-harm": False,
-                    "self-harm/instructions": False,
-                    "self-harm/intent": False,
-                    "sexual": False,
-                    "sexual/minors": False,
-                    "violence": False,
-                    "violence/graphic": False,
-                    "illicit": False,
-                    "illicit/violent": False,
-                }
-                moderation_categories_scores = {
-                    "harassment": 1.0,
-                    "harassment/threatening": 1.0,
-                    "hate": 1.0,
-                    "hate/threatening": 1.0,
-                    "self-harm": 1.0,
-                    "self-harm/instructions": 1.0,
-                    "self-harm/intent": 1.0,
-                    "sexual": 1.0,
-                    "sexual/minors": 1.0,
-                    "violence": 1.0,
-                    "violence/graphic": 1.0,
-                    "illicit": 1.0,
-                    "illicit/violent": 1.0,
-                }
-                category_applied_input_types = {
-                    "sexual": ["text", "image"],
-                    "hate": ["text"],
-                    "harassment": ["text"],
-                    "self-harm": ["text", "image"],
-                    "sexual/minors": ["text"],
-                    "hate/threatening": ["text"],
-                    "violence/graphic": ["text", "image"],
-                    "self-harm/intent": ["text", "image"],
-                    "self-harm/instructions": ["text", "image"],
-                    "harassment/threatening": ["text"],
-                    "violence": ["text", "image"],
-                    "illicit": ["text"],
-                    "illicit/violent": ["text"],
-                }
-                result.append(
-                    Moderation(
-                        flagged=True,
-                        categories=Categories(**moderation_categories),
-                        category_scores=CategoryScores(**moderation_categories_scores),
-                        category_applied_input_types=category_applied_input_types,
-                    )
-                )
-            else:
-                moderation_categories = {
-                    "harassment": False,
-                    "harassment/threatening": False,
-                    "hate": False,
-                    "hate/threatening": False,
-                    "self-harm": False,
-                    "self-harm/instructions": False,
-                    "self-harm/intent": False,
-                    "sexual": False,
-                    "sexual/minors": False,
-                    "violence": False,
-                    "violence/graphic": False,
-                    "illicit": False,
-                    "illicit/violent": False,
-                }
-                moderation_categories_scores = {
-                    "harassment": 0.0,
-                    "harassment/threatening": 0.0,
-                    "hate": 0.0,
-                    "hate/threatening": 0.0,
-                    "self-harm": 0.0,
-                    "self-harm/instructions": 0.0,
-                    "self-harm/intent": 0.0,
-                    "sexual": 0.0,
-                    "sexual/minors": 0.0,
-                    "violence": 0.0,
-                    "violence/graphic": 0.0,
-                    "illicit": 0.0,
-                    "illicit/violent": 0.0,
-                }
-                category_applied_input_types = {
-                    "sexual": ["text", "image"],
-                    "hate": ["text"],
-                    "harassment": ["text"],
-                    "self-harm": ["text", "image"],
-                    "sexual/minors": ["text"],
-                    "hate/threatening": ["text"],
-                    "violence/graphic": ["text", "image"],
-                    "self-harm/intent": ["text", "image"],
-                    "self-harm/instructions": ["text", "image"],
-                    "harassment/threatening": ["text"],
-                    "violence": ["text", "image"],
-                    "illicit": ["text"],
-                    "illicit/violent": ["text"],
-                }
-                result.append(
-                    Moderation(
-                        flagged=False,
-                        categories=Categories(**moderation_categories),
-                        category_scores=CategoryScores(**moderation_categories_scores),
-                        category_applied_input_types=category_applied_input_types,
-                    )
-                )
-
-        return ModerationCreateResponse(id="shiroii kuloko", model=model, results=result)
--- a/api/tests/integration_tests/model_runtime/__mock/openai_remote.py
+++ b/api/tests/integration_tests/model_runtime/__mock/openai_remote.py
@@ -1,22 +0,0 @@
-from time import time
-
-from openai.types.model import Model
-
-
-class MockModelClass:
-    """
-    mock class for openai.models.Models
-    """
-
-    def list(
-        self,
-        **kwargs,
-    ) -> list[Model]:
-        return [
-            Model(
-                id="ft:gpt-3.5-turbo-0613:personal::8GYJLPDQ",
-                created=int(time()),
-                object="model",
-                owned_by="organization:org-123",
-            )
-        ]
--- a/api/tests/integration_tests/model_runtime/__mock/openai_speech2text.py
+++ b/api/tests/integration_tests/model_runtime/__mock/openai_speech2text.py
@@ -1,29 +0,0 @@
-import re
-from typing import Any, Literal, Union
-
-from openai._types import NOT_GIVEN, FileTypes, NotGiven
-from openai.resources.audio.transcriptions import Transcriptions
-from openai.types.audio.transcription import Transcription
-
-from core.model_runtime.errors.invoke import InvokeAuthorizationError
-
-
-class MockSpeech2TextClass:
-    def speech2text_create(
-        self: Transcriptions,
-        *,
-        file: FileTypes,
-        model: Union[str, Literal["whisper-1"]],
-        language: str | NotGiven = NOT_GIVEN,
-        prompt: str | NotGiven = NOT_GIVEN,
-        response_format: Literal["json", "text", "srt", "verbose_json", "vtt"] | NotGiven = NOT_GIVEN,
-        temperature: float | NotGiven = NOT_GIVEN,
-        **kwargs: Any,
-    ) -> Transcription:
-        if not re.match(r"^(https?):\/\/[^\s\/$.?#].[^\s]*$", str(self._client.base_url)):
-            raise InvokeAuthorizationError("Invalid base url")
-
-        if len(self._client.api_key) < 18:
-            raise InvokeAuthorizationError("Invalid API key")
-
-        return Transcription(text="1, 2, 3, 4, 5, 6, 7, 8, 9, 10")
--- a/api/tests/integration_tests/model_runtime/__mock/plugin_daemon.py
+++ b/api/tests/integration_tests/model_runtime/__mock/plugin_daemon.py
@@ -0,0 +1,44 @@
+import os
+from collections.abc import Callable
+
+import pytest
+
+# import monkeypatch
+from _pytest.monkeypatch import MonkeyPatch
+
+from core.plugin.manager.model import PluginModelManager
+from tests.integration_tests.model_runtime.__mock.plugin_model import MockModelClass
+
+
+def mock_plugin_daemon(
+    monkeypatch: MonkeyPatch,
+) -> Callable[[], None]:
+    """
+    mock the plugin daemon by patching PluginModelManager methods
+
+    :param monkeypatch: pytest monkeypatch fixture
+    :return: unpatch function
+    """
+
+    def unpatch() -> None:
+        monkeypatch.undo()
+
+    monkeypatch.setattr(PluginModelManager, "invoke_llm", MockModelClass.invoke_llm)
+    monkeypatch.setattr(PluginModelManager, "fetch_model_providers", MockModelClass.fetch_model_providers)
+    monkeypatch.setattr(PluginModelManager, "get_model_schema", MockModelClass.get_model_schema)
+
+    return unpatch
+
+
+MOCK = os.getenv("MOCK_SWITCH", "false").lower() == "true"
+
+
+@pytest.fixture
+def setup_model_mock(monkeypatch):
+    if MOCK:
+        unpatch = mock_plugin_daemon(monkeypatch)
+
+    yield
+
+    if MOCK:
+        unpatch()
--- a/api/tests/integration_tests/model_runtime/__mock/plugin_model.py
+++ b/api/tests/integration_tests/model_runtime/__mock/plugin_model.py
@@ -0,0 +1,249 @@
+import datetime
+import uuid
+from collections.abc import Generator, Sequence
+from decimal import Decimal
+from json import dumps
+
+# import monkeypatch
+from typing import Optional
+
+from core.model_runtime.entities.common_entities import I18nObject
+from core.model_runtime.entities.llm_entities import LLMMode, LLMResult, LLMResultChunk, LLMResultChunkDelta, LLMUsage
+from core.model_runtime.entities.message_entities import AssistantPromptMessage, PromptMessage, PromptMessageTool
+from core.model_runtime.entities.model_entities import (
+    AIModelEntity,
+    FetchFrom,
+    ModelFeature,
+    ModelPropertyKey,
+    ModelType,
+)
+from core.model_runtime.entities.provider_entities import ConfigurateMethod, ProviderEntity
+from core.plugin.entities.plugin_daemon import PluginModelProviderEntity
+from core.plugin.manager.model import PluginModelManager
+
+
+class MockModelClass(PluginModelManager):
+    def fetch_model_providers(self, tenant_id: str) -> Sequence[PluginModelProviderEntity]:
+        """
+        Fetch model providers for the given tenant.
+        """
+        return [
+            PluginModelProviderEntity(
+                id=uuid.uuid4().hex,
+                created_at=datetime.datetime.now(),
+                updated_at=datetime.datetime.now(),
+                provider="openai",
+                tenant_id=tenant_id,
+                plugin_unique_identifier="langgenius/openai/openai",
+                plugin_id="langgenius/openai",
+                declaration=ProviderEntity(
+                    provider="openai",
+                    label=I18nObject(
+                        en_US="OpenAI",
+                        zh_Hans="OpenAI",
+                    ),
+                    description=I18nObject(
+                        en_US="OpenAI",
+                        zh_Hans="OpenAI",
+                    ),
+                    icon_small=I18nObject(
+                        en_US="https://example.com/icon_small.png",
+                        zh_Hans="https://example.com/icon_small.png",
+                    ),
+                    icon_large=I18nObject(
+                        en_US="https://example.com/icon_large.png",
+                        zh_Hans="https://example.com/icon_large.png",
+                    ),
+                    supported_model_types=[ModelType.LLM],
+                    configurate_methods=[ConfigurateMethod.PREDEFINED_MODEL],
+                    models=[
+                        AIModelEntity(
+                            model="gpt-3.5-turbo",
+                            label=I18nObject(
+                                en_US="gpt-3.5-turbo",
+                                zh_Hans="gpt-3.5-turbo",
+                            ),
+                            model_type=ModelType.LLM,
+                            fetch_from=FetchFrom.PREDEFINED_MODEL,
+                            model_properties={},
+                            features=[ModelFeature.TOOL_CALL, ModelFeature.MULTI_TOOL_CALL],
+                        ),
+                        AIModelEntity(
+                            model="gpt-3.5-turbo-instruct",
+                            label=I18nObject(
+                                en_US="gpt-3.5-turbo-instruct",
+                                zh_Hans="gpt-3.5-turbo-instruct",
+                            ),
+                            model_type=ModelType.LLM,
+                            fetch_from=FetchFrom.PREDEFINED_MODEL,
+                            model_properties={
+                                ModelPropertyKey.MODE: LLMMode.COMPLETION,
+                            },
+                            features=[],
+                        ),
+                    ],
+                ),
+            )
+        ]
+
+    def get_model_schema(
+        self,
+        tenant_id: str,
+        user_id: str,
+        plugin_id: str,
+        provider: str,
+        model_type: str,
+        model: str,
+        credentials: dict,
+    ) -> AIModelEntity | None:
+        """
+        Get model schema
+        """
+        return AIModelEntity(
+            model=model,
+            label=I18nObject(
+                en_US="OpenAI",
+                zh_Hans="OpenAI",
+            ),
+            model_type=ModelType(model_type),
+            fetch_from=FetchFrom.PREDEFINED_MODEL,
+            model_properties={},
+            features=[ModelFeature.TOOL_CALL, ModelFeature.MULTI_TOOL_CALL] if model == "gpt-3.5-turbo" else [],
+        )
+
+    @staticmethod
+    def generate_function_call(
+        tools: Optional[list[PromptMessageTool]],
+    ) -> Optional[AssistantPromptMessage.ToolCall]:
+        if not tools or len(tools) == 0:
+            return None
+        function: PromptMessageTool = tools[0]
+        function_name = function.name
+        function_parameters = function.parameters
+        function_parameters_type = function_parameters["type"]
+        if function_parameters_type != "object":
+            return None
+        function_parameters_properties = function_parameters["properties"]
+        function_parameters_required = function_parameters["required"]
+        parameters = {}
+        for parameter_name, parameter in function_parameters_properties.items():
+            if parameter_name not in function_parameters_required:
+                continue
+            parameter_type = parameter["type"]
+            if parameter_type == "string":
+                if "enum" in parameter:
+                    if len(parameter["enum"]) == 0:
+                        continue
+                    parameters[parameter_name] = parameter["enum"][0]
+                else:
+                    parameters[parameter_name] = "kawaii"
+            elif parameter_type == "integer":
+                parameters[parameter_name] = 114514
+            elif parameter_type == "number":
+                parameters[parameter_name] = 1919810.0
+            elif parameter_type == "boolean":
+                parameters[parameter_name] = True
+
+        return AssistantPromptMessage.ToolCall(
+            id=str(uuid.uuid4()),
+            type="function",
+            function=AssistantPromptMessage.ToolCall.ToolCallFunction(
+                name=function_name,
+                arguments=dumps(parameters),
+            ),
+        )
+
+    @staticmethod
+    def mocked_chat_create_sync(
+        model: str,
+        prompt_messages: list[PromptMessage],
+        tools: Optional[list[PromptMessageTool]] = None,
+    ) -> LLMResult:
+        tool_call = MockModelClass.generate_function_call(tools=tools)
+
+        return LLMResult(
+            id=str(uuid.uuid4()),
+            model=model,
+            prompt_messages=prompt_messages,
+            message=AssistantPromptMessage(content="elaina", tool_calls=[tool_call] if tool_call else []),
+            usage=LLMUsage(
+                prompt_tokens=2,
+                completion_tokens=1,
+                total_tokens=3,
+                prompt_unit_price=Decimal("0.0001"),
+                completion_unit_price=Decimal("0.0002"),
+                prompt_price_unit=Decimal(1),
+                prompt_price=Decimal("0.0001"),
+                completion_price_unit=Decimal(1),
+                completion_price=Decimal("0.0002"),
+                total_price=Decimal("0.0003"),
+                currency="USD",
+                latency=0.001,
+            ),
+        )
+
+    @staticmethod
+    def mocked_chat_create_stream(
+        model: str,
+        prompt_messages: list[PromptMessage],
+        tools: Optional[list[PromptMessageTool]] = None,
+    ) -> Generator[LLMResultChunk, None, None]:
+        tool_call = MockModelClass.generate_function_call(tools=tools)
+
+        full_text = "Hello, world!\n\n```python\nprint('Hello, world!')\n```"
+        for i in range(0, len(full_text) + 1):
+            if i == len(full_text):
+                yield LLMResultChunk(
+                    model=model,
+                    prompt_messages=prompt_messages,
+                    delta=LLMResultChunkDelta(
+                        index=0,
+                        message=AssistantPromptMessage(
+                            content="",
+                            tool_calls=[tool_call] if tool_call else [],
+                        ),
+                    ),
+                )
+            else:
+                yield LLMResultChunk(
+                    model=model,
+                    prompt_messages=prompt_messages,
+                    delta=LLMResultChunkDelta(
+                        index=0,
+                        message=AssistantPromptMessage(
+                            content=full_text[i],
+                            tool_calls=[tool_call] if tool_call else [],
+                        ),
+                        usage=LLMUsage(
+                            prompt_tokens=2,
+                            completion_tokens=17,
+                            total_tokens=19,
+                            prompt_unit_price=Decimal("0.0001"),
+                            completion_unit_price=Decimal("0.0002"),
+                            prompt_price_unit=Decimal(1),
+                            prompt_price=Decimal("0.0001"),
+                            completion_price_unit=Decimal(1),
+                            completion_price=Decimal("0.0002"),
+                            total_price=Decimal("0.0003"),
+                            currency="USD",
+                            latency=0.001,
+                        ),
+                    ),
+                )
+
+    def invoke_llm(
+        self: PluginModelManager,
+        *,
+        tenant_id: str,
+        user_id: str,
+        plugin_id: str,
+        provider: str,
+        model: str,
+        credentials: dict,
+        prompt_messages: list[PromptMessage],
+        model_parameters: Optional[dict] = None,
+        tools: Optional[list[PromptMessageTool]] = None,
+        stop: Optional[list[str]] = None,
+        stream: bool = True,
+    ):
+        return MockModelClass.mocked_chat_create_stream(model=model, prompt_messages=prompt_messages, tools=tools)
--- a/api/tests/integration_tests/model_runtime/__mock/xinference.py
+++ b/api/tests/integration_tests/model_runtime/__mock/xinference.py
@@ -1,169 +0,0 @@
-import os
-import re
-from typing import Union
-
-import pytest
-from _pytest.monkeypatch import MonkeyPatch
-from requests import Response
-from requests.sessions import Session
-from xinference_client.client.restful.restful_client import (  # type: ignore
-    Client,
-    RESTfulChatModelHandle,
-    RESTfulEmbeddingModelHandle,
-    RESTfulGenerateModelHandle,
-    RESTfulRerankModelHandle,
-)
-from xinference_client.types import Embedding, EmbeddingData, EmbeddingUsage  # type: ignore
-
-
-class MockXinferenceClass:
-    def get_chat_model(self: Client, model_uid: str) -> Union[RESTfulGenerateModelHandle, RESTfulChatModelHandle]:
-        if not re.match(r"https?:\/\/[^\s\/$.?#].[^\s]*$", self.base_url):
-            raise RuntimeError("404 Not Found")
-
-        if model_uid == "generate":
-            return RESTfulGenerateModelHandle(model_uid, base_url=self.base_url, auth_headers={})
-        if model_uid == "chat":
-            return RESTfulChatModelHandle(model_uid, base_url=self.base_url, auth_headers={})
-        if model_uid == "embedding":
-            return RESTfulEmbeddingModelHandle(model_uid, base_url=self.base_url, auth_headers={})
-        if model_uid == "rerank":
-            return RESTfulRerankModelHandle(model_uid, base_url=self.base_url, auth_headers={})
-        raise RuntimeError("404 Not Found")
-
-    def get(self: Session, url: str, **kwargs):
-        response = Response()
-        if "v1/models/" in url:
-            # get model uid
-            model_uid = url.split("/")[-1] or ""
-            if not re.match(
-                r"[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}", model_uid
-            ) and model_uid not in {"generate", "chat", "embedding", "rerank"}:
-                response.status_code = 404
-                response._content = b"{}"
-                return response
-
-            # check if url is valid
-            if not re.match(r"^(https?):\/\/[^\s\/$.?#].[^\s]*$", url):
-                response.status_code = 404
-                response._content = b"{}"
-                return response
-
-            if model_uid in {"generate", "chat"}:
-                response.status_code = 200
-                response._content = b"""{
-                    "model_type": "LLM",
-                    "address": "127.0.0.1:43877",
-                    "accelerators": [
-                        "0",
-                        "1"
-                    ],
-                    "model_name": "chatglm3-6b",
-                    "model_lang": [
-                        "en"
-                    ],
-                    "model_ability": [
-                        "generate",
-                        "chat"
-                    ],
-                    "model_description": "latest chatglm3",
-                    "model_format": "pytorch",
-                    "model_size_in_billions": 7,
-                    "quantization": "none",
-                    "model_hub": "huggingface",
-                    "revision": null,
-                    "context_length": 2048,
-                    "replica": 1
-                }"""
-                return response
-
-            elif model_uid == "embedding":
-                response.status_code = 200
-                response._content = b"""{
-                    "model_type": "embedding",
-                    "address": "127.0.0.1:43877",
-                    "accelerators": [
-                        "0",
-                        "1"
-                    ],
-                    "model_name": "bge",
-                    "model_lang": [
-                        "en"
-                    ],
-                    "revision": null,
-                    "max_tokens": 512
-                }"""
-                return response
-
-        elif "v1/cluster/auth" in url:
-            response.status_code = 200
-            response._content = b"""{
-                "auth": true
-            }"""
-            return response
-
-    def _check_cluster_authenticated(self):
-        self._cluster_authed = True
-
-    def rerank(
-        self: RESTfulRerankModelHandle, documents: list[str], query: str, top_n: int, return_documents: bool
-    ) -> dict:
-        # check if self._model_uid is a valid uuid
-        if (
-            not re.match(r"[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}", self._model_uid)
-            and self._model_uid != "rerank"
-        ):
-            raise RuntimeError("404 Not Found")
-
-        if not re.match(r"^(https?):\/\/[^\s\/$.?#].[^\s]*$", self._base_url):
-            raise RuntimeError("404 Not Found")
-
-        if top_n is None:
-            top_n = 1
-
-        return {
-            "results": [
-                {"index": i, "document": doc, "relevance_score": 0.9} for i, doc in enumerate(documents[:top_n])
-            ]
-        }
-
-    def create_embedding(self: RESTfulGenerateModelHandle, input: Union[str, list[str]], **kwargs) -> dict:
-        # check if self._model_uid is a valid uuid
-        if (
-            not re.match(r"[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}", self._model_uid)
-            and self._model_uid != "embedding"
-        ):
-            raise RuntimeError("404 Not Found")
-
-        if isinstance(input, str):
-            input = [input]
-        ipt_len = len(input)
-
-        embedding = Embedding(
-            object="list",
-            model=self._model_uid,
-            data=[
-                EmbeddingData(index=i, object="embedding", embedding=[1919.810 for _ in range(768)])
-                for i in range(ipt_len)
-            ],
-            usage=EmbeddingUsage(prompt_tokens=ipt_len, total_tokens=ipt_len),
-        )
-
-        return embedding
-
-
-MOCK = os.getenv("MOCK_SWITCH", "false").lower() == "true"
-
-
-@pytest.fixture
-def setup_xinference_mock(request, monkeypatch: MonkeyPatch):
-    if MOCK:
-        monkeypatch.setattr(Client, "get_model", MockXinferenceClass.get_chat_model)
-        monkeypatch.setattr(Client, "_check_cluster_authenticated", MockXinferenceClass._check_cluster_authenticated)
-        monkeypatch.setattr(Session, "get", MockXinferenceClass.get)
-        monkeypatch.setattr(RESTfulEmbeddingModelHandle, "create_embedding", MockXinferenceClass.create_embedding)
-        monkeypatch.setattr(RESTfulRerankModelHandle, "rerank", MockXinferenceClass.rerank)
-    yield
-
-    if MOCK:
-        monkeypatch.undo()