The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
-
Updated
Dec 24, 2024 - JavaScript
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
☁️ Build multimodal AI applications with cloud-native stack
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
build ai agents that have the full context, open source, runs locally, developer friendly. 24/7 screen, mic, keyboard recording and control
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
notes for software engineers getting up to speed on new AI developments. Serves as datastore for https://latent.space writing, and product brainstorming, but has cleaned up canonical references under the /Resources folder.
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 100+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
Plug in and Play Implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that Elevates Model Reasoning by atleast 70%
Build real-time multimodal AI applications 🤖🎙️📹
Fengshenbang-LM(封神榜大模型)是IDEA研究院认知计算与自然语言研究中心主导的大模型开源体系,成为中文AIGC和认知智能的基础设施。
Curated tutorials and resources for Large Language Models, AI Painting, and more.
🪩 Create Disco Diffusion artworks in one line
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatible with popular workflow platforms like Dify and Coze.
Add a description, image, and links to the multimodal topic page so that developers can more easily learn about it.
To associate your repository with the multimodal topic, visit your repo's landing page and select "manage topics."