MiniMaxAI/MiniMax-M2: Advanced Open-Source Large Language Model

MiniMax-M2 is an advanced open-source large language model developed by MiniMaxAI, designed to deliver high-performance capabilities in coding, agentic reasoning, and general intelligence benchmarks. Positioned as a state-of-the-art AI model, it offers lower latency, lower cost, and higher throughput, making it suitable for production-level tasks that require efficient and scalable AI solutions. This LLM for coding and reasoning excels in generating complex outputs and understanding nuanced inputs. It features advanced inference parameters and AI tool-calling capabilities that enhance interaction with various AI tools.
MiniMax-M2 supports local deployment of large language models with openly available weights, encouraging developers to customize and integrate the model into scalable AI applications. Extensive documentation and community showcases help further adoption alongside deployment guides for vLLM serving systems and MLX marketplaces. This model meets the needs of both advanced AI research and practical production in machine learning-driven intelligent agents.
URL: https://huggingface.co/MiniMaxAI/MiniMax-M2
deepseek-ai/DeepSeek-OCR: Multimodal AI for Advanced Optical Character Recognition

DeepSeek-OCR is a multimodal AI optical character recognition (OCR) model engineered for extracting text from images and complex documents. Integrating vision-language model (VLM) techniques, it leverages vision encoders to significantly improve text recognition accuracy and structured document conversion.
The model excels at maintaining layout and semantic structure preservation for varied document types including PDFs and images. DeepSeek-OCR is optimized for efficient token compression and supports diverse automated document digitization workflows. It is open-source and designed for high utility in both academic and industrial data extraction tasks, backed by a strong testing framework and integration with popular deep learning libraries like PyTorch and Transformers.
The model balances AI-driven document parsing performance with computational cost-effectiveness and fosters active community collaboration.
URL: https://huggingface.co/deepseek-ai/DeepSeek-OCR
moonshotai/Kimi-Linear-48B-A3B-Instruct: Efficient Large-Scale Language Model with Hybrid Linear Attention

Kimi-Linear-48B-A3B-Instruct, developed by Moonshot AI, is an experimental large language model that features a unique hybrid linear attention architecture designed for ultra-long context handling—up to 1 million tokens. This efficient attention mechanism outperforms traditional full attention in speed and hardware efficiency, providing 6.3x faster throughput compared to other linear attention models.
Its architecture is optimized for long sequence natural language processing tasks such as extended conversations and complex document analysis. With 48 billion parameters (3 billion active), it achieves strong performance on benchmarks like MMLU-Pro. The model also supports instruction tuning to enhance natural language understanding and generation.
Although not yet deployed on commercial inference platforms, it is accessible to researchers aiming to advance AI models for long-context reasoning.
URL: https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
briaai/FIBO: Text-to-Image Generative Model with JSON-Native Control

FIBO by Bria AI is an innovative text-to-image generative AI model uniquely trained on structured JSON-native captions, enabling unparalleled control over image generation. This allows the model to interpret and manipulate extensive semantic metadata for high-fidelity, photorealistic images guided by detailed instructions.
FIBO is designed for professional-grade applications requiring precise controlled AI-driven image synthesis, supporting input captions that exceed 1,000 words for complex scenes. It represents a new standard in controllable generative AI, making it ideal for content creation, design automation, and advanced image editing workflows.
The model is open-source and available for non-commercial use, encouraging development of sophisticated visual AI tools in the creative industry.
URL: https://huggingface.co/briaai/FIBO
meituan-longcat/LongCat-Video: Scalable AI Video Generation Model

LongCat-Video is a foundational AI-powered video generation model developed by Meituan, featuring 13.6 billion parameters tailored for diverse tasks such as text-to-video, image-to-video, and video continuation. It stands out for its ability to generate and extend high-quality videos continuously up to 5 minutes without degradation.
Employing multi-reward reinforcement learning from human feedback (RLHF) and Group Relative Policy Optimization (GRPO), LongCat-Video optimizes multi-modal content generation. This model enables improvements in automated video synthesis, creative media production, and real-world dynamics understanding.
Open-source with local deployment support and GPU-optimized parallelism, it accelerates efficient inference workflows in AI-driven video generation.
URL: https://huggingface.co/meituan-longcat/LongCat-Video
Soul-AILab/SoulX-Podcast-1.7B: Speech Synthesis Model for Podcast Generation

SoulX-Podcast-1.7B, developed by Soul AILab, is a text-to-speech (TTS) AI model optimized for podcast creation and capable of zero-shot voice cloning. This enables generation of natural, high-quality speech outputs without requiring extensive voice-specific training data.
With 1.7 billion parameters, the model balances output quality and operational efficiency. Its flexibility supports applications in podcasting, audiobooks, and various spoken media formats. Designed primarily for research and prototyping, it integrates seamlessly with modern AI voice synthesis frameworks, facilitating rich audio content production with diverse voice profiles.
URL: https://huggingface.co/Soul-AILab/SoulX-Podcast-1.7B
dx8152/Qwen-Edit-2509-Multiple-angles: AI Image Editing with Multi-Perspective Transformations

Qwen-Edit-2509-Multiple-angles is a fine-tuned model from the Qwen series focusing on intelligent AI image editing and multi-angle transformations. It enhances original Qwen capabilities with improved accuracy and versatility in generating image edits from multiple perspectives.
This model suits complex image manipulation demands found in graphic design, creativity, and visual content production. Available openly on Hugging Face and integrated into tools like ComfyUI, it supports machine learning-driven image transformation workflows for professional and creative users.
URL: https://huggingface.co/dx8152/Qwen-Edit-2509-Multiple-angles
openai/gpt-oss-safeguard-20b: AI Safety Reasoning Model for Responsible Use

GPT-OSS-Safeguard-20B by OpenAI is an open-weight AI safety reasoning language model developed collaboratively with the ROOST community, designed for AI safety classification and moderation. It contains 21 billion parameters with 3.6 billion actively leveraged to enable effective GPU-based deployment on 16GB VRAM hardware.
The model targets enhanced AI content moderation and safety classification across diverse applications, supporting transparency and responsible AI usage. Complementing its larger sibling GPT-OSS-Safeguard-120B, it is open-source and accessible for broad safety research and developer integration.
URL: https://huggingface.co/openai/gpt-oss-safeguard-20b
datalab-to/chandra: High-Accuracy OCR with Layout Preservation

Chandra by Datalab is a cutting-edge OCR model specialized in extracting text from images and PDFs while preserving complex page layouts. It produces outputs in multiple structured formats such as markdown, HTML, and JSON, catering to advanced document digitization and content management needs.
The model excels at processing structured documents like contracts, forms, and academic papers with remarkable precision. Supported by open-source availability and streamlined deployment integration, Chandra is a valuable tool for AI-driven document parsing workflows.
URL: https://huggingface.co/datalab-to/chandra
PaddlePaddle/PaddleOCR-VL: Multilingual Vision-Language Model for Document Parsing

PaddleOCR-VL from PaddlePaddle is a state-of-the-art vision-language AI model optimized for multilingual document parsing and element recognition. Featuring a compact 0.9 billion parameter architecture, it merges a NaViT-style dynamic visual encoder with the ERNIE language model.
This blend achieves superior performance in page-level parsing tasks including text, formulas, tables, and charts. Designed for automated document analysis and information extraction, PaddleOCR-VL supports varied real-world multilingual OCR applications. It integrates with the Transformers library and benefits from the broader open-source PaddleOCR ecosystem.
URL: https://huggingface.co/PaddlePaddle/PaddleOCR-VL
ibm-granite/granite-4.0-h-1b: Lightweight Hybrid Instruct Model for Efficient AI Inference

Granite-4.0-H-1B by IBM is a lightweight hybrid instruction-following AI language model that combines standard transformer layers with Mamba-2 technology. The architecture delivers high efficiency and performance suitable for resource-constrained environments requiring fast, secure AI inference.
It supports multilingual tasks and emphasizes robust AI governance principles like security and responsible usage. Granite integrates smoothly with IBM’s WatsonX.ai platform and industry partners, making it an ideal choice for enterprise-grade AI deployments where safety and speed are priorities.

