Large language models (LLMs) are evolving fast, and companies need to know what’s out there, not just for capabilities, but for compliance, privacy, and fine-tuning potential. This post breaks down the major LLMs available today, highlighting their strengths and known weaknesses.
Here’s our running list of key players and what they bring to the table:

OpenAI Models
Most recent releases
- o3: Reasoning, multimodal, tool-use (April 2025)
- o4-mini: Lightweight reasoning (April 2025)
- o3-pro: Preferred for complex reasoning, while GPT-4o remains OpenAI’s flagship for most general tasks (June 2025)
Notable Previous Editions:
- GPT-4o – Fast, multimodal (text, image, audio), excels at reasoning and instruction-following. Still the flagship for most general tasks.
- GPT-4.5 “Orion” – Improves factual grounding. Slightly less capable at reasoning than 4o but strong on general web tasks.
- GPT-3.5 – Still popular due to speed and low cost. Best for lightweight workloads.
- GPT-4 – Still in use, but largely replaced by 4o for most applications.
Google DeepMind
Most recent releases (June 2025):
- Gemini 2.5 Pro – Flagship multimodal model with strong reasoning, advanced coding, and 1M token context. Designed for complex enterprise tasks.
- Gemini 2.5 Flash – Optimized for speed and efficiency with optional “thinking” mode. Ideal for high-volume production use.
- Gemini 2.5 Flash-Lite – Preview model for ultra-low latency and cost-sensitive tasks. Supports multimodal inputs with optional reasoning.
Notable Previous Editions:
- Gemini 1.0 series – Superseded by the 1.5 and 2.5 families.
- Gemini 1.5 Pro – Long context (up to 1M tokens), strong vision and reasoning.
- Gemini 1.5 Flash – Lightweight, optimized for speed and cost.
Anthropic
Most recent release (May 2025):
- Claude 4 (Opus 4, Sonnet 4) – Anthropic’s latest flagship models. Opus 4 offers deep reasoning, strong coding (SWE-bench ~72.5%), and long-context capabilities (up to 200K tokens). Sonnet 4 is optimized for speed and everyday production use, with hybrid reasoning modes that trade off lower latency against more deliberate reasoning, depending on the task.
Notable Previous Editions:
- Claude 3 (Opus, Sonnet, Haiku) – Still used but largely replaced by the Claude 4 family for improved performance, context length, and alignment.
Meta
Most recent release (April 2025):
- LLaMA 4 (Scout, Maverick) – Multimodal, open-weight Mixture-of-Experts models with massive context windows (Scout: 10M tokens; Maverick: 1M tokens), designed for efficiency and strong performance at a lower cost.
Notable Previous Editions:
- LLaMA 3 (8B, 70B) – Open-weight, efficient. 70B is a strong competitor to GPT-4. Highly tunable for enterprise.
- LLaMA 2 – Still used in some fine-tuned variants, but LLaMA 3 offers better performance.
Mistral
Most recent release (June 2025):
- Magistral – Designed for transparent, multilingual chain-of-thought reasoning across domains like math, coding, and planning.
Notable Previous Editions:
- Mixtral 8x7B – Sparse Mixture-of-Experts (MoE). Efficient inference, good performance at a lower cost.
- Mistral 7B – Still available, but Mixtral is generally preferred now.
IBM (watsonx)
Most recent release (February 2025):
- Granite 3.2 – Open-source multimodal reasoning model with enterprise-grade context and vision capabilities, available via watsonx.ai and Hugging Face.
Notable Previous Editions:
- Granite 13B Instruct v2 – Enterprise-safe, privacy-first model aligned to responsible AI principles.
AWS (Amazon Bedrock)
Most recent release (April 2024):
- Titan Text Express / Titan Embeddings G1 – AWS’s proprietary models launched via Bedrock for tasks like summarization, Q&A, and retrieval. Primarily used within Amazon’s cloud ecosystem and designed for integration with Guardrails and enterprise controls.
Notable Previous Editions:
- Titan v1 models – Earlier versions focused on embedding and language generation. Less performant than peers but used within closed AWS systems and for fine-tuning.
Note: AWS also hosts third-party models (Anthropic, Cohere, Mistral, Meta, etc.) through Amazon Bedrock — its strength is more in orchestration than raw LLM leadership.
xAI (Elon Musk)
Most recent release (February 2025):
- Grok 3 – Flagship multimodal reasoning model trained on massive compute. Outperforms Grok 2.0, adds a “Think” mode for deeper reasoning, and is widely available via X Premium and xAI’s API.
Notable Previous Editions:
- Grok 2.0 – Improved reasoning and conversational flow over Grok 1.5; served as a stepping stone to Grok 3.
- Grok 1 / 1.5 – Open-weight, integrated with X (Twitter). Early-stage models gaining traction.
- Grok 0.x and preview demos (2023).
Databricks
Most recent release (March 2024):
- DBRX – Open-weight, mixture-of-experts model with strong performance in reasoning, retrieval-augmented generation (RAG), and tool use. Popular among data infrastructure teams for its customizability, efficiency, and compatibility with open-source stacks.
Notable Previous Editions:
- None — DBRX is Databricks’ first publicly released foundation model, built from scratch with enterprise AI workloads in mind.
Other Notable LLMs You Should Know
Academic & Open Research Models
- BLOOM (BigScience) – multilingual, open-weight
- Pythia (EleutherAI) – used for transparency studies
- UL2 (Google) – encoder-decoder hybrid
- OPT (Meta) – precursor to LLaMA
- MPT (MosaicML) – highly efficient and fine-tunable
- Falcon (TII UAE) – 40B open-weight, high benchmark scores
Domain-Specific LLMs
- FinGPT – finance-tuned, open-source
- Med-PaLM 2 – Google’s medical model
- Hippocratic AI – safety-aligned for healthcare
- SciBERT – scientific literature understanding
- Galactica (Meta) – briefly released science model
Regional Models
- ChatGLM (Tsinghua) – Chinese bilingual model
- Noor (Saudi Arabia) – Arabic LLM
- Aleph Alpha – German-made, enterprise focused
- PanGu (Huawei) – Chinese general-purpose model
- Yi (01.AI, China) – open-source, high-performing
Lightweight or Specialized Models
- Phi-2 (Microsoft) – 2.7B, surprisingly capable
- TinyLLaMA – under 2B, designed for edge use
- OpenHermes – RAG-focused model tuned on helpfulness
A Note on Bans
Some companies continue to restrict ChatGPT and similar LLMs over IP risk, data residency, or privacy compliance concerns. For example:
- Apple
- JPMorgan Chase
- Goldman Sachs
See our full list of banned LLMs here
Looking to build your own LLM environment with the flexibility and privacy controls of self-hosting without sacrificing capability? Contact us!