Model Stability Report

Which AI models are the most consistent over time? This report analyzes rank changes, state classifications, and sparkline volatility across 300 tracked models to produce a stability score from 0 to 100.

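Stability Classification Distribution

[Chart: number of models in each tier (Rock Solid, Consistent, Variable, Volatile). The one surviving count, 177, follows the Rock Solid label. Source: LMMarketCap.com]

Provider Stability Rankings (Avg Score)

[Chart: average stability score per provider. Source: LMMarketCap.com]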

Most Stable Models

Top 20 models with the highest stability scores. These models maintain consistent rankings with minimal volatility.

#	Model	Provider	Score	Stability	24h	7d	State	Rank Spread
1	Claude Opus 4.7 (Fast)	Anthropic	94.7	100	0	0	stable	±1
2	GPT-5.5 Pro	OpenAI	90.3	100	0	0	stable	±2
3	Claude Opus 4.6 (Fast)	Anthropic	90.0	100	0	0	stable	±2
4	Grok 4.20	xAI	88.3	100	0	0	stable	±2
5	Grok 4.20 Multi-Agent	xAI	87.4	100	0	0	stable	±2
6	Gemma 4 31B (free)	Google	80.1	100	0	0	stable	±2
7	Gemini 3.5 Flash	Google	78.8	100	0	0	stable	±2
8	GPT-5.4 Nano	OpenAI	78.8	100	0	0	stable	±2
9	GPT-5.4 Mini	OpenAI	78.8	100	0	0	stable	±2
10	DeepSeek V4 Flash	DeepSeek	77.2	100	0	0	stable	±2
11	DeepSeek V4 Flash (free)	DeepSeek	76.4	100	0	0	stable	±2
12	GLM 5.1	Zhipu AI	76.0	100	0	0	stable	±2
13	Kimi K2.6	Moonshot AI	75.5	100	0	0	stable	±2
14	Grok 4.3	xAI	74.9	100	-1	-1	stable	±2
15	Qwen3.6 Max Preview	Alibaba	74.3	100	-1	-1	stable	±2
16	Gemma 4 26B A4B (free)	Google	72.7	100	0	-1	stable	±2
17	Gemma 4 26B A4B	Google	72.7	100	0	-1	stable	±2
18	GPT Chat Latest	OpenAI	40.0	100	-2	0	stable	±2
19	Mistral Medium 3.5	Mistral AI	40.0	100	-2	0	stable	±2
20	Nemotron 3 Nano Omni (free)	NVIDIA	40.0	100	-2	0	stable	±2

Most Volatile Models

Bottom 20 models with the lowest stability scores. These models show significant ranking fluctuations or inconsistent states.

#	Model	Provider	Score	Stability	24h	7d	State	Rank Spread
1	MiMo-V2-Omni	Xiaomi	69.7	19	+109	+111	fragile	±2
2	Step 3.5 Flash	StepFun	66.5	22	-5	-6	fragile	±2
3	GPT-4o	OpenAI	70.8	56	+6	-3	stable	±2
4	DeepSeek V3	DeepSeek	69.0	58	-4	-4	stable	±2
5	Llama 3.1 70B Instruct	Meta	64.9	59	-7	-2	stable	±2
6	MiniMax M2.1	MiniMax	69.5	59	-7	-2	stable	±2
7	GPT-4o (2024-08-06)	OpenAI	70.8	59	+7	-2	stable	±2
8	Llama 3.1 8B Instruct	Meta	44.1	62	-5	-1	stable	±2
9	GPT-4o (2024-11-20)	OpenAI	52.5	62	+9	-1	stable	±2
10	GPT-4o-mini (2024-07-18)	OpenAI	56.1	62	+7	-1	stable	±2
11	Phi 4	Microsoft	59.9	62	-9	-1	stable	±2
12	Mistral Large 3 2512	Mistral AI	66.6	62	-5	-1	stable	±2
13	DeepSeek V3.2 Exp	DeepSeek	69.8	62	-6	-1	stable	±2
14	DeepSeek V3.2	DeepSeek	69.9	62	-6	-1	stable	±2
15	GPT-4o Search Preview	OpenAI	70.0	62	+9	-1	stable	±2
16	GPT-4o Audio	OpenAI	70.0	62	+9	-1	stable	±2
17	GPT-4o (2024-05-13)	OpenAI	70.8	62	+8	-1	stable	±2
18	DeepSeek V3 0324	DeepSeek	71.4	62	-8	-1	stable	±2
19	R1 Distill Llama 70B	DeepSeek	42.0	63	+148	-1	stable	±2
20	Devstral Small 1.1	Mistral AI	46.8	63	-5	-1	stable	±2

Stability by Provider

Aggregated stability metrics per provider. Providers are ranked by their average stability score across all models.

Provider	Models	Avg Stability	Most Stable Model	Most Volatile Model
poolside	2	100.0	Laguna XS.2 (free) (100)	Laguna XS.2 (free) (100)
~anthropic	3	100.0	Anthropic Claude Haiku Latest (100)	Anthropic Claude Haiku Latest (100)
~openai	2	100.0	OpenAI GPT Mini Latest (100)	OpenAI GPT Mini Latest (100)
~google	2	100.0	Google Gemini Pro Latest (100)	Google Gemini Pro Latest (100)
~moonshotai	1	100.0	MoonshotAI Kimi Latest (100)	MoonshotAI Kimi Latest (100)
essentialai	1	100.0	Rnj 1 Instruct (100)	Rnj 1 Instruct (100)
deepcogito	1	99.9	Cogito v2.1 671B (100)	Cogito v2.1 671B (100)
xAI	4	98.0	Grok 4.20 (100)	Grok Build 0.1 (92)
Writer	1	97.9	Palmyra X5 (98)	Palmyra X5 (98)
inclusionai	3	97.3	Ling-2.6-1T (100)	Ring-2.6-1T (92)
Kuaishou	1	97.0	KAT-Coder-Pro V2 (97)	KAT-Coder-Pro V2 (97)
Upstage	1	95.9	Solar Pro 3 (96)	Solar Pro 3 (96)
NVIDIA	9	94.9	Nemotron 3 Nano Omni (free) (100)	Llama 3.3 Nemotron Super 49B V1.5 (78)
AI21 Labs	1	93.0	Jamba Large 1.7 (93)	Jamba Large 1.7 (93)
Inception	1	92.2	Mercury 2 (92)	Mercury 2 (92)
perceptron	1	92.0	Perceptron Mk1 (92)	Perceptron Mk1 (92)
Windsurf	1	91.5	SWE-1.5 (92)	SWE-1.5 (92)
Liquid AI	3	91.4	LFM2.5-1.2B-Thinking (free) (96)	LFM2.5-1.2B-Instruct (free) (89)
Amazon	5	90.7	Nova 2 Lite (99)	Nova Micro 1.0 (81)
Google	24	90.1	Gemma 4 31B (free) (100)	Gemini 2.0 Flash (72)
rekaai	2	89.9	Reka Edge (94)	Reka Flash 3 (86)
Alibaba	47	89.0	Qwen3.6 Max Preview (100)	Qwen3 VL 235B A22B Instruct (67)
Anthropic	13	88.6	Claude Opus 4.7 (Fast) (100)	Claude 3 Haiku (67)
Tencent	2	88.3	Hunyuan A13B Instruct (91)	Hy3 preview (85)
Perplexity	5	88.0	Sonar Pro Search (92)	Sonar (85)
Baidu	5	88.0	ERNIE 4.5 VL 28B A3B (98)	ERNIE 4.5 300B A47B (77)
aion-labs	3	87.0	Aion-2.0 (99)	Aion-1.0 (81)
ByteDance	5	85.9	Seed 1.6 Flash (89)	UI-TARS 7B (84)
Mistral AI	22	83.8	Mistral Medium 3.5 (100)	Mistral Large 3 2512 (62)
Moonshot AI	6	83.5	Kimi K2.6 (100)	Kimi K2 0905 (67)
OpenAI	59	82.9	GPT-5.5 Pro (100)	GPT-4o (56)
arcee-ai	5	82.8	Trinity Mini (97)	Trinity Large Thinking (74)
IBM	2	80.4	Granite 4.0 Micro (84)	Granite 4.1 8B (77)
MiniMax	8	79.7	MiniMax-01 (100)	MiniMax M2.1 (59)
Zhipu AI	12	79.7	GLM 5.1 (100)	GLM 4.5 Air (free) (64)
Cursor	2	79.0	Composer 2 (79)	Composer 2 (79)
DeepSeek	13	77.0	DeepSeek V4 Flash (100)	DeepSeek V3 (58)
Meta	10	75.3	Llama Guard 4 12B (86)	Llama 3.1 70B Instruct (59)
Xiaomi	5	74.5	MiMo-V2-Flash (95)	MiMo-V2-Omni (19)
Allen AI	1	74.2	Olmo 3 32B Think (74)	Olmo 3 32B Think (74)
Microsoft	2	71.8	Phi 4 Mini Instruct (82)	Phi 4 (62)
Cohere	3	69.7	Command A (77)	Command R+ (08-2024) (65)
StepFun	1	22.2	Step 3.5 Flash (22)	Step 3.5 Flash (22)
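
The per-provider aggregation behind this table can be sketched as follows. This is a minimal illustration, not the site's actual code: the function name `summarize_providers` and the row layout are assumptions. The sample uses the rounded per-model scores displayed in the table, so the computed averages differ slightly from the listed ones (72.0 here versus the listed 71.8 for Microsoft, and 80.5 versus 80.4 for IBM).

```python
from collections import defaultdict
from statistics import mean


def summarize_providers(models):
    """Group (provider, model, stability) rows and rank providers by average.

    Hypothetical helper: the name and the row layout are illustrative only.
    """
    by_provider = defaultdict(list)
    for provider, model, stability in models:
        by_provider[provider].append((model, stability))

    rows = []
    for provider, entries in by_provider.items():
        scores = [score for _, score in entries]
        most_stable = max(entries, key=lambda e: e[1])
        most_volatile = min(entries, key=lambda e: e[1])
        rows.append({
            "provider": provider,
            "models": len(entries),
            "avg_stability": round(mean(scores), 1),
            "most_stable": most_stable[0],
            "most_volatile": most_volatile[0],
        })
    # Highest average first, matching the ordering of the table.
    return sorted(rows, key=lambda r: r["avg_stability"], reverse=True)


# Rounded per-model scores as displayed in the table above.
sample = [
    ("Microsoft", "Phi 4 Mini Instruct", 82),
    ("Microsoft", "Phi 4", 62),
    ("IBM", "Granite 4.0 Micro", 84),
    ("IBM", "Granite 4.1 8B", 77),
]
summary = summarize_providers(sample)
for row in summary:
    print(row)
```

With a single tracked model, the most stable and most volatile entries are the same model, which is why several rows in the table repeat one name in both columns.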

Stability Distribution

How stability scores are distributed across all 300 tracked models.

[Histogram: model counts per 10-point stability bin, from 0–10 through 90–100. The one count that survives, 149, follows the 90–100 bin.]

What Makes a Model Stable?

Our stability scoring system uses three key signals to measure how consistently a model performs over time.

Rank Consistency

The most direct measure of stability. A model loses 5 points per rank position moved over 24 hours, in either direction, capped at 25 points (reached at a move of 5 positions), and 3 points per position moved over 7 days, capped at 21 points (reached at 7 positions). Models that hold their rank tightly score higher.

State Classification

Each model has a state reflecting its overall reliability. Models in a "stable" state receive a 10-point bonus, while "fragile" models are penalized 15 points. This captures systemic reliability beyond simple rank movement.

Sparkline Volatility

The 14-day sparkline data reveals hidden volatility. We compute the standard deviation of the sparkline and subtract up to 20 points. Even models that end where they started can be penalized if they oscillated wildly along the way.
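
To see why a model that ends where it started can still be penalized, consider the sketch below. It assumes the sparkline is a list of 14 daily rank values and that the penalty is the population standard deviation multiplied by a hypothetical `points_per_unit` factor, capped at 20. The report states only the cap and the use of standard deviation; the scaling, and the choice of population rather than sample deviation, are assumptions made for illustration.

```python
from statistics import pstdev

MAX_SPARKLINE_PENALTY = 20  # cap stated in the report


def sparkline_penalty(sparkline, points_per_unit=1.0):
    """Penalty from sparkline volatility, capped at 20 points.

    `points_per_unit` is a hypothetical scaling factor; the report does not
    say how standard deviation is converted into points.
    """
    if len(sparkline) < 2:
        return 0.0
    return min(MAX_SPARKLINE_PENALTY, pstdev(sparkline) * points_per_unit)


# Both series start and end at rank 12, but only one of them moves in between.
flat = [12] * 14
oscillating = [12, 18, 6, 18, 6, 18, 6, 18, 6, 18, 6, 18, 6, 12]

flat_penalty = sparkline_penalty(flat)
oscillating_penalty = sparkline_penalty(oscillating)
print(flat_penalty, round(oscillating_penalty, 2))
```

The flat series incurs no penalty, while the oscillating one is penalized even though its net rank change over the window is zero.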

Frequently Asked Questions

The stability score starts at 100 and is reduced based on three factors: 24-hour rank changes (up to -25 points, at 5 per position moved), 7-day rank changes (up to -21 points, at 3 per position), and sparkline volatility measured by standard deviation (up to -20 points). Models in a "stable" state get a +10 bonus, while "fragile" models lose 15 points.
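
The arithmetic in this answer can be written out as a short sketch. The function name, the argument names, and the final clamp to the 0–100 range are assumptions; the sparkline penalty is passed in directly because the report does not specify how standard deviation maps to points. The two example calls use rows from the Most Volatile Models table and assume both models received the full 20-point sparkline penalty, which is what reproduces their listed scores of 56 and 19.

```python
def stability_score(change_24h, change_7d, state, sparkline_penalty):
    """Sketch of the stability score described in the report.

    Hypothetical signature. States other than "stable" and "fragile" are
    assumed to receive no adjustment, and the result is assumed to be
    clamped to 0-100.
    """
    score = 100.0
    # 5 points per position moved in 24 hours, capped at 25.
    score -= min(25, 5 * abs(change_24h))
    # 3 points per position moved in 7 days, capped at 21.
    score -= min(21, 3 * abs(change_7d))
    # Sparkline volatility penalty, capped at 20.
    score -= min(20, max(0, sparkline_penalty))
    if state == "stable":
        score += 10
    elif state == "fragile":
        score -= 15
    return max(0.0, min(100.0, score))


# Rows from the Most Volatile Models table (full sparkline penalty assumed).
gpt_4o = stability_score(+6, -3, "stable", 20)
mimo_v2_omni = stability_score(+109, +111, "fragile", 20)
print(gpt_4o, mimo_v2_omni)
```

Under the clamp assumption, a stable model with no rank movement and no sparkline penalty scores 100 rather than 110, which is consistent with the scores of 100 shown in the Most Stable Models table.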

Models are classified into four tiers by stability score: "Rock Solid" (85–100) means extremely consistent performance with minimal fluctuation; "Consistent" (70–84) means generally reliable performance with minor variations; "Variable" (50–69) means noticeable ranking fluctuations; and "Volatile" (below 50) means significant instability and unpredictable performance.
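
As a rough sketch, the tier boundaries can be expressed as a simple threshold lookup. The function name is hypothetical, and treating each lower bound as inclusive (so that a fractional score such as 84.9 falls in "Consistent") is an assumption; the report lists only whole-number ranges.

```python
def stability_tier(score):
    """Map a 0-100 stability score to one of the four tiers.

    Hypothetical helper; lower bounds are assumed to be inclusive.
    """
    if score >= 85:
        return "Rock Solid"
    if score >= 70:
        return "Consistent"
    if score >= 50:
        return "Variable"
    return "Volatile"


for value in (100, 72, 56, 19):
    print(value, stability_tier(value))
```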

Stability indicates how predictably a model will perform over time. A highly rated but volatile model may deliver inconsistent results, which is problematic for production applications requiring reliable output quality. Stable models provide more predictable performance, making them safer choices for mission-critical workloads even if they do not always hold the top rank.
