# Before You Merge: Predicting LoRA Dominance with Geometry
**Claim:** Spectral metrics computed before merging predict whether one LoRA adapter will dominate the other after merging.

**Evidence** (27 cross-task pairs + 9 same-task calibration pairs, 3 seeds per task, Mistral-7B):

- Subspace overlap correlates with post-merge dominance at r = 0.91 (p < 0.001).
- Compatibility score rank-orders merge retention at Spearman ρ = 1.0 (p < 0.01).
- Same-task overlap is 2.4× cross-task overlap (0.473 vs. 0.200), strong evidence that the metric reads genuine subspace structure.

**Artifact:** `gradience merge-audit` produces a per-layer compatibility report with overlap, scale balance, and risk assessment, all before you spend the compute to run the merge.
## The Problem: The Engineering Tax of Blind Merging
You trained two LoRA adapters — one for code, one for chat. Both work exceptionally well on their own. You merge them. The resulting model is fluent at conversation but can't write a function to save its life.
This is the merge dominance problem. One adapter's learned subspace overwhelms the other during combination. The merged model retains one capability and destroys the other. You usually discover this after the merge, after the evaluation, and after a wasted afternoon.
If you maintain a collection of task-specific adapters, merge attempts are an engineering tax: you pay for evaluation compute, you pay for silent failures, and you pay in time spent guessing why some merges work and others don't.
In our previous post, we established that spectral metrics read genuine structure in individual LoRA adapters: structure invisible to loss curves but highly predictive of safe compression targets. `gradience merge-audit` extends this geometric lens to adapter relationships, acting as a pre-flight check that reduces the number of merges you attempt blindly.
```shell
pip install gradience
gradience merge-audit --adapter-a ./chat-adapter --adapter-b ./code-adapter
```
## The Solution: Pre-Merge Spectral Metrics
For every pair of adapters, we compute three families of spectral compatibility metrics before they ever touch:
- Subspace overlap: Measures how much two adapters learned in shared directions. We extract orthonormal bases from each adapter's SVD (using an adaptive k derived from energy rank) and compute principal angles between them. High overlap means the adapters occupy similar regions of weight space; linearly merging them will create interference.
- Scale balance: Measures whether two adapters have similar magnitudes. If one adapter's singular values are an order of magnitude larger, it dominates the merge regardless of subspace alignment.
- Rank compatibility: Compares the effective dimensionality (stable rank) of the two adapters. A massive mismatch suggests fundamentally different learned representations.
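All three families fall out of one SVD per adapter delta. Here is a minimal NumPy sketch of the idea; the function names (`energy_rank`, `compatibility_sketch`) and the exact normalizations are illustrative assumptions, not gradience's internal API (the 90% energy threshold mirrors the `energy_rank_90` choice in the reproducibility table).

```python
import numpy as np

def energy_rank(s, energy=0.90):
    """Smallest k whose top-k singular values capture `energy` of spectral energy."""
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1)

def compatibility_sketch(delta_a, delta_b):
    """Toy versions of the three metric families for two adapter delta matrices."""
    Ua, sa, _ = np.linalg.svd(delta_a, full_matrices=False)
    Ub, sb, _ = np.linalg.svd(delta_b, full_matrices=False)
    ka, kb = energy_rank(sa), energy_rank(sb)

    # Subspace overlap: mean cos^2 of principal angles between top-k bases.
    # The singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Ua[:, :ka].T @ Ub[:, :kb], compute_uv=False)
    overlap = float(np.mean(cosines**2))

    # Scale balance: ratio of spectral norms (1.0 = perfectly balanced).
    scale_balance = float(min(sa[0], sb[0]) / max(sa[0], sb[0]))

    # Rank compatibility: ratio of stable ranks (effective dimensionality).
    stable_a = np.sum(sa**2) / sa[0]**2
    stable_b = np.sum(sb**2) / sb[0]**2
    rank_compat = float(min(stable_a, stable_b) / max(stable_a, stable_b))

    return overlap, scale_balance, rank_compat
```

Two sanity checks make the geometry concrete: an adapter compared against itself scores overlap 1.0, while two adapters living in orthogonal directions score 0.0 even when their scales and ranks match perfectly.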
## The Results: Predicting Dominance
We tested these metrics across 27 cross-task pairs and 9 same-task calibration pairs using Mistral-7B (Tasks: Chat/Instruction, GSM8K Math, Code Generation). Every pair was merged via linear averaging at three different weight configurations and evaluated on both tasks.
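The merge operation itself is just a weighted average of matching delta tensors. A minimal sketch, assuming the deltas are held as a name-to-array dict (an illustrative layout, not the actual checkpoint format):

```python
import numpy as np

def linear_merge(deltas_a, deltas_b, w_a=0.5):
    # Weighted average of matching delta tensors; the experiments sweep
    # w_a over {0.5, 0.7, 0.3}, with w_b = 1 - w_a.
    return {name: w_a * deltas_a[name] + (1.0 - w_a) * deltas_b[name]
            for name in deltas_a}
```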
We tracked two primary outcomes:
- R_min (retention-min): The minimum retention ratio across both tasks. If the merged adapter retains 95% of task A's score but only 80% of task B's, R_min is 0.80. This punishes merges that sacrifice one task for the other.
- D (dominance index): Defined as |retention_A − retention_B| after merging. High D means one task completely dominated.
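Given per-task retention ratios, both outcomes are one line each; a sketch (the function name is hypothetical):

```python
def merge_outcomes(ret_a, ret_b):
    # ret_x = merged model's score on task x / solo adapter's score on task x
    r_min = min(ret_a, ret_b)        # punishes sacrificing either task
    dominance = abs(ret_a - ret_b)   # D: high when one task dominated
    return r_min, dominance
```

With the example above, retentions of 0.95 and 0.80 give R_min = 0.80 and D = 0.15.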
### 1. Dominance Prediction Is Highly Accurate
When two adapters share a subspace, one eats the other. The spectral audit detects this before the merge even happens.
| Metric | Correlation with D | p-value |
|---|---|---|
| mean_overlap | r = 0.91 | p < 0.001 |
| compatibility_score | ρ = 1.0 | p < 0.01 |
| frob_bounded_ratio | r = 0.85 | p < 0.01 |
### 2. Retention Prediction Ranks Difficulty Perfectly
Without knowing anything about the underlying tasks, the spectral audit successfully ranked the merge difficulty based purely on singular values:
| Pair | Pre-Merge Compatibility Score | Empirical Outcome |
|---|---|---|
| Chat + Code | 0.286–0.290 | Highest retention |
| Chat + Math | 0.187–0.190 | Moderate retention |
| Math + Code | 0.175–0.180 | Lowest retention |
This matches practitioner intuition—chat and code share more linguistic structure than abstract math and code—but the audit derived this entirely from the geometry of the weights. The rank ordering is exceptionally reliable (Spearman ρ = 1.0, p < 0.01).
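The rank-order claim is easy to re-check on new pairs with a tie-free Spearman correlation; a plain-NumPy sketch, using the midpoints of the compatibility ranges above and the empirical retention order (3 = highest) as an ordinal stand-in:

```python
import numpy as np

def spearman_rho(x, y):
    # Spearman correlation for tie-free data = Pearson correlation of ranks
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

compat = [0.288, 0.188, 0.178]  # Chat+Code, Chat+Math, Math+Code (range midpoints)
order = [3, 2, 1]               # empirical retention rank, same pair order
rho = spearman_rho(compat, order)  # perfect monotone agreement: rho = 1.0
```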
### 3. Null Controls Prove It Isn't Noise
If the overlap metric were measuring statistical noise, same-task pairs (e.g., chat seed 1 vs. chat seed 2) and cross-task pairs would look identical. They don't. Same-task pairs showed a mean overlap of 0.473, while cross-task pairs showed 0.200. That is a 2.4× separation (p < 0.001). The metric is reading genuine subspace structure created by different training objectives.
## Transparent Limitations
- Linear averaging only (for now): These results map to linear merges. Robust methods like TIES or DARE may reduce the severity of the predicted interference. We expect the diagnostic directional signal to remain informative, but the failure thresholds will shift.
- Scale vs. Direction: High overlap with balanced scale may be less harmful than low overlap with extreme scale imbalance. The exact interaction between overlap and scale is an active area of our investigation.
- Controlled Adapters: Our initial validation used adapters trained under controlled conditions. Validation on popular "wild" Hugging Face Hub adapters—with completely unknown training histories—is our next step.
## What's Next?
If compressing an adapter reduces its redundant subspace (as shown in Post 1), does it also reduce merge interference? We are currently testing whether auditing and compressing adapters before merging them produces better outcomes than merging raw, over-parameterized adapters.
Stop merging blindly. Audit your geometry first.
## Glossary
| Term | Definition |
|---|---|
| Subspace overlap | mean cos²(principal angles) between top-k subspaces of two adapters |
| Dominance (D) | post-merge imbalance: \|retention_A − retention_B\| |
| R_min | minimum score retention ratio across all merged tasks |
## Reproducibility & Links
| Item | Value |
|---|---|
| Base model | mistralai/Mistral-7B-v0.1 |
| Tasks | Chat/instruction, GSM8K (exact match), code generation |
| LoRA config | r=64, alpha=64, target_modules=[q_proj, k_proj, v_proj, o_proj] |
| Seeds | 42, 123, 456 |
| Merge method | Linear averaging (weights: 0.5/0.5, 0.7/0.3, 0.3/0.7) |
| Subspace k | Adaptive per layer: k = energy_rank_90(layer) |
| Code | `gradience merge-audit --adapter-a X --adapter-b Y` |
| PyPI | pypi.org/project/gradience/0.11.0 |
| Repo | github.com/johntnanney/gradience |
| License | Apache 2.0 |
```bibtex
@software{gradience2026,
  title  = {Gradience: Spectral Analysis of Low-Rank Adaptation Dynamics},
  author = {Nanney, John T.},
  year   = {2026},
  url    = {https://github.com/johntnanney/gradience}
}
```