Closed-Form Logit Steering

COLLINS WESTNEDGE
OCT 8, 2025


Goal

Derive the closed-form, minimum-norm perturbation of an input x that makes a logistic regression model output any target probability p ∈ (0,1).


Identities & Intuition

Model components

Geometry

Approach

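Move \(x\) along \(w\) by some scalar \(\lambda\): \[ x' = x + \lambda w \] and choose \(\lambda\) so that \[ \operatorname{logit}(p) = w^\top x' + b. \] Any component of the perturbation orthogonal to \(w\) leaves the score \(w^\top x + b\) unchanged while adding length, so the smallest perturbation (in Euclidean norm) is parallel to \(w\). This assumes \(w \neq 0\).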


Derivation

  1. Set up the constraint: Since the model’s score is \(z = w^\top x' + b\) and we want probability \(p\), we require \(w^\top x' + b = \operatorname{logit}(p)\).

  2. Plug in \(x'\) (use \(w^\top w=\|w\|^2\)): \[ \begin{aligned} \operatorname{logit}(p) &= w^\top(x+\lambda w)+b \\ &= w^\top x + \lambda\, w^\top w + b \\ &= w^\top x + \lambda\|w\|^{2} + b. \end{aligned} \]

  3. Solve for \(\lambda\): \[ \lambda = \frac{\operatorname{logit}(p) - (w^\top x + b)}{\|w\|^{2}}. \]

  4. Substitute \(\lambda\) into \(x' = x + \lambda w\): \[ x' = x + \frac{\operatorname{logit}(p) - (w^\top x + b)}{\|w\|^{2}}\,w. \]
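
As a quick sanity check of the steps above, here is a small worked example. The numbers (\(w = (3,4)\), \(b = -2\), \(x = (1,1)\), \(p = 0.5\)) are chosen purely for illustration and do not come from the original derivation. Since \(\operatorname{logit}(0.5) = 0\), the target is the decision boundary itself: \[ \begin{aligned} w^\top x + b &= 3 + 4 - 2 = 5, \qquad \|w\|^{2} = 3^{2} + 4^{2} = 25, \\ \lambda &= \frac{0 - 5}{25} = -0.2, \\ x' &= (1,1) - 0.2\,(3,4) = (0.4,\ 0.2), \\ w^\top x' + b &= 1.2 + 0.8 - 2 = 0 = \operatorname{logit}(0.5). \end{aligned} \] The step length is \(\|\lambda w\| = 0.2 \cdot 5 = 1\), which matches the distance from \(x\) to the hyperplane \(w^\top x + b = 0\), namely \(|5| / \|w\| = 1\).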


Final Formula

\[ \boxed{ x' = x + \frac{\operatorname{logit}(p) - (w^\top x + b)}{\|w\|^{2}}\,w } \]

Here \(x' = x + \lambda w\) is the perturbed input, to which the model assigns exactly the target probability \(p\).
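
The boxed formula translates almost directly into code. The following is a minimal NumPy sketch, not a reference implementation: the function names `logit`, `sigmoid`, and `steer_to_probability`, and the example values for `x`, `w`, and `b`, are assumptions introduced here for illustration. It also assumes \(w \neq 0\) and \(p\) strictly between 0 and 1.

```python
import numpy as np


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return np.log(p) - np.log1p(-p)


def sigmoid(z):
    """Logistic function, the inverse of logit."""
    return 1.0 / (1.0 + np.exp(-z))


def steer_to_probability(x, w, b, p):
    """Hypothetical helper: shift x along w so that sigmoid(w @ x' + b) == p.

    Implements x' = x + (logit(p) - (w @ x + b)) / ||w||^2 * w.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    norm_sq = w @ w
    if norm_sq == 0.0:
        raise ValueError("w must be nonzero")
    # Step size lambda from the derivation above.
    lam = (logit(p) - (w @ x + b)) / norm_sq
    return x + lam * w


# Illustrative values (assumed, not from the original post).
x = np.array([1.0, 1.0])
w = np.array([3.0, 4.0])
b = -2.0

x_new = steer_to_probability(x, w, b, p=0.9)
print(sigmoid(w @ x_new + b))  # close to 0.9, up to floating-point error
```

For a fitted model, `w` and `b` would be the learned weight vector and intercept. The formula only applies when the model is linear in the features being perturbed.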


Interactive Demo

Interactive 3D Visualization

Open interactive visualization →


Applications