The goal of this page is to view the optimization problems that arise in Vision and Robotics as instances of one generic optimization problem, described in terms of states and measurements, and to understand how these problems differ from one another. For example, what is the difference between "Graph SLAM with landmarks" and "Bundle Adjustment"?
In any generic optimization problem, be it in Robotics or Computer Vision, we have one or more sensors that give us information about the environment, the robot, or both in the form of measurements (for example, how far a landmark is from the robot's current position, or how much the robot has moved relative to its previous pose). Our goal is to find the state $\mathbf{x}$ (for example, a set of poses) that best explains these measurements $\mathbf{z}_{i}$. The system is described by a set of $n$ observation functions $\left\{f_{i}(\mathbf{x})\right\}_{i=1: n}$, each of which maps the state $\mathbf{x}$ to a predicted measurement $\widehat{\mathbf{z}}_{i}$. Since we are looking for the state $\mathbf{x}$ that explains the measurements $\mathbf{z}_{i}$, we want the "predicted" measurements $\widehat{\mathbf{z}}_{i}$ to be as close as possible to the "real" measurements $\mathbf{z}_{i}$.
$\mathbf{x}$: the state vector.
$\mathbf{z}_{i}$: a "real" measurement of the state $\mathbf{x}$.
$\widehat{\mathbf{z}}_{i}=f_{i}(\mathbf{x})$: observation function which maps $\mathbf{x}$ to a "predicted" measurement $\widehat{\mathbf{z}}_{i}$.
Objective: Estimate the state $\mathbf{x}$ which best explains the measurements $\mathbf{z}_{1:n}$.
Following the terminology of our least squares optimization page, the "residual vector" is the difference between the actual and the predicted measurement:
$$ \mathbf{e}_{i}(\mathbf{x})=\mathbf{z}_{i}-f_{i}(\mathbf{x}) =\mathbf{z}_{i}-\widehat{\mathbf{z}}_{i} $$
We assume this error is Gaussian with zero mean and information matrix $\mathbf{\Omega}_{i}$ (the inverse of the measurement covariance).
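One way to see where the weighting by the information matrix comes from, assuming the measurements are independent given the state (an assumption not stated above), is that each measurement then has likelihood

$$ p(\mathbf{z}_{i} \mid \mathbf{x}) \propto \exp\left(-\tfrac{1}{2}\, \mathbf{e}_{i}(\mathbf{x})^{T} \mathbf{\Omega}_{i}\, \mathbf{e}_{i}(\mathbf{x})\right) $$

so maximizing the joint likelihood of all measurements is equivalent to minimizing the sum of the quadratic forms in the exponents:

$$ \mathbf{x}^{*}=\underset{\mathbf{x}}{\operatorname{argmin}} \sum_{i=1}^{n} \mathbf{e}_{i}(\mathbf{x})^{T} \mathbf{\Omega}_{i}\, \mathbf{e}_{i}(\mathbf{x}) $$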
Our final squared error thus is a scalar:
$$ e_{i}(\mathbf{x})=\mathbf{e}_{i}(\mathbf{x})^{T} \mathbf{\Omega}_{i} \mathbf{e}_{i}(\mathbf{x}) $$
This is a Non-Linear Weighted Least Squares problem, and our least squares optimization page shows how such problems can be solved. On this page, we discuss the specific optimization functions used in Vision and Robotics.
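As a rough illustration of the residual and the weighted squared error defined above, here is a minimal NumPy sketch. The function names, the toy observation function `observe_landmark`, and all numeric values are hypothetical and chosen only for this example: the state is a 2D robot position and the measurement is a landmark position relative to the robot.

```python
import numpy as np

def residual(z, f, x):
    """Residual vector e_i(x) = z_i - f_i(x)."""
    return np.asarray(z, dtype=float) - np.asarray(f(x), dtype=float)

def weighted_squared_error(z, f, x, omega):
    """Scalar error e_i(x)^T Omega_i e_i(x)."""
    e = residual(z, f, x)
    return float(e @ omega @ e)

# Hypothetical setup: a known landmark and a robot position as the state.
landmark = np.array([4.0, 3.0])

def observe_landmark(x):
    """Toy observation function: landmark position relative to the robot."""
    return landmark - x

x = np.array([1.0, 1.0])       # current state estimate
z = np.array([3.5, 2.0])       # "real" measurement
omega = np.diag([4.0, 1.0])    # information matrix (inverse covariance)

e = residual(z, observe_landmark, x)
err = weighted_squared_error(z, observe_landmark, x, omega)
print(e, err)
```

A larger entry in `omega` means that component of the measurement is trusted more, so the same residual along that axis costs more.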
<aside> 🎇 Optimization variables: poses. Our measurements are relative transforms between poses, either odometry (relative transforms between consecutive nodes) or loop closure (relative transforms between arbitrary nodes).
</aside>
$$ \mathbf{e}_{i}(\mathbf{x})=\mathbf{z}_{i}-f_{i}(\mathbf{x}) =\mathbf{z}_{i}-\widehat{\mathbf{z}}_{i} $$
In SLAM, it is desirable to rewrite the above error term explicitly as follows to denote that these terms correspond to relative transformation between two nodes:
$$ \mathbf{e}_{i j}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)=\mathbf{z}_{i j}-\hat{\mathbf{z}}_{i j}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) = \mathbf{u}_{i j}- f(\mathbf{x}_{i}, \mathbf{x}_{j}) = \mathbf{u}_{i j}- \hat{\mathbf{u}}_{i j} $$
where the $\mathbf{u}_{ij}$ are the relative-transform measurements, either odometry (relative transforms between consecutive nodes) or loop closure (relative transforms between arbitrary nodes), and $f(\mathbf{x}_{i}, \mathbf{x}_{j})$ is our "predicted" measurement $\hat{\mathbf{u}}_{ij}$.
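The edge error above can be sketched for planar poses as follows. This is only an illustration under assumptions not made in the text: poses are SE(2) vectors `(x, y, theta)`, the prediction expresses pose `j` in the frame of pose `i`, and the error is a plain vector difference with the angle wrapped to `[-pi, pi)`. Practical pose-graph solvers often define the error through a composition of transforms instead. The function names and numeric values are hypothetical.

```python
import numpy as np

def wrap_angle(theta):
    """Wrap an angle to the interval [-pi, pi)."""
    return (theta + np.pi) % (2.0 * np.pi) - np.pi

def predict_relative(x_i, x_j):
    """f(x_i, x_j): pose j expressed in the frame of pose i."""
    xi, yi, ti = x_i
    xj, yj, tj = x_j
    c, s = np.cos(ti), np.sin(ti)
    dx, dy = xj - xi, yj - yi
    # Rotate the world-frame translation into frame i (multiply by R_i^T).
    return np.array([c * dx + s * dy, -s * dx + c * dy, wrap_angle(tj - ti)])

def edge_error(x_i, x_j, u_ij):
    """e_ij = u_ij - f(x_i, x_j), with the angular component wrapped."""
    e = np.asarray(u_ij, dtype=float) - predict_relative(x_i, x_j)
    e[2] = wrap_angle(e[2])
    return e

# Hypothetical edge: the robot faces +y and moves 1 m forward between nodes.
x_i = np.array([0.0, 0.0, np.pi / 2])
x_j = np.array([0.0, 1.0, np.pi / 2])
u_ij = np.array([1.1, 0.0, 0.05])   # measured relative transform

e_ij = edge_error(x_i, x_j, u_ij)
print(e_ij)
```

The same function serves odometry and loop-closure edges; the only difference is which pair of nodes `(i, j)` the measurement connects.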