Chapter 10: Multi-Contact In-Hand Manipulation — Rearranging Inside the Hand
Overview
In-hand manipulation is not only rotating an object inside the hand. In manufacturing, it includes shifting an object after grasping, creating a new contact, aligning a part for insertion, and freeing fingers for the next object. The Cosmax active in-hand rearrangement problem is the anchor for this chapter [1].
After reading this chapter, you should be able to:
- Distinguish rolling, sliding, finger gaiting, and regrasping.
- Explain how tactile feedback complements object pose and contact-mode estimation.
- Decompose sequential multi-object grasping as a control problem.
- Describe how model-based MPC and RL policies can be combined.
10.1 Contact-Maintaining vs Contact-Switching Tasks
The Cosmax material divides in-hand manipulation into two families. In contact-maintaining tasks, the fingers keep contact throughout the motion, such as rotating a ball in the palm. In contact-switching tasks, new contacts appear and old contacts break. Sequential multi-object grasping is the second family: the hand must keep the first object stable, release or reposition some fingers, and create contacts for the second object.
Contact-switching is difficult because:
- the dynamics change discontinuously when the mode changes;
- object pose is often occluded by the hand;
- releasing a finger reduces the force-closure margin.
Tactile feedback is therefore a safety gate for contact transitions, not just an auxiliary perception signal.
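One way to read the safety-gate idea is as a check that runs before any finger is released. The sketch below is a minimal illustration, not the Cosmax method: it assumes a per-contact Coulomb friction model, and the names `Contact`, `slip_margin`, `release_allowed`, the friction coefficient, and the thresholds are all hypothetical. It also ignores how forces redistribute after the release, which a real force-closure test would have to model.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    finger: str
    normal_force: float  # N, pressing force along the contact normal
    shear_force: float   # N, magnitude of the tangential force


def slip_margin(contact: Contact, mu: float) -> float:
    """Coulomb margin mu * f_n - |f_t|; positive means inside the friction cone."""
    return mu * contact.normal_force - abs(contact.shear_force)


def release_allowed(contacts, release_finger, mu=0.5, min_margin=0.2, min_remaining=2):
    """Gate a finger release on the contacts that would remain afterwards."""
    remaining = [c for c in contacts if c.finger != release_finger]
    if len(remaining) < min_remaining:
        return False  # too few contacts left to hold the object
    return all(slip_margin(c, mu) >= min_margin for c in remaining)


# Hypothetical tactile readings for a three-finger hold.
contacts = [
    Contact("thumb", normal_force=4.0, shear_force=0.5),
    Contact("index", normal_force=3.0, shear_force=0.4),
    Contact("middle", normal_force=2.0, shear_force=0.9),
]
print(release_allowed(contacts, "index"))   # middle finger would be left near slip
print(release_allowed(contacts, "middle"))  # thumb and index both keep margin
```

The gate evaluates the contacts that remain, not the finger being released, because the remaining contacts are the ones that must keep holding the object.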
10.2 Four Basic Motions
| Motion | Meaning | Tactile-control focus |
|---|---|---|
| Rolling | move the object while maintaining contact | contact-patch motion, normal-force stability |
| Sliding | create intentional slip at a contact | shear force and slip velocity |
| Finger gaiting | release and reattach fingers to change the contact set | force closure of remaining contacts |
| Palm/finger pivot | rotate around palm or a selected finger | support point and torque balance |
Real tasks combine these motions. A hand may roll the object, gait a finger to free workspace, and slide the object into final alignment. Tactile sensing provides the contact centroid, patch geometry, and force vector needed to stabilize each transition.
10.3 Sequential Multi-Object Grasping Pipeline
The Cosmax pipeline has three stages:
- First grasp: sample and execute a pre-grasp pose using an object-point-cloud-conditioned model.
- In-hand rearrangement: move the first object from the first-grasp configuration toward a second-grasp-compatible configuration, using MPC plus residual RL.
- Second grasp: optimize a multi-object pre-grasp pose and maintain stability with a lifting policy.
Tactile sensing has a different role at each stage:
| Stage | Tactile role | Failure signal |
|---|---|---|
| First grasp | contact verification, grip-force adjustment | one-sided contact, increasing slip |
| Rearrangement | contact-mode estimation, force redistribution | closure margin collapse, pose drift under occlusion |
| Second grasp | maintain first object while validating new contact | unexpected internal force, finger interference |
The key is that rearrangement is its own module. Static DoF allocation preassigns fingers to objects. Active rearrangement dynamically recovers DoF by moving the first object inside the hand.
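The three stages and their failure signals can be sketched as a small state machine. This is an illustrative reading of the table above, not the Cosmax implementation. The `Stage` enum, the signal strings, and the choice to abort on any listed failure signal are assumptions made for the example; a real system would likely try recovery before aborting.

```python
from enum import Enum


class Stage(Enum):
    FIRST_GRASP = "first_grasp"
    REARRANGEMENT = "rearrangement"
    SECOND_GRASP = "second_grasp"
    DONE = "done"
    ABORT = "abort"


# Failure signals per stage, taken from the table above (string names are hypothetical).
FAILURE_SIGNALS = {
    Stage.FIRST_GRASP: {"one_sided_contact", "increasing_slip"},
    Stage.REARRANGEMENT: {"closure_margin_collapse", "pose_drift_under_occlusion"},
    Stage.SECOND_GRASP: {"unexpected_internal_force", "finger_interference"},
}
ORDER = [Stage.FIRST_GRASP, Stage.REARRANGEMENT, Stage.SECOND_GRASP, Stage.DONE]


def advance(stage, observed_signals):
    """Move to the next stage unless a failure signal for this stage was observed."""
    if stage in (Stage.DONE, Stage.ABORT):
        return stage  # terminal stages do not change
    if FAILURE_SIGNALS[stage] & set(observed_signals):
        return Stage.ABORT
    return ORDER[ORDER.index(stage) + 1]


stage = Stage.FIRST_GRASP
stage = advance(stage, [])                           # first grasp verified
stage = advance(stage, ["closure_margin_collapse"])  # rearrangement fails
print(stage)
```

A signal only counts against the stage it belongs to, so finger interference reported during the first grasp does not stop the sequence in this sketch.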
10.4 Why SE(3) Rearrangement Is Hard
The 2026-06-05 Cosmax material reports that a Jiang-style model-based method could reproduce SO(3) sphere rotation, but failed on a simple SE(3) target-pose rearrangement test [2] [3]. This is an important negative result. The difficulty of in-hand manipulation increases sharply when position and orientation must change together.
SE(3) rearrangement introduces:
- changes in support against gravity;
- changes in the wrench cone when a finger releases contact;
- larger object-pose errors due to occlusion;
- tighter finger workspace and collision constraints.
Thus tactile multi-contact control cannot rely on a target pose alone. It needs intermediate object reference motion and contact-schedule hints.
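A simple way to produce an intermediate object reference motion is to interpolate between the start and target poses. The sketch below assumes poses given as a position tuple plus a unit quaternion in (w, x, y, z) order, and uses linear interpolation for position with spherical linear interpolation (slerp) for orientation. The function names are hypothetical, and the straight-line path is only a geometric baseline: it does not check contact feasibility, gravity support, or finger workspace, which is exactly where contact-schedule hints would come in.

```python
import math


def slerp(q0, q1, t):
    """Spherical interpolation between unit quaternions given as (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # flip one quaternion so the interpolation takes the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    dot = min(1.0, dot)
    theta = math.acos(dot)
    if theta < 1e-6:  # nearly identical orientations: a linear blend is adequate
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def reference_waypoints(p0, q0, p1, q1, n):
    """Return n + 1 poses from start to goal, endpoints included."""
    waypoints = []
    for k in range(n + 1):
        t = k / n
        p = tuple(a + t * (b - a) for a, b in zip(p0, p1))
        waypoints.append((p, slerp(q0, q1, t)))
    return waypoints


# Hypothetical target: shift 2 cm in x, 1 cm in z, and rotate 90 degrees about z.
q_start = (1.0, 0.0, 0.0, 0.0)
q_goal = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
waypoints = reference_waypoints((0.0, 0.0, 0.0), q_start, (0.02, 0.0, 0.01), q_goal, 4)
for p, q in waypoints:
    print(p, q)
```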
10.5 Simulation and Real-Hand Transfer
Multi-contact control is expensive to debug directly on hardware. The Cosmax plan therefore builds an Isaac Sim/Isaac Lab environment for a Tesollo DG-5F-M hand mounted on a Franka Panda arm [2]. This matches the S9 manufacturing Physical AI frame: manufacturers should first build bounded digital-twin cells, task schemas, failure logs, and evaluation harnesses.
A useful simulator must include:
- accurate hand URDF/MJCF and actuator limits;
- fingertip rubber/compliant contact;
- tactile or proxy-force sensing;
- object assets that start simple and expand toward real SKUs;
- grasp-pose generation and multi-object candidate filtering;
- logging schemas that match the real robot.
Without this stack, RL wastes samples and MPC breaks under real friction and compliance.
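The last requirement in the list, logging schemas that match the real robot, is easy to check mechanically. The sketch below assumes each schema is a plain mapping from field name to a type label. The function name `schema_mismatches` and the example fields are hypothetical and are not taken from the Cosmax environment.

```python
def schema_mismatches(sim_schema, real_schema):
    """Compare two {field: type_label} mappings and report where they differ."""
    sim_keys, real_keys = set(sim_schema), set(real_schema)
    return {
        "missing_in_sim": sorted(real_keys - sim_keys),
        "missing_in_real": sorted(sim_keys - real_keys),
        "type_conflicts": sorted(
            k for k in sim_keys & real_keys if sim_schema[k] != real_schema[k]
        ),
    }


# Hypothetical schemas for illustration only.
sim_schema = {"t": "float", "normal_force": "float", "contact_id": "int"}
real_schema = {
    "t": "float",
    "normal_force": "float",
    "contact_id": "str",
    "pad_age_h": "float",
}
report = schema_mismatches(sim_schema, real_schema)
print(report)
```

Running a check like this in continuous integration would catch drift between simulated and real logs before it contaminates a training or replay dataset.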
Summary
Multi-contact in-hand manipulation is where tactile sensing, contact dynamics, grasp planning, and learning meet. In sequential multi-object manufacturing tasks, the key skill is active rearrangement: keeping the first object stable while freeing fingers for the next one. Tactile sensing judges whether contact transitions are safe and reduces the residual that learning must absorb.
Manufacturing-Cell Checkpoint
Multi-contact in-hand manipulation is where manufacturing difficulty rises quickly. A rigid pick-and-place task may survive with vision and gripper force alone. Once the hand rolls or shifts an object to free a finger for a second grasp, the number of contacts, their locations, internal forces, and slip margins all change. Tactile sensing does not need to solve full object pose perfectly; it must tell the controller which contacts can be released and which must be preserved.
Production experiments should begin with a small transition set rather than a broad dexterity benchmark. Palm support, index release, thumb-index pinch transition, and second-object approach can be tested as atomic contact transitions. For each transition, log slip margin and force-closure margin. Those logs reveal where contact-implicit planning, residual RL, or diffusion policies actually fail.
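The per-transition logging described above can be summarized with a few lines of code. The sketch below assumes each trial is logged as a dictionary with the hypothetical keys `transition`, `success`, `slip_margin`, and `closure_margin`. The numeric values are placeholders chosen for illustration, not measurements.

```python
from collections import defaultdict


def summarize_transitions(records):
    """Group trial logs by transition and keep the worst margins seen."""
    stats = defaultdict(lambda: {
        "trials": 0,
        "failures": 0,
        "min_slip_margin": float("inf"),
        "min_closure_margin": float("inf"),
    })
    for r in records:
        s = stats[r["transition"]]
        s["trials"] += 1
        s["failures"] += 0 if r["success"] else 1
        s["min_slip_margin"] = min(s["min_slip_margin"], r["slip_margin"])
        s["min_closure_margin"] = min(s["min_closure_margin"], r["closure_margin"])
    return dict(stats)


# Placeholder trial logs (values are illustrative, not experimental results).
records = [
    {"transition": "index_release", "success": True, "slip_margin": 0.8, "closure_margin": 0.3},
    {"transition": "index_release", "success": False, "slip_margin": 0.1, "closure_margin": 0.0},
    {"transition": "palm_support", "success": True, "slip_margin": 1.2, "closure_margin": 0.6},
]
summary = summarize_transitions(records)
for name, s in summary.items():
    print(name, s)
```

Reporting the minimum margin rather than the mean is a deliberate choice in this sketch: a transition that is usually safe but occasionally reaches zero margin is the one that needs attention.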
Operational Reading Note
The practical value of this chapter is not only the concept of multi-contact in-hand manipulation; it is the set of engineering decisions that the concept changes. A deployable robot-hand project should start by asking what state becomes observable once the chapter's methods are in place. The answer should be concrete: contact existence, contact patch, normal force, shear direction, slip margin, object pose, task phase, operator override, or product-damage risk. If a variable cannot be logged or consumed by a controller, it remains an explanatory idea rather than a system capability.
The second decision is the unit of evidence. Research demos often report one success metric, but tactile manipulation improves through failure records. A useful attempt record contains the object or SKU, the selected grasp candidate, the robot hand and sensor configuration, calibration version, task phase, tactile summary, policy action, safety intervention, and final outcome. This record is what connects the sensor chapters to the data chapter, the control chapters to the learning chapters, and the manufacturing chapters to QA.
The third decision is where the chapter sits in the control stack. Some ideas belong in fast reflex loops, some in contact MPC, some in policy inputs, and some only in offline diagnosis. Mixing these time scales creates brittle systems: a VLA cannot react to millisecond slip, and a low-level force controller cannot infer the next process step. The right architecture separates fast contact stabilization, mid-level grasp or rearrangement control, and slow task reasoning.
Finally, the chapter should be evaluated by the failure modes it removes. A method that improves benchmark success but leaves the team unable to distinguish perception failure, contact-acquisition failure, force-closure failure, execution-time slip, or maintenance drift is not yet production-ready. A method with slightly lower headline performance but better logs, safer force limits, and clearer recovery hooks may be the stronger basis for manufacturing Physical AI.
Chapter-Specific Implementation Framework
Turning multi-contact in-hand manipulation into a working system begins with state definition. The concept should not remain an abstract performance claim; it should become a variable that a controller and a logger can both read. For this chapter, the relevant state may include contact graph, contact patch, normal force, shear direction, object pose, task phase, safety margin, operator override, and product-damage risk. Each variable needs a coordinate frame, a timestamp convention, a calibration version, and an owner in the control stack. Without this discipline, a successful trial is hard to explain and a failed trial is almost impossible to diagnose.
The second step is time-scale separation. A fast loop running at several hundred hertz up to 1 kHz should handle motor current, force derivatives, shear spikes, and slip reflexes. A mid-level loop at tens of hertz should update contact pose, grasp phase, and reference finger motion. A slower task loop should reason over object identity, SKU, fixture state, instruction, and the next grasp candidate. Each function of multi-contact in-hand manipulation must be assigned to the right layer: a VLA cannot react to millisecond slip, and a low-level force controller cannot infer the next process step. A robust architecture lets these layers communicate through compact state summaries rather than forcing every signal into one monolithic policy.
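The layering can be illustrated with a single simulated clock in which slower layers run on integer subdivisions of the fast tick. The sketch below is a toy scheduler, not a real-time implementation. The rates (1 kHz, 50 Hz, 1 Hz), the function name `run_layers`, and the shared `summary` dictionary are assumptions chosen to match the orders of magnitude in the text.

```python
def run_layers(duration_s, slip_ticks=(), fast_hz=1000, mid_hz=50, slow_hz=1):
    """Tick a fast loop and run slower layers on integer subdivisions of it."""
    mid_every = fast_hz // mid_hz
    slow_every = fast_hz // slow_hz
    summary = {"slip_events": 0}  # compact state shared across layers
    calls = {"fast": 0, "mid": 0, "slow": 0}
    slow_seen = []                # slip count the slow layer read at each update
    slip_ticks = set(slip_ticks)
    for tick in range(int(duration_s * fast_hz)):
        calls["fast"] += 1
        if tick in slip_ticks:    # the reflex handles slip on the same tick
            summary["slip_events"] += 1
        if tick % mid_every == 0:
            calls["mid"] += 1     # e.g. update contact pose and grasp phase
        if tick % slow_every == 0:
            calls["slow"] += 1    # task reasoning reads only the summary
            slow_seen.append(summary["slip_events"])
    return calls, slow_seen


calls, slow_seen = run_layers(2, slip_ticks=[5, 400])
print(calls, slow_seen)
```

In this sketch the slow layer never sees individual slip samples. It reads a count that the fast layer has already acted on, which is the sense in which layers exchange compact summaries.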
The third step is a record schema. A useful attempt record should contain attempt id, robot-hand model, sensor layout, calibration version, task phase, object or SKU id, selected grasp, measured contact patch, normal and shear force summary, slip event, policy output, safety intervention, operator note, and final outcome. In a manufacturing cell this record is also a QA trace. A research demo can be persuasive with a video, but a production experiment needs replayable evidence. For that reason, the result table for multi-contact in-hand manipulation should include failure-type distribution, retry count, product-damage rate, cycle-time variance, and intervention frequency alongside success rate.
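The attempt record described above maps naturally onto a typed structure that serializes to one JSON line per attempt. The sketch below is one possible layout. The field names, units, and example values are assumptions for illustration and would need to be aligned with the actual cell's logging conventions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AttemptRecord:
    attempt_id: str
    hand_model: str
    sensor_layout: str
    calibration_version: str
    task_phase: str
    sku_id: str
    selected_grasp: str
    contact_patch_mm2: float  # measured contact patch area
    normal_force_n: float     # summary of normal force
    shear_force_n: float      # summary of shear force
    slip_event: bool
    policy_output: str
    final_outcome: str
    safety_intervention: Optional[str] = None
    operator_note: str = ""

    def to_json_line(self) -> str:
        """Serialize to a single JSON line with stable key order for diffing."""
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values for illustration only.
record = AttemptRecord(
    attempt_id="a-0001",
    hand_model="DG-5F-M",
    sensor_layout="fingertip-only",
    calibration_version="cal-v3",
    task_phase="rearrangement",
    sku_id="sku-demo",
    selected_grasp="grasp-07",
    contact_patch_mm2=42.0,
    normal_force_n=3.1,
    shear_force_n=0.6,
    slip_event=True,
    policy_output="shift",
    final_outcome="recovered",
    safety_intervention="grip_force_increase",
)
line = record.to_json_line()
print(line)
```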
The fourth step is a small test protocol. Starting with every object and every hand motion makes failures uninterpretable. A better protocol begins with atomic tasks: contact acquisition, stable hold, controlled release, contact switch, recovery after slip, and force-limited correction. The next stage composes two or three atoms into sequential manipulation. Only after that should the system attempt a Cosmax-style first grasp, in-hand rearrangement, and second grasp sequence. This staged protocol reveals whether multi-contact in-hand manipulation actually removes a failure mode or merely shifts the failure later in the trajectory.
The fifth step is to treat hardware and maintenance as experimental variables. The same algorithm can behave differently when gel surfaces wear, pads become contaminated, cable tension changes, a sensor is replaced, calibration drifts, backlash grows, or surface humidity changes. The log therefore needs software version, pad age, cleaning state, calibration time, replacement event, and fault code. These fields are not administrative details. They determine whether a performance drop comes from the learned policy, the contact model, the sensor, the hand mechanics, or the production environment.
The sixth step is failure-driven decision making. The team should ask which failure class improves after adding multi-contact in-hand manipulation: perception before contact, contact acquisition, force-closure insufficiency, execution-time slip, collision, product damage, or operator override. If the answer is unclear, the method is not yet actionable. If the answer is clear, the next investment becomes much easier to choose. A contact-state problem suggests better sensing or calibration. A closure-margin problem suggests hand geometry or force control. A replay mismatch suggests simulation fidelity. A repeated intervention suggests task design, fixture design, or operator workflow.
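The mapping from problem type to next investment in the paragraph above can be written down as a lookup, so that a log of classified problems produces a concrete suggestion. The sketch below is a minimal illustration. The key names and the rule of following the most frequent problem are assumptions, and a real team would weigh severity as well as frequency.

```python
from collections import Counter

# Problem type -> suggested investment, following the paragraph above.
NEXT_INVESTMENT = {
    "contact_state": "better sensing or calibration",
    "closure_margin": "hand geometry or force control",
    "replay_mismatch": "simulation fidelity",
    "repeated_intervention": "task design, fixture design, or operator workflow",
}


def recommend(problem_log):
    """Return (problem, count, suggestion) for the most frequent logged problem."""
    if not problem_log:
        return None
    problem, count = Counter(problem_log).most_common(1)[0]
    suggestion = NEXT_INVESTMENT.get(problem, "unclassified: refine the failure taxonomy")
    return problem, count, suggestion


# Placeholder log of classified problems.
problem_log = ["closure_margin", "contact_state", "closure_margin", "replay_mismatch"]
print(recommend(problem_log))
```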
| Implementation question | Evidence to log | Passing criterion |
|---|---|---|
| Is the state observable? | sensor packet, calibrated value, contact frame | controller and QA read the same value |
| Are control layers separated? | fast reflex, mid-level planner, slow policy timestamps | fast contact events do not wait for slow task reasoning |
| Can failures be classified? | failure type, task phase, intervention note | root cause narrows to a small set of candidates |
| Is maintenance visible? | pad age, calibration version, replacement event | hardware drift can be separated from policy error |
| Does it connect to manufacturing KPI? | cycle time, damage rate, retry count, downtime | research success translates into operating metrics |
Validation Protocol: From Demonstration to Repeatable Evidence
The method in this chapter should be validated as a repeatable experiment, not as a single successful demonstration. The first step is to lock the reset condition. Object pose, hand initialization, sensor calibration, pad condition, lighting, fixture state, and software version should be recorded before every trial. If those variables drift silently, the team cannot tell whether tactile feedback improved the behavior or whether the experiment simply became easier.
The second step is planned disturbance. Rotate the object slightly, vary surface friction, delay one fingertip contact, perturb the grasp candidate, or introduce a mild occlusion. A tactile method should degrade gracefully under these disturbances. More importantly, the log should show which signal was used for recovery: normal force, shear direction, slip event, contact patch migration, motor current, or a learned latent state. Without planned disturbance, the system may look robust while only succeeding in the narrow reset condition.
The third step is ablation. Compare no tactile input, normal force only, normal plus shear, slip-event tokens, and the full tactile summary. If performance improves only when the full high-dimensional stream is used, the method may be powerful but expensive. If a compact contact summary gives most of the gain, it may be the better manufacturing design because it is easier to log, debug, and transmit across control layers.
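The question of whether a compact summary "gives most of the gain" can be made quantitative with a simple ratio. The sketch below assumes one success rate per ablation condition. The condition names are hypothetical labels for the five conditions listed above, and the numbers are placeholders, not experimental results.

```python
def compact_gain_fraction(success, baseline="no_tactile",
                          compact="normal_plus_shear", full="full_summary"):
    """Fraction of the full-tactile improvement recovered by a compact input."""
    full_gain = success[full] - success[baseline]
    if full_gain <= 0:
        return None  # tactile input gave no improvement, so the ratio is undefined
    return (success[compact] - success[baseline]) / full_gain


# Placeholder success rates per ablation condition (illustrative only).
success = {
    "no_tactile": 0.50,
    "normal_only": 0.65,
    "normal_plus_shear": 0.80,
    "slip_tokens": 0.70,
    "full_summary": 0.90,
}
fraction = compact_gain_fraction(success)
print(round(fraction, 2))
```

A fraction close to 1 would support shipping the compact summary. A small fraction would suggest that the high-dimensional stream carries information the summary loses.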
The fourth step is recovery-oriented metrics. A contact-rich system will still fail. The question is whether it notices earlier, recovers faster, retries safely, or leaves a clearer diagnosis. Useful metrics include time from slip onset to correction, force overshoot, contact reacquisition time, number of safe retries, intervention frequency, and product-damage near misses. These metrics often matter more than the final binary success rate.
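Two of these metrics, time from slip onset to correction and force overshoot, can be computed directly from logged time series. The sketch below assumes a log of `(time_ms, slipping)` samples from a slip detector and a list of measured forces in newtons. The function names and sample values are hypothetical.

```python
def slip_to_correction_ms(samples):
    """Time from the first slip onset until slip is no longer detected.

    samples: iterable of (time_ms, slipping) pairs in time order.
    Returns None if no slip occurred or it was never corrected in the log.
    """
    onset = None
    for t, slipping in samples:
        if slipping and onset is None:
            onset = t
        elif not slipping and onset is not None:
            return t - onset
    return None


def force_overshoot(forces, target):
    """Amount by which the peak measured force exceeded the target (0 if none)."""
    return max(0.0, max(forces) - target)


# Placeholder log: slip starts at 10 ms and is corrected by 30 ms.
slip_log = [(0, False), (10, True), (20, True), (30, False), (40, False)]
force_log = [2.0, 3.5, 3.0]

print(slip_to_correction_ms(slip_log))
print(force_overshoot(force_log, target=3.0))
```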
The final step is deployment rehearsal. A researcher-adjusted experiment and an operator-run procedure are different systems. The operator should replace the sensor or pad, run calibration, start the task, stop after a fault, and export logs using the intended procedure. If performance collapses during this rehearsal, the bottleneck is not the policy alone; it is the integration and maintenance workflow. Passing this rehearsal is what turns a tactile manipulation method into a candidate for a manufacturing cell.
Control Design Pattern: Turning Tactile Signals into Actions
The four chapters in Part III return to the same operational question: when the hand receives tactile evidence, what should the fingers do next? A practical answer is a three-stage pattern. First, do not map the raw tactile signal directly to action. Convert it into contact belief: where the contact is, which direction force is applied, how much margin remains, whether slip is likely, and whether the next contact transition is feasible. Second, split that belief into a safety gate and a reference update. The safety gate prevents excessive force, slip, collision, and product-damage risk. The reference update changes target finger pose, target force, internal force, or release timing. Third, feed these results back to the higher-level policy so it can choose the next mode.
This pattern matters most in manufacturing multi-object tasks. Declaring that the first object is stable does not only mean that grip force is high enough. It means that the remaining contacts can support the object if one finger is released, that palm support is real rather than assumed, that the approach path for the second object is open, and that the slip margin is still acceptable. Tactile sensing updates this judgment continuously. Therefore the contact controller's output should not be a single command such as "grip harder." It should include a discrete mode such as hold, release, shift, roll, regrasp, or abort, plus finger-level references for position, force, or torque.
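The belief, safety gate, and mode output can be sketched as follows. This is a minimal illustration of the pattern under stated assumptions: the `ContactBelief` fields, the thresholds, and the priority order of the checks are hypothetical, and finger-level references are omitted to keep the example short.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    HOLD = "hold"
    RELEASE = "release"
    SHIFT = "shift"
    ROLL = "roll"
    REGRASP = "regrasp"
    ABORT = "abort"


@dataclass
class ContactBelief:
    slip_margin: float       # N, smallest margin over the active contacts
    max_force: float         # N, largest measured fingertip force
    patch_in_region: bool    # contact patch is inside the expected region
    release_feasible: bool   # remaining contacts could support the object


def select_mode(belief, requested, force_limit=10.0, min_slip_margin=0.2):
    """Safety gate: override the requested mode when the belief says it is unsafe."""
    if belief.max_force > force_limit:
        return Mode.ABORT      # over-force has the highest priority
    if not belief.patch_in_region:
        return Mode.REGRASP    # contact has drifted from where it should be
    if belief.slip_margin < min_slip_margin:
        return Mode.HOLD       # stabilize before attempting any transition
    if requested is Mode.RELEASE and not belief.release_feasible:
        return Mode.HOLD       # remaining contacts cannot carry the object
    return requested


belief = ContactBelief(slip_margin=0.6, max_force=4.0,
                       patch_in_region=True, release_feasible=False)
print(select_mode(belief, Mode.RELEASE))
print(select_mode(belief, Mode.ROLL))
```

The higher-level policy proposes a mode and the gate either passes it through or substitutes a safer one, so the policy never commands the fingers directly in this design.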
Experiments should evaluate the quality of these mode transitions. At each transition, check whether force spikes appear, whether the contact patch moves in the expected direction, whether shear remains inside the friction cone, whether object-pose estimates agree with tactile evidence, and whether the controller recovers without operator intervention. These logs reveal whether the problem is controller design, hand morphology, sensor placement, calibration, or the task fixture. Without transition-level evidence, a high success rate can hide a brittle policy that only works because the reset condition is narrow.
The same pattern also clarifies the relationship between model-based and learned control. A contact model can propose which mode transition should be possible. A learned residual can compensate for friction, compliance, and unmodeled geometry. Tactile feedback decides whether the proposed transition is actually happening. If the three disagree, the system should slow down, increase observation, or abort rather than blindly completing the motion. This is the control-level meaning of using touch as a first-class signal.
Operator Handoff and Safe-Stop Criteria
In a manufacturing cell, tactile control must end in an operator-understandable procedure. The system should expose when it continues, when it slows down, and when it stops. If slip margin drops, the controller may increase grip force within the allowed envelope. If the contact patch leaves the expected region, it may attempt a regrasp. If force limits or product-damage risk are exceeded, it should enter abort mode immediately. These conditions should not be hidden inside the policy. The operator interface and QA log should use the same names as the controller.
A useful handoff record contains three fields. The first is phase: acquire, hold, shift, release, regrasp, or abort. The second is stop reason: slip, over-force, lost contact, pose uncertainty, collision risk, hardware fault, or calibration drift. The third is next action: automatic retry, operator confirmation, sensor cleaning, recalibration, fixture reset, or object removal. This makes tactile control a shared operating procedure rather than a black-box behavior. It also gives the engineering team a direct path from field failures back to controller changes, sensor maintenance, or task redesign.
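The three-field handoff record can be encoded with closed vocabularies so that the controller, operator interface, and QA log cannot drift apart in naming. The sketch below uses Python enums for this. The class names and the log-line format are assumptions, while the enum values follow the lists in the paragraph above.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    ACQUIRE = "acquire"
    HOLD = "hold"
    SHIFT = "shift"
    RELEASE = "release"
    REGRASP = "regrasp"
    ABORT = "abort"


class StopReason(Enum):
    SLIP = "slip"
    OVER_FORCE = "over_force"
    LOST_CONTACT = "lost_contact"
    POSE_UNCERTAINTY = "pose_uncertainty"
    COLLISION_RISK = "collision_risk"
    HARDWARE_FAULT = "hardware_fault"
    CALIBRATION_DRIFT = "calibration_drift"


class NextAction(Enum):
    AUTOMATIC_RETRY = "automatic_retry"
    OPERATOR_CONFIRMATION = "operator_confirmation"
    SENSOR_CLEANING = "sensor_cleaning"
    RECALIBRATION = "recalibration"
    FIXTURE_RESET = "fixture_reset"
    OBJECT_REMOVAL = "object_removal"


@dataclass(frozen=True)
class HandoffRecord:
    phase: Phase
    stop_reason: StopReason
    next_action: NextAction

    @classmethod
    def from_names(cls, phase, stop_reason, next_action):
        """Build from strings; unknown names raise ValueError instead of passing silently."""
        return cls(Phase(phase), StopReason(stop_reason), NextAction(next_action))

    def as_log_line(self):
        return (f"phase={self.phase.value} "
                f"stop_reason={self.stop_reason.value} "
                f"next_action={self.next_action.value}")


handoff = HandoffRecord.from_names("shift", "slip", "automatic_retry")
print(handoff.as_log_line())
```

Rejecting unknown names at construction time is the point of the design: a new stop reason has to be added to the shared vocabulary before any component can emit it.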
References
- [1] Cosmax Robotics Meeting. (2026a). Sequential multi-object grasping and active in-hand rearrangement problem statement. Internal meeting PDF, 2026-05-12. Private source.
- [2] Cosmax Robotics Meeting. (2026b). Model-based approach vs RL-based approach for in-hand manipulation. Internal meeting PDF, 2026-06-05. Private source.
- [3] Jiang, Z., et al. (2025). Robust model-based in-hand manipulation with integrated real-time motion-contact planning and tracking. arXiv:2505.04978.
- [4] Yang, L., et al. (2025). Multi-finger manipulation via trajectory optimization with differentiable rolling and geometric constraint. IEEE RA-L.
- [5] Li, Y., et al. (2025). DROP: Dexterous reorientation via online planning. ICRA 2025.
- [6] Bansal, A., et al. (2026). EgoScale: Scaling human video to unlock dexterous robot intelligence. NVIDIA GEAR.