Part IV: Learning and Transfer

Chapter 14: Sim-to-Real Transfer — From Virtual to Reality

Written: 2026-04-01 Last updated: 2026-06-09

Overview

Transferring policies learned in simulation to physical robots is a central challenge for tactile manipulation. Tactile sim-to-real is inherently harder than visual sim-to-real because of gel deformation, multi-physics coupling, and limited contact-model fidelity. This chapter covers simulation engines, domain randomization, tactile sim-to-real, and Real-Sim-Real loops.

After reading this chapter, you will be able to:

  • Compare major tactile simulation engines (Isaac Gym, MuJoCo, Tacto, DiffTactile).
  • Explain DeXtreme's ADR approach.
  • Understand the unique challenges of tactile sim-to-real.
  • Describe the Real-Sim-Real loop concept and representative examples.

14.1 Simulation Engines: Isaac Gym/Lab, MuJoCo, Tacto, DiffTactile

Isaac Gym / Isaac Lab

NVIDIA's Isaac ecosystem is the standard for GPU-accelerated physics simulation: thousands of parallel environments, Newton physics engine, Omniverse digital twins. It is the core platform for DeXtreme[1] and GR00T[12].

Figure 14.1: Eight standard Isaac Gym environments — (top) Ant, Humanoid, Franka Cube-Stack, Ingenuity; (bottom) Shadow Hand, ANYmal, Allegro, TriFinger. Running thousands of parallel environments on a single GPU delivers a 2–3 orders-of-magnitude speedup over CPU-based RL. Source: Makoviychuk et al. (2021), Fig. 1.

MuJoCo

DeepMind's MuJoCo excels at contact-rich simulation. It is used for dynamics filtering in ExoStart[6] [#9] and for training in OpenAI Dactyl[2].

Tacto (2022)

Meta FAIR's open-source simulator for vision-based tactile sensors (PyRender + PyBullet). Generates synthetic tactile images for GelSight/DIGIT. 150+ citations.

Figure 14.2: TACTO produces high-resolution, high-fidelity readings for vision-based tactile sensors at over 100 Hz. A modular architecture lets a single simulator model different sensors (GelSight, DIGIT) and plug into different physics engines. Source: Wang, Lambeta et al. (2022), Fig. 1.

DiffTactile (2024)

Differentiable tactile simulator supporting gradient-based optimization with FEM-based deformation modeling.

TacEx (2024)

Integrates GelSight simulation into Isaac Sim for end-to-end research workflows (→ Chapter 16.2).

Engine        | GPU Accel. | Tactile        | Differentiable | Primary Use         | Representative
--------------|------------|----------------|----------------|---------------------|---------------------
Isaac Gym/Lab | Yes        | Indirect       | No             | Large-scale RL      | DeXtreme, GR00T
MuJoCo        | No         | Indirect       | No             | Contact-rich sim    | ExoStart, Dactyl
Tacto         | No         | Yes (vision)   | No             | Tactile image gen   | DIGIT sim-to-real
DiffTactile   | Partial    | Yes            | Yes            | Gradient optim.     | Contact optimization
TacEx         | Yes        | Yes (GelSight) | No             | Integrated workflow | Research pipeline

14.2 Domain Randomization: Automatic Domain Randomization in DeXtreme

Domain Randomization (DR) — randomly varying simulation physics parameters to make policies robust across conditions — is the most widely used sim-to-real strategy.

DeXtreme (2023)

DeXtreme[1] (Handa et al., 2023, NVIDIA) represents the state of the art in DR:

  • Automatic Domain Randomization (ADR): Simultaneous physics + non-physics randomization
  • Physics: friction, mass, joint stiffness, gravity
  • Non-physics: lighting, camera position, texture, background
  • Allegro Hand + Isaac Gym
  • Omniverse Replicator for synthetic visual data
Key Paper: Handa et al. 2023. "DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality." ICRA 2023. ADR with simultaneous physics/non-physics randomization is the key to sim-to-real dexterous manipulation.
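The range-adaptation idea behind ADR can be sketched as follows. This is a simplified illustration, not DeXtreme's implementation: it widens or narrows both ends of a range at once, whereas published ADR variants typically probe one boundary at a time. The parameter names, initial ranges, step sizes, and success thresholds are all assumptions chosen for readability.

```python
import random
from dataclasses import dataclass


@dataclass
class ParamRange:
    """Randomization interval for one simulation parameter (hypothetical values)."""
    low: float
    high: float
    step: float      # how far each boundary moves per update
    min_low: float   # hard limits keep the range physically sensible
    max_high: float

    def sample(self, rng: random.Random) -> float:
        return rng.uniform(self.low, self.high)


def adr_update(r: ParamRange, boundary_success: float,
               widen_above: float = 0.8, shrink_below: float = 0.4) -> None:
    """Widen the range if the policy copes near its edges, shrink it if not."""
    if boundary_success >= widen_above:
        r.low = max(r.min_low, r.low - r.step)
        r.high = min(r.max_high, r.high + r.step)
    elif boundary_success <= shrink_below:
        mid = 0.5 * (r.low + r.high)
        r.low = min(mid, r.low + r.step)
        r.high = max(mid, r.high - r.step)


# Hypothetical starting ranges, deliberately narrow.
ranges = {
    "friction": ParamRange(0.9, 1.1, 0.05, 0.2, 2.0),
    "object_mass_scale": ParamRange(0.95, 1.05, 0.05, 0.5, 1.5),
}

rng = random.Random(0)
# One set of physics parameters is drawn per episode.
episode_params = {name: r.sample(rng) for name, r in ranges.items()}

# Pretend the policy scored 90% success at the friction boundary.
adr_update(ranges["friction"], boundary_success=0.9)
print(ranges["friction"])
```

The point of the design is that the curriculum is driven by measured policy performance rather than by hand-tuned ranges, so the randomization grows only as fast as the policy can absorb it.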
Figure 14.3: DeXtreme controlling an Allegro Hand to re-orient a cube in the real world. A policy trained in simulation under ADR transfers zero-shot to hardware and performs robust in-hand reorientation. Source: Handa et al. (2023), Fig. 1.

14.3 Tactile Sim-to-Real: Binary Tactile Skin [#13] Models and Zero-Shot Transfer

Tactile sim-to-real is fundamentally harder than visual sim-to-real: accurate gel deformation modeling is difficult; multi-physics coupling (optical + deformation + contact) is complex; sensor noise profiles differ between simulation and reality.

Yin et al.[5]'s binary 3-axis tactile skin model offers a practical solution:

  • Simplification: Continuous force → binary contact (contact yes/no + 3-axis direction)
  • 5,000 FPS simulation speed
  • Zero-shot sim-to-real transfer
  • 93% success on out-of-distribution objects

Key lesson: Simplified models with wide coverage can be more effective for sim-to-real transfer than precise but narrow tactile simulation.
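A minimal sketch of this kind of discretization, assuming one 3-axis force reading per taxel: the normal axis is reduced to a binary contact flag and each shear axis to a ternary sign. The function name and both thresholds are illustrative assumptions, not values from Yin et al.

```python
from typing import Tuple


def discretize_taxel(fx: float, fy: float, fz: float,
                     normal_thresh: float = 0.1,
                     shear_thresh: float = 0.05) -> Tuple[int, int, int]:
    """Map a continuous 3-axis force (N) to (shear_x, shear_y, normal).

    shear_x and shear_y are ternary (-1, 0, +1); normal is binary (0, 1).
    Thresholds are hypothetical and would be tuned per sensor.
    """
    def ternary(v: float) -> int:
        if v > shear_thresh:
            return 1
        if v < -shear_thresh:
            return -1
        return 0

    normal = 1 if fz > normal_thresh else 0
    return ternary(fx), ternary(fy), normal


print(discretize_taxel(0.30, -0.01, 0.8))   # firm contact, shear along +x
print(discretize_taxel(-0.20, 0.10, 0.02))  # shear present, normal below threshold
```

Applying the same discretization to simulated and real readings means the policy never sees the force magnitudes that the simulator models least accurately.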

Sim-to-Real RL for Humanoid Dexterous Manipulation[10] provides a practical recipe covering environment modeling, reward design, policy learning, and transfer.

Figure 14.4: Yin et al.'s binary 3-axis tactile skin model enabling zero-shot sim-to-real transfer for ternary shear and binary normal force sensing. An RL policy trained in simulation generalizes to unseen object geometries, new hand orientations, and novel object dynamics — evaluated across 190+ real-world rollouts. Source: Yin et al. (2024), Fig. 1.

14.4 Real-Sim-Real Loops: RoboPaint [#15], X-Sim, ExoStart

Real-Sim-Real integrates real data into simulation before transferring back to reality.

RoboPaint (2025)

3D Gaussian Splatting (3DGS) reconstructs real scenes in simulation, increasing visual fidelity.

X-Sim (2025)

Dan et al.[7]'s Real-to-Sim-to-Real pipeline: real human data → simulation → policy learning → real transfer.

ExoStart (2025)

The most data-efficient Real-Sim-Real example:

  1. ~10 exoskeleton demos (real)
  2. MuJoCo dynamics filtering (sim)
  3. Auto-curriculum RL (sim)
  4. ACT vision student (distillation)
  5. Zero-shot real transfer → >50% on 6/7 tasks
Key Paper: Si, Z., et al. (2025). "ExoStart: Efficient Learning for Dexterous Manipulation with Sensorized Exoskeleton Demonstrations." arXiv:2506.11775. 10 exoskeleton demos → dynamics filtering → auto-curriculum RL → zero-shot real. The exemplar of data-efficient Real-Sim-Real.
Figure 14.5: Overview of the ExoStart framework. (a) Human demonstration via a sensorized exoskeleton; (b) Dynamics filtering — trajectory optimization reconstructs dynamically feasible trajectories compatible with the robot; (c) Auto-curriculum RL plus vision-based policy distillation. A handful (~10) of demonstrations are amplified into massive parallel rollouts and then transferred zero-shot to the real robot. Source: Si et al. (2025), Fig. 1.

DexWM (2025)

Meta FAIR's DexWM[16] (arXiv, Dec 2025) learns a world model from human videos:

  • Combines 829 hours of human video + robot data to train a world model
  • Learns policies directly from the world model without explicit simulation
  • 83% real grasping success (zero-shot)
  • Unlike conventional Real-Sim-Real, learns dynamics directly from data without an explicit simulation engine
  • Sits between the co-training approaches (Chapter 15.6) and teleop-free approaches (Chapter 15.7)

14.5 Analyzing the Sim-to-Real Gap: Dynamics, Perception, Contact Models

Three sources of the sim-to-real gap:

14.5.1 Dynamics Gap

Joint friction, stiction, contact dynamics, and deformable material behavior are difficult to model faithfully. ADR overcomes this through robustness, not precision.

14.5.2 Perception Gap

The perception gap covers differences between simulated and real camera images, depth maps, and tactile images. Omniverse Replicator and RoboPaint's 3DGS reconstruction reduce this gap.

14.5.3 Contact Model Gap

The contact model gap is the hardest aspect of tactile sim-to-real. FEM is precise but expensive, while analytical models are too approximate. DiffTactile[4] addresses this with differentiable contact models, but its real-time performance remains insufficient.
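To show why differentiability matters for contact, here is a toy one-dimensional penalty contact with an analytic gradient, used to find the indentation depth that produces a target force. It is a sketch under strong assumptions (linear stiffness, a single scalar depth, hand-picked learning rate) and is unrelated to DiffTactile's actual FEM formulation.

```python
def contact_force(depth_mm: float, stiffness: float = 2.0) -> float:
    """Penalty contact: force (N) grows linearly with penetration depth (mm)."""
    return stiffness * max(0.0, depth_mm)


def force_gradient(depth_mm: float, stiffness: float = 2.0) -> float:
    """Analytic d(force)/d(depth); zero when the bodies are not touching."""
    return stiffness if depth_mm > 0.0 else 0.0


def solve_depth(target_force: float, depth_mm: float = 0.1,
                lr: float = 0.1, steps: int = 50) -> float:
    """Gradient descent on (force - target)^2 with respect to depth."""
    for _ in range(steps):
        error = contact_force(depth_mm) - target_force
        depth_mm -= lr * 2.0 * error * force_gradient(depth_mm)
    return depth_mm


depth = solve_depth(target_force=1.0)
print(f"depth = {depth:.4f} mm, force = {contact_force(depth):.4f} N")
```

Note that the gradient is zero whenever the bodies are out of contact, so the optimizer only makes progress if it starts from a touching configuration. This is a small-scale version of the difficulty that makes contact-rich gradient-based optimization hard in practice.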

Human-in-the-Loop RL[11] (2025, Science Robotics) combines human intuition with autonomous policy optimization to achieve precise manipulation even when the sim-to-real gap is large.


14.6 The Power and Limits of Synthetic Data

NVIDIA's synthetic data pipeline[13] illustrates the scale that simulation-based data generation can currently reach:

  • 780K trajectories (6,500 hours equivalent) → generated in 11 hours
  • 40% improvement in real performance
  • Isaac Sim + Omniverse Replicator
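The throughput implied by these figures can be worked out directly. The calculation below uses only the three numbers quoted above; reading "6,500 hours equivalent" as total demonstration time is an assumption.

```python
trajectories = 780_000
equivalent_hours = 6_500   # demonstration time the data is said to represent
generation_hours = 11      # wall-clock time to generate it

speedup = equivalent_hours / generation_hours
mean_traj_seconds = equivalent_hours * 3600 / trajectories
traj_per_hour = trajectories / generation_hours

print(f"speedup over real time: {speedup:.0f}x")
print(f"mean trajectory length: {mean_traj_seconds:.0f} s")
print(f"trajectories generated per hour: {traj_per_hour:,.0f}")
```

Under that reading, generation runs roughly 590 times faster than real time, with an average trajectory of about 30 seconds.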

Limitations remain clear:

  • Sim-to-real gap constrains synthetic data effectiveness
  • Tactile synthetic data has a larger gap than visual
  • Material property diversity is difficult to simulate
Figure 14.6: DiffTactile grasping a deformable object — a side-by-side comparison of the real world (left) and DiffTactile (right). Combining FEM, MPM, Position-Based Dynamics, and penalty-based contact, DiffTactile differentiably models light-elastic, elastoplastic, and cable objects, enabling gradient-based optimization of contact-rich manipulation policies. Source: Si et al. (2024), Fig. 1.

Summary and Outlook

Sim-to-real transfer is simultaneously one of the largest bottlenecks and fastest-advancing areas of tactile manipulation. DeXtreme's ADR, Yin et al.'s simplified tactile model, and ExoStart's data-efficient Real-Sim-Real represent the current frontier. NVIDIA's synthetic data addresses scale, but the tactile sim-to-real gap remains fundamentally larger than the visual gap, requiring advances along the DiffTactile/TacEx direction.

The next chapter examines Embodiment Retargeting — transferring skills from human to robot (→ Chapter 15).


Manufacturing-Cell Checkpoint

Tactile sim-to-real is not just geometry matching. Real hands introduce gel wear, pad contamination, cable tension, finger compliance, temperature drift, and surface-treatment variation. A simulator that renders plausible tactile images is useful, but a deployable system must also test force range, shear response, slip onset, sensor noise, and calibration drift at the task level.

It helps to separate simulation into three roles. The first is a planner sandbox for risky contact transitions. The second is a robustness test across friction and compliance variation. The third is a diagnosis tool that replays real failure logs and checks whether the same failure signature appears. With those roles separated, Isaac/TacEx/DiffTactile-style tools become part of a manufacturing improvement loop rather than a standalone demo.

Operational Reading Note

The practical value of this chapter is not only the concept of tactile sim-to-real; it is the set of engineering decisions that the concept changes. A deployable robot-hand project should start by asking what state becomes observable after this chapter is applied. The answer should be concrete: contact existence, contact patch, normal force, shear direction, slip margin, object pose, task phase, operator override, or product-damage risk. If a variable cannot be logged or consumed by a controller, it remains an explanatory idea rather than a system capability.

The second decision is the unit of evidence. Research demos often report one success metric, but tactile manipulation improves through failure records. A useful attempt record contains the object or SKU, the selected grasp candidate, the robot hand and sensor configuration, calibration version, task phase, tactile summary, policy action, safety intervention, and final outcome. This record is what connects the sensor chapters to the data chapter, the control chapters to the learning chapters, and the manufacturing chapters to QA.

The third decision is where the chapter sits in the control stack. Some ideas belong in fast reflex loops, some in contact MPC, some in policy inputs, and some only in offline diagnosis. Mixing these time scales creates brittle systems: a VLA cannot react to millisecond slip, and a low-level force controller cannot infer the next process step. The right architecture separates fast contact stabilization, mid-level grasp or rearrangement control, and slow task reasoning.

Finally, the chapter should be evaluated by the failure modes it removes. A method that improves benchmark success but leaves the team unable to distinguish perception failure, contact-acquisition failure, force-closure failure, execution-time slip, or maintenance drift is not yet production-ready. A method with slightly lower headline performance but better logs, safer force limits, and clearer recovery hooks may be the stronger basis for manufacturing Physical AI.

Chapter-Specific Implementation Framework

Turning tactile sim-to-real into a working system begins with state definition. The concept should not remain an abstract performance claim; it should become a variable that a controller and a logger can both read. For this chapter, the relevant state may include failure replay, contact patch, normal force, shear direction, object pose, task phase, safety margin, operator override, and product-damage risk. Each variable needs a coordinate frame, a timestamp convention, a calibration version, and an owner in the control stack. Without this discipline, a successful trial is hard to explain and a failed trial is almost impossible to diagnose.

The second step is time-scale separation. A fast loop at hundreds of hertz or 1 kHz should handle motor current, force derivatives, shear spikes, and slip reflexes. A mid-level loop at tens of hertz should update contact pose, grasp phase, and reference finger motion. A slower task loop should reason over object identity, SKU, fixture state, instruction, and the next grasp candidate. Tactile sim-to-real must be assigned to the right layer. A VLA cannot react to millisecond slip. A low-level force controller cannot infer the next process step. A robust architecture lets these layers communicate through compact state summaries rather than forcing every signal into one monolithic policy.
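One way to picture the layering is a tick-based scheduler in which the three loops share only a compact summary object. This is a simulated sketch, not real-time code: the 1 kHz / 50 Hz / 1 Hz rates, the class names, the fields of `ContactSummary`, and the hard-coded slip at tick 1234 are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ContactSummary:
    """Compact state shared between layers (fields are illustrative)."""
    slip_detected: bool = False
    grasp_phase: str = "hold"
    next_grasp: str = "none"


@dataclass
class MultiRateLoop:
    fast_hz: int = 1000  # reflex layer: shear spikes, slip
    mid_hz: int = 50     # contact pose, grasp phase
    slow_hz: int = 1     # task reasoning
    calls: Dict[str, int] = field(
        default_factory=lambda: {"fast": 0, "mid": 0, "slow": 0})

    def fast_step(self, tick: int, state: ContactSummary) -> None:
        self.calls["fast"] += 1
        if tick == 1234:  # stand-in for a real shear-spike detector
            state.slip_detected = True

    def mid_step(self, state: ContactSummary) -> None:
        self.calls["mid"] += 1
        if state.slip_detected:
            state.grasp_phase = "regrasp"
            state.slip_detected = False

    def slow_step(self, state: ContactSummary) -> None:
        self.calls["slow"] += 1
        if state.grasp_phase == "regrasp":
            state.next_grasp = "alternate_candidate"

    def run(self, seconds: float, state: ContactSummary) -> None:
        mid_every = self.fast_hz // self.mid_hz
        slow_every = self.fast_hz // self.slow_hz
        for tick in range(round(seconds * self.fast_hz)):
            self.fast_step(tick, state)
            if tick % mid_every == 0:
                self.mid_step(state)
            if tick % slow_every == 0:
                self.slow_step(state)


state = ContactSummary()
loop = MultiRateLoop()
loop.run(3.0, state)
print(loop.calls, state)
```

In this run the slip is flagged by the fast layer, picked up by the mid layer within 20 ticks, and only reaches the slow layer at its next scheduled step, which is the behavior the separation is meant to guarantee.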

The third step is a record schema. A useful attempt record should contain attempt id, robot-hand model, sensor layout, calibration version, task phase, object or SKU id, selected grasp, measured contact patch, normal and shear force summary, slip event, policy output, safety intervention, operator note, and final outcome. In a manufacturing cell this record is also a QA trace. A research demo can be persuasive with a video, but a production experiment needs replayable evidence. For that reason, the result table for tactile sim-to-real should include failure-type distribution, retry count, product-damage rate, cycle-time variance, and intervention frequency alongside success rate.
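The attempt record described above could be captured in a small typed structure and serialized one record per line. The field names, units, and example values below are assumptions; only the list of what to record comes from the text.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AttemptRecord:
    attempt_id: str
    robot_hand_model: str
    sensor_layout: str
    calibration_version: str
    task_phase: str
    object_id: str                 # object or SKU id
    selected_grasp: str
    contact_patch_mm2: float       # measured contact patch area
    normal_force_peak_n: float     # normal force summary
    shear_force_peak_n: float      # shear force summary
    slip_event: bool
    policy_output: str
    safety_intervention: bool
    operator_note: Optional[str]
    final_outcome: str             # "success" or a failure-type label


# Hypothetical example values.
record = AttemptRecord(
    attempt_id="a-000123",
    robot_hand_model="example-hand",
    sensor_layout="fingertips-4",
    calibration_version="cal-v3",
    task_phase="in_hand_rearrangement",
    object_id="sku-0042",
    selected_grasp="candidate-2",
    contact_patch_mm2=38.5,
    normal_force_peak_n=2.1,
    shear_force_peak_n=0.7,
    slip_event=True,
    policy_output="regrasp",
    safety_intervention=False,
    operator_note=None,
    final_outcome="execution_time_slip",
)

# One JSON object per line keeps the log appendable and easy to replay.
line = json.dumps(asdict(record), sort_keys=True)
print(line)
```

Keeping the schema flat and explicit makes it straightforward to compute the failure-type distribution, retry count, and intervention frequency from the same log.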

The fourth step is a small test protocol. Starting with every object and every hand motion makes failures uninterpretable. A better protocol begins with atomic tasks: contact acquisition, stable hold, controlled release, contact switch, recovery after slip, and force-limited correction. The next stage composes two or three atoms into sequential manipulation. Only after that should the system attempt a Cosmax-style first grasp, in-hand rearrangement, and second grasp sequence. This staged protocol reveals whether tactile sim-to-real actually removes a failure mode or merely shifts the failure later in the trajectory.

The fifth step is to treat hardware and maintenance as experimental variables. The same algorithm can behave differently when gel surfaces wear, pads become contaminated, cable tension changes, a sensor is replaced, calibration drifts, backlash grows, or surface humidity changes. The log therefore needs software version, pad age, cleaning state, calibration time, replacement event, and fault code. These fields are not administrative details. They determine whether a performance drop comes from the learned policy, the contact model, the sensor, the hand mechanics, or the production environment.

The sixth step is failure-driven decision making. The team should ask which failure class improves after adding tactile sim-to-real: perception before contact, contact acquisition, force-closure insufficiency, execution-time slip, collision, product damage, or operator override. If the answer is unclear, the method is not yet actionable. If the answer is clear, the next investment becomes much easier to choose. A contact-state problem suggests better sensing or calibration. A closure-margin problem suggests hand geometry or force control. A replay mismatch suggests simulation fidelity. A repeated intervention suggests task design, fixture design, or operator workflow.
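The mapping from dominant failure class to next investment can be made explicit in a few lines. The label strings and the function name are hypothetical; the four suggestions are taken from the paragraph above.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Failure class -> suggested next investment (from the text above).
NEXT_INVESTMENT = {
    "contact_state": "better sensing or calibration",
    "closure_margin": "hand geometry or force control",
    "replay_mismatch": "simulation fidelity",
    "repeated_intervention": "task design, fixture design, or operator workflow",
}


def suggest_investment(labels: Iterable[str]) -> Optional[Tuple[str, int, str]]:
    """Return (dominant failure class, count, suggestion), or None if no failures."""
    counts = Counter(labels)
    if not counts:
        return None
    label, n = counts.most_common(1)[0]
    suggestion = NEXT_INVESTMENT.get(
        label, "unclassified: refine the failure taxonomy first")
    return label, n, suggestion


failures = ["replay_mismatch", "contact_state", "replay_mismatch",
            "closure_margin", "replay_mismatch"]
print(suggest_investment(failures))
```

If the most common label falls outside the table, that is itself a signal that the failure taxonomy needs work before further method changes.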

Implementation question               | Evidence to log                                        | Passing criterion
--------------------------------------|--------------------------------------------------------|-------------------------------------------------------
Is the state observable?              | sensor packet, calibrated value, contact frame         | controller and QA read the same value
Are control layers separated?         | fast reflex, mid-level planner, slow policy timestamps | fast contact events do not wait for slow task reasoning
Can failures be classified?           | failure type, task phase, intervention note            | root cause narrows to a small set of candidates
Is maintenance visible?               | pad age, calibration version, replacement event        | hardware drift can be separated from policy error
Does it connect to manufacturing KPI? | cycle time, damage rate, retry count, downtime         | research success translates into operating metrics

Validation Protocol: From Demonstration to Repeatable Evidence

The method in this chapter should be validated as a repeatable experiment, not as a single successful demonstration. The first step is to lock the reset condition. Object pose, hand initialization, sensor calibration, pad condition, lighting, fixture state, and software version should be recorded before every trial. If those variables drift silently, the team cannot tell whether tactile feedback improved the behavior or whether the experiment simply became easier.
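A simple guard against silent drift is to compare each trial's recorded reset condition against a locked baseline before the trial starts. The sketch below assumes the reset condition is stored as a flat dictionary; the keys, values, and tolerance are illustrative.

```python
from typing import Any, Dict, List


def changed_fields(baseline: Dict[str, Any], trial: Dict[str, Any],
                   tol: float = 1e-3) -> List[str]:
    """Return reset-condition keys whose values differ from the baseline."""
    changed = []
    for key in sorted(set(baseline) | set(trial)):
        a, b = baseline.get(key), trial.get(key)
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                      for v in (a, b))
        if numeric:
            # Continuous quantities are compared with a tolerance.
            if abs(a - b) > tol:
                changed.append(key)
        elif a != b:
            changed.append(key)
    return changed


# Hypothetical locked baseline and one trial's recorded condition.
baseline = {
    "object_yaw_deg": 0.0,
    "calibration_version": "cal-v3",
    "pad_condition": "clean",
    "software_version": "1.4.2",
}
trial = dict(baseline, object_yaw_deg=0.0004, pad_condition="worn")

print(changed_fields(baseline, trial))
```

A non-empty result does not have to block the trial, but it should be written into the attempt record so that later analysis can separate method effects from setup drift.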

The second step is planned disturbance. Rotate the object slightly, vary surface friction, delay one fingertip contact, perturb the grasp candidate, or introduce a mild occlusion. A tactile method should degrade gracefully under these disturbances. More importantly, the log should show which signal was used for recovery: normal force, shear direction, slip event, contact patch migration, motor current, or a learned latent state. Without planned disturbance, the system may look robust while only succeeding in the narrow reset condition.

The third step is ablation. Compare no tactile input, normal force only, normal plus shear, slip-event tokens, and the full tactile summary. If performance improves only when the full high-dimensional stream is used, the method may be powerful but expensive. If a compact contact summary gives most of the gain, it may be the better manufacturing design because it is easier to log, debug, and transmit across control layers.
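The five ablation conditions can be expressed as masks over a fixed tactile summary, so the policy input keeps the same shape in every condition. The key names and the choice to zero out removed channels are assumptions; other masking strategies (for example, dropping inputs or substituting noise) are equally plausible.

```python
from typing import Dict, FrozenSet

TACTILE_KEYS = ["normal_force", "shear_x", "shear_y", "slip_event", "contact_patch"]

# Which channels each ablation condition keeps.
ABLATIONS: Dict[str, FrozenSet[str]] = {
    "no_tactile": frozenset(),
    "normal_only": frozenset({"normal_force"}),
    "normal_plus_shear": frozenset({"normal_force", "shear_x", "shear_y"}),
    "slip_tokens": frozenset({"slip_event"}),
    "full_summary": frozenset(TACTILE_KEYS),
}


def mask_tactile(obs: Dict[str, float], condition: str) -> Dict[str, float]:
    """Zero out every tactile channel that the condition does not keep."""
    keep = ABLATIONS[condition]
    return {k: (obs[k] if k in keep else 0.0) for k in TACTILE_KEYS}


obs = {"normal_force": 2.1, "shear_x": 0.4, "shear_y": -0.2,
       "slip_event": 1.0, "contact_patch": 38.5}

for name in ABLATIONS:
    print(name, mask_tactile(obs, name))
```

Running the same evaluation protocol once per condition then gives a direct read on how much of the gain a compact summary already captures.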

The fourth step is recovery-oriented metrics. A contact-rich system will still fail. The question is whether it notices earlier, recovers faster, retries safely, or leaves a clearer diagnosis. Useful metrics include time from slip onset to correction, force overshoot, contact reacquisition time, number of safe retries, intervention frequency, and product-damage near misses. These metrics often matter more than the final binary success rate.
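Two of these metrics, time from slip onset to correction and force overshoot, can be computed from a logged trace. The sample format, the definition of "corrected" as the first non-slipping sample after onset, and the force limit are assumptions made for this sketch.

```python
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    t: float             # seconds since trial start
    normal_force: float  # newtons
    slipping: bool       # output of a slip detector


def slip_to_correction_time(samples: List[Sample]) -> Optional[float]:
    """Time from the first slip onset to the first later non-slipping sample."""
    onset = None
    for s in samples:
        if onset is None and s.slipping:
            onset = s.t
        elif onset is not None and not s.slipping:
            return s.t - onset
    return None  # no slip, or slip never corrected


def force_overshoot(samples: List[Sample], force_limit: float) -> float:
    """How far the peak normal force exceeded the limit (0.0 if it did not)."""
    peak = max(s.normal_force for s in samples)
    return max(0.0, peak - force_limit)


# Hypothetical trace: slip starts at 10 ms and is corrected by 30 ms.
trace = [
    Sample(0.000, 1.0, False),
    Sample(0.010, 1.0, True),
    Sample(0.020, 2.5, True),
    Sample(0.030, 3.2, False),
    Sample(0.040, 2.8, False),
]

print(slip_to_correction_time(trace))
print(force_overshoot(trace, force_limit=3.0))
```

Reporting both numbers together is useful because a controller can shorten correction time simply by squeezing harder, which shows up as overshoot.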

The final step is deployment rehearsal. A researcher-adjusted experiment and an operator-run procedure are different systems. The operator should replace the sensor or pad, run calibration, start the task, stop after a fault, and export logs using the intended procedure. If performance collapses during this rehearsal, the bottleneck is not the policy alone; it is the integration and maintenance workflow. Passing this rehearsal is what turns a tactile manipulation method into a candidate for a manufacturing cell.

References

  1. Handa et al. (2023). DeXtreme: Transfer of agile in-hand manipulation from simulation to reality. ICRA 2023. arXiv:2210.13702.
  2. Various. (2020). OpenAI Dactyl: Solving Rubik's Cube with a robot hand. IJRR.
  3. Wang, S., Lambeta, M., et al. (2022). Tacto: A fast, flexible, and open-source simulator for vision-based tactile sensors. IEEE RA-L.
  4. Si, Z., Zhang, G., Ben, Q., Romero, B., Xian, Z., Liu, C., & Gan, C. (2024). DiffTactile: A physics-based differentiable tactile simulator for contact-rich robotic manipulation. ICLR 2024. arXiv:2403.08716.
  5. Yin, J., Qi, H., Malik, J., Pikul, J., Yim, M., & Hellebrekers, T. (2024). Learning in-hand translation using tactile skin with shear and normal force sensing. arXiv:2407.07885. #13
  6. Si, Z., Qian, K., Sontakke, N., et al. (2025). ExoStart: Efficient learning for dexterous manipulation with sensorized exoskeleton demonstrations. arXiv:2506.11775. #9
  7. Dan, Y., et al. (2025). X-Sim: Real-to-Sim-to-Real pipeline.
  8. Various. (2025). RoboPaint: 3DGS for Real-Sim-Real visual transfer. #15
  9. 2024 study. TacEx: GelSight simulation in Isaac Sim.
  10. Various. (2025). Sim-to-real reinforcement learning for vision-based dexterous manipulation on humanoids. arXiv preprint. arXiv:2502.20396.
  11. Various. (2025). Human-in-the-loop RL for precise dexterous manipulation. Science Robotics. https://doi.org/10.1126/scirobotics.ads5033.
  12. NVIDIA. (2025). GR00T N1: An open foundation model for generalist humanoid robots. arXiv preprint. arXiv:2503.14734.
  13. NVIDIA. (2026). Synthetic data pipeline: 780K trajectories in 11 hours. GTC 2026 Keynote.
  14. Various. (2025). Tactile Robotics: Past and Future. arXiv:2512.01106.
  15. Lipman, Y., et al. (2023). Flow matching for generative modeling. ICLR 2023.
  16. Various. (2025). DexWM: Dexterous world models from human video. arXiv preprint. Meta FAIR.