Part II: Hands — Robot and Human

Chapter 6: Human Hand Data Collection — Teaching by Demonstration

Written: 2026-04-01 Last updated: 2026-06-09

Overview

The most intuitive way to teach robots to manipulate is to show them human demonstrations. Yet transferring the 27-DoF motion and tactile information of the human hand to a robot presents fundamental challenges. This chapter covers human hand modeling (MANO [#17]), motion tracking gloves, tactile gloves, exoskeletons, and teleoperation systems, surveying the data pipeline from human to robot.

After reading this chapter, you will be able to:
- Describe the MANO hand model's structure and applications.
- Compare the characteristics of major motion tracking gloves.
- Understand tactile glove designs (STAG, OSMO [#18]) and their cross-embodiment potential.
- Evaluate the strengths and limitations of exoskeleton and teleoperation approaches.

6.1 The Human Hand Model: MANO (778 Vertices, 16 Joints)

MANO ^[1] is a statistical human hand model learned from 1,000 3D scans (SIGGRAPH Asia 2017):

778 vertices, 16 joints
PCA shape space: Low-dimensional representation of inter-individual hand size/shape variation
Pose blend shapes: Skin deformation modeling as a function of joint angles
Compatible with the SMPL full-body model (SMPL+H)

MANO underpins much of current human hand research, including hand pose estimation, human-robot retargeting, and tactile transfer (UniTacHand's MANO UV map) (→ Chapter 11.4).
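For reference, the structure listed above (template mesh, PCA shape space, pose blend shapes) is commonly written in the SMPL-style form below. The notation comes from the SMPL/MANO literature rather than from this chapter, so treat the symbols as assumptions: $\bar{T}$ is the 778-vertex template, $\beta$ the shape coefficients, $\theta$ the joint pose, $S_n$ the PCA shape directions, $J(\beta)$ the joint locations, $\mathcal{W}$ the skinning weights, and $W(\cdot)$ a linear blend skinning function.

```latex
\begin{aligned}
M(\beta, \theta)   &= W\big(T_P(\beta, \theta),\; J(\beta),\; \theta,\; \mathcal{W}\big) \\
T_P(\beta, \theta) &= \bar{T} + B_S(\beta) + B_P(\theta) \\
B_S(\beta)         &= \sum_{n} \beta_n S_n
\end{aligned}
```

Read this way, $B_S$ accounts for inter-individual size and shape variation, while $B_P$ adds the pose-dependent skin deformation before skinning is applied.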

6.2 Motion Tracking Gloves: From Stretchable Sensors to Commercial Products

Seminar 2 (Taejoon) systematically reviewed the current state of motion tracking gloves.

Stretchable Liquid-Metal Sensor Glove (2024)

Published in Nature Communications [2024]:

9 eGaIn (eutectic gallium-indium) liquid-metal sensors
9 DoF tracking, adapts to all hand sizes (one-size-fits-all)
Joint angle error: 4.16 degrees, fingertip position error: 4.02 mm
Bayesian refinement + Kalman filter

Key Paper: 2024 study. "Stretchable Liquid-Metal Sensor Glove." Nature Communications. A 9-sensor eGaIn glove adapting to all hand sizes. The 4.02 mm fingertip error is sufficient for most manipulation demonstrations.
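The list above names a Kalman filter without giving its design. As a rough illustration only, a generic scalar Kalman filter for smoothing one joint-angle channel might look like the sketch below. It is not the glove paper's filter: the random-walk motion model, the class name `ScalarKalman`, and both noise variances are assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Random-walk Kalman filter for a single joint angle (degrees)."""
    x: float = 0.0    # current angle estimate
    p: float = 1e3    # estimate variance (large value = uninformed prior)
    q: float = 0.5    # process noise variance (assumed)
    r: float = 16.0   # measurement noise variance (assumed, about 4 deg std)

    def update(self, z: float) -> float:
        # Predict: a random-walk model keeps the estimate and grows the variance.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x


# Hypothetical noisy readings from one sensor channel (degrees).
readings = [30.0, 34.0, 27.0, 31.0, 29.0, 33.0, 28.0, 30.0]
kf = ScalarKalman()
smoothed = [kf.update(z) for z in readings]
```

A real glove would run one such filter per tracked joint, or a joint multivariate filter, with noise parameters fitted to calibration data.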

Commercial Glove Comparison

Seminar 2 compared three commercial gloves:

Glove	Sensor Type	Sensors	DoF	Notes
Rokoko	IMU	7	6	Lightest solution
Manus	Stretch/flex	16	—	NVIDIA Isaac Teleop official glove (GTC 2026)
StretchSense	EMF	—	25	Highest DoF

NVIDIA's designation of Manus as the official data glove for Isaac Teleop at GTC 2026 signals industrial standardization.

Korean Research: ML-Based Wearable Sensors

Seoul National University research [2024, PMC] implemented real-time hand motion recognition with ML-based wearable sensors, bridging the gap between lab research and practical applications.

6.3 Tactile Gloves: STAG, OSMO, and the Open-Source Approach

Beyond motion tracking, collecting tactile information during human grasping is the next step.

STAG (2019)

Sundaram et al. ^[3] (Nature, 2019):

548 piezoresistive sensors: High-resolution pressure distribution across the human hand
Grasp-finger correlation analysis
Pioneering tactile demonstration dataset for robot learning

Ruppel et al.^[4] proposed a 169-sensor reduced version, exploring the trade-off between sensor count and information loss.

OSMO Glove (2025)

OSMO [arXiv:2512.08920] takes the innovative approach of using the same glove on both human and robot hands:

12 three-axis magnetic sensors: Simultaneous normal and shear force measurement
MuMetal shielding against external magnetic interference
Core concept — Embodiment Bridge: Using identical sensing gloves on human and robot simplifies the cross-embodiment problem
Open-source: Reproducible design

Key Paper: OSMO. (2025). "OSMO: Open-Source Multi-axis Tactile Glove." arXiv:2512.08920. A 12-sensor three-axis magnetic tactile glove usable on both human and robot hands. Presents a new paradigm for cross-embodiment tactile transfer.

Key insight from Seminar 2: when tactile data is collected from human demonstrations and the same glove is then mounted on a robot hand for policy learning, both sides share one sensor, so the sensor-level tactile domain gap is removed and the cross-embodiment problem becomes simpler (→ Chapter 11.4).

Figure 6.1: OSMO tactile glove concept — full-hand tactile coverage via 3-axis magnetic sensors; the same glove is worn by human and robot alike, enabling the "glove as interface" cross-embodiment strategy. Source: Yin et al., 2025, arXiv:2512.08920, Fig. 1.

Figure 6.2: OSMO is compatible with diverse in-the-wild hand trackers: Aria Gen 2, Quest 3, Apple Vision Pro, RGB videos, and the Manus glove. Source: Yin et al., 2025, arXiv:2512.08920, Fig. 2.
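To make "simultaneous normal and shear force measurement" concrete, the sketch below splits one three-axis reading into its normal and shear magnitudes. It assumes the magnetic signal has already been calibrated into a force vector and that the local surface normal is known; the function name and the example values are hypothetical and are not taken from the OSMO release.

```python
import math


def split_normal_shear(force, normal):
    """Split a 3-axis force reading into (normal, shear) magnitudes.

    force, normal: 3-tuples in the same sensor frame.
    The normal vector does not need to be unit length.
    """
    n_len = math.sqrt(sum(c * c for c in normal))
    if n_len == 0.0:
        raise ValueError("normal must be non-zero")
    unit = tuple(c / n_len for c in normal)
    # Signed component of the force along the surface normal.
    f_normal = sum(f * u for f, u in zip(force, unit))
    # Whatever remains lies in the tangent plane: the shear component.
    shear_vec = tuple(f - f_normal * u for f, u in zip(force, unit))
    f_shear = math.sqrt(sum(c * c for c in shear_vec))
    return f_normal, f_shear


# Hypothetical reading in newtons, with the sensor z-axis as the surface normal.
f_n, f_s = split_normal_shear((0.3, 0.4, 2.0), (0.0, 0.0, 1.0))
```

For this example the normal component is 2.0 N and the shear magnitude is 0.5 N.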

TacCap 2025

TacCap [arXiv, Mar 2025] implements FBG (Fiber Bragg Grating) optical tactile sensors in a fingertip thimble form factor:

FBG fiber-optic tactile sensors: High sensitivity and fast response
Mountable on both human and robot fingers: Cross-embodiment approach similar to OSMO
EMI immunity: Optical sensing is unaffected by electromagnetic interference, so the sensor remains usable in MRI environments or near strong electromagnetic fields
Thimble form factor for easy attachment to existing gloves or robot fingertips

TacCap occupies a complementary position to OSMO's magnetic sensors. In environments where magnetic sensors are vulnerable to external fields, FBG optical sensing provides a robust alternative.

VTDexManip 2025

VTDexManip [ICLR 2025] is the first large-scale visual-tactile human demonstration dataset:

565,000 frames: Simultaneous visual and tactile data collection
10 tasks, 182 objects: Covering diverse manipulation scenarios
First visual-tactile human demonstration dataset: While prior datasets provided either visual or tactile data, this integrates both modalities
Serves as a benchmark for cross-embodiment policy learning

VTDexManip provides a concrete answer to "how to utilize demonstration data collected with tactile gloves." The 565K frames represent sufficient scale for large-scale imitation learning.

FSR Optimization

Tang et al.^[19] and Chen et al.^[20] addressed the optimal placement of FSR sensors, exploring how to extract maximum information from a limited number of sensors. Their work speaks to the trade-off between high spatial resolution and an optimized sensor count and position.

6.4 Exoskeleton Approaches: DexUMI [#8], ExoStart [#9], DEXOP [#10]

Exoskeletons mechanically connect to the human hand, directly capturing motion.

DexUMI (2025)

Xu et al.^[6], "Human Hand as Universal Interface":

Wearable exoskeleton resolves the kinematic gap
SAM2 inpainting resolves the visual gap — erasing the human hand from camera images and replacing it with the robot hand
86% average success across the Inspire Hand and XHand
3.2x faster data collection than teleoperation

Figure 6.3: DexUMI transfers dexterous human manipulation skills to various robot hands. Demonstrations span long-horizon, contact-rich, multi-finger, and precise skill categories — 86% average success. Source: Xu et al., 2025, arXiv:2505.21864, Fig. 1.

Figure 6.4: DexUMI exoskeleton design — Inspire Hand (left) and XHand (right) share the same joint-to-fingertip mapping. Wrist motion and joint actions are recorded alongside visual and tactile observations. Source: Xu et al., 2025, arXiv:2505.21864, Fig. 2.

ExoStart (2025)

Si et al.^[7] learn dexterous manipulation policies from roughly 10 exoskeleton demonstrations:

~10 exoskeleton demos
MuJoCo dynamics filtering
Auto-curriculum RL
ACT vision student
Zero-shot transfer to the real robot: >50% success on 6 of 7 tasks

This pipeline exemplifies Real-Sim-Real transfer (→ Chapter 10.4).

Figure 6.5: ExoStart framework overview — (a) human demonstration with a sensorized exoskeleton, (b) MuJoCo dynamics filtering, (c) auto-curriculum RL with sim-to-real distillation. Source: Si et al., 2025, arXiv:2506.11775, Fig. 1.

DEXOP (2025)

Fang et al.^[8] (DEXOP) use a four-bar linkage to mechanically couple human and robot fingers directly:

8x faster data collection than teleoperation
Direct contact feedback
51.3% success with DEXOP vs. 42.5% with teleoperation
Variants include DEXOP-12 (4 fingers, 12 DoF), DEXOP-9 (3 fingers, no ring), DEXOP-6 (dual 3-finger bimanual)

Figure 6.6: DEXOP hardware overview — whole-hand tactile sensors (left), whole-hand dexterous perioperation (middle), contact-rich long-horizon bimanual tasks (right). Source: Fang et al., 2025, arXiv:2509.04441, Fig. 1.

AirExo / AirExo-2 (SJTU, 2024-2025)

AirExo [ICRA 2024] and AirExo-2 [CoRL 2025] are low-cost passive exoskeletons developed at Shanghai Jiao Tong University:

Approximately $300 fabrication cost: Constructed from 3D-printed parts and low-cost sensors
Passive actuation: Records human motion without motors
In-the-wild human demonstration collection: Enables data capture outside laboratory settings in everyday environments
Key finding from AirExo-2: 3 min of teleoperation plus in-the-wild data matches or exceeds 20 min of teleoperation alone, an empirical demonstration that expensive teleoperation data can be supplemented or replaced by natural environment demonstrations
Full upper-body exoskeleton covering both arm and hand

The AirExo series exemplifies the democratization of data collection. The finding that in-the-wild data from a $300 exoskeleton can complement or replace costly teleoperation data presents a new solution to the scalability problem of robot learning data.

ACE (UCSD, CoRL 2024)

ACE [CoRL 2024] is a universal teleoperation interface developed at the University of California San Diego:

Hand-facing camera + exoskeleton: Tracks finger poses via hand-facing camera while capturing arm motion through the exoskeleton
Single system supports diverse robot platforms: Enables teleoperation of humanoids, robot arms, grippers, and quadrupeds
Cross-embodiment switching occurs at the software level, requiring no hardware changes
Intuitive operation: Natural human motions map directly to robot actions

ACE's key contribution is enabling control of any robot through a single interface. This maximizes the reusability of collected human demonstration data.

NuExo (Nubot Lab, ICRA 2025)

NuExo [ICRA 2025] is an active exoskeleton system developed at Nubot Lab:

5.2 kg backpack-style active exoskeleton: Motor-driven with haptic feedback
100% upper-limb ROM (Range of Motion): Captures the full range of human upper-limb motion without restriction
Successful 2.5 mm screw tightening: Performs extremely precise manipulation tasks remotely
Backpack form factor ensures mobility across diverse environments

NuExo takes the opposite approach from passive exoskeletons (AirExo). Active actuation and haptic feedback push the upper bound of precision, specializing in fine manipulation demonstration collection.

HumanoidExo (NUDT, 2025)

HumanoidExo [arXiv, Oct 2025] is a full-body exoskeleton developed at the National University of Defense Technology (NUDT):

Lightweight exoskeleton + LiDAR: Exoskeleton captures upper body/arm motion; LiDAR captures lower body/locomotion trajectories
Full-body trajectory collection: Simultaneously records upper-limb manipulation and lower-limb locomotion
Locomotion learning from exoskeleton data alone: Directly learns locomotion policies from human demonstrations without separate gait simulation
Optimized for full-body teleoperation of humanoid robots

HumanoidExo extends the application scope of exoskeletons from hands/arms to the entire body. As humanoid robots approach commercialization, the importance of full-body demonstration collection infrastructure is growing.

Key Perspective: DexUMI, ExoStart, and DEXOP each bridge the human-robot gap differently, but all share the goal of overcoming teleoperation's throughput bottleneck (~10 demos/hr). AirExo addresses cost barriers, ACE tackles platform compatibility, NuExo pushes precision limits, and HumanoidExo enables full-body applications.

6.5 Large-Scale Data: From Internet Videos to Egocentric Capture

Shaw et al. [2024, CMU] proposed extracting human hand motions from internet videos and retargeting them to robot hands. The potential lies in scalability — millions of hand manipulation videos exist online, and converting them to robot learning data could fundamentally solve the teleoperation bottleneck.

ImMimic ^[14] augments data by interpolating between large-scale human trajectories and a few teleoperation trajectories. As discussed in Seminar 1, this represents the direction of synergistically using human data instead of expensive teleop data (→ Chapter 11.2).
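The idea of interpolating between a human trajectory and a teleoperation trajectory can be sketched as below. This is a deliberately simplified illustration, not ImMimic's implementation: it assumes both trajectories already live in the same coordinate space, aligns them by uniform resampling rather than by any learned or dynamic alignment, and uses hypothetical function names.

```python
def resample(traj, n):
    """Linearly resample a trajectory (list of equal-length tuples) to n points."""
    if n < 2 or len(traj) < 2:
        raise ValueError("need at least two points")
    out = []
    for i in range(n):
        t = i * (len(traj) - 1) / (n - 1)   # fractional index into traj
        lo = min(int(t), len(traj) - 2)
        w = t - lo
        out.append(tuple((1 - w) * a + w * b
                         for a, b in zip(traj[lo], traj[lo + 1])))
    return out


def interpolate(human, robot, lam, n=50):
    """Blend two trajectories: lam=0 gives the human one, lam=1 the robot one."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    h, r = resample(human, n), resample(robot, n)
    return [tuple((1 - lam) * a + lam * b for a, b in zip(ph, pr))
            for ph, pr in zip(h, r)]


# Hypothetical 2-D fingertip paths with different numbers of samples.
human_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
robot_path = [(0.0, 1.0), (2.0, 1.0)]
midway = interpolate(human_path, robot_path, lam=0.5, n=5)
```

Sweeping `lam` from 0 to 1 yields a family of intermediate trajectories that could serve as augmented training targets under these assumptions.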

DexH2R^[15] implements task-oriented human-to-robot dexterous transfer, mapping the intent of human demonstrations to robot actions.

EgoScale and the 2026 Shift Toward Large-Scale Hand Data

NVIDIA GEAR's EgoScale pushes this trend further in 2026. EgoScale pretrains a VLA model on more than 20,000 hours of action-labeled egocentric human video and reports a log-linear relationship between human-data scale and downstream dexterous robot performance ^[31]. The key point is not merely collecting more robot data, but treating first-person human hand video as a reusable motor prior.

This changes the data strategy discussed in this chapter. Gloves and exoskeletons still provide precise hand and force data, but not every task can be teleoperated at robot scale. A practical hand-data stack is likely to have three layers:

large-scale egocentric video for hand motion, object, and task diversity;
a smaller amount of aligned human-robot data for embodiment-specific mid-training;
tactile/force-rich specialist data for slip, insertion, wiping, cap tightening, and other contact-quality skills.

EgoScale therefore does not replace tactile data. It suggests that the practical route is large-scale hand-motion priors plus smaller tactile/force-rich datasets.
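A log-linear relationship of the kind reported for EgoScale can be written as follows. The symbols are assumptions introduced here for illustration: $D$ is the number of hours of human video, $m(D)$ is the downstream metric, and $\alpha$, $\beta$ are fitted constants whose values this chapter does not give.

```latex
m(D) \approx \alpha + \beta \log D
\quad\Longrightarrow\quad
m(2D) - m(D) \approx \beta \log 2
```

Under this form every doubling of data buys the same fixed increment, so going from 10,000 to 20,000 hours yields the same gain as going from 1,000 to 2,000. That diminishing return per hour is one reason to pair large video priors with smaller tactile and force-rich datasets.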

EgoDex (Apple, 2025)

EgoDex ^[29] is a large-scale egocentric hand manipulation dataset leveraging Apple Vision Pro and ARKit:

829 hours, 90M (90 million) frames: The largest hand manipulation dataset to date
194 tasks: Spanning from everyday object manipulation to tool use
30 Hz per-finger tracking: Real-time 3D trajectory recording for each finger via ARKit hand tracking
Collected with consumer hardware (Apple Vision Pro), ensuring scalability

EgoDex opens a new middle ground between internet video and teleoperation approaches. It is as large-scale as video data while providing accurate 3D finger trajectories like teleoperation. The 829-hour scale exceeds existing robot demonstration datasets by orders of magnitude, approaching the data volume required for foundation model training.
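As a quick consistency check on the figures quoted above, 829 hours sampled at 30 Hz should give roughly 90 million frames. The sketch assumes the frame rate equals the 30 Hz tracking rate, which the text does not state explicitly.

```python
HOURS = 829          # dataset duration quoted in the text
RATE_HZ = 30         # tracking rate quoted in the text (assumed to be the frame rate)

frames = HOURS * 3600 * RATE_HZ
frames_millions = frames / 1e6   # 89.532, which rounds to the quoted 90M
```

The product is 89,532,000 frames, in line with the stated 90M.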

6.6 Teleoperation: AnyTeleop, DexPilot, Bunny-VisionPro

Teleoperation remains the most traditional collection method and yields the highest data quality.

AnyTeleop (2023)

Qin et al. [2023, RSS] built a general-purpose vision-based teleoperation system:

Dex-Retarget: Maps human keypoints to robot joint positions
Compatible with diverse robot hands
As discussed in Seminar 1, naive retargeting has limitations — kinematic differences between human and robot can violate physical feasibility
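Keypoint retargeting of this kind is often posed as an optimization: find robot joint angles whose fingertip position matches the (scaled) human fingertip, while respecting joint limits. The toy below shows that idea on a planar two-link finger using projected gradient descent. It is not the Dex-Retarget API; the link lengths, joint limits, step size, and function names are all hypothetical.

```python
import math

L1, L2 = 0.05, 0.04              # hypothetical robot finger link lengths (m)
Q_MIN, Q_MAX = 0.0, math.pi / 2  # hypothetical joint limits (rad)


def fingertip(q):
    """Forward kinematics of a planar two-link finger."""
    q1, q2 = q
    return (L1 * math.cos(q1) + L2 * math.cos(q1 + q2),
            L1 * math.sin(q1) + L2 * math.sin(q1 + q2))


def retarget(human_tip, scale=1.0, steps=5000, lr=50.0):
    """Find joint angles whose fingertip matches the scaled human fingertip."""
    tx, ty = human_tip[0] * scale, human_tip[1] * scale
    q1, q2 = 0.3, 0.3  # initial guess inside the joint limits
    for _ in range(steps):
        x, y = fingertip((q1, q2))
        ex, ey = x - tx, y - ty
        # Gradient of 0.5 * ||fingertip(q) - target||^2, i.e. J^T e.
        g1 = ex * (-y) + ey * x
        g2 = (ex * (-L2 * math.sin(q1 + q2))
              + ey * (L2 * math.cos(q1 + q2)))
        # Projected step: clamping keeps the pose within the joint limits.
        q1 = min(Q_MAX, max(Q_MIN, q1 - lr * g1))
        q2 = min(Q_MAX, max(Q_MIN, q2 - lr * g2))
    return q1, q2


# A reachable target generated from a known pose, then recovered by retargeting.
target = fingertip((0.6, 0.8))
solution = retarget(target)
```

When the human fingertip lies outside the robot's workspace, the clamped solution stays feasible but no longer matches the target, which is the kind of kinematic mismatch the last bullet above refers to.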

DexPilot (2020)

Handa & Van Wyk [2020, NVIDIA] achieved teleoperation of a 23-DoA (degrees of actuation) system from bare-hand depth images. It requires only an RGB-D camera, maximizing accessibility.

Bunny-VisionPro (2024)

Ding et al. [2024, UCSD] implemented bimanual teleoperation via Apple Vision Pro with haptic feedback, achieving research-grade teleoperation using consumer hardware.

DexCap (2024)

Wang et al. [2024, Stanford] created a portable mocap system enabling 3x faster data collection than teleoperation, with policy learning from 30 minutes of data.

DOGlove (2025)

Zhang et al. [2025, RSS, arXiv:2502.07730] designed DOGlove, a low-cost open-source haptic feedback glove:

21-DoF motion capture + 5-DoF haptic force feedback: Faithfully replicates human hand kinematics
Retargets to Shadow Hand, LEAP Hand, and other multi-finger robots
Haptic feedback conveys contact cues to the operator during teleoperation

Figure 6.7: DOGlove — 21-DoF motion capture paired with 5-DoF haptic force feedback in a low-cost open-source form factor. Teleoperation benchmarks show a clear advantage of haptic-enabled tasks over vision-only baselines. Source: Zhang et al., 2025, RSS, arXiv:2502.07730, Fig. 1.

Feel Robot Feels (2026)

Feel Robot Feels ^[17] is a tactile feedback array glove that closes the haptic loop, letting operators directly feel what the robot touches.

UMI-FT ^[30] occupies a unique position in this landscape: a handheld demonstration device that preserves human dexterity with natural haptic feedback (no teleoperation latency), collects 6-axis force/torque data via CoinFT sensors, and scales to in-the-wild environments. Hundreds of people can collect demonstrations daily without requiring robots or trained operators. The embodiment gap is small because the device mimics the robot's gripper form factor ^[30].

Figure 6.8: Data collection strategy comparison — teleoperation vs. video learning vs. UMI-style handheld. Source: Choi, SNU Data Science Seminar 2026.

System	Input Device	Haptic Feedback	Throughput	Cost
AnyTeleop	RGB camera	None	Baseline	Low
DexPilot	RGB-D camera	None	Baseline	Low
Bunny-VisionPro	Vision Pro	Yes	Baseline	Medium
DexCap	Motion capture	None	3x	Medium
DexUMI	Exoskeleton	Direct contact	3.2x	Medium
DEXOP	4-bar linkage	Direct contact	8x	Low
AirExo	Passive exoskeleton	None	High (in-the-wild)	Very low (~$300)
ACE	Camera+exoskeleton	None	Baseline	Medium
NuExo	Active exoskeleton	Yes (haptic)	Baseline	High
EgoDex	Vision Pro + ARKit	None	Very high (829h)	Medium
Internet video	None (observation)	None	Unlimited	Very low
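The throughput multipliers in the table can be turned into rough collection-time estimates. The sketch assumes a single shared baseline of 10 demos/hr, the teleoperation figure quoted in this chapter. Each multiplier was actually measured in its own paper against its own baseline and tasks, so the outputs are order-of-magnitude guides rather than comparable measurements.

```python
BASELINE_DEMOS_PER_HOUR = 10.0   # "~10 demos/hr" teleoperation figure from this chapter

# Speedups relative to teleoperation, as listed in the table above.
SPEEDUP = {"DexCap": 3.0, "DexUMI": 3.2, "DEXOP": 8.0}


def hours_needed(n_demos, speedup=1.0, baseline=BASELINE_DEMOS_PER_HOUR):
    """Hours of collection needed for n_demos at baseline * speedup demos/hr."""
    return n_demos / (baseline * speedup)


estimate = {name: hours_needed(1000, s) for name, s in SPEEDUP.items()}
estimate["Teleoperation"] = hours_needed(1000)
```

Under these assumptions, 1,000 demonstrations take about 100 hours by teleoperation, about 31 hours with DexUMI, and 12.5 hours with DEXOP.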

Summary and Outlook

Human hand data collection sits on a trade-off between throughput and data quality. Teleoperation provides high quality but scales poorly at ~10 demos/hr; internet video offers unlimited scale but lacks action labels and tactile information. The OSMO glove's Embodiment Bridge and DEXOP's mechanical coupling propose new solutions to this trade-off.

AirExo's $300 passive exoskeleton and EgoDex's 829-hour Vision Pro dataset are simultaneously advancing democratization and scaling of data collection. Furthermore, VTDexManip's 565K-frame visual-tactile dataset and TacCap's EMI-immune FBG sensors are accelerating the practical deployment of tactile-inclusive demonstration collection.

Standardization around NVIDIA Isaac Teleop and the Manus glove, internet video mining at scale, and extending synthetic data (780K trajectories/11 hours) to the tactile domain are the key directions for resolving the data bottleneck.

The next chapter examines how robots learn to manipulate from such collected data (→ Chapter 8: Contact Dynamics).

6.8 Collection Strategy: Teleop-Free Data, Co-Training, and Tactile-Rich Specialists

Parts I and II of the related S2 survey ^[32] sharpen this chapter's data-collection argument. Human hand data is not one category. Egocentric video, stretchable gloves, tactile gloves, passive exoskeletons, and handheld grippers each provide different cost/modality tradeoffs. "Large-scale data" is therefore not enough; the role of each data type must be defined.

The useful S2 frame has three layers. First, teleop-free human data (labeled "Data B") collects task diversity and human strategies quickly. Second, a smaller amount of robot-executable data (labeled "Data A") provides the action manifold that the target robot can actually execute. Third, tactile-rich specialist data raises the ceiling on tasks such as wiping, insertion, cap tightening, and sequential multi-object grasping, where contact quality determines success.

This also fits EgoScale. More than 20,000 hours of egocentric video can build a strong hand-motion prior, but without tactile and force signals it cannot explain the last 20-30% of contact-rich failures. Conversely, collecting everything with tactile gloves limits scale. A practical strategy layers large vision-only priors, medium-scale human tactile data, and a small amount of executable robot data.

For the Cosmax-style problem, this distinction is essential. Sequential multi-object grasping cannot infer safe contact transitions from video alone, while pure teleoperation cannot collect enough diversity. The system should learn candidate strategies from natural human data and verify force closure and slip margin on the robot hand through tactile feedback.
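A slip-margin check of the kind mentioned above can be sketched with the Coulomb friction model: a contact holds while the shear force stays below the friction coefficient times the normal force. The friction coefficient, the safety threshold, and the function names are assumptions for illustration, and a full force-closure test over all contacts is not covered here.

```python
def slip_margin(f_normal, f_shear, mu):
    """Coulomb friction margin in newtons; positive means inside the friction cone."""
    # A non-compressive contact (f_normal <= 0) cannot resist any shear.
    return mu * max(f_normal, 0.0) - abs(f_shear)


def is_contact_safe(f_normal, f_shear, mu=0.5, min_margin=0.1):
    """True if the contact keeps at least min_margin newtons of slip reserve."""
    return slip_margin(f_normal, f_shear, mu) >= min_margin


# Hypothetical contacts: (normal force, shear force) in newtons.
stable = is_contact_safe(2.0, 0.5)     # margin 0.5 N
slipping = is_contact_safe(1.0, 0.8)   # margin -0.3 N
```

In a sequential multi-object grasp, such a check would run per contact before each transition, using normal and shear estimates from the tactile sensors on the robot hand.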

Operational Reading Note

The practical value of this chapter is not only the concept of human-hand data collection; it is the set of engineering decisions that the concept changes. A deployable robot-hand project should start by asking what state becomes observable after this chapter is applied. The answer should be concrete: contact existence, contact patch, normal force, shear direction, slip margin, object pose, task phase, operator override, or product-damage risk. If a variable cannot be logged or consumed by a controller, it remains an explanatory idea rather than a system capability.

The second decision is the unit of evidence. Research demos often report one success metric, but tactile manipulation improves through failure records. A useful attempt record contains the object or SKU, the selected grasp candidate, the robot hand and sensor configuration, calibration version, task phase, tactile summary, policy action, safety intervention, and final outcome. This record is what connects the sensor chapters to the data chapter, the control chapters to the learning chapters, and the manufacturing chapters to QA.

The third decision is where the chapter sits in the control stack. Some ideas belong in fast reflex loops, some in contact MPC, some in policy inputs, and some only in offline diagnosis. Mixing these time scales creates brittle systems: a VLA cannot react to millisecond slip, and a low-level force controller cannot infer the next process step. The right architecture separates fast contact stabilization, mid-level grasp or rearrangement control, and slow task reasoning.

Finally, the chapter should be evaluated by the failure modes it removes. A method that improves benchmark success but leaves the team unable to distinguish perception failure, contact-acquisition failure, force-closure failure, execution-time slip, or maintenance drift is not yet production-ready. A method with slightly lower headline performance but better logs, safer force limits, and clearer recovery hooks may be the stronger basis for manufacturing Physical AI.
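The attempt record described in this note could be captured with a small schema such as the one below. The field names mirror the items listed in the text; the types, defaults, JSON serialization, and example values are assumptions rather than a prescribed format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AttemptRecord:
    """One manipulation attempt, logged whether it succeeds or fails."""
    object_id: str                      # object or SKU
    grasp_candidate: str                # selected grasp candidate
    hand_config: str                    # robot hand
    sensor_config: str                  # sensor configuration
    calibration_version: str
    task_phase: str
    tactile_summary: dict = field(default_factory=dict)
    policy_action: list = field(default_factory=list)
    safety_intervention: Optional[str] = None
    outcome: str = "unknown"

    def to_json(self) -> str:
        """Serialize to one JSON line for append-only logging."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical example values.
example = AttemptRecord(
    object_id="sku-001",
    grasp_candidate="pinch-2",
    hand_config="hand-a",
    sensor_config="glove-12x3axis",
    calibration_version="cal-v1",
    task_phase="lift",
    tactile_summary={"max_normal_n": 2.0, "min_slip_margin_n": -0.3},
    outcome="slip",
)
line = example.to_json()
```

Keeping failures in the same schema as successes is what allows perception, contact-acquisition, and slip failures to be separated later.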

References

1. Romero, J., Tzionas, D., & Black, M. J. (2017). Embodied hands: Modeling and capturing hands and bodies together. SIGGRAPH Asia 2017.
2. 2024 study. Stretchable liquid-metal sensor glove. Nature Communications. https://doi.org/10.1038/s41467-024-50101-w
3. Sundaram, S., Kellnhofer, P., Li, Y., Zhu, J.-Y., Torralba, A., & Matusik, W. (2019). Learning the signatures of the human grasp using a scalable tactile glove. Nature, 569, 698-702.
4. Ruppel, P., et al. (2024). Reduced tactile sensor array for grasp analysis. Sensors.
5. Yin, J., Qi, H., Wi, Y., Kundu, S., Lambeta, M., Yang, W., Wang, C., Wu, T., Malik, J., & Hellebrekers, T. (2025). OSMO: Open-source tactile glove for human-to-robot skill transfer. arXiv preprint. arXiv:2512.08920. [#18]
6. Xu, M., Zhang, H., Hou, Y., Xu, Z., Fan, L., Veloso, M., & Song, S. (2025). DexUMI: Using human hand as the universal manipulation interface for dexterous manipulation. arXiv preprint. [#8]
7. Si, Z., et al. (2025). ExoStart: From 10 exoskeleton demos to dexterous robot manipulation. [#9]
8. Fang, H.-S., Romero, B., Xie, Y., et al. (2025). DEXOP: A device for robotic transfer of dexterous human manipulation. arXiv preprint. arXiv:2509.04441. [#10]
9. Qin, Y., et al. (2023). AnyTeleop: A general vision-based dexterous robot hand-arm teleoperation system. RSS 2023.
10. Handa, A., & Van Wyk, K. (2020). DexPilot: Vision-based teleoperation for dexterous manipulation. ICRA 2020.
11. Ding, Z., et al. (2024). Bunny-VisionPro: Real-time bimanual dexterous teleoperation for imitation learning. arXiv preprint. arXiv:2407.03162.
12. Wang, C., et al. (2024). DexCap: Scalable and portable mocap data collection system. RSS 2024.
13. Shaw, K., Bahl, S., & Pathak, D. (2024). Learning dexterity from human hand motion in internet videos. arXiv preprint. arXiv:2212.04498.
14. Liu, Y., et al. (2025). ImMimic: Large-scale human trajectory + few-shot teleoperation interpolation.
15. Li, Y., et al. (2024). DexH2R: Task-oriented dexterous manipulation from human to robots. arXiv preprint.
16. Various. (2025). DOGlove: Low-cost open-source haptic feedback glove.
17. Various. (2026). Feel Robot Feels: Tactile feedback array glove.
18. Murphy, L., et al. (2025). Capacitive tactile sensing for teaching by demonstration. arXiv preprint.
19. Tang, M., et al. (2025). FSR sensor optimization for grasp classification. IEEE Journal of Biomedical and Health Informatics.
20. Chen, H., et al. (2025). Capacitive sensor for lift-risk identification. Applied Ergonomics.
21. 2024 study. ML-based wearable sensors for real-time hand motion recognition. PMC. (Seoul National University)
22. TacCap. (2025). TacCap: FBG optical tactile sensor thimble for human and robot fingertips. arXiv preprint, Mar 2025.
23. VTDexManip. (2025). VTDexManip: A large-scale visual-tactile dataset for dexterous manipulation from human demonstrations. ICLR 2025.
24. Fang, J., et al. (2024). AirExo: Low-cost exoskeletons for learning whole-arm manipulation in the wild. ICRA 2024.
25. Fang, J., et al. (2025). AirExo-2: Scaling up generalizable manipulation skills via purely kinesthetic demonstrations in the wild. CoRL 2025.
26. Zhao, Q., et al. (2024). ACE: A cross-platform visual-exoskeleton system for low-cost dexterous teleoperation. CoRL 2024.
27. NuExo. (2025). NuExo: A 5.2 kg active upper-limb exoskeleton for dexterous teleoperation with 100% ROM. ICRA 2025.
28. HumanoidExo. (2025). HumanoidExo: Lightweight exoskeleton with LiDAR for full-body humanoid teleoperation and locomotion learning. arXiv preprint, Oct 2025.
29. EgoDex. (2025). EgoDex: Learning dexterous manipulation from large-scale egocentric hand data via Apple Vision Pro. Apple, 2025.
30. Choi, H., Hou, Y., Pan, C., Hong, S., Patel, A., Xu, X., Cutkosky, M. R., & Song, S. (2026). In-the-Wild Compliant Manipulation with UMI-FT. arXiv preprint. arXiv:2601.09988. [#36]
31. Bansal, A., et al. (2026). EgoScale: Scaling human video to unlock dexterous robot intelligence. NVIDIA GEAR. https://research.nvidia.com/labs/gear/egoscale/
32. Um, T. (2026). S2 From Human Hands to Robot Hands: large-scale tactile hand data survey.