AR Puppetry
My Role
Concept design, stage design, 3D scene building, AR development, performance design
Can AR bring puppet theater to the tabletop while preserving the physical manipulation that defines it?
Multi-image-tracking AR puppet theater. Performer moves physical tiles to control digital characters for a live performance.
Carcassonne tiles as tracking markers. Unity + ARFoundation, six-tile simultaneous tracking, iPhone deployment.
Live classroom performance. Six tiles tracked simultaneously, stable 60fps (iPhone 15 Pro).
Motivation
I still remember the first time I watched the puppet show in The Sound of Music: it felt like a door to a new world had opened. The marionettes danced with such life and personality that I was captivated. Since then I have developed a deep interest in drama, musicals, and visual storytelling. Among all these art forms, puppetry holds a special emotional place for me.
New technology gives us new forms of presentation. While playing with AR on my phone, I noticed a connection between AR and puppetry: both project characters into a shared physical space. The camera becomes a stage window, and the table becomes a theater floor.
Research
Before designing my AR puppet theater, I studied two distinct traditions—one rooted in communal ritual, the other in modern entertainment—to understand the core elements of puppet performance.
Chinese Traditional Puppetry
Using sticks and rods to control characters, traditionally conveying moral tales, historical epics, and cultural myths. Puppetry serves as a medium of communal identity and artistic expression. Performances are often accompanied by live music and narration, creating a complete theatrical experience.
Modern Western Puppetry
Using hands to control characters, serving primarily as an entertainment and storytelling medium for children. From Sesame Street to hand puppet shows, Western puppetry emphasizes character interaction, humor, and audience engagement through direct emotional connection.
The audience watches through a 'frame'. Shadow puppetry uses a screen, AR uses a phone screen. Control happens outside the frame.
Move one physical object, the character follows. Zero abstraction layers. Marionettes use strings, AR uses image tracking.
Design
Project Direction
Projection
Using AR to project virtual puppets and scenes onto reality, bridging digital and physical worlds.
Manipulation
Controlling virtual puppets by interacting with physical objects—moving tiles moves characters.
Live Performance
The performer controls the stage and serves as the storyteller, creating an intimate theatrical experience.
Audience
Viewers use digital devices to watch the show with freedom of viewing angle.
Stage Design
I use tiles from Carcassonne—one of my favorite board games—as tracking images, placed on a grid-based board that serves as the stage floor. The Scene Area holds tiles representing different scenes; swapping tiles switches the scene. The Character Area holds tiles representing characters; the performer moves them to drive the story.
Carcassonne tiles were chosen because each tile has a unique illustration, making them ideal as AR tracking images. The grid-based board game mechanic naturally maps to a theatrical stage layout.
Scenes
Only two scenes, not three or four. The reason is simple: the setup tracks six tiles at once, and four of them are reserved for the characters, which leaves two slots for scenes. Two scenes also create a natural 'departure, arrival' narrative arc, which works better than piling on more environments.
Countryside Road
Opening. Watchtower, trees, stone walls—characters meet here and set off together.
Town Market
Finale. Medieval market, timber buildings—all four characters converge here to complete the story.
Characters
Four low-poly characters. Low-poly was a deliberate choice—Carcassonne tiles are sketch-style illustrations, and realistic characters would break the visual consistency. Each character maps to a dedicated tile, and there are no fixed relationships between them—the same script plays out differently every time.
Plot
With the help of AI, I crafted a short drama script 'A Melody of Magic and Steel' featuring the scenes and characters. Sir Rowan the Knight meets a Soldier on the countryside road, and together they journey to the town market where they encounter a Wizard performing magical tricks and a Vendor selling goods. The story explores themes of adventure, commerce, and the blending of magic with everyday life.
Scene 1: The Countryside Road
A gentle melody begins as Sir Rowan, a knight, rides through the countryside. He spots a Soldier sitting on a rock, cleaning his shield.
Song: "On the Road to Glory"
Sir Rowan: "Through the fields, across the plains, with my sword, I carve my name."
Soldier: "I've seen my share of toil and pain, fighting battles that never end."
Sir Rowan: "Let's travel together, my friend. Strength is found in numbers."
Soldier: "A knight's company is always welcome."

Scene 2: The Town Market
The market is busy, with vendors calling out. The Wizard is performing a magical trick. The Vendor is selling fruits nearby.
Song: "Magic and Food"
Wizard: "Potions and charms, come and see, a little magic for you and me!"
Vendor: "I've got apples and meat, fresh and sweet, forget the magic, come take a treat!"
Sir Rowan: "I'll take both: strength and sustenance."
Soldier: "A full belly and a bit of magic. Perfect!"
All Together: "Together we stand, with food and might, through every journey, we take flight!"
Development
I chose Unity + ARFoundation because, of the options I considered, it was the only one that combined multi-image tracking with a full 3D rendering pipeline. Web AR lacked the tracking precision I needed, and native ARKit handles tracking but leaves rendering to a separate framework. The app is deployed on iPhone 15 Pro (ARKit backend).
The biggest challenge was implementing reliable multi-image tracking. Each Carcassonne tile needs to be recognized individually and mapped to the correct 3D model, while maintaining spatial consistency as the user moves around the stage.
Characters flickering and disappearing
Tried: increased the tracking buffer zone and extended the hold time after signal loss.
Result: flickering was reduced, but positions drifted; characters slowly floated away from their tiles.
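The hold-time approach can be sketched as a small per-tile state machine. This is an illustrative Python sketch, not the project's Unity code: `TileHold`, `hold_seconds`, and the simplified position-only pose are hypothetical names and assumptions. It also hints at one plausible reason for the drift: while a tile is being "held", the character stays at a stale pose and no new measurement corrects it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Simplified pose: x, y, z position only (rotation omitted for brevity).
Pose = Tuple[float, float, float]


@dataclass
class TileHold:
    """Keeps a character visible for a short time after its tile is lost."""

    hold_seconds: float = 0.5          # assumed value, tune per device
    last_pose: Optional[Pose] = None   # most recent pose from the tracker
    last_seen: Optional[float] = None  # timestamp of that pose

    def update(self, now: float, tracked_pose: Optional[Pose]) -> Optional[Pose]:
        """Return the pose to render at, or None to hide the character."""
        if tracked_pose is not None:
            # Fresh measurement: follow the tile directly.
            self.last_pose = tracked_pose
            self.last_seen = now
            return tracked_pose
        if self.last_seen is not None and now - self.last_seen <= self.hold_seconds:
            # Tracking lost recently: hold the stale pose instead of flickering.
            return self.last_pose
        # Lost for too long (or never seen): hide the character.
        return None
```

A longer `hold_seconds` hides more flicker but lets the character sit at an outdated position for longer, which matches the trade-off observed in the result above.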
Low recognition rate on some tiles
Tried: tested each tile's trackability score individually.
Result: tiles with simple patterns scored very low; swapping to tiles with complex patterns significantly improved stability.
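The tile selection step could look roughly like the following. This is a hedged sketch: the function name `pick_tiles`, the 0 to 1 score scale, the `min_score` threshold, and the tile names in the usage note are all assumptions for illustration, not values measured in this project.

```python
from typing import Dict, List


def pick_tiles(scores: Dict[str, float], needed: int, min_score: float = 0.6) -> List[str]:
    """Pick the `needed` best-scoring tiles, rejecting any below `min_score`.

    `scores` maps a tile name to its trackability score (assumed 0..1 scale).
    """
    usable = [name for name, score in scores.items() if score >= min_score]
    # Highest score first, so the most reliable tiles are chosen.
    usable.sort(key=lambda name: scores[name], reverse=True)
    if len(usable) < needed:
        raise ValueError(
            f"only {len(usable)} tiles reach min_score={min_score}, need {needed}"
        )
    return usable[:needed]
```

For example, with made-up scores `{"city_gate": 0.9, "plain_field": 0.2, "monastery": 0.8}`, asking for two tiles would select the city gate and the monastery and leave out the plain field, whose simple pattern scores too low.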
After studying the Unity AR sample project and various online resources, I achieved stable multi-image tracking. The key insight was optimizing the reference image library: once the low-scoring tiles were swapped out, each remaining tile's distinct artwork provided enough visual features for reliable detection, even at different angles and lighting conditions.
Unity + ARFoundation
Core AR framework with built-in image tracking, plane detection, and rendering pipeline.
Image Tracking
Each Carcassonne tile is registered as a reference image. When detected by the camera, the corresponding 3D scene or character is rendered at the tile's position and orientation.
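The tile-to-content mapping described here can be sketched as a lookup table keyed by reference image name. This is a language-neutral illustration in Python rather than the project's Unity scripts; the tile names, model names, `TileContent`, and `place` are hypothetical, and the pose is simplified to a position plus a yaw angle.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

MAX_TRACKED = 6  # six tiles tracked at once, as stated in the project summary


@dataclass(frozen=True)
class TileContent:
    role: str   # "scene" or "character"
    model: str  # hypothetical asset name


# Hypothetical reference image names mapped to the content they summon.
REGISTRY: Dict[str, TileContent] = {
    "tile_road": TileContent("scene", "CountrysideRoad"),
    "tile_market": TileContent("scene", "TownMarket"),
    "tile_knight": TileContent("character", "Knight"),
    "tile_soldier": TileContent("character", "Soldier"),
    "tile_wizard": TileContent("character", "Wizard"),
    "tile_vendor": TileContent("character", "Vendor"),
}
# Four characters plus two scenes must fit in the six-tile budget.
assert len(REGISTRY) <= MAX_TRACKED

# Simplified pose: ((x, y, z) position, yaw in degrees).
Pose = Tuple[Tuple[float, float, float], float]


def place(detections: Dict[str, Pose]) -> Dict[str, Pose]:
    """Map each detected tile to its model, placed at the tile's pose."""
    placed: Dict[str, Pose] = {}
    for tile_name, pose in detections.items():
        content = REGISTRY.get(tile_name)
        if content is None:
            continue  # image not in the library: ignore it
        placed[content.model] = pose
    return placed
```

Because each model simply inherits its tile's pose every frame, sliding or rotating a tile on the board moves the character with no extra input layer.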
iOS Deployment
Built and deployed to iPhone for real-time AR performance with ARKit backend. Optimized for stable 60fps rendering with multiple tracked images.
Reflection
AR and puppetry share the same structure: manipulate physical objects to bring virtual characters to life in physical space. Carcassonne tiles made it work—no custom markers, no touchscreen, the performer's hands are the controls.
Managing six tiles solo isn't realistic. During the performance, fumbling was constant, and the audience's attention kept bouncing between phone and table. The real form of puppetry is division of labor, not a one-person show.
Next: multi-performer mode—one controls characters, another controls scenes. External speakers and large-screen projection to take the show from phone to stage.