AR Puppetry

Solo Project·Fall 2024·Unity / ARFoundation / iOS

Concept design, stage design, 3D scene building, AR development, performance design

Problem

Can AR bring puppet theater to the tabletop while preserving the physical manipulation that defines it?

Goal

Multi-image-tracking AR puppet theater. Performer moves physical tiles to control digital characters for a live performance.

Approach

Carcassonne tiles as tracking markers. Unity + ARFoundation, six-tile simultaneous tracking, iPhone deployment.

Result

Live classroom performance. Six tiles tracked simultaneously, stable 60fps (iPhone 15 Pro).

I still remember the first time I watched the puppetry in The Sound of Music: it felt like a door to a new world had opened. The marionettes danced with such life and personality that I was captivated. Since then I have developed a deep interest in drama, musicals, and visual storytelling. Among all these art forms, puppetry holds a special emotional place for me.

New technologies give us new forms of presentation. While playing with AR on my phone, I realized the connection between AR and puppetry: both project characters into a shared physical space. The camera becomes a stage window, and the table becomes a theater floor.

Before designing my AR puppet theater, I studied two distinct traditions—one rooted in communal ritual, the other in modern entertainment—to understand the core elements of puppet performance.

Chinese Traditional Puppetry

Performers control characters with sticks and rods, traditionally to convey moral tales, historical epics, and cultural myths. Puppetry serves as a medium of communal identity and artistic expression. Performances are often accompanied by live music and narration, creating a complete theatrical experience.

Modern Western Puppetry

Performers control characters with their hands, and the form serves primarily as entertainment and storytelling for children. From Sesame Street to hand puppet shows, Western puppetry emphasizes character interaction, humor, and audience engagement through direct emotional connection.

Framed Viewing

The audience watches through a 'frame'. Shadow puppetry uses a screen; AR uses a phone screen. Control happens outside the frame.

One-to-One Mapping

Move one physical object and the character follows, with no abstraction layer in between. Marionettes use strings; AR uses image tracking.

Project Direction

01

Projection

Using AR to project virtual puppets and scenes onto reality, bridging digital and physical worlds.

02

Manipulation

Controlling virtual puppets by interacting with physical objects—moving tiles moves characters.

03

Live Performance

The performer controls the stage and serves as the storyteller, creating an intimate theatrical experience.

04

Audience

Viewers use digital devices to watch the show with freedom of viewing angle.

Stage Design

I use tiles from Carcassonne—one of my favorite board games—as tracking images, placed on a grid-based board that serves as the stage floor. The Scene Area holds tiles representing different scenes; swapping tiles switches the scene. The Character Area holds tiles representing characters; the performer moves them to drive the story.

Carcassonne tiles were chosen because each tile has a unique illustration, making them ideal as AR tracking images. The grid-based board game mechanic naturally maps to a theatrical stage layout.

Scenes

I limited the play to two scenes rather than three or four. The reason is simple: only six tiles are tracked at once, and four of them are taken by characters, leaving at most two slots for scenes. Two scenes also create a natural 'departure to arrival' narrative arc, which is more effective than piling on more environments.

Countryside Road

Opening. Watchtower, trees, stone walls—characters meet here and set off together.

Town Market

Finale. Medieval market, timber buildings—all four characters converge here to complete the story.

Characters

Four low-poly characters. Low-poly was a deliberate choice—Carcassonne tiles are sketch-style illustrations, and realistic characters would break the visual consistency. Each character maps to a dedicated tile, and there are no fixed relationships between them—the same script plays out differently every time.

Wizard · Vendor · Soldier · Knight

Plot

With the help of AI, I crafted a short drama script 'A Melody of Magic and Steel' featuring the scenes and characters. Sir Rowan the Knight meets a Soldier on the countryside road, and together they journey to the town market where they encounter a Wizard performing magical tricks and a Vendor selling goods. The story explores themes of adventure, commerce, and the blending of magic with everyday life.

Scene 1: The Countryside Road

A gentle melody begins as Sir Rowan, a knight, rides through the countryside. He spots a Soldier sitting on a rock, cleaning his shield.

Song: "On the Road to Glory"
Sir Rowan: "Through the fields, across the plains, with my sword, I carve my name."
Soldier: "I've seen my share of toil and pain, fighting battles that never end."

Sir Rowan: "Let's travel together, my friend. Strength is found in numbers."
Soldier: "A knight's company is always welcome."

Scene 2: The Town Market

The market is busy, with vendors calling out. The Wizard is performing a magical trick. The Vendor is selling fruits nearby.

Song: "Magic and Food"
Wizard: "Potions and charms, come and see, a little magic for you and me!"
Vendor: "I've got apples and meat, fresh and sweet, forget the magic, come take a treat!"

Sir Rowan: "I'll take both—strength and sustenance."
Soldier: "A full belly and a bit of magic—perfect!"

All Together: "Together we stand, with food and might, through every journey, we take flight!"

I chose Unity + ARFoundation because, of the options I considered, it was the only one that combined multi-image tracking with a full 3D rendering pipeline. Web AR lacked the tracking precision I needed, and ARKit on its own does not render 3D content, so it would have to be paired with a separate renderer. The app was deployed on an iPhone 15 Pro (ARKit backend).

The biggest challenge was implementing reliable multi-image tracking. Each Carcassonne tile needs to be recognized individually and mapped to the correct 3D model, while maintaining spatial consistency as the user moves around the stage.

Characters flickering and disappearing

Tried: Increased the tracking buffer zone and extended the hold time after signal loss

Result: Flickering was reduced, but positions drifted; characters slowly floated away from their tiles
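
The hold-time idea can be sketched in a few lines. This is an illustration in Python, not the project's Unity code: the class name `HeldTile`, the `hold_seconds` value, and the position-only pose are all assumptions made for the example. It keeps a character at its last known pose for a short window after tracking drops, then hides it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Pose = Tuple[float, float, float]  # simplified: position only, no rotation


@dataclass
class HeldTile:
    """Keeps a character visible briefly after its tile loses tracking."""

    hold_seconds: float = 0.5  # hypothetical value, not taken from the project
    last_pose: Optional[Pose] = None
    last_seen: Optional[float] = None

    def update(self, now: float, tracked: bool,
               pose: Optional[Pose] = None) -> Optional[Pose]:
        """Return the pose to render at time `now`, or None to hide."""
        if tracked and pose is not None:
            # Fresh tracking data: remember it and render at the tile.
            self.last_pose, self.last_seen = pose, now
            return pose
        if self.last_seen is not None and now - self.last_seen <= self.hold_seconds:
            # Short dropout: reuse the stale pose instead of flickering.
            return self.last_pose
        # Dropout lasted too long (or the tile was never seen): hide.
        return None
```

A held pose is stale by definition, so a longer hold window hides more flicker at the cost of the character no longer following a tile that moves during the dropout. That trade-off is one possible contributor to the drift described above, though the sketch does not model the actual cause.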

Low recognition rate on some tiles

Tried: Tested each tile's trackability score individually

Result: Found that tiles with simple patterns scored very low; swapping to tiles with complex patterns significantly improved stability
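
The tile selection step can be expressed as a small filter. The sketch below is in Python and rests on assumptions: the function name `pick_tiles`, the tile names, the score scale, and the threshold are all hypothetical, and the scores would come from whatever tool was used to rate each image.

```python
from typing import Dict, List


def pick_tiles(scores: Dict[str, float], needed: int, min_score: float) -> List[str]:
    """Return the `needed` highest-scoring tiles that reach `min_score`."""
    usable = [name for name, score in scores.items() if score >= min_score]
    usable.sort(key=lambda name: scores[name], reverse=True)
    if len(usable) < needed:
        raise ValueError(
            f"only {len(usable)} tiles reach {min_score}, need {needed}"
        )
    return usable[:needed]


# Hypothetical scores: plain tiles rate low, detailed tiles rate high.
example_scores = {
    "field_plain": 10.0,
    "road_straight": 20.0,
    "river_bend": 70.0,
    "city_gate": 85.0,
    "monastery": 90.0,
}
best_three = pick_tiles(example_scores, needed=3, min_score=60.0)
```

Raising an error when too few tiles qualify makes the problem visible before a performance, rather than silently falling back to a tile that tracks poorly.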

After studying the Unity AR sample project and various online resources, I achieved stable multi-image tracking. The key was curating the reference image library: tiles with detailed, distinct artwork provide enough visual features for reliable detection, even at different angles and under different lighting conditions.

Unity + ARFoundation

Core AR framework with built-in image tracking, plane detection, and rendering pipeline.

Image Tracking

Each Carcassonne tile is registered as a reference image. When detected by the camera, the corresponding 3D scene or character is rendered at the tile's position and orientation.
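
The mapping from tile to content can be sketched as a lookup table plus two event handlers. This is an illustration in Python, not the project's code: the tile names, the `REGISTRY` table, and the `Stage` class are hypothetical, and poses are reduced to positions. In Unity, a lookup like this would typically be keyed on the reference image name reported by ARFoundation's tracked-image events.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Pose = Tuple[float, float, float]  # simplified: position only

# Hypothetical tile names mapped to (role, model). The model names follow
# the two scenes and four characters described in the text.
REGISTRY: Dict[str, Tuple[str, str]] = {
    "tile_road": ("scene", "CountrysideRoad"),
    "tile_market": ("scene", "TownMarket"),
    "tile_knight": ("character", "Knight"),
    "tile_soldier": ("character", "Soldier"),
    "tile_wizard": ("character", "Wizard"),
    "tile_vendor": ("character", "Vendor"),
}


@dataclass
class Stage:
    active_scene: Optional[str] = None
    characters: Dict[str, Pose] = field(default_factory=dict)

    def on_tile_tracked(self, tile: str, pose: Pose) -> None:
        """Show or move the content that belongs to a detected tile."""
        entry = REGISTRY.get(tile)
        if entry is None:
            return  # not one of the registered tiles
        role, model = entry
        if role == "scene":
            self.active_scene = model  # swapping scene tiles switches the scene
        else:
            self.characters[model] = pose  # the character follows its tile

    def on_tile_lost(self, tile: str) -> None:
        """Hide the content of a tile that is no longer tracked."""
        entry = REGISTRY.get(tile)
        if entry is None:
            return
        role, model = entry
        if role == "scene":
            if self.active_scene == model:
                self.active_scene = None
        else:
            self.characters.pop(model, None)
```

Keeping the tile-to-model table separate from the handlers means a tile can be swapped for a better-tracking one by changing a single entry.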

iOS Deployment

Built and deployed to iPhone for real-time AR performance with ARKit backend. Optimized for stable 60fps rendering with multiple tracked images.

AR and puppetry share the same structure: manipulate physical objects to bring virtual characters to life in physical space. Carcassonne tiles made it work—no custom markers, no touchscreen, the performer's hands are the controls.

Managing six tiles solo isn't realistic. During the performance I fumbled constantly, and the audience's attention kept bouncing between the phone and the table. Puppetry works best as a division of labor, not a one-person show.

Next: multi-performer mode—one controls characters, another controls scenes. External speakers and large-screen projection to take the show from phone to stage.

