Gesture Controlled Temple Run using OpenCV & MediaPipe — No Keyboard, Just Your Hands!



Ever imagined playing Temple Run with just your hand gestures? No keyboard. No joystick. Just your camera and a bit of Python magic. Well, I did exactly that — and the result is a hands-free gaming experience powered by AI.

Welcome to the future of intuitive gameplay.


🎯 Project Overview

This project is a Gesture-Controlled Temple Run system built using:

  • 🧠 OpenCV: For video frame processing
  • ✋ MediaPipe: For real-time hand landmark detection
  • 💻 Python: The glue that binds it all
  • 🎮 Temple Run (or any similar endless runner game): As the playground

What started as a simple idea became an exciting experiment in computer vision, delivering immersive control without ever needing to touch the keyboard.


🔍 Why I Built This

As someone deeply invested in AI, computer vision, and the future of interaction, I’ve always been intrigued by human-computer interfaces beyond the mouse and keyboard.

Temple Run is nostalgic. But adding hand gestures to it made it futuristic. I wanted to create something that's not just fun but also a learning playground for AI-based gesture recognition.

And honestly? It’s so much fun to control characters just by swiping your hand.


🧰 Tech Stack

Here’s a breakdown of the tools and libraries used:

  • Python 3.9
  • OpenCV: For accessing camera feed, drawing, and visualization
  • MediaPipe Hands: To detect hand landmarks in real-time
  • PyAutoGUI: To simulate keyboard key presses like left, right, space, etc.
  • Subway Surfers / Temple Run: As the game being controlled

🎥 Yes, the code works with both Temple Run and Subway Surfers on web or desktop.


🎬 How It Works

The idea is simple — use a webcam to track your hand, interpret your gestures, and simulate key presses accordingly.

1. Capture Frame from Webcam

camera_video = cv2.VideoCapture(0)

OpenCV reads frames continuously from the webcam.
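For context, here is a minimal sketch of the surrounding capture loop. The window name and the Esc-to-quit handling are illustrative choices, not necessarily what the original script does:

import cv2

camera_video = cv2.VideoCapture(0)
while camera_video.isOpened():
    ok, frame = camera_video.read()          # grab the next frame
    if not ok:
        continue
    frame = cv2.flip(frame, 1)               # mirror the feed so movements feel natural
    cv2.imshow("Gesture Controller", frame)  # show the feed in a window
    if cv2.waitKey(1) & 0xFF == 27:          # Esc key quits
        break
camera_video.release()
cv2.destroyAllWindows()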


2. Detect Hand using MediaPipe

import mediapipe as mp

mpHands = mp.solutions.hands
hands = mpHands.Hands()

MediaPipe provides fast, efficient landmark detection — it identifies 21 key points on your hand (fingers, joints, wrist, etc.) in real-time.
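Continuing from that snippet, and assuming frame is the current BGR frame from the capture loop above, a rough sketch of running detection and drawing the landmarks could look like this:

mpDraw = mp.solutions.drawing_utils

rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB, OpenCV gives BGR
results = hands.process(rgb_frame)                   # run hand landmark detection
if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        # draw the 21 key points and their connections back onto the original frame
        mpDraw.draw_landmarks(frame, hand_landmarks, mpHands.HAND_CONNECTIONS)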


3. Interpret Gestures

Based on movement and finger positions, we define gestures (a simplified finger-counting sketch follows this list):

  • Swipe Left ⬅️ → Move left
  • Swipe Right ➡️ → Move right
  • Show 5 fingers 🖐️ → Jump
  • Show 1 finger ☝️ → Slide (crouch)
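To make the finger-count gestures concrete, here is a simplified sketch of counting raised fingers from the 21 landmarks. The indices follow MediaPipe's hand model, but the thumb check is deliberately naive (it assumes a mirrored, roughly upright hand), so treat it as a starting point rather than the exact logic in the script:

FINGER_TIPS = [8, 12, 16, 20]   # index, middle, ring, pinky fingertips
FINGER_PIPS = [6, 10, 14, 18]   # the joint below each fingertip

def count_fingers(hand_landmarks):
    lm = hand_landmarks.landmark
    count = 0
    for tip, pip in zip(FINGER_TIPS, FINGER_PIPS):
        if lm[tip].y < lm[pip].y:   # tip above its joint means the finger is up (y grows downward)
            count += 1
    if lm[4].x < lm[3].x:           # rough thumb check for a mirrored right hand
        count += 1
    return count

def classify_static_gesture(hand_landmarks):
    fingers = count_fingers(hand_landmarks)
    if fingers == 5:
        return "jump"    # open palm
    if fingers == 1:
        return "slide"   # single raised finger
    return None          # swipes are handled separately by tracking motion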

4. Trigger Game Actions

Using calls like pyautogui.press("left") and pyautogui.press("right"), the code simulates the keyboard input that each detected gesture maps to.

So you never touch the keyboard — your hand becomes the controller.
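As a rough illustration, the gesture-to-key mapping can be as small as the sketch below. The keys used for jump and slide depend on the game version, so the "up"/"down" choices here are assumptions:

import pyautogui

# Illustrative gesture-to-key mapping; "up"/"down" for jump/slide is an assumption
KEY_MAP = {
    "move_left": "left",
    "move_right": "right",
    "jump": "up",
    "slide": "down",
}

def trigger_action(gesture):
    key = KEY_MAP.get(gesture)
    if key:
        pyautogui.press(key)   # simulate a single key press in the focused window

Note that pyautogui sends the key to whichever window currently has focus, so the game window needs to stay in the foreground while the script runs.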


📦 Project Source Code

You can access the full, clean source code here:

📄 Source Code (Google Doc):
👉 Click here to view the code


📱 Customizing the Display

To make the webcam feed look like a phone display, we resized the output to a vertical, portrait-style resolution:

resized_frame = cv2.resize(frame, (360, 640))  # (width, height): portrait, phone-style aspect ratio

This enhances the immersive feel — like you're watching and interacting through a mobile screen.


🧪 Challenges Faced

1. Gesture Conflicts

Some gestures looked too similar (e.g., 1 finger vs 2 fingers). Solved by refining landmark logic and adding cooldown timers.
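A cooldown can be as simple as remembering when each action last fired. The 0.8-second window below is an arbitrary illustrative value:

import time

COOLDOWN_SECONDS = 0.8   # illustrative value; tune per game
last_fired = {}

def should_fire(action):
    now = time.time()
    if now - last_fired.get(action, 0) < COOLDOWN_SECONDS:
        return False          # same action fired too recently, ignore it
    last_fired[action] = now
    return True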

2. Swipe Detection

Accurately detecting a horizontal hand swipe across frames was tricky. I improved it by comparing the hand's position in the previous frame with its position in the current frame.
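A sketch of that idea, assuming the wrist landmark (index 0) is used as the hand's reference point and that coordinates are MediaPipe's normalized 0-to-1 values; the threshold is a guess you would tune:

SWIPE_THRESHOLD = 0.15   # normalized x-distance that counts as a swipe (assumption)
prev_x = None

def detect_swipe(hand_landmarks):
    global prev_x
    x = hand_landmarks.landmark[0].x    # wrist x-position in the current frame
    swipe = None
    if prev_x is not None:
        if x - prev_x > SWIPE_THRESHOLD:
            swipe = "move_right"
        elif prev_x - x > SWIPE_THRESHOLD:
            swipe = "move_left"
    prev_x = x                          # remember this frame's position for the next one
    return swipe

With a mirrored feed, an increase in x corresponds to the hand moving right from the player's point of view.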

3. Lighting Sensitivity

MediaPipe performs best in good lighting. In dim conditions, landmark detection was less reliable — fixed using external light.


💡 Learning Outcomes

This project taught me:

  • Real-time hand tracking using MediaPipe
  • Interfacing hardware inputs (camera) with software outputs (keyboard presses)
  • Designing smooth, intuitive gesture recognition systems
  • Debugging live vision-based AI models in constrained environments

🔥 Showcase & Demo

🎥 I recorded the gameplay as a demo video and posted it on LinkedIn. The video showcases Temple Run being played with just my hand gestures — no contact, no controller.

The response? Incredible.
From AI enthusiasts to recruiters and developers, everyone was intrigued by the practicality of the system.


🗣️ How You Can Build It

Want to build it yourself?

Just follow these steps:

  1. Install the dependencies:
     pip install opencv-python mediapipe pyautogui
  2. Clone or copy the source code
  3. Run the script:
     python app.py
  4. Open Temple Run or Subway Surfers in your browser (or on desktop)
  5. Play using just your hands!

🚀 Future Improvements

  • Voice + Gesture Combo: (Imagine saying "Jump" and raising your hand.)
  • Cross-platform integration: To work with mobile games too.
  • AI-powered gesture customization: So users can define their own gestures.


💬 Final Thoughts

Gesture control is not just about fancy tech — it's about redefining interaction. This project gave me a real sense of what's possible with just a webcam, Python, and some creativity.

Whether you’re building next-gen interfaces, game enhancements, or just love tinkering with AI — this project is a great place to start.


🧑‍💻 Author

Ashutosh Mishra
CTO at Ottox

🔗 Connect with ottoX on LinkedIn