Gesture Controlled Temple Run using OpenCV & MediaPipe — No Keyboard, Just Your Hands!



Ever imagined playing Temple Run with just your hand gestures? No keyboard. No joystick. Just your camera and a bit of Python magic. Well, I did exactly that — and the result is a hands-free gaming experience powered by AI.

Welcome to the future of intuitive gameplay.


🎯 Project Overview

This project is a Gesture-Controlled Temple Run system built using:

  • 🧠 OpenCV: For video frame processing
  • ✋ MediaPipe: For real-time hand landmark detection
  • 💻 Python: The glue that binds it all
  • 🎮 Temple Run (or any similar endless runner game): As the playground

What started as a simple idea became an exciting experiment in computer vision, delivering immersive control without ever needing to touch the keyboard.


🔍 Why I Built This

As someone deeply invested in AI, computer vision, and the future of interaction, I’ve always been intrigued by human-computer interfaces beyond the mouse and keyboard.

Temple Run is nostalgic. But adding hand gestures to it made it futuristic. I wanted to create something that's not just fun but also a learning playground for AI-based gesture recognition.

And honestly? It’s so much fun to control characters just by swiping your hand.


🧰 Tech Stack

Here’s a breakdown of the tools and libraries used:

  • Python 3.9
  • OpenCV: For accessing camera feed, drawing, and visualization
  • MediaPipe Hands: To detect hand landmarks in real-time
  • PyAutoGUI: To simulate keyboard key presses like left, right, space, etc.
  • Subway Surfers / Temple Run: As the game being controlled

🎥 Yes, the code works with both Temple Run and Subway Surfers on web or desktop.


🎬 How It Works

The idea is simple — use a webcam to track your hand, interpret your gestures, and simulate key presses accordingly.

1. Capture Frame from Webcam

camera_video = cv2.VideoCapture(0)

OpenCV reads frames continuously from the webcam.
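For context, here is a minimal sketch of the surrounding capture loop. The window name and the Esc-to-quit handling are illustrative choices, not necessarily what the original script does:

import cv2

camera_video = cv2.VideoCapture(0)
while camera_video.isOpened():
    ok, frame = camera_video.read()          # grab the next frame
    if not ok:
        continue
    frame = cv2.flip(frame, 1)               # mirror the feed so movements feel natural
    cv2.imshow("Gesture Controller", frame)  # show the feed in a window
    if cv2.waitKey(1) & 0xFF == 27:          # Esc key quits
        break
camera_video.release()
cv2.destroyAllWindows()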


2. Detect Hand using MediaPipe

import mediapipe as mp

mpHands = mp.solutions.hands
hands = mpHands.Hands()

MediaPipe provides fast, efficient landmark detection — it identifies 21 key points on your hand (fingers, joints, wrist, etc.) in real-time.
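Continuing from that snippet, and assuming frame is the current BGR frame from the capture loop above, a rough sketch of running detection and drawing the landmarks could look like this:

mpDraw = mp.solutions.drawing_utils

rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB, OpenCV gives BGR
results = hands.process(rgb_frame)                   # run hand landmark detection
if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        # draw the 21 key points and their connections back onto the original frame
        mpDraw.draw_landmarks(frame, hand_landmarks, mpHands.HAND_CONNECTIONS)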


3. Interpret Gestures

Based on movement and finger positions, we define gestures (a simplified finger-counting sketch follows this list):

  • Swipe Left ⬅️ → Move left
  • Swipe Right ➡️ → Move right
  • Show 5 fingers 🖐️ → Jump
  • Show 1 finger ☝️ → Slide (crouch)
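To make the finger-count gestures concrete, here is a simplified sketch of counting raised fingers from the 21 landmarks. The indices follow MediaPipe's hand model, but the thumb check is deliberately naive (it assumes a mirrored, roughly upright hand), so treat it as a starting point rather than the exact logic in the script:

FINGER_TIPS = [8, 12, 16, 20]   # index, middle, ring, pinky fingertips
FINGER_PIPS = [6, 10, 14, 18]   # the joint below each fingertip

def count_fingers(hand_landmarks):
    lm = hand_landmarks.landmark
    count = 0
    for tip, pip in zip(FINGER_TIPS, FINGER_PIPS):
        if lm[tip].y < lm[pip].y:   # tip above its joint means the finger is up (y grows downward)
            count += 1
    if lm[4].x < lm[3].x:           # rough thumb check for a mirrored right hand
        count += 1
    return count

def classify_static_gesture(hand_landmarks):
    fingers = count_fingers(hand_landmarks)
    if fingers == 5:
        return "jump"    # open palm
    if fingers == 1:
        return "slide"   # single raised finger
    return None          # swipes are handled separately by tracking motion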

4. Trigger Game Actions

Using calls like pyautogui.press("left") and pyautogui.press("right"), the code simulates the keyboard input that each detected gesture maps to.

So you never touch the keyboard — your hand becomes the controller.
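As a rough illustration, the gesture-to-key mapping can be as small as the sketch below. The keys used for jump and slide depend on the game version, so the "up"/"down" choices here are assumptions:

import pyautogui

# Illustrative gesture-to-key mapping; "up"/"down" for jump/slide is an assumption
KEY_MAP = {
    "move_left": "left",
    "move_right": "right",
    "jump": "up",
    "slide": "down",
}

def trigger_action(gesture):
    key = KEY_MAP.get(gesture)
    if key:
        pyautogui.press(key)   # simulate a single key press in the focused window

Note that pyautogui sends the key to whichever window currently has focus, so the game window needs to stay in the foreground while the script runs.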


📦 Project Source Code

You can access the full, clean source code here:

📄 Source Code (Google Doc):
👉 Click here to view the code


📱 Customizing the Display

To make the webcam feed look like a phone display, we resized the output to a vertical, portrait-style resolution:

resized_frame = cv2.resize(frame, (360, 640))  # (width, height): portrait, phone-style aspect ratio

This enhances the immersive feel — like you're watching and interacting through a mobile screen.


🧪 Challenges Faced

1. Gesture Conflicts

Some gestures looked too similar (e.g., 1 finger vs 2 fingers). Solved by refining landmark logic and adding cooldown timers.
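A cooldown can be as simple as remembering when each action last fired. The 0.8-second window below is an arbitrary illustrative value:

import time

COOLDOWN_SECONDS = 0.8   # illustrative value; tune per game
last_fired = {}

def should_fire(action):
    now = time.time()
    if now - last_fired.get(action, 0) < COOLDOWN_SECONDS:
        return False          # same action fired too recently, ignore it
    last_fired[action] = now
    return True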

2. Swipe Detection

Accurately detecting a horizontal hand swipe across frames was tricky. I improved it by comparing the hand's position in the previous frame with its position in the current frame.
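A sketch of that idea, assuming the wrist landmark (index 0) is used as the hand's reference point and that coordinates are MediaPipe's normalized 0-to-1 values; the threshold is a guess you would tune:

SWIPE_THRESHOLD = 0.15   # normalized x-distance that counts as a swipe (assumption)
prev_x = None

def detect_swipe(hand_landmarks):
    global prev_x
    x = hand_landmarks.landmark[0].x    # wrist x-position in the current frame
    swipe = None
    if prev_x is not None:
        if x - prev_x > SWIPE_THRESHOLD:
            swipe = "move_right"
        elif prev_x - x > SWIPE_THRESHOLD:
            swipe = "move_left"
    prev_x = x                          # remember this frame's position for the next one
    return swipe

With a mirrored feed, an increase in x corresponds to the hand moving right from the player's point of view.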

3. Lighting Sensitivity

MediaPipe performs best in good lighting. In dim conditions, landmark detection was less reliable — fixed using external light.


💡 Learning Outcomes

This project taught me:

  • Real-time hand tracking using MediaPipe
  • Interfacing hardware inputs (camera) with software outputs (keyboard presses)
  • Designing smooth, intuitive gesture recognition systems
  • Debugging live vision-based AI models in constrained environments

🔥 Showcase & Demo

🎥 I recorded the gameplay as a demo video and posted it on LinkedIn. The video showcases Temple Run being played with just my hand gestures — no contact, no controller.

The response? Incredible.
From AI enthusiasts to recruiters and developers, everyone was intrigued by the practicality of the system.


🗣️ How You Can Build It

Want to build it yourself?

Just follow these steps:

  1. Install the dependencies:
     pip install opencv-python mediapipe pyautogui
  2. Clone or copy the source code
  3. Run the script:
     python app.py
  4. Open Temple Run or Subway Surfers in your browser (or on desktop)
  5. Play using just your hands!

🚀 Future Improvements

  • Voice + Gesture Combo: (Imagine saying "Jump" and raising your hand.)
  • Cross-platform integration: To work with mobile games too.
  • AI-powered gesture customization: So users can define their own gestures.


💬 Final Thoughts

Gesture control is not just about fancy tech — it's about redefining interaction. This project gave me a real sense of what's possible with just a webcam, Python, and some creativity.

Whether you’re building next-gen interfaces, game enhancements, or just love tinkering with AI — this project is a great place to start.


🧑‍💻 Author

Ashutosh Mishra
CTO at Ottox

🔗 Connect with ottoX on LinkedIn