Uplifting Happiness Index - Mood Recognition System

Tech Stack

  • Computer Vision
  • Audio Processing
  • Machine Learning
  • Python
  • OpenCV
  • TensorFlow
  • Speech Recognition
  • Emotion Detection

Project Overview

The Uplifting Happiness Index is a comprehensive mood recognition system that combines facial expression analysis and speech emotion detection to provide accurate mood assessment. The system uses advanced machine learning techniques to analyze both visual and auditory cues for emotion classification.

What I Built

Dual-Mode Recognition System

  • Facial Mood Recognition: Computer vision-based emotion detection from facial expressions
  • Speech Mood Recognition: Audio processing for emotion analysis from speech patterns
  • Integrated Analysis: Combined approach for enhanced accuracy
  • Real-time Processing: Live mood assessment capabilities

Key Features

  • Multi-modal Analysis: Combines visual and audio inputs for comprehensive assessment
  • Real-time Processing: Instant mood detection and feedback
  • User-friendly Interface: Easy-to-use application for mood tracking
  • Accuracy Optimization: Multiple models for improved recognition accuracy

Technical Implementation

Facial Mood Recognition

The system analyzes facial expressions using computer vision techniques:

  • Face Detection: OpenCV-based face detection and landmark identification
  • Feature Extraction: Extraction of facial features and expressions
  • Emotion Classification: Machine learning model for emotion prediction
  • Real-time Processing: Live video feed analysis

Speech Mood Recognition

Audio processing pipeline for speech emotion analysis:

  • Audio Capture: Real-time audio input processing
  • Feature Extraction: Extraction of audio features (pitch, tempo, energy)
  • Emotion Classification: ML model for speech emotion detection
  • Integration: Combined analysis with facial recognition

System Architecture

Python
# Main application structure
def main():
    # Initialize both recognition systems
    facial_system = FacialMoodRecognition()
    speech_system = SpeechMoodRecognition()

    # Combined analysis of one video frame and one audio chunk
    def analyze_mood(frame, audio_data):
        facial_result = facial_system.detect_emotion(frame)
        speech_result = speech_system.analyze_speech(audio_data)

        # Combine results for the final assessment
        final_mood = combine_results(facial_result, speech_result)
        return final_mood

    # The live capture loop feeds frames and audio chunks into analyze_mood
    return analyze_mood

Challenges & Solutions

Challenge 1: Multi-modal Integration

Problem: Combining facial and speech analysis for consistent results

Solution:

  • Implemented weighted combination algorithm
  • Used confidence scores for result fusion
  • Created fallback mechanisms for single-mode analysis (sketched below)
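
As a rough illustration of the fallback idea, the sketch below falls back to a single modality when the other one is missing or unreliable. The confidence threshold, helper name, and result format are assumptions for illustration, not the exact production logic.

Python
# Hypothetical fallback logic; the 0.4 threshold and dict keys are assumptions
MIN_CONFIDENCE = 0.4

def fuse_or_fallback(facial_result, speech_result):
    facial_ok = facial_result is not None and facial_result['confidence'] >= MIN_CONFIDENCE
    speech_ok = speech_result is not None and speech_result['confidence'] >= MIN_CONFIDENCE

    if facial_ok and speech_ok:
        # Both modalities are trustworthy: use the weighted fusion shown later
        return combine_results(facial_result, speech_result)
    if facial_ok:
        return facial_result   # facial-only analysis
    if speech_ok:
        return speech_result   # speech-only analysis
    return None                # neither modality produced a confident result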

Challenge 2: Real-time Performance

Problem: Processing both video and audio streams simultaneously

Solution:

  • Optimized processing pipelines
  • Used efficient algorithms for feature extraction
  • Implemented parallel processing where possible (see the sketch below)
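
A minimal sketch of the parallelism, assuming the two recognition systems are independent and thread-safe (the helper name is mine, not part of the original code):

Python
from concurrent.futures import ThreadPoolExecutor

def analyze_in_parallel(facial_system, speech_system, frame, audio_chunk):
    # Run the facial and speech pipelines concurrently on one frame/audio chunk
    with ThreadPoolExecutor(max_workers=2) as pool:
        facial_future = pool.submit(facial_system.detect_emotion, frame)
        speech_future = pool.submit(speech_system.analyze_speech, audio_chunk)
        # Wait for both results before fusing them
        return facial_future.result(), speech_future.result()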

Challenge 3: Accuracy Optimization

Problem: Ensuring high accuracy across different users and conditions

Solution:

  • Used multiple model approaches
  • Implemented data augmentation techniques
  • Applied ensemble methods for improved accuracy (illustrated below)
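
The snippets below are illustrative sketches of these two ideas; the exact augmentations and the number of models used in training are assumptions for the example.

Python
import cv2
import numpy as np

# Simple training-time augmentations for a normalized face image in [0, 1]
def augment_face(face_img):
    flipped = cv2.flip(face_img, 1)                 # horizontal mirror
    brighter = np.clip(face_img * 1.2, 0.0, 1.0)    # brightness jitter
    return [face_img, flipped, brighter]

# Ensemble prediction: average class probabilities from several trained models
def ensemble_predict(models, processed_face):
    predictions = [model.predict(processed_face) for model in models]
    return np.mean(predictions, axis=0)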

Results & Performance

System Performance

  • Facial Recognition Accuracy: High accuracy in emotion detection
  • Speech Recognition Accuracy: Reliable speech emotion analysis
  • Combined Accuracy: Enhanced performance through multi-modal approach
  • Processing Speed: Real-time analysis capabilities

Supported Emotions

The system can detect and classify multiple emotions:

  1. Happy: Positive emotions and joy
  2. Sad: Negative emotions and sadness
  3. Angry: Anger and frustration
  4. Surprise: Unexpected or surprising emotions
  5. Fear: Fearful or anxious states
  6. Neutral: Balanced emotional state
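
For illustration, a classifier's output vector can be decoded into one of these labels with an argmax; the label order below is an assumption standing in for whatever order the trained model actually uses.

Python
import numpy as np

EMOTION_LABELS = ['happy', 'sad', 'angry', 'surprise', 'fear', 'neutral']

def decode_prediction(probabilities):
    # Map a softmax output vector to an emotion label and its confidence
    probabilities = np.asarray(probabilities).ravel()
    index = int(np.argmax(probabilities))
    return EMOTION_LABELS[index], float(probabilities[index])

# Example: this vector decodes to ('happy', 0.7)
label, confidence = decode_prediction([0.7, 0.05, 0.05, 0.05, 0.05, 0.1])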

What I Learned

Technical Skills

  • Computer Vision: Advanced facial recognition and emotion detection
  • Audio Processing: Speech analysis and emotion classification
  • Machine Learning: Multi-modal model integration and optimization
  • Real-time Systems: Building systems for live data processing
  • System Integration: Combining multiple technologies effectively

Machine Learning Concepts

  • Multi-modal Learning: Combining different data types for analysis
  • Feature Engineering: Extracting meaningful features from audio and video
  • Model Optimization: Improving accuracy through various techniques
  • Real-time Inference: Deploying models for live applications

Code Snippets

Facial Recognition Implementation

Python
import cv2
import numpy as np

class FacialMoodRecognition:
    def __init__(self):
        # Haar cascade shipped with OpenCV for frontal face detection
        self.face_cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
        self.emotion_model = load_emotion_model()  # trained 48x48 grayscale classifier

    def detect_emotion(self, frame):
        # Detect faces on a grayscale copy of the frame
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = self.face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

        for (x, y, w, h) in faces:
            face_roi = gray[y:y+h, x:x+w]

            # Preprocess face for emotion detection
            processed_face = self.preprocess_face(face_roi)

            # Predict emotion scores for the first detected face
            emotion = self.emotion_model.predict(processed_face)
            return emotion

        return None  # no face found in this frame

    def preprocess_face(self, face_img):
        # Resize to the model input size and scale pixel values to [0, 1]
        face_img = cv2.resize(face_img, (48, 48))
        face_img = face_img.astype('float32') / 255.0
        # Add batch and channel dimensions: (1, 48, 48, 1)
        return face_img.reshape(1, 48, 48, 1)
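
A minimal usage sketch for the live video path, assuming the default webcam and the class above:

Python
# Run facial emotion detection on a live webcam feed (press 'q' to quit)
recognizer = FacialMoodRecognition()
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    scores = recognizer.detect_emotion(frame)
    if scores is not None:
        print(scores)   # raw class scores for the detected face
    cv2.imshow('Uplifting Happiness Index', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()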

Speech Recognition Implementation

Python
import librosa
import numpy as np

class SpeechMoodRecognition:
    def __init__(self, sample_rate=22050):
        self.audio_model = load_audio_model()  # trained speech-emotion classifier
        self.sample_rate = sample_rate

    def analyze_speech(self, audio_data):
        # Extract audio features
        features = self.extract_features(audio_data)

        # Predict emotion from speech (model expects a batch of feature vectors)
        emotion = self.audio_model.predict(features[np.newaxis, :])
        return emotion

    def extract_features(self, audio):
        # Extract MFCC, pitch, and energy, then summarize each over time
        mfcc = librosa.feature.mfcc(y=audio, sr=self.sample_rate)
        pitch = librosa.yin(audio, fmin=75, fmax=300, sr=self.sample_rate)
        energy = librosa.feature.rms(y=audio)

        # Time-averaged statistics give one fixed-length feature vector
        return np.concatenate([mfcc.mean(axis=1), [pitch.mean()], [energy.mean()]])

Combined Analysis

Python
EMOTIONS = ['happy', 'sad', 'angry', 'surprise', 'fear', 'neutral']

def combine_results(facial_result, speech_result):
    # Weighted combination based on each model's confidence
    facial_confidence = facial_result['confidence']
    speech_confidence = speech_result['confidence']
    total_confidence = facial_confidence + speech_confidence

    # Calculate a confidence-weighted score for every emotion class
    combined_emotion = {}
    for emotion in EMOTIONS:
        facial_score = facial_result['scores'][emotion] * facial_confidence
        speech_score = speech_result['scores'][emotion] * speech_confidence

        combined_emotion[emotion] = (facial_score + speech_score) / total_confidence

    # Return the emotion with the highest combined score
    final_emotion = max(combined_emotion, key=combined_emotion.get)

    return {
        'emotion': final_emotion,
        'confidence': combined_emotion[final_emotion],
        'facial_result': facial_result,
        'speech_result': speech_result
    }
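
For reference, a small usage example with made-up scores showing the dictionary shape combine_results expects:

Python
# Illustrative inputs only; real scores come from the two recognizers
facial_result = {'confidence': 0.9,
                 'scores': {'happy': 0.7, 'sad': 0.05, 'angry': 0.05,
                            'surprise': 0.1, 'fear': 0.05, 'neutral': 0.05}}
speech_result = {'confidence': 0.6,
                 'scores': {'happy': 0.4, 'sad': 0.1, 'angry': 0.1,
                            'surprise': 0.1, 'fear': 0.1, 'neutral': 0.2}}

result = combine_results(facial_result, speech_result)
print(result['emotion'], round(result['confidence'], 2))  # -> happy 0.58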

Future Improvements

  1. Advanced Models: Implement state-of-the-art emotion recognition models
  2. Mobile App: Develop mobile application for on-the-go mood tracking
  3. Long-term Analysis: Add trend analysis and mood history tracking
  4. Personalization: Adapt models to individual users for better accuracy
  5. Integration: Connect with health and wellness applications

Project Impact

This project demonstrates my ability to:

  • Multi-modal Learning: Combine different data types for comprehensive analysis
  • Real-time Systems: Build systems capable of live data processing
  • Computer Vision: Implement advanced facial recognition techniques
  • Audio Processing: Work with speech analysis and emotion detection
  • System Integration: Combine multiple technologies into cohesive solutions

The project showcases practical application of machine learning in emotion recognition and demonstrates my proficiency in computer vision, audio processing, and system integration, making it a valuable addition to my portfolio.