Generative AI for Bridging Humor Across Modalities
Ongoing multimodal research project for the MIT course MMAI (MAS60)
Developed a cross-modal system that synthesizes contextually relevant audio from semantic reasoning over internet memes.
The "Humor Gap" Challenge: Addressed a limitation of current LLMs and vision models, which struggle to decode the complex, non-linear humorous logic (irony, dark humor) that arises from the "semantic friction" between a meme's text and its image.
Advanced Reasoning: Leveraged multimodal alignment models (e.g., SigLIP, BLIP-2) and vision-language foundation models (e.g., LLaVA, Gemini Vision) to interpret the hidden humorous metaphors in a meme.
PM Impact - Capability for Virality: Lowered the content-creation barrier by enabling models to autonomously identify "meme-able" visual patterns and pair them with emotive, context-aware audio, fostering a richer ecosystem of shareable content.
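The reasoning-to-audio flow above can be sketched roughly as a two-stage pipeline: a vision-language model explains the image/text friction, and its explanation is mapped to an audio mood for the synthesizer. This is a minimal, hypothetical sketch, not the project's actual implementation: the `MemeInput` fields, the `stub_vlm` stand-in (replacing a real LLaVA/Gemini Vision call), and the keyword-based mood rules are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MemeInput:
    image_caption: str   # what the vision encoder "sees" in the image
    overlay_text: str    # the meme's text layer

def interpret_humor(meme: MemeInput, vlm: Callable[[str], str]) -> str:
    # Stage 1: ask a vision-language model to articulate the "semantic
    # friction" between image and text (irony, dark humor, subversion).
    prompt = (
        f"Image: {meme.image_caption}\n"
        f"Text: {meme.overlay_text}\n"
        "Explain the irony or semantic friction between the image and the text."
    )
    return vlm(prompt)

def select_audio_mood(explanation: str) -> str:
    # Stage 2: map the explanation to an audio mood tag. A real system
    # would condition an audio generator on richer features; simple
    # keyword rules stand in for that here.
    lowered = explanation.lower()
    if "irony" in lowered or "sarcas" in lowered:
        return "deadpan-comedic"
    if "dark" in lowered:
        return "ominous-playful"
    return "upbeat"

# Stub standing in for a real multimodal model call (hypothetical output).
def stub_vlm(prompt: str) -> str:
    return "The humor relies on irony: the cheerful text contradicts the grim scene."

meme = MemeInput(image_caption="a dog sitting calmly in a burning room",
                 overlay_text="This is fine.")
explanation = interpret_humor(meme, stub_vlm)
print(select_audio_mood(explanation))  # → deadpan-comedic
```

Decoupling the VLM call behind a plain callable keeps the alignment backbone (SigLIP/BLIP-2 vs. LLaVA/Gemini Vision) swappable without touching the audio-selection stage.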