Generative AI for Bridging Humor Across Modalities
Ongoing multimodal research project for the MIT course MMAI (MAS60)
Developed a cross-modal system that synthesizes contextually relevant audio from semantic reasoning over internet memes.
The "Humor Gap" Challenge: Addressed a limitation of current LLMs and vision models, which struggle to decode the complex, non-linear humorous logic (irony, dark humor) that arises from the "semantic friction" between a meme's text and its image.
Advanced Reasoning: Leveraged multimodal alignment models (e.g., SigLIP, BLIP-2) and vision-language foundation models (e.g., LLaVA, Gemini Vision) to interpret the hidden humorous metaphors in a meme.
PM Impact - Capability for Virality: Lowered the content-creation barrier by enabling models to autonomously identify "meme-able" visual patterns and pair them with emotive, context-aware audio, fostering a richer ecosystem of shareable content.
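The reasoning-to-audio flow above can be sketched roughly as a two-stage pipeline: a vision-language model explains the image/text friction, and its explanation is mapped to an audio mood for the synthesizer. This is a minimal, hypothetical sketch, not the project's actual implementation: the `MemeInput` fields, the `stub_vlm` stand-in (replacing a real LLaVA/Gemini Vision call), and the keyword-based mood rules are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MemeInput:
    image_caption: str   # what the vision encoder "sees" in the image
    overlay_text: str    # the meme's text layer

def interpret_humor(meme: MemeInput, vlm: Callable[[str], str]) -> str:
    # Stage 1: ask a vision-language model to articulate the "semantic
    # friction" between image and text (irony, dark humor, subversion).
    prompt = (
        f"Image: {meme.image_caption}\n"
        f"Text: {meme.overlay_text}\n"
        "Explain the irony or semantic friction between the image and the text."
    )
    return vlm(prompt)

def select_audio_mood(explanation: str) -> str:
    # Stage 2: map the explanation to an audio mood tag. A real system
    # would condition an audio generator on richer features; simple
    # keyword rules stand in for that here.
    lowered = explanation.lower()
    if "irony" in lowered or "sarcas" in lowered:
        return "deadpan-comedic"
    if "dark" in lowered:
        return "ominous-playful"
    return "upbeat"

# Stub standing in for a real multimodal model call (hypothetical output).
def stub_vlm(prompt: str) -> str:
    return "The humor relies on irony: the cheerful text contradicts the grim scene."

meme = MemeInput(image_caption="a dog sitting calmly in a burning room",
                 overlay_text="This is fine.")
explanation = interpret_humor(meme, stub_vlm)
print(select_audio_mood(explanation))  # → deadpan-comedic
```

Decoupling the VLM call behind a plain callable keeps the alignment backbone (SigLIP/BLIP-2 vs. LLaVA/Gemini Vision) swappable without touching the audio-selection stage.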