Article #13 from 2025
After 2 years working as a Cloud Engineer, I felt confident enough to compete in the Google Cloud x MLB Hackathon, where I used Google's AI capabilities to generate tailored, engaging content in multiple languages and formats in a creative manner.
Fan Frames is an innovative application that enhances the baseball fan experience by generating personalized highlights through text, audio, and AI-generated images. The platform addresses the challenge of information overload in sports media by providing targeted, engaging content based on specific teams, players, and time periods. What sets Fan Frames apart is its creative use of team mascots for visual representation, ensuring both compliance with AI image generation guidelines and maintaining strong team identity connections.
The application is built on a streamlined architecture using Google Cloud services, including App Engine and Cloud Run Functions, while leveraging cutting-edge AI technologies such as Gemini 2.0 and Imagen 3. The platform integrates seamlessly with MLB's API to fetch relevant data, which is then transformed into personalized content through various AI models for text generation, translation, text-to-speech, and image creation. Fan Frames demonstrates a practical solution for enhancing fan engagement that could be readily integrated into MLB's official application.
The application works as long as both the MLB API and Google Cloud models are available.
If the application is up but doesn’t work, make sure you've entered valid inputs and try again.
Text and audio highlights don’t change much for the same inputs, but the image does, because I chose to randomize several parameters such as style, stance, and time of day.
I’ve built the application using only 3 fully managed elements (1 App Engine deployment and 2 Cloud Run Functions):
The App Engine deployment is a fully managed serverless platform that allowed me to focus on coding the Front End without thinking about availability, scaling, and more. Simply execute ‘gcloud app deploy’ and the web application is live worldwide!
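For context, `gcloud app deploy` reads a small configuration file at the root of the project. The snippet below is a minimal hypothetical `app.yaml` for a Python front end; the actual runtime version and scaling settings for Fan Frames may differ.

```yaml
# Minimal hypothetical app.yaml for the App Engine front end
runtime: python312        # Python runtime on App Engine standard
instance_class: F1        # smallest instance class, fine for a demo
automatic_scaling:
  max_instances: 10       # cap costs while staying available worldwide
```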
The first Cloud Run Function is dedicated to fetching data related to the fan request from the MLB API and generating both the text and audio summary using several Google Cloud services: Vertex AI, Gemini 2.0, Translation, and Text-to-Speech. Cloud Run Functions are fully managed, lightweight compute solutions that respond to events (HTTP requests in this case) and execute code.
The second Cloud Run Function is dedicated to generating an image related to the fan request using Google’s Imagen 3 model.
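The three components above form a single flow: fetch MLB data, summarize it, optionally translate, then synthesize audio. The sketch below stubs the external services (MLB API, Gemini, Translation, Text-to-Speech) as injected callables, so the function names and shapes are illustrative, not the actual code.

```python
# Hypothetical sketch of the highlights pipeline. The real Cloud Run
# Function calls the MLB API, Gemini 2.0, Translation, and Text-to-Speech;
# here those services are passed in as plain callables.
def build_highlights(fetch_data, summarize, translate, synthesize,
                     team: str, season: int, language: str) -> dict:
    data = fetch_data(team, season)       # MLB API: raw stats and results
    text = summarize(data)                # Gemini 2.0: text highlight
    if language != "en":
        text = translate(text, language)  # Cloud Translation
    audio = synthesize(text)              # Text-to-Speech: audio highlight
    return {"text": text, "audio": audio}
```

The stubbed design also makes the flow trivially testable without touching any external service.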
The goal is not to get caught up in infrastructure management, since Google Cloud takes care of that; the goal is to generate value from existing data in the simplest way possible using Google Cloud's cutting-edge services.
The full code and deployment instructions are reproducible. Here are the key parts:
gcloud functions deploy get_highlights --region=$REGION --trigger-http --memory 512Mi --cpu .333 --min-instances=1 --max-instances=30 --runtime python312 --allow-unauthenticated --concurrency 1 --entry-point get_highlights --gen2 --source functions/get_highlights --timeout 60s --set-env-vars REGION=$REGION,PROJECT_ID=$PROJECT_ID,DEBUG=yes,MODEL=gemini-2.0-flash-exp
gcloud functions deploy get_image --region=$REGION --trigger-http --memory 512Mi --cpu .333 --min-instances=1 --max-instances=30 --runtime python312 --allow-unauthenticated --concurrency 1 --entry-point get_image --gen2 --source functions/get_image --timeout 60s --set-env-vars REGION=$REGION,PROJECT_ID=$PROJECT_ID,DEBUG=yes,MODEL=imagen-3.0-generate-001
highlights = GenerativeModel(MODEL).generate_content(prompt + json.dumps(data), generation_config={"temperature": 0}).text
audio = client.synthesize_speech(input=texttospeech.SynthesisInput(text=highlights),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US", name="en-US-Neural2-I"),
    audio_config=texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)).audio_content
style = random.choice(["cartoonish 3D model", "oil painting", "manga-like sketch"])
image = ImageGenerationModel.from_pretrained(MODEL).generate_images(prompt=prompt, number_of_images=1, safety_filter_level="block_few", person_generation="allow_all")[0]._image_bytes
The best part of this project was being able to generate quality images related to various input data while keeping a coherent style:
I initially tried to generate complex images without paying much attention to aesthetics, only to end up reducing the amount of information displayed and focusing on the appearance, especially the emotions and style, of the elements that mattered most. Instead of writing ‘Win’ or ‘Loss’, I make the subjects reflect the result through appropriate stances. I kept 3 stances, 3 image styles, and 3 times of day to balance variety and quality.
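Put together, the prompt assembly might look like the sketch below. The stance and time-of-day wordings are hypothetical reconstructions; the article only states that there are 3 of each, with the style list taken from the snippet above.

```python
import random

STYLES = ["cartoonish 3D model", "oil painting", "manga-like sketch"]
# Stance and time-of-day wordings are illustrative guesses, not the real ones.
STANCES = {
    "win": "arms raised in celebration",
    "loss": "head bowed in disappointment",
    "neutral": "standing proudly",
}
TIMES_OF_DAY = ["at sunrise", "on a sunny afternoon", "under stadium lights at night"]

def build_image_prompt(mascot: str, result: str) -> str:
    """Fixed mascot and result; randomized style and time of day."""
    style = random.choice(STYLES)
    time_of_day = random.choice(TIMES_OF_DAY)
    return f"A {style} of the {mascot} mascot, {STANCES[result]}, {time_of_day}."
```

Keeping the result deterministic while randomizing only style and time of day is what gives variety without losing the Win/Loss signal.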
I reduced the number of tokens tenfold without degrading Gemini’s output by extracting only the necessary data when querying the MLB API, using hydrations and fields. Just because Gemini models can handle a million or more tokens doesn’t mean you shouldn’t count them.
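As an illustration of the `fields` technique, the sketch below builds a schedule query that asks the MLB Stats API to return only a handful of fields. The endpoint and parameter names follow the public Stats API conventions, but the exact field selection here is a hypothetical example, not the one used in Fan Frames.

```python
from urllib.parse import urlencode

MLB_API = "https://statsapi.mlb.com/api/v1"

def schedule_url(team_id: int, season: int) -> str:
    """Build a schedule query that returns only the listed fields,
    keeping the JSON (and the token count sent to Gemini) small."""
    params = {
        "sportId": 1,        # MLB
        "teamId": team_id,
        "season": season,
        # Illustrative field selection: game dates, teams, scores, winners
        "fields": "dates,games,gameDate,teams,away,home,team,name,score,isWinner",
    }
    return f"{MLB_API}/schedule?{urlencode(params)}"
```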
Even though I had a rough idea of my goal at first, it took several decisions, trials, and iterations to end up with a satisfying application. The most important questions were “How much control do I give the fan?”, “Do I represent humans celebrating or the team mascot?”, and “How do I optimize for cost and efficiency?”. This is why I decided not to let the fan influence the content orientation (summary length, audio tone, image style), not to represent humans (mascots are universal), and not to make the app store the audio and image files for now, despite having tried it. Perfection is achieved when there is nothing left to take away.
Two of Google's latest products weren’t in General Availability during the Hackathon period, which could have significantly enhanced the fan experience:
NotebookLM for a podcast summarizing a Season
Veo 2 for a video highlighting a key moment
Despite being very complete, the MLB API wasn’t fully accessible: the centralized metrics for a given player (/stats/metrics) weren’t directly available, which forced me to find a less convenient and less complete workaround.
Because Imagen 3 blocks generating images of celebrities and writing their names on jerseys, I had to take a creative approach to representing the teams. One important Imagen 3 parameter is person_generation, whose value is dont_allow unless Google approves you, through a form, to use allow_all. This change drops the generation error rate from 50% to 0%.