Let’s talk about GMA’s AI sportscasters

You can build your own AI sportscaster in 10 minutes. But should you?

Oct 02, 2023

My old employer has been in the news recently over its decision to roll out AI-generated sportscasters. My curiosity was piqued because, well, it hit close to home. I worked for GMA-7 for most of the past decade, heading up what used to be the News and Public Affairs Digital Media division that also handled digital publishing for sports properties of the network. I left the company in June to pursue a fellowship that ponders the big questions about the future of media and journalism, including AI. And this is ostensibly still a sports blog.

The initial reaction of the public to the GMA announcement of the AI sportscasters was overwhelmingly negative. Most of the criticism focused on the possibility of AI replacing human sports reporters in the future. Among those who added their voice to the chorus were Chino Trinidad and Mark Zambrano, who both previously covered sports for GMA News, and Jamela Alindogan, the longtime international correspondent whose career began as a UAAP courtside reporter. Media veterans including Roby Alampay, Joel Pablo Salud, Boo Chanco, Ann Marie Pamintuan, and Alan Soon all gave thoughtful takes worth reading. The National Union of Journalists of the Philippines called for the start of a conversation, while student organizations from the University of the Philippines and the University of Santo Tomas were more scathing.

Oddly enough, the first story that GMA published in response to the outcry was an article with this headline:

After about a day, the story had been revised to include a more sober title:

In apparent response to the criticism, GMA Network Senior Vice President and Head of Integrated News, Regional TV, and Synergy Oliver Victor B. Amoroso said: “Maia and Marco are AI presenters, they are not journalists, they can never replace our seasoned broadcasters and colleagues who are the lifeblood of our organization. We are now living in the age of AI and other major news organizations worldwide are already using this as a tool to improve their operations. As the leading news organization in the Philippines, we will constantly look for ways to hone our craft, while preserving the value of our human assets and the integrity of our reporting.”

When I talked to former colleagues about the issue, I got a mix of responses. Some were completely surprised by the AI project, while others who worked on it directly said it was always meant to augment and not replace human talents.

What quickly became apparent from all those conversations, though, was that the organization did not have any guidelines in place for the use of generative AI, either for the newsroom or for editorial products.

Meanwhile, the actual debut of the AI sportscasters was a huge hit on Facebook, drawing more than three million views on the GMA News page, even if the actual video was a bit underwhelming considering all the hype:

The most glaring thing about this video is that this isn’t some dark magic. Everything that you need to generate AI sportscasters is available off the shelf. In fact, you could create your own in about 10 minutes, for less than $20. Why don’t we do that right now?

Let’s create our own AI sportscaster

First, we need to conjure up our sportscaster. We just need him to be handsome, and a sportscaster. I asked Midjourney to create an image for me with the prompt:

a photorealistic image of a handsome Filipino sports reporter

It quickly generated four sample images from that prompt, all of them handsome:

I decided to go with Picture No. 1, so I asked Midjourney for a higher resolution image. Look at our guy!

Next, I logged into D-ID’s Creative Reality Studio to upload the picture of our handsome Filipino sportscaster.

This tool allows you to select which voice model to use for the voiceover. A search for “Philippines” shows there are four voices available: James (English-Philippines), Rosa (English-Philippines), Angelo (Filipino-Philippines), and perhaps the most Pinoy name of all, Blessica (Filipino-Philippines). These are all voices available from Microsoft Azure’s neural text-to-speech (TTS) models: James and Rosa were released in March 2021, while Angelo and Blessica followed suit in November 2021. More curiously, Microsoft makes it easy for companies to create custom voices for their brands.

I type in a few quick words for the script. D-ID does all the hard work of animating the picture we uploaded. I click “Generate Video,” and voila:

The hardest part of the whole thing is keying in your credit card number to pay for the services. A subscription to Midjourney (which I’ve been using to generate images for FireQuinito.com) costs $10 a month, while the most basic plan for D-ID’s Creative Reality Studio costs just $5.90 a month. That comes to $15.90 for the first month. It was so easy that you realize having AI sportscasters isn’t quite the flex it was intended to be.
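For readers who would rather script the voice step than click through a web studio, here is a minimal sketch of how a script could be wrapped for a neural TTS voice. It only builds the SSML (Speech Synthesis Markup Language) text that a TTS service would read; it does not call any service or produce audio. The voice identifiers below are my assumption of how Azure names the four Philippine voices mentioned above, and the `build_ssml` helper is hypothetical, so check both against the official voice list before relying on them.

```python
from xml.sax.saxutils import escape

# Assumed Azure short names for the four voices named in this post.
# Verify these against Microsoft's published voice list before use.
VOICES = {
    "James": ("en-PH", "en-PH-JamesNeural"),
    "Rosa": ("en-PH", "en-PH-RosaNeural"),
    "Angelo": ("fil-PH", "fil-PH-AngeloNeural"),
    "Blessica": ("fil-PH", "fil-PH-BlessicaNeural"),
}


def build_ssml(script: str, voice: str) -> str:
    """Wrap a plain-text script in SSML for the chosen voice (hypothetical helper)."""
    locale, short_name = VOICES[voice]
    # escape() protects characters such as & and < that would break the XML.
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{locale}">'
        f'<voice name="{short_name}">{escape(script)}</voice>'
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Magandang gabi, ito ang balita sa sports.", "Blessica"))
```

The point of the sketch is how little is involved: swapping one voice for another is a one-word change, which is exactly why off-the-shelf voices end up sounding alike across publishers.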

But there’s a larger point here. The fact that the technology is already here, accessible to everyone, only underscores how important it is for media organizations to have official guidelines on the role of AI in newsroom processes and editorial products, to figure out what would be acceptable use and to serve as guardrails for what wouldn’t.

No, AI-generated reporters won’t be replacing your favorite news personalities anytime soon, but here are some scenarios that could happen tomorrow. They’re all complicated by the fact that the company went ahead and rolled out AI sportscasters without thinking through the implications.

Case 1: Digital Voices

GMA News (and other outlets) publish a lot of short-form digital videos. To save on time and costs, these digital videos usually include text captions instead of voiceovers.

But now that we know we could generate voiceovers in Filipino using TTS models, would that be acceptable use of AI? Note that no one would lose their job in this scenario. And besides, you’ve already used AI sportscasters, so why couldn’t you use AI voiceovers for these digital videos?

But then, on YouTube, there are hundreds of videos that use generated voiceovers too. If you use off-the-shelf TTS voices like Angelo or Blessica, your videos might sound like everyone else’s videos; it would make it very easy for bad actors to mimic your content. Would you then train a new model to sound unique? How would you train that voice model? Would you be transparent about whose voices you used to train the model?

Say you now have a voice model specifically for use in digital videos that used to not have any voiceovers. Like any responsible media organization, you label all the videos you publish that use this AI voice model.

Then you realize that you have other digital video products that only have voiceovers and not on-camera talent. An example: Need to Know, an explanatory journalism series produced for the web.

The voiceover work, in this case, usually takes up additional time for the producers of the video. Would it then be acceptable to use the AI voice model? Would your answer be different for video projects that pay separately for voice talents?

Case 2: Translating from the Regions

GMA has a bunch of regional newscasts, where reporters file their stories in their native language.

Every day, a handful of stories from these newscasts make it to the main national lineup. The regional reporter would then have to translate their script into Tagalog, record the new voiceover, add subtitles, and basically recreate the report.

This is a long and tedious process. In the example above, the story came out on the regional One Western Visayas newscast on Friday and the national Balita Ko newscasts on Monday. AI could certainly help speed things up.

You could use ChatGPT to translate the script from Hiligaynon to Tagalog, but of course, you need to make sure the output is checked by a human being. You’d also want the original reporter to do the voiceover for the national newscast.

But what about all the other stories from the regional newscast that are published on the web in their original language? Would you consider using AI voiceover translations for those videos to reach a wider audience? You already rolled out AI sportscasters, so why not AI translations?

But why should the translated voiceovers sound like randos? Wouldn’t it be better if they sounded like the original reporters? Would you develop voice models using reporters? Spotify and YouTube are already testing out similar technology.

Suppose you’re fine with that; it’s only for online anyway. Now what if big news breaks and you need to get the story from the regional newscast into a breaking news bulletin ASAP? The regional reporter is not immediately available. Would you use the AI voiceover?

Case 3: Evening News Rush

It’s 5:30 p.m. and all the editing bays are in full swing for the 6:30 p.m. newscast. A reporter has sent the script for their big story ready for voiceover, but still needs to confirm a key detail. The newsdesk finally gets the confirmation, but now the reporter cannot be reached because they’re in a remote area. Thankfully, we now have an AI model for the reporter’s voice (they’ve been using it for translated voiceovers for online videos). The story is in danger of being dropped if the video editor doesn’t get working on it soon. Can you use AI to voice their report?

Case 4: Podcast Ad Reader

The Howie Severino Podcast draws a lot of interest from advertisers. Howie Severino never reads ads himself because that would be a conflict, but the advertiser is a really big fan. They ask that their ad be read by someone who reminds people of Howie. Can you use AI to do that?

These are just some examples I came up with off the top of my head — I’m sure there are many, many more. Some of them have obvious answers, while others may be more complicated. I hope this helps push forward the conversation about AI in the newsroom not just for GMA, but for all news outlets in our country.

It’s a conversation that needs to reach beyond just the newsroom. While editors and media managers have their hearts in the right place, many of them are, as an editor-friend put it, “plainly ill-equipped to handle that process.” The discussion needs to happen across the organization, and should involve technology managers, human resource officials, and even owners.

Make no mistake, it’s a conversation that’s going to happen in other companies across the country and the world too. But media organizations have an outsized role to play in the discourse on AI because we are in the business of building trust with our audiences, which begins with trust with our own people.