Toutiao’s approach to curbing fake news: Teach the AI to write it so that the machines can fight it.

Bots vs bots is what we need to solve a human problem.

By Frank Hersey

A version of this article first appeared on TechNode, a site covering the rapid changes in China’s technology and startups in real time. It’s re-published here with permission of TechNode.

Toutiao’s AI software did not generate this headline, but for the 20 million pieces of content that flow through the platform each day, headline generation and A/B testing are just two of the AI services Toutiao uses to get more people tapping.

Speaking to foreign journalists for the first time as head of the Jinri Toutiao AI Lab and vice president of the app’s owner Bytedance, Dr. Ma Wei-Ying talked about the tech that his lab is working on, why it has a bot that generates fake news and what it knows about its users.

Jinri Toutiao is a news recommendation app that is trained and updated in real time on a user’s behavior. Unlike conventional search engines, Ma pointed out, its search function ranks results for each individual rather than producing one ranking for everyone.

“This is the democratization of content creation,” said Ma, putting Bytedance in line with other Chinese tech companies that have recently declared themselves as content companies. “Toutiao is becoming a new information platform for people to find information and connect with information. People are using their smartphones not just to access information, but to create information. They don’t need their own website–they can use Toutiao to directly upload and publish the information and content they create.”

The tremendous amount of data generated by users and creators allows the training of neural network models. Applying AI to the data gathered is generating a better understanding of the world these users are in.

“We are moving from a digital representation of the world to a semantic representation of the world.”

Ma believes the system is going to improve across the board. “Content creation will be fundamentally revolutionized in next few years” as AI allows the “mining of human intelligence to close the feedback loop” of each stage of the lifecycle of content creation, moderation, dissemination, and consumption. Here’s how.

Make fake news to beat fake news

Bytedance has a different approach to tackling fake news: writing it. The AI lab that Ma heads has developed a bot that uses the company’s growing database of real fake news stories to generate its own fake fake news. It then has another bot for detecting fake news which is trained by analyzing its counterpart’s fake feed, and by drawing on a matching database of real news. “One is good at writing, which means this also helps us to advance machine writing, and the other is machine reading. These two can push each other to improve by using the label data and assimilated data through our algorithms,” said Ma.

Ma believes that having two competing algorithms allows each of them to improve. Toutiao also lets users report what they believe to be fake news and analyzes comments to detect whether they suggest the content might be fake. When the system identifies a piece of fake news that has got through, it notifies everyone who read it that the story was fake.
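The writer-and-reader setup can be illustrated with a toy sketch. This is not Bytedance's system: the recombining "writer", the word-count "reader", and all the sample stories are hypothetical, and the sketch covers only one half of the loop (the detector learning from machine-written fakes), not the writer learning from the detector.

```python
import math
from collections import Counter
from itertools import permutations


def generate_fakes(seed_fakes):
    """Toy 'writer' bot: splice the first half of one known fake story
    onto the second half of another."""
    fakes = []
    for a, b in permutations(seed_fakes, 2):
        wa, wb = a.split(), b.split()
        fakes.append(" ".join(wa[: len(wa) // 2] + wb[len(wb) // 2:]))
    return fakes


def train_detector(real, fake):
    """Toy 'reader' bot: per-word log-odds of fake vs real,
    with add-one smoothing."""
    real_counts = Counter(w for s in real for w in s.split())
    fake_counts = Counter(w for s in fake for w in s.split())
    vocab = set(real_counts) | set(fake_counts)
    real_total = sum(real_counts.values()) + len(vocab)
    fake_total = sum(fake_counts.values()) + len(vocab)
    return {
        w: math.log((fake_counts[w] + 1) / fake_total)
        - math.log((real_counts[w] + 1) / real_total)
        for w in vocab
    }


def is_fake(weights, story):
    """Flag a story when its summed word log-odds lean towards 'fake'.
    Words the detector has never seen contribute nothing."""
    return sum(weights.get(w, 0.0) for w in story.split()) > 0


# Hypothetical labelled data standing in for the two databases.
real_news = [
    "city council approves new subway line",
    "central bank holds interest rates steady",
    "university opens research centre downtown",
]
known_fakes = [
    "miracle fruit cures every disease overnight",
    "aliens secretly control global stock market",
    "scientists admit moon landing was staged",
]

# The detector trains on real fakes plus the writer bot's output.
weights = train_detector(real_news, known_fakes + generate_fakes(known_fakes))
print(is_fake(weights, "miracle fruit cures aliens"))
print(is_fake(weights, "city council approves subway line"))
```

In a real adversarial setup both sides would be learned models, and the writer would be updated using the detector's verdicts.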

Bytedance is using this “dual-learning” technique in other ways. It machine translates news from Chinese into English, then has another program to translate that article from English into Chinese to improve both processes. Fake news can also be translated to allow the algorithms to train for Toutiao’s global expansion. Other aspects of global expansion are language-independent, such as video, meaning those algorithms have already been trained on large numbers of Chinese users.

In the future, the culmination of analyzing successful pieces, building a database of popular topics, and developing machine writing will mean Toutiao will be able to automatically generate articles for its readers on their favorite subjects.

Better algorithms, better articles

“We adjust our strategy every week. It’s a constant experiment,” said Ma. The system is monitoring in real time and is also working to predict if a piece of content will be a success.

Algorithms offer article writers four headlines, then run A/B tests to determine which has the most impact. But not every article gets this treatment, because of the computing power involved. Only when a piece starts to gain traction does it get the extra help.
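As a rough illustration of the headline test, here is a minimal sketch that assumes each candidate headline is shown to a slice of readers and the one with the highest click-through rate wins. The function name, headlines, and counts are hypothetical; the article does not say how Toutiao measures impact.

```python
def pick_headline(stats):
    """Return the headline with the highest click-through rate.

    stats maps each headline to (impressions, clicks). A headline with
    zero impressions is treated as having a rate of 0.
    """
    def ctr(item):
        impressions, clicks = item[1]
        return clicks / impressions if impressions else 0.0

    best_headline, _ = max(stats.items(), key=ctr)
    return best_headline


# Hypothetical counts for four candidate headlines.
results = {
    "Headline A": (1000, 42),
    "Headline B": (1000, 57),
    "Headline C": (1000, 31),
    "Headline D": (1000, 49),
}
print(pick_headline(results))
```

A production test would also check that the difference between headlines is statistically significant before declaring a winner.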

Machine learning is used for viral prediction. The system compares incoming articles with previous content that has taken off, and its accuracy increases with constant feedback as its predictions are borne out. Ma acknowledged that care has to be taken to prevent the algorithms from distorting the popularity of particular kinds of content, or from blocking content by new users who have yet to establish a positive profile in the system.
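The comparison with past hits can be sketched very simply. The example below assumes word overlap (Jaccard similarity) as the measure of likeness, which is a stand-in chosen for illustration; the article does not say which features or models Toutiao uses, and the sample headlines are made up.

```python
def jaccard(a, b):
    """Share of distinct words that two texts have in common."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def viral_score(article, past_hits):
    """Highest similarity between an article and any past hit.
    Returns 0.0 when there are no past hits to compare against."""
    return max((jaccard(article, hit) for hit in past_hits), default=0.0)


# Hypothetical past hits and an incoming article.
past_hits = ["panda cub takes first bath", "stock market falls"]
score = viral_score("panda cub takes first steps", past_hits)
print(round(score, 2))
```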

Automated sports commentary

Object recognition in video is also well developed and is used to fuel more personalization. Bytedance is working on smarter, personalized sports coverage, explained Ma. The current one-feed-fits-all approach will be replaced with a tailored viewing experience when a fan’s data indicates an interest in, for example, a particular player. Coverage will focus more on that player, with the end goal being personalized, automated commentary and onscreen captions.

Location, location, location. And time.

Toutiao builds up an idea of users’ lives including their whereabouts and habits. As well as understanding what content the user is interested in, the AI adjusts recommendations based on current and historic location. Ma gave an example of this which shows the sophistication of the tool. Chinese people living in the U.S., using Toutiao as part of their everyday lives there, are generating a footprint. Then suddenly Chinese New Year comes around and the location changes from the U.S. to somewhere in China. The news may change accordingly there and then, but once the user heads back to the States, the software assumes that the user’s location at Chinese New Year was significant to them, and probably their hometown. Once back in the U.S., if any news stories crop up in their supposed hometowns, they will show up in the users’ feeds.
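The hometown example can be sketched in a few lines. This is a minimal illustration, assuming the app keeps a list of (date, city) check-ins and a set of holiday dates; the function name, data layout, and sample cities are hypothetical and are not drawn from Toutiao's software.

```python
from collections import Counter


def infer_hometown(checkins, holiday_dates):
    """Guess a hometown: the city seen most often on holiday dates,
    excluding the user's usual city. Returns None if there is no candidate."""
    if not checkins:
        return None
    usual_city = Counter(city for _, city in checkins).most_common(1)[0][0]
    holiday_cities = Counter(
        city
        for date, city in checkins
        if date in holiday_dates and city != usual_city
    )
    if not holiday_cities:
        return None
    return holiday_cities.most_common(1)[0][0]


# Hypothetical location history for a user based in the U.S.
checkins = [
    ("2018-01-10", "San Francisco"),
    ("2018-02-01", "San Francisco"),
    ("2018-02-16", "Chengdu"),
    ("2018-02-17", "Chengdu"),
    ("2018-03-01", "San Francisco"),
]
chinese_new_year = {"2018-02-16", "2018-02-17"}
print(infer_hometown(checkins, chinese_new_year))
```

Stories tagged with the inferred city could then be added to the user's feed after they return.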

Time is used as a gauge for what is appropriate to send. Algorithms work out when a person is busy, so that the app does not bombard them with content and instead saves it until they are free. On a larger scale, the data is providing profiles of cities and areas of cities in terms of people’s working habits. On an individual scale, these patterns can suggest what a person’s occupation is, but the data is anonymized. The system generates a user ID per smartphone, made up of a billion factors, which only an algorithm can identify.

Moderation and government relations

In a separate briefing, Bytedance senior vice-president for corporate development Liu Zhen revealed that of the 20 million pieces of content uploaded to Toutiao each day, 90% are machine moderated, meaning the other 2 million pieces are human-reviewed. Although Toutiao has been working on its moderation for five years, humans are and always will be needed, according to Ma.

“We have a very good communication channel between the company and the government. So far we’ve been working very hard because we are a new platform, a new kind of application exploring a new frontier. Things have been going quite smoothly because the communication channel is very open and very healthy,” said Ma.

Frank Hersey

Frank Hersey is a Beijing-based tech reporter who’s been visiting China since 2001. He tries to go beyond the headlines to explain the context and impact of developments in China’s tech sector.

From this week


Instagram launched IGTV, a long-form video app.

The idea isn’t to go after Snapchat (now that they’ve copied practically everything) — it’s YouTube they’re after. It’s directed at video creators, offering them up to 1-hour video clips (versus a mere 1-minute on Instagram). They’ve spent a lot of time thinking about how to design this app for mobile audiences. It’s vertical, full-screen, and autoplaying. A vertical YouTube, perhaps. I have to say the UX is fresh and enlightening. What will it take for newsrooms to jump onboard?

Governments & policy


Microsoft rebranded its MSN news apps to simply Microsoft News.

If you’ve never tried it, News boasts an aggregated base of 1,000 “premium publishers” and 800 human editors who pick and feature stories. I re-installed the app today, just to see what’s different from the old MSN app. Quick rant: I don’t know why news app publishers are still asking people to self-declare their interest in topics. Isn’t it clear by now that people don’t necessarily know what they want to read in news? We don’t think in terms of categories — tech, sports, money. If it’s interesting, we’ll read it, right? Sectioning was invented by newspapers to address very specific reader and advertiser behaviors in print. They don’t apply to digital.
The Verge



Media startups

Singapore-based New Naratif released their financial statement, in a show of transparency rarely seen in these circles.

The team is appealing the Singapore government’s rejection of its business application (on grounds that NN is “contrary to national security”) so they’re hoping this will shed more light on their operations. I love reading financial statements of media startups — always a treat. It’s also always staggering to see how much money gets spent on PayPal fees (a problem we face as well).
New Naratif


The Apple Watch now has a responsive browser.

One of the reasons this is a big deal is because Ethan Marcotte is excited about it. I’ll take his word for it — he’s the revoltingly talented web designer who started the whole ‘responsive web design’ thing. He says “the Watch’s WebKit browser looks pretty darned good, as it turns out.” It still doesn’t support web fonts or video embeds, but that will probably come soon. Wrist-first design, people: you heard it here first. (Thank you, Shuwei.)
Ethan Marcotte