Budapest Post

Cum Deo pro Patria et Libertate
Budapest, Europe and world news

Google’s SummAE AI generates abstractive summaries of paragraphs

Google researchers propose a novel AI summarization model, SummAE, capable of generating abstractive summaries of paragraphs.
Machines have a tougher time summarizing text than you’d think, at least where the summarization is abstractive rather than extractive. While extraction merely requires selecting and concatenating sentences from the source, abstraction involves paraphrasing the source in novel sentences. Progress has been made in the news domain recently, perhaps owing to the abundance of corpora on which algorithmic systems can be trained. But robust summarization of most other forms of writing remains an unsolved problem.

Motivated by this, a team at Google Brain investigated an abstractive summarization system dubbed SummAE that’s largely unsupervised, meaning it learns without relying on example summaries as training targets and is then applied to unseen text. While it couldn’t summarize beyond single five-sentence paragraphs, the researchers claim it “significantly” improves upon the baseline and represents a “major” step in the direction of human-level performance.


The data set and code are freely available on GitHub, along with the configuration settings for the best model.

“As one of the very first works approaching single-document [abstract summarization], we propose a novel neural model — SummAE,” wrote the coauthors. “[We believe it] is therefore desirable to have models capable of automatically summarizing documents abstractively with little to no supervision.”

SummAE contains a denoising autoencoder that encodes (that is, generates numerical representations of) sentences and paragraphs of the target text in a shared space. The decoder’s input is prepended with a token signaling whether to decode a sentence or a paragraph, and the system generates a summary by decoding a sentence from the encoded paragraph.
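
To make the mode-token idea concrete, here is a minimal sketch of how a decoder input might be assembled. The token strings `<sent>` and `<para>` and the function names are hypothetical, chosen only for illustration; the article does not say how SummAE represents these tokens, and no encoder or decoder network is modeled here.

```python
from typing import List

# Hypothetical mode tokens; the real system's vocabulary is not given in the article.
SENT_MODE = "<sent>"
PARA_MODE = "<para>"


def build_decoder_input(mode: str, target_tokens: List[str]) -> List[str]:
    """Prepend a mode token so the decoder knows what kind of text to produce."""
    if mode not in (SENT_MODE, PARA_MODE):
        raise ValueError(f"unknown mode token: {mode!r}")
    return [mode] + list(target_tokens)


def summarization_prefix() -> List[str]:
    """At summary time, a paragraph encoding would be decoded in sentence mode,
    so generation would start from the sentence token alone."""
    return [SENT_MODE]


# Reconstruction of a paragraph during training would use the paragraph token.
example = build_decoder_input(PARA_MODE, ["tom", "went", "home", "."])
print(example)
```

The design point is that one decoder serves both lengths: the same latent vector can be decoded as a paragraph (reconstruction) or as a single sentence (summary), depending only on the first input token.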

The researchers discovered that most traditional approaches to training the autoencoder resulted in long, multi-sentence summaries. To encourage it to learn higher-level concepts disentangled from their original expression, the team employed two denoising approaches, randomly masking tokens and permuting the order of sentences within paragraphs, that substantially increased the number of training examples. They also experimented with an adversarial critic component that could distinguish between sentences and paragraphs, in addition to two pretraining tasks that encouraged the encoder to learn how sentences follow one another narratively within a paragraph.
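
The two denoising approaches can be sketched roughly as follows. This is an illustration under stated assumptions, not the researchers' code: the `<mask>` token, the default masking probability of 0.15, and all function names are hypothetical, and a paragraph is modeled simply as a list of tokenized sentences.

```python
import random
from typing import List, Optional

MASK = "<mask>"  # hypothetical mask token


def mask_tokens(tokens: List[str], mask_prob: float, rng: random.Random) -> List[str]:
    """Independently replace each token with MASK with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def permute_sentences(sentences: List[List[str]], rng: random.Random) -> List[List[str]]:
    """Return the sentences in a random order without modifying the input list."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def add_noise(
    paragraph: List[List[str]],
    mask_prob: float = 0.15,  # assumed value, not taken from the article
    rng: Optional[random.Random] = None,
) -> List[List[str]]:
    """Shuffle the sentence order, then mask tokens within each sentence."""
    rng = rng or random.Random()
    return [mask_tokens(s, mask_prob, rng) for s in permute_sentences(paragraph, rng)]


paragraph = [["tom", "was", "hungry"], ["he", "made", "soup"], ["it", "was", "good"]]
print(add_noise(paragraph, rng=random.Random(0)))
```

Because each call draws fresh noise, one clean paragraph can yield many distinct corrupted inputs, while the autoencoder is still trained to reconstruct the original text.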

The researchers trained three different variations of SummAE on ROCStories, a corpus of self-contained, diverse, non-technical, and concise prose. They split the original 98,159 training stories into three separate collections (a training set, a validation set, and a test set) and collected three human summaries each for 500 validation examples and 500 test examples.

After 100,000 training steps with pretraining, the team reports that the best model significantly outperformed a baseline extractive sentence generator on the Recall-Oriented Understudy for Gisting Evaluation (ROUGE), a set of metrics devised to evaluate automatic summarization. Moreover, they say that in a qualitative study involving evaluators recruited through Amazon’s Mechanical Turk, the evaluators rated one of the three SummAE models’ summaries “fluent” and “information-relevant” 80% of the time.
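
For readers unfamiliar with ROUGE, the simplest variant, ROUGE-1, scores a candidate summary by its unigram overlap with a reference summary. The sketch below is a simplified illustration with whitespace tokenization and a single reference; the article does not say which ROUGE variants or tooling the team used, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict


def rouge_1(candidate: str, reference: str) -> Dict[str, float]:
    """Unigram-overlap recall, precision, and F1 between two strings."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per word ("clipped" overlap).
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


scores = rouge_1("the cat sat", "the cat sat on the mat")
print(scores)  # recall 0.5, precision 1.0
```

In this example every candidate word appears in the reference, so precision is 1.0, but the candidate covers only three of the six reference words, so recall is 0.5.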

“The paragraph reconstructions show some coherence, although with some disfluencies and factual inaccuracies that are common with neural generative models,” wrote the coauthors. “Since the summaries are decoded from the same latent vector as the reconstructions, improving them could lead to more accurate summaries.”