A Comparison of Automated Keyphrase Extraction Techniques and of Automatic Evaluation vs. Human Evaluation

Hussey, Richard; Williams, Shirley; Mitchell, Richard; Field, Ian

International Journal On Advances in Life Sciences, volume 4, numbers 3 and 4, 2012. Article lifsci_v4_n34_2012_7

Authors:
Richard Hussey
Shirley Williams
Richard Mitchell
Ian Field

Keywords: Automated Keyphrase Extraction; C-Value; Comparisons; Document Classification; Human Evaluation; Inverse Document Frequency; NC-Value; Reuters News Corpus; Synonyms; Term Frequency

Abstract:
Keyphrases are added to documents to help identify the areas of interest they contain. However, in a significant proportion of papers, author-selected keyphrases are not appropriate for the document they accompany: for instance, they can be classificatory rather than explanatory, or they are not updated when the focus of the paper changes. Automated methods for improving the use of keyphrases are therefore needed, and various methods have been published. However, each method was evaluated using a different corpus, typically one relevant to the field of study of the method’s authors. This not only makes it difficult to incorporate the useful elements of algorithms in future work, but also makes comparing the results of each method inefficient and ineffective. This paper describes the work undertaken to compare five methods across a common baseline of corpora. The methods chosen were Term Frequency, Inverse Document Frequency, the C-Value, the NC-Value, and a Synonym-based approach. These methods were analysed to evaluate performance and quality of results, and to provide a future benchmark. It is shown that Term Frequency and Inverse Document Frequency were the best algorithms, with the Synonym approach following them. Following these findings, a study was undertaken into the value of using human evaluators to judge the outputs. The Synonym method was compared to the original author keyphrases of the Reuters’ News Corpus. The findings show that authors of Reuters’ news articles provide good keyphrases, but that more often than not they do not provide any keyphrases.
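
For readers unfamiliar with the two best-performing methods named in the abstract, the following is a minimal sketch of Term Frequency and Inverse Document Frequency scoring. It is an illustration only, not the authors' implementation: the abstract does not give the exact formulations used in the paper, so the normalisation, the logarithmic form of IDF, the single-word tokenisation, the three-sentence toy corpus, and all function names (`tokenize`, `term_frequency`, `inverse_document_frequency`, `top_terms`) are assumptions made here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lower-cased alphabetic single words stand in for candidate
    # keyphrases; the paper itself works with phrases, not just words.
    return re.findall(r"[a-z]+", text.lower())


def term_frequency(doc):
    # Relative frequency of each term within one document.
    tokens = tokenize(doc)
    total = len(tokens)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(tokens).items()}


def inverse_document_frequency(corpus):
    # log(N / df): terms that occur in fewer documents score higher.
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def top_terms(scores, k=3):
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Hypothetical toy corpus, loosely in the style of news headlines.
corpus = ["oil prices rise", "oil exports fall", "wheat prices fall"]
tf = term_frequency("oil oil prices")
idf = inverse_document_frequency(corpus)
```

In this toy corpus, `tf["oil"]` is larger than `tf["prices"]` because "oil" occurs twice in the scored document, while `idf["wheat"]` is larger than `idf["oil"]` because "wheat" appears in only one of the three documents.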

Pages: 136 to 153

Publication date: December 31, 2012

Published in: journal

ISSN: 1942-2660
