ThinkMind // HUSO 2018, The Fourth International Conference on Human and Social Analytics // View article huso_2018_2_10_80009


Fast Extraction of Statistically Relevant Descriptor Words for Social Media Communities

Authors:
Arces A. Talavera
Arnulfo P. Azcarraga

Keywords: Random Projection; Dimensionality Reduction; Social Media; Text Analytics

Abstract:
Social media communities can be characterized by descriptor words that are used frequently by their members but less often in other communities. These words can be extracted by computing a descriptor index and choosing the words with the highest index. The novel descriptor index proposed here is based on the z-score, which measures the frequency of a word in a given community relative to its frequency in all communities combined, using a statistical standard error. The z-score measure is validated by comparing the words it extracts with those extracted using the popular Term Frequency-Inverse Document Frequency (TF-IDF) measure and the Lagus method. Once it is established that z-scores can be used to extract descriptor words, the next hurdle is to reduce the dimensionality of the vector space model, in which each word that appears in any of the community messages constitutes one dimension. The solution explored here, used in tandem with z-scores as the descriptor index, is the Random Projection method. With this dimensionality reduction method, more than 40,000 unique words (dimensions) are randomly projected to as few as 400 dimensions (a 99% reduction), and yet the proposed scheme still extracts essentially the same descriptor words for each community. To evaluate the combined use of z-scores and Random Projection, and to determine suitable parameter values for the Random Projection method, 10 communities on Facebook were selected. Despite using only 1% of the original number of dimensions, 85% of the top 10 descriptor words extracted with only 400 dimensions match those extracted with all 40,000 dimensions.
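Illustrative sketch (not taken from the paper): the abstract does not give the exact formula for the descriptor index or the construction of the projection matrix, so the Python below makes two labeled assumptions. The z-score is taken to be a one-proportion test of a word's relative frequency in a community against its relative frequency in all communities combined, and the projection matrix is taken to be Gaussian. The function names `z_scores`, `random_projection_matrix`, and `project`, and the toy word counts, are hypothetical. How the paper combines the two steps to recover descriptor words after projection is not described in the abstract and is not attempted here.

```python
import math
import random
from collections import Counter


def z_scores(community_counts, all_counts):
    """Score each word by how over-represented it is in one community.

    Assumed form: (p_community - p_all) / standard_error, where the
    standard error is that of a proportion p_all over the community's
    token count. The paper's exact definition may differ.
    """
    n = sum(community_counts.values())   # tokens in this community
    total = sum(all_counts.values())     # tokens in all communities combined
    scores = {}
    for word, count in community_counts.items():
        p_all = all_counts[word] / total
        std_err = math.sqrt(p_all * (1.0 - p_all) / n)
        scores[word] = (count / n - p_all) / std_err if std_err > 0 else 0.0
    return scores


def random_projection_matrix(n_words, n_dims, seed=0):
    """Build an n_words x n_dims matrix of scaled Gaussian entries (assumed)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_dims)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_dims)]
            for _ in range(n_words)]


def project(vector, matrix):
    """Project one word-count vector into the reduced space."""
    n_dims = len(matrix[0])
    reduced = [0.0] * n_dims
    for value, row in zip(vector, matrix):
        if value:  # skip zero counts; word vectors are mostly sparse
            for j in range(n_dims):
                reduced[j] += value * row[j]
    return reduced


# Toy data: two hypothetical communities with made-up word counts.
music = Counter({"guitar": 30, "the": 50, "chord": 20})
cooking = Counter({"recipe": 25, "the": 60, "oven": 15})
combined = music + cooking

scores = z_scores(music, combined)
top_words = sorted(scores, key=scores.get, reverse=True)[:2]
print("top descriptor words for the music community:", top_words)

# Project the music community's count vector from 5 word dimensions to 2.
vocabulary = sorted(combined)
matrix = random_projection_matrix(len(vocabulary), 2, seed=42)
print("reduced vector:", project([music[w] for w in vocabulary], matrix))
```

In this toy data, a word shared by both communities ("the") receives a negative score, while words used only in the music community score highest, which is the behavior the abstract attributes to a descriptor index.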

Pages: 24 to 30

Copyright: Copyright (c) IARIA, 2018

Publication date: June 24, 2018

Published in: HUSO 2018, The Fourth International Conference on Human and Social Analytics

ISSN: 2519-8351

ISBN: 978-1-61208-648-4

Location: Venice, Italy

Dates: from June 24, 2018 to June 28, 2018
