Knowledge Matters

Understanding knowledge relationships

Connected Action

Sociology and the Internet, Social Media, Networks and Mobile Social Software

Keyword Networks: create word association networks from text with NodeXL (with a macro)

Mon, 30/01/2012 - 09:43

This is the collection of keyword pairs that appeared in two clusters of people who Tweeted about “Paul Ryan”, the Republican Congressman from Wisconsin who delivered the GOP rebuttal to the 2011 United States State of the Union Address.  This network illustrates the ways that certain word pairs appear only or predominantly in one cluster (colored here red and blue) or the other. Terms that appeared in both clusters are colored purple.
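The coloring rule can be sketched in a few lines of Python. This is only an illustration: the function name `color_pairs` and the word pairs are hypothetical, and it uses a strict "only in one cluster" rule rather than the "predominantly" rule mentioned above.

```python
def color_pairs(red_pairs, blue_pairs):
    """Map each word pair to 'red', 'blue', or 'purple' (seen in both clusters)."""
    colors = {}
    for pair in red_pairs | blue_pairs:
        if pair in red_pairs and pair in blue_pairs:
            colors[pair] = "purple"
        elif pair in red_pairs:
            colors[pair] = "red"
        else:
            colors[pair] = "blue"
    return colors

# Hypothetical word pairs, not taken from the Paul Ryan data set.
red = {("budget", "cuts"), ("paul", "ryan")}
blue = {("gop", "response"), ("paul", "ryan")}
colors = color_pairs(red, blue)
print(colors[("paul", "ryan")])  # purple
```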

Social networks are built from relationships between people.  Keyword networks are built from relationships between words and other text strings.  When two words appear in the same message, in the same sentence, or alongside one another, ties of different strengths are created.  The networks that result can illuminate the relationships among topics of importance in a collection of messages.

Markus Strohmaier from the Technical University Graz (TUG) along with Claudia Wagner gave us inspiration in a paper:

C. Wagner, M. Strohmaier, The Wisdom in Tweetonomies: Acquiring Latent Conceptual Structures from Social Awareness Streams, Semantic Search 2010 Workshop (SemSearch2010), in conjunction with the 19th International World Wide Web Conference (WWW2010), Raleigh, NC, USA, April 26-30, ACM, 2010. (pdf)

in which they defined a range of ways two words (technically these are strings; they may not really be words) can be associated with one another.  Words could be linked if they appear in the same tweet, next to one another, or in sequence, among other ways to link terms.

Until now, NodeXL has not had any features for exploring the networks in texts.  With the addition of a new macro from Scott Golder, it is fairly simple to extract pairs of keywords from a collection of tweets.  NodeXL’s Twitter importer can optionally include the content of the tweet that included the search term, and this column of text can now be processed into a new network based on the ways words appear together in tweets.

This feature builds on the work of several people.  Scott Golder from Cornell started the ball rolling with a simple but effective VBA script that allowed others to build and refine the models of what counts as a tie between two words.  Vladimir Barash added several refinements including support for stop word lists to remove common terms.  Scott then picked up the code again and added a set of features for selecting the nature of the graph and making it easier to select the options needed.
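As a rough illustration of what stop word support does (this is a Python sketch, not the VBA macro itself), the text is split into strings and common terms are dropped before any pairs are formed. The stop word list and the `tokenize` helper below are assumptions made for the example.

```python
import re

# Hypothetical stop word list; the macro's actual list is not shown in this post.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}

def tokenize(text, stop_words=STOP_WORDS):
    """Split text into lowercase strings (keeping # and @ prefixes) and drop stop words."""
    tokens = re.findall(r"[#@]?\w+", text.lower())
    return [t for t in tokens if t not in stop_words]

print(tokenize("The GOP rebuttal to the State of the Union"))
# ['gop', 'rebuttal', 'state', 'union']
```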

The code for the Keyword Network macro is below.

The instructions to use it take a few steps to complete:

1. Create a new workbook, e.g. a list of tweets or an import from a Twitter search, whatever. Save it as .xlsm. The m is important. This can be an existing NodeXL workbook.

2. Go to Developer -> Macros. Make up a name; it doesn’t matter because it’ll get overwritten. Then press Create. The VBA window will open.

3. In the big text area that says “Sub whatever() End Sub”, select all that text and delete it. Paste in the contents of the text file below.

4. Go to Tools->References. Check the checkboxes for “Microsoft Scripting Runtime” and “Microsoft VBScript Regular Expressions”. Press OK. Save the file (File->Save) then exit (“Close and return to Microsoft Excel”).

5. Now go to Developer -> Macros. Choose CreateWordNet and press the Run button.

6. It’ll ask you for a worksheet name, a column and a start-row. Then it’ll create a new worksheet with the edgelist in it.

The edge list is not directed (there isn’t really a concept of direction in “co-occurs”) but is weighted. Each pair is weighted by the number of times it appears.
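A minimal Python sketch of an undirected, weighted edge list follows. It assumes the word pairs have already been extracted; the function name `weight_pairs` and the sample pairs are made up for illustration.

```python
from collections import Counter

def weight_pairs(pairs):
    """Count each pair regardless of word order, so (a, b) and (b, a) share one weight."""
    return Counter(tuple(sorted(pair)) for pair in pairs)

# Hypothetical pairs extracted from a handful of tweets.
pairs = [("ryan", "budget"), ("budget", "ryan"), ("ryan", "gop")]
weights = weight_pairs(pairs)
print(weights[("budget", "ryan")])  # 2
print(weights[("gop", "ryan")])     # 1
```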

This version also includes options for edge creation.

First, it is now possible to suppress edges of weight=1, which is helpful in getting rid of a lot of garbage.
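In Python terms, suppressing weight=1 edges amounts to a filter over the weighted edge list. The helper name `drop_singletons` and the sample weights are assumptions for this sketch.

```python
def drop_singletons(weights, min_weight=2):
    """Keep only the pairs that occur at least min_weight times."""
    return {pair: w for pair, w in weights.items() if w >= min_weight}

# Hypothetical weighted edge list.
weights = {("budget", "ryan"): 5, ("gop", "ryan"): 1, ("ryan", "sotu"): 2}
print(drop_singletons(weights))
# {('budget', 'ryan'): 5, ('ryan', 'sotu'): 2}
```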

Second, it is now possible to define edges by adjacency or co-tweeting. Given a tweet of words “w1 w2 w3”, adjacency will give edges w1-w2 and w2-w3, while co-tweeting will give edges w1-w2, w1-w3, w2-w3.
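The two edge definitions can be written out as a short Python sketch (an illustration of the idea, not the macro's VBA). The function names are invented for the example.

```python
from itertools import combinations

def adjacency_edges(words):
    """Link each word to the word that immediately follows it."""
    return list(zip(words, words[1:]))

def cotweet_edges(words):
    """Link every word to every later word in the same tweet."""
    return list(combinations(words, 2))

words = "w1 w2 w3".split()
print(adjacency_edges(words))  # [('w1', 'w2'), ('w2', 'w3')]
print(cotweet_edges(words))    # [('w1', 'w2'), ('w1', 'w3'), ('w2', 'w3')]
```

For a tweet of n words, adjacency yields n-1 edges while co-tweeting yields n(n-1)/2, so co-tweeting networks grow much faster with tweet length.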

For edges defined by adjacency, you may choose directed or undirected edges. So a tweet of “Marc Smith Marc” (for example) would generate the weighted directed edges Marc,Smith,1 and Smith,Marc,1 while the sole undirected edge would be Marc,Smith,2. That is, for undirected edges (where ordering doesn’t matter) the words are alphabetized.
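The “Marc Smith Marc” example can be reproduced with a small Python sketch. It is an illustration under the rules just described, not the macro's own code.

```python
from collections import Counter

words = "Marc Smith Marc".split()
adjacent = list(zip(words, words[1:]))  # [('Marc', 'Smith'), ('Smith', 'Marc')]

# Directed: word order is kept, so the two pairs stay distinct.
directed = Counter(adjacent)

# Undirected: each pair is alphabetized first, so both collapse into one edge.
undirected = Counter(tuple(sorted(pair)) for pair in adjacent)

print(dict(directed))    # {('Marc', 'Smith'): 1, ('Smith', 'Marc'): 1}
print(dict(undirected))  # {('Marc', 'Smith'): 2}
```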

An illustrated guide:

Start with a NodeXL workbook with a column of text for either Vertices or Edges (or any column of text).  Here we have the tweet text of a recent Twitter Search Term network query.

Select “Developer” from the Excel menu and create a new Macro.  I take the text of Scott’s macro and paste it here, replacing everything else in the code buffer.

Note the selection of Tools>References> needed to run this macro!  Select Microsoft Scripting Runtime and Microsoft VBScript Regular Expressions 5.5.

Running the Macro:

Scott’s macro presents a series of dialogs to the user (I believe we could do this in a single dialog when we revise):

First we specify the worksheet in the workbook containing the text column to process:

Next we specify the column containing the text to process:

Next we specify the row in which the text starts in that column:

The macro will copy an edge attribute forward if specified (note, I think the *last* attribute for any AB pair is what is reported).
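If the last attribute really is the one that is kept, the behavior would resemble the Python sketch below, where later rows simply overwrite earlier ones. This is an assumption about the macro, and the `pair_attributes` name and date values are hypothetical.

```python
def pair_attributes(rows):
    """rows: iterable of (word_a, word_b, attribute). Later rows overwrite earlier ones."""
    attributes = {}
    for a, b, attribute in rows:
        attributes[tuple(sorted((a, b)))] = attribute
    return attributes

# Hypothetical rows: the same AB pair seen twice with different attribute values.
rows = [("ryan", "budget", "2011-01-25"), ("budget", "ryan", "2011-01-26")]
print(pair_attributes(rows))  # {('budget', 'ryan'): '2011-01-26'}
```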

The user is asked if the results should omit the singleton edges, which can be useful.

Edges can be defined as co-sequential or co-cell: i.e., ABCD can generate AB, BC, CD or AB, AC, AD, etc.

Users select if they want the edges to include their reciprocal (i.e. generate a “BA” edge for each “AB” edge).
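Adding reciprocals can be pictured with the following Python sketch. It assumes the reciprocal edge carries the same weight as the original, which this post does not confirm, and the `add_reciprocals` name is invented for the example.

```python
def add_reciprocals(weights):
    """Return a copy of the edge list with a (b, a) edge added for every (a, b) edge."""
    result = dict(weights)
    for (a, b), weight in weights.items():
        # Assumption: the reciprocal edge reuses the original edge's weight.
        result.setdefault((b, a), weight)
    return result

print(add_reciprocals({("A", "B"): 3}))  # {('A', 'B'): 3, ('B', 'A'): 3}
```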

The result is a worksheet with word pair edges and the weights of their frequency of occurrence.

This worksheet can then be imported into a separate NodeXL template using the Import from Open Workbook feature:

This generates a keyword network that looks like this:

We will be working on a revised and updated version of this workflow in the coming months.  For example, this is a possible UI revision:

Create Word Network VB Macro Vdb5

 

Categories: Network Analysis News

2012 Monthly Online Practitioner Course in Organizational Network Analysis with NodeXL

Sat, 28/01/2012 - 01:09

Interested in applying social network methods to better understand the structure of your business or organization?

In collaboration with Optimice, I will teach a workshop on Social Network Analysis for enterprises, organizations, and businesses using NodeXL.

  • Self-paced e-learning (4 hours)
    • Introduction to Social/Organisational Network Analysis
    • Network patterns and metrics
    • Software tools for network analysis
    • Managing an ONA Project
  • Module 1: Scoping your ONA Project (2 hour virtual session hosted by Patti Anklam)
    • Determining which business problem to solve with ONA
    • Review of case-studies
    • Determining your questions
  • Module 2: Setting up your ONA survey (2 hour virtual session hosted by Cai Kjaer / Laurence Lock Lee)
    • Setting up your survey
    • Working with mailing lists and other lists
    • Creating relationship sets and network questions
    • Previewing and launching the survey
    • Tracking progress and downloading responses
  • Module 3: Visualise networks with NodeXL (2 hour virtual session hosted by Marc Smith)
    • Getting started with NodeXL
    • Calculating and visualizing network metrics
    • Preparing data and filtering
    • Importing data from Social Media tools
    • Clustering and grouping

A number of ONA Practitioner Courses are available to suit the timezones of participants located in the US, Europe and/or Asia-Pacific (but not restricted to these regions):

Course Code: OPC-2012-9-EUR
Date and Time: 29 February 2012 to 27 March 2012 (Registration deadline is 15 February 2012)
  • Module 1: 13 March 2012 (10am – 12pm)
  • Module 2: 20 March 2012 (10am – 12pm)
  • Module 3: 28 March 2012 (3 – 5pm)
  • Self-paced to be completed before starting module 1.
Time Zone: Europe – London GMT
Payment: $US 1,599

Course Code: OPC-2012-13-APAC
Date and Time: 27 March 2012 to 25 April 2012 (Registration deadline is 13 March 2012)
  • Module 1: 11 April 2012 (11am – 1pm)
  • Module 2: 18 April 2012 (11am – 1pm)
  • Module 3: 25 April 2012 (11am – 1pm)
  • Self-paced to be completed before starting module 1.
Time Zone: Asia-Pacific – Sydney EST
Payment: $US 1,599

Course Code: OPC-2012-17-US
Date and Time: 25 April 2012 to 22 May 2012 (Registration deadline is 11 April 2012)
  • Module 1: 8 May 2012 (4 – 6pm)
  • Module 2: 15 May 2012 (4 – 6pm)
  • Module 3: 22 May 2012 (4 – 6pm)
  • Self-paced to be completed before starting module 1.
Time Zone: Americas – New York EST
Payment: $US 1,599

Categories: Network Analysis News

March 5th Talk at Predictive Analytics World 2012 in San Francisco: Crowd Photography for Social Media

Wed, 04/01/2012 - 18:00

I will speak this March 5th at the 2012 Predictive Analytics World in San Francisco about “Crowd Photography for Social Media”.

http://www.predictiveanalyticsworld.com/sanfrancisco/2012/speakers.php
http://www.predictiveanalyticsworld.com/sanfrancisco/2012/agenda.php#

Monday @ 5:25-5:45pm

Track 1:
Social Data Case Study:
Social Media Research Foundation

Crowd Photography for Social Media

Crowds of people gather in social media around many products, services, businesses, and events but they can be difficult to see and understand. With new free and open tools, it is now possible to map and measure social media spaces, capturing the sub-groups and key people within and between them. Learn how to capture social media data and quickly generate a visual map of the crowd. With maps in hand, we will discuss ways they guide a journey to the key influencers and concepts in the crowd.

Speaker: Marc Smith, Director, Social Media Research Foundation

Categories: Network Analysis News

March 1 Talk at O’Reilly Strata Conf, Santa Clara, Mapping social media networks (with no coding) using NodeXL

Tue, 03/01/2012 - 18:00


On March 1st I will speak at the 2012 Strata Conference in Santa Clara, California about:

Mapping social media networks (with no coding) using NodeXL

Time: 16:50 on 01 Mar 2012.

Session type: 40 minute presentation

Topics: Visualization & Interface

Description: Maps of the complex connections that form when people link, like, reply, rate, review, favorite, friend, follow, edit, and mention one another can reveal important trends. It is possible to create network maps with free and open tools that identify key people and sub-groups in any social media population with just a few key clicks. Can you make a pie chart? You can now make a network chart.

Abstract: Networks are a data structure commonly found across all social media services that allow populations to author collections of connections. The Social Media Research Foundation’s (http://www.smrfoundation.org) free and open NodeXL project (http://nodexl.codeplex.com) makes analysis of social media networks accessible to most users of the Excel spreadsheet application. With NodeXL, networks become as easy to create as pie charts. Applying the tool to a range of social media networks has already revealed the variations present in online social spaces. A review of the tool and images of Twitter, flickr, YouTube, and email networks will be presented.

We now live in a sea of tweets, posts, blogs, and updates coming from a significant fraction of the people in the connected world. Our personal and professional relationships are now made up as much of texts, emails, phone calls, photos, videos, documents, slides, and game play as by face-to-face interactions. Social media can be a bewildering stream of comments, a daunting fire hose of content. With better tools and a few key concepts from the social sciences, the social media swarm of favorites, comments, tags, likes, ratings, and links can be brought into clearer focus to reveal key people, topics and sub-communities. As more social interactions move through machine-readable data sets new insights and illustrations of human relationships and organizations become possible. But new forms of data require new tools to collect, analyze, and communicate insights.

Categories: Network Analysis News

Feb 23 Talk at Personal Digital Archiving 2012 at the Internet Archive, San Francisco: Arc-chiving: saving social links for study

Mon, 02/01/2012 - 18:00

I will present a talk at Personal Digital Archiving 2012 titled “Arc-chiving: saving social links for study”.

The conference will be held on Thursday-Friday, February 23-24, 2012 at the Internet Archive in San Francisco.

News and updates on the conference will be posted at the conference web site, http://personalarchiving.com.

My talk this year will focus on collecting and analyzing connections between digital objects (like users) and the insights these tools make possible.

Abstract: While digital content is archived in various ways, the “arcs” or links among people and their digital objects are not systematically saved. Efforts to store social media often overlook data about collections of connections. The Social Media Research Foundation is dedicated to open tools, open data, and open scholarship related to social media. It is producing tools that can collect, analyze and upload social media data, including the arcs that link people and objects. Using the free and open NodeXL application, users can collect, analyze and visualize complex networks and then upload the data to a growing archive on the web at NodeXLGraphGallery.org. As the group of researchers grows, an archive is being assembled to provide researchers around the world with the data about social media needed to understand the ways computer mediated communication tools shape society.

My talk at the 2011 Personal Digital Archiving conference is available through the Internet Archive’s video service:

Categories: Network Analysis News

January 19-20, 2012: Syracuse University – NodeXL Social Network Analysis Workshop

Sun, 01/01/2012 - 18:00


I will speak and lead a workshop on social media network analysis at Syracuse University on the 19th and 20th of January, 2012.

Ines Mergel is my host.  Prof. Mergel is Assistant Professor of Public Administration, Department of Public Administration and International Affairs, and a Senior Research Associate at the Center for Technology and Information Policy at the Maxwell School of Citizenship and Public Affairs, Syracuse University, NY.

I will speak about the patterns we are finding in the data collected and analyzed by NodeXL.

Categories: Network Analysis News

December 15, 2011 – @IFTF NodeXL & Gephi – Social Media Mapping Open House

Wed, 07/12/2011 - 21:00

NodeXL Event at IFTF, Thursday, December 15, 2011
Along with the Social Media Research Foundation, the Institute for the Future is co-hosting a meetup for those interested in mapping social media networks. Users of tools like NodeXL and Gephi (among others) are welcome to join us for an evening devoted to collecting, analyzing, and visualizing social media networks. Thursday, December 15th at 6pm at the Institute for the Future’s offices in Palo Alto at 124 University Avenue, 2nd floor.

Categories: Network Analysis News

November 28 and 30: Mastering Social Media – Cape Town and Johannesburg, South Africa

Sat, 19/11/2011 - 00:56

Mastering Social Media 2011 is a workshop scheduled for November 28 and 30 in Cape Town and Johannesburg, South Africa.

My partner, Walter Pike, is a Marketing Maven and Founder of PiKE | New Marketing (www.pike.co.za) and the founder of the Digital Academy www.digitalacademy.co.za
He blogs at walterpike.com.

Mastering Social Media will give you practical tools on how to plan, execute and monitor your social media campaigns. Discussions will lead you through the introduction to social media marketing, understanding community dynamics, mapping social networks and applying network insights to your goals.
Brand Managers, Marketing Managers, Advertising Agencies, Digital Agencies, and PR Agencies are likely to find the day useful.

Venues and Dates

Cape Town

28 November 2011
Protea Hotel
Breakwater Lodge,
Waterfront

Johannesburg

30 November 2011
Gordon Institute
of Business Science,
Illovo


Categories: Network Analysis News

Contrasting teaparty and occupywallstreet twitter networks

Thu, 17/11/2011 - 15:56

Both teaparty and occupywallstreet are actively discussed on Twitter.

This map of connections among people who tweeted Teaparty starts on 11/15/2011 14:22 UTC and ends on 11/15/2011 17:23, a total of 3 hours and 1 minute of traffic.

The Teaparty data set contained 1,533 tweets, replies and mentions.
Blue edges are connections created by replies and mentions. Grey lines are follows relationships.

Top users by betweenness centrality:
@ronpaul
@michellemalkin
@christopherhull
@theteaparty_net
@capaction
@thedailyedge
@bill1phd
@dbargen
@gulagbound
@rightcandidates

Graph Metric: Value
Graph Type: Directed
Vertices: 659
Unique Edges: 8808
Edges With Duplicates: 1423
Total Edges: 10231
Self-Loops: 1084
Connected Components: 49
Single-Vertex Connected Components: 44
Maximum Vertices in a Connected Component: 606
Maximum Edges in a Connected Component: 10148
Maximum Geodesic Distance (Diameter): 6
Average Geodesic Distance: 2.693965
Graph Density: 0.02036797
NodeXL Version: 1.0.1.193

The major clusters are composed of teaparty supporters. The bottom-center cluster is composed of teaparty critics.

This map of the connections among people who tweeted Occupywallstreet starts on 11/15/2011 23:08 and ends on 11/15/2011 23:34 UTC, a total of 26 minutes of traffic.

The Occupywallstreet data set contained 1,370 tweets, replies and mentions.
Blue edges are connections created by replies and mentions. Grey lines are follows relationships.

Top users by betweenness centrality:
@occupywallst
@mmflint
@nyclu
@allisonkilkenny
@andrewbreitbart
@operationleaks
@occupydenver
@theatlantic
@usgeneralstrike
@rt_com

Graph Metric: Value
Graph Type: Directed
Vertices: 1000
Unique Edges: 3546
Edges With Duplicates: 826
Total Edges: 4372
Self-Loops: 794
Connected Components: 241
Single-Vertex Connected Components: 230
Maximum Vertices in a Connected Component: 747
Maximum Edges in a Connected Component: 3998
Maximum Geodesic Distance (Diameter): 7
Average Geodesic Distance: 2.65438
Graph Density: 0.003246246
NodeXL Version: 1.0.1.194

Some notable contrasts:
Teaparty Graph Density: 0.02036797
Occupywallstreet Graph Density: 0.003246246 – significantly lower levels of interconnection
Teaparty: Single-Vertex Connected Components 44 of 659
Occupywallstreet: Single-Vertex Connected Components 230 of 1000

Many more “isolates” (Single-Vertex Connected Components) in Occupywallstreet.
Many more hubs, and more retweeting activity in Occupywallstreet.
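To make the isolate comparison concrete, the share of single-vertex components can be computed from the vertex counts in the two metric tables above. The Python sketch below also includes the usual density definition for a directed graph (edges present divided by edges possible); whether NodeXL computes density in exactly this way is an assumption, and the function names are made up.

```python
def isolate_share(isolates, vertices):
    """Fraction of vertices that sit in single-vertex connected components."""
    return isolates / vertices

def directed_density(edges, vertices):
    """Edges present divided by edges possible, ignoring self-loops and duplicates."""
    return edges / (vertices * (vertices - 1))

teaparty = isolate_share(44, 659)    # counts from the Teaparty metrics table
occupy = isolate_share(230, 1000)    # counts from the Occupywallstreet metrics table
print(f"{teaparty:.1%} vs {occupy:.1%}")  # 6.7% vs 23.0%

# Toy example: 4 vertices joined by 6 of the 12 possible directed edges.
print(directed_density(6, 4))  # 0.5
```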

The difference in duration of these data sets illustrates the relative speed of content creation in the topics. The data sets are commensurable in that they are both the result of a single query against the Twitter search API. So both maps are the results of charting connections among the authors of the last 1500 tweets, however long that takes to create.
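The relative speed can be put in numbers by dividing each data set's tweet count by its time span, using the counts and timestamps reported above. This Python sketch is only a back-of-the-envelope calculation, and the helper name `tweets_per_minute` is invented for it.

```python
from datetime import datetime

def tweets_per_minute(count, start, end, fmt="%m/%d/%Y %H:%M"):
    """Average number of tweets per minute between two timestamps."""
    span = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return count / (span.total_seconds() / 60)

teaparty_rate = tweets_per_minute(1533, "11/15/2011 14:22", "11/15/2011 17:23")
occupy_rate = tweets_per_minute(1370, "11/15/2011 23:08", "11/15/2011 23:34")
print(round(teaparty_rate, 1))  # 8.5
print(round(occupy_rate, 1))    # 52.7
```

On these figures, Occupywallstreet traffic was arriving roughly six times faster than Teaparty traffic during the sampled windows.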

Categories: Network Analysis News

November 8, 2011: University of Manchester, NodeXL SNA / Social Media Workshop

Sat, 05/11/2011 - 18:00

Methodologies for Web and Social Media Data Analysis in Social Science and Policy Research

CCSR Short Course
Social Media Network Analysis using NodeXL

November 9th  9.00 am – 5.30 pm.

Marc Smith
Social Media Research Foundation

http://www.smrfoundation.org

Course Summary: Networks are everywhere in the natural and social world.  New tools are making the task of getting, processing, measuring, visualizing and gaining insights from network data sets easier than ever before.  The rise of social media offers a new and abundant source of network data.  The NodeXL project (http://www.codeplex.com/nodexl) from the Social Media Research Foundation (http://www.smrfoundation.org) offers a free and open path to network overview, discovery and exploration within the context of the familiar Excel spreadsheet.  In this short course we will introduce the NodeXL application and review the landscape of networks, social networks, and social media networks. Using the tool, non-programmers can quickly select a network of interest from various social media and other data sources.  Twitter, flickr, YouTube, email, the World Wide Web, and Facebook data can be quickly imported into NodeXL.  Networks can then be analyzed and visualized using tools similar to those used to create a pie chart or line graph [1].  As the challenge and cost of network acquisition and analysis drops, abundant data sets are being generated that document the range of variation of diverse sources of social media.  How many different kinds of Twitter hashtags exist?  Using snapshots of hundreds of hashtags collected over a year, it is now possible to build rough taxonomies of this kind of social media.  NodeXL provides access to a web gallery of data [2], allowing users to browse existing data sets and upload their own as well. Borrowing the vision of telescope arrays that create composite images far better than any individual instrument could, the Social Media Research Foundation envisions a user-generated archive that provides a research asset that supports the collective effort to understand the structures and dynamics of network data.

[1] NodeXL Image Gallery: http://www.flickr.com/photos/marc_smith/sets/72157622437066929/
[2] NodeXL Graph Gallery: http://nodexlgraphgallery.org

Course Objectives
After this course, participants will:

(1) Be familiar with the basic concepts of networks, social networks and social media networks
(2) Understand the core features of the NodeXL network analysis and visualization tool
(3) Review images and data sets for dozens of different social media networks
(4) Learn to identify general types of social media networks along with the key people and groups within them

Target Audience
This course is suitable for people with some experience or interest in social media, social science, or social network analysis.  It is particularly appropriate for those who are involved in studying social structures and their change over time.

Laboratory and IT requirements:
Participants will need access to a computer connected to the Internet  and will be supplied with the free NodeXL software.

Suggested Reading
Analyzing social media networks with NodeXL: Insights from a connected world
http://www.amazon.com/gp/product/0123822297?ie=UTF8&tag=conneactio-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=0123822297

EventGraphs:
http://www.cs.umd.edu/localphp/hcil/tech-reports-search.php?number=2010-13

Visualizing the Signatures of Social Roles in Online Discussion Groups:
http://www.cmu.edu/joss/content/articles/volume8/Welser/

Discussion catalysts in online political discussions: Content importers and conversation starters
http://www.connectedaction.net/wp-content/uploads/2009/08/2009-JCMC-Discussion-Catalysts-Himelboim-and-Smith.pdf

Analyzing (Social Media) Networks with NodeXL
http://www.connectedaction.net/wp-content/uploads/2009/08/2009-CT-NodeXL-and-Social-Queries-a-social-media-network-analysis-toolkit.pdf

Whither the experts: Social affordances and the cultivation of experts in community Q&A systems
http://www.connectedaction.net/wp-content/uploads/2009/08/2009-Social-Computing-Whither-the-Experts.pdf

First steps to NetViz Nirvana: evaluating social network analysis with NodeXL
http://www.cs.umd.edu/~cdunne/pubs/Bonsignore09Firststepsto.pdf

Categories: Network Analysis News

Event: LocalSocialSummit11 and Social Media Research Foundation Reception November 10th, 2011

Fri, 28/10/2011 - 18:00


On November 9th and 10th, 2011 the LocalSocialSummit11 will be held in London.

On Thursday morning, November 10th, Doctor Bernie Hogan from the Oxford Internet Institute and member of the Social Media Research Foundation will speak on the topic:

Insights: Social Network Analysis On Facebook Data, with a local twist

Thursday at 3:45 I will speak at the event on:

Charting Collections Of Connections In Social Media

Following the conference from 5 to 6pm will be a reception for the Social Media Research Foundation.

Please join us for refreshments and a discussion of the ways social media analysis tools can guide your engagement with stakeholders around the world.

The event is held at wallacespace:

22 Duke’s Road, London, WC1H 9PN
t +44(0)20 7395 1265
f +44 (0)20 7836 9591
e ask@wallacespace.com

22 Duke’s Road is only a few minutes’ walk from Euston, St Pancras International, King’s Cross and Russell Square Tube.

Registration form for attending the event is below….

Categories: Network Analysis News

November 3, 2011: Seoul, South Korea – International Symposium on Convergence Technology (ConTech 2011)

Sun, 23/10/2011 - 05:00

I will speak at the International Symposium on Convergence Technology (ConTech 2011) – Smart & Humane World – on November 3rd in Seoul, South Korea.

Date: 2011 November 3 (Thurs)
Place: COEX Grand Ballroom, Seoul, Korea
Organized by Advanced Institutes of Convergence Technologies (AICT), Seoul National University (SNU)
In Cooperation with Ministry of Knowledge Economy, Ministry of Education, Science and Technology, National Research Foundation of Korea, Graduate School of Convergence Science and Technology (GSCST)
Symposium Chair : Choi, Yanghee (President, AICT)

Program
09:00~09:30 Registration
09:30~10:00 Opening Ceremony
Plenary Session : Smart & Humane World through Convergence
10:00~10:40 Speaker (TBD)
10:50~11:30 Speaker (TBD)
11:30~13:00 Lunch
Session 1 : Bio Convergence (Chair : Prof. Kim, Sunghoon)
Session 2 : IT Convergence (Chair : Dr. Lee, Manjai)
Session 3 : Appropriate Technology (Chair : Prof. Kang, Namjun)
13:00~15:00 Scott A. Strobel (Professor, Yale University)
Speaker Kevin Kim (Professor, University of Illinois)
Speaker Masaru Kitsuregawa (Professor, Tokyo University)
Speaker Haesun Park (Professor, Georgia Institute of Technology)
Speaker Marc Smith (Connected Action)
Speaker Sang-goo Lee (Professor, Seoul National University)
Speaker Haklae Kim (Samsung)
Speaker Raghu Ramakrishnan (Yahoo)

My slides: 20111103 con tech2011-marc smith


I will also visit Professor Han Woo Park at YeungNam University (Wikipedia) in Daegu, South Korea to meet with his students in the Webometrics Institute program.

This will be my second trip to Korea; I was there last year for a related event.  Pictures after the jump:

Categories: Network Analysis News

Copyright © 2004 -2012 Knowledge Matters™ - all rights reserved
The Webpages and Occasional Blog of Graham Durant-Law
E-mail: graham@durantlaw.info
