Knowledge Matters

Understanding knowledge relationships

Zero Intelligent Agents

How can the social sciences, mathematics and computer science combine to affect national security policy?

Local R User Group Panel from useR! 2010 (Video)

Sun, 25/07/2010 - 02:13

As I mentioned last week, I will be hosting videos of several of the keynote speakers from this year’s useR! 2010 conference at the Video Rchive. As it happens, the first video I was able to upload was the panel discussion we held on starting local R user groups. The video is embedded below (after the jump).

I was joined on the panel by an illustrious assembly of R community members, which included:

Categories: Network Analysis News

useR! 2010 Videos to be Hosted at Rchive

Tue, 20/07/2010 - 22:57

Today, I am packing up the car and heading south to my old home, Washington, DC, for the useR! 2010 conference, which is being held at the National Institute of Standards and Technology, where, incidentally, I was an intern in the Information Technology Lab during college.

If you are not able to make the trip to Gaithersburg, MD, fear not: through the hard work of Szilard Pafka (organizer of the LA R user’s group) and Katherine Mullen (coordinator of useR!), I will be hosting many of the conference’s keynotes, lectures and several of the panel discussions at the Video Rchive. It may be several days before all of the videos are uploaded, but be sure to check back at the Rchive next week for any updates.

If you are attending useR!, do try to make it to our panel discussion on starting a local R users group in your area (Thursday, 3:25pm in the Red Room). The panel includes several prominent members of the R community, and should be a very entertaining and informative discussion.

Hope to see you there!

Categories: Network Analysis News

Anatomy of a Life-Milestone Announcement on Facebook

Fri, 16/07/2010 - 09:08

As I have mentioned, I recently returned from a lovely trip to Europe. While on vacation my brilliant, beautiful, funny, and all-around perfect girlfriend accepted my invitation to be my wife.

Pause for shared overwhelming feeling of joy…

While I am still basking in the glow of being the luckiest man on Earth, as a true data geek I could not let this opportunity to analyze a novel data set escape me.

One of the most fascinating aspects of social media is how it has changed the way life-milestones, like getting engaged, are announced. Facebook’s ‘Relationship Status’ feature allows users to inform all of their friends at once about these large life changes. Such announcements are often met with a sudden deluge of comments and wall postings, so I thought: wouldn’t it be interesting to collect this data and analyze how the frequency of these postings decays?

Though I am not on Facebook, my fiancée is, and with a little help from Facebook’s API and R’s ggplot2 library I was able to collect and analyze this data. Below I present (with permission) the data from my fiancée’s wall for the first 48 hours after she changed her Relationship Status from ‘In a Relationship’ to ‘Engaged’.

Interesting. A huge spike in the first hour, a drop and flattening over the next two hours, and finally another large drop with sporadic spikes. Women dominate the initial posts, while the gender difference vanishes as posting frequency decreases and more latecomers post.
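For anyone who wants to try this on their own data, the hourly counts behind a chart like this could be computed along the following lines. This is only a minimal sketch, not the code used for the analysis above (which was done in R): the `hourly_counts` function, the `posts` list, and every timestamp and gender label in it are invented for illustration, and it assumes the wall posts have already been collected as (timestamp, gender) pairs.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(posts, start, hours=48):
    """Count posts per whole hour elapsed since `start`, split by gender.

    `posts` is an iterable of (timestamp, gender) pairs. Posts made before
    `start` or after the `hours` window are ignored.
    """
    counts = Counter()
    for timestamp, gender in posts:
        elapsed = int((timestamp - start).total_seconds() // 3600)
        if 0 <= elapsed < hours:
            counts[(elapsed, gender)] += 1
    return counts


# Hypothetical data: NOT the real wall posts, just enough to run the function.
start = datetime(2010, 7, 1, 12, 0)  # assumed time of the status change
posts = [
    (datetime(2010, 7, 1, 12, 5), "F"),
    (datetime(2010, 7, 1, 12, 40), "F"),
    (datetime(2010, 7, 1, 12, 55), "M"),
    (datetime(2010, 7, 1, 14, 10), "M"),
]

counts = hourly_counts(posts, start)
for (hour, gender), n in sorted(counts.items()):
    print(f"hour {hour:2d}  {gender}  {n}")
```

Plotting these counts against the hour index, with one series per gender, would give the kind of decay curve described above.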

Of course, all of this is secondary to the fact that, Kristen, I love you and I cannot wait to spend the rest of my life with you!

Categories: Network Analysis News

What Will ‘Data Science’ Teach Us?

Fri, 16/07/2010 - 01:30

If the level of online discourse is a good indicator of whether a topic has penetrated the collective nerd consciousness, then the notion of a burgeoning “data science” discipline has taken hold. A few weeks ago I discussed where to draw the line on this idea, but recently I again began thinking about the idea and term more critically. Yesterday, I had a wonderful discussion with a brilliant member of the data community here in New York, which focused on the delicate balance between keeping a human-friendly face on mass quantities of data (something the data scientists are meant to do) and having this new discipline make formidable contributions to our general understanding of human behavior.

That is, up to this point, many of the great evangelists of data science have focused on telling stories with data. Science, however, is not about storytelling, but about discovery. Perhaps I am particularly cautious of the suffix “science” because of the awkward self-consciousness the word has imbued in my own discipline. At its roots, political science was a discipline that sought to construct narratives; equal parts history, philosophy and personal experience. The name “political science,” therefore, drew the ire of the “hard science” community, as they felt (perhaps with reason) that the word had been appended to the title erroneously, as there were no identifiably scientific aspects to the endeavor. While my discipline has come a long way in its application of the scientific method, and today can much more accurately be referred to as a science, there continues to be a delicate balance between discovery and storytelling. What, then, can the data science community learn from this experience?

Broadly, all disciplines are measured by their contributions to our understanding of the universe. Data science—by design—is the product of measured human activity, and therefore should seek to provide new insight into human behavior. Unfortunately, the current focus of many of the community’s members has been a self-congratulatory appraisal of the tools that have been developed to allow for this large-scale measurement and recording. To be a successful discipline, however, the focus must move away from tools and toward questions.

To paraphrase a famous nerd, with great data comes great responsibility; so to begin, the data science community must ask: what questions do mass quantities of measured human existence allow us to address that were never previously possible? Just the thought should be enough to inspire some to begin writing research proposals, but in an effort to contribute to this discussion here are a few things I hope data science will teach us:

  • How do online discourses manifest in offline behavior? – I study terrorism, and one lingering problem in this area is the threat from so-called online radicalization. That is, to what extent does information obtained online influence individuals to join radical organizations or commit acts of terror? This question, however, applies to many other areas, such as voting and purchasing decisions. As our ability to analyze these discourses increases, perhaps data science will provide some answers.
  • How do we reach the “tipping point”? – Malcolm Gladwell did well to introduce the idea of the tipping point, but since then we have learned too little about how these culminations occur, and what, if any, are the consistent behavioral features that lead to them. Often, these events occur online, where data science may be able to analyze the tracks that lead to these phase shifts.
  • What are the ethical limits of personal data analysis? – The rise of massive stores of personal data online has been a boon to the data science community, but it has not come without some trepidation. With intimate knowledge of the tools and processes used to capture and analyze this information, this community is uniquely positioned to contribute to a discussion of the ethical limits of their own work.
  • Do we really consume things differently? – Every day people make decisions about what they will consume, in terms of purchases, food, information, etc., and conventional wisdom states that these decisions are largely a function of birth cohort, geography, education, etc. Is this really the case? The vast amount of consummatory data being generated online may be able to help us understand the most significant indicators of these differences.
  • Can more/better data explain rational irrationality? – Today we learned some of the limits of behavioral economics, which has helped explain instances of seemingly irrational behavior. As the op-ed points out, however, there continue to be many questions that discipline fails to explain. Perhaps, then, the explanations of these anomalies can be borne out of data.

I welcome your own thoughts on what data science will teach us, and hope you will share them. Personally, I think this discipline has the potential to generate vast amounts of knowledge, but it must be cautious not to lose sight of the question in the sea of information.

Categories: Network Analysis News

Sunbelt XXX, and Other Loose Ends

Wed, 14/07/2010 - 00:54

I have been back in the United States for about a week, but only now have found some time to get back to blogging. As I stated before my departure, the primary reason for my trip to Europe was to participate in the 30th meeting of the International Network for Social Network Analysis.

First, Aric Hagberg and I gave a workshop on using NetworkX to hack social networks. Given that it was the first time we had ever given this workshop, I was pleased with how well it went and the positive reception we received from the audience. It was encouraging to see so many researchers from academia, private corporations and the government interested in learning the mechanics of generating network data and analyzing it. That said, Sunbelt did reinforce my previous observation that academic researchers have a lot of catching up to do in terms of tools. There were several talks that indicated an unfortunate lack of technical expertise, which could easily be overcome with a minimal level of effort. Thankfully, conferences like Sunbelt allow people with many different talents to mix together and exchange ideas, and this norm was on display in Riva del Garda.

Next, the panel sessions were a good mix of methodological research and substantive application. Despite some serious logistical impediments (heat, overcrowding, etc.), I was able to see some very interesting talks, and received some interesting feedback on my own research. Given the idyllic location of the conference, there were only a few times where most of the conference attendees were in the same place, which detracted from the networking opportunities (a somewhat shameful consequence, given it was a conference on social networking). The best interactions I had occurred while strolling the poster presentations. The biggest winners of the poster sessions were the dynamic duo of Mathieu Bastian and Sebastien Heymann, the purveyors of Gephi. We discussed potential future opportunities to interface Gephi with NetworkX, but Gephi itself was a huge hit at Sunbelt and I expect to see its beautiful graphs on display in future network papers.

It appears that Sunbelt XXXI will be in St. Pete Beach, Florida, and I hope to see an even larger crowd there in 2011. By way of closing the book on one networks conference and opening it on another, I would also like to pass on an announcement I was sent about an upcoming conference on network visualization. In October, Harvard will be hosting the Connecting the Dots symposium, which will feature keynote talks from Alessandro Vespignani and Ben Fry. The conference itself looks interesting, but there is also a very inclusive call for presenters:

In addition to the keynotes, we are soliciting proposals for guest speakers to give short 20-minute presentations. We are interested in any presentation that includes the visual depiction and/or visual analysis of network data as a central theme. Potential topics include but are not limited to network visualization algorithms, network visualization software, network communities and visualization, other network theory or analysis, and artistic projects centering on network visualization. Given the cross-disciplinary nature of network science, we welcome applications from researchers in any scientific discipline.

Seems like a great opportunity for anyone studying networks and network representation. I expect to see my friends from Gephi there, and hope others will submit.

Photo: napolipuntoacapo.it

Categories: Network Analysis News

Materials from NetworkX Workshop

Thu, 01/07/2010 - 18:27

On Tuesday Aric Hagberg and I presented a half-day workshop on NetworkX titled “Hacking Social Networks with the Python Programming Language,” at Sunbelt XXX. I tweeted this on Tuesday, but for those who were unable to make the plane, train and bus trip required to reach Riva del Garda, Italy to attend Sunbelt, Aric and I have posted all of the workshop materials (slides, LaTeX, code, etc.) to GitHub.

Please feel free to download, play, and reuse liberally. Also, if you have any questions, please drop me a line.

Categories: Network Analysis News

Extended leave of absence

Thu, 17/06/2010 - 00:41

No sooner do I post my (controversial?) list of reasons why grad students blog than I must take an extended leave of absence. As others have rightly pointed out, coursework and academic research come first. In a few days I will be heading to Europe for, among other things, Sunbelt XXX to hobnob with the world’s foremost network theorists, and to present the workshop on NetworkX that Aric Hagberg and I have prepared. If you are going to be at Sunbelt, please drop me a line.

Blogging will return sometime in mid-July, so farewell until then…

Categories: Network Analysis News

Gratuitous and Rather Useless World Cup Post

Sat, 12/06/2010 - 02:48

I am not a soccer fan; I prefer the American version of football. That said, I am admittedly actively following the 2010 World Cup. While watching the opening match between South Africa and Mexico, I thought it would be fun to ask the question, “Do free countries produce better football teams?”

So, I quickly combined the FIFA World Rankings and Press Freedom Index for all of the countries participating in the 2010 World Cup, and came up with this:

Note, the Press Freedom Index goes from 0.00 (most free) to 115.00 (least free), and the FIFA point totals increase as a team’s overall quality improves, so we might expect a negative relationship. Also, I removed North Korea from the data set because it was such an extreme outlier on the press freedom dimension. So, we find basically a null result. There is a slight negative relationship, but it is essentially random.

The peak and valley of the smoothed fit curve are a bit interesting. For the worst teams, as freedom goes down the quality of the teams goes up, but around a freedom score of 20 that relationship inverts, and as the quality of the teams increases so does the level of freedom, until we reach the best teams, Brazil and Spain.
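To put a number on the “slight negative relationship,” one option is a plain Pearson correlation between the two scores. The sketch below is illustrative only: the `pearson` helper is written out by hand, and the `press_freedom` and `fifa_points` values are made-up placeholders, not the actual rankings used in the chart, so the coefficient they produce says nothing about the real result.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder values (hypothetical): lower press freedom score = more free,
# higher FIFA points = better team.
press_freedom = [0.0, 10.0, 20.0, 30.0]
fifa_points = [1000.0, 900.0, 950.0, 850.0]

r = pearson(press_freedom, fifa_points)
print(f"Pearson r = {r:.2f}")
```

With the real data, a coefficient near zero would match the null result described above, while a linear measure like this would miss the peak and valley that the smoothed curve picks up.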

Categories: Network Analysis News

Learning About Network Theory

Thu, 10/06/2010 - 00:52

Over the past several weeks I have had the pleasure of co-authoring a lengthy introduction to network theory with Bradford Cross, co-founder and head of research for FlightCaster (one of my top five favorite iPhone apps). After many ebbs and flows of writing, it is finally up over at Brad’s excellent Measuring Measures blog. Here’s Brad’s motivation for the post:

I received a lot of great feedback to my first and second posts on learning about machine learning. Part of that feedback was that people wanted to see similar posts for other topics. The most asked about topic was Network Theory, no doubt due to a massive recent increase in interest in social networks and social network analysis (SNA).

In this post, Drew Conway (a PhD Candidate at New York University, studying networks) and I will walk you through a guide that we hope may be of use to others trying to find their way through network theory

So, go check out Learning About Network Theory and let Brad and me know what you think. Also, be sure to up-vote it on Hacker News and /r/statistics.

Categories: Network Analysis News

ZIA Welcomes the Terrible Twos with a New Look

Wed, 09/06/2010 - 14:30

As you may have already noticed (I went live a bit early during the testing), rather than add a new bell or whistle to ZIA to celebrate its second birthday, I decided to give the site a slight redesign! I was getting tired of three things: the overall crowding of text caused by the three-column layout, the size of the top banner, and the clashing of colors throughout the site. So, I decided to kill the left column, redesign the banner, and go with a green, grey and black motif for the introductory colors. What do you think? Here is the side-by-side comparison:


As I said yesterday, part of the fun in blogging is getting familiar with the technical details of web design, and this experiment continues. I welcome all thoughts and criticism of the new look. Before I go off and gorge myself on cake and…work—as I did last year—here are posts from the previous year that I thought should have received a bit more love than they did:

Categories: Network Analysis News

Ten Reasons Why Grad Students Should Blog

Wed, 09/06/2010 - 05:28

Tomorrow is the two-year anniversary of ZIA. In keeping with the tradition started last year there are some changes afoot for the website itself, but I will keep those under wraps until the actual birthday (wouldn’t want to open your gift early, yes?). Rather, today I would like to be more reflective. A few days ago, as the two-year anniversary approached, I began thinking back on not just what I accomplished this year at ZIA but also how much the blog has provided me. Upon this reflection it occurred to me that this endeavor has been incredibly beneficial. As such, it seemed logical to me that this would also be the case for many other grad students, which immediately triggered the question: then why do so few do it?

There are a few notable exceptions, but for the most part it is the faculty that partake in blogging. Perhaps this is simply a function of my particular discipline, in which I—admittedly—do most of my blog reading. I have, however, been to many corners of the blogosphere, and at least within my N=1 sample this appears to be a common phenomenon. I welcome others to show me that this is not the case in other disciplines, but even so, more grad students should be publishing online.

As I thought longer about the vacant state of grad student blogging I wondered if it could be explained as a “they don’t know what they don’t know” situation. Perhaps by standing from the outside looking in, my fellow grad students simply do not know all of the benefits that can come from participating in an online discourse. To remedy this informational problem, and in an attempt to encourage more grad students to begin blogging, I present (in no particular order) my ten reasons why grad students should blog:

  1. You actually have something to say – This is perhaps the best reason why you should be blogging. One of the most frustrating characteristics of the blogosphere is its inherent infinitesimal signal-to-noise ratio. As a grad student, especially those in PhD programs, you have already been deemed qualified to participate in the discussion at a very high level by a panel of distinguished scholars, i.e., the admission committee. Why keep all that smart analysis to yourself?
  2. Honing your craft – At its core, graduate school is preparing you to be an active member of the academy. While we may struggle through our preliminary methods classes as we build our technical expertise, it is the application of these tools to interesting research questions that builds successful careers. A blog provides a wonderful lab for experimentation, both in terms of the technical application of methods and toying with research questions in sub-fields of your discipline you may not have otherwise tested.
  3. Establishing an identity – If you are in graduate school to be the “best kept secret in academia,” you are making a fatal mistake. As with any other job market, getting the proverbial foot in the door for a job talk at a university is a critical first step. As a graduate student it can be incredibly difficult to navigate the sea of senior faculty, their research agendas, and how that fits into your career goals. Having a blog provides you an independent beacon upon which you can broadcast your own ideas. Consider this: ZIA is but a tiny blip within the academic blogosphere, but in the last year my CV has been downloaded by over 875 unique visitors, or more than twice a day.
  4. Extending your network outside of academia – Though it is often hard to imagine this from within the cozy confines of the ivory tower, there are a lot of brilliant people outside of academia interested in exactly the same things you are. The difficulty, however, is connecting with them. The Internet is a powerful networking device, and if you are willing to put yourself out there these people will seek you out (Kevin Costner knows what I am talking about). Your bona fides are already largely taken care of (see reason #1), now you have to impress the Internet with your brilliant musings.
  5. The faculty in your department will not think less of you – I have been asked several times by fellow grad students some form of the following question: “Weren’t you worried what your advisors would think about your blog?” Of course, I never even thought about this question, as I started blogging before actually matriculating to NYU (note that ZIA’s anniversary is early-June and most universities begin the Fall semester in late August). This, however, is beside the point. No, I was never worried about what my advisors would think. The things I write about on ZIA are exactly the same kinds of things I say in seminar and write for term papers (in fact, these ideas often flow both ways). Furthermore, most of those faculty who might actually view blogging in a negative light are also those most unlikely to ever read your blog.
  6. Instant and broad criticism of your work – Part of the maturation process for any grad student is developing the ability to receive, absorb, and convert criticism. Much of this will come from rote academic traditions contained within the classroom and conferences, but a blog offers an alternative channel for this criticism. Not only will you get criticism from fellow academics, but criticism from non-academics can illuminate aspects of your research that can be improved to allow for broader understanding.
  7. Sharpening your own critical eye – What is the primary thrust of most graduate seminars? Read a series of papers, and spend the next 120 or so minutes tearing them apart. This is meant to help students recognize the difference between good and great work, but also begin to discover where the more fertile patches exist within the landscape of possible research agendas. There are, however, many more papers published in a semester than any one seminar could possibly hope to cover. Also, many seminars are focused on seminal works, not cutting edge research, which is the same cutting edge research you are most likely already reading in your free time. A blog provides you a platform upon which to criticize this new work, and if you are very lucky (as I have been on a few occasions) an opportunity to interact with the authors in a public forum.
  8. Oh, the places you’ll go – A combined effect of reasons #1-5 is you will be given the opportunity to travel all over the world and participate in many conferences, seminars, panels, etc. Without a public voice on the Internet I would have never had the opportunity to present to the Bay Area R User’s Group or the University of Michigan’s Center for the Study of Complex Systems. As you extend your network outside of academia it will take you to places you could never have thought possible without the blog.
  9. Building technical expertise – Not all of the work you put into your blog will go toward writing noteworthy posts. Some of the effort, particularly at the outset, will be focused on building the actual site. This will require you to learn technical skills you would have otherwise never had the need or desire to. This is incredibly useful in and of itself, but these skills can be applied beyond blogging. Considering how charts from a paper will look in the online version of your paper (the version most readers will see) is something you may only have thought of after several iterations of trial and error posting it to your blog.
  10. It is just plain fun – You are a nerd. You enjoy writing. In many ways, a blog sells itself. But the additional joy you will feel as you watch your daily hits go up, and the frequency of (non-SPAM) comments increase, will become a powerful motivating force in your day-to-day. A wonderful side effect of which is that the overall quality of your work will also increase, as you become a better writer, researcher and conveyer of complex ideas.

I realize that this will not motivate everyone to navigate over to WordPress and begin their own blogs, but I hope it has helped you understand some of the benefits of having your own presence on the Web. I welcome your own thoughts, either as a grad student blogger, or as someone unmoved by the above reasons.

Photo: Me, hard at work on another blog post…

Categories: Network Analysis News

Where to Draw the Line on ‘Data Science’?

Fri, 04/06/2010 - 00:33

I completely agree with Tim O’Reilly. Mike Loukides’ post on what is data science is a “seminal, important post.” If it has managed to avoid your gaze over the past twenty-four hours I highly recommend it; if nothing else, it is a 2,000 word massage of the data geek ego and a nifty tool and who’s who reference to boot. In the latest of a recent series of blog posts and magazine/newspaper articles on the rise of the data scientist, Loukides draws broad strokes on this emerging discipline, covering everything from where the data comes from, to how to manage it, and who is doing great work (kudos for getting quotes from so many excellent members of the data community).

While I think it is important to write and discuss the importance of this field, I think it is equally important that we—the data science community—do not fall into a perpetual cycle of self-admiration and navel gazing. That is, when asking the question, “what is data science,” we should also be asking, “what is not data science?” Or, perhaps more appropriately, “What is good data science, and how do I become a good data scientist?” These questions have not been the focus of the discussion thus far, and it is time to start asking them.

Up to this point the discussion of what is data science has been rather inclusive. As Loukides notes:

In the last few years, there has been an explosion in the amount of data that’s available. Whether we’re talking about web server logs, tweet streams, online transaction records, “citizen science,” data from sensors, government data, or some other source, the problem isn’t finding data, it’s figuring out what to do with it.

After reading the Loukides piece in the context of what has already been said, I was struck by what appears to be a gradual blurring between what is science and what is now being promoted as data science. As an example, consider the recent adjustment of the estimated amount of oil spilled into the Gulf. Using the live video feeds of the spill and satellite imagery, FSU oceanographer Ian R. MacDonald performed “rough calculations” to find that the actual amount of oil being spilled may have been four or five times what the government had estimated. Now, were the calculations performed by Dr. MacDonald data science, or just science? His data came from the streaming ethers of the Internet pointed out by Loukides and others, the external spring from which data science flows, but his primary tools were his own eyes and decades of experience.

Before you accuse me of pedantic folly, my purpose with the MacDonald example is to highlight the fact that good data science is exactly the same as good science. The most meaningful analyses will be borne from a thorough understanding of the data’s context, and an acute sense of which questions are the most important to ask. The conversation up to this point, unfortunately, has been far too focused on the data resources themselves and the tools used to approach them. Good data science will never be measured by the terabytes in your Cassandra database, the number of EC2 nodes your job is using, or the volume of mappers you can send through a Hadoop instance. Having a lot of data does not license you to have a lot to say about it.

To that end, I have been disappointed in the lack of mention of how critical the social sciences are to good data science. Loukides quotes LinkedIn’s Chief Scientist DJ Patil in reference to who makes the best data scientist:

…the best data scientists tend to be “hard scientists,” particularly physicists, rather than computer science majors. Physicists have a strong mathematical background, computing skills, and come from a discipline in which survival depends on getting the most from the data.

While I have the utmost respect for physicists like Patil, their discipline is unencumbered by such pesky matters as human free will and fallibility. I happen to know that DJ respects and understands the difference because I have had the great pleasure of discussing this issue with him, but imagine how much more difficult the so-called hard sciences would be if atoms got to decide their own charge? As data science is fundamentally about gleaning information from the data trail of humans, those with perspective on causality in this context are invaluable. While those with large data stores may be interested in running a regression over some set of variables, a good data scientist would first wonder what the underlying process was that generated those observations, what is missing, and how that affects the interpretations of results.

My assessment of the current state of data science is best described as cautious optimism. The tools needed to capture the data deluge (as Chris Anderson puts it) have developed at a truly astonishing rate. And though I think those leading the data science charge are brilliant and preeminently capable of continuing its surge, I fear our intuitions about what the data mean have not kept pace, and it may be sooner rather than later that our analyses suffer for it.

Photo: Boston Globe

Categories: Network Analysis News

Homegrown Terrorism and the Small N Problem

Fri, 28/05/2010 - 06:09

I just finished the new RAND report on homegrown terrorism in the United States, entitled “Would-Be Warriors: Incidents of Jihadist Terrorist Radicalization in the United States Since September 11, 2001,” and it is a fascinating analysis of the paths to radicalization by American citizens over the past near-decade. This paper is clearly extremely timely given the seemingly sudden rise in domestic radicalization toward jihadism. As the report notes, “the 13 cases in 2009 did indicate a marked increase in radicalization leading to criminal activity, up from an average of about four cases a year from 2002 to 2008.” Given this fact, and the more recent Faisal Shahzad case, and the overall increase in attacks against the U.S., the salience of homegrown terrorism is as high as ever.

Previously, I have written skeptically about the notion that domestic terrorism is, in fact, on the rise. This apparent trend may be better described as a regression toward the mean level of this activity over a longer time period. To his credit, the author, Brian Michael Jenkins, assuages any alarmist notions of a sudden and abnormal rise in domestic terrorism by reviewing the extensive history of domestic terrorism incidents that occurred in the United States during the 1960s and 70s.

After reading the RAND report I was not necessarily dissuaded from my position that the current spike is nothing more than a mean regression; however, I was convinced by this report that the stakes have changed considerably since the previous decades and thus this subject deserves considerable attention going forward. What the RAND report suffers from, like many other reports on domestic terrorism, is a small N problem, and in order to more accurately study this phenomenon efforts must be made to overcome these issues.

To be clear, having a small number of observations with respect to domestic terrorism and radicalization is a “good” problem. National security benefits from the fact that these are rare events, and we are thankful that this is the case. That said, because the RAND analysis consists of only 46 observations over an 8 year period any conclusions must be tempered by this fact. For example, when describing who the terrorists are the author states (emphasis mine):

Information on national origin or ethnicity is available for 109 of the identified homegrown terrorists. The Arab and South Asian immigrant communities are statistically overrepresented in this small sample, but the number of recruits is still tiny. There are more than 3 million Muslims in the United States, and few more than 100 have joined jihad—about one out of every 30,000—suggesting an American Muslim population that remains hostile to jihadist ideology and its exhortations to violence.

We know, however, that this final assertion is not supported, specifically with regard to the numbers. The numbers, at best, only support the claim that domestic radicalization is very rarely observed. They do not suggest anything about the internal disposition of American Muslims. While the assertion may actually be true, simply not observing a phenomenon cannot support it. The cliché, “The absence of evidence is not evidence of absence,” is particularly applicable to small N problems. If we are actually interested in understanding the sentiment of American Muslims then traditional survey work would be quite applicable.

Clearly, the primary problem is that because these are rare events we simply do not have enough data to build good statistical models. As such, whenever endeavoring to study this subject, every attempt should be made to retain as much applicable data as possible. In the case of the RAND report this was not done, as the data were thinned to include only those cases that resulted in indictments in the U.S. or abroad. While this is a minimal limitation, the underlying assumption is that the paths and intents for radicalization are somehow different for those who are indicted versus those who are not. This seems dubious at best, and therefore a better approach would be to include all possible observations, and then use a more theoretically unbiased method for data cleansing (such as coarsened exact matching) to isolate those observations of interest. The report’s approach seems to follow a troubling trend in terrorism studies of selection on the dependent variable.
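To make the matching idea a little more concrete, here is a minimal sketch of the core step in coarsened exact matching: bin each covariate, group records by their binned signature, and keep only the groups that contain both indicted and non-indicted cases. Everything in it is an assumption for illustration: the records are made up, the covariates (age, years_in_us) and cutpoints are hypothetical, and a full implementation would also compute stratum weights, which this sketch omits.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Return the index of the bin that value falls into."""
    return sum(value >= c for c in cutpoints)


def cem_match(records, cutpoints, treatment_key="indicted"):
    """Keep only records in strata that contain both treated and untreated cases.

    A stratum is the tuple of coarsened (binned) covariate values.
    """
    strata = defaultdict(list)
    for rec in records:
        signature = tuple(
            coarsen(rec[name], cuts) for name, cuts in sorted(cutpoints.items())
        )
        strata[signature].append(rec)

    matched = []
    for members in strata.values():
        groups = {bool(m[treatment_key]) for m in members}
        if groups == {True, False}:
            matched.extend(members)
    return matched


# Entirely made-up records, for illustration only.
cases = [
    {"id": 1, "age": 19, "years_in_us": 2, "indicted": True},
    {"id": 2, "age": 22, "years_in_us": 3, "indicted": False},
    {"id": 3, "age": 41, "years_in_us": 20, "indicted": True},
    {"id": 4, "age": 35, "years_in_us": 4, "indicted": False},
    {"id": 5, "age": 44, "years_in_us": 25, "indicted": False},
]
# Hypothetical cutpoints: three bins per covariate.
cutpoints = {"age": [25, 40], "years_in_us": [5, 15]}

matched = cem_match(cases, cutpoints)
matched_ids = sorted(m["id"] for m in matched)
```

In this toy data, record 4 is dropped because no indicted case shares its coarsened profile. The point is that the thinning is driven by covariate overlap and not by the outcome itself.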

Photo: Euro-Med

Categories: Network Analysis News

PolNet 2010 and the Cult of ERGM

Tue, 25/05/2010 - 02:25

I returned to NYC on Friday from the Political Networks conference, but have only now had a chance to reflect. Charli Carpenter, of the always excellent Duck of Minerva, has already made many great points about what large conferences could learn from niche conferences through her experience at PolNets (who’s that guy imbibing in that photo, anyway?). I agree with much of what Charli points out, and overall thoroughly enjoyed the conference. I think a combination of low visibility of these methods within the discipline as a whole and high energy among those actually interested in networks resulted in a very top-heavy set of presentations.

A clear advantage to a conference like PolNets is that rather than having a specific substantive focus at its core, like so many smaller conferences, here the focus was on a methodological technology. With that, there is less need during presentations for people to “sell” their method, because everyone in attendance has essentially signaled acceptance by being there. Therefore, more of the discussions are centered on the substantive implications of applying network theory to some research agenda, or specific methodological quibbles. This is all well and good; add to this the fact that a small number of attendees means graduate students and young scholars have a lot of opportunity to discuss their work with more established academics.

While I have studied networks for several years, this was actually my first conference on the subject. I do, however, try to stay rather current on the literature and as such came to the conference with the expectation that the breadth of topics covered would be wide both in terms of application of network methods and political science topics. Perhaps due to my own naivety, or willful ignorance, I was disappointed to find that this was not the case.

On the former point, from what I observed at PolNets it seems that the social science networks community is rapidly forming as a cult of the exponential random graph model (ERGM) framework. In some ways this makes perfect sense. ERGMs are, for lack of a better term, statistical models that describe networks and allow for some degree of inference to be drawn about these structures. This can be extremely useful for social scientists, as it describes networks in familiar statistical terms. What was surprising was the wholesale, and often unquestioning, commitment to these models for all types of analysis within the social sciences. In fact one of the creators of ERGM went so far as to call it the lingua franca of all network models. To be clear, mathematically ERGM can produce all possible networks; however, in practice this is akin to saying that all the works of Shakespeare could be reproduced in Morse code. While technically possible, it would be a fool’s errand. The ERGM framework has significant computational limitations, which was reinforced by the admission of several presenters that they needed weeks to complete model estimations on very moderately sized networks.
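One commonly cited source of that computational burden is the normalizing constant: an ERGM assigns each graph a probability proportional to exp(theta * s(G)), and the constant that makes these sum to one runs over every possible graph on the node set. The sketch below is only an illustration under stated assumptions. It uses the simplest possible model (an edges-only statistic, which is my choice for the example and not a model discussed at the conference), where a closed form happens to exist, so the brute-force sum can be checked against it. For richer statistics no such shortcut is generally available, which is why estimation typically relies on simulation.

```python
from itertools import combinations, product
from math import comb, exp


def n_undirected_graphs(n):
    """Number of distinct simple undirected graphs on n labeled nodes."""
    return 2 ** comb(n, 2)


def brute_force_normalizer(n, theta):
    """Sum exp(theta * edge_count) over every undirected graph on n nodes."""
    dyads = list(combinations(range(n), 2))
    total = 0.0
    # Each dyad is either absent (0) or present (1).
    for present in product((0, 1), repeat=len(dyads)):
        total += exp(theta * sum(present))
    return total


def closed_form_normalizer(n, theta):
    """Closed form for the edges-only model: (1 + e^theta) ^ (n choose 2)."""
    return (1.0 + exp(theta)) ** comb(n, 2)


theta = -0.5  # arbitrary illustrative parameter value
z_brute = brute_force_normalizer(5, theta)   # enumerates 1,024 graphs
z_exact = closed_form_normalizer(5, theta)
graphs_on_20_nodes = n_undirected_graphs(20)  # 2 ** 190
```

Five nodes already means 1,024 graphs to enumerate, and twenty nodes means 2^190, so direct enumeration is out of reach even for networks that would be considered tiny in most applications.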

While there were a few notable exceptions (best exemplified by the presenters on the Innovations in Network Measurement panel), I would have liked to see more research not just extending the ERGM framework, but also stepping outside of it to build models to describe the massively complex networks that have become commonplace in disciplines outside of the social sciences. My fear is that networks in the social sciences will become a “one trick pony,” and a pony that itself is incredibly hampered by current technology.

With respect to the breadth of application in political science I was impressed by the diversity of topics covered by the panels. I was disappointed, however, by the actual representation of political scientists at the conference. While I am fully aware that the study of networks is highly interdisciplinary, and that political science as a discipline is a very late adopter of this technology, it would have been encouraging to see more APSA card-carrying political scientists among the attendees. For example, on the second day of the conference a “panel of experts” convened to field questions from anyone who cared to pose one. The problem: there was not a political scientist among the experts, making it hard to ask pointed questions about networks in political science.

As I said, though, overall the conference was excellent, and I extend my thanks and congratulations to Mike Ward of Duke University for putting on such a great event. Next stop: Sunbelt 2010!

Categories: Network Analysis News

Items in my Egonet – Political Networks Edition

Tue, 18/05/2010 - 22:45

Today I am heading to Duke University for the week to attend the 2010 Political Networks Conference. As such, posting will be light this week, but I will be live tweeting the conference under the #PolNets2010 tag and will have a review post up sometime over the weekend (hopefully).

It has been a (really) long time since I posted an Items in my Egonet set of links, and given my upcoming trip I thought it appropriate to have a political networks theme—enjoy!

p.s., if you missed it, here is my preview of the terrorism panel at the conference from a few days ago.

Categories: Network Analysis News

Thoughts on Measuring Online Social Influence

Tue, 18/05/2010 - 08:05

Over the past few weeks I have had several conversations with people interested in understanding the dynamics of influence in online discourse. Clearly, there is a social network aspect to this, as these platforms provide the medium for these exchanges to take place and in most cases users are only subject to information existing on their network (the notable exception being Twitter, though most users still only pull information from those they are following). The primary question is: how does online social activity manifest itself in offline behavior? For example, to what extent do social networking platforms influence voting behavior; or, how do reviews of recently released movies posted to Twitter affect an individual’s likelihood to see it in the theater; or, are online discourses a meaningful path to violent radicalization?

From an analytical perspective, the difficulty is that there are no reliable ways of measuring the process by which this influence occurs. Intuitively, we know that influence is happening online, but this process is largely hidden within the context of online exchanges. As we often represent online social interactions as networks, and because much of the relevant data will have a network form, it may be useful to begin by framing this problem in terms of a graph.

In these terms, there are at least two ways one might approach this problem. First, to measure influence we might attempt to identify influential individuals, and subsequently measure their activity. An important assumption here is that people are influenced in some relatively uniform way as a function of receiving information from those they “trust”. Over time, and assuming a constant rate of influence, as individuals self-organize to these influencers we could infer individual level of influence. A second approach is to attempt to measure signals related to the digestion of information. That is, rather than assume influence comes from key actors, do the reverse and assume influence comes from pivotal pieces of information. In this case, these signals might come in the form of first-, second-order, etc., transmission of these key bits of information from their source, or the infusion of some bit of information into a network from multiple sources. As with the influential actors approach, by observing these signals over time we could approximate changes in preference and thus infer influence.
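As a rough illustration of what a first-order, second-order, etc. transmission signal could look like in data, the sketch below labels each user reached by a piece of information with the number of hops separating them from its source, using a breadth-first search. The share records and user names are invented for the example, and treating a share as a directed (sender, receiver) pair is an assumption, not a description of any particular platform’s data.

```python
from collections import deque


def transmission_order(shares, source):
    """Return the hop count from source for every user the item reached.

    shares is a list of (sender, receiver) pairs for one piece of information.
    """
    receivers = {}
    for sender, receiver in shares:
        receivers.setdefault(sender, []).append(receiver)

    order = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        for nxt in receivers.get(user, []):
            if nxt not in order:  # keep the shortest path only
                order[nxt] = order[user] + 1
                queue.append(nxt)
    return order


# Invented share records for a single item originating at "src".
shares = [("src", "a"), ("src", "b"), ("a", "c"), ("c", "d"), ("b", "c")]
order = transmission_order(shares, "src")
```

Here a and b are first-order recipients, c is second-order (reached through either a or b), and d is third-order. Tracking how these counts change over time is one way the signal described above might be operationalized.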

In the context of these competing approaches this problem becomes a philosophical one, and exemplifies the fundamental differences in node versus edge analyses in networks. By assuming individuals drive influence we are taking a node-centric approach, wherein actors have some valuation for information received online, and are therefore attracted to those individuals that maximize this utility. The edge-centric approach assumes that content is valued over source, and that the information contained on some edge is the primary engine to influence. It has always been my contention that too much time is spent focused on nodes in network analysis. In fact, the problem of measuring influence in online social networks is an excellent example of the value of edge-centric analysis.

As stated, this is essentially a measurement problem: we need a way to quantify information digestion, but lack an appropriate metric. Consider a social network with some fixed number of nodes. By focusing on the characteristics of the nodes we are inherently limiting our analytical scope. While the most “central” actors may change over time, we can never achieve a meaningful measure of influence by simply examining the structural characteristics of these nodes. Influence can only occur as a function of edges; therefore, they must be the primary unit of analysis in this endeavor. Perhaps this is why I have always been a big fan of the line graph transformation.
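For readers unfamiliar with it, the line graph transformation turns every edge of the original network into a node, and links two of these new nodes whenever the original edges shared an endpoint. A minimal sketch for a simple undirected graph follows; the three-edge example network and its names are made up.

```python
from itertools import combinations


def line_graph(edges):
    """Build the line graph of a simple undirected graph.

    Each original edge becomes a node; two such nodes are linked
    when the original edges share an endpoint.
    """
    nodes = [tuple(sorted(e)) for e in edges]
    links = set()
    for a, b in combinations(nodes, 2):
        if set(a) & set(b):
            links.add((a, b))
    return nodes, links


# Made-up path network: ann - bob - cat - dan
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan")]
lg_nodes, lg_links = line_graph(edges)
```

After the transformation, any node-level measure (centrality, clustering, and so on) computed on the line graph is effectively a measure on the relationships of the original network, which is what makes it attractive for edge-centric analysis.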

Photo: Mr. Irrelevant

Categories: Network Analysis News

Mining and Analyzing Online Social Graph Data [Updated with Video]

Fri, 14/05/2010 - 05:57

First, I apologize for the lack of posting this week. End-of-semester madness has kept me very busy, and the summer conference circuit is heating up so I have been busy generating material. In addition, on Thursday, May 13th I gave a talk to the NYC Predictive Analytics group about mining and analyzing online social graph data. A bit is borrowed from the SNA in R talk, but it is mostly new material, and as such the slides may be of interest.

I believe the talk is also being video taped, so when it becomes available online I will update with a link.

The video and slides have been posted, and both are embedded here after the jump.

Also, if you would like to participate in tonight’s live demonstration, simply post a tweet with the hash-tag #analyticsnyc in it and your Twitter network will be scraped!

Categories: Network Analysis News

New Terrorism Issue of JCR; Thoughts on ‘Duration and Sustainability’

Thu, 06/05/2010 - 05:42

The newest issue of the Journal of Conflict Resolution is out, and it has a special focus on the application of analytical tools to terrorism and counter-terrorism policy. The issue contains articles from some of the world’s top terrorism scholars, including Walter Enders, Patrick Brandt, Rozlyn Engel and Todd Sandler to name a few. As Sandler points out in the introduction:

The hallmark of post-9/11 articles on terrorism is to enlighten policy makers during a time when fear of the potential destructiveness of future terrorist attacks looms large in the mind of the public. Recessionary concerns and deficit spending underscore the importance of allocating counterterrorism resources efficiently and frugally. The primary purpose of this special issue is to investigate unexplored aspects of terrorism with advanced analytical methods with the intent of drawing a host of policy recommendations. These novel issues involve changes in terrorist targeting over time, the use of foreign aid in fighting terrorism, the influences on the structure of terrorist organizations, the costs of harboring terrorists, and the role of backlash in fostering large-scale terrorist attacks.

As such, the issue will be of particular interest to ZIA readers. It also includes a paper by Enders and Jindapon, “Network Externalities and the Structure of Terror Network,” which I critiqued several months ago when it was still a working paper. While all of the papers are worth reading, I will focus my comments on a paper entitled, “On the Duration and Sustainability of Transnational Terrorist Organizations,” by a duo of scholars from West Point and an economist from Claremont McKenna College.

Here the authors set out to investigate what factors contribute to the longevity of a terrorist organization, and thus allow it to remain unified and effective over time. Using the ITERATE data on terrorist events, the researchers code each event as it relates to an organization, country of event, and successful attack. Along with the ITERATE data, several socio-economic variables are added to country-year pairs as controls for the “environment in which the terrorist organization is staging its attack.” As the authors describe:

More precisely, our model is estimating the probability that a terrorist organization that perpetrated a successful attack in period 1 survives to perpetrate an attack in a later period. Our approach means that a terrorist organization whose members still live and affiliate with the group but that does not attack has effectively “died out” from the ITERATE data. Duration dependence in this context specifically means the probability of staging a subsequent attack, given that the organization has previously staged at least one attack.

The first series of analyses summarizes this data, which indicates that the vast majority of terrorist groups perpetrate only one attack, and that the number of organizations has trended down over the past forty years. On the first point, we should expect this, as terrorist events are clearly rare; therefore, the distribution of organizational level of activity should follow something roughly akin to a Pareto. The second point is more interesting, but unfortunately a more considered explanation of the data is not provided. The data shows a sharp drop-off starting in the early 1990s and continuing through the 2000s, with some peaks and valleys. In my view this downward trend could be explained by a number of factors. Perhaps counterterrorism policies have become more effective over these two decades, which raised the perceived cost of entering the market. Likewise, it may be that terrorist organizations became better organized, through some competitive mechanism, which in turn shrank the market of organizations. Still, it may be that the actual coding of groups has been conflated, as it became more difficult to separate a group like al-Qaeda from its affiliates. I would have liked to see a more thorough discussion of this in the paper.

The authors then move to their duration model to test the factors contributing to the longevity of transnational terrorist organizations. They propose four models, one that controls for regional fixed effects, a second that controls for income, a third that accounts for group violence (number of attacks, lethality, etc.), and a fourth that combines them all. The findings are very much in line with what we might expect intuitively. Using a binary variable for democracy, which is coded down from Polity IV rather than taken from the Przeworski, and a series of regional indicators, the authors find that organizations operating in democracies and Middle Eastern/African countries last longer. As terrorism is a politically coercive tool, we would expect that groups use it more often in democracies, which are more vulnerable to attacks; and hence those groups would have longer success. The regional effect is likely a function of the number of groups operating in these areas, and the number of attacks perpetrated, as compared to other parts of the world.

The income variables are also significant and positive, which again is likely a function of the reality that terrorist groups prefer to target high-population areas with relatively high mean income, as these are the places where coercion may be most likely. Finally, the violence indicators are also all positive and significant. These may be the easiest to explain, as groups that are effective at perpetrating attacks will continue to get better in the future. In other words, the best predictor of a successful and long-lived terrorist group in the future may be its current level of competence.

While the findings of this paper did not add much understanding to the processes by which terrorist groups survive, and the title was a bit deceiving as very little consideration was given to actual group organization, the piece stands alone as one of the most thorough treatments of this topic in the academic literature.

Photo: Journal of Conflict Resolution

Categories: Network Analysis News

Preview of ‘Terrorism and Insurgency’ Panel at 2010 Political Networks Conference

Wed, 05/05/2010 - 01:24

In two weeks is the 2010 Political Networks Conference at Duke University. I will be presenting my work on graph motif models on the ‘Random Graph Models’ panel on the second day of the conference with an impressive set of co-panelists. Then, I am chairing the panel on ‘Innovations in Network Measurement,’ which is exciting and a bit intimidating, given the seniority of some of the panelists. Both of these panels are accurately focused on my methodological interest, but not on my substantive interest.

Fortunately, there is a panel on ‘Terrorism and Insurgency,’ which is being chaired by the very brilliant Dominick Wright of the University of Michigan (who I finally met in person at MPSA). The terrorism panel includes the following papers:

  • Modeling Terrorist Networks Using Bootstrapping
    Robert Duval, West Virginia University
  • Insurgent Network Structure and Rhetoric
    Michael Gabbay, University of Washington
    Ashley Thirkill-Mackelprang, University of Washington
  • Measuring Online Behaviour: A Role for Network Analysis?
    Lisa McInerney, Dublin City University

I had hoped to review these papers before the conference, but I was not able to find any of them online. I was, however, able to find what appears to be related work by McInerney with her co-author Maura Conway on “Jihadi Video & Auto-Radicalisation: Evidence from an Exploratory YouTube Study.” As this appears to be preliminary work related to McInerney’s presentation at Political Networks I look forward to learning about how it has progressed, but in the mean time I offer some thoughts.

The pathways to terrorist radicalization emanating from online discourses are an extremely important and topical area of research. Friend and fellow blogger Tim Stevens of King’s College London has written extensively in this area, and the expert team of analysts at jihadica.com has been doing stalwart work in this area for several years. Also, friends and fellow network bloggers Michael Bommarito and Dan Katz of University of Michigan have examined the networks generated by the online discussions of Christmas Day Bomber, Umar Farouk Abdulmutallab. As such, this work presented by Conway and McInerney fits well into this burgeoning research niche examining the dynamics of online radicalization networks.

Here, the authors look specifically at the networks and exchanges formed through jihadist content posted to YouTube.com. The study thus focuses on the application of network analysis to YouTube.com data, which in this case is limited to those videos related to the conflict in Iraq. From a group of 240 videos in this category, 50 were randomly selected for analysis. In terms of the data collection, the authors may have a problem with selecting on the dependent variable, as it is likely the individuals they are most interested in studying (those moving toward radicalization) would not limit their search as such. Also, of the 50 videos surveyed 30 unique authors were identified. On the surface this may be a positive for the authors, but it is a bit confusing given our expectation that in online content creation most posts are generated by few authors.

As the authors state, the application of network analysis techniques in this version of the paper is very limited. The difficulty in this analysis is defining the context of the relationships. As is shown, these relationships account for vast geographic distances, and are by construction two-mode through some video. It is difficult to imply relationships from co-posting to an online forum. That said, this is an interesting start, and I look forward to learning more.

Generally, however, I am not convinced of the claim of “auto-radicalization” through online discourses. From the evidence I have seen, the influence of online forums is to motivate individuals to commit to traveling to a place for in-person radicalization, as we have seen in Abdulmutallab, the NoVa Five, and many others. Online activity may be an interesting focal point for data collection, but it will be extremely difficult to develop reliable metrics for predicting who among the users will decide to move toward more critical aspects of radicalization.

Photo: Newsweek

Categories: Network Analysis News

The Value of Edges in Complex Network Visualization

Thu, 29/04/2010 - 01:34

Given the convergence of national security and data nerds that come to this blog, I am sure that by now most of you have read the article in yesterday’s New York Times on how PowerPoint is the silent killer of military intelligence. The catalyst of this discussion appears to have been this now infamous slide on the Afghan Stability / COIN Dynamics produced by PA Consulting Group.

For most of you this is old news, as this slide has been circulating the Internet for several months. As such, this post is not about the slide, or the notion that slide decks are detrimental to the intelligence process more generally. Others have said their piece (most of whom have little to no knowledge of the intelligence process); therefore, I will only say that fundamentally intelligence is about distilling extremely complicated things into neat digestible pieces for leadership to evaluate and make decisions. If you think “bullet-point” level detail is bad for intelligence then your problem is with the demand side of the equation, not the supply. But I digress…

In reviewing the reignited interest in this slide I came across an old post by Andrew Gelman wherein he critiques only the visual aspects of the network chart. There was one line that stood out to me:

I understand the goals of showing the connections between the nodes, but as it is, the graph is dominated by the tangle of lines.

Indeed, and this moved me to think about the value of drawing edges in complex networks writ large. In my experience, except for the sparsest of network data, edges add very little information to the visualization. In fact, edges often detract from the analytical value of a network plot by creating a confusing weave of lines that are impossible to follow or understand. I propose that the value of drawing edges is actually an asymptotic function of the density of the network data in question. I even made a picture.
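For anyone who wants to experiment with this proposal, the sketch below computes the density of a simple undirected network and passes it through a decaying curve. The density formula is standard; the curve itself (an exponential decay with a rate k of 20) is purely a hypothetical stand-in chosen for illustration, not the function behind the picture, and the function names are mine.

```python
from math import exp


def density(n_nodes, n_edges):
    """Share of possible undirected ties that are present: 2m / (n(n-1))."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def edge_drawing_value(d, k=20.0):
    """Hypothetical value of drawing edges at density d.

    Assumption for illustration only: value starts at 1 for an empty
    network and decays toward 0 as density rises. The rate k is arbitrary.
    """
    return exp(-k * d)


sparse = density(n_nodes=100, n_edges=120)    # a sparse network
dense = density(n_nodes=100, n_edges=2000)    # a much denser one
sparse_value = edge_drawing_value(sparse)
dense_value = edge_drawing_value(dense)
```

Under these assumptions the sparse network retains most of the value of its drawn edges, while the denser one retains almost none. Any monotonically decreasing curve that flattens toward zero would make the same qualitative point.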

This is not to say that edge data should not be used in a visualization; in fact, quite the contrary. Edges are needed to calculate the placement of nodes in many of the most informative visualization algorithms. For example, techniques such as Fruchterman-Reingold and Kamada-Kawai attempt to minimize the distance between nodes with related structure and prevent nodes from being drawn on top of one another. As such, the placement of nodes in two-dimensional space is meaningful (structurally similar nodes will be closer), but once the positions of the nodes have been calculated the value of the edges has largely been used up. Consider the recently generated visualization of the relationships among artists in the last.fm database.

The author (Tamas Nepusz, co-creator of igraph) has created something truly stunning, both in terms of aesthetics and information. Each node is colored by genre, and using a force-directed layout we can see that there are strong relationships among rock (red), pop (green) and hip-hop (blue). As we look toward the center, however, potentially interesting aspects of the visualization are lost within the maelstrom of edges, to the point where it is nearly impossible to recognize what is happening. Now, consider the alternate “cloud” version of this network.

Personally, I do not like the blurring of nodes, and the loss of labels; however, by removing the edges and allowing the nodes to stand alone the relationships among various music genres and artists are much more apparent. For example, it is much easier to see small clusters at the center and periphery. Being able to see these makes an observer want to investigate those clusters further, and see what artists they represent. In addition, edges can present a deceptive illustration of the strength of ties between clusters. Note the magenta (reggae and ska) cluster in the lower-right of the network. With the edges, it appears that this cluster has strong ties within the network (note the edges pulling it in two directions). Without the edges, however, we can see that this cluster is actually much more peripheral relative to the density of ties among the other genre clusters.

A while back I proposed the idea of using invisible edges to identify clusters of nodes in three-dimensions with the so-called “exploded network view,” which is really simply an extension of the idea that edges have steeply diminishing value in network visualization. Going forward I will be drawing edges much more sparingly, and I highly recommend that analysts also consider the value of drawing edges when attempting to present network analysis visually.

Categories: Network Analysis News

Copyright © 2004 -2010 Knowledge Matters™ - all rights reserved

The Webpages of Durant-Law Consulting Pty Limited
and Occasional Blog of Graham Durant-Law

E-mail: graham@durantlaw.info
