Social Networks as a Foundation for Computer Science Jeffrey Forbes http://www.cs.duke.edu/csed/socialnet Social Networks, CAARMS 2006 1 Where are we going Questions What should our concerns be for those choosing to major in Computer Science? courses, research, jobs,
Should we be concerned by the precipitous decline in those taking our courses or majoring or ? majors, technical students, nontechnical What can we do to ensure the ongoing success of our academic discipline? Look inward, look to others Social Networks, CAARMS 2006 2 Acknowledgements
Social Networks/Broadening Participation group: Owen Astrachan developed most of this talk Casey Alt, Richard Lucic, Susan Rodger Students: Ben Spain & Dametrious Peyton Drawn from the work of: Michael Kearns, UPenn Eytan Adar, formerly of HP Labs John Breese, David Heckerman, Microsoft Research Jonathan Herlocker, Oregon State University Thomas Hoffman, Brown University Marti Hearst, UC Berkeley Jennifer Golbeck, University of Maryland Social Networks, CAARMS 2006 3
Social Networks, CAARMS 2006 4 Computer Science What is the foundation of computer science? Historically, now, in the future What changes are here, on the horizon? From theory to practice to education
Can we relate to what and how students learn? Is every generation different, the same, Social Networks, CAARMS 2006 5 WWDD? Social Networks, CAARMS 2006 6 Questions and Answers
Judge a man by his questions rather than by his answers Voltaire Social Networks, CAARMS 2006 7 History and Computer Science Those who cannot remember the past are condemned to repeat it.
Dont know much about history, dont know much about biology, dont know much about a science book Social Networks, CAARMS 2006 8 Who, when? No stretching is required to envision computer consoles installed in every home. Everyone will have better access to the Library of Congress than the librarian himself now has. Full reports on current events, whether baseball scores, the smog index in Los Angeles or the minutes of the 178th
Korean Truce Commission will be John, McCarthy, Information, Chapter available for the asking. 1, 1966 Social Networks, CAARMS 2006 9 You win some, you lose some People will soon become discontented with the canned programs available; they will want to write their own. The ability to write a computer program will be as widespread as the ability to drive a car. Not knowing how to program will be like
living in a house full of servants and not speaking their language. Many people can write simple programs after an hour or two of instruction. Programming is far easier to learn than a foreign language or algebra. Social Networks, CAARMS 2006 10 kentlew.com Then and Now Bailey, SIGCSE 1972 It is remarkable that the majority of students can indeed handle fairly complex (Fortran) I/O by the end of the
first six lessons, even though they have not actually been formally taught how to do it. Roberts et al, SIGCSE 2006 The problem most often cited by those attempting to teach Java to novices is the lack of a simple input mechanism, Social Networks, CAARMS 2006 11 What has changed in 20 years? Machines Characteristics and Availability
Internet Availability, IM, web, Google, Students Comfort with technology, Expectations Social Networks, CAARMS 2006 12 The more things change? Assume I took your first course(s) in 1984 and understood the
concepts so completely that I could still get a 100 on the final from 1984 if I took it today (e.g., I've been in a cryogenic chamber). How would I do on the 2004 final exam? Social Networks, CAARMS 2006 13 What has changed in Physics? "You'd get a 100 plus or minus sigma. Intro classical physics hasn't really changed that much over the last 100
years. In graduate level e.g. E&M or quantum classes I think ditto, although sigma would be bigger (and might depend more on the instructor variation than on any real variation in the material). The main difference is, I think, that your chances of GETTING 100 now would be much higher." Rob Brown, Head of Physics Instruction Social Networks, CAARMS 2006 14 What has changed in Biology? "The basic principles and concepts of biology haven't changed much in 20 years. What has
changed relates to specific content, and in this arena the changes have been enormous. 20 years ago, we barely knew how to sequence DNA; today information of this kind has had a major impact on just about every topic in the biological sciences. Thus, some questions on an exam today would address topics that would be completely unfamiliar to a 1984 timetraveler. " Greg Wray, Director of Undergraduate Studies, Biology Social Networks, CAARMS 2006 15 What has changed in Economics? " we now cover material that was only introduced in an advanced or
intermediate course in 1984. In 1984 we spent the bulk of the time dealing with the Keynesian model and virtually no dialogue about supply side policies. Now the Keynesian stuff is a small subset of a much broader exposure to Aggregate demand and supply Also there is more international coverage now - as opposed to 20 years ago for obvious reasons." Lori Leachman, Director of Undergraduate Studies, Economics Social Networks, CAARMS 2006 16 What has changed in Calculus?
We have two varieties of calculus courses, the lab courses and the traditional ... The latter two have not changed significantly in decades, and I think that a student who fared well on the 1984 exam in those courses would do well today, and vice versa. [In the lab courses] You would ace about half the exam. The other half would be unfamiliar to you. For example, you would probably not know how to answer a problem on modeling a set of data, creating an approximation using Euler's method, interpreting derivatives in the context of applications in other fields, or giving explanations of ideas Lewis Blake, Supervisor of First-year Instruction Social Networks, CAARMS 2006
17 Changes in Computer Science? Social Networks, CAARMS 2006 18 Changing CS? Rock, Hard place If Computer Science has changed drastically is it to keep up with fads and stylistic changes or because of fundamental changes in the discipline?
Are we leveraging the technological and intellectual resources at our disposal If we havent changed, is it because of a solid bedrock of principles that endures? Or because were lazy, goodfor-nothing, Social Networks, CAARMS 2006 19 What is CS? Who wants to study it? Why do they want to? Social Networks, CAARMS
2006 20 Social Networks, CAARMS 2006 21 NY Times in 1984 Social Networks, CAARMS 2006 22 What is CS? Why study it?
Do we have Physics (Math, ) Envy? It's hard for voice over Internet Protocol or e-commerce to compete with finding the age of the universe, Peter Lee, CMU Does familiarity breed contempt? What was different in 1984 than today? Social Networks, CAARMS 2006
23 Social Networks, CAARMS 2006 24 Occupational Distribution of Projected S&E Job Openings 2002-2012 Engineers Information Technology Life Scientists Physical Scientists Natural
Science Managers John Sargent, US Department of Commerce, 2004 2006 25 Social Networks, CAARMS Annual Degrees and Job Openings in Broad S&E Fields PhD Master's Bachelor's Projected Job Openings 160,000 140,000 120,000
100,000 80,000 60,000 40,000 20,000 Engineering Physical Sciences Computer Sciences Biological / Agricultural Sciences John Sargent, US Department of Commerce, 2004 Social Networks, CAARMS 2006
26 Demographics: 18 - 24 year olds White Asian/ Pacific Islande r Hispani c African American Native American 2000
66% 4% 15% 14% 1% 2010 63% 5% 17% 14%
1% 2025 55% 7% 22% 15% 1% Social Networks, CAARMS 2006 US Census Bureau
27 Bachelors Degrees from Doctoral Institutions Social Networks, CAARMS 2006 28 How are Black students doing? Bachelor's Degrees Earned 5,000 Math 4,000
CS 3,000 Math - Male 2,000 CS - Male Math - Female 1,000 Degrees Award 0 CS - Female 1992 1994 1996 1998 2001 Year
Social Networks, CAARMS 2006 29 How are Black students doing? (2) Master's Degrees Earned 600 Math 500 CS 400 Math - Male
300 CS - Male 200 Degrees 100 Math - Female CS - Female 0 1992 1993 1994 1995 1996 1997 1998 2000
2001 Year Social Networks, CAARMS 2006 30 How are black students doing? (3) PhD Degrees Earned 20 Math 15 CS Math - Male
10 CS - Male Degrees 5 Math - Female CS - Female 0 1992 1993 1994 1995 1996 1997 1998 2000 2001
Year Social Networks, CAARMS 2006 31 Awarded Degrees in CS B.A./ B.S. M.S. Ph.D. Female 28%
34% 23% White 57% 53% 76% African American 10% 8% 3%
5% 5% 2% .6% .7% .2% Hispanic Native American / Alaskan Native Persons with Disabilities 1%
Women, Minorities, and with Disabilities in S&E, 2004 (200132data) SocialPersons Networks, CAARMS 2006 Tough stats Between 1970 and 2001, CS/CE doctorates 8,913 to whites 154 to blacks 8,915 tenure-track faculty at 177 departments 497 Women
63 Hispanic* 32 African American* 6 Native Americans* 2003 Taulbee Data Social Networks, CAARMS 2006 33 More local
Social Networks, CAARMS 2006 34 Social Networks, CAARMS 2006 35 Social Networks, CAARMS 2006 36 COHFE
Amherst College, Barnard College, Brown University, Bryn Mawr College, Carleton College, Columbia University, Cornell University, Dartmouth College, Duke University, Georgetown University, Harvard University, Johns Hopkins University, Massachusetts Institute of Technology, Mount Holyoke College, Northwestern University, Oberlin College, Pomona College, Princeton University, Rice University, Smith College, Stanford University, Swarthmore College, Trinity College, University of Chicago, University of Pennsylvania, University of Rochester, Washington University in St. Louis, Wellesley College, Wesleyan University, Williams College, Yale University Social Networks, CAARMS 2006 37
Social Networks, CAARMS 2006 38 If you dont take a course in CS, you wont major in it. Social Networks, CAARMS 2006 39 Who's going to College? Social Networks, CAARMS
2006 40 Who's going to College? Social Networks, CAARMS 2006 41 Who's going to College? Social Networks, CAARMS 2006 42
How do we get Students into the CompSci Tent? Social Networks, CAARMS 2006 43 Interdisciplinary minors At Duke it is difficult to double major in sciences Too many requirements, 17 courses in biology Students are interested in credentials No business major/minor, certificate program
(requires intro, capstone, six courses) Minor requires five courses, double counting ok Three courses in CS, two in econ or biology From gene to social networks, data mining, Social Networks, CAARMS 2006 44 A Future for Computer Science? Social Networks, CAARMS 2006
45 Is there a Science of Networks? From Erdos numbers to random graphs to Internet From FOAF to Selfish Routing: apparent similarities between many human and technological systems & organization Modeling, simulation, and hypotheses Compelling concepts Metaphor of viral spread Properties of connectivity has qualitative and quantitative effects Computer Science?
From the facebook to tomogravity How do we model networks, measure them, and reason about them? What mathematics is necessary? Will the real-world intrude? Social Networks, CAARMS 2006 46 Physical Networks The Internet Vertices: Routers Edges: Physical connections
Another layer of abstraction Vertices: Autonomous systems Edges: peering agreements Both a physical and business network Other examples US Power Grid Interdependence and August 2003 blackout Social Networks, CAARMS 2006 47
What does the Internet look like? Social Networks, CAARMS 2006 48 Social Networks, CAARMS 2006 49 US Power Grid Social Networks, CAARMS 2006
50 Business & Economic Networks Example: eBay bidding vertices: eBay users links: represent bidder-seller or buyerseller fraud detection: bidding rings Example: corporate boards vertices: corporations links: between companies that share a board member Example: corporate partnerships
vertices: corporations links: represent formal joint ventures Example: goods exchange networks vertices: buyers and sellers of commodities links: represent permissible transactions Social Networks, CAARMS 2006 51 Content Networks Example: Document similarity Vertices: documents on web Edges: Weights defined by similarity See TouchGraph GoogleBrowser
Conceptual network: thesaurus Vertices: words Edges: synonym relationships Social Networks, CAARMS 2006 52 Enron Social Networks, CAARMS 2006 53 Social networks
Example: Acquaintanceship networks vertices: people in the world links: have met in person and know last names hard to measure Example: scientific collaboration vertices: math and computer science researchers links: between coauthors on a published paper Erdos numbers : distance to Paul Erdos Erdos was definitely a hub or connector; had 507 coauthors How do we navigate in such networks? Social Networks, CAARMS
2006 54 Social Networks, CAARMS 2006 55 Acquaintanceship & more Social Networks, CAARMS 2006 56 Network Models (Barabasi)
Differences between Internet, Kazaa, Chord Building, modeling, predicting Static networks, Dynamic networks Modeling and simulation Random and Scale-free Implications? Structure and Evolution Modeling via Touchgraph Social Networks, CAARMS
2006 57 Web-based social networks http://trust.mindswap.org Myspace Passion.com Friendster Black Planet Facebook
73,000,000 23,000,000 21,000,000 17,000,000 8,000,000 Whos using these, what are they doing, how often are they doing it, why are they doing it? Social Networks, CAARMS 2006 58 Golbecks Criteria
Accessible over the web via a browser Users explicitly state relationships Not mined or inferred Relationships visible and browsable by others Reasons? Support for users to make connections Simple HTML pages dont suffice Social Networks, CAARMS 2006
59 CSE 112, Networked Life (UPenn) Find the person in Facebook with the most friends Document your process Find the person with the fewest friends What does this mean? Search for profiles with some phrase that yields 30-100 matches Graph degrees/friends, what is
distribution? Social Networks, CAARMS 2006 60 CompSci 1: Overview CS0 Audioscrobbler and last.fm Collaborative filtering What is a neighbor? What is the network? Social Networks, CAARMS 2006 61
What can we do with real data? How do we find a graphs diameter? This is the maximal shortest path between any pair of vertices Can we do this in big graphs? What is the center of a graph? From rumor mills to terrorists How is this related to diameter? Demo GUESS (as augmented at Duke) IM data, Audioscrobbler data Social Networks, CAARMS
2006 62 My recommendations at Amazon Social Networks, CAARMS 2006 63 And again Social Networks, CAARMS 2006 64
Collaborative Filtering Goal: predict the utility of an item to a particular user based on a database of user profiles User profiles contain user preference information Preference may be explicit or implicit Explicit means that a user votes explicitly on some scale Implicit means that the system interprets user behavior or selections to impute a vote Problems Missing data: voting is neither complete nor uniform
Preferences may change over time Interface issues Social Networks, CAARMS 2006 65 Memory-based methods Store all user votes and generalize from them to predict vote for new item Predicted vote of active user a for item j: where there are n users with non-zero
weights, vi,j is the vote of user i and item j, is a normalizing factor, w() is a weighting function between users Distance metric Correlation or similarity Social Networks, CAARMS 2006 66 Computing weights - Cosine Correlation In information retrieval, documents are
represented as vectors of word frequencies For CF, we treat preferences as vector Documents -> users Word frequencies -> votes Similarity is then the cosine between two vectors Dot product of the vectors divided by the product of their magnitudes Social Networks, CAARMS 2006 67
Computing weights - Pearson & Spearman correlation Pearson Correlation First used for CF in GroupLens project [Resnick et al., 1994] Relatively efficient to calculate incrementally Spearman Correlation same as Pearson but calculations are done on rank of va,j and vi,j Social Networks, CAARMS 2006 68
Model-based methods Really what we want is the expected value of the users vote Cluster Models Users belong to certain classes in C with common tastes Naive Bayes Formulation Calculate Pr(vi|C=c) from training set Bayesian Network Models Social Networks, CAARMS
2006 69