Multidimensional Scaling
Model: Distance = square root of the sum of squared differences on k dimensions: $d_{xy} = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$
Data: a matrix of distances
Find the dimensional values in k = 1, 2, ... dimensions for the objects that best reproduce the original data.
Example: Consider the distances between nine American cities. Can we represent these cities in a two dimensional space?
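As a small illustration of the model, the following sketch computes the distance between two objects both directly from the formula and with R's built-in dist() function. The points x and y are invented for the example and are not part of the city data.

```r
# Two hypothetical objects located on k = 2 dimensions (values invented for illustration)
x <- c(0, 0)
y <- c(3, 4)

# Distance from the model: square root of the sum of squared differences
d.formula <- sqrt(sum((x - y)^2))

# The same distance from dist(), which uses the Euclidean metric by default
d.builtin <- as.numeric(dist(rbind(x, y)))

d.formula   # 5
d.builtin   # 5
```

Multidimensional scaling works in the opposite direction: it starts from the distances and looks for coordinates that would produce them.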
      BOS   CHI    DC   DEN    LA   MIA    NY   SEA    SF
BOS     0   963   429  1949  2979  1504   206  2976  3095
CHI   963     0   671   996  2054  1329   802  2013  2142
DC    429   671     0  1616  2631  1075   233  2684  2799
DEN  1949   996  1616     0  1059  2037  1771  1307  1235
LA   2979  2054  2631  1059     0  2687  2786  1131   379
MIA  1504  1329  1075  2037  2687     0  1308  3273  3053
NY    206   802   233  1771  2786  1308     0  2815  2934
SEA  2976  2013  2684  1307  1131  3273  2815     0   808
SF   3095  2142  2799  1235   379  3053  2934   808     0
This can be done in R by using the cmdscale function. First copy the distances from above to the clipboard. Then use the following commands:
source("http://personality-project.org/r/useful.r")   #get some extra functions, including read.clipboard()
cities <- read.clipboard(header=TRUE)                  #take the data from clipboard
cities                                                 #show the data
city.location <- cmdscale(cities, k=2)                 #ask for a 2 dimensional solution
round(city.location,0)                                 #print the locations to the screen
plot(city.location,type="n", xlab="Dimension 1", ylab="Dimension 2",main ="cmdscale(cities)")    #put up a graphics window
text(city.location,labels=names(cities))               #put the cities into the map
The output gives us the original distance matrix (just to make sure we put it in correctly), the x,y coordinates for each city, and then the following graph.
> cities <- read.clipboard(header=TRUE)
> cities      #show the data
     BOS  CHI   DC  DEN   LA  MIA   NY  SEA   SF
BOS    0  963  429 1949 2979 1504  206 2976 3095
CHI  963    0  671  996 2054 1329  802 2013 2142
DC   429  671    0 1616 2631 1075  233 2684 2799
DEN 1949  996 1616    0 1059 2037 1771 1307 1235
LA  2979 2054 2631 1059    0 2687 2786 1131  379
MIA 1504 1329 1075 2037 2687    0 1308 3273 3053
NY   206  802  233 1771 2786 1308    0 2815 2934
SEA 2976 2013 2684 1307 1131 3273 2815    0  808
SF  3095 2142 2799 1235  379 3053 2934  808    0
> city.location <- cmdscale(cities, k=2)    #ask for a 2 dimensional solution
> round(city.location,0)     #print the locations to the screen
     [,1]  [,2]
BOS -1349  -462
CHI  -428  -175
DC  -1077  -136
DEN   522    13
LA   1464   561
MIA -1227  1014
NY  -1199  -307
SEA  1596  -639
SF   1697   132
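One way to check how well the two dimensional solution reproduces the original data is to recompute the distances from the fitted coordinates and compare them with the input. The sketch below is only an illustration: it types the distance matrix in directly rather than reading it from the clipboard, and the names city.names, fitted, and lower are introduced here for the example.

```r
# Enter the nine-city distance matrix directly (same values as the table above)
city.names <- c("BOS", "CHI", "DC", "DEN", "LA", "MIA", "NY", "SEA", "SF")
cities <- matrix(c(
     0,  963,  429, 1949, 2979, 1504,  206, 2976, 3095,
   963,    0,  671,  996, 2054, 1329,  802, 2013, 2142,
   429,  671,    0, 1616, 2631, 1075,  233, 2684, 2799,
  1949,  996, 1616,    0, 1059, 2037, 1771, 1307, 1235,
  2979, 2054, 2631, 1059,    0, 2687, 2786, 1131,  379,
  1504, 1329, 1075, 2037, 2687,    0, 1308, 3273, 3053,
   206,  802,  233, 1771, 2786, 1308,    0, 2815, 2934,
  2976, 2013, 2684, 1307, 1131, 3273, 2815,    0,  808,
  3095, 2142, 2799, 1235,  379, 3053, 2934,  808,    0),
  nrow = 9, byrow = TRUE, dimnames = list(city.names, city.names))

# Two dimensional classical scaling solution
city.location <- cmdscale(cities, k = 2)

# Distances implied by the fitted coordinates
fitted <- as.matrix(dist(city.location))

# Compare each pair of cities once (lower triangle only)
lower <- lower.tri(cities)
cor(cities[lower], fitted[lower])               #agreement between original and fitted distances
round(max(abs(cities[lower] - fitted[lower])))  #largest discrepancy, in the units of the input
```

A correlation close to 1 suggests that two dimensions are enough to represent these distances.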
This solution can be represented graphically:
Note that the solution is not quite what we expected: it gives us a mirrored, "Australian" orientation of American cities, with east on the left and south at the top. Because the orientation of the dimensions is arbitrary, reversing the signs in city.location gives the more conventional representation:
city.location <- -city.location
plot(city.location,type="n", xlab="Dimension 1", ylab="Dimension 2",main ="cmdscale(cities)")    #put up a graphics window
text(city.location,labels=names(cities))     #put the cities into the map
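Reversing the signs is legitimate because the model depends only on differences between coordinates, so reflecting a configuration leaves every distance unchanged. A minimal sketch with a made-up three-point configuration (config and its values are hypothetical, not taken from the city data):

```r
# A small made-up configuration of three points in two dimensions
config <- rbind(a = c(1, 2), b = c(4, 6), c = c(-2, 0))

# Reflect both dimensions, as was done for city.location
flipped <- -config

# Reflect only the second dimension
flipped.y <- config %*% diag(c(1, -1))

# The interpoint distances are identical in all three configurations
same.both <- isTRUE(all.equal(as.vector(dist(config)), as.vector(dist(flipped))))
same.y    <- isTRUE(all.equal(as.vector(dist(config)), as.vector(dist(flipped.y))))
same.both   # TRUE
same.y      # TRUE
```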
Using the maps package, we can compare this solution to a map of the US:
library(maps)     #provides map(); assumes the maps package has been installed
map("state")
A useful feature of R is that most commands have an extensive help file. Asking for help(cmdscale) shows that R includes a distance matrix (eurodist) for 21 European cities. The following commands (taken from the help file) produce a nice two dimensional solution. (Note that since the orientation of the dimensions is arbitrary, the second dimension needs to be flipped to produce the conventional map of Europe.)
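loc <- cmdscale(eurodist)
x <- loc[,1]
y <- -loc[,2]     #flip the second dimension so that north is at the top
plot(x, y, type="n", xlab="", ylab="", main="cmdscale(eurodist)")
text(x, y, rownames(loc), cex=0.8)     #city labels are carried along as the row names of loc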
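To see how the fit changes with the number of dimensions k, cmdscale can be asked to return eigenvalues and a goodness of fit statistic by setting eig=TRUE. The sketch below is one possible way to compare k = 1, 2, and 3 for the eurodist data; the variable name gof is introduced here for illustration, and only the first of the two GOF values that cmdscale reports is kept.

```r
# Goodness of fit of classical scaling for k = 1, 2, 3 dimensions
# (eurodist ships with R in the datasets package)
gof <- sapply(1:3, function(k) {
  fit <- cmdscale(eurodist, k = k, eig = TRUE)   #returns a list when eig = TRUE
  fit$GOF[1]                                     #first of the two GOF criteria
})
names(gof) <- paste("k =", 1:3)
round(gof, 2)     #values nearer 1 indicate that the k dimensions capture more of the data
```

If the fit improves only slightly when another dimension is added, the lower dimensional solution is probably adequate.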