Introduction to Matlab & Data Analysis
Lecture 11: Data handling tips and Quality Graphs
Maya Geva, Weizmann 2011

Why use Matlab for your data analysis?
One interface for all stages of your work:
- View raw data
- Manipulate it with statistics, signal processing, etc. (automate your scripts to go over multiple data files)
- Make quality, reproducible graphs

First step: view raw data
Graphics reveal data. 4 sets of {x,y} data points:
- The mean and variance of {x} and {y} are equal across the sets
- The correlation coefficient is equal too
- The regression line, and the error of fit using the line, are equal too
F.J. Anscombe, American Statistician, 27 (1973)

One more example
See how A jumps out in the plot but blends into the marginal distribution.
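Going back to the four Anscombe data sets, the claim that the summary statistics agree can be checked numerically. The sketch below uses only the first two sets; the numbers were typed in from the commonly reproduced table of Anscombe (1973) and should be treated as an assumption to verify against the paper. The variable names are illustrative.

```matlab
% Anscombe sets I and II (values assumed from the published table)
x  = [10 8 13 9 11 14 6 4 12 7 5];
y1 = [8.04 6.95 7.58 8.81 8.33 9.96 7.24 4.26 10.84 4.82 5.68];
y2 = [9.14 8.14 8.74 8.77 9.26 8.10 6.13 3.10 9.13 7.26 4.74];

% Summary statistics: one row per data set, columns = [mean, variance]
summaryStats = [mean(y1) var(y1); mean(y2) var(y2)];

% Correlation coefficient between x and each y
r1 = corrcoef(x, y1);  r1 = r1(1,2);
r2 = corrcoef(x, y2);  r2 = r2(1,2);

% Regression line y = c(1)*x + c(2) for each set
c1 = polyfit(x, y1, 1);
c2 = polyfit(x, y2, 1);

disp(summaryStats)
disp([r1 r2])
disp([c1; c2])

% The numbers agree, yet the plots look completely different
figure('Visible', 'off');
subplot(1,2,1); plot(x, y1, 'ko'); title('Set I');
subplot(1,2,2); plot(x, y2, 'ko'); title('Set II');
```

Plotting the two panels side by side is the point of the slide: the statistics match to about two decimal places, while one set is roughly linear and the other is clearly curved.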

View your data
Look for interesting events:
    a1 = subplot(2,1,1);
    a2 = subplot(2,1,2);
    linkaxes([a1 a2], 'xy');   % zooming or panning one axes moves the other
Live demonstration

Use interactive modes
    [x,y] = ginput(N)   % returns the coordinates of N mouse clicks
- Comes in handy when you're interested in a few important points in your plot
- A very useful method for extracting data out of published images

Having limited data: filling in the missing points

Fill in missing data
Using simple interpolation (table lookup):
    interp1(measured_sample_times, measured_samples, new_time_vector, 'linear', NaN);
The last argument (NaN) is the value returned for new time points that fall outside the measured range.
Other interpolation options: cubic, spline, etc.

Example - interpolation
    x  = 0:.6:pi;
    y  = sin(x);
    xi = 0:.1:pi;
    figure
    yi = interp1(x,y,xi,'cubic');
    yj = interp1(x,y,xi,'linear');
    plot(x,y,'ko')
    hold on
    plot(xi,yi,'r:')
    plot(xi,yj,'g.:')
[Figure legend: data, cubic interpolation, linear interpolation]

Smooth your data if needed (spline toolbox)
This smoothing spline minimizes
    p \sum_{j=1}^{n} w(j) |y(:,j) - f(x(j))|^2 + (1-p) \int \lambda(t) |D^2 f(t)|^2 dt
    csaps(x,y,p)
Experiment until you find the right p to use (the function can give you an initial guess if you don't know where to begin).
[Figure: two panels, "Using diff() on unsmoothed location data" and "Using diff() on smoothed location data"]

"There are three kinds of lies: lies, damned lies, and statistics"
- Exploratory data analysis
- Hypothesis testing
- (Almost) everything you're used to doing with your favorite statistics software (SPSS etc.) is possible under Matlab's roof*
* You might have to work a bit harder to code the specific tests that come ready-made in SPSS; you can always look for other people's code on the MathWorks website.

Random number generators
rand(n) - an n-by-n matrix of uniformly distributed numbers in the interval (0,1); use rand(1,n) for a row of n numbers
    Multiply and shift to get any range you need
randn(n) - an n-by-n matrix of normally distributed random numbers, mean = 0, STD = 1
    Multiply and shift to get the mean and STD you need
For mean = 0.6, variance = 0.1:
    x = .6 + sqrt(0.1) * randn(n)
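The "multiply and shift" rule can be sketched as follows. The range [a, b], the sample size and the seed are arbitrary choices made for this illustration, not values from the lecture; only the mean 0.6 and variance 0.1 come from the slide.

```matlab
rng(1);                           % fix the seed so the run is repeatable
n = 1e5;                          % illustrative sample size

% Uniform numbers on (a, b): scale by the width, then shift by the lower edge
a = -2;  b = 5;                   % hypothetical target range
u = a + (b - a) * rand(1, n);

% Normal numbers with a chosen mean and variance:
% scale by the standard deviation (sqrt of the variance), then shift by the mean
mu = 0.6;  sigma2 = 0.1;
x = mu + sqrt(sigma2) * randn(1, n);

fprintf('uniform:  min = %.3f, max = %.3f\n', min(u), max(u));
fprintf('normal:   mean = %.3f, var = %.3f\n', mean(x), var(x));
```

Note that randn is scaled by the standard deviation, not by the variance; mixing the two up is a common source of wrong simulations.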

Example - implementing coin flips in Matlab
    p = rand(1);
    if (p > 0.5)
        % do something
    else
        % do something else
    end

Histograms - 1D
    X = randn(1,1000);
    [C, N] = hist(X, 50);    % N = location of bins, C = counts in each location
    bar(N, C/sum(C))         % normalize the counts so the bars sum to 1
    [C, N] = hist(X, 10);
    bar(N, C/sum(C))
[Figure: two panels, "10 bins" and "50 bins"; x-axis: Values, y-axis: Probability function]

Histograms - 2D
    x = randn(1000,1);
    y = exp(.5*randn(1000,1));
    scatterhist(x,y)
Allows viewing correlations in your data

Basic characteristics of your data:
    mean, std, median, max, min
How to find the 25th percentile of your data?
    Y = prctile(X,25)

Is your data Gaussian?
    x = normrnd(10,1,25,1);
    normplot(x)
    y = exprnd(10,100,1);
    normplot(y)
[Figure: "Normal Probability Plot - X" and "Normal Probability Plot - Y"; x-axis: Data, y-axis: Probability]

Statistics toolbox - Hypothesis Tests

It's not always easy to prove your data is Gaussian
- If you're sure it is, you can use the parametric tests in the toolbox
- Remember that each of the parametric tests has a non-parametric version that can be used:
    ttest -> ranksum, signrank
    anova -> kruskalwallis
- These tests work well when your data set is large; otherwise use caution

Analysis of Variance
- One way: anova1
- Two way: anova2
- N-way: anovan
What is ANOVA?
In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are all equal, and therefore generalizes the t-test to more than two groups. (Doing multiple two-sample t-tests would result in an increased chance of committing a type I error.)

Example - one way ANOVA
Using the data matrix hogg:
    hogg = [24 14 11  7 19;
            15  7  9  7 24;
            21 12  7  4 19;
            27 17 13  7 15;
            33 14 12 12 10;
            23 16 18 18 20]
The columns are different shipments of milk (Hogg and Ledolter (1987)). The values in each column represent bacteria counts from cartons of milk chosen randomly from each shipment.
Do some shipments have higher counts than others?
    [p,tbl,stats] = anova1(hogg);

Using ANOVA
[Figure: ANOVA table annotated with sums of squares, degrees of freedom, mean squares (SS/df), F statistic and p-value; boxplot() annotated with median, 25-75 percentiles, confidence interval and data range]

Using ANOVA
It often comes in handy to perform multiple comparisons between the different data sets:
    multcompare(stats)
- Allows interactive use of the ANOVA result
- Click on the group you want to test
[Figure: multcompare display for groups 1-5; "3 groups have means significantly different from Group 1"]

There's a lot more you can do with your data
Signal Processing Toolbox
- Filter out specific frequency bands:
    Get rid of noise
    Focus on specific oscillations
- Calculate cross-correlations
- View spectrograms
- And much more

"The Visual Display of Quantitative Information" and "Envisioning Information", Edward Tufte

Making Quality Graphs for publications in Matlab
- No need to waste time on importing data between different software
- Update data with a simple re-run
- Learn how to control the fine details

Graphics Handles Hierarchy

Example of the different components of a graphic object

Reminder
gcf - get handle of current figure
gca - get handle of current axes
set - set a property of a graphics object, e.g. set(gca,'Color','b')
get(h) - returns all properties of the graphics object h
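A minimal sketch of how these four commands work together. The figure is created invisible only so that the example also runs in batch mode; the sine data and the variable names are placeholders chosen for this illustration.

```matlab
fig = figure('Visible', 'off');          % gcf now returns this figure
ax  = axes('Parent', fig);               % gca now returns these axes
tt  = 0:0.1:2*pi;
hLine = plot(ax, tt, sin(tt));           % plot returns a handle to the line object

set(ax, 'Color', 'b');                   % axes background color, as in the reminder
set(hLine, 'LineWidth', 2, 'Color', 'w');% several properties can be set in one call

bg    = get(ax, 'Color');                % one property: the RGB triplet of the background
props = get(hLine);                      % no property name: all properties, as a struct
disp(bg)
disp(props.LineWidth)
```

Keeping the handles returned by figure, axes and plot in variables is usually more robust than calling gcf and gca later, since the "current" figure changes whenever another figure is created or clicked.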

Rules for Quality graphs
If you really want to control your graph, don't limit yourself to subplot; instead place each subplot in the exact location you need:
    axes('position', [0.09 , 0.38 , 0.28 , 0.24]); % [left, bottom, width, height]
[Figure from Ulanovsky, Moss; PNAS 2008]

The position vector: [left, bottom, width, height] (by default in normalized units, i.e. fractions of the figure size)

Write a template that allows control of every level of your figure
Outline - define the shape and size of your figure
Subplot A) Define axes size and location inside the figure
    Load data, decide on plot type and add supplementary items (text, arrows etc.)
Subplot B) Define axes size and location inside the figure
    Load data, decide on plot type and add supplementary items (text, arrows etc.)
[Figure: example layouts of panels A, B and C]

Preparing the starting point
Outline - define the shape and size of your figure
    figure
    set(gcf,'DefaultAxesFontSize',8);
    set(gcf,'DefaultAxesFontName','helvetica');
    set(gcf,'PaperUnits','centimeters','PaperPosition',[0.2 0.2 8.3 12]); % [left, bottom, width, height]
Many more options to control your general figure size
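One of those further options is to make the on-screen window match the printed size, so that what you see while editing is close to what will be printed. The sketch below is one possible way to do this, under the assumption that a single-column figure of 8.3 x 12 cm is wanted (the size from the slide); the variable names are illustrative.

```matlab
widthCm  = 8.3;                           % printed width, taken from the slide
heightCm = 12;                            % printed height, taken from the slide

fig = figure('Visible', 'off');           % invisible so the sketch runs in batch mode

% Defaults inherited by every axes created later in this figure
set(fig, 'DefaultAxesFontSize', 8, 'DefaultAxesFontName', 'helvetica');

% Printed size and placement on the page
set(fig, 'PaperUnits', 'centimeters', ...
         'PaperPosition', [0.2 0.2 widthCm heightCm]);

% On-screen size: keep the window corner, change only width and height
set(fig, 'Units', 'centimeters');
screenPos = get(fig, 'Position');
set(fig, 'Position', [screenPos(1) screenPos(2) widthCm heightCm]);

% Axes created now pick up the figure-level defaults
ax = axes('Parent', fig, 'Position', [0.14 0.08 0.8 0.5]);
axesFontSize = get(ax, 'FontSize');
paperPos     = get(fig, 'PaperPosition');
```

The defaults must be set before the axes are created; axes that already exist keep their old font settings.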

Use the appropriate graph function to optimally view different data types
2D graphs: plot, plotyy, semilogx / semilogy, loglog, area, fill, pie, bar / barh, hist / histc / stairs, stem, errorbar, polar / rose, fplot / ezplot, scatter, image / imagesc / pcolor / imshow
3D graphs: plot3, pie3, mesh / meshc / meshz, surf / waterfall / surfc, contour, quiver, fill3, stem3, slice, scatter3

2D Plots

3D Plots

Positioning Axes
Try to write clear code that will enable fine tuning
Subplot A) Define axes size and location inside the figure:
    a1 = axes('position', [0.14 , 0.08 , 0.8 , 0.5]);
Load data, decide on plot type and add supplementary items (text, arrows etc.):
- Specify the source of the data: load()
- Plot the data with your selected function
- Specify the axes parameters clearly:
    xlimits = [0.7 4.3];
    xticks  = 1 : 4;
    ylimits = [-28 2];
    yticks  = [-28 0];
xlimits and ylimits will later be used as your reference points for placing text and other attributes on the figure

Specify the location of every additional attribute in the code
- Use text() to replace title(), xlabel(), ylabel(); it gives you better control over the exact location
- line(), rectangle()
- annotation(): line, arrow, doublearrow (two-headed arrow), textarrow (arrow with attached text box), textbox, ellipse, rectangle
- If you want your graphic object to extend outside the axes rectangle, use the Clipping property:
    line(X,Y,...,'Clipping','off')

Line attributes
Control line and marker attributes
Colors can be picked from the whole palette by using [R G B] notation
    plot(x,y,'--rs','LineWidth',2, 'MarkerEdgeColor','k',...
         'MarkerFaceColor','g', 'MarkerSize',10)

God is in the details
    set( gca, 'xlim', xlimits, 'xtick', xticks, ...
         'ylim', ylimits, 'ytick', [ylimits(1) 0 ylimits(2)], ...
         'ticklength', [0.030 0.030], 'box', 'off' );   % Set the limits and ticks you defined earlier
    line( xlimits, [0 0], 'color', 'k', 'linewidth', 0.5 );   % Place line at y = 0
    text( xlimits(1)-diff(xlimits)/2.8, ylimits(1)+diff(ylimits)/2.0, ...
          {'\Delta Information', '(bits/spike)'}, 'fontname', 'helvetica', 'fontsize', 7, ...
          'rotation', 90, 'HorizontalAlignment', 'center' );   % Instead of using ylabel, use a relative placement technique

Use any symbols you need
Greek characters: \alpha, \beta, \gamma
Math symbols: \circ, \pm
Font: bold \bf, italic \it
Superscript x^5, subscript x_5
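The relative-placement idea and the symbol notation can be combined in one small panel. This is a sketch under stated assumptions: the data points, the label text "Condition \alpha_i" and the placement fractions (1/8 and 1/6 of the axis span) are invented for the illustration; the limits are the ones used in the lecture.

```matlab
fig = figure('Visible', 'off');
ax  = axes('Parent', fig, 'Position', [0.25 0.2 0.7 0.7]);   % [left, bottom, width, height]

% Hypothetical data standing in for a real measurement
cond = 1:4;
vals = [-5 -12 -20 -26];
plot(ax, cond, vals, '--ks', 'LineWidth', 1, 'MarkerFaceColor', [0.2 0.6 0.2]);

% Limits defined once, then reused as the reference for everything else
xlimits = [0.7 4.3];
ylimits = [-28 2];
set(ax, 'XLim', xlimits, 'XTick', 1:4, ...
        'YLim', ylimits, 'YTick', [ylimits(1) 0 ylimits(2)], 'Box', 'off');
line(xlimits, [0 0], 'Parent', ax, 'Color', 'k', 'LineWidth', 0.5);

% x label: centered, one eighth of the y span below the axes
hx = text(mean(xlimits), ylimits(1) - diff(ylimits)/8, 'Condition \alpha_i', ...
          'Parent', ax, 'HorizontalAlignment', 'center', 'FontSize', 7);

% y label: rotated, one sixth of the x span to the left of the axes
hy = text(xlimits(1) - diff(xlimits)/6, mean(ylimits), ...
          {'\Delta Information', '(bits/spike)'}, 'Parent', ax, ...
          'Rotation', 90, 'HorizontalAlignment', 'center', 'FontSize', 7);

xLabelPos = get(hx, 'Position');
```

Because every label position is computed from xlimits and ylimits, changing the limits later moves the labels with them, which is what makes the editing rounds fast.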

Example - multiple axes on same plot
    h = axes('Position',[0 0 1 1],'Visible','off');   % invisible axes covering the whole figure
    axes('Position',[.25 .1 .7 .8])
Plot data in current axes:
    t = 0:900;
    plot(t,0.25*exp(-0.005*t))
Define the text and display it in the full-window axes:
    str(1) = {'Plot of the function:'};
    str(2) = {' y = A{\ite}^{-\alpha{\itt}}'};
    str(3) = {'With the values:'};
    str(4) = {' A = 0.25'};
    str(5) = {' \alpha = .005'};
    str(6) = {' t = 0:900'};
    set(gcf,'CurrentAxes',h)
    text(.025,.6,str,'FontSize',12)   % draw the text in the full-window axes h

Example
    % Prepare three plots on one figure
    x = -2*pi:pi/12:2*pi;
    h0 = subplot(2,2,1:2);
    plot(x,x.^2)
    h1 = subplot(2,2,3);
    plot(x,x.^4)
    h2 = subplot(2,2,4);
    plot(x,x.^5)
    % Calculate the location of the bottom two
    p1 = get(h1,'Position');
    t1 = get(h1,'TightInset');
    p2 = get(h2,'Position');
    t2 = get(h2,'TightInset');
    x1 = p1(1)-t1(1); y1 = p1(2)-t1(2);
    x2 = p2(1)-t2(1); y2 = p2(2)-t2(2);
    w = x2-x1+t1(1)+p2(3)+t2(3);
    h = p2(4)+t2(2)+t2(4);
    % Place a rectangle on the bottom two, a line on the top one
    annotation('rectangle',[x1,y1,w,h],...
        'FaceAlpha',.2,'FaceColor','red','EdgeColor','red');
    line( [-8 8], [5 5], 'color', 'k', 'linewidth', 0.5, 'parent', h0 );   % without 'parent' the line lands in the last subplot
[Figure callout: TightInset is the margin added to Position to include labels and title]

Save your graph
First option:
    saveas(h,'filename','format')
Second option (better for printing purposes):
    print(gcf, figure_name_out, '-depsc', '-cmyk');   % EPS format (can be opened in Photoshop)
    print(gcf, figure_name_out, '-dpdf', '-cmyk');    % PDF format
The publishing industry uses a standard four-color separation (CMYK), not RGB.

Test yourself
"Single auditory neurons rapidly discriminate conspecific communication signals", Machens et al., Nature Neurosci. (2003)
Can you reproduce these figures? (Fig. 1, Fig. 2)

Pros and Cons of Preparing Graphs for Publication in Matlab
Cons:
- It might take you a long time to prepare your first quality figure template
Pros:
- All the editing rounds will be much faster and more robust than you're used to:
    Changing the data
    Adding annotations
    Changing the figure size

Example - making a raster plot
    A = full(data_extracellular_A1_neuron__SparseMatrix);   % convert from sparse to full
    % Plot a line on each spike location
    [M, N] = size(A);
    [X,Y] = meshgrid(1:N,0:M-1);
    Locations_X(1,:) = X(:);
    Locations_X(2,:) = X(:);
    Locations_Y(1,:) = (Y(:)*4+1).*A(:);
    Locations_Y(2,:) = (Y(:)*4+3).*A(:);
    indxs = find(Locations_Y(1,:) ~= 0);
    Locations_X = Locations_X(:,indxs);
    Locations_Y = Locations_Y(:,indxs);
    figure
    line(Locations_X,Locations_Y,'LineWidth',4,'Color','k')

First option - using imagesc
[Figure: raster displayed with imagesc; callout: "Display axes border"]

Placing lines in each spike location:
[Figure: raster drawn with one line per spike; x-axis: Time bin]
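The lecture shows the imagesc version of the raster only as a picture. A minimal sketch of how it could be produced is given below; the spike matrix is random stand-in data (the real data_extracellular_A1_neuron__SparseMatrix is not available here), and the trial and bin counts are arbitrary.

```matlab
rng(0);                                   % repeatable stand-in data
M = 20;  N = 200;                         % hypothetical: 20 trials, 200 time bins
A = full(sprand(M, N, 0.05) > 0);         % logical matrix, true where a spike occurred

fig  = figure('Visible', 'off');
hImg = imagesc(double(~A));               % invert so spikes become 0
colormap(gray);                           % with gray, 0 is black: spikes are drawn black on white
set(gca, 'YDir', 'normal', 'Box', 'on');  % first trial at the bottom, axes border displayed
xlabel('Time bin');
ylabel('Trial');

C = get(hImg, 'CData');                   % the matrix actually shown by the image
```

imagesc draws one rectangle per matrix element, so it is quick for large matrices, while the line-based version gives finer control over the height and width of each tick.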