Classification for High Dimensional Problems Using Bayesian ...


Classification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees
Radford M. Neal and Jianguo Zhang, the winners of the NIPS2003 feature selection challenge
University of Toronto

The results

Combination of Bayesian neural networks and classification based on Bayesian clustering with a Dirichlet diffusion tree model.
A Dirichlet diffusion tree method is used for Arcene.
Bayesian neural networks (as in BayesNN-large) are used for Gisette, Dexter, and Dorothea.

For Madelon, the class probabilities from a Bayesian neural network and from a Dirichlet diffusion tree method are averaged, then thresholded to produce predictions.

Their General Approach

Use simple techniques to reduce the computational difficulty of the problem, then apply more sophisticated Bayesian methods.
The simple techniques: PCA and feature selection by significance tests.
The more sophisticated methods: Bayesian neural networks and Automatic Relevance Determination.

(I) First level feature reduction

Feature selection using significance tests (first level)

An initial feature subset was found by simple univariate significance tests (correlation coefficient, symmetrical uncertainty).
Assumption: relevant variables will be at least somewhat relevant on their own.
For all tests, a p-value was found by comparing to the distribution found when permuting the class labels.

Dimensionality reduction with PCA (an alternative for FS)

There are probably better dimensionality reduction methods than PCA, but that's what we used.
One reason is that it's feasible even when p is huge, provided n is not too large: the time required is of order min(pn^2, np^2).
PCA was done using all the data (training, validation, and test).
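The two first-level reductions above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the synthetic data, the number of permutations, the 0.01 cutoff and the number of components are all hypothetical. Only the correlation-coefficient test is shown (symmetrical uncertainty is omitted). The PCA routine works on whichever of the n x n or p x p matrices is smaller, which is where the min(pn^2, np^2) cost comes from.

```python
import numpy as np

def permutation_pvalues(X, y, n_perm=200, seed=0):
    """Per-feature p-value for |correlation with the class label|,
    judged against the distribution obtained by permuting the labels.
    Assumes y contains both classes."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0  # constant features get correlation 0

    def abs_corr(labels):
        lc = labels - labels.mean()
        return np.abs(Xc.T @ lc) / (norms * np.linalg.norm(lc))

    observed = abs_corr(y.astype(float))
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        exceed += abs_corr(rng.permutation(y).astype(float)) >= observed
    # add-one correction so that no p-value is exactly zero
    return (exceed + 1) / (n_perm + 1)

def pca_scores(X, k):
    """Top-k principal component scores of the rows of X."""
    Xc = X - X.mean(axis=0)
    n, p = Xc.shape
    if n <= p:
        # n x n Gram matrix: cost of order p * n^2
        evals, evecs = np.linalg.eigh(Xc @ Xc.T)
        order = np.argsort(evals)[::-1][:k]
        # scores are (left singular vectors) * (singular values)
        return evecs[:, order] * np.sqrt(np.clip(evals[order], 0.0, None))
    # p x p scatter matrix: cost of order n * p^2
    evals, evecs = np.linalg.eigh(Xc.T @ Xc)
    order = np.argsort(evals)[::-1][:k]
    return Xc @ evecs[:, order]

# Synthetic stand-in data (hypothetical): few cases, many features.
rng = np.random.default_rng(1)
n, p = 40, 300
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[:, 0] += 3.0 * y  # make feature 0 relevant on its own

pvals = permutation_pvalues(X, y)
selected = np.flatnonzero(pvals < 0.01)  # hypothetical cutoff
scores = pca_scores(X, k=5)              # hypothetical number of components
```

In the setting described above, the matrix passed to the PCA step would stack the training, validation, and test inputs, while the significance tests can only use cases whose labels are known.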

(II) Building the learning model and second-level feature selection

Bayesian Neural Networks

Conventional neural network learning
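The formula from this slide is not preserved. As a reminder of what "the decay" refers to later on, one standard formulation (an assumption here, not necessarily the one shown in the talk) trains the weights w by minimizing a data-fit error plus a weight-decay penalty. Statistically, this amounts to finding the maximum a posteriori weights under a Gaussian prior. The symbols w, D, and the decay parameter \alpha are notation introduced for this sketch.

```latex
\hat{w}
  = \arg\min_{w} \Bigl[ -\sum_{i=1}^{n} \log P(y_i \mid x_i, w)
      + \frac{\alpha}{2} \sum_{j} w_j^2 \Bigr]
  = \arg\max_{w} \; P(D \mid w)\, P(w \mid \alpha),
\qquad
P(w \mid \alpha) \propto \exp\Bigl( -\frac{\alpha}{2} \sum_{j} w_j^2 \Bigr)
```

Here D denotes the training cases (x_i, y_i), and P(D | w) is the product of the per-case probabilities P(y_i | x_i, w).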

Bayesian Neural Network Learning

Based on the statistical interpretation of conventional neural network learning.

Bayesian Neural Network Learning

Bayesian predictions are found by integration rather than maximization.
For a test case x, y is predicted by averaging over the posterior distribution of the network parameters.
A conventional neural network considers only the parameters with maximum posterior probability.
A Bayesian neural network considers all possible parameters in the parameter space.
Can be implemented by Gaussian approximation and MCMC.

ARD Prior

Still remember the decay?

How? By optimizing the decay parameters.
Associate the weights from each input with a decay parameter.
There are theories for optimizing the decays.

Result

If an input feature x is irrelevant, its relevance hyperparameter (the inverse of its decay parameter, 1/α) will tend to be small, forcing the weights from that input to be near zero.

Some Strong Points of This Algorithm

Bayesian learning integrates over the posterior distribution for the network parameters, rather than picking a single optimal set of parameters. This further helps to avoid overfitting.
ARD can be used to adjust the relevance of input features.
We can use the prior to incorporate external knowledge.

Dirichlet Diffusion Trees

A Bayesian hierarchical clustering method.

The methods

BayesNN-small: features selected using significance tests.

BayesNN-large: principal components.
BayesNN-DFT-combo: the class probabilities from a Bayesian neural network and from a Dirichlet diffusion tree method are averaged, then thresholded to produce predictions.

About the datasets

The results

http://www.nipsfsc.ecs.soton.ac.uk/

Thanks. Any questions?
