Slide 1: Title

CS 381V: Visual Recognition, Experiment Presentation
"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition"
Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
NIPS Deep Learning Workshop, 2014
Presented by Zhuode Liu, University of Texas at Austin

Slide 2: Outline
- Model architecture (and a drawback: the word classifier model is too large)
- What the SVT dataset and the synthetic dataset look like
- Failure cases and special cases of the word classifier model
- Analysis of the inferior character classifier and N-gram classifier

Slide 3: Note!
- The CNN is trained on completely synthetic data; no real data is used.
- The three different models use the same CNN as the feature extractor.
- That CNN is trained using the first model's architecture, and its weights are the starting point (fine-tuning) when training the second and third models.
- The three models are: word classifier, character classifier, and N-gram classifier.

Slide 4: Model Architecture and Size
- Green: word classifier; blue: character classifier; black: N-gram classifier.
- Every model shares weights except the last layer.
[architecture diagram]
(credit: Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition, Max Jaderberg et al., NIPS Deep Learning Workshop, 2014)

Slide 5: Model Architecture and Size
- Word classifier size: 1918 MB, of which 1340 MB is the last softmax layer. Although the word classifier performs the best, its last layer is too big.
- Character classifier size: 485 MB; N-gram classifier size: 667 MB.
[architecture diagram]
(credit: same paper as above)
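To make the weight sharing concrete, here is a minimal PyTorch-style sketch (my own illustration, not the paper's code): one shared convolutional trunk with three task-specific last layers. The layer shapes and counts are placeholders, and the real character model predicts one character per position rather than the single softmax used here.

```python
import torch.nn as nn

class SharedBackboneOCR(nn.Module):
    """Sketch: one shared CNN trunk, three task-specific final layers.

    Layer sizes are illustrative placeholders, not the paper's exact
    configuration.
    """
    def __init__(self, num_words=90000, num_chars=37, num_ngrams=10000):
        super().__init__()
        # Shared feature extractor: all three models reuse these weights.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(128 * 8 * 25, 4096), nn.ReLU(),
        )
        # Only the final layer differs between the three models.
        self.word_head = nn.Linear(4096, num_words)    # 90k-way softmax: huge
        self.char_head = nn.Linear(4096, num_chars)    # small
        self.ngram_head = nn.Linear(4096, num_ngrams)  # medium

    def forward(self, x, task="word"):
        # x: (batch, 1, 32, 100) grayscale word images
        feats = self.backbone(x)
        head = {"word": self.word_head,
                "char": self.char_head,
                "ngram": self.ngram_head}[task]
        return head(feats)
```

This also makes the size imbalance visible: assuming 4096-dimensional features, the word head alone holds 4096 x 90000 ≈ 369M weights, about 1.4 GB in float32, consistent with the slide's 1340 MB figure for the last softmax layer.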

Slide 6: What does the SVT dataset look like?
[example images from the SVT dataset]

Slide 7: What does the SVT dataset look like?
[more example images from the SVT dataset]

Slide 8: What does the synthetic training set look like?
Generating process: [figure showing the synthetic-data generation pipeline]
(credit: Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition, Max Jaderberg et al., NIPS Deep Learning Workshop, 2014)

Slide 9: Successful Cases

- Reminder: in the experiment, this model (the word classifier) performs well: 93.1% on the IC03 dataset, 80.7% on the SVT dataset.
- The successful cases are those that look like images in the synthetic training set.
[example images]

Slide 10: Successful Cases
- Reminder: 93.1% on IC03, 80.7% on SVT.
- Slightly harder cases (word: score): Burbank: 0.99, Bureaus: 0.0016; Tabu: 0.89, Tdd: 0.045.

Slide 11: Failure Cases: need some context information
- PRED: air, TRUE: bar
[cropped word image]

Slide 12: Failure Cases: need some context information
- PRED: air, TRUE: bar
- Context: [wider image showing the scene around the word]
- (Predicted: AIR) (True: BAR)

Slide 13: Failure Cases: need some context information
- The derivation relies on an assumption I don't think is correct (treating the softmax output as a likelihood), but the resulting formula gives a good way to incorporate context:

    P(w | x) ∝ P(x | w) · P(w)

  where P(x | w) is taken to be the model output and P(w) is our prior from context.
- PRED: air, TRUE: bar (Predicted: AIR with score 0.1003; True: BAR with score 0.0046)
[cropped word image]
- If we set P(w) to be the frequency of AIR and BAR in this dataset, then the scores for AIR and BAR become more similar (AIR: 15e-5, BAR: 6e-5).

Slide 14: Failure Cases: need some context information
- Same formula: P(w | x) ∝ P(x | w) · P(w).
- PRED: contort, TRUE: comfort (Predicted: Contort with score 0.5286; True: Comfort with score 0.4690)
[cropped word image]
- If we set P(w) to be the frequency of these words in this dataset, then the score for Comfort becomes much higher (Contort: 0, Comfort: 7.25e-4, ranked at top-1).
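A minimal sketch of this reweighting (my own illustration; the variable names and the frequency values are assumptions, chosen so the numbers match the slide):

```python
def rerank_with_prior(model_scores, prior_freq):
    """Re-rank word-classifier outputs with a context prior.

    Implements the slide's P(w|x) proportional to P(x|w) * P(w): the
    softmax score stands in for P(x|w) (the questionable step noted
    above) and a word frequency from context stands in for P(w).
    """
    reweighted = {w: s * prior_freq.get(w, 0.0) for w, s in model_scores.items()}
    return sorted(reweighted.items(), key=lambda kv: kv[1], reverse=True)

# Slide example: "contort" wins on raw score, but "comfort" wins once the
# prior is applied, because "contort" never occurs in the context corpus.
scores = {"contort": 0.5286, "comfort": 0.4690}
freqs = {"comfort": 1.546e-3, "contort": 0.0}  # hypothetical frequencies
print(rerank_with_prior(scores, freqs))
# [('comfort', 7.25e-04), ('contort', 0.0)] -> comfort is now top-1
```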

Slide 15: Accuracy on the SVT Dataset
- Trained lexicon: 90k. Top-1: 83.59%, Top-5: 89.47%, Top-10: 91.18%.
- Quick reminder: what do "90k" and "50" mean? "90k" is the full 90k-word lexicon; "50" means each test image comes with a small lexicon of 50 candidate words.
- Trained lexicon: 50. Top-1: 95.4%. Much higher than the 83.59% above, which shows the power of context.
- Other methods (copied from the paper): PhotoOCR: 90.4%; Almazan: 89.2%; Gordo: 90.7%.
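For reference, Top-k accuracy is the standard metric used here; a minimal numpy sketch (my own illustration, not the paper's evaluation code):

```python
import numpy as np

def topk_accuracy(scores, labels, k=5):
    """Fraction of samples whose true class is among the k highest scores.

    scores: (n_samples, n_classes) classifier outputs
    labels: (n_samples,) true class indices
    """
    topk = np.argsort(scores, axis=1)[:, -k:]   # indices of the k best classes
    hits = [y in row for y, row in zip(labels, topk)]
    return float(np.mean(hits))
```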

Slide 16: Special Cases: slanted words
- These three images look hard because the text is slanted.
- Top-5 predictions for the "city" image: '19', 'iq', 'nsi', '12', 'isl'
- For the "lights" image: 'sights', 'insults', 'pews', 'resits', 'jesuits'
- For the "bookstore" image: 'predispositions', 'prolongations', 'pseudonymous', 'buys', 'preconceptions'
- None of them are correct! The same thing happens for these images: [more slanted examples]

Slide 17: Special Cases: slanted words
- Test images: [slanted test images]
- What do these words look like in the synthetic training set? The synthetic samples don't have much variation in the orientation of the text.

Slide 18: Special Cases: vertical words
- Can it handle vertical words? No, because:
  - There are no vertical words in the training set.
  - The CNN input is 32x100. A vertical word, resized to this shape, is elongated and the text only spans 32 pixels.
- Example (resized to 32x100): Top-5 predictions: 'e', 'b', 'is', '8', '19' (totally wrong).
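To make the distortion concrete, a small sketch (my own illustration; the image is a stand-in for a real crop) of the fixed 32x100 resize:

```python
from PIL import Image

# Stand-in for a tall, narrow crop of a vertical word (40x240 pixels);
# in practice this would be Image.open(...).convert("L").
img = Image.new("L", (40, 240))

# The model takes a fixed 32x100 (height x width) grayscale input, so the
# aspect ratio is ignored: the word's long vertical axis is squashed into
# 32 pixels while the narrow axis is stretched to 100, smearing the glyphs.
net_input = img.resize((100, 32))  # PIL's resize takes (width, height)
print(net_input.size)              # (100, 32)
```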

Slide 19: Character classifier on the SVT dataset
- Prediction examples (accuracy: 72.91%):
  - Pred: mountann, True: mountain
  - Pred: amartments, True: apartments
  - Pred: tie, True: the
(top image credit: Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition, Max Jaderberg et al., NIPS Deep Learning Workshop, 2014)

Slides 20-22: Accuracy of the character classifier on the SVT dataset
(predictions are corrected by finding the closest word in the 90k lexicon)

    Without correction:                72.91%
    With edit-distance correction:     78.48%
    Previous model (word classifier):  83.59%

- For the uncorrected model, the average edit distance to the true word is 0.608; restricted to the wrongly predicted words, it is 2.24 (quite high). On the IC13 dataset it is 1.9 (also high).
- Examples:
  - Pred: motomborts (edit distance 2), True: motorsports
  - Pred: araaery (edit distance 3), True: brewery
  - Pred: anngll (edit distance 3), True: angelo
- Conclusion: it doesn't perform well compared to the word classifier model (83.59%), and computing the edit-distance correction against the 90k lexicon is very slow (9.5 s per word).
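A minimal sketch of this correction step (my own illustration): brute-force nearest word by Levenshtein distance, which is exactly why it is slow over a 90k lexicon.

```python
def levenshtein(a, b):
    """Edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def correct(pred, lexicon):
    """Snap a raw character-classifier output to the closest lexicon word.

    A linear scan over all 90k words is what makes this step slow (the
    slide reports 9.5 s per word); a BK-tree or trie-based search would
    be the usual speed-up.
    """
    return min(lexicon, key=lambda w: levenshtein(pred, w))

print(correct("mountann", ["mountain", "fountain", "mounting"]))  # mountain
```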

Slides 23-24: Accuracy of the N-gram classifier on the SVT dataset
- Input: an image. Output: an N-gram encoding.
- To make a prediction: find the lexicon word that has the closest N-gram encoding.
- Example predicted encodings (from the slide figures):
  - f, m, o, r, u, fm, fo, fu, or, rm, ru, um, uo, for, fou, fum, fur, ium, miu, orm, oru, otu, our, rmu, rum, uor, uru, form, ourn, rium, rums
  - e, h, l, o, t, el, ho, ot, te, hel, hol, hot, lof, lte, oel, oet, ohe, olt, oot, ote, otl, ott, tel, tet, tte, tely, otte, oote, oter, oted, hoot, rott, toot, lott, mote, otle, ottl
- Accuracy on the 90k lexicon: 60.22%.
- Conclusion: worse than the word classifier (83.59%) and the character classifier (72.91%), and computing the closest N-gram encoding against the 90k lexicon is very slow (3.5 s per word).
(top image credit: Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition, Max Jaderberg et al., NIPS Deep Learning Workshop, 2014)
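A minimal sketch of the matching step (my own illustration; the slide does not specify the distance function, so Jaccard overlap is an assumption):

```python
def ngram_set(word, max_n=4):
    """All character n-grams of a word, up to length max_n."""
    return {word[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(word) - n + 1)}

def closest_word(predicted_ngrams, lexicon):
    """Pick the lexicon word whose n-gram set best matches the prediction.

    Scanning all 90k lexicon words is what makes this slow
    (3.5 s per word on the slide).
    """
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    return max(lexicon, key=lambda w: jaccard(predicted_ngrams, ngram_set(w)))

# The second encoding on the slide is evidently a noisy prediction for
# "hotel"; even with wrong n-grams mixed in, the nearest word matches.
pred = {"e", "h", "l", "o", "t", "el", "ho", "ot", "te", "hel", "hol",
        "hot", "oot", "ott", "tel", "hoot", "toot"}
print(closest_word(pred, ["hotel", "hoot", "hollow"]))  # hotel
```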

Slide 25: Summary

    Size:
      word classifier:    1918 MB
      N-gram classifier:   667 MB
      char classifier:     485 MB

    Accuracy (SVT, 90k lexicon):
      word classifier:    83.59%
      char classifier:    72.91%
      N-gram classifier:  60.22%

    Test runtime (seconds per word):
      char classifier:                0.1
      word classifier:                0.25
      N-gram classifier:              3.5
      char classifier w/ correction:  9.5

Slide 26: Summary
- The N-gram classifier is not useful: it is worse than the word classifier on the more general 90k lexicon, and the paper only reports it on the smaller 50-word lexicon. Neither is the char classifier. (But they may still be useful for recognizing arbitrary character sequences outside the lexicon.)
- The word classifier is both fast and accurate, but it's big.
(same charts as above)
