Books Discussion:

Natural Language Processing with Python

Topic: Books
  1. #1
    Djug, Senior distinguished expert
    Male, 37; Algeria; member since May 2007; 2,980 posts; 17,970 points
    Natural Language Processing with Python
    Hello,

    The DVP editorial team has read the following book for you: Natural Language Processing with Python, by Steven Bird, Ewan Klein, and Edward Loper.


    Quote (from the publisher's summary):
    This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. With it, you'll learn how to write Python programs that work with large collections of unstructured text. You'll access richly annotated datasets using a comprehensive range of linguistic data structures, and you'll understand the main algorithms for analyzing the content and structure of written communication.

    Packed with examples and exercises, Natural Language Processing with Python will help you:
    • Extract information from unstructured text, either to guess the topic or identify "named entities"
    • Analyze linguistic structure in text, including parsing and semantic analysis
    • Access popular linguistic databases, including WordNet and treebanks
    • Integrate techniques drawn from fields as diverse as linguistics and artificial intelligence


    This book will help you gain practical skills in natural language processing using the Python programming language and the Natural Language Toolkit (NLTK) open source library. If you're interested in developing web applications, analyzing multilingual news sources, or documenting endangered languages -- or if you're simply curious to have a programmer's perspective on how human language works -- you'll find Natural Language Processing with Python both fascinating and immensely useful.
    Have you read it? Are you planning to read it soon?

    What is your opinion?

    Speak up! Your opinion matters to us.

  2. #2
    Franck Dernoncourt, Distinguished member
    Male, 36; PhD student in AI @ MIT (education sector); Paris, France (Île de France); member since April 2010; 894 posts; 2,464 points
    Here is a list of definitions I found interesting in this book (page references are given as book page number / PDF page number):
    • hypernym/hyponym relation, i.e., the relation between superordinate and subordinate concepts (p69 / 90)
    • Another important way to navigate the WordNet network is from items to their components (meronyms) or to the things they are contained in (holonyms) (p70 / 91)
    • the same dictionary word (or lemma) (p104 / 125)
    • strip off any affixes, a task known as stemming. (p107 / 128)
    • Tokenization is the task of cutting a string into identifiable linguistic units that constitute a piece of language data (p109 / 130)
    • Tokenization is an instance of a more general problem of segmentation. (p112 / 133)
    • The %s and %d symbols are called conversion specifiers (p118 / 139)
    • The process of classifying words into their parts-of-speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts-of-speech are also known as word classes or lexical categories. The collection of tags used for a particular task is known as a tagset. Our emphasis in this chapter is on exploiting tags, and tagging text automatically. (p179 / 200)
    • As n gets larger, the specificity of the contexts increases, as does the chance that the data we wish to tag contains contexts that were not present in the training data. This is known as the sparse data problem, and is quite pervasive in NLP. As a consequence, there is a trade-off between the accuracy and the coverage of our results (and this is related to the precision/recall trade-off in information retrieval) (p205 / 226)
    • A convenient way to look at tagging errors is the confusion matrix. It charts expected tags (the gold standard) against actual tags generated by a tagger (p207 / 228)
    • All languages acquire new lexical items. A list of words recently added to the Oxford Dictionary of English includes cyberslacker, fatoush, blamestorm, SARS, cantopop, bupkis, noughties, muggle, and robata. Notice that all these new words are nouns, and this is reflected in calling nouns an open class. By contrast, prepositions are regarded as a closed class. That is, there is a limited set of words belonging to the class. (p211 / 232)
    • Common tagsets often capture some morphosyntactic information, that is, information about the kind of morphological markings that words receive by virtue of their syntactic role. (p212 / 233)
    • Classification is the task of choosing the correct class label for a given input. (p221 / 242)
    • The first step in creating a classifier is deciding what features of the input are relevant, and how to encode those features. For this example, we'll start by just looking at the final letter of a given name. The following feature extractor function builds a dictionary containing relevant information about a given name. (p223 / 244)
    • Recognizing the dialogue acts underlying the utterances in a dialogue can be an important first step in understanding the conversation. The NPS Chat Corpus, which was demonstrated in Section 2.1, consists of over 10,000 posts from instant messaging sessions. These posts have all been labeled with one of 15 dialogue act types, such as “Statement,” “Emotion,” “y/n Question,” and “Continuer.” (p235 / 256)
    • Recognizing textual entailment (RTE) is the task of determining whether a given piece of text T entails another text called the “hypothesis”. (p235 / 256)
    • A confusion matrix is a table where each cell [i,j] indicates how often label j was predicted when the correct label was i. (p240 / 261)
    • Numeric features can be converted to binary features by binning, which replaces them with features such as “4<x<6.” (p249 / 270)
    • Named entities are definite noun phrases that refer to specific types of individuals, such as organizations, persons, dates, and so on. The goal of a named entity recognition (NER) system is to identify all textual mentions of the named entities. This can be broken down into two subtasks: identifying the boundaries of the NE, and identifying its type. (p281 / 302)
    • Since our grammar licenses two trees for this sentence, the sentence is said to be structurally ambiguous. The ambiguity in question is called a prepositional phrase attachment ambiguity. (p299 / 320)
    • A grammar is said to be recursive if a category occurring on the left hand side of a production also appears on the righthand side of a production. (p301 / 322)
    • A parser processes input sentences according to the productions of a grammar, and builds one or more constituent structures that conform to the grammar. A grammar is a declarative specification of well-formedness—it is actually just a string, not a program. A parser is a procedural interpretation of the grammar. It searches through the space of trees licensed by a grammar to find one that has the required sentence along its fringe. (p302 / 323)
    • Phrase structure grammar is concerned with how words and sequences of words combine to form constituents. A distinct and complementary approach, dependency grammar, focuses instead on how words relate to other words. (p310 / 331)
    • A dependency graph is projective if, when all the words are written in linear order, the edges can be drawn above the words without crossing. (p311 / 332)
    • In the tradition of dependency grammar, the verbs in Table 8-3 (whose dependents Adj, NP, S, and PP are often called complements of the respective verbs, and differ from verb to verb) are said to have different valencies. (p313 / 335)
    • This ambiguity is unavoidable, and leads to horrendous inefficiency in parsing seemingly innocuous sentences. The solution to these problems is provided by probabilistic parsing, which allows us to rank the parses of an ambiguous sentence on the basis of evidence from corpora. (p318 / 339)
    • A probabilistic context-free grammar (or PCFG) is a context-free grammar that associates a probability with each of its productions. It generates the same set of parses for a text that the corresponding context-free grammar does, and assigns a probability to each parse. The probability of a parse generated by a PCFG is simply the product of the probabilities of the productions used to generate it. (p320 / 341)
    • We can see that morphological properties of the verb co-vary with syntactic properties of the subject noun phrase. This co-variance is called agreement. (p329 / 350)
    • A feature path is a sequence of arcs that can be followed from the root node (p339 / 360)
    • A more general feature structure subsumes a less general one. (p341 / 362)
    • Merging information from two feature structures is called unification. (p342 / 363)
    • The two sentences in (5) can be both true, whereas those in (6) and (7) cannot be. In other words, the sentences in (5) are consistent, whereas those in (6) and (7) are inconsistent. (p365 / 386)
    • A model for a set W of sentences is a formal representation of a situation in which all the sentences in W are true. (p367 / 388)
    • An argument is valid if there is no possible situation in which its premises are all true and its conclusion is not true. (p369 / 390)
    • In the sentences "Cyril is tall. He likes maths.", we say that he is coreferential with the noun phrase Cyril. (p373 / 394)
    • In the sentence "Angus had a dog but he disappeared.", "he" is bound by the indefinite NP "a dog", and this is a different relationship than coreference. If we replace the pronoun he by a dog, the result "Angus had a dog but a dog disappeared" is not semantically equivalent to the original sentence "Angus had a dog but he disappeared." (p374 / 395)
    • In general, an occurrence of a variable x in a formula F is free in F if that occurrence doesn’t fall within the scope of all x or some x in F. Conversely, if x is free in formula F, then it is bound in all x.F and exists x.F. If all variable occurrences in a formula are bound, the formula is said to be closed. (p375 / 396)
    • The general process of determining truth or falsity of a formula in a model is called model checking. (p379 / 400)
    • Principle of Compositionality: the meaning of a whole is a function of the meanings of the parts and of the way they are syntactically combined. (p385 / 406)
    • λ is a binding operator, just as the first-order logic quantifiers are. (p387 / 408)
    • A discourse representation structure (DRS) presents the meaning of discourse in terms of a list of discourse referents and a list of conditions. The discourse referents are the things under discussion in the discourse, and they correspond to the individual variables of first-order logic. The DRS conditions apply to those discourse referents, and correspond to atomic open formulas of first-order logic. (p397 / 418)
    • Inline annotation modifies the original document by inserting special symbols or control sequences that carry the annotated information. For example, when part-of-speech tagging a document, the string "fly" might be replaced with the string "fly/NN", to indicate that the word fly is a noun in this context. In contrast, standoff annotation does not modify the original document, but instead creates a new file that adds annotation information using pointers that reference the original document. For example, this new document might contain the string "<token id="8" pos="NN"/>", to indicate that token 8 is a noun. (p421 / 442)
    Another NLP dictionary available online: http://www.cse.unsw.edu.au/~billw/nlpdict.html
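    Several of the definitions quoted above (tokenization, stemming, the confusion matrix) can be illustrated with a short, self-contained Python sketch. Note this is a toy approximation for illustration only, not NLTK's actual implementation: with the book's toolkit you would use nltk.word_tokenize, nltk.PorterStemmer, and nltk.ConfusionMatrix instead.

    ```python
    import re
    from collections import Counter

    def tokenize(text):
        # Toy tokenizer: runs of word characters, plus standalone punctuation.
        return re.findall(r"\w+|[^\w\s]", text)

    def stem(word):
        # Naive affix stripping; real stemmers (e.g. Porter) apply ordered rewrite rules.
        for suffix in ("ing", "ed", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    def confusion_matrix(gold, predicted):
        # Cell (i, j) counts how often tag j was predicted when the gold tag was i.
        return Counter(zip(gold, predicted))

    tokens = tokenize("The cats were chasing mice.")
    print(tokens)  # ['The', 'cats', 'were', 'chasing', 'mice', '.']
    print([stem(t.lower()) for t in tokens])  # ['the', 'cat', 'were', 'chas', 'mice', '.']
    print(confusion_matrix(["NN", "VB", "NN"], ["NN", "NN", "NN"]))
    ```

    The confusion matrix here is just a Counter over (gold, predicted) pairs, which is the same information as the book's table laid out sparsely.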

  3. #3
    Franck Dernoncourt
    Also, for anyone interested in the topic, Stanford is launching an introductory course on natural language processing: http://www.nlp-class.org/


  4. #4
    Franck Dernoncourt
    The table of contents in detail:
    Chapter 1. Language Processing and Python 
    Section 1.1. Computing with Language: Texts and Words 
    Section 1.2. A Closer Look at Python: Texts as Lists of Words 
    Section 1.3. Computing with Language: Simple Statistics 
    Section 1.4. Back to Python: Making Decisions and Taking Control 
    Section 1.5. Automatic Natural Language Understanding 
    Section 1.6. Summary 
    Section 1.7. Further Reading 
    Section 1.8. Exercises 
    Chapter 2. Accessing Text Corpora and Lexical Resources 
    Section 2.1. Accessing Text Corpora 
    Section 2.2. Conditional Frequency Distributions 
    Section 2.3. More Python: Reusing Code 
    Section 2.4. Lexical Resources 
    Section 2.5. WordNet 
    Section 2.6. Summary 
    Section 2.7. Further Reading 
    Section 2.8. Exercises 
    Chapter 3. Processing Raw Text 
    Section 3.1. Accessing Text from the Web and from Disk 
    Section 3.2. Strings: Text Processing at the Lowest Level 
    Section 3.3. Text Processing with Unicode 
    Section 3.4. Regular Expressions for Detecting Word Patterns 
    Section 3.5. Useful Applications of Regular Expressions 
    Section 3.6. Normalizing Text 
    Section 3.7. Regular Expressions for Tokenizing Text 
    Section 3.8. Segmentation 
    Section 3.9. Formatting: From Lists to Strings 
    Section 3.10. Summary 
    Section 3.11. Further Reading 
    Section 3.12. Exercises 
    Chapter 4. Writing Structured Programs 
    Section 4.1. Back to the Basics 
    Section 4.2. Sequences 
    Section 4.3. Questions of Style 
    Section 4.4. Functions: The Foundation of Structured Programming 
    Section 4.5. Doing More with Functions 
    Section 4.6. Program Development 
    Section 4.7. Algorithm Design 
    Section 4.8. A Sample of Python Libraries 
    Section 4.9. Summary 
    Section 4.10. Further Reading 
    Section 4.11. Exercises 
    Chapter 5. Categorizing and Tagging Words 
    Section 5.1. Using a Tagger 
    Section 5.2. Tagged Corpora 
    Section 5.3. Mapping Words to Properties Using Python Dictionaries 
    Section 5.4. Automatic Tagging 
    Section 5.5. N-Gram Tagging 
    Section 5.6. Transformation-Based Tagging 
    Section 5.7. How to Determine the Category of a Word 
    Section 5.8. Summary 
    Section 5.9. Further Reading 
    Section 5.10. Exercises 
    Chapter 6. Learning to Classify Text 
    Section 6.1. Supervised Classification 
    Section 6.2. Further Examples of Supervised Classification 
    Section 6.3. Evaluation 
    Section 6.4. Decision Trees 
    Section 6.5. Naive Bayes Classifiers 
    Section 6.6. Maximum Entropy Classifiers 
    Section 6.7. Modeling Linguistic Patterns 
    Section 6.8. Summary 
    Section 6.9. Further Reading 
    Section 6.10. Exercises 
    Chapter 7. Extracting Information from Text 
    Section 7.1. Information Extraction 
    Section 7.2. Chunking 
    Section 7.3. Developing and Evaluating Chunkers 
    Section 7.4. Recursion in Linguistic Structure 
    Section 7.5. Named Entity Recognition 
    Section 7.6. Relation Extraction 
    Section 7.7. Summary 
    Section 7.8. Further Reading 
    Section 7.9. Exercises 
    Chapter 8. Analyzing Sentence Structure 
    Section 8.1. Some Grammatical Dilemmas 
    Section 8.2. What's the Use of Syntax? 
    Section 8.3. Context-Free Grammar 
    Section 8.4. Parsing with Context-Free Grammar 
    Section 8.5. Dependencies and Dependency Grammar 
    Section 8.6. Grammar Development 
    Section 8.7. Summary 
    Section 8.8. Further Reading 
    Section 8.9. Exercises 
    Chapter 9. Building Feature-Based Grammars 
    Section 9.1. Grammatical Features 
    Section 9.2. Processing Feature Structures 
    Section 9.3. Extending a Feature-Based Grammar 
    Section 9.4. Summary 
    Section 9.5. Further Reading 
    Section 9.6. Exercises 
    Chapter 10. Analyzing the Meaning of Sentences 
    Section 10.1. Natural Language Understanding 
    Section 10.2. Propositional Logic 
    Section 10.3. First-Order Logic 
    Section 10.4. The Semantics of English Sentences 
    Section 10.5. Discourse Semantics 
    Section 10.6. Summary 
    Section 10.7. Further Reading 
    Section 10.8. Exercises 
    Chapter 11. Managing Linguistic Data 
    Section 11.1. Corpus Structure: A Case Study 
    Section 11.2. The Life Cycle of a Corpus 
    Section 11.3. Acquiring Data 
    Section 11.4. Working with XML 
    Section 11.5. Working with Toolbox Data 
    Section 11.6. Describing Language Resources Using OLAC Metadata 
    Section 11.7. Summary 
    Section 11.8. Further Reading 
    Section 11.9. Exercises
    

  5. #5
    CompuTux, Club member
    Male, 45; Python and Django developer (consulting sector); Bas-Rhin, France (Alsace); member since August 2004; 82 posts; 68 points
    Very interesting! Thank you very much for this review!

    I think I will leaf through it, since I am working on a text-generation application.

    I will probably need a theoretical framework, so I will also take a look at the theory, not just the practical Python applications.

    A question comes to mind for those of you who have read it:

    Can NLTK's code be adapted and converted to PHP5?

    I do not yet know which language I will have to choose to write my application.

  6. #6
    Franck Dernoncourt
    Glad you find it interesting. To my knowledge there is unfortunately no tool for converting Python code to PHP (let alone NLTK)... However, you can call Python code from PHP without any problem.
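    As a sketch of that approach: wrap the Python side in a small command-line script and invoke it from PHP with shell_exec or proc_open, exchanging JSON. The script below uses a toy regex tokenizer so the example stays self-contained; with NLTK installed you would call nltk.word_tokenize instead. The filename tokenize_cli.py is just an illustrative choice.

    ```python
    #!/usr/bin/env python
    """Minimal tokenizer CLI, callable from PHP via e.g.
        $json = shell_exec('python tokenize_cli.py ' . escapeshellarg($text));
        $tokens = json_decode($json);
    Prints a single JSON list of tokens on stdout."""
    import json
    import re
    import sys

    def tokenize(text):
        # Stand-in for nltk.word_tokenize: words and standalone punctuation.
        return re.findall(r"\w+|[^\w\s]", text)

    if __name__ == "__main__":
        text = " ".join(sys.argv[1:])
        print(json.dumps(tokenize(text)))
    ```

    JSON on stdout keeps the PHP/Python boundary language-neutral, so the same script could later be swapped for a real NLTK pipeline without touching the PHP caller.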

  7. #7
    CompuTux
    I had initially planned to write my application in PHP5, but I may well change my mind and write it in Python. Python is indeed very good and very simple, and it deploys easily on the web.

  8. #8
    Franck Dernoncourt
    Why Python?

