python - How to make POS n-grams more effective?


I am doing text classification with a SOM, using POS n-grams as features. But it took me 2 hours just to complete the POS unigrams. I have 5000 documents, each about 300 words long. Here's my code:

  def posNgrams(s, n):
      '''Calculate POS n-grams and return a dictionary'''
      text = nltk.word_tokenize(s)
      text_tags = nltk.pos_tag(text)
      taglist = []
      output = {}
      for item in text_tags:
          taglist.append(item[1])
      for i in range(len(taglist) - n + 1):
          g = ' '.join(taglist[i:i + n])
          output.setdefault(g, 0)
          output[g] += 1
      return output
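The tagging and the counting can be separated, and the n-gram counting itself is cheap. A minimal, self-contained sketch of just the counting step, run on a hand-written (hypothetical) tag sequence:

```python
def tag_ngrams(taglist, n):
    """Count n-grams over a list of POS tags, mirroring the loop above."""
    output = {}
    for i in range(len(taglist) - n + 1):
        g = ' '.join(taglist[i:i + n])
        output[g] = output.get(g, 0) + 1
    return output

# Hand-written tag sequence, just for illustration:
print(tag_ngrams(['DT', 'JJ', 'NN', 'VBZ'], 2))
# {'DT JJ': 1, 'JJ NN': 1, 'NN VBZ': 1}
```

Timing this counting step on its own should confirm that virtually all of the 2 hours is spent inside nltk.pos_tag, not in the n-gram loop.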

I used the same method for word n-grams and it only took a few minutes. What can I do to speed up my POS n-grams? Can you give me some advice?

I am using a server with these specs (from inxi -C):

  CPU(s): 2 hexa-core Intel Xeon E5-2430 v2s (-HT-MCP-SMP-)
          cache: 30720 KB
          flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx)
  Clock Speeds: 1: 2500.036 MHz

Generally, the canonical answer would be to use batch tagging with pos_tag_sents, but it seems that isn't any faster.

Before attempting to speed up the POS tagging, try profiling the individual steps (using only 1 core):

  import time
  from nltk.corpus import brown
  from nltk import sent_tokenize, word_tokenize, pos_tag
  from nltk import pos_tag_sents

  # Load brown corpus
  start = time.time()
  brown_corpus = brown.raw()
  loading_time = time.time() - start
  print "Loading brown corpus took", loading_time

  # Sentence tokenizing corpus
  start = time.time()
  brown_sents = sent_tokenize(brown_corpus)
  sent_time = time.time() - start
  print "Sentence tokenizing corpus took", sent_time

  # Word tokenizing corpus
  start = time.time()
  brown_words = [word_tokenize(i) for i in brown_sents]
  word_time = time.time() - start
  print "Word tokenizing corpus took", word_time

  # Loading, sent_tokenize, word_tokenize all together
  start = time.time()
  brown_words = [word_tokenize(s) for s in sent_tokenize(brown.raw())]
  tokenize_time = time.time() - start
  print "Loading and tokenizing corpus took", tokenize_time

  # POS tagging one sentence at a time
  start = time.time()
  brown_tagged = [pos_tag(word_tokenize(s)) for s in sent_tokenize(brown.raw())]
  tagging_time = time.time() - start
  print "Tagging sentence by sentence took", tagging_time

  # Tagging sentences by batch using pos_tag_sents
  start = time.time()
  brown_tagged = pos_tag_sents([word_tokenize(s) for s in sent_tokenize(brown.raw())])
  tagging_time = time.time() - start
  print "Tagging sentences by batch took", tagging_time

[out]:

  Loading brown corpus took 0.154870033264
  Sentence tokenizing corpus took 3.77206301689
  Word tokenizing corpus took 13.982845068
  Loading and tokenizing corpus took 17.8847839323
  Tagging sentence by sentence took 1114.65085101
  Tagging sentences by batch took 1104.63432097

Note that pos_tag_sents was known as batch_pos_tag before NLTK 3.0.

Finally, I think you will need to consider other POS taggers to preprocess your data, or you will have to use threading to handle the POS tagging.

