elasticsearch - Applying "tag" to millions of documents, using bulk/update methods -


We have about 55,000,000 documents in our Elasticsearch instance. We have CSV files with user_ids; the largest CSV has 9M entries. Our documents use the user_id as their key, so this is convenient.

I am posting this question because I want to discuss the best option, as there are different ways to approach this problem. We need to add a new "label" to a user's document if it does not have it yet, for example tagging the user with "stackoverflow" or "github".

  1. There is the classic update endpoint. It seems very slow, because we would need to iterate over more than 9M user_ids and issue an API call for each of them.
  2. There is the bulk request, which provides somewhat better performance but is limited to roughly 1,000-5,000 documents per call. Knowing when a batch is too large is something we would have to learn as we go.
  3. Then there is the update-by-query endpoint, whose open issue has a lot of traffic, but there is no confirmation that it was implemented in the standard release.
  4. That open issue mentions an update-by-query plugin, which should provide better handling, but there are old and still-open issues where users complain about performance and memory problems.
  5. I am not sure this is doable in Elasticsearch, but I thought I could load all the CSV entries into a separate index, somehow join the two indexes, and apply a script that adds the tag if it is not present yet.
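To make option 2 more concrete, here is a minimal sketch of how the CSV could be turned into bulk request bodies in fixed-size batches. It only builds the newline-delimited payloads and does not send anything. The function names, the assumption that the user_id sits in the first CSV column with no header row, and the choice of a partial "doc" update that sets a single field are all illustrative, not taken from the question. The index (and, on older versions, the type) is assumed to be given in the URL the body is posted to.

```python
import csv
import io
import json
from itertools import islice


def read_user_ids(csv_file):
    """Yield the first column of every non-empty CSV row (assumed to be the user_id)."""
    for row in csv.reader(csv_file):
        if row and row[0].strip():
            yield row[0].strip()


def chunked(iterable, size):
    """Yield lists of at most `size` items without loading everything into memory."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_update_body(user_ids, field, value):
    """Build one bulk body: an action line plus a partial-document line per id."""
    lines = []
    for uid in user_ids:
        lines.append(json.dumps({"update": {"_id": uid}}))
        lines.append(json.dumps({"doc": {field: value}}))
    # The bulk format is newline-delimited and must end with a newline.
    return "\n".join(lines) + "\n"


def bulk_bodies(csv_file, field, value, batch_size=1000):
    """Yield one bulk body per batch of user_ids read from the CSV."""
    for batch in chunked(read_user_ids(csv_file), batch_size):
        yield bulk_update_body(batch, field, value)


# Small in-memory example instead of the real 9M-row file.
sample = io.StringIO("u1\nu2\nu3\n")
bodies = list(bulk_bodies(sample, "label", "github", batch_size=2))
```

Each body would then be posted to the bulk endpoint, and the batch size tuned by watching response times and rejections. Note that a partial document overwrites the field; appending to a list of tags would need a script instead.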

So the question remains: what is the best way to do this? If you have done something like this before, please share how it performed and what you would do differently this time.

Using the update-by-query plugin mentioned above, you would simply call it with a request body like:
{"query": {"filtered": {"filter": {"not": {"term": {"tag": "github"}}}}}, "script": "ctx._source.label = \"github\""}

The update-by-query plugin only accepts a script, not a partial document.
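Since the plugin takes a script rather than a partial document, the request body is a query plus a script string. The sketch below builds such a body in Python, which avoids the quote-escaping problems of writing it by hand in a shell. The helper name is hypothetical, and the "filtered" query with a "not" filter is an assumption based on the query DSL of older Elasticsearch versions; newer versions express the same condition differently.

```python
import json


def update_by_query_body(filter_field, filter_value, script):
    """Return a JSON body that matches documents NOT having the term and runs `script` on them."""
    return json.dumps({
        "query": {
            "filtered": {
                "filter": {"not": {"term": {filter_field: filter_value}}}
            }
        },
        "script": script,
    })


# Field names "tag" and "label" follow the answer above.
body = update_by_query_body("tag", "github", 'ctx._source.label = "github"')
print(body)
```

json.dumps takes care of escaping the inner double quotes in the script, so the string can be passed as the request payload unchanged.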

As for the performance and memory issues, I think the best thing is to give it a try.

