Profile overview for mrdrozdov.
Submission statistics

This user has mostly submitted to the following subverses (showing top 5):

2 submissions to machinelearning

2 submissions to NYU

1 submission to dataisbeautiful

This user has so far shared a total of 5 links, started a total of 0 discussions and submitted a total of 4 comments.

Voting habits

Submissions: This user has upvoted 27 and downvoted 0 submissions.

Comments: This user has upvoted 2 and downvoted 0 comments.

Submission ratings

5 highest rated submissions:

Theano: Recurrent Neural Networks with Word Embeddings, submitted: 12/28/2015 9:57:21 PM, 4 points (+4|-0)

Faculty spotlight: Professor Saul Rosenberg, submitted: 12/28/2015 9:30:28 PM, 2 points (+2|-0)

NYU Google Scholar Citations, submitted: 12/28/2015 9:32:46 PM, 2 points (+2|-0)

Yelp Dataset Challenge (Deadline: 12/31/2015), submitted: 12/29/2015 5:27:45 PM, 1 point (+1|-0)

The Fallen of WWII, submitted: 12/30/2015 5:00:24 AM, 1 point (+1|-0)

5 lowest rated submissions:

Yelp Dataset Challenge (Deadline: 12/31/2015), submitted: 12/29/2015 5:27:45 PM, 1 point (+1|-0)

The Fallen of WWII, submitted: 12/30/2015 5:00:24 AM, 1 point (+1|-0)

Faculty spotlight: Professor Saul Rosenberg, submitted: 12/28/2015 9:30:28 PM, 2 points (+2|-0)

NYU Google Scholar Citations, submitted: 12/28/2015 9:32:46 PM, 2 points (+2|-0)

Theano: Recurrent Neural Networks with Word Embeddings, submitted: 12/28/2015 9:57:21 PM, 4 points (+4|-0)

Comment ratings

3 highest rated comments:

Why Java? Tales from a Python Convert : sookocheff.com submitted by el_cordoba to programming

mrdrozdov, 2 points (+2|-0)

"Use the right tool for the right job."

Praise be!!! Taxi TVs Could Soon be Turned Off, TLC Says submitted by Empire_of_the_mind to newyork

mrdrozdov, 0 points (+0|-0)

Cash Cab? Is that still around?

3 lowest rated comments:

Praise be!!! Taxi TVs Could Soon be Turned Off, TLC Says submitted by Empire_of_the_mind to newyork

mrdrozdov, 0 points (+0|-0)

Cash Cab? Is that still around?

New to machine learning, want to dive in submitted by waratte to machinelearning

mrdrozdov, 0 points (+0|-0)

I'd recommend starting with Natural Language Processing (NLP). Start by learning the various NLP tasks (part-of-speech tagging, language modeling, word embeddings, translation, semantic parsing, summarization, etc.), and then experiment with some approaches to each of these problems. Typically there is a dataset associated with each of these tasks. For example, the Penn WSJ Treebank (PTB) is commonly used for part-of-speech tagging. Using the same dataset allows many researchers to compare their approaches and discuss the benefits and downsides of each one. It might seem that the best approach is the one that gets the highest accuracy (after training on the training set of the PTB, achieves the best accuracy on the test set of the PTB), but this is not always the case, since some algorithms take longer to train or require more resources. One of the most important aspects of an approach is how well it generalizes. In other words, the training set does not contain every example that you'd ever see, so how accurate will you be on data that you've never seen before?
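To make the train/test idea concrete, here is a minimal sketch of a "most frequent tag" baseline for part-of-speech tagging. Everything in it is an assumption for illustration: the tiny hand-made sentences stand in for a real dataset like the PTB, and the helper names `train_baseline` and `accuracy` are made up. It is not how any particular paper or library does it.

```python
from collections import Counter, defaultdict

# Toy, hand-made tagged sentences (NOT the Penn Treebank); purely illustrative.
train = [
    [("the", "DT"), ("dog", "NN"), ("chases", "VBZ"), ("the", "DT"), ("cat", "NN")],
    [("cat", "NN"), ("food", "NN"), ("smells", "VBZ")],
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
]
test = [
    [("the", "DT"), ("cat", "NN"), ("barks", "VBZ")],  # every word was seen in training
    [("a", "DT"), ("bird", "NN"), ("sings", "VBZ")],   # no word was seen in training
]

def train_baseline(sentences):
    """Learn each word's most frequent tag, plus a fallback tag for unseen words."""
    counts = defaultdict(Counter)
    tag_totals = Counter()
    for sent in sentences:
        for word, tag in sent:
            counts[word][tag] += 1
            tag_totals[tag] += 1
    word_to_tag = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    default_tag = tag_totals.most_common(1)[0][0]  # overall most common tag
    return word_to_tag, default_tag

def accuracy(sentences, word_to_tag, default_tag):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in sentences:
        for word, gold in sent:
            pred = word_to_tag.get(word, default_tag)
            correct += pred == gold
            total += 1
    return correct / total

word_to_tag, default_tag = train_baseline(train)
train_acc = accuracy(train, word_to_tag, default_tag)
test_acc = accuracy(test, word_to_tag, default_tag)
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

On this toy data the tagger is perfect on the sentences it trained on and noticeably worse on the test sentences, because words it has never seen all fall back to the default tag. That gap is the generalization problem in miniature.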

There is clearly much more to learn. Here are two papers that discuss part-of-speech tagging and some useful approaches:

  1. http://www.aclweb.org/anthology/W02-1001
  2. http://nlp.stanford.edu/~manning/papers/tagging.pdf

Natural Language Toolkit (NLTK) is an excellent learning resource for these sorts of tasks. It is written in Python and has many algorithms already implemented, plus there is a free textbook. I would share a link, but it is easy enough to Google. One caveat is that many of the implemented algorithms are not as fast or as accurate as what you could implement yourself once you have a better grasp of the theory. Nonetheless, I'd still recommend it, and many people use it in production.

It's worth noting that not all approaches in NLP rely on machine learning, but many if not all of the recent best approaches do.