Tuesday, December 24, 2013

Amazon EC2 Education Grants - ML Focus - January 10 Deadline Approaching

I just learned from Guy Ernest that Amazon AWS is setting up a new grant program with a focus on machine learning. The nearest submission deadline is January 10th. Don't forget to mention I sent you!

From their call:
"AWS in Education is proud to support selected research projects in machine learning and big data analysis with grants that offer free access to AWS infrastructure services. We are particularly interested in supporting novel applications in the area of distributed data transformation, feature discovery and feature selection, large-scale and/or online classification, regression, recommendation and clustering learning as well as structure discovery. "

Friday, December 20, 2013

5th Workshop on Graph Data Management (GDM) - April 4, 2014

I got this from Sherif Sakr (NICTA):
Recently, there has been a lot of interest in the application of graphs in different domains. They have been widely used for data modeling of different application domains such as chemical compounds, multimedia databases, protein networks, social networks and semantic web. With the continued emergence and increase of massive and complex structural graph data, a graph database that efficiently supports elementary data management mechanisms is crucially required to effectively understand and utilize any collection of graphs.
The overall goal of the workshop is to bring people from different fields together, exchange research ideas and results, and encourage discussion about how to provide efficient graph data management techniques in different application domains and to understand the research challenges of such area.
  • Paper submission deadline: November 25, 2013 (Extended)
  • Author Notification: December 25, 2013
  • Final Camera-ready Copy Deadline: January 5, 2014
  • Workshop: April 4, 2014

Wednesday, December 18, 2013

WWW 2014 Workshop on Big Graph Mining Announced

BGM is a full-day workshop organized in conjunction with the
23rd International World Wide Web Conference (WWW) in Seoul, Korea on April 7.
Paper submission deadline is January 7, 2014.

http://poloclub.gatech.edu/bgm2014/


* Organizers

U Kang, KAIST
Leman Akoglu, Stony Brook University
Polo Chau, Georgia Tech
Christos Faloutsos, Carnegie Mellon University


* Workshop Goals

We aim to bring together researchers and practitioners to address
various aspects of graph mining in this new era of big data, such as new
graph mining platforms, theories that drive new graph mining techniques,
scalable algorithms and visual analytics tools that spot patterns and anomalies,
applications that touch our daily lives, and more. Together, we explore and
discuss how these important facets are advancing in this age of big graphs.


* Topics of Interest include, but are not limited to

- Scalable graph mining, e.g., parallelized, distributed
- Heterogeneous graph analysis
- Complex network analysis
- Graph mining platforms, libraries, and databases
- Interactive/human-in-the-loop graph mining
- Online graph mining algorithms
- Visual analytics and visualization of large graphs
- Analysis of streaming/dynamic/time-evolving graphs
- Machine learning on graphs
- Community detection
- Graph sampling
- Spectral graph analysis
- Social network analysis
- Biological network analysis
- Anomaly detection in graphs
- Active learning / mining
- Theoretical/complexity analysis of graph mining
- Demonstrations of graph mining applications
- Applications of graph mining methods on real-world problems


* Important Dates


Submission   : Fri, Jan 17, 2014 (23:59 Hawaii Time)
Acceptance   : Mon, Feb 3, 2014
Camera-ready : Mon, Feb 10, 2014
Workshop     : Mon, Apr 7, 2014

* Submission Information

We welcome many kinds of papers such as novel research papers, demo
papers, work-in-progress papers, and visionary (white) papers.
All papers will be peer reviewed, single-blinded.

Authors should clearly indicate in their abstracts the kinds of
submissions that the papers belong to, to help reviewers better
understand their contributions.

Submissions should be in PDF, written in English, with a maximum of 6 pages.
Shorter papers are welcome.
Format your paper using the standard double-column ACM Proceedings Style
 http://www.acm.org/sigs/publications/proceedings-templates

Submit at EasyChair:
 http://www.easychair.org/conferences/?conf=bgm14www

At least one author of an accepted paper must attend the workshop to present
the work. Accepted papers will be published through the ACM Digital Library.

If you plan to extend your workshop paper submitted to our BGM'14 workshop,
and submit that extended work to future WWW conferences, please note the
following message from the workshop co-chairs:
"Any paper published by the ACM, IEEE, etc. which can be properly cited
constitutes research which must be considered in judging the novelty of
a WWW submission, whether the published paper was in a conference,
journal, or workshop. Therefore, any paper previously published as part
of a WWW workshop must be referenced and extended with new content to
qualify as a new submission to the Research Track at the WWW conference."


* Further information and Contact

Website: http://poloclub.gatech.edu/bgm2014/
Email: bgm14www (at) gmail.com

Interesting Sentiment Analysis Demo from Stanford

Following my previous blog post on etcml, I stumbled upon the
related demo: http://nlp.stanford.edu:8080/sentiment/rntnDemo.html

And here is the abstract:
This website provides a live demo for predicting the sentiment of movie reviews. Most sentiment prediction systems work just by looking at words in isolation, giving positive points for positive words and negative points for negative words and then summing up these points. That way, the order of words is ignored and important information is lost. In contrast, our new deep learning model actually builds up a representation of whole sentences based on the sentence structure. It computes the sentiment based on how words compose the meaning of longer phrases. This way, the model is not as easily fooled as previous models. 


Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher Manning, Andrew Ng and Christopher Potts. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Conference on Empirical Methods in Natural Language Processing (EMNLP 2013).
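
To make the contrast in the abstract concrete, here is a minimal sketch of the word-counting approach it criticizes. This is not the Stanford model; the lexicon and the function name are made up for this example. It only shows why ignoring word order loses information.

```python
# Tiny hand-made lexicon, assumed only for this illustration.
LEXICON = {"good": 1, "great": 1, "funny": 1, "bad": -1, "boring": -1, "dull": -1}


def bag_of_words_score(sentence):
    """Sum per-word sentiment points; word order plays no role."""
    return sum(LEXICON.get(word, 0) for word in sentence.lower().split())


# Two sentences with opposite meaning but exactly the same words.
a = "the movie was good not bad"
b = "the movie was bad not good"
print(bag_of_words_score(a), bag_of_words_score(b))
```

Both sentences get the same score because they contain the same words, which is exactly the weakness a model that composes meaning along the sentence structure is meant to address.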

Tuesday, December 17, 2013

Grades 2014 workshop announced

I am honored to serve as a program committee member at Grades 2014 - Graph data management and experience workshop, to be held jointly with SIGMOD. The workshop will take place on June 22, 2014 at the Snowbird resort in Utah, USA.

Open positions for big data research at Georgia Tech

I got this from my colleague Polo Chau:

===== Open Positions for Postdoctoral Fellows and Research Scientists at Georgia Tech =====
Applications are sought for multiple postdoctoral/research scientist positions in the School of Computational Science and Engineering within the College of Computing at Georgia Institute of Technology. The positions are in the broad area of next generation sequencing and high performance computing/bigdata analytics. Successful candidates will have a strong background in the following areas: bioinformatics, next-generation sequencing, string/graph/parallel algorithms, and writing large, complex HPC software. Candidates who have a subset of these skills with strong interest in acquiring the others are encouraged to apply. All positions are funded by NSF/NIH grants, and can be continued for at least three years contingent on satisfactory annual progress. A minimum of two year commitment is required.

Successful candidates will join a vibrant group in bioinformatics and high performance computing that is engaged in interdisciplinary research with multiple collaborators and industrial partners. Georgia Tech offers competitive benefits and retirement programs. Interested candidates should contact Prof. Srinivas Aluru via email to aluru@cc.gatech.edu.

Srinivas Aluru, Professor
School of Computational Science and Engineering
College of Computing
Georgia Institute of Technology
Klaus Advanced Computing Building
266 Ferst Drive, Atlanta GA 30332

Ph: 404-385-1486    Fax: 404-385-7337
Email: aluru@cc.gatech.edu
URL: http://www.cc.gatech.edu/~saluru

Richard Socher - etcml

I got this from both my colleague Chris DuBois and my friend Sagie Davidovich.
Richard Socher, a PhD student at Stanford, has released an interesting new tool for text classification:
http://www.etcml.com/

Monday, December 16, 2013

Data Science Salaries

I got the following email from one of my readers, Frank Lo:

Danny - I really like your blog. I'm a data scientist myself and work within machine learning frameworks every day; I'm always perusing to see what others in the space have to say.

I wanted to share a link and see if you had any interest in mentioning it in a blog post:

Data Science Salary Research

I bet your readers are curious trying to figure out ML/data analysis more clearly as a profession, which is why I wanted to mention it to you. What should people with ML/data science skills expect for salaries when they apply their expertise in industry?

Of course I know the topic is a little different from your regular postings, but I thought it's still very relevant to your base of readers.



The above blog post seems conservative, since it reports a very wide range of salaries. I suggest ignoring the lower end and focusing on the higher end, especially in the Bay Area.

Tuesday, December 10, 2013

Ervin Peretz - JSMapreduce

Here is an interesting talk I got from Ervin Peretz:
JSMapreduce is a wrapper that tries to make writing and deploying map-reduce code easier.

Sunday, December 8, 2013

Microsoft AdPredictor (Click Through Rate Prediction) is now implemented in GraphLab!

A couple of years ago, we competed in KDD CUP 2012 and took 4th place. We used Microsoft's AdPredictor as one of three useful models in this competition, as described in our paper: Xingxing Wang, Shijie Lin, Dongying Kong, Liheng Xu, Qiang Yan, Siwei Lai, Liang Wu, Guibo Zhu, Heng Gao, Yang Wu, Danny Bickson, Yuanfeng Du, Neng Gong, Chengchun Shu, Shuang Wang, Fei Tan, Jun Zhao, Yuanchun Zhou, Kang Liu. Click-Through Prediction for Sponsored Search Advertising with Hybrid Models. In ACM KDD CUP workshop 2012.

AdPredictor is described in the paper:
Graepel, Thore, et al. "Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine." Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010.

I looked for an open source implementation of AdPredictor and could not find a single one. That is not surprising, considering that several companies use it in production for predicting ad CTR (click-through rate). So I decided to spend a fun weekend implementing AdPredictor for GraphLab. The code is available here.

In a nutshell, AdPredictor learns a linear model with a probit link function (Bayesian probit regression).
The input to the algorithm is a set of observations of the form
-1 3:1 4:1 6:1 9:1
1 4:1 5:1 18:1 19:1
...
where the first field is the action: -1 (did not click) or 1 (clicked). The remaining fields are feature:value pairs, one for each binary feature that is present.
The output of the algorithm is a weight for each feature. When a new ad comes in, we simply sum up the weights of its matching features. If the sum is smaller than zero, the prediction is -1; otherwise it is 1.
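
The input format and the prediction rule described above can be sketched in a few lines of Python. This is only an illustration, not the GraphLab code: the function names and the weight values are made up for the example, and treating a sum of exactly zero as a click is my own assumption.

```python
def parse_line(line):
    """Parse one libsvm-style observation into (label, {feature_id: value})."""
    fields = line.split()
    label = int(fields[0])
    features = {}
    for pair in fields[1:]:
        idx, val = pair.split(":")
        features[int(idx)] = float(val)
    return label, features


def predict(weights, features):
    """Sum the weights of the active features and return the sign as -1 or 1."""
    score = sum(weights.get(idx, 0.0) * val for idx, val in features.items())
    # Assumption: a score of exactly zero is treated as a click (1).
    return -1 if score < 0 else 1


# Hypothetical learned weights, chosen only for illustration.
weights = {3: -0.4, 4: 0.1, 5: 0.3, 6: -0.2, 9: -0.1, 18: 0.2, 19: 0.1}

for line in ["-1 3:1 4:1 6:1 9:1", "1 4:1 5:1 18:1 19:1"]:
    label, features = parse_line(line)
    print("observed:", label, "predicted:", predict(weights, features))
```

Features that have no learned weight contribute zero to the sum, so unseen features do not change the prediction in this sketch.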

You are welcome to download graphlab from http://graphlab.org and try it out!

Adpredictor takes files in libsvm format. You should prepare a subfolder containing the training file and a validation file (the validation file name needs to end with .validate).
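
As a rough sketch of preparing such a folder, the Python snippet below writes a tiny training file and a validation file in libsvm format. The helper name and the file names "train" and "train.validate" are my own choices for illustration; the only requirement taken from above is the .validate suffix. A temporary directory stands in for the folder you would pass to adpredictor.

```python
import os
import tempfile


def write_libsvm(path, observations):
    """Write (label, feature_ids) pairs as libsvm lines with binary features."""
    with open(path, "w") as f:
        for label, feature_ids in observations:
            pairs = " ".join(f"{i}:1" for i in sorted(feature_ids))
            f.write(f"{label} {pairs}\n")


# Temporary directory used here as a stand-in for the input folder.
folder = tempfile.mkdtemp()

# Hypothetical file names; only the ".validate" suffix comes from the post.
train_path = os.path.join(folder, "train")
validate_path = os.path.join(folder, "train.validate")

write_libsvm(train_path, [(-1, [3, 4, 6, 9]), (1, [4, 5, 18, 19])])
write_libsvm(validate_path, [(1, [4, 5, 18])])
print(sorted(os.listdir(folder)))
```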

You can run adpredictor using the command:
./adpredictor --matrix=folder/ --max_iter=10 --beta=1

As always, let me know if you try it out!


Amnon Shashua's TED Talk

Amnon is a professor at the Hebrew University of Jerusalem. His latest startup is one of the companies that really help humankind:

Saturday, December 7, 2013

Wednesday, December 4, 2013

Recommender Systems Course from GroupLens

I got the following course link from my colleague Tim Muss. The GroupLens research group (Univ. of Minnesota) has released a Coursera course about recommender systems. Joseph Konstan and Michael Ekstrand are lecturing. Readers of my blog with an elephant's memory will recall that I wrote about the LensKit project two years ago, when I interviewed Michael Ekstrand.