Tuesday, November 15, 2011

Useful Dataset Sites

Dataset Lists

Dublin Dashboard
http://www.dublindashboard.ie/pages/index

Quandl.com
http://www.quandl.com

KDnuggets: Datasets for Data Mining
http://www.kdnuggets.com/datasets/

Google: Dataset Directory
http://www.google.com/publicdata/directory

opendata.ie
http://opendata.ie/

StatLib---Datasets Archive
http://lib.stat.cmu.edu/datasets/

U.S. Census Bureau
http://factfinder.census.gov/servlet/DatasetMainPageServlet

Datasets: 2010 UK Election Results
http://www.guardian.co.uk/news/datablog/2010/may/07/uk-election-results-data-candidates-seats

Computer Vision Papers Datasets
http://www.cvpapers.com/datasets.html

Dataset Analytics Vocabulary
http://vocab.sindice.net/analytics

Datasets - DNA Analytics CGH
http://www.genomics.agilent.com/GenericA.aspx?pagetype=Custom&subpagetype=Custom&pageid=2079

Datasets collected by bitly
https://bitly.com/bundles/hmason/1

MCFC Analytics
http://www.mcfc.co.uk/The-Club/MCFC-Analytics

Truthy: Information Diffusion in Online Social Networks
http://cnets.indiana.edu/groups/nan/truthy/


Datasets from Competitions

KDnuggets
http://www.kdnuggets.com/competitions/index.html

Berlin Brain-Computer Interface
http://www.bbci.de/competition/

Netflix Prize
http://www.netflixprize.com/

ACM KDD Cup
http://www.sigkdd.org/kddcup/index.php

Santa Fe Time Series Competition Data Set B
http://www.physionet.org/physiobank/database/santa-fe/

Time Series Forecasting Grand Competition for Computational Intelligence
http://www.neural-forecasting-competition.com/downloads/NN5/datasets/download.htm

PAN 2012 - Uncovering Plagiarism, Authorship and Social Software Misuse
http://www.uni-weimar.de/medien/webis/research/events/pan-12/pan12-web/authorship.html


Brian Mac Namee recommends:

UC Irvine Machine Learning Repository
http://archive.ics.uci.edu/ml/

Central Statistics Office Ireland
http://cso.ie (also check out the 2011 census data)

InfoChimps: Find data for apps & analytics
http://www.infochimps.com/

Data in Gapminder World
http://www.gapminder.org/data/

Welcome to the London Datastore
http://data.london.gov.uk/

Kaggle
http://www.kaggle.com/

U.S. data.gov
http://www.data.gov/ 

Last.fm Music Preferences Data
http://denoiserthebetter.posterous.com/music-recommendation-datasets

Opinion Mining, Sentiment Analysis, and Opinion Spam Detection
http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

A Big List From Mahout
https://cwiki.apache.org/MAHOUT/collections.html

What Is Data in Literary Studies?
http://arcade.stanford.edu/content/what-data-literary-studies-1


Brendan Spillane recommends:

IMDB - Alternative Interfaces
http://www.imdb.com/interfaces

Export.ly: Export your data from social media as Excel or CSV
http://www.makeuseof.com/dir/exportly-export-your-data/


Garrett Duffy recommends:

LOGD Dataset Catalog
http://logd.tw.rpi.edu/datasets


Alan Cooke recommends:

DBpedia
http://dbpedia.org/About

DataSift
http://datasift.com/


Colman McMahon recommends:

30 Places to Find Open Data on the Web
http://blog.visual.ly/data-sources/

Finding Data on the Internet
http://www.inside-r.org/howto/finding-data-internet

WikiViz
http://www.wikiviz.org/wiki/Data_sources

Forbes: Special Report - Data Driven
http://www.forbes.com/special-report/data-driven.html
