Big data datasets (large dataset examples)

By alvin ~ Posted Fri, 02/10/2012 - 19:53

When you first start working with MapReduce, Hadoop, MongoDB, or any other NoSQL approach, you might need some good sample big data datasets. Fortunately, those are pretty easy to find these days.

As I worked through some Hadoop and MongoDB tutorials last year, I made notes of the big data datasets I kept encountering, and jotted down their URLs. I just ran across my notes again, and thought I'd share the information.

Here then is a collection of publicly available big data datasets you can use in your own tests and examples:

The Quora website has a list of large, publicly available datasets.

A website named BigFastBlog has a list of large datasets.

Depending on your specific needs related to MapReduce, Hadoop, MongoDB, or NoSQL in general, hopefully some of those "big data" datasets will be helpful.

As usual, reporting live from Boulder, Colorado, this is Alvin Alexander of Valley Programming. (You can find more information about me as Alvin Alexander on Twitter, as well as thousands of programming tutorials on devdaily.com.)

Valley Programming is currently a one-person business, owned and operated by Alvin Alexander. If you’re interested in anything you read here, feel free to contact me at “al” at (“@”) this website name (“valleyprogramming.com”), or at the phone number shown below. I’m just getting back to business here in January, 2026, and eventually I’ll get a contact form set up here, but until then, I hope that works.
