“Big data” and “data science” are among the buzzwords of our era, perhaps second only to “machine learning” and “artificial intelligence.” In our globalized, Internet-connected society, information is plentiful, and data has become perhaps the most important commodity of all. Across academic disciplines, working with large amounts of data has become a necessity: universities and corporations advertise positions for “data scientists,” and media outlets warn ominously of the privacy risks associated with the rise of “big data.”
This article doesn’t discuss the broader societal implications of “big data,” although I highly encourage all readers to learn more about this important topic. Instead, I’m here purely to offer some (hopefully) broadly applicable tips for working with large amounts of data in any academic context.
In my own field of climate science, data is paramount: researchers work with gigantic databases and arrays containing millions of elements (e.g., records of how climate variables such as temperature or precipitation change over both space and time). But data, and opportunities for working with data, are present in every field, from operations research to history. Below is an overview of some existing data operation tools that can hopefully assist you in your budding data science career!