Data analysis and visualisation: Analysis Tools

Analysis tools available at the University of Sydney

The University of Sydney provides licences to some commercial software packages for staff and students. Please visit the lists of software available to staff and software available to students through the university to see what software you are eligible to access, and for information on how to obtain access to the available packages.

Tools specific to data analysis that are available at the University are listed and described below. Keep in mind that analysis and visualisation are often overlapping activities, so be sure to check both the analysis and visualisation sections to ensure that you don’t miss the ideal tool for your data!

Qualitative Data Analysis

NVivo – Allows you to handle rich, text-based information where deep levels of analysis on both small and large volumes of data are required. It removes many of the manual tasks associated with analysis, like classifying, sorting and arranging information, so you have more time to explore trends, build and test theories and ultimately arrive at answers to questions. Get NVivo (staff and students).

Spreadsheets

Microsoft Excel – A spreadsheet application that is part of the Microsoft Office 365 package made available by the University. Excel uses a grid of cells to organise data manipulations and arithmetic, and offers graphing tools, pivot tables and a macro programming language. Get Excel as part of Office 365 (staff) or get Excel (students).

Statistical Analysis

GraphPad Prism – Combines scientific graphing, comprehensive curve fitting (nonlinear regression), understandable statistics, and data organisation. While it won't replace a heavy-duty statistics program, Prism lets you easily perform basic statistical tests commonly used by laboratory and clinical researchers. GraphPad Prism is not available for installation on student-owned devices. Get GraphPad Prism (staff).

GenStat – One of the statistics packages made available by the University, GenStat is a data analysis tool used to manage and illustrate your data, summarise and compare, model relationships, design investigations and analyse your experiments, from the simplest ANOVA right through to the most complex REML. Get GenStat (staff) or get GenStat (students).

Mathematica – One of the statistics packages made available by the University, Mathematica is a computational software program used in scientific, engineering, and mathematical fields and other areas of technical computing. Get Mathematica (staff and students).

SAS – One of the statistics packages made available by the University, SAS (Statistical Analysis System) is an analytical, data manipulation application with reporting capabilities. It provides tools to master the four data-driven tasks common to virtually any application: data access, management, analysis and presentation—all within a powerful applications development environment. Get SAS (staff) or get SAS (students).

SPSS and AMOS – One of the statistics packages made available by the University, SPSS offers you the ability to analyse large datasets efficiently and easily and uncover unexpected relationships in the data using an intuitive visual interface. The AMOS application provides you with structural equation modelling (SEM) software, which allows you to create more realistic models than if you used standard multivariate statistics or multiple regression models alone. Get SPSS or AMOS (staff) or access SPSS (students) through the UniConnect Cloud portal using the Citrix Workspace app.

Programming Languages and Tools

MATLAB – A numerical computing environment that uses its own MATLAB scripting language to manipulate, analyse and visualise data. A large number of toolboxes exist that extend its functionality. Get MATLAB (staff and students).

Geospatial Data

ArcGIS – ArcGIS is a geographic information system that allows you to analyse and visualise geospatial data. It allows you to create maps, georeference data, analyse mapped data, visualise geospatial data, and manage geographic information in a database. Get ArcGIS (staff) or get ArcGIS (students).

Chemistry, Biology and Molecule Visualisation and Analysis

Cambridge Structural Database System – The CSDS is a powerful suite of software tools that allow you to explore, utilise, analyse and visualise the data in the Cambridge Structural Database, a repository of small molecule crystal structures. Access the CSDS (staff and students).

ChemOffice – A chemistry and biology drawing and analysis suite. ChemOffice provides you with a collection of applications for chemical structure drawing and analysis combined with biological pathway drawing. Get ChemOffice (staff and students).

Freely available analysis tools

In addition to commercial software, a host of open-source and/or freely available tools exist for data analysis. We have collected a short list of some of the more widely used or easy-to-use tools that are available. Keep in mind that analysis and visualisation are often overlapping activities, so be sure to check both the analysis and visualisation sections to ensure that you don’t miss the ideal tool for your data!

Data Cleaning

OpenRefine – A powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.

Talend Data Preparation – A data cleaning tool similar to OpenRefine that has some extra features, such as greater auto-detection of data types and useful functions, but can be more complex to use. There are both free and paid versions available.

Qualitative Data Analysis

Lexos – A web-based tool for transforming, analysing, and visualising texts. It is designed primarily for small to medium-sized text collections, and especially for ancient languages and languages that do not employ the Latin alphabet. Lexos was created as an entry-level platform for Humanities scholars and students new to computational techniques, while providing tools and techniques sophisticated enough for advanced research.

Statistical Analysis

JASP – An open-source statistical analysis software package that features popular classical analysis tools such as ANOVA and regression, but also contains their Bayesian counterparts. It's designed to be easy to use with a spreadsheet layout and a drag-and-drop interface, very similar to SPSS.

Text and Data Mining

Voyant Tools – A web application for performing simple text mining and analysis. You can upload, link to, or copy and paste text documents directly into the web browser. The basic analysis includes a word cloud, word counts, word frequencies, and word trends.
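To give a sense of what the word counts and word frequencies in such a basic analysis involve, here is a minimal Python sketch using only the standard library. It is not how Voyant Tools is implemented; the function name word_frequencies and the sample sentence are hypothetical, and the tokenisation is deliberately simplistic.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (counts, relative_frequencies) for the words in text.

    Hypothetical helper for illustration; not part of Voyant Tools.
    """
    # Simplification: lowercase the text and keep runs of ASCII letters
    # and apostrophes. Real tools handle many more scripts and edge cases.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    # Relative frequency = occurrences of a word / total number of words.
    freqs = {word: n / total for word, n in counts.items()} if total else {}
    return counts, freqs


if __name__ == "__main__":
    # Made-up sample text.
    sample = "The cat sat on the mat. The mat was flat."
    counts, freqs = word_frequencies(sample)
    for word, n in counts.most_common(3):
        print(f"{word}: {n} ({freqs[word]:.0%})")
```

A word cloud is essentially a visual rendering of the same counts, with more frequent words drawn larger.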

Overview – A document mining application designed to visualise and analyse sets of documents, from dozens to millions of pages of material. It includes built-in OCR (optical character recognition), document annotation, word clouds, entity detection, and topic-based document clustering. Overview can be run through the browser or can be installed on your local machine.

Rattle – A free, open-source graphical user interface for data mining using R. It presents statistical and visual summaries of data, transforms data so that it can be readily modelled, builds both unsupervised and supervised machine learning models from the data, presents the performance of models graphically, and scores new datasets for deployment into production. Interactions through the graphical user interface are captured as an R script that can be readily executed in R independently of the Rattle interface.

Orange – A free, open-source graphical user interface for data mining and machine learning using Python. No programming knowledge is necessary, as the visual programming interface allows you to drag-and-drop widgets and connect them up to create your data analysis workflows.

Programming Languages and Tools

R – A programming language and software environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques. A large number of packages that extend the functionality of R for data analysis beyond the basics are available. R does have a steep learning curve, and so may take some time to pick up for those who are new to programming.

RStudio – An integrated development environment (IDE) for R. It includes a console and a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.

Python – A general-purpose programming language with a focus on simplicity and readability that makes it relatively easy to learn for those with little programming experience. Python requires additional libraries to be usable for data analysis and visualisation (e.g. NumPy and pandas for analysis, and Seaborn or Bokeh for visualisation). Installation of Python through the Anaconda distribution is recommended, as it includes many of the most commonly used Python packages.
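As a rough illustration of what analysis with one of these libraries looks like, the following sketch uses pandas to compute a per-group average. It assumes pandas is installed (it is included in the Anaconda distribution); the data and column names are invented for the example.

```python
import pandas as pd

# Hypothetical measurements, invented for illustration only.
data = pd.DataFrame(
    {
        "group": ["control", "control", "treated", "treated"],
        "score": [4.0, 6.0, 7.0, 9.0],
    }
)

# Mean score per group: a typical first step when exploring a dataset.
group_means = data.groupby("group")["score"].mean()
print(group_means)
```

The same grouping and summarising pattern extends to larger tables loaded from files, for example with pandas' read_csv function.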

Jupyter Notebook – The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualisations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modelling, machine learning and more. The Notebook has support for over 40 programming languages, including those popular in Data Science such as Python, R, Julia and Scala.

Geospatial Data

QGIS – An open-source geographic information system that allows you to visualise, manage, edit, and analyse geospatial data, and to compose printable maps.

Aboriginal and Torres Strait Islander peoples are advised that this website may contain images, voices and names of people who have died.

The University of Sydney Library acknowledges that its facilities sit on the ancestral lands of Aboriginal and Torres Strait Islander peoples, who have for thousands of generations exchanged knowledge for the benefit of all.