Data Analytics

Chapter 2 - OVERVIEW OF MICROSOFT EXCEL, STATA, SPSS, AND POWER BI

NCI DATA ANALYTICS2024/04/23 13:17
Follow

OVERVIEW OF MICROSOFT EXCEL, STATA, SPSS, AND POWER BI

Microsoft Excel

This is an electronic spreadsheet application developed by Microsoft to solve complex problems associated with accounting, calculations and for manipulating complex Datasets. It has over 100 built-in functions in which several are useful tools for Descriptive and Quantitative Analysis (Data Analysis). Microsoft Excel is a widely used spreadsheet application developed by Microsoft. It is part of the Microsoft Office suite and is designed for tasks related to data entry, management, analysis, and visualization. Excel provides a grid-like interface where users can organize data into rows and columns, perform calculations, create charts, and generate reports.

Excel 2003 and earlier versions support up to 65,536 rows and 256 columns. But the later versions like Excel 2007 expanded the capacity, supporting up to 1,048,576 rows and 16,384 columns.

Uses of Microsoft Excel in Data Analysis?

Data Entry and Organization: Excel is used often in data entry, providing a structured environment to input and organize data efficiently.

Data Cleaning and Transformation: It facilitates the cleaning and transformation of raw data. Users can identify and rectify errors, remove duplicates and format data consistently.

Basic Statistical Analysis: Excel offers a range of built-in functions and tools for basic statistical analysis. Users of Microsoft Excel can calculate means, medians, standard deviations, and perform hypothesis testing using functions such as AVERAGE, MEDIAN, STDEV, and T.TEST.

Data Visualization: Excel enables users to create various types of charts and graphs, such as bar charts, pie charts, and line graphs. These visualizations help in understanding data patterns and trends.

Pivot Tables: Pivot Tables are powerful features in Excel for summarizing and analyzing data. They allow users to create customized tables, pivot data and generate aggregated reports quickly.

What-If Analysis: Excel's scenario manager and data tables facilitate "what-if" analysis. This is particularly useful for making predictions or exploring the impact of different variables on a model.

Regression Analysis: While Excel's regression analysis capabilities are limited compared to dedicated statistical software, it can be used for simple linear regression and correlation analysis.

Financial Analysis: Excel is widely used for financial modeling, budgeting, and forecasting. Functions like NPV, IRR, and PMT are commonly employed for financial analysis.

Data Validation: Excel offers data validation features to ensure that data entered into a spreadsheet adheres to predefined criteria, reducing errors and maintaining data integrity.

Automating Repetitive Tasks: Excel supports the use of macros and Visual Basic for Applications (VBA) to automate repetitive data processing and analysis tasks, saving time and reducing errors.

Data Export and Integration: Excel can import data from various sources and export results. It is often used to combine and integrate data from different databases, spreadsheets, and applications.

Reporting: Users can create professional reports and dashboards in Excel to communicate data analysis results effectively to stakeholders.

Microsoft Excel's versatility, accessibility, and wide range of features make it a valuable tool for data analysis tasks, particularly when dealing with smaller to moderately sized datasets and basic to intermediate analysis requirements.

Basic Terms in Microsoft Excel.

Cell: This is the intersection of rows and columns in a spreadsheet.

Cell Address: It is the combination of column heading and row number in a spreadsheet which denotes the current or active cell.

Name Box: This is a small rectangular box at the top-left corner of the Excel spreadsheet below the menu bar, which displays the cell address of an active cell.

Formula Bar: It is situated right-ward beside the Name Box above the spreadsheet. Formula Bar displays the information contained in a cell or the information being entered into an active cell. It shows the content of an active cell whether it is a text, number or formula (functions).

Active Cell: This is a term used to describe a cell that is selected for an action. It is denoted by a black rectangular box showing a cell that is ready for editing or entering in of information.


STATA

The name "STATA" is a syllabic abbreviation of the words "Statistics" and "Data."

STATA is a powerful and widely used software package for data analysis, statistics, and data management. It is popular in academic, research, and professional settings and is especially well-suited for handling large and complex datasets.

Early Development (1980s): The first version of STATA, STATA 1.0, was released in 1985. It was initially developed by a group of economists who sought a comprehensive and flexible tool for data analysis and econometrics. The name "STATA" is a syllabic abbreviation of the words "Statistics" and "Data."

Growth and Expansion (1980s-1990s): STATA quickly gained popularity among researchers, especially in the field of economics and social sciences. It underwent several updates and expansions during this period, becoming more sophisticated and versatile.

Graphics and Data Management (1990s-2000s): In the 1990s, STATA introduced significant improvements in graphics and data management capabilities. These enhancements allowed researchers to explore and visualize data more effectively.

STATA 10 (2007): STATA 10, released in 2007, marked a major milestone with the introduction of the "long strings" feature, which allowed handling longer text variables and more extensive data.

Interactivity and Automation (2010s): Recent versions of STATA have focused on enhancing interactivity and automation. Features like the graphical user interface (GUI), do-files, and ado-files have made it easier for users to conduct complex data analysis and automate repetitive tasks.

Current State (2020s): STATA continues to be at the forefront of data analysis and statistical software. It is regularly updated to keep pace with emerging research and analysis needs. It is used across various disciplines, including economics, sociology, political science, epidemiology, and public health.

STATA is known for its robust data analysis capabilities, extensive statistical tools, and excellent support for data management. It offers a versatile environment for conducting basic and advanced statistical analysis, regression analysis, hypothesis testing, data manipulation, and data visualization.

One of STATA's key strengths is its capacity to handle large datasets and its adaptability to a wide range of research and analytical needs. Researchers and analysts rely on STATA for tasks such as survey data analysis, panel data analysis, longitudinal studies, and advanced econometric modeling.

STATA also provides extensive documentation, a user community, and technical support, making it a well-supported and widely adopted tool for those engaged in empirical research and data-driven decision-making. It continues to evolve, serving as an essential resource for quantitative analysis in both academic and practical contexts.

Features of STATA Window

Command Line Interface: STATA primarily operates through a command line interface, where users can enter text-based commands to perform various tasks. This is one of the distinguishing features of STATA, and it allows for precise control over data analysis. Users can enter commands directly in the command window, and results are displayed there as well.

The general syntax of STATA command is represented as Command Var1 Var2, Option1, Option2. Note that STATA is case-sensitive and accepts only lower-case letters as it’s command. Variable names should be entered exactly as the appear on the dataset loaded into the STATA software.

Output Window: The output window displays the results of executed commands. It includes tables, graphs, summaries, and other output generated during data analysis. Users can scroll through and save these results for reporting and documentation.

Data Editor: STATA provides a data editor where users can view and edit data in a spreadsheet-like format. This is particularly useful for manually entering or modifying data. The data editor allows you to work with datasets directly and provides tools for data cleaning and transformation.

Results Viewer: The results viewer is a dedicated window for examining and managing output. It offers a structured view of the results, making it easier to navigate, save, and export tables and graphs. It provides better organization and retrieval of output than the standard output window.

Variable Manager: The variable manager is a tool for managing and editing variable properties in your dataset. Users can change variable labels, data types, formats, and more using this feature.

Review Window: The review window is useful for documenting your analysis. It allows you to record your work, including commands, comments, and explanations, making it easier to reproduce and understand your analysis at a later time. This is valuable for reproducibility and collaboration.

Data Browser: The data browser provides a quick and interactive way to inspect and explore the data. Users can filter and sort the data, view specific observations, and perform simple calculations using this tool.

Graph Editor: For data visualization, STATA offers a graph editor that allows users to customize and fine-tune graphs created using various plot and graph commands. This editor enables detailed control over the appearance and content of graphs.

DO-Files: DO-files are text files that contain a series of STATA commands. Users can create and save DO-files to automate tasks, reproduce analysis, and document the steps taken during data analysis. DO-files are useful for maintaining a record of your work.