A journal of habit-forming shenanigans
🗓️ Date: 2023-02-15
Course
While taking the course Introduction to Statistics as part of the Data Analyst in SQL track, I had the chance to review probability, distributions, the central limit theorem, correlation, and hypothesis testing. While revising dependence and conditional probabilities, I was also able to recall the normal and Poisson distributions.
I also took Introduction to SQL as part of the same curriculum, which helped me revise the basic SQL queries to read and view data from tables. Because of this revision, I learned about "VIEW," a concept I was never aware of before. To summarize, views are virtual tables whose contents are defined by queries. A view mainly lets you restrict access to the database; it does not significantly improve the performance of SQL queries. Nonetheless, it was a useful trick to have in my SQL toolbox for increasing readability.
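As a quick sketch of the idea (the table and column names here are hypothetical):

```sql
-- Create a view that exposes only non-sensitive columns
CREATE VIEW employee_directory AS
SELECT first_name, last_name, department
FROM employees;

-- Query the view exactly like a table
SELECT * FROM employee_directory
WHERE department = 'Sales';
```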
🗓️ Date: 2023-02-16
Course
Continuing on from Day 1, I chose the Intermediate SQL course from the same track, which covered queries for selecting, filtering, aggregating, sorting, and grouping. Unlike the previous time, I did not get to learn a new concept, but it was a good recollection of all these principles, particularly the conventions for writing SQL that promote readability, as I had become a little sloppy in this regard.
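To keep myself honest about those conventions, this is the style I am trying to stick to — capitalized keywords, one clause per line, and meaningful aliases (the `films` table here is hypothetical):

```sql
SELECT
  genre,
  COUNT(*) AS film_count,
  AVG(duration) AS avg_duration
FROM films
WHERE release_year >= 2000
GROUP BY genre
HAVING COUNT(*) > 10
ORDER BY avg_duration DESC
LIMIT 5;
```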
🗓️ Date: 2023-02-17
Course
I took the course Joining Data in SQL, the fifth course in the Data Analyst in SQL track. It included an introduction to various types of joins (inner, outer, cross, and self) as well as set theory operations (UNION, INTERSECT, and EXCEPT). The cross join and set theory sections were incredibly beneficial, as these concepts expanded my perspective on querying tables with minimal, readable queries. While I recall reading about them in my undergrad curriculum, putting them into practice has helped me comprehend them much better. In addition, subqueries inside the WHERE, FROM, and SELECT clauses were covered in the course. I had never used subqueries in the SELECT and FROM clauses before, so I picked up some new tricks. Some of the syntax I learned:
```sql
-- Creates all possible combinations
SELECT column_name(s)
FROM table1
CROSS JOIN table2;

-- UNION operator: shows unique rows
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

-- UNION ALL operator: keeps duplicate rows
SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;

-- EXCEPT operator: shows rows of table1 not present in table2
SELECT column_name(s) FROM table1
EXCEPT
SELECT column_name(s) FROM table2;

-- Example 1: Subquery in WHERE
SELECT name, country_code
FROM cities
WHERE name IN (
  SELECT capital
  FROM countries
);

-- Example 2: Subquery in SELECT
SELECT countries.name AS country_name, (
  SELECT COUNT(*)
  FROM cities
  WHERE cities.country_code = countries.code
) AS cities_num
FROM countries;

-- Example 3: Subquery in FROM
SELECT countries.name AS country_name, lang_num
FROM countries,
  (SELECT code, COUNT(*) AS lang_num
   FROM languages
   GROUP BY code) AS sub
WHERE countries.code = sub.code
ORDER BY lang_num DESC;
```
🗓️ Date: 2023-02-20
Course
Machine learning, the most trending topic of today's generation, is nothing more than a series of if and else statements. A similar scenario occurs in SQL when you use the CASE statement to derive new values from existing records. To be more specific, the first module of the Data Manipulation in SQL course, 'We'll Take the CASE', focused on using CASE statements to generate labels, probabilities, and percentages based on supplied criteria. While accounting for only one-quarter of the course, this subject proved useful in a variety of ways. The following are some examples of the statement:
```sql
-- Example 1: Basic
SELECT title,
       length,
       CASE
         WHEN length > 0 AND length <= 50 THEN 'Short'
         WHEN length > 50 AND length <= 120 THEN 'Medium'
         WHEN length > 120 THEN 'Long'
         ELSE 'Outlier'
       END AS duration
FROM film
ORDER BY title;

-- Example 2: Count
SELECT
  c.name AS country,
  -- Count games from the 2012/2013 season
  COUNT(CASE WHEN m.season = '2012/2013'
             THEN m.id ELSE NULL END) AS matches_2012_2013
FROM country AS c
LEFT JOIN match AS m
  ON c.id = m.country_id
-- Group by country name alias
GROUP BY country;

-- Example 3: Percentage
SELECT
  c.name AS country,
  -- Round the percentage of tied games to 2 decimal points
  ROUND(AVG(CASE WHEN m.season = '2013/2014' AND m.home_goal = m.away_goal THEN 1
                 WHEN m.season = '2013/2014' AND m.home_goal != m.away_goal THEN 0
            END), 2) AS pct_ties_2013_2014,
  ROUND(AVG(CASE WHEN m.season = '2014/2015' AND m.home_goal = m.away_goal THEN 1
                 WHEN m.season = '2014/2015' AND m.home_goal != m.away_goal THEN 0
            END), 2) AS pct_ties_2014_2015
FROM country AS c
LEFT JOIN match AS m
  ON c.id = m.country_id
GROUP BY country;
```
🗓️ Date: 2023-02-21
Course
Continuing with the remaining modules of the Data Manipulation in SQL course, I was able to gain insights on simple subqueries, joins, correlated subqueries (which take more processing time), multiple/nested subqueries, and Common Table Expressions (CTEs). These concepts were handy for performing complex actions within SQL and obtaining data points that I once thought were only possible through pandas (a Python library).
However, more significantly, I learned about window functions and their building blocks, such as OVER, RANK, PARTITION BY, and sliding frames, throughout this course. While I had seen them before, I had never utilized them in practice, and I am pleased that this course allowed me to do so. Aggregating over columns that aren't in the grouping columns is likely the most useful skill to have, especially when doing comparative analysis.
```sql
SELECT
  -- Select country ID, date, home, and away goals from match
  main.country_id,
  main.date,
  main.home_goal,
  main.away_goal
FROM match AS main
WHERE
  -- Filter for matches with the highest number of goals scored
  (home_goal + away_goal) =
  (SELECT MAX(sub.home_goal + sub.away_goal)
   FROM match AS sub
   WHERE main.country_id = sub.country_id
     AND main.season = sub.season);
```
```sql
WITH match_list AS (
  SELECT
    country_id,
    id
  FROM match
)
-- Select league and count of matches from the CTE
SELECT
  l.name AS league,
  COUNT(match_list.id) AS matches
FROM league AS l
-- Join the CTE to the league table
LEFT JOIN match_list ON l.id = match_list.country_id
GROUP BY l.name;
```
```sql
-- Example 1: OVER clause
SELECT
  m.id,
  c.name AS country,
  m.season,
  m.home_goal,
  m.away_goal,
  -- Use a window to include the aggregate average in each row
  AVG(m.home_goal + m.away_goal) OVER() AS overall_avg
FROM match AS m
LEFT JOIN country AS c ON m.country_id = c.id;
```
```sql
-- Example 2: RANK function
SELECT
  l.name AS league,
  AVG(m.home_goal + m.away_goal) AS avg_goals,
  -- Rank each league according to the average goals
  RANK() OVER(ORDER BY AVG(m.home_goal + m.away_goal) DESC) AS league_rank
FROM league AS l
LEFT JOIN match AS m
  ON l.id = m.country_id
WHERE m.season = '2011/2012'
GROUP BY l.name
ORDER BY league_rank;
```
```sql
-- Example 3: PARTITION BY
SELECT
  c.name,
  m.season,
  (home_goal + away_goal) AS goals,
  AVG(home_goal + away_goal)
    OVER(PARTITION BY m.season, c.name) AS season_country_avg
FROM country AS c
LEFT JOIN match AS m
  ON c.id = m.country_id;
```
```sql
-- Example 4: Sliding frame
SELECT
  date,
  home_goal,
  away_goal,
  -- Create a running total and running average of home goals
  SUM(home_goal) OVER(ORDER BY date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
  AVG(home_goal) OVER(ORDER BY date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_avg
FROM match
WHERE
  hometeam_id = 9908
  AND season = '2011/2012';
```
🗓️ Date: 2023-02-22
Course
Articles
With the continuation of window functions, I have gotten slightly familiar with the notion of window function types, particularly the fetching, framing, and ranking functions, which I practiced today. While these functions seemed intimidating at first, they turned out to be considerably easier than I had anticipated.
Besides this, I attempted to put my knowledge into practice by answering practice questions in the "Advanced SQL" section of Kaggle. It was a valuable experience, since I was able to accurately utilize window functions and also learn about the `UNNEST` function to load nested and repeated data from tables.
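A minimal sketch of `UNNEST` in the BigQuery dialect used by those Kaggle exercises (the `questions` table and its repeated `tags` column are hypothetical):

```sql
-- Flatten a repeated column so each element gets its own row
SELECT
  q.id,
  tag
FROM questions AS q,
  UNNEST(q.tags) AS tag;
```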
Operator | Description |
---|---|
`LAG(column, n)` | Returns column's value at the row n rows before the current row |
`LEAD(column, n)` | Returns column's value at the row n rows after the current row |
`FIRST_VALUE(column)` | Returns the first value in the table or partition |
`LAST_VALUE(column)` | Returns the last value in the table or partition |
Operator | Description |
---|---|
ROWS/RANGE | Uses the given rows or range as a frame |
PRECEDING | Rows before the current row |
UNBOUNDED PRECEDING | All rows before the current row |
UNBOUNDED FOLLOWING | All rows after the current row |
CURRENT ROW | The current row of query execution |
Operator | Description |
---|---|
ROW_NUMBER | Unique sequential number for each row in the specified partition |
RANK | Rank for each distinct row within the specified partition; equal values share the same rank, leaving gaps in the sequence |
DENSE_RANK | Rank for each distinct row within the specified partition, without gaps between rank values |
NTILE | Distributes the rows of the partition into a specified number n of groups |
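Putting the fetching and framing pieces together — a sketch that reuses the `match` table from the earlier examples (column names assumed):

```sql
-- Compare each match's total goals with the previous match's total
SELECT
  date,
  (home_goal + away_goal) AS goals,
  LAG(home_goal + away_goal, 1) OVER (ORDER BY date) AS previous_goals,
  FIRST_VALUE(home_goal + away_goal) OVER (
    ORDER BY date
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_goals
FROM match
WHERE season = '2011/2012';
```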
🗓️ Date: 2023-02-23
Course
Taking a break from the regular SQL courses, I delved into the everyday life of a data scientist, complete with current data science issues and how data scientists manage themselves and the organizations for which they operate. I was also able to take the following course on the non-technical abilities of a successful data scientist, which addressed not just the attributes that a person should have but also the role diplomacy plays while working in a professional setting. In addition, to polish my pandas abilities, I completed a Kaggle Learn course that served as a refresher on the techniques I use on a daily basis.
🗓️ Date: 2023-02-24
Course
Leveraging the same elements in different ways has always lit up the neurons in my brain, allowing me to perceive the world in new ways. This occurred when learning how to use the aggregate functions within window functions to obtain new results. In fact, utilizing the same `SUM` and `AVG` functions to deliver moving totals and averages within SQL itself, with the assistance of frames, made me leap on top of my bed. There were so many things that SQL could do that I had always assumed only pandas could accomplish. While writing sophisticated queries is faster in pandas, the execution time is much lower when the same queries are implemented directly in SQL, without loading the dataset into memory.
Continuing this discovery, pivoting tables in SQL turned out to be possible with `CROSSTAB`, as were other beneficial functions like `ROLLUP`, `CUBE`, `COALESCE`, and `STRING_AGG`, which come in handy when relying only on SQL.
Syntax: `ROWS BETWEEN [start] AND [finish]`

- `n PRECEDING`: n rows before the current row
- `CURRENT ROW`: the current row
- `n FOLLOWING`: n rows after the current row

Examples:

```sql
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
ROWS BETWEEN 4 PRECEDING AND 4 FOLLOWING
ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
```
Before using `CROSSTAB`, run the following to create the extension:

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM CROSSTAB($$
  source_sql TEXT
$$) AS ct(
  column_1 DATA_TYPE_1,
  column_2 DATA_TYPE_2,
  ...,
  column_n DATA_TYPE_N
);
```
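For a more concrete sketch (the `inventory` table below is hypothetical; the source query must return row name, category, and value, ordered by the row name):

```sql
-- Pivot yearly quantities into one column per year
SELECT * FROM CROSSTAB($$
  SELECT country, year, SUM(quantity)::INT
  FROM inventory
  GROUP BY country, year
  ORDER BY country, year
$$) AS ct(country VARCHAR, "2021" INT, "2022" INT);
```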
```sql
SELECT
  country, warehouse, SUM(quantity)
FROM
  inventory
GROUP BY ROLLUP (country, warehouse);
```

`ROLLUP` is hierarchical, de-aggregating from the leftmost provided column to the rightmost.

```sql
ROLLUP (country, warehouse) -- includes country-level totals
ROLLUP (warehouse, country) -- includes warehouse-level totals
```

However, when we need all possible group-level aggregations, we use `CUBE`, which shares similar properties with `ROLLUP`.

```sql
CUBE (country, warehouse) -- country level, warehouse level, and grand total

COALESCE(null, null, 1, null, 2) -- returns 1
```

- `STRING_AGG(column, separator)` takes all the values of a column and concatenates them, with `separator` between each value.
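A quick sketch of `STRING_AGG` (the `films` table is hypothetical):

```sql
-- One comma-separated list of titles per release year
SELECT
  release_year,
  STRING_AGG(title, ', ') AS titles
FROM films
GROUP BY release_year;
```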
🗓️ Date: 2023-02-25
Course
The focus of today's course was on data manipulation in PostgreSQL utilizing both built-in and user-defined functions. The built-in functions of PostgreSQL included common data types and their casts, date/time functions and operators, and string parsing and manipulation functions. While most of the operators were familiar, I learned about several new ones, such as `INTERVAL` and `INITCAP`. Nevertheless, the PostgreSQL extensions and full-text search capabilities were entirely new subjects, particularly `tsvector` (text search vector) to execute a full-text search beyond the scope of the `LIKE` operator. Knowing that PostgreSQL offers extensions for fuzzy string matching through `levenshtein` and `similarity` blew my mind, as I had previously only done this in Python. Learning the syntax to develop my own functions was also quite instructive. Overall, it was a productive weekend spent learning more about PostgreSQL.
```sql
-- Example 1: Extracting all table names from the system database
SELECT table_name, table_type
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'public';

-- Example 2: Extracting column data types from a table
SELECT
  column_name,
  data_type
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'actor';
```
```sql
INTERVAL '3 days'             -- goes forward in time
INTERVAL '2 months ago'       -- goes back in time due to the keyword 'ago'
INTERVAL '3 hours 20 minutes'

-- Example 1: Addition of a timeframe
SELECT rental_date + INTERVAL '2 days' AS expected_return
FROM rental;

-- Example 2: Conversion of a column to an interval
SELECT INTERVAL '1' day * rental_duration
FROM rental;
```
Operator | Description |
---|---|
AGE() | Returns the difference from the current date (at midnight) when given one argument, or the difference between the two provided values |
NOW() | Gets the current timestamp with microsecond precision |
CURRENT_TIMESTAMP | Similar to NOW(), but allows a precision parameter to round off the seconds |
CURRENT_DATE / CURRENT_TIME | Get the current date and time |
EXTRACT(field FROM source) | Gets a subfield |
DATE_PART('field', source) | Gets a subfield (equivalent to EXTRACT) |
DATE_TRUNC('field', source) | Truncates timestamp or interval data types to the given precision |
ISFINITE() | Tests for a finite date, time, or interval (not +/-infinity) |
Operator | Description |
---|---|
UPPER/LOWER(source) | Converts the string to upper or lower case |
INITCAP(source) | Converts the string to title case |
REPLACE(source, 'find_string', 'replace_string') | Replaces occurrences of the find string with the replacement string |
REVERSE(source) | Reverses the string |
LENGTH(source) | Returns the length of the string |
POSITION('char' IN source) | Returns the first position of a character in the string |
LEFT(source, n) | Extracts n characters from the left side of the source |
RIGHT(source, n) | Extracts n characters from the right side of the source |
SUBSTRING(source, start, length) | Extracts a substring of a given length from a particular position in the string |
TRIM([leading\|trailing\|both] [characters] FROM source) | Removes characters from the source |
LPAD(source, n, char) | Left-pads a string with another string, to a certain length |
RPAD(source, n, char) | Right-pads a string with another string, to a certain length |
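Combining a few of these in one query (the `customers` table and its columns are hypothetical):

```sql
-- Clean up a name column: trim whitespace, then title-case it;
-- zero-pad the numeric id to a fixed width
SELECT
  INITCAP(TRIM(both ' ' FROM name)) AS clean_name,
  LPAD(id::TEXT, 5, '0') AS padded_id
FROM customers;
```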
```sql
-- Example 1: Check if the title contains 'elf'
SELECT title, description
FROM film
WHERE to_tsvector(title) @@ to_tsquery('elf');
```

- Fuzzy string matching

```sql
-- Enable the fuzzystrmatch extension
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- Confirm that fuzzystrmatch has been enabled
SELECT extname FROM pg_extension;

SELECT levenshtein('hello', 'jelly'); -- number of edits required for a perfect match
SELECT similarity('hello', 'jelly');  -- similarity between two strings, from 0 to 1 (from the pg_trgm extension)
```
```sql
CREATE TYPE dayofweek AS
  ENUM('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday');

-- Check
SELECT typname, typcategory
FROM pg_type
WHERE typname = 'dayofweek';

CREATE FUNCTION squared(i integer) RETURNS integer AS $$
BEGIN
  RETURN i * i;
END;
$$ LANGUAGE plpgsql;
```
🗓️ Date: 2023-02-27
Course
Breaking the usual heavy dosage of study sessions, this particular course covered the usage of relationship diagrams, constraints (primary key, foreign key, unique, and not null), and data types for columns. The most significant functions from this course are `corr` and `percentile_disc`, which let you compute the correlation between columns and pick the discrete value at a given percentile. Moreover, temporary tables were a notion I had heard of but had never used in practice, and this course was a huge help in reinforcing the concept of breaking large queries into smaller chunks.
```sql
-- CAST function syntax
SELECT CAST(value AS new_type);

-- Alternate cast with :: notation
SELECT value::new_type;

-- Example 1: Casting float to integer
SELECT CAST(3.7 AS integer);

-- Example 2: Basic series
SELECT generate_series(1, 10, 2);

-- Example 3: Float series
SELECT generate_series(0, 1, 0.1);
```
Function | Description |
---|---|
CORR(source1, source2) | Returns the correlation between two columns |
PERCENTILE_DISC(percentile) WITHIN GROUP (ORDER BY column_name) | Returns the value at the given percentile of the column, using the discrete method |
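A sketch of both functions side by side (the `films` table and its columns are hypothetical):

```sql
-- Correlation between budget and gross, plus the median gross
SELECT
  CORR(budget, gross) AS budget_gross_corr,
  PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY gross) AS median_gross
FROM films;
```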
```sql
-- Drop the table if it already exists
DROP TABLE IF EXISTS table_name;

-- Create a temporary table
CREATE TEMP TABLE table_name AS
SELECT column1, column2
FROM table;
```
🗓️ Date: 2023-02-28
Course
The remaining modules of the course delved into the topic of character types in PostgreSQL, specifically character, varchar, and text. It also covered common challenges that arise when grouping categorical variables and dealing with unstructured text data. The modules included exercises on data cleaning such as dealing with cases and white spaces, as well as data manipulation techniques such as splitting strings using delimiters and concatenating multiple strings. Additionally, the course covered working with date and timestamps to create complex queries through series.
```sql
-- Syntax
SELECT generate_series(from, to, interval);

-- Example 1
SELECT generate_series('2018-01-01', '2018-01-15', '2 days'::interval);
```
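The string-splitting and concatenation exercises boiled down to patterns like this sketch (the `customers` table is hypothetical):

```sql
-- Split an email on '@' and rebuild a display string
SELECT
  SPLIT_PART(email, '@', 1) AS username,
  first_name || ' ' || last_name AS full_name
FROM customers;
```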
🗓️ Date: 2023-03-01
Course
Project
With all the skills that I had accumulated so far, it was only a matter of implementing them. While a proper implementation is yet to come, I could still practice within a real environment through the course "Data-Driven Decision Making in SQL" and the project "When Was the Golden Age of Video Games?". These allowed me to use all of the concepts from data cleaning and manipulation to aggregation, and concentrated on using groupings, joins, and pivots to create complex tables. Today marks the end of the career track, and I'm over the moon with all the knowledge I've gained in these 12 days. Yay for learning!
🗓️ Date: 2023-03-02
Course
Before diving into the world of mathematics, I needed to grasp the foundations that I would need to build on as a Data Analyst. Attending the LinkedIn Learning career course "Data Science & Analytics Career Pathways & Certifications" was quite beneficial in this regard. It began by discussing the applications of data science, such as fraud detection, social media analytics, disease control, dating services, simulations, climate research, and network security. It also discussed the abilities required to be relevant in the sector. Data mining, machine learning, natural language processing, statistics, and visualization were among the crucial skills mentioned. It also discussed certificates that can help advance one's career and establish one as a specialist in a particular subject. Overall, the course was beneficial in aiding comprehension of the principles of being relevant in the ever-changing world of data science.
🗓️ Date: 2023-03-03
Course
Visualizing data through narrative storytelling is one of the most crucial skills for a data analyst, setting them apart from their colleagues. So I took a data visualization course that included story structure and its components (beginning, middle, end, plot, protagonist, problem, and transformation). It also demonstrated the use of flow diagrams to successfully represent linear data flow for effective storytelling. Most notably, the course taught the principles of demonstrating your analytic abilities utilizing the 4x4 progressive depth model:
- The watercooler moment
- The cafe content
- The research library
- The lab experience
🗓️ Date: 2023-03-05
Course
I took a break from learning today to prepare for the journey ahead! I made my own IBM account and configured Watson Studio to publish notebooks directly to my GitHub repository. I also explored the world of SpaceX's REST API in order to extract useful data for future projects. We can get so enthused about learning new things that we forget to take a deep breath and get organized. However, not today.
🗓️ Date: 2023-03-06
Course
Building on yesterday's exploration, today was all about extracting launch data from SpaceX using requests and Beautiful Soup. The objective was to determine the fruitfulness of starting a new business for a hypothetical company, SpaceY. During the course, I delved into the concepts of exploratory data analysis and feature engineering, utilizing both Python and SQL to analyze the data. Wrapping up, I visually represented the findings using scatter plots and bar plots to identify factors such as landing site, booster, and payload mass that can contribute to a higher success rate.
Aside from that, I took an AI ethics course and was introduced to Human-Centered Design for AI and its significance. It not only helped me assess whether a project is worth transitioning to AI, but it also helped me grasp that AI systems are more effective when they work alongside people rather than independently. I also learned about the numerous types of bias and unfairness that can emerge in an ML model when biased data or models are used: garbage in, garbage out.
🗓️ Date: 2023-03-07
Course
After completing exploratory data analysis, I delved into creating an interactive dashboard with Plotly Dash and Folium to facilitate real-time data analysis. It was a good refresher on the concept of Dash callbacks, which translate user inputs and update existing charts based on those inputs. In addition, as part of the course, I touched on predictive analysis to determine the optimal model and hyperparameters needed to develop a model capable of predicting the launch's success rate. To do this, I used preprocessing, GridSearchCV, LogisticRegression, DecisionTreeClassifier, and KNeighborsClassifier to help automate model selection, as well as a confusion matrix to evaluate the true accuracy much more clearly.
With plenty of time left in the day, I investigated the creation of an effective data analysis report and its components. While data reports vary depending on the use and data included, I was able to get a general idea of how a data report should look through the course.
🗓️ Date: 2023-03-08
Course
After a thorough analysis of SpaceX's launches, it was time to predict the first stage's successful landing, so that the hypothetical Company Y could compete with the likes of SpaceX. Armed with a lengthy 50-page presentation, a combination of online resources and a dash of personal passion was instrumental in completing the task, and in the process I honed valuable presentation-creation skills. In addition, the power of context cannot be overstated, as it aided in comprehending the insights more easily, with an executive summary for those uninterested in the details. All in all, it was a remarkable learning experience that showcased the importance of a compelling narrative and a comprehensive overview for maximum impact.
🗓️ Date: 2023-03-09
Course
In the data governance course, I gained insights into the significance of efficient data management in organizations. The course taught me that data governance involves creating and enforcing policies, procedures, and standards to manage data assets of an organization, which includes data privacy, quality, security, and access. A crucial lesson that I learned was how data governance plays a critical role in ensuring the trustworthiness and correctness of an organization's data. It enables high-quality data that can be relied upon to drive decision-making processes. Moreover, data governance can also aid organizations in complying with regulatory obligations related to data privacy and security.
🗓️ Date: 2023-03-10
Course
Six individuals, each with their unique experiences in data, shared their stories through the course. These narratives covered their journeys starting out in data analytics, the inspiration behind their work, the impact of their contributions on the organization, their current endeavors, and practical advice based on their experiences. One of the prominent discussions was about the ethical considerations that data scientists face while conducting an analysis, where certain data points may conflict with their personal values. However, what had a significant impact on my outlook towards the data science field was gaining insights about the industry and the people involved in it during the course.
In addition to my current pursuits, I have become interested in side hustle strategies for data science. The monotony of only having one job motivated me to seek out new opportunities to expand my abilities and skills. I discovered a range of options such as writing, training, consulting, attending conferences, and engaging with academics. These activities may include co-authoring a book, writing a chapter in a second edition, providing training in R or Python during free time, through in-site or online classes. These endeavors not only benefit the individual but can also contribute to growing the data science industry. As one gains expertise, opportunities such as giving speeches at conferences and consulting with organizations can lead to expanding networks and discovering new possibilities. It's important to note that if one is currently involved in academics as a student or teacher, there are resources beyond the classroom that can be taken advantage of, such as university libraries and websites like GitHub Education. By making the most of what is available and staying informed about the latest tools and patterns in data analytics, one can continue to expand their knowledge and skills in this new and exciting field.
🗓️ Date: 2023-03-19
Course
Today marks the first day of my journey into the world of Tableau, following a 7-day break from my usual learning streak. I began my day by loading workbooks and familiarizing myself with the navigation of Tableau through the menu pane and toolbar. Next, I dove into the world of sorting and the various types of filters (extract filters, data source filters, context filters, dimension filters, and measure filters) in Tableau. I then proceeded to use aggregation and create custom columns using calculated fields that leverage the built-in functions within Tableau. Moving on, I explored the topic of creating visualizations on geo maps using geocoding. I was thrilled to see how easy it was to map data and extract meaningful insights based on the location of the data points. I also learned how to work with dates and create reference lines, trend lines, and forecasts using Tableau. Finally, I learned how to convey my findings with dashboards and stories in Tableau.
Overall, it was an interesting and exciting day of learning to fully comprehend a tool that can significantly decrease my burden by providing rapid visualizations that would take hours to complete in Python. I'm looking forward to applying these skills to real-world scenarios and deepening my knowledge in the days to come.
🗓️ Date: 2023-03-21
Course
Today's focus was on preparing for analysis using Tableau. I began by familiarizing myself with various slicing and dicing functions and visualizations that are readily available in the software. It was exciting to get the opportunity to make complex visualizations and help analyze a hypothetical scenario. The scenario contained a "Two by two discount" promotional campaign, which seeks to increase customer purchases on the two overall lowest riding days, as well as the two lowest windows of time, excluding nighttime. Some of my favorite visualizations included a heat map to show sales volume by product and a scatter plot to show the correlation between time of day and sales. It was a challenging yet rewarding experience that allowed me to build on my existing Tableau skills and gain new insights into data analysis.
Furthermore, I explored more of Tableau's capabilities with user data. I added filters and created KPI for dashboards, which helped to better understand user behavior and trends. Additionally, I created a histogram using bins and compared it with a traditional line chart to understand the distribution of a specific variable. I found that the histogram provided a more granular view of the data and revealed insights that were not visible through the line chart.
🗓️ Date: 2023-03-27
Course
Today, I learned how to use Tableau to map customer activity by analyzing popular bike locations and looking at each station's user base. This involved using various techniques such as color, size, layers, tooltips, pages, and quick table calculations to create maps that reveal trends and insights in the data. I also learned the importance of adding other charts inside tooltips to view data more closely and provide additional context.
Similarly, Tableau provides various features that can be used to group, filter, and parameterize data in order to gain insights and create more meaningful visualizations, which I was able to utilize. To explain further, groups can be used to combine similar data points into a single category. Sets are similar to groups, but instead of combining data points into a category, they allow you to create subsets of data based on specific criteria. Parameters allow you to create user-defined inputs that can be used to adjust various aspects of a visualization.
🗓️ Date: 2023-03-28
Course
Today, I learned how to create interactive dashboards in Tableau that can be used to share data insights with others. By combining different worksheets and allowing self-serve data exploration, I was able to create dashboards that provide users with the ability to analyze data in a more interactive and meaningful way. Through the course, I learned best practices for creating dashboards, including alignment and adding worksheets inside tooltips to allow for better analysis. Using the Divy dataset provided in DataCamp, I was able to create various dashboards with appropriate worksheets, parameters, objects, and actions to increase the readability and interactivity of the dashboard. Overall, I found the process of creating interactive dashboards in Tableau to be both challenging and rewarding, and I look forward to applying these skills in future data science projects.
🗓️ Date: 2023-03-30
Course
Today, I learned about sharing insights from data using different techniques, with a focus on the importance of effective presentation. Specifically, I gained knowledge on the ways to utilize the power of Tableau to create visually appealing and informative presentations through the use of stories, annotations, and highlights. This will enable me to effectively convey key points to my audience and keep their attention focused on the most important parts of the visualization. Additionally, I discovered tips for optimizing dashboards to ensure they are responsive and user-friendly on mobile devices.
🗓️ Date: 2023-04-01
Course
Today's learning focused on understanding the data analysis flow in Tableau, which involves checking data, exploring data, analyzing and visualizing it, dashboarding, and communicating insights. To effectively learn these concepts, I decided to download Tableau Public and perform the analysis there. This decision proved to be helpful, as it eliminated the lag present while using the virtual machine provided by DataCamp, allowing me to work faster and more efficiently. However, using the public version meant that I was restricted and couldn't save my progress locally. During the analysis process, I utilized a range of visualizations, such as dual-axis graphs, scatter plots, and maps, to better understand the data. Additionally, I made use of calculated fields, bins, and bin sizes, which helped me analyze the data more effectively. To take advantage of Tableau's interactivity, I leveraged the power of filters, visualizations as filters, and parameters. By understanding the data analysis flow in Tableau and utilizing its features, I was able to effectively find customer churn for a hypothetical company.
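To make the churn idea concrete, here is a minimal pandas sketch with invented customer data (not the DataCamp dataset); the column names and bin edges are assumptions, mirroring how a calculated field and bins would work in Tableau:

```python
import pandas as pd

# Toy customer table (hypothetical, for illustration only)
customers = pd.DataFrame({
    "customer_id":   [1, 2, 3, 4, 5, 6],
    "tenure_months": [2, 5, 14, 25, 36, 48],
    "churned":       [True, True, False, True, False, False],
})

# A "calculated field": overall churn rate
churn_rate = customers["churned"].mean()

# "Bins": bucket a continuous measure the way Tableau bins tenure
customers["tenure_bin"] = pd.cut(
    customers["tenure_months"],
    bins=[0, 12, 24, 48],
    labels=["0-12", "13-24", "25-48"],
)
churn_by_bin = customers.groupby("tenure_bin", observed=True)["churned"].mean()
```

Breaking churn down by tenure bin is what makes the dashboard actionable: here the newest customers churn far more than long-tenured ones.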
🗓️ Date: 2023-04-02
Course
Today's course focused on the crucial task of combining multiple datasets into one. As data rarely comes in one single file, this process is essential in the real world. Fortunately, Tableau has built-in features to combine multiple datasets using unions, joins, and relationships, which made the course comparatively straightforward. Along with learning about combining datasets, I was also introduced to the different types of Tableau file formats, including packaged workbooks, workbooks, data sources, and extracts. These formats enable users to share their worksheets with their peers easily. Although the course was relatively short compared to others I had taken, it was crucial as it laid the foundation for the data import process, which is arguably the most important part of data analysis as analysis can't start without data.
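Outside Tableau, the same union/join distinction can be sketched in pandas with two hypothetical extracts (all table and column names are made up):

```python
import pandas as pd

# Two hypothetical monthly extracts with identical structure
jan = pd.DataFrame({"order_id": [1, 2], "amount": [100, 250]})
feb = pd.DataFrame({"order_id": [3, 4], "amount": [75, 300]})

# Union: stack rows from files that share the same columns
orders = pd.concat([jan, feb], ignore_index=True)

# Join: enrich one table with columns from another on a shared key
status = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "status":   ["shipped", "pending", "shipped", "shipped"],
})
enriched = orders.merge(status, on="order_id", how="left")
```

The rule of thumb carries over to Tableau directly: union when the files have the same shape, join (or a relationship) when they share a key but carry different columns.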
🗓️ Date: 2023-04-03
Course
Today's course was focused on the importance of data visualization in business, as well as the process of creating effective data visualizations using Tableau. In addition, the course emphasized the use of various visualization tools such as box plots, waterfall/bridge charts, heat maps, and scatter plots to depict complex data analysis, using IMDb movie reviews as a case study. Despite being mentally exhausted, I was able to learn a great deal from today's class.
🗓️ Date: 2023-04-06
Course
Despite feeling exhausted from work and physically sick, I remained committed to learning and decided to focus on one manageable module. Today's course centered on best practices for creating effective data visualizations, highlighting common mistakes such as using the wrong type of visualization, misleading design elements, and mishandling missing data. What really hit home for me was the importance of designing dashboards thoughtfully: instead of cramming all the charts onto one page, it's crucial to consider the design, formatting, and interactivity of the dashboard, so that it stays clear, concise, and easy to use.
🗓️ Date: 2023-04-07
Course
Taking a break from Tableau, I jumped back into the LinkedIn course for becoming a data scientist. As expected, the course started with a focus on preparing data using in-house data, open data, and third-party data sources, all while adhering to data ethics and staying within the scope of the analysis. Next, the course covered sorting and filtering data to perform exploratory data analysis, and emphasized the use of ratios to gain a better understanding of the overall picture. But what really caught my attention was when the course revisited various charts that can be used to visualize data. The instructor highlighted the limitations of pie charts and warned against using them, as they can often misrepresent the data. Instead, the course introduced me to some new concepts such as dot plots, sparklines, and data maps, and explained their unique use cases. I found it fascinating to learn about these new techniques and how they can help convey complex data in a more meaningful way.
🗓️ Date: 2023-04-08
Course
Today, I continued with the remaining modules from the last course and came across a section that was all about describing data. This involved looking at various methods to assess the center, variability, and associations within a dataset, and to rescale it. When it comes to finding the center of the data, data analysts usually refer to the mode (most common score), mean (average), and median (middle score/50th percentile). On the other hand, variability, or how spread out the data is, can be found using techniques such as range (difference between maximum and minimum), quartiles (splitting the data into 4 equal parts), IQR (Q3 - Q1), variance, and standard deviation (average distance from the mean). In order to rescale data for better interpretation, comparison, and probability analysis, the course introduced the z-score. The course concluded with a summary of basic and conditional probability, along with sampling variation.
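These descriptive statistics can be checked with a few lines of Python's standard library, using a small invented dataset:

```python
import statistics as st

scores = [4, 8, 6, 5, 3, 8, 9, 7]  # hypothetical dataset

# Center: mode (most common), mean (average), median (middle score)
center = {
    "mean": st.mean(scores),
    "median": st.median(scores),
    "mode": st.mode(scores),
}

# Variability: range, quartiles/IQR, standard deviation
q1, _, q3 = st.quantiles(scores, n=4)  # default "exclusive" method
spread = {
    "range": max(scores) - min(scores),
    "iqr": q3 - q1,
    "stdev": st.stdev(scores),  # sample standard deviation
}

# Rescale: z-score = (x - mean) / stdev puts values on a common scale
z = [(x - center["mean"]) / spread["stdev"] for x in scores]
```

For these numbers the mean is 6.25 and the median 6.5, and the z-scores sum to zero by construction, which makes them handy for comparing values from differently scaled datasets.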
🗓️ Date: 2023-04-10
Course
Today, as I returned to Tableau, I delved into the world of digital cartography by learning how to create maps using geo-coordinates. Although it was a short module, it covered some essential topics, from converting a measure to a geo-coordinate to creating your own maps using x and y coordinates. However, the most exciting part of the module was visualizing paths for a hypothetical museum, where the thickness of the line represented the number of visitors per hour. This technique provides a fascinating and intuitive way to represent data, and I look forward to using it in my future visualizations.
🗓️ Date: 2023-04-11
Course
In today's course, I encountered advanced visualization techniques such as the waffle chart, DNA chart, and sparklines. It was fascinating to see how these visualizations can communicate complex information in an intuitive way. The waffle chart, for example, allowed me to represent the percentage of a whole using a grid of squares, while the DNA chart provided a unique way to compare two related measures across categories. Additionally, sparklines enabled me to visualize trends over time using a small line chart within a single cell. Although the famous Sankey chart wasn't part of the module, the remaining charts challenged me to find creative ways to create them within the limitations of Tableau.
🗓️ Date: 2023-04-12
Course
Today's course in Tableau was focused on performing calculations within the software to create new fields using calculated fields. Throughout the lesson, I encountered various types of errors, both those detected by Tableau and those resulting from the equations themselves, and the course provided exercises to sharpen my skills in finding and tackling them. Following this, the course delved into data granularity and how it increases with the number of dimensions exposed in Rows and Columns or on the Marks card, such as detail, color, shape, and so on. While "Group by" is used to tackle data granularity in pandas and SQL, in Tableau, LOD (level of detail) expressions are used to compute aggregations easily. There are three types of LOD expressions: INCLUDE, EXCLUDE, and FIXED. Each controls the level of detail at which a calculation is performed; FIXED in particular computes its aggregation independently of the dimensions used in the visualization.
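A rough pandas analogy for a FIXED LOD expression such as `{ FIXED [Region] : SUM([Sales]) }` is `groupby().transform()`, which attaches a group-level aggregate to every row regardless of how the rest of the view is sliced (the data below is made up):

```python
import pandas as pd

# Hypothetical row-level sales data
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "sales":  [100,    200,    50,     75,     25],
})

# FIXED-style LOD: aggregate at the region level and broadcast the
# result back to every row, independent of other dimensions in play
df["region_total"] = df.groupby("region")["sales"].transform("sum")

# A typical use: each row's share of its region's total
df["pct_of_region"] = df["sales"] / df["region_total"]
```

INCLUDE and EXCLUDE have no single pandas equivalent, since they add or remove dimensions relative to whatever the view already shows; FIXED is the one that maps cleanly onto a standalone group-by.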
🗓️ Date: 2023-04-16
Course
Today's module focused on table calculations, covering both built-in and custom expressions based on single and multiple dimensions. Various examples of table calculations were explored, including running average, running total, window sum, and more. Following this, the module also revisited parameters, which allow for custom interactions between the user and the worksheet.
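These table calculations map loosely onto pandas' cumulative and windowed operations; here is a small sketch with invented monthly figures, where the comments name the rough Tableau equivalents:

```python
import pandas as pd

monthly = pd.Series([10, 20, 30, 40, 50])  # hypothetical monthly sales

running_total = monthly.cumsum()           # ~ RUNNING_SUM(SUM([Sales]))
running_avg = monthly.expanding().mean()   # ~ RUNNING_AVG(SUM([Sales]))
window_sum = monthly.rolling(3).sum()      # ~ WINDOW_SUM over the last 3 rows
```

The key property they share with Tableau's table calculations is that they operate across the rows of the result, so the direction and partitioning of the computation (in Tableau, "compute using") matters as much as the formula itself.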
🗓️ Date: 2023-04-17
Course
Today's course module was centered on time series analysis, including the common calculations used in time series, along with a summary of how to build date and time fields from individual columns of data. After covering these topics, I performed analysis on the data source to find patterns and growth by analyzing seasonality and year-over-year trends.
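A minimal pandas sketch of both ideas, assembling dates from separate year/month columns (similar in spirit to Tableau's MAKEDATE) and comparing the same month year over year, with made-up numbers:

```python
import pandas as pd

# Hypothetical rows where the date arrives as separate columns
df = pd.DataFrame({
    "year":  [2022, 2022, 2023, 2023],
    "month": [1,    7,    1,    7],
    "sales": [100,  140,  110,  160],
})

# Assemble a proper date field from its individual parts
df["date"] = pd.to_datetime(df[["year", "month"]].assign(day=1))

# Seasonality / growth: line up the same month across years
pivot = df.pivot(index="month", columns="year", values="sales")
yoy_growth = (pivot[2023] - pivot[2022]) / pivot[2022]
```

Pivoting months against years is the tabular version of the seasonality view: each row compares like with like, so growth isn't confused with seasonal swings.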