Inspecting the structure of a dataset

To display a summary of a dataset's structure, call the info() method on a pandas DataFrame.

The data used here comes from https://catalog.data.gov/dataset/alzheimers-disease-and-healthy-aging-data/

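#!/usr/bin/env python3

import pandas as pd

# Load the CSV downloaded from data.gov (assumed to be in the working directory)
data = pd.read_csv("Alzheimer_s_Disease_and_Healthy_Aging_Data.csv")

# info() prints its summary itself and returns None, so wrapping it in print() would add a stray "None"
data.info()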

The output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178539 entries, 0 to 178538
Data columns (total 39 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   RowId                       178539 non-null  object 
 1   YearStart                   178539 non-null  int64  
 2   YearEnd                     178539 non-null  int64  
 3   LocationAbbr                178539 non-null  object 
 4   LocationDesc                178539 non-null  object 
 5   Datasource                  178539 non-null  object 
 6   Class                       178539 non-null  object 
 7   Topic                       178539 non-null  object 
 8   Question                    178539 non-null  object 
 9   Response                    0 non-null       float64
 10  Data_Value_Unit             178539 non-null  object 
 11  DataValueTypeID             178539 non-null  object 
 12  Data_Value_Type             178539 non-null  object 
 13  Data_Value                  120885 non-null  float64
 14  Data_Value_Alt              0 non-null       float64
 15  Data_Value_Footnote_Symbol  70619 non-null   object 
 16  Data_Value_Footnote         70619 non-null   object 
 17  Low_Confidence_Limit        120750 non-null  float64
 18  High_Confidence_Limit       120750 non-null  float64
 19  Sample_Size                 0 non-null       float64
 20  StratificationCategory1     178539 non-null  object 
 21  Stratification1             178539 non-null  object 
 22  StratificationCategory2     178539 non-null  object 
 23  Stratification2             178539 non-null  object 
 24  StratificationCategory3     0 non-null       float64
 25  Stratification3             0 non-null       float64
 26  Geolocation                 159375 non-null  object 
 27  ClassID                     178539 non-null  object 
 28  TopicID                     178539 non-null  object 
 29  QuestionID                  178539 non-null  object 
 30  ResponseID                  0 non-null       float64
 31  LocationID                  178539 non-null  int64  
 32  StratificationCategoryID1   178539 non-null  object 
 33  StratificationID1           178539 non-null  object 
 34  StratificationCategoryID2   178539 non-null  object 
 35  StratificationID2           178539 non-null  object 
 36  StratificationCategoryID3   0 non-null       float64
 37  StratificationID3           0 non-null       float64
 38  Report                      0 non-null       float64
dtypes: float64(12), int64(3), object(24)
memory usage: 53.1+ MB

This output tells us, among other things:

  1. The dataset has 178539 observations (rows) and 39 columns
  2. The name of each column
  3. The data type of each column, along with its count of non-null values
  4. How much memory the DataFrame uses
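
The same four pieces of information can also be read programmatically instead of from the printed summary. The following is a minimal sketch, assuming a small hypothetical DataFrame (three rows, three columns borrowed by name from the dataset) in place of the real CSV; the values in it are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the real CSV (values are invented)
df = pd.DataFrame({
    "YearStart": [2015, 2016, 2017],
    "LocationAbbr": ["NY", "CA", "TX"],
    "Data_Value": [12.5, None, 9.1],
})

# 1. Number of observations (rows) and columns
n_rows, n_cols = df.shape

# 2. Column names
column_names = list(df.columns)

# 3. Data type and non-null count of each column
dtypes = df.dtypes.astype(str).to_dict()
non_null = df.count().to_dict()

# 4. Memory usage in bytes; deep=True also measures the contents of string/object columns
mem_shallow = int(df.memory_usage().sum())
mem_deep = int(df.memory_usage(deep=True).sum())

print(n_rows, n_cols)
print(column_names)
print(dtypes)
print(non_null)
print(mem_shallow, mem_deep)
```

The "+" in a line such as "memory usage: 53.1+ MB" means the figure is a lower bound, because by default pandas does not measure the contents of object columns. Calling info(memory_usage="deep") reports the full figure, at the cost of a slower call on large datasets.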
