To display the structure of a dataset, call the info() method on a pandas DataFrame.
The data comes from https://catalog.data.gov/dataset/alzheimers-disease-and-healthy-aging-data/
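#!/usr/bin/env python3
import pandas as pd

# Load the CSV file downloaded from the catalog page above
data = pd.read_csv("Alzheimer_s_Disease_and_Healthy_Aging_Data.csv")
# info() prints its summary itself and returns None, so no print() is needed
data.info()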
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178539 entries, 0 to 178538
Data columns (total 39 columns):
 #   Column                      Non-Null Count   Dtype
---  ------                      --------------   -----
 0   RowId                       178539 non-null  object
 1   YearStart                   178539 non-null  int64
 2   YearEnd                     178539 non-null  int64
 3   LocationAbbr                178539 non-null  object
 4   LocationDesc                178539 non-null  object
 5   Datasource                  178539 non-null  object
 6   Class                       178539 non-null  object
 7   Topic                       178539 non-null  object
 8   Question                    178539 non-null  object
 9   Response                    0 non-null       float64
 10  Data_Value_Unit             178539 non-null  object
 11  DataValueTypeID             178539 non-null  object
 12  Data_Value_Type             178539 non-null  object
 13  Data_Value                  120885 non-null  float64
 14  Data_Value_Alt              0 non-null       float64
 15  Data_Value_Footnote_Symbol  70619 non-null   object
 16  Data_Value_Footnote         70619 non-null   object
 17  Low_Confidence_Limit        120750 non-null  float64
 18  High_Confidence_Limit       120750 non-null  float64
 19  Sample_Size                 0 non-null       float64
 20  StratificationCategory1     178539 non-null  object
 21  Stratification1             178539 non-null  object
 22  StratificationCategory2     178539 non-null  object
 23  Stratification2             178539 non-null  object
 24  StratificationCategory3     0 non-null       float64
 25  Stratification3             0 non-null       float64
 26  Geolocation                 159375 non-null  object
 27  ClassID                     178539 non-null  object
 28  TopicID                     178539 non-null  object
 29  QuestionID                  178539 non-null  object
 30  ResponseID                  0 non-null       float64
 31  LocationID                  178539 non-null  int64
 32  StratificationCategoryID1   178539 non-null  object
 33  StratificationID1           178539 non-null  object
 34  StratificationCategoryID2   178539 non-null  object
 35  StratificationID2           178539 non-null  object
 36  StratificationCategoryID3   0 non-null       float64
 37  StratificationID3           0 non-null       float64
 38  Report                      0 non-null       float64
dtypes: float64(12), int64(3), object(24)
memory usage: 53.1+ MB
From this output we can learn, among other things:
- The dataset has 178539 observations (rows) and 39 columns
- The name of every column
- The data type of each column
- How much memory the DataFrame uses
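
The same facts can also be read programmatically instead of from the printed report. The following is a minimal sketch, not part of the original script: the helper name summarize and the small sample frame are hypothetical stand-ins (the real CSV is not loaded here), and only a few column names are borrowed from the output above. It also lists columns whose non-null count is 0, and it measures memory with deep=True, which counts the contents of object columns; the "+" in the info() report marks its figure as a lower bound because that deep measurement is skipped by default.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect the facts that info() prints, as plain Python values."""
    non_null = df.notna().sum()  # non-null count per column
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        # How many columns there are of each dtype
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
        # Columns that contain no data at all
        "empty_columns": non_null[non_null == 0].index.tolist(),
        # deep=True also measures the strings stored in object columns
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }


# Hypothetical stand-in for the real dataset (assumption: illustrative values only)
sample = pd.DataFrame(
    {
        "YearStart": [2015, 2016, 2017],
        "LocationAbbr": ["NY", "CA", "TX"],
        "Data_Value": [12.5, None, 9.1],
        "Response": [float("nan")] * 3,
    }
)

summary = summarize(sample)
print(summary)
```

To apply this to the real data, pass the data variable from the script above: summarize(data). Columns reported under "empty_columns" (such as Response or Sample_Size in the output above) are candidates for dropping before analysis.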