IMPLEMENTATION OF BIG DATA ANALYTICS IN AIR QUALITY CLASSIFICATION USING THE GRADIENT-BOOSTED TREE CLASSIFIER ALGORITHM ON PYSPARK

Authors

  • Muhamad Fuat Asnawi Universitas Sains Al-Qur'an
  • Nur Fitriyanto Universitas Amikom Yogyakarta
  • M. Agoeng Pamoengkas Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.58641/technomedia.v2i1.124

Keywords:

Big Data Analytics, Gradient-Boosted Tree, Air Quality, PySpark

Abstract

This study classifies air quality from PM1.0, PM2.5, and PM10 measurements using a Big Data Analytics approach, with the Gradient-Boosted Tree (GBT) Classifier implemented in PySpark. The dataset was downloaded from OpenAQ and covers the period from April 14, 2021, to April 16, 2023, for a total of 1,048,154 entries. The research process comprises data preprocessing to address data imbalance, splitting the dataset into training and test sets, and hyperparameter tuning with grid search and cross-validation to optimize model performance. Using PySpark’s parallel processing of large datasets, the GBT model achieved an accuracy of 98.87%, precision of 99.00%, recall of 98.87%, and an F1-score of 98.90%. The study demonstrates how Big Data Analytics can improve the efficiency and accuracy of air quality classification, and it contributes to the development of real-time monitoring systems that support air pollution mitigation and data-driven policy-making.
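
The reported accuracy and recall are identical (98.87%). The abstract does not say how the per-class scores were averaged, but this pattern is what support-weighted averaging produces, because weighted recall always equals overall accuracy. The sketch below illustrates that relationship in plain Python. It is not the authors' evaluation code: the function name `weighted_metrics`, the class labels, and the toy predictions are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # number of true instances per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for label in labels:
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / support[label] if support[label] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support[label] / n  # class share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical, imbalanced toy labels (not taken from the OpenAQ dataset).
y_true = ["good"] * 6 + ["moderate"] * 3 + ["unhealthy"] * 1
y_pred = ["good"] * 5 + ["moderate"] * 4 + ["unhealthy"] * 1

metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

In this toy example one "good" sample is misclassified as "moderate", so precision and recall differ while weighted recall still matches accuracy. In PySpark, `MulticlassClassificationEvaluator` exposes comparable metric names (`accuracy`, `weightedPrecision`, `weightedRecall`, `f1`); whether the study used that evaluator is an assumption.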

Published

2025-01-31

How to Cite

Muhamad Fuat Asnawi, Nur Fitriyanto, & M. Agoeng Pamoengkas. (2025). IMPLEMENTASI BIG DATA ANALYTICS DALAM KLASIFIKASI KUALITAS UDARA MENGGUNAKAN ALGORITMA GRADIENT-BOOSTED TREE CLASSIFIER PADA PYSPARK [Implementation of Big Data Analytics in Air Quality Classification Using the Gradient-Boosted Tree Classifier Algorithm on PySpark]. TECHNOMEDIA : Informatics and Computer Science, 2(1), 15–20. https://doi.org/10.58641/technomedia.v2i1.124

Section

Articles