Capstone Project: Car Accident Severity


1. Introduction/Business Problem

1.1 Report

Road traffic crashes result in the deaths of approximately 1.35 million people around the world each year and leave between 20 and 50 million people with non-fatal injuries. More than half of all road traffic deaths and injuries involve vulnerable road users, such as pedestrians, cyclists and motorcyclists and their passengers (1).

A number of factors contribute to the risk of collisions, including vehicle design, speed of operation, road design, road environment, driving skills, impairment due to alcohol or drugs, and behavior, notably distracted driving, speeding and street racing (2).

A number of physical injuries can commonly result from the blunt force trauma caused by a collision, ranging from bruising and contusions to catastrophic physical injury (e.g., paralysis) or death (2).

Human factors in vehicle collisions include anything related to drivers and other road users that may contribute to a collision. Examples include driver behavior, visual and auditory acuity, decision-making ability, and reaction speed (2).

1.2 Target

This project uses Python-based machine learning and data science libraries to build a model that predicts car accident severity.

2. Data

2.1 Data Source

The original data came from:

Data-Collisions.csv
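A minimal sketch of loading the file with pandas. It assumes Data-Collisions.csv is a comma-separated file with a header row. The helper name `load_collisions` and the tiny in-memory sample are illustrative only, not the project's actual code or data.

```python
import io

import pandas as pd


def load_collisions(source):
    """Read the collisions table from a path or a file-like object."""
    # low_memory=False makes pandas infer each column's type from the whole file,
    # avoiding mixed-type warnings on wide CSVs with sparse columns
    return pd.read_csv(source, low_memory=False)


# Small in-memory stand-in using a few columns from the data dictionary
sample = io.StringIO(
    "SEVERITYCODE,WEATHER,ROADCOND,LIGHTCOND\n"
    "1,Clear,Dry,Daylight\n"
    "2,Raining,Wet,Dark - Street Lights On\n"
)
df = load_collisions(sample)
print(df.shape)  # (2, 4)
```

With the real file in the working directory, the call would be `load_collisions("Data-Collisions.csv")`.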

3. Features

Data Dictionary

  1. LOCATION : Description of the general location of the collision
  2. SEVERITYCODE : A code that corresponds to the severity of the collision:
       * 3 = fatality
       * 2b = serious injury
       * 2 = injury
       * 1 = property damage
       * 0 = unknown
  3. SEVERITYDESC : A detailed description of the severity of the collision
  4. COLLISIONTYPE : Collision type
  5. PERSONCOUNT : The total number of people involved in the collision
  6. PEDCOUNT : The number of pedestrians involved in the collision. This is entered by the state.
  7. PEDCYLCOUNT : The number of bicycles involved in the collision. This is entered by the state.
  8. VEHCOUNT : The number of vehicles involved in the collision. This is entered by the state.
  9. INJURIES : The number of total injuries in the collision. This is entered by the state.
  10. SERIOUSINJURIES : The number of serious injuries in the collision. This is entered by the state.
  11. FATALITIES : The number of fatalities in the collision. This is entered by the state.
  12. INCDATE : The date of the incident.
  13. INCDTTM : The date and time of the incident.
  14. JUNCTIONTYPE : Category of junction at which the collision took place
  15. SDOT_COLCODE : A code given to the collision by SDOT.
  16. SDOT_COLDESC : A description of the collision corresponding to the collision code.
  17. INATTENTIONIND : Whether or not the collision was due to inattention. (Y/N)
  18. UNDERINFL : Whether or not a driver involved was under the influence of drugs or alcohol.
  19. WEATHER : A description of the weather conditions during the time of the collision.
  20. ROADCOND : The condition of the road during the collision.
  21. LIGHTCOND : The light conditions during the collision.
  22. PEDROWNOTGRNT : Whether or not the pedestrian right of way was not granted. (Y/N)
  23. SDOTCOLNUM : A number given to the collision by SDOT.
  24. SPEEDING : Whether or not speeding was a factor in the collision. (Y/N)
  25. ST_COLCODE : A code provided by the state that describes the collision.
  26. ST_COLDESC : A description that corresponds to the state’s coding designation.
  27. SEGLANEKEY : A key for the lane segment in which the collision occurred.
  28. CROSSWALKKEY : A key for the crosswalk at which the collision occurred.
  29. HITPARKEDCAR : Whether or not the collision involved hitting a parked car. (Y/N)
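The Y/N columns above need a numeric encoding before most scikit-learn models can use them. The sketch below maps them to 0/1. Treating a blank value as 0, accepting "1"/"0" alongside "Y"/"N", and including UNDERINFL (not marked Y/N in the dictionary) are all assumptions about how the raw file is filled; the helper names are hypothetical.

```python
import numbers

import pandas as pd

# Flag columns from the data dictionary; UNDERINFL is included as an assumption
FLAG_COLUMNS = ["INATTENTIONIND", "UNDERINFL", "PEDROWNOTGRNT", "SPEEDING", "HITPARKEDCAR"]


def to_flag(value):
    """Return 1 for a set flag ("Y" or 1), otherwise 0."""
    if pd.isna(value):
        # Assumption: a blank flag means the factor was not recorded, treated as 0
        return 0
    if isinstance(value, numbers.Number):
        return int(value == 1)
    return int(str(value).strip().upper() in {"Y", "1"})


def encode_flags(df, columns=FLAG_COLUMNS):
    """Return a copy of df with every flag column that is present converted to 0/1."""
    out = df.copy()
    for col in columns:
        if col in out.columns:
            out[col] = out[col].map(to_flag).astype(int)
    return out


raw = pd.DataFrame({
    "SPEEDING": ["Y", None, "Y"],
    "UNDERINFL": ["N", "1", "0"],
    "HITPARKEDCAR": ["N", "Y", "N"],
})
encoded = encode_flags(raw)
print(encoded)
```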

4. Methodology

4.1 Data Exploration

We first look at the target variable (SEVERITYCODE) and analyze its distribution.

image.png

  • Visualize the target distribution

image.png
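A frequency table is a quick way to reproduce this target analysis. The pandas sketch below uses a toy DataFrame in place of the real data; the helper name is hypothetical. Checking the shares matters because an imbalanced target makes plain accuracy look better than it is.

```python
import pandas as pd


def target_distribution(df, target="SEVERITYCODE"):
    """Count each severity code and its share of all rows, most frequent first."""
    counts = df[target].value_counts()
    shares = df[target].value_counts(normalize=True).round(3)
    return pd.DataFrame({"count": counts, "share": shares})


df = pd.DataFrame({"SEVERITYCODE": [1, 1, 1, 2, 2, 1]})
dist = target_distribution(df)
print(dist)
# dist["count"].plot.bar() draws the bar chart (needs matplotlib installed)
```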

  • Next, we look at the correlation between features

image.png

  • Present the correlation matrix in a more readable form

image.png
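A minimal sketch of ranking features by their correlation with the target, assuming pandas. Pearson correlation only covers numeric columns, so categorical columns such as WEATHER drop out until they are encoded. The toy values and the helper name are illustrative; the seaborn call in the comment is one common way to draw the heatmap, not necessarily the one used here.

```python
import pandas as pd


def correlation_with_target(df, target="SEVERITYCODE"):
    """Pearson correlation of each numeric column with the target, strongest first."""
    corr = df.select_dtypes("number").corr()
    return corr[target].drop(target).sort_values(key=abs, ascending=False)


df = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1, 2],
    "PERSONCOUNT": [2, 4, 2, 4],
    "VEHCOUNT": [2, 2, 1, 1],
    "WEATHER": ["Clear", "Raining", "Clear", "Overcast"],
})
print(correlation_with_target(df))
# Heatmap of the full matrix: seaborn.heatmap(df.select_dtypes("number").corr(), annot=True)
```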

  • We now select the features that will be used in the machine learning model

image.png
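The selection step can be sketched as a column subset plus a missing-value filter. The column list here is hypothetical: it uses the weather, road and light condition columns discussed later in this report, while the project's actual selection is shown only in the original figure.

```python
import pandas as pd

# Hypothetical selection: the three condition columns plus the target
SELECTED = ["WEATHER", "ROADCOND", "LIGHTCOND", "SEVERITYCODE"]


def select_features(df, columns=SELECTED):
    """Keep only the chosen columns and drop rows missing any of them."""
    return df[columns].dropna().reset_index(drop=True)


df = pd.DataFrame({
    "WEATHER": ["Clear", None, "Raining"],
    "ROADCOND": ["Dry", "Wet", "Wet"],
    "LIGHTCOND": ["Daylight", "Dark", "Dusk"],
    "SEVERITYCODE": [1, 2, 2],
    "LOCATION": ["A", "B", "C"],
})
data = select_features(df)
print(data)
```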

  • The resulting dataset:

image.png

  • Some analysis of the selected data:

image.png

image.png

image.png

image.png

image.png
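The frequency checks behind the findings in the Discussion (most common weather and road conditions) can be sketched with `value_counts`. The data below are toy values, not the project's results, and the helper name is hypothetical.

```python
import pandas as pd


def top_categories(df, columns, n=3):
    """Return the n most frequent values of each column, most frequent first."""
    return {col: df[col].value_counts().head(n).index.tolist() for col in columns}


df = pd.DataFrame({
    "WEATHER": ["Clear", "Clear", "Raining", "Overcast", "Clear", "Raining"],
    "ROADCOND": ["Dry", "Wet", "Dry", "Dry", "Wet", "Ice"],
})
print(top_categories(df, ["WEATHER", "ROADCOND"], n=2))
```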

  • Remove uninformative placeholder values such as “Unknown”

image.png

image.png
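A minimal sketch of dropping rows that carry the "Unknown" placeholder in any of the chosen columns, assuming pandas. The helper name and sample rows are illustrative.

```python
import pandas as pd


def drop_unknown(df, columns, marker="Unknown"):
    """Drop rows where any of the given columns holds the placeholder marker."""
    has_marker = df[columns].isin([marker]).any(axis=1)
    return df[~has_marker].reset_index(drop=True)


df = pd.DataFrame({
    "WEATHER": ["Clear", "Unknown", "Raining", "Overcast"],
    "ROADCOND": ["Dry", "Wet", "Unknown", "Dry"],
    "SEVERITYCODE": [1, 2, 1, 2],
})
clean = drop_unknown(df, ["WEATHER", "ROADCOND"])
print(clean)
```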

  • Look for patterns between light condition (LIGHTCOND) and SEVERITYCODE

image.png

image.png

image.png

image.png

image.png
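One way to compare light conditions is a row-normalized crosstab: each row gives the severity mix within one light condition, regardless of how common that condition is. This is a sketch with toy data and a hypothetical helper name, not the project's exact analysis.

```python
import pandas as pd


def severity_by(df, column, target="SEVERITYCODE"):
    """Share of each severity code within every category of `column` (rows sum to 1)."""
    return pd.crosstab(df[column], df[target], normalize="index").round(3)


df = pd.DataFrame({
    "LIGHTCOND": ["Daylight", "Daylight", "Daylight", "Dark", "Dark"],
    "SEVERITYCODE": [1, 2, 2, 1, 1],
})
table = severity_by(df, "LIGHTCOND")
print(table)
```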

  • Define a function to fit and score all models in one cell:

image.png
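A fit-and-score helper of this kind can be sketched with scikit-learn, covering the three model families named in this report (Logistic Regression, KNN, Random Forest). The hyperparameters are placeholders, and synthetic data from `make_classification` stands in for the encoded features; in practice the categorical columns would first need encoding, for example with `pandas.get_dummies`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def fit_and_score(models, X_train, X_test, y_train, y_test):
    """Fit every model on the training set and return its test accuracy by name."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Synthetic stand-in for the encoded collision features
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scores = fit_and_score(models, X_train, X_test, y_train, y_test)
print(scores)
```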

  • Fitting all the models takes some time to run

image.png

image.png

  • Try to improve the model by tuning its parameters

image.png
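Parameter tuning for Logistic Regression can be sketched with `GridSearchCV`. The grid values are hypothetical, and the data are synthetic. Setting `n_jobs=1` runs the candidate fits one after another rather than in parallel, which is one possible way to reduce the load on a modest machine.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_logistic_regression(X_train, y_train):
    """Grid-search the regularization strength C and the solver with 5-fold CV."""
    param_grid = {"C": [0.01, 0.1, 1.0, 10.0], "solver": ["liblinear", "lbfgs"]}
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid,
        cv=5,
        scoring="accuracy",
        n_jobs=1,  # fit candidates sequentially instead of in parallel
    )
    search.fit(X_train, y_train)
    return search


X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
search = tune_logistic_regression(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
print(search.best_estimator_.score(X_test, y_test))
```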

  • My PC ran into problems when I tried this with Random Forest and KNN, so I continue with Logistic Regression only

image.png

image.png

image.png

image.png

image.png
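Because property damage is the most frequent severity (see the Discussion), accuracy alone can hide a model that rarely predicts injury collisions. Per-class precision and recall give a fuller picture. The sketch below assumes a binary target and uses imbalanced synthetic data (roughly 70/30, an assumption, not the project's real class split); the helper name is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test):
    """Return the confusion matrix, weighted F1 score and a per-class text report."""
    y_pred = model.predict(X_test)
    return {
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "f1_weighted": f1_score(y_test, y_pred, average="weighted"),
        "report": classification_report(y_test, y_pred),
    }


# Imbalanced synthetic data standing in for a majority "property damage" class
X, y = make_classification(n_samples=400, n_features=6, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
results = evaluate(model, X_test, y_test)
print(results["confusion_matrix"])
print(results["report"])
```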

5. Discussion

From the data analysis, we found:

  1. The most frequent accident severities are:

                  - property damage
                  - injury

  2. Most car accidents happen when the weather is:

                  - Clear
                  - Raining
                  - Overcast

  3. The most common road conditions during collisions are:

                  - Dry
                  - Wet
    

6. Conclusion

  • Other models should be tried to predict traffic accident severity more precisely, and the accuracy of the current model needs to be improved.

  • In future work, additional features are needed to increase prediction accuracy.