Data Methodology

Scientific Approach to Data Collection and Processing

At Egypt Trains, we follow a rigorous scientific methodology for collecting, processing, and analyzing Egyptian railway data. This document describes the methods and algorithms we use to keep that data accurate and reliable.

1. Methodological Framework

General Approach

Core Principles

2. Data Sources and Verification

Source Hierarchy

Source Verification Process

1. Source validation
   ├── Is the source official?
   ├── What is its reliability level?
   └── When was the last update?

2. Data analysis
   ├── Is the data logical?
   ├── Does it match usual patterns?
   └── Are there anomalies?

3. Cross-verification
   ├── Compare with other sources
   ├── Check historical consistency
   └── Verify general context
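
The three stages above could be sketched in code roughly as follows. This is only an illustration: the `SourceRecord` fields, the 0.5 reliability cutoff, the 30-day staleness limit, and the 5-minute agreement window are all assumptions made for the example, not values taken from our pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class SourceRecord:
    # Hypothetical record shape; field names are illustrative only.
    is_official: bool
    reliability: float        # 0.0 (unknown) to 1.0 (fully trusted)
    last_updated: datetime
    departure_minutes: int    # scheduled departure, minutes after midnight

def verify_record(record, other_sources, now, max_age=timedelta(days=30)):
    """Return a list of reasons the record failed a check (empty if none)."""
    problems = []
    # Stage 1: source validation
    if not record.is_official and record.reliability < 0.5:
        problems.append("unofficial source with low reliability")
    if now - record.last_updated > max_age:
        problems.append("source is stale")
    # Stage 2: data analysis (basic plausibility)
    if not 0 <= record.departure_minutes < 24 * 60:
        problems.append("departure time outside a 24-hour day")
    # Stage 3: cross-verification against other sources
    for other in other_sources:
        if abs(other.departure_minutes - record.departure_minutes) > 5:
            problems.append("disagrees with another source by more than 5 minutes")
            break
    return problems
```

Returning a list of reasons rather than a single pass/fail flag keeps each failed check visible for later review.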
    

3. Data Processing Algorithms

Data Cleaning Algorithm

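def remove_duplicates(train_data):
    """Keep one record per (number, date, route), preferring the newest."""
    unique_trains = {}
    for train in train_data:
        # A tuple key avoids collisions that a joined string could produce
        # when a field itself contains the separator character.
        key = (train.number, train.date, train.route)
        existing = unique_trains.get(key)
        if existing is None or train.last_updated > existing.last_updated:
            unique_trains[key] = train
    return list(unique_trains.values())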
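
A small usage sketch may make the behavior concrete. The `TrainRecord` type and the sample train numbers, dates, and routes below are made up for illustration; only the four attribute names come from the function above, which is repeated in compact form so the snippet runs on its own.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TrainRecord:
    # Hypothetical record type; attribute names follow remove_duplicates.
    number: str
    date: str
    route: str
    last_updated: datetime

def remove_duplicates(train_data):
    # Same logic as above, repeated so this snippet is self-contained.
    unique_trains = {}
    for train in train_data:
        key = (train.number, train.date, train.route)
        current = unique_trains.get(key)
        if current is None or train.last_updated > current.last_updated:
            unique_trains[key] = train
    return list(unique_trains.values())

# Made-up sample data: the first two records describe the same train.
records = [
    TrainRecord("980", "2025-06-01", "Cairo-Alexandria", datetime(2025, 5, 30, 9, 0)),
    TrainRecord("980", "2025-06-01", "Cairo-Alexandria", datetime(2025, 5, 31, 18, 0)),
    TrainRecord("981", "2025-06-01", "Alexandria-Cairo", datetime(2025, 5, 30, 9, 0)),
]
deduplicated = remove_duplicates(records)
```

Of the two records for the same train, only the one with the later `last_updated` value survives, so `deduplicated` holds two entries.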
    

Anomaly Detection Algorithm

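def detect_schedule_anomalies(train_schedule):
    """Flag segments whose implied average speed is implausible."""
    anomalies = []
    stops = train_schedule.stops
    for current, following in zip(stops, stops[1:]):
        segment = f"{current.name} - {following.name}"
        distance = calculate_distance(current, following)  # assumed to be in km
        # timedelta has no .hours attribute; convert total seconds to hours.
        hours = (following.time - current.time).total_seconds() / 3600
        if hours <= 0:
            # Avoid dividing by zero when times are equal or out of order.
            anomalies.append({'type': 'non_positive_travel_time', 'segment': segment})
            continue
        speed = distance / hours  # km/h
        if speed > 200 or speed < 10:
            anomalies.append({
                'type': 'unrealistic_speed',
                'calculated_speed': speed,
                'segment': segment
            })
    return anomalies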
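
To show the speed check on concrete numbers, here is a minimal self-contained sketch. It assumes a hypothetical `Stop` record that stores cumulative track distance in kilometers, which sidesteps `calculate_distance`; the stop names and times are invented, and the 10 and 200 km/h limits are the ones used above.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Stop:
    # Hypothetical stop record: km is cumulative track distance from the origin.
    name: str
    km: float
    time: datetime

def segment_speeds(stops):
    """Yield (segment label, speed in km/h or None) for consecutive stops."""
    for current, following in zip(stops, stops[1:]):
        label = f"{current.name} - {following.name}"
        hours = (following.time - current.time).total_seconds() / 3600
        if hours <= 0:
            # Zero or negative travel time cannot produce a meaningful speed.
            yield label, None
            continue
        yield label, (following.km - current.km) / hours

stops = [
    Stop("A", 0.0, datetime(2025, 6, 1, 8, 0)),
    Stop("B", 45.0, datetime(2025, 6, 1, 8, 30)),  # 45 km in 0.5 h = 90 km/h
    Stop("C", 50.0, datetime(2025, 6, 1, 9, 30)),  # 5 km in 1 h = 5 km/h
]
flagged = [
    (label, speed)
    for label, speed in segment_speeds(stops)
    if speed is None or speed > 200 or speed < 10
]
```

Only the "B - C" segment is flagged here, because its implied average speed of 5 km/h falls below the lower limit.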
    

Delay Prediction Algorithm

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def train_delay_prediction_model(historical_data):
    # extract_features and extract_delays are project helpers defined elsewhere.
    features = extract_features(historical_data)
    delays = extract_delays(historical_data)
    # Hold out 20% of the data for evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        features, delays, test_size=0.2, random_state=42
    )
    model = RandomForestRegressor(
        n_estimators=100,
        max_depth=10,
        random_state=42
    )
    model.fit(X_train, y_train)
    # For a regressor, score() returns R^2 on the test set,
    # not the share of predictions within a time tolerance.
    r2 = model.score(X_test, y_test)
    return model, r2
    

Current model accuracy: 87.3% for predicting delays within ±10 minutes
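
A "within ±10 minutes" figure is a different quantity from the R² value that a scikit-learn regressor's `score()` method returns. One way such a share could be computed is sketched below; the function name and the sample delays are assumptions for illustration and are not our production evaluation code.

```python
def share_within_tolerance(actual_delays, predicted_delays, tolerance_minutes=10):
    """Percentage of predictions within +/- tolerance_minutes of the actual delay."""
    if len(actual_delays) != len(predicted_delays):
        raise ValueError("inputs must have the same length")
    if not actual_delays:
        raise ValueError("inputs must not be empty")
    hits = sum(
        abs(actual - predicted) <= tolerance_minutes
        for actual, predicted in zip(actual_delays, predicted_delays)
    )
    return 100 * hits / len(actual_delays)

# Made-up delays in minutes; the absolute errors are 4, 13, 5 and 0.
actual = [0, 12, 35, 5]
predicted = [4, 25, 30, 5]
share = share_within_tolerance(actual, predicted)
```

Three of the four sample predictions fall inside the tolerance, so `share` is 75.0.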

4. Verification and Analysis

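from math import sqrt

from sklearn.metrics import mean_absolute_error, mean_squared_error

def calculate_daily_accuracy_metrics(actual_times, predicted_times):
    # Inputs are paired sequences of times, assumed to be in minutes.
    mae = mean_absolute_error(actual_times, predicted_times)
    rmse = sqrt(mean_squared_error(actual_times, predicted_times))
    # Percentage of predictions within 5 minutes of the actual value.
    accuracy_5min = sum(
        abs(actual - predicted) <= 5
        for actual, predicted in zip(actual_times, predicted_times)
    ) / len(actual_times) * 100
    return {
        'mae': mae,
        'rmse': rmse,
        'accuracy_5min': accuracy_5min
    }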
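
As a worked illustration of how the two error metrics differ, the sketch below computes MAE and RMSE with the standard library only. The helper name and the sample values are invented for the example.

```python
from math import sqrt

def mae_and_rmse(actual, predicted):
    """Return (mean absolute error, root mean squared error)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

# Errors are -2, 2, 0 and -20 minutes: three small misses and one large one.
mae, rmse = mae_and_rmse([10, 20, 30, 40], [12, 18, 30, 60])
```

Here the MAE is 6.0 while the RMSE is about 10.1. Squaring weights the single 20-minute miss more heavily, which is why reporting both values is useful.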
    

5. Big Data Technologies

6. Quality Assurance and Testing

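def test_distance_calculation():
    # (latitude, longitude) in decimal degrees.
    # Note: calculate_distance receives coordinate tuples here, whereas
    # detect_schedule_anomalies passes stop objects.
    cairo_coords = (30.0626, 31.2497)
    alexandria_coords = (31.2001, 29.9187)
    calculated_distance = calculate_distance(cairo_coords, alexandria_coords)
    expected_distance = 208.5  # km
    # Allow a tolerance of 5 km.
    assert abs(calculated_distance - expected_distance) <= 5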
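
For comparison, a great-circle (haversine) distance between the same two coordinates can be computed as sketched below. This is not necessarily what `calculate_distance` does; the function name and the 6371 km mean Earth radius are assumptions for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

cairo_coords = (30.0626, 31.2497)
alexandria_coords = (31.2001, 29.9187)
straight_line_km = haversine_km(cairo_coords, alexandria_coords)
```

The straight-line result is roughly 180 km, noticeably less than the 208.5 km the test expects. The expected value therefore presumably reflects distance along the track rather than a great-circle distance, though that is an inference from the numbers.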
    

7. Continuous Improvement and Development


Last updated: June 2025
Version: 2.1
Next review: December 2025
For technical inquiries about the methodology, contact: methodology@egypttrains.com