Worldscope

Alien Dictionary in C++

Palavras-chave:

Publicado em: 05/08/2025

Alien Dictionary in C++

This article explains how to determine the order of characters in an alien language, given a list of words sorted lexicographically by the rules of that language. This ordering isn't based on standard alphabetical order but is inferred from the given word list.

Fundamental Concepts / Prerequisites

To effectively understand the solution, you should be familiar with the following concepts:

  • Graph Data Structure: The solution represents the character dependencies as a directed graph.
  • Topological Sorting: The alien dictionary order is derived by performing topological sorting on the dependency graph. Kahn's algorithm for topological sort is often used.
  • C++ Standard Template Library (STL): Familiarity with STL containers like `vector`, `queue`, and `unordered_map` is essential.

Core Implementation/Solution


#include <iostream>
#include <vector>
#include <queue>
#include <unordered_map>

using namespace std;

string alienOrder(vector<string>& words) {
    if (words.empty()) return "";

    // 1. Build the graph and in-degree map
    unordered_map<char, vector<char>> adj;
    unordered_map<char, int> inDegree;

    // Initialize the in-degree of all characters to 0
    for (const string& word : words) {
        for (char c : word) {
            inDegree[c] = 0;
        }
    }

    // Build the adjacency list based on word order
    for (int i = 0; i < words.size() - 1; ++i) {
        string word1 = words[i];
        string word2 = words[i + 1];
        int minLen = min(word1.length(), word2.length());

        for (int j = 0; j < minLen; ++j) {
            if (word1[j] != word2[j]) {
                if (find(adj[word1[j]].begin(), adj[word1[j]].end(), word2[j]) == adj[word1[j]].end()){
                    adj[word1[j]].push_back(word2[j]);
                    inDegree[word2[j]]++;
                }
                break; // Only the first different character matters
            }
            //Check for invalid input, such as ["abc", "ab"]. This is invalid, as the common prefix length dictates an order
            if (j == minLen - 1 && word1.length() > word2.length()) return "";

        }
    }

    // 2. Topological Sort using Kahn's Algorithm
    queue<char> q;
    for (auto const& [key, val] : inDegree) {
        if (val == 0) {
            q.push(key);
        }
    }

    string result = "";
    while (!q.empty()) {
        char u = q.front();
        q.pop();
        result += u;

        for (char v : adj[u]) {
            inDegree[v]--;
            if (inDegree[v] == 0) {
                q.push(v);
            }
        }
    }

    // 3. Check for cycle
    if (result.length() != inDegree.size()) {
        return ""; // Cycle detected, invalid dictionary
    }

    return result;
}

int main() {
    vector<string> words = {"wrt", "wrf", "er", "ett", "rftt"};
    string order = alienOrder(words);
    cout << "Alien Order: " << order << endl; // Output: Alien Order: wertf

    vector<string> words2 = {"z", "x"};
     order = alienOrder(words2);
    cout << "Alien Order: " << order << endl; // Output: Alien Order: zx

    vector<string> words3 = {"z", "x", "z"};
     order = alienOrder(words3);
    cout << "Alien Order: " << order << endl; // Output: Alien Order: 

     vector<string> words4 = {"abc", "ab"};
      order = alienOrder(words4);
    cout << "Alien Order: " << order << endl; // Output: Alien Order:

     vector<string> words5 = {"baa", "abcd", "abca", "cab", "cad"};
    order = alienOrder(words5);
    cout << "Alien Order: " << order << endl; // Output: Alien Order: bdac

    return 0;
}

Code Explanation

The `alienOrder` function determines the alien alphabet order from a list of lexicographically sorted words.

1. Graph and In-Degree Construction:

  • An `unordered_map` called `adj` stores the adjacency list representing the character dependencies. For instance, if 'a' comes before 'b' in the alien alphabet, `adj['a']` will contain 'b'.
  • An `unordered_map` called `inDegree` stores the in-degree of each character. The in-degree of a character is the number of characters that come before it in the alien alphabet.
  • The code iterates through all the words to initialize all unique characters in the inDegree map to have a value of 0.
  • The code iterates through consecutive pairs of words. For each pair, it finds the first differing character. If `word1[j]` and `word2[j]` are different, an edge is added from `word1[j]` to `word2[j]` in the adjacency list, and the in-degree of `word2[j]` is incremented.
  • A check is performed to ensure the input is valid, checking that word1 does not have a longer prefix than word2. If this is the case, then it is invalid input, and an empty string is returned.

2. Topological Sort (Kahn's Algorithm):

  • A queue `q` is initialized with all characters having an in-degree of 0. These are the starting characters in the alien alphabet.
  • While the queue is not empty, the algorithm extracts a character `u` from the queue, appends it to the `result` string, and iterates through its neighbors `v`.
  • For each neighbor `v`, the in-degree is decremented. If the in-degree becomes 0, `v` is added to the queue, as it is now ready to be processed.

3. Cycle Detection:

  • After the topological sort, the algorithm checks if the length of the `result` string is equal to the total number of unique characters. If not, a cycle exists in the graph, indicating an invalid alien dictionary, and an empty string is returned.

The function returns the `result` string, which represents the alien alphabet order.

Complexity Analysis

Time Complexity: O(V + E), where V is the number of unique characters (vertices) and E is the number of dependencies (edges). The graph construction iterates through the words, taking O(N * K) time, where N is the number of words, and K is the average length of a word. The topological sort itself takes O(V + E) time. In total this simplifies to O(N*K).

Space Complexity: O(V + E). The adjacency list (`adj`) and in-degree map (`inDegree`) store the graph, requiring O(V + E) space. The queue (`q`) used in the topological sort can hold up to V elements in the worst case.

Alternative Approaches

Depth-First Search (DFS) for Topological Sort: Instead of Kahn's algorithm (using a queue), DFS can also be used for topological sorting. The DFS approach involves recursively visiting each node and marking it as visited. The nodes are added to the topological order in the reverse order they finish being processed. The trade-off is that DFS can be slightly less intuitive to implement, and requires a mechanism to detect cycles (e.g., using "visiting" and "visited" sets).

Conclusion

This article presented a solution for determining the order of characters in an alien language using topological sorting. The core idea is to build a dependency graph from the lexicographically sorted words and then use Kahn's algorithm to derive the order. The approach ensures cycle detection, handles invalid inputs gracefully, and provides an efficient way to determine the alien alphabet order.