LateX

Wednesday, July 27, 2011

Algorithms on strings - XVIII Polish CS Olympiad - Difference (4/10)

This was quite a fun question. Although solution turned out to be simple it was a little bit unconventional and it took me a couple of minutes to work out the solution.
You can find the english version of the text as well as the checker here:


So first natural approach to this problem is to analyse all subsequences separately. There are $\Theta(n^2)$ of them. For each sequence we just count for each letter how many times it occurs in this sequence and then find a minimum and maximum. As sequence can have pessimistically about $\Theta(n)$ length, the whole algorithm complexity is $\Theta(n^3)$. This is way is sufficient for short sequence but way to slow for $1000000$ element sequence.

The Maximum Sum contiguous subsequence problem

You should probably know this problem before you proceed. If you already know it you can skip this section. The problem is the following:
You are given the sequence of integers (can be negative). Find such a contiguous subsequence, that sum of it's elements is maximum possible.
For example if we consider sequence
2 1 -5 4 -2 4 -1 -2 -3 5 
The answer is subseqence 4 -2 4 having sum 6.

Now assuming we have $n$ elements this problem can be solved in $\Theta(n)$. At this point you might want to try to work it out on your own.

As this problem is quite well described in the internet I won't give exhaustive description. Here's how you can approach it. As the target algorithm is $\Theta(n)$ it's sort of a clue that we scan the sequence only once. OK, let's try left to right maybe this will work out. The simplest idea that comes to my head is to calculate sum of the prefixes and see what happens.

Seq.21-54-24-1-2-35
Sum23-2       
[Best so far: 5]

And look what happened now! We have negative prefix sum. This means that we don't want to add this prefix (2 1 -5) to our final result (there's no benefit at all!). But what about (1 -5) or (-5) ? Both of them sums to negative number and therefore are worthless. So let's totally forget about first three numbers. We will indicate that in our calculations by replacing -2 with 0 in our table.

Seq.21-54-24-1-2-45
Sum23042653-1 
[Best so far: 6]

Ok now it happened again. In case somebody's lost we are currently considering subsequence (4 -2 4 -1 -2 -4). Now again let's notice that however many terms you strip from the beginning of this sequence we will never exceed -1. Let's forget about all of them then analogous to above.

Seq.21-54-24-1-2-45
Sum2304265305
[Best averally: 6]

Voille! The result is 6.
But hold on! How did it happened? Each time we were able to forget about whole prefix totally. What if it was say 1 -3 -3 4. It sums to -1 so we would get rid of all of it. But we can strip first three terms and get 4 alone, which would be quite useful in our future calculations. What now? Is the algorithm wrong? Did we just choose lucky example above?
No! Think about it! Here's the trick! We would never be considering such a prefix in our algorithm! After reading 1 and then -3 we would get rid of them (because they sum to negative value) and passed on 0. Then we read -3, which we also get rid of and pass on 0. Then we read 4 and pass it on. Hey, it works!


This way we ended up with linear time algorithm. We just do one pass and always get rid of currently considered terms once we get a negative sum. Also we don't have to think about the terms explicitly. Just change the variable holding sum so far to 0 once it's negative. Simple, isn't it?

// sequence contains 1-based n-element sequence. 
int res = 0, best = 0; 
for (int i = 1; i <= n; ++i) {
  res += sequence[i];
  if(res > best) best = res;   
  if(res < 0) res = 0;
}
For geeks: More formally we can prove this algorithm is correct inductively on the sequence length. The hypothesis is that after ith iteration of loop variable res contains maximum sum that subsequence (may be empty) finishing at i-th index can give. This indeed true. Assuming it's true for i-1. Now assume that to get better result for i we need to use other sequence that the one from i-1. Then it would be better for i-1 as well if we stripped the last term. Contradicion with the fact that we had optimal answer for i-1. Also we didn't take into account that in particular we can just have an empty sequence. It gives 0 so it's worth doing only if otherwise we would get negative number.

Back to reality now - the problem is quite complicated so let's solve simpler version first. Let's assume that magic fairy told us that in the subsequence which is the answer the letter that occurs the most is 'b' and the letter that occurs the least is 'a'. Problem is to calculate such a subsequence that difference between number of occurrences of 'b' and 'a' (if there are more 'a' then it would be negative in this case) is maximum possible. Now - can we do that?

I seriously recommend that you try to work it out on your own given the solution to the problem in yellow box before continuing. It's important at least to try, you don't have to succeed!
Here's the solution. Let's call the letter that occurs the biggest number of times the big letter and the one that occurs the least number of times the small letter . Hence, each big letter is sort of like +1, right? Each small letter is like a -1. What about rest? Completely unimportant so let it be 0. Now we just have to run the algorithm for maximum sum subsequence, right? Not quite, but close. If we read the question in detail we will notice the sentence "We are assuming that the least frequent letter has to occur at least once in the resulting fragment". So we can just check if in current sequence there is such a letter. The code would look sort of like this.

int calculate_maximum_difference(char small_letter, char big_letter) {   
  int res = 0, best = 0;   
  bool has_small = false;   
  for(int i = 0; i < n; ++i) {
      if (sequence[i] == small_letter) {
        --res;
        has_small = true;
        if(res < 0) {
          has_small = false;
          res = 0;
        } 
      } else if (sequence[i] == big_letter) {
        ++res;
      }
      if(best < res && has_small) best = res;
  }
  return best;
}
OK, so this code have to work! No? Of course not. Consider example:
baaaa
where b is small and a is big. Now we read one b then get rid of it, then read 4 a's but make no use out of them because we do not have a small letter. So sometimes it might be worth keeping the a letter even if it subtracts one from final result. But on the other hand in this example:
baaaabaaaaa
We definitely don't want to consider first b. So when to do what? I don't care! Let's just test each case separately. Our function will have additional boolean parameter discard, that will determine whether this time we are discarding the small letter or not. OK great, let's change this to general problem solution! We just consider each pair of letters. The worst that could happen if we consider wrong pair is that we will get result worse than optimal, but we don't really care because at some point we will encounter the optimal result. The full code of this solution is presented below:

#include <cstdio> 
#include <vector> 
#include <algorithm> 
using namespace std;  

int n; 
char sequence[1000006]; // always add some more just in case :)  

int calculate_maximum_difference(char small_letter, 
                                 char big_letter, 
                                 bool discard) {   
  int res = 0, best = 0;   
  bool has_small = false;   
  for(int i = 0; i < n; ++i) {
    if (sequence[i] == small_letter) {
      --res; has_small = true;
      if(discard) {
        if(res<0) { res = 0; has_small = false; }
      } else {
        if(res<-1) { res = -1; }
      }
    } else if (sequence[i] == big_letter) {
      ++res;
    }
    if(best < res && has_small) best = res;
  }
  return best;
}


int main() {
  scanf("%d", &n);
  scanf(" %s", sequence); // one of the fastest ways to read a string
  int best = 0;
  for (char i = 'a'; i <= 'z'; ++i) {
    for(char  j = 'a'; j <= 'z'; ++j) {
      if(i==j) continue;
      best = max(max(calculate_maximum_difference(i, j, true),
                 calculate_maximum_difference(i, j, false)), 
             best);
    }
  }
  printf("%d\n", best);
}
This is working solution. The only disadvantage of this solution is that it's to slow to satisfy the task requirements (it get's 60/100 points in tester). Let's think about a complexity. Let's denote the alphabet size by $\Omega$. For each unordered pair we do single scan through entire sequence. This is $\Theta(\Omega^2 n)$. It's quite a lot. Looking at the actual numbers ($26 * 26 * 1 000 000 = 676 000 000$). Another optimization is quite straightforward. Guessing the appropriate pair of letters sound difficult and non-trivial, so let's think what we can actually do to make calculate_maximum_difference run faster. So what's the actual thing? Oh, come on it's easy. Thinking... Thinking... ... Got it! We can remove every letter from considered string that is nor small, neither big. For example if a is big and b is small then string:
abcsdsfbabgfabgab 
is equivalent to:
abbababab 
But how to do it efficient? We can just store for each letter it's indexes in the input string. Then we can create strings looking just at the two necessary letters rather than whole string. So after another bit of thinking we end up with the code:

#include <cstdio> 
#include <vector> 
#include <algorithm> 
#include <string> 
using namespace std;  

int n; 
char sequence[1000006]; // always add some more just in case :) 
vector<int> letter_positions[256]; // letter_positions['a'] is 
                                         // a vector of all the indexes 
                                         // at which letter 'a' occurs 
                                         // in big string.  
int calculate_maximum_difference(const string& text, 
                                 char small_letter, 
                                 char big_letter, 
                                 bool discard) {   
  int res = 0, best = 0;   
  bool has_small = false;   
  for(int i = 0; i < text.size(); ++i) {
      if (text[i] == small_letter) {
        --res; has_small = true;
        if(discard) {
          if(res<0) { res = 0; has_small = false; }
        } else {
          if(res<-1) { res = -1; }
        }
      } else {
        ++res;
      }
      if(best < res && has_small) best = res;
  }
  return best;
}


int main() {
  scanf("%d", &n);
  scanf(" %s", sequence); // one of the fastest ways to read a string
  for(int i=0; i < n; ++i) 
    letter_positions[sequence[i]].push_back(i);
  int best = 0;
  for (char i = 'a'; i <= 'z'; ++i) {
    for(char  j = 'a'; j <= 'z'; ++j) {
      if(i==j) continue;
      string text; // we will store a text that only contains i and j 
                   // letters preserving the original order
      vector<int>::iterator small = letter_positions[i].begin(), 
                                  big = letter_positions[j].begin();
      while(small != letter_positions[i].end() || 
            big != letter_positions[j].end()) {
        if(big == letter_positions[j].end() ||
           (small != letter_positions[i].end() && *small < *big)) {
          text.push_back(i); ++small;
        } else { 
          text.push_back(j); ++big;
        }
      }
      best = max(max(calculate_maximum_difference(text, i, j, true),
                 calculate_maximum_difference(text, i, j, false)), 
             best);
    }
  }
  printf("%d\n", best);
}
This time when we send it, we wait, wait, wait and yes!!!!! It gets a 100/100. But why? Was the optimization that good? What is the complexity now? Hmm... hard to tell. But intuitively, the biggest amount of work will be done if we do substantial amount of work for each pair of letter. How to construct such a case? Let's have equally many letters of each kind. So we have $\Theta(n/ \Omega)$ letters of each kind. For each of $\Omega^2$ pairs we do approximately $\Theta(n/ \Omega)$ operations. So the final complexity is $\Theta(\Omega n)$. That's about $26000000$ operations, which is not that bad, assuming we keep our code tidy. So yeah, it works! Yeey!

Bugs in my mind: (in this section i describe the problems I had while solving this problem):
  • First of all it was hard to came up with the right ideas. I kept thinking about standard approaches to such problems: binary search length of the substring, crawl along it, do some other standard techniques. Only after about 20 minutes it stroke me that we actually can do sth for each pair and consider only relevant letters ant there for decrease complexity by a factor of $\Omega$. Then rest followed.
  • I also spend quite some time on getting the discarding/non-discarding to work. I was trying some more general approaches and each of them failed. Now it seems trivial, but it took me about an half an hour of tries and looking at the tests that I kept getting wrong answer on to actually fix that.
  • The last very, very stupid error I made that was about 15 minute to debug is that I created vector instead of vector initially.

9 comments:

  1. This is good piece of writing and pleasant urging commented
    here, I am really enjoying by these.
    black seed oil for hair growth

    ReplyDelete
  2. شركة تنظيف شقق بالرياض https://is.gd/asR8nF التنظيف ليس أمر عابر أو أمر من أمور الرفاهية فهو من أساسيات الحياة وطالما حافظت علي منزلك نظيف خالي من الأتربة والرواسب ستحافظ علي حياتك خالية من كافة الأمراض المختلفة
    متلك اسطول عمالة ذوى خبرة كبيرة حيث يعمل معنا منذ فترة طويلة خبرتة الكبيرة التى اكتسبها والأمانة التى يتصف بها جعلتنا فى صدارة شركات التنظيف.يقوم فريق العمالة بتقسيم نفسة الى اكثر من فريق ليعلم كل واحد منهم مهمتة ويؤديها بإتقان يتحرى الدقة طوال مراحل العمل

    لوجود مشرف يتمم على عملية التنظيف. فريق يقوم بتنظيف غرف النوم وفريق اخر مهمتة تنظيف غرف المعيشة والاخر لتنظيف المطابخ والحمامات

    لن يخرج فريق العمل الا بعد ان يسلمك بيت نظيف خالى تماما من اى اتربة او غبار ومطبخ نظيف تماما خالى من الدهون والزيوت .
    كما ستبقى النظافة سلوك طبيعي دوري بالنسبة إليك ولذلك فشركة دريم هاوس تقدم كل خدماتها الخاصة بالنظافة وذلك حرصا منها علي نظافة وصحة العملاء كما توفر كل هذا من خلال طريقة مميزة ومحترفة جدا في التنظيف بشكل عام .… اقرأ المزيد
    المصدر: شركة تنظيف شقق بالرياض

    شركة مكافحة الفئران بالرياض https://is.gd/KmVFow أصبح يوجد الكثير من شركات مكافحة الفئران في الرياض ، التي تعمل في القضاء على أي نوع من الحشرات سواء كان قوارض أو زاحفه أو طائره بشكل نهائي، و نظراً لكثره شكاوي مدينه الرياض من الفئران والقوارض.

    فقد قررت الشركة على أن تكثف اهتماماتها بأعمال مكافحة الفئران و القضاء على مشكله القوارض المزعجة بشكل نهائي.
    كما أن حرصت شركة مكافحة فئران بالرياض على أن توفر طرق تساعد على التخلص من كافة أنواع الفئران بأحجامها المختلفة.
    لأننا جميعاً نعلم أن تواجد القوارض في المنزل من أكثر الأشياء التي تسبب للسيدة الرعب والفزع، ولكل للموجودين بالمنزل وبالأخص الأطفال ومن المعروف أن الفئران من القوارض التي تقوم بتخريب المنزل وتقضي على الأخضر واليابس بها عن طريق على تدمير الأثاث الخشبي والطعام لذا حرصت الركة علي ان تخلصك من كل هذه المشكلة في أسرع وقت ممكن فبادر بالاتصال بها. … اقرأ المزيد

    المصدر: شركة مكافحة الفئران بالرياض

    افضل شركة جلي بلاط بالرياض https://is.gd/ghTHhA لا شك أن رونق البلاط أو الرخام يساهم في تحسين المظهر في الفلل والقصور كما يضفي بريقا لامعا على مظهرها مما يعني أن اتساخها و تغطيتها بالأتربة يجعلها تفقد ذلك البريق .
    تقدم شركتنا أفضل خدمة لجلي البلاط والرخام وإعادة مظهره اللامع والبراق له مع توظيف كافة المعدات الحديثة لديها و أمهر عمالة مدربة على هذا الغرض .
    للشركة باع طويلة وخبرة سابقة في تنظيف الرخام والبلاط للفلل والقصور والمنازل ومداخل الشركات والمكاتب والمؤسسات
    لسنا الوحيدين لكننا الأفضل في عالم جلي الرخام والبلاط وإعادة مظهره البراق ولمعته وبدون تجريحه أو خدشه وهذا سر تميزنا عن باقي الشركات … اقرأ المزيد

    المصدر: شركة جلي بلاط بالرياض

    ReplyDelete
  3. افضل شركة شراء اثاث مستعمل بالرياض  https://is.gd/BEYCRD حيث نقدم خدماتنا لأركان المدينة بالكامل ونغطيها بدون اي مشاكل كما اننا نقدم جميع خدمات التنظيف والتي نتميز فيها عن منافسينا من الشركات الاخري المتعددة في نفس المجال حيث نعتمد علي ارخص الاسعار التي نقدمها إلي العملاء في شركتنا لضمان الحصول علي ثقتكم الغالية والتي اهم ما يشغلنا… اقرأ المزيد

    المصدر: شركة شراء اثاث مستعمل بالرياض

    افضل شركة تنظيف مسابح بالرياض https://is.gd/NFNC7L  من الطبيعي أن تجد المنازل قد اتسخت أو مُلئت بالغبار و الأتربة مع مرور الوقت خاصة مع تغيرات الفصول .. مما يزيد نسبة الإصابة بالأمراض لذا فإن عملية التنظيف بالنسبة للمنزل بما في ذلك الجدران و السلالم و المفروشات.
    لاشك أن الأطفال يفضلون كثيرا اللعب في المسابح خاصة في الأوقات شديدة الحرارة في فصل الصيف.
    لكن ذلك يعقبه اتساخ في أرضية المسبح أو في مياهه وتزداد هذه المشكلة تعقيدا حينما يكون حجم المسبح كبيرا الأمر الذي يجعل المنزل أو الفيلا تبدوا بمظهر خارجي أقل جمالا ورونقا .
    تضمن الشركة تقديم خدمات التنظيف الشاملة للمسبح وبمعدات و أدوات صيانة على أعلى مستوى من الجودة والنقاوة
    إن جميع أشكال المسابح تفتقر إلى عزل وهذا لقيام مؤسسة عزل المسابح والحمامات بالرياض بحفظ المسابح وإطالة عمرها كما أنها تمنع التسربات التى قد تتم بأحجام هائلة أو يتم إهدارها , كما أنه مع القيام بعملية مؤسسة عزل المسابح والحمامات بالرياض يحاول أن المحافظة على المواد التى تستخدم لتنظيف وتعقيم المسابح والتى يتم وضعها لتنقية المسابح من مختلَف الجراثيم أو البكتيريا والتى قد كان سببا الضرر القوي للإنسان والذى يكون له إحتكاك مباشر مع المسابح وغيره .
    الأمر الذي يسبب له الأمراض التى تتغاير شدتها وخطورتها ولذا الداعِي يلزم الإهتمام بقيام بعملية شركة عزل مسابح بالرياض وجعلها وجوب قسوة , والحرص على القيام بها قبل إستعمال المسبح , وهذا لتجنب إنتشار الميكروبات والجراثيم و البكتيريا والتى قد كان سببا الأمراض والحساسية لأشخاص الشخص والمجتمع والعائلة خاصة الأطفال وكبار العمر .

    بما في ذلك من شفط وتنظيف و تعقيم و أيضا علاج حالات التسرب التي قد تحدث أحيانا بعد تصميم المسبح
    تعد عملية ضرورية و يجب أن تتم من حين إلى آخر و بصفة دورية.
    تختص شركتنا بكافة أعمال التنظيف المنزلية ,والتي قد تتطلب مجهودا بدنيا شاقا قد يعجز أفراد الأسرة عن إنجازه وبشكل احترافي بما في ذلك تنظيف الجدران و الرخام و المفروشات و المسابح و حتى الحديقة مع إزالة الروائح الكريهة و رش المبيدات الحشرية .… اقرأ المزيد

    المصدر: شركة تنظيف مسابح بالرياض

    ReplyDelete
  4. Trained employment

    The company has the best team of trained workers and experienced technicians who can deal with all kinds of movables with great professionalism and keenness so that our dear customer can get free of scraps, fractures and scratchesشركة نقل عفش

    ReplyDelete