
16 April 2018

Fixing failed FreeIPA web UI login

Dear readers

This time I came across an interesting case, so I think it is worth sharing with you all. But first, a big disclaimer: I AM NOT A FREEIPA EXPERT AT ALL. The investigation here simply uses common troubleshooting sense from the UNIX/Linux world, no more, no less.

A calm weekend suddenly turned into detective work. A team mate contacted me saying he was unable to log in to the FreeIPA web UI. The OS is CentOS 7 64 bit. For those who don't know what FreeIPA is, think of it as LDAP-turned-into-Active-Directory-ish. What? You don't know what Active Directory is? You're not alone, me neither :)

The facts I had were these: two FreeIPA instances were set up on two different virtual machines, say A and B. On A, my mate could log in, whereas on B he couldn't. He insisted that he did the exact same steps to configure IPA on both. So what's wrong?

Since we're dealing with computers, we're dealing with a basic logic concept: if you set up the same software (and the same version) using the exact same steps on the exact same OS version, libraries and hardware, then the result MUST be the same. There is no exception here. Hardware flaws? Ok, those might screw things up, but I am talking about the general logic. We are on the same page up to this point, right?

There were also clues. In the httpd error_log, I saw:
[:error] [pid 13588] [remote xx.xx.xx.xx:yy] mod_wsgi (pid=13588): Exception occurred processing WSGI script '/usr/share/ipa/wsgi.py'.
[:error] [pid 13588] [remote xx.xx.xx.xx:yy] Traceback (most recent call last):
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]   File "/usr/share/ipa/wsgi.py", line 51, in application
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]     return api.Backend.wsgi_dispatch(environ, start_response)
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 262, in __call__
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]     return self.route(environ, start_response)
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 274, in route
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]     return app(environ, start_response)
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 929, in __call__
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]     self.kinit(user_principal, password, ipa_ccache_name)
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 965, in kinit
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]     pkinit_anchors=[paths.KDC_CERT, paths.KDC_CA_BUNDLE_PEM],
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]   File "/usr/lib/python2.7/site-packages/ipalib/install/kinit.py", line 125, in kinit_armor
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]     run(args, env=env, raiseonerr=True, capture_error=True)
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]   File "/usr/lib/python2.7/site-packages/ipapython/ipautil.py", line 512, in run
[:error] [pid 13588] [remote xx.xx.xx.xx:yy]     raise CalledProcessError(p.returncode, arg_string, str(output))
[:error] [pid 15892] [remote xx.xx.xx.xx:yy] CalledProcessError: Command '/usr/bin/kinit -n -c /var/run/ipa/ccaches/armor_15892 -X X509_anchors=FILE:/var/kerberos/krb5kdc/kdc.crt -X X509_anchors=FILE:/var/lib/ipa-client/pki/kdc-ca-bundle.pem' returned non-zero exit status 1

Later, I found out that I just had to focus on that last line, the kinit one. But before that, I did the usual mantra: list the most recently modified files on A and use diff to compare them against B. The find command was more or less:

sudo find / -path /proc -prune -o -path /dev -prune -o -path /sys -prune -o -path /run -prune -o -type f -printf "%T@ %T+ %p\n" | sort -nr | less

That way, I got a sorted list of files, starting from the most recently modified ones. This is really manual work. I looked over files named *.conf or similar. I ended up suspecting two files:
/etc/sysconfig/authconfig
/etc/nsswitch.conf

I edited those files on B so their contents mimicked the ones on A. The diffs were:
$ diff /etc/sysconfig/authconfig{.orig,}
10c10
< USEFAILLOCK=yes
---
> USEFAILLOCK=no


$ diff /etc/nsswitch.conf{.orig,}
40c40
< hosts:      files dns myhostname
---
> hosts:      files dns

Fingers crossed. Log in again. F**k! Still not working.

Then I googled and found that I needed to install the mod_auth_kerb rpm. I did that, but somehow, right after the yum install, my hunch told me it was not the root cause. Okay, compare the enabled services on A and B: A had realmd, B didn't. Installed realmd, enabled and started it. Also restarted ipa, certmonger and httpd. S**t, nothing changed. Comparing the installed rpms on A and B really bored me, so I tossed that out of the game.

I stepped back and checked the call trace above. Googling drove me to this site, which mentions that you need to make sure your CentOS is up to date, since there were bugs in the IPA components. Great. But yum upgrade said everything was already at the latest, and lsb_release also confirmed that A and B had the same release number. Almost a dead end here.....

Back to the kinit error message above: I confirmed that the file existed. Ok, time for a nasty experiment: what if I removed it? Back it up somewhere else first, then trigger the login. Same call trace. man kinit clearly says that -c means kinit will create the cache file at the given path. But none appeared there. Something was really wrong here, just not exposed by the error log.

How do you deal with an error that can't be seen? I chose strace. The pid in the error log was actually the pid of the wsgi process, so logically that was indeed the process that invoked kinit.

I ran strace and hooked it to the wsgi process id. Something like:
strace -p 13588 -ff -o trace.txt

Refresh the web UI, wait a bit, log in, boom, failed. Then stop strace.

strace wrote several files named trace.txt suffixed with numbers. What are we going to see here? Wild guess: let's check whether it failed to open or read some files (likely libraries):
grep open trace.txt*
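A refinement I could have used to cut down the noise (a sketch; the trace line here is faked, since the real trace files are long gone): strace marks every failed syscall with `= -1 ERRNO (message)`, so grepping for that pattern surfaces only the failures.

```shell
# fake one strace output line to demonstrate the filter
printf 'open("/etc/ipa/ca.crt", O_RDONLY) = -1 EACCES (Permission denied)\n' > trace.txt.12345
# failed syscalls all carry "= -1 E...", successful ones don't
grep ' = -1 E' trace.txt.*
rm -f trace.txt.12345
```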

Kazaaammmm. Lucky hit. I saw several EPERM errors (unable to read due to permissions). The files were:
/var/lib/ipa-client/pki/kdc-ca-bundle.pem
/var/lib/ipa-client/pki/ca-bundle.pem
/etc/ipa/ca.crt

ls confirmed that "other" had no permissions at all on them, while on A they had at least "r" for "other". So, simply run chmod o+r on them.
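The effect of the fix can be rehearsed on a scratch file (a sketch; on the real host the chmod was applied, as root, to the three .pem/.crt files listed above):

```shell
f=$(mktemp)
chmod 640 "$f"       # like on B: "other" has no permission at all
stat -c '%a' "$f"    # prints 640
chmod o+r "$f"       # the actual fix: give "other" read access
stat -c '%a' "$f"    # prints 644, matching A
rm -f "$f"
```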

Restart httpd, certmonger and ipa again. Log in. F***k**f, success dude!!!

Moral of the story: a complicated problem sometimes demands a really simple solution. You just need to open your mind to many possibilities and know how to trim them down to a few suspects. If not, you will tangle yourself in endless headaches.

Until next time, ciao....

regards,

Mulyadi

05 January 2017

Sysadmin's tale: Fixing broken logical volume in Linux

Dear respected readers

This time I'd like to share something related to troubleshooting in Linux. What makes me want to share it is not how sophisticated the steps I did were, but how awfully simple they were! Please read on.

Ok, so it started when I was about to leave my office. Then suddenly my phone rang; it turned out to be a WhatsApp message from my colleague, saying "we need your help". Anyway, before we proceed, let me state one thing: I've tried my best to reconstruct the situation from memory, so I beg your pardon if something is missing. I hope it still delivers the message.

......Oh, and one more thing: lately, when I read "we need your help", I've begun to think it actually means "we're in deep shit". Hehehehe

The situation was as follows: my co-worker said that a SAN connection had dropped for no clear reason. That broke the multipath construction and eventually left one or more mount points unmounted. And the goal: put the mount points back ASAP.

The clock was ticking and I didn't have time to check the root cause, so I shifted my attention to ways of restoring it. My co-worker said that the missing device, /dev/mapper/mpaths, was back online. So it was a bit easier. But, he said, the LVM should come back by itself, "Right?". Ehm, well, not really....

Then I asked a few things, hoping to learn the situation right before the disaster happened:
Me (M): "how many logical volume is missing?"
Co-worker (C): "one"
M: "and numbers of volume groups?"
C: "one"
M: "and the physical volume?"
C: "yes, one, again"
M: "so you're saying that in this mpaths, to the best you can recall, there is only one physical partition, and it's formatted as a physical volume?"
C: "yup, indeed"
M: "and how big is this single partition?"
C: "not sure, but I guess it is as big as the disk"

Pay attention: that last question is critical. Why? Because my plan was to recreate the partition using fdisk. If I somehow created a smaller partition, there was a chance I'd put the entire logical volume in jeopardy. And if bigger? Maybe no danger. But ideally, I had to make it exactly as big as it was.

One more thing: we didn't know the start sector of the partition. This might affect sector alignment and, again, break the logical volume. So it was like tossing dice in the air. Got the right number? Good. Wrong number? ...hehhehe, need I say? :D

Since there weren't many options left, I asked for approval from the team leader while explaining the risks of my method. He answered okay. (Actually not okay, but we were left with no better choice, so my method was the best bet before we'd have to pull everything from backups. Oh, I forgot to say: this was THE backup server. Double burn, eh? heheheh)

Do backup first:
dd if=/dev/sda of=/mnt/backup/sda.img bs=32K
dd if=/dev/sdb of=/mnt/backup/sdb.img bs=32K

I don't exactly recall why I backed up sda and sdb, but it's usually my instinct to be safe. Better safe than sorry, right?

Do partitioning:

fdisk /dev/mapper/mpaths

create new primary partition

size? Make it as big as it can be. Luckily, fdisk handles the alignment for us. And as a bonus, fdisk also calculates the biggest possible size for us.

don't forget to write the change

run partprobe.

Then, run a series of:
pvscan
vgscan
lvscan

then check:
pvs
vgs
lvs

And thank God, it was back!!

Oh, like a movie, here is the twist: before I did all of the above, I actually cheated and checked the first few lines of:
strings -a /dev/mapper/mpaths

Then I saw that the LVM descriptor was still there, unharmed! It also mentioned that the one and only logical volume there (whose name I forget) was based on a physical volume named mpaths1. So my co-worker's information was indeed correct. Thus, once the partition was back, there was no need to recreate the physical volume and so on. They were automagically restored by themselves!


Breaking news: the other day, a similar situation happened. This time, the volume group reported that one of its physical volumes was missing.

Again, the same interview as above. And from checking, with "strings", the first physical partition that formed the volume group, it turned out the missing one was, let's name it, /dev/sdf

Okay, /dev/sdf. Why is there no number suffix, like sdf1 or something? My guess is that the sysadmin was using the entire disk as a physical volume.

Again, I needed permission from the leaders. Got it, then I simply ran:
pvcreate /dev/sdf

Run again series of magic:
pvscan && vgscan && lvscan && pvs && vgs && lvs

Thank God, again! It was back!

Moral of the story: you don't need to master an "out of this planet scripting language" or be "able to read hex codes" to solve problems. All you need is:
1. Calm down. I have never heard of, seen, or read about somebody tackling an issue while screaming all over the place....
2. Try to reconstruct WHY it happened
3. Based on #2 (your hypotheses), create a plan
4. Before executing the plan, create a BACKUP!
5. Execute your plan. (This is why it is called a plan: to be executed. Even if your plan sounds sophisticated, if you don't have the guts to do it, its value is zero, my friend)

Hope it helps you in day to day operation, until next time!

regards,

Mulyadi Santosa

14 June 2016

Sysadmin tales: Analyzing slow database server

Update:
  • June 14th, 2016 6:01 PM UTC+7: using direct I/O or raw access also bypasses filesystem caching, which avoids double caching. I dare to guess Oracle does its own caching, so if everything were cached we would have: device caching, the page cache, and Oracle's own caching. Direct I/O eliminates the page cache layer.


Dear readers

It's been a while since my last post, so I guess I'll try to share something that might be useful for performance monitoring. This is something I focus on now, and I found it tightly related to OS knowledge. So please read on.

One day, I was urgently assigned by upper management to help identify a certain (unknown) problem on a Linux server. I quickly visited the site and did various checks.

The reported issue was that this server (a RHEL/CentOS one) was terribly slow at certain times. The server runs an Oracle DB and certain CRM software, as a VM on top of VMware. So slow that you could barely do anything in an ssh or interactive shell. And the admins reported the load average shooting to a whopping 300-400 ("What the f**k?" Yeah I know, I was shocked too. A loadavg of 300 for a few seconds would be fine, but for minutes?). The machine has 30 cores, by the way, so you can imagine this load is way too much.

My quick guess was that it was something related to I/O, specifically disk reads/writes. Why? Because the I/O code path mostly works in an uninterruptible way. Once the calling process does a read, for example, it jumps into kernel space and keeps working until either it finishes or its scheduling interval runs out and it gets scheduled out.

Note: if the I/O is done in asynchronous style, the calling process can return quickly and continue executing the next instruction. The work is taken over by an aio kernel thread, which triggers the callback mechanism back to the reading process once the read (or write) is finished. However, it is then the aio thread that does the work, so system-wise the load would be fairly the same. Only who does the work is different.

I checked various logs, /proc entries and, not to forget, dmesg. Fortunately, I saw a very enlightening hint there (it's not every day I get this kind of clue at first sight, to be honest):

INFO: task oracle:22763 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
oracle D ffff81323c100600 0 22763 1 22779 22728 (NOTLB)
ffff810f0a3c3b88 0000000000000086 ffff812c3108e7b0 ffff81153c3ce100
ffff812c3108e7b0 0000000000000009 ffff812c3108e7b0 ffff8131ff133040
0000975e2fd08e7f 0000000000003e2c ffff812c3108e998 0000000c80063002
Call Trace:
[] do_gettimeofday+0x40/0x90
[] :jbd:start_this_handle+0x2e9/0x370
[] autoremove_wake_function+0x0/0x2e
[] :jbd:journal_start+0xcb/0x102
[] :ext3:ext3_dirty_inode+0x28/0x7b
[] __mark_inode_dirty+0x29/0x16e
[] do_generic_mapping_read+0x347/0x359
[] file_read_actor+0x0/0x159
[] __generic_file_aio_read+0x14c/0x198
[] generic_file_aio_read+0x36/0x3b
[] do_sync_read+0xc7/0x104
[] __dentry_open+0x101/0x1dc
[] autoremove_wake_function+0x0/0x2e
[] do_filp_open+0x2a/0x38
[] vfs_read+0xcb/0x171
[] sys_pread64+0x50/0x70
[] tracesys+0x71/0xdf
[] tracesys+0xd5/0xdf


Note: the above stack trace is printed automatically when the kernel config CONFIG_DETECT_HUNG_TASK is enabled (=y), and it is enabled by default in RHEL/CentOS. So yes, this is an awesome feature and a life saver.

What we can see is that it starts as a read operation (sys_pread64), which then invokes the virtual filesystem's read operation (vfs_read). A directory entry is opened along the way (__dentry_open): before you can open and access a file, you need to locate the directory entry that refers to it first.

Then the file is read in synchronous style. At this point, you might already suspect that this is the root cause. I partially agree. However, I am more concerned with the amount of data being written and how the Oracle database engine manages it.

Ok, let's move on. After the usual marking of the inode as dirty (I am not sure about this; I guess it is dirty because the access-time metadata is updated), journalling kicks in (jbd:journal_start), followed by fetching the current time (do_gettimeofday). And this is where things slow down.

First things first: getting the local time shouldn't be that slow, even if you still count on the PIT; HPET or TSC should be very fast. So timekeeping is out of the equation.

Journalling gets the next attention. Indeed, the filesystem Oracle puts its data on is ext3. So for every write (and even for reads, via the access-time update seen above), the journal is updated, so that in the case of an incident the operations can be replayed, avoiding data corruption as much as possible.

Alright, so here is the early hypothesis: Oracle writes a huge amount of data, the filesystem gets stressed, and journalling tries to keep up at the same speed. So who's to blame?

In my opinion, what a database as sophisticated as Oracle needs is to bypass the filesystem layer. That could be implemented by using the O_DIRECT flag in read/write, or by simply accessing the raw device (e.g. sda1 instead of /data, for example). I am not sure which one is doable, but from the operating system's point of view, those are the options.
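From the shell, the easiest way to see direct I/O in action is dd's oflag=direct, which opens the target with O_DIRECT and bypasses the page cache (a sketch; the block size must be a multiple of the device's sector size, and not every filesystem, e.g. tmpfs, supports O_DIRECT):

```shell
# write 8 MiB with O_DIRECT (no page cache)...
dd if=/dev/zero of=direct-test.bin bs=1M count=8 oflag=direct
# ...versus the normal, page-cached path
dd if=/dev/zero of=cached-test.bin bs=1M count=8
rm -f direct-test.bin cached-test.bin
```

Watching both with iostat while they run shows the direct variant hitting the disk immediately instead of parking dirty pages in memory.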

It is also well worth trying the noop I/O scheduler instead of CFQ or deadline. The premise is the same: instead of letting the I/O scheduler manage the way Oracle's requests reach the disk, why not just let the database decide what is best for itself? Noop does precisely this. It still does some I/O merging (front and back merges), but it doesn't do I/O prioritization, timeouts etc. So it feels almost like plain "I want to do it now!" and the Linux kernel simply says "be my guest".

The other fact I found, which was also shocking, is this (taken from /proc/meminfo):
PageTables: 116032004 kB

That's around 110 GiB, ladies and gentlemen, just to hold page tables! And don't forget that those page tables need to stay in RAM and can't be swapped out. So, simply speaking, 110 GiB of your RAM can't be used by your applications, and it is your applications that will be kicked out to swap in the event of a shortage of free memory (before the Out of Memory killer kicks in).

Thus, although not the primary root cause, this must be solved too. The solution is to use huge pages: 2 MiB or 4 MiB pages (depending on the kernel's paging setup) instead of the standard 4 KiB. Assuming we use 2 MiB pages and all allocated pages fit into them without fragmentation, we could squeeze the PageTables usage by a 512:1 ratio, down to roughly 0.2 GiB. Sounds good, right?
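The back-of-the-envelope math, from the /proc/meminfo number above (plain shell arithmetic; 2 MiB / 4 KiB = 512, so each huge-page entry replaces 512 ordinary ones):

```shell
pagetables_kb=116032004                                # from /proc/meminfo above
echo "$((pagetables_kb / 1024 / 1024)) GiB of page tables"          # -> 110 GiB
echo "$((pagetables_kb / 512 / 1024)) MiB if mapped with 2 MiB huge pages"  # -> 221 MiB, ~0.2 GiB
```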

The full story might be boring, so hopefully I've summarized it well enough as a light case study.

Cross-check results (by another team):
  1. The related team confirmed that using asynchronous I/O together with direct I/O within Oracle relieves the load stress
  2. Huge pages are going to be tested. I'm not sure how it really went since I haven't been updated, but I am sure it will be helpful too

Until next time, cheerio! :)

regards,

Mulyadi Santosa

12 January 2016

VirtualBox bugs: guest can not ping host and vice versa

Hi all

Happy New Year 2016. May God bless us with health and prosperity.

Okay, just a quick update. I found a quite annoying fact: VirtualBox version 5.0.4 has a bug: in bridged adapter mode, the host cannot ping the guest VM and vice versa!

I googled this, and some people reached an early conclusion that it might be related to bugs in the NDIS6 adapter.

However, there is an easier workaround: just make sure you upgrade to the latest version (5.0.12 as of January 12th, 2016). Or just stay on the 4.3.x series. In my personal opinion, version 4.3.x (which is still in active maintenance) has so far been more stable than 5.0.x.

Hope it helps somebody out there.....

regards,

Mulyadi Santosa

08 July 2012

Fixing segfault in Pathload

Dear readers

A couple of days ago (the 1st week of July 2012), I came across this nifty tool called Pathload. Essentially, it helps you determine the real upstream and downstream bandwidth of your connection.

Unexpectedly, when I tried to run it, it segfaulted immediately. With a bit of help from gdb and some trial and error, I found and fixed the bug. Here's the complete email message I sent to its maintainer (who, I found, no longer maintains it) describing the problem. For those who just want the patch, scroll to the end of this post (normal patch format):

Dear Constantinos

I came across this nice tool Pathload of yours today while exploring
about network management in Linux kernel. Of course, quickly I
downloaded the link to the source tarball (I use Linux -- Centos 5.x)
and compiled it.

When running it, it suddenly stopped due to segfault. After checking
the stack trace in the resulting core dump image, it leads to line 132
in client.c:
  strcpy(serverList[i],buf);

My intuition suddenly told me it must be an out-of-bound char copy. Long
story short, I landed in client.h at this line:
#define MAXDATASIZE 25 // max number of bytes we can get at once

I did quick test and change the above line. Now it reads
#define MAXDATASIZE 50

I do "make clean" followed by "make". Now it runs perfectly fine as
far as I can tell.

Hopefully it means something to you for upcoming release. Meanwhile,
once again thank you for this piece of good work.

PS: strangely, without modifying any single line of source code, the
resulting binary worked fine inside GNU debugger (gdb). That's why I
suspected a race condition initially.

--- client.h.old    2012-07-07 11:10:54.000000000 +0700
+++ client.h    2012-07-07 11:10:37.000000000 +0700
@@ -62,7 +62,7 @@
 #define UNCL    4

 #define SELECTPORT 55000 // the port client will be connecting to
-#define MAXDATASIZE 25 // max number of bytes we can get at once
+#define MAXDATASIZE 50 // max number of bytes we can get at once
 #define NUM_SELECT_SERVERS 3

 EXTERN int send_fleet() ;

30 March 2012

"useradd" and "adduser" are the same? think again....

Well, actually they are not that different. Just a small, not-so-obvious-but-a-bit-bothering difference.

I did this in Ubuntu Natty (11.04):
sudo useradd -m user_a

and next:
sudo adduser -m user_b

Of course I put a password on both of them, let's say "123456" (a weak one, I know :) ). And then, if I did:
su - user_a
I got:
$
Just a plain dollar sign. "Uhm, what's wrong?"

But, if I did:
su - user_b
I got:
user_b@localhost $

Grrrr.... I quickly concluded that something was different in their bash initialization. So a quick:

sudo diff -Naur /home/user_a/ /home/user_b/
should pinpoint the differences, if there were any, right away. But I was wrong. They were exactly identical.

Then I decided to take a peek at /etc/passwd. No strong reason, just plain curiosity:
grep -E 'user_a|user_b' /etc/passwd
the result:
user_a:..........:/bin/sh
user_b:..........:/bin/bash
[The passwd entries are shortened to focus on the important fields only]

Great! We found it! "But wait, isn't /bin/sh a symbolic link to /bin/bash?". Well yes, at least it was some time ago. But recently, at least on the latest releases of Ubuntu and its derivatives, /bin/sh points to "dash".

Dash is a Bash-alike shell, but with a smaller file size and fewer capabilities, which results in incompatibilities with Bash in many aspects. So, no wonder that ".bashrc" didn't initialize the shell prompt, along with other things (enabling Tab completion, IIRC), correctly.

Therefore, to fix the useradd behaviour, simply use:
sudo useradd -s /bin/bash user_a
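To check what an existing account got, or to repair it without recreating the user (a sketch; shown here against root's entry, since that always exists, but the same commands apply to user_a from the example above):

```shell
# which login shell did an account get? field 7 of its passwd entry:
getent passwd root | cut -d: -f7
# and what /bin/sh really is on this box (dash on recent Ubuntu, hence the plain "$"):
ls -l /bin/sh
# to repair an already-created account without deleting it:
# sudo chsh -s /bin/bash user_a
```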

Another case closed, folks :)

PS: It's really a wonder how much you can do with grep and diff, if you know where to look ..... :D

regards,

Mulyadi Santosa

25 January 2012

"libc.so.6 not found"? here we go again...

Hi all...

I've been tinkering with Linux Mint for the last month, so my CentOS installation was kind of abandoned. However, I took the chance to update CentOS via the usual chroot trick. It worked.... however...

I found a glitch. I became aware of it when I ran my self-made wifi connection script, which calls the dhclient program. It said:
libc.so.6 not found

Great... ldd said the same thing too. However, libc.so.6 was still at /lib/libc.so.6, so it wasn't really missing. Hmmmm...

An important note: a recent update shows that there is another libc.so.6 residing in /lib/i686/nosegneg. From random googling, I concluded that it is a "Xen friendly" library. That's a short way of saying those libraries don't use certain segmentation techniques that might confuse or break Xen, so to speak.

Then, somehow I felt that it *might* be related to SELinux (I set it to enforcing). Here are a few lines from /var/log/messages that show the quirk:
kernel: [    5.195941] type=1400 audit(1327499418.190:3): avc:  denied  { read } for  pid=860 comm="restorecon" name="libc.so.6" dev=xxxx ino=4821369 scontext=system_u:system_r:restorecon_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=lnk_file

and the output of "ls" is:
$ ls -lZ /lib/libc.so.6
lrwxrwxrwx  root root system_u:object_r:file_t          /lib/libc.so.6 -> libc-2.5.so
(the above output might be slightly incorrect; just focus on the "file_t" attribute)

Alright, so the SELinux attribute of libc.so.6 was wrong. I don't know what exactly caused that during the chroot session. My best guess is that, since it was done from inside Linux Mint, which doesn't use SELinux, partial relabeling or anything related to fixing SELinux attributes simply failed.

The fix is fortunately easy:
1. edit /etc/sysconfig/selinux. change "SELINUX=enforcing" into "SELINUX=permissive"
2. do "sudo touch /.autorelabel". Notice the . (dot) prefix.
3. reboot

SELinux will relabel everything inside your mounted filesystems according to its default configuration once Linux enters the normal runlevel.
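Steps 1 and 2 can be scripted (a sketch; demonstrated on a stand-in copy of the config file so it is safe to run anywhere, while on the real box you would edit /etc/sysconfig/selinux itself, as root):

```shell
# stand-in for /etc/sysconfig/selinux
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > selinux.conf
# step 1: flip enforcing -> permissive
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' selinux.conf
grep '^SELINUX=' selinux.conf        # now reads SELINUX=permissive
# steps 2 and 3, on the real box as root: touch /.autorelabel && reboot
rm -f selinux.conf
```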

To confirm the problem is gone, pick a random binary, say dhclient, and run ldd on it. Here's mine:
$  ldd /sbin/dhclient
libc.so.6 => /lib/i686/nosegneg/libc.so.6


And the problem is solved :) Now you can turn SELinux back to enforcing mode.

PS: SELinux is both fun and frustrating..... but with careful log analysis, you can usually pinpoint the root of the problem pretty fast.

regards,

Mulyadi Santosa

18 November 2011

The correct gcc parameter for Intel Core Duo

On the quest of optimization....

One thing that has bugged my mind lately: which architecture does the Intel Core Duo use? If we read this Wikipedia entry, one quickly concludes that it is an "enhanced" Pentium M.

So, does gcc agree? Not really. Using the idea taken from this blog entry, Core Duo is a Prescott! Here's the output:

/usr/lib/gcc/i486-linux-gnu/4.4.3/cc1 -E -quiet -v - -D_FORTIFY_SOURCE=2 -march=prescott --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -fstack-protector

Surprisingly, this is indeed correct. Gentoo's wiki page support this, even Intel's engineer puts amen.

Summary: I conclude that Core Duo is a Yonah (Pentium M), but optimization-wise, assume it's a Prescott.

regards,

Mulyadi Santosa

03 October 2011

hashing? great.... which one?

Hi folks....

Hashing is quite a large subject. I myself simply use hashes to confirm whether two (or more) files are identical (using md5sum, sha256sum).

But as the books say, hashing can have collisions. And a hash, one way or another, could be reversed. In other words, there is no such thing as a perfect, truly one-way hash. Alright, we can't pursue perfection here. So what's the recipe for picking the best hashing method?

Fortunately, an article written by Valerie Aurora gives us the clue. For the impatient: you'd better use something like SHA-2 (sha256 or better). I found the article explains the issues behind hashing in quite a friendly (read: non hacker-ish) tone :)
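My day-to-day use of it, for completeness (identical content always yields identical digests, and sha256sum -c re-checks a recorded digest for you):

```shell
printf 'some payload\n' > a.txt
cp a.txt b.txt
sha256sum a.txt b.txt        # two files, same content -> same digest
sha256sum a.txt > a.sum      # record the digest...
sha256sum -c a.sum           # ...and verify later: prints "a.txt: OK"
rm -f a.txt b.txt a.sum
```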

Cheers and have a nice day ....

regards,

Mulyadi Santosa

02 July 2011

Feedback regarding my "stat or ls" post

Hi all

Several people were kind enough to share their thoughts about my "stat or ls" post. One of them even shared this forum post. Quite neat, I must say!

Basically, they said that both the "ls" and "stat" outputs are correct. One even compared them with "du" output (by default, "du" uses block-size units when showing file sizes).

What I might have failed to stress is that the tests in my last post were done on top of a SELinux-enabled ext3 filesystem. "So what?" you might ask. Briefly, maybe nothing. But my friend pointed out that stat was accounting for extra blocks that might (I say "might" because my friend is not so sure) contain metadata such as SELinux labels and ACLs.

So far, I find it consistent that the used-blocks count reported by "ls -ls" is always half of the one reported by "stat". It must be something related to the return value of the function I mentioned in my previous post. 1 KiB? hmmmm......

PS: further info regarding my block device and filesystem. Thanks to Justin Cook, who pointed me to this neat tool:

# blockdev --getbsz /dev/sda3
4096
# blockdev --getss /dev/sda3
512
The first is my filesystem block size, the latter is my disk sector size.
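A guess worth checking on that consistent 2:1 gap (my own assumption, not verified on the original box): GNU "ls -s" counts in 1 KiB units by default, while stat's block count is in 512-byte units, so stat's number would always come out double for the very same storage:

```shell
f=$(mktemp)
printf 'hello' > "$f"
ls -ls "$f" | awk '{print $1}'   # used blocks, in 1 KiB units (GNU ls default)
stat -c '%b x %B bytes' "$f"     # same storage, counted in 512-byte blocks
rm -f "$f"
```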

regards,

Mulyadi Santosa.